Get to Know the Website Architecture
Learn how websites are structured and built, along with some core concepts of web scraping.
Before learning web scraping, we need to understand how a website is structured and built, along with key concepts such as the Document Object Model (DOM) that relate closely to web scraping.
Structure of a website
The first thing to understand about a website is its structure. A website usually organizes its content across multiple pages, and the way content is laid out may differ from page to page. The following diagram shows the generic structure of a webpage.
Here are some of the most common types of pages we might encounter when web scraping and why they are important.
Home page: The home page is the main entry point to a website and usually provides an overview of the site’s content and navigation to other pages. Scraping the home page can be useful for identifying the different sections or categories of content on the site.
Category pages: Category pages are typically used to group related content, such as articles on a news website or products in an online store. Scraping category pages can be useful for identifying individual items within the category and extracting their details. ...
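To make the idea of using a home page to discover a site's sections more concrete, here is a minimal sketch in Python that uses only the standard library. It assumes that category links live inside a <nav> element, which is common but not guaranteed. The markup in HOME_PAGE_HTML, the base URL https://example.com, and the names NavLinkCollector and find_category_links are all hypothetical and serve only as illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical home page markup; a real scraper would download this instead.
HOME_PAGE_HTML = """
<html>
  <body>
    <nav>
      <a href="/category/laptops">Laptops</a>
      <a href="/category/phones">Phones</a>
    </nav>
    <main>
      <a href="/about">About us</a>
    </main>
  </body>
</html>
"""


class NavLinkCollector(HTMLParser):
    """Collects (text, href) pairs for links that appear inside <nav>."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0        # greater than 0 while inside a <nav> element
        self.current_href = None  # href of the <a> currently being read
        self.current_text = []    # text fragments of that <a>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            self.current_href = dict(attrs).get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text while we are inside a tracked link.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1
        elif tag == "a" and self.current_href is not None:
            text = "".join(self.current_text).strip()
            self.links.append((text, self.current_href))
            self.current_href = None


def find_category_links(html, base_url):
    """Return (label, absolute URL) pairs for links found in the navigation."""
    collector = NavLinkCollector()
    collector.feed(html)
    return [(text, urljoin(base_url, href)) for text, href in collector.links]


category_links = find_category_links(HOME_PAGE_HTML, "https://example.com")
for label, url in category_links:
    print(label, url)
```

Each URL returned this way could then be visited as a category page to extract the individual items it lists. Real sites vary widely, so the <nav> assumption would need to be checked against the actual page before relying on it.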