The Basics of Web Scraping

Explore the fundamentals of web scraping with Rust: extracting data from web pages that were formatted for human readers. Learn to use hooks such as CSS classes and element IDs to target data, and use crates such as select and scraper to parse HTML and gather useful content efficiently.

The basics of web scraping

Web scraping is the art of getting data from web pages. The difference between scraping and polling a web service is that web pages are meant to be read by humans, while web services return data structured for machines.

How, then, can we teach machines to read data meant for humans?

The need for hooks

Apart from some applications of artificial intelligence, machines have to be guided to retrieve data that was meant to be displayed on a page. We need some tricks and hooks that let the program navigate a page and recognize the data in it.

Web pages are typically built from repeated parts that are all formatted the same way. To keep that formatting consistent across the whole website, developers usually assign CSS classes to those parts, so elements that hold the same kind of data tend to share a class.

Tip: Our first hook is to look for CSS classes.
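
To make the class hook concrete, here is a minimal, dependency-free sketch. It is not how the select or scraper crates work internally, and a real scraper should rely on a proper HTML parser such as those crates. The function names texts_by_class and attr_value are hypothetical, and the code assumes simple markup: double-quoted attribute values, no ">" inside attributes, and elements that contain plain text only.

```rust
/// Returns the value of attribute `name` from the raw text of an opening tag.
/// Assumption: attribute values are double-quoted.
fn attr_value<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    // The leading space avoids matching longer names such as `data-class`.
    let needle = format!(" {}=\"", name);
    let start = tag.find(&needle)? + needle.len();
    let end = start + tag[start..].find('"')?;
    Some(&tag[start..end])
}

/// Collects the text of every element whose `class` attribute lists `class_name`.
/// Naive sketch: text is taken up to the next `<`, so nested tags are not handled.
fn texts_by_class(html: &str, class_name: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut rest = html;
    // Walk the document one tag at a time.
    while let Some(open) = rest.find('<') {
        let after_open = &rest[open + 1..];
        let close = match after_open.find('>') {
            Some(i) => i,
            None => break, // unterminated tag: stop scanning
        };
        let (tag, body) = (&after_open[..close], &after_open[close + 1..]);
        // A class attribute may list several names separated by whitespace.
        let matches = attr_value(tag, "class")
            .map_or(false, |v| v.split_whitespace().any(|c| c == class_name));
        if matches {
            let end = body.find('<').unwrap_or(body.len());
            found.push(body[..end].trim().to_string());
        }
        rest = body;
    }
    found
}
```

Because several elements usually share a class, the function returns a list: one entry for each element that carries the class.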

Sometimes web developers also assign an ID to an element on the page. An ID is meant to be unique within the page, so when one is present it is a powerful hook for targeting exactly one element.
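
The ID hook can be sketched in the same dependency-free way. This is only an illustration under stated assumptions, not the API of the select or scraper crates: the function name text_by_id is hypothetical, attribute values are assumed to be double-quoted, and the target element is assumed to contain plain text only.

```rust
/// Returns the text of the first element whose `id` attribute equals `wanted`.
/// Naive sketch: it searches the raw markup, so it assumes the attribute text
/// does not also appear inside visible page content.
fn text_by_id(html: &str, wanted: &str) -> Option<String> {
    // Build the exact attribute text to look for, e.g. ` id="total"`.
    let needle = format!(" id=\"{}\"", wanted);
    let attr_at = html.find(&needle)?;
    // The element's text starts right after the `>` that closes the opening tag...
    let text_start = attr_at + html[attr_at..].find('>')? + 1;
    // ...and runs until the next tag begins (or the input ends).
    let text_len = html[text_start..]
        .find('<')
        .unwrap_or(html.len() - text_start);
    Some(html[text_start..text_start + text_len].trim().to_string())
}
```

Since an ID should identify a single element, the function returns an Option rather than a list: Some(text) when the ID is found, None otherwise.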

Tip: The ...