npm install cheerio
Warning: Only scrape websites that you have permission to scrape. Scraping text from certain websites may infringe copyright, violate privacy, and/or breach the site's terms of service.
Cheerio js is built on top of fb55’s htmlparser2, which parses HTML pages and lets you traverse and manipulate the resulting data structure. The syntax of cheerio js is similar to jQuery, and the implementation is efficient and robust.
You can select (find) elements on a web page and analyze the information depending on your use case. Once you have selected them, you can work with them as you would with any other objects in a programming language: count the instances of a specific element, loop through the instances to extract useful information, and more. For instance, you may want to extract all the text in the <h1> (heading) tags from a web page.
If you’re interested in a Python-based solution for web-scraping, click here.
Read more about cheerio in the official docs.