

Web Crawling in JavaScript Using Cheerio

In this project, we will crawl a real-world website using the features provided by the Cheerio library in Node.js. We will learn to automatically extract URLs from HTML link elements across an entire site. Lastly, we will export the collected data to CSV.


You will learn to:

Understand the fundamentals of crawling a site.

Build an automated software tool that can crawl an entire site.

Populate a set of URLs discovered on the target site.

Export the discovered URLs to CSV.


Skills

Web Scraping

HTML Elements

Data Collection

Prerequisites

Basic understanding of HTTP and the client/server architecture

Basic understanding of JavaScript

Basic understanding of Node.js







Project Description

The Cheerio library in Node.js provides a powerful API for parsing HTML documents. It can easily traverse and manipulate HTML structures, making it an ideal choice for data collection and web crawling.

In this project, we will build a Node.js script that uses Cheerio to crawl an entire site. We will download one page of the target site with the Node.js Fetch API. Next, we will use Cheerio’s functions to select HTML link elements using CSS selectors, extract their URLs, and repeat this procedure for the newly found URLs until all pages have been discovered.

Finally, we will take advantage of the Node.js I/O capabilities to export the scraped data in human-readable CSV format.

Project Tasks


Initial Setup

Task 0: Get Started


Implement Link Discovery Logic

Task 1: Navigate to a Web Page

Task 2: Select All Link HTML Elements

Task 3: Extract URLs from the Links

Task 4: Filter Out Undesired URLs

Task 5: Encapsulate the Link Discovery Logic in a Function


Crawl the Entire Site

Task 6: Initialize Data Structures for Web Crawling

Task 7: Loop through the Pages to Crawl

Task 8: Create a CSV File from the Pages Discovered
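The crawl logic in Tasks 6 and 7 can be sketched as a breadth-first loop over a queue of pages to visit, with a `Set` tracking every URL discovered so far. To keep the example self-contained and runnable offline, the link-discovery step is injected as a function over an in-memory "site"; in the project it would instead download each page and extract its URLs with Cheerio:

```javascript
// Breadth-first sketch of the crawl loop (Tasks 6-7).
function crawlSite(startUrl, getLinks) {
  const discovered = new Set([startUrl]); // every URL seen so far
  const queue = [startUrl];               // URLs still waiting to be visited

  while (queue.length > 0) {
    const current = queue.shift();
    for (const link of getLinks(current)) {
      if (!discovered.has(link)) {
        discovered.add(link);
        queue.push(link); // schedule newly found pages for crawling
      }
    }
  }
  return [...discovered];
}

// Tiny in-memory "site" standing in for real fetch + Cheerio extraction.
const fakeSite = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/', 'https://example.com/c'],
  'https://example.com/b': [],
  'https://example.com/c': [],
};

const pages = crawlSite('https://example.com/', (url) => fakeSite[url] ?? []);
console.log(pages.length); // → 4
```

Using a `Set` makes the duplicate check constant-time and guarantees the loop terminates once every reachable page has been visited exactly once.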