Headless Web Scraping Using Puppeteer

Puppeteer is a Node library for controlling browsers through a high-level API. It was initially designed to work only with Chromium-based browsers, but it now supports multiple browsers, including Firefox. It runs in headless mode by default, but it can also be configured to run in headful (non-headless) mode with a visible browser window.
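As a minimal sketch, assuming the `puppeteer` npm package is installed (installing it downloads a compatible browser), the `headless` option of `puppeteer.launch()` switches between the default headless mode and a visible window. The `getTitle` helper is hypothetical, and `page.setContent()` loads an inline HTML string so the example does not depend on a live site.

```typescript
import puppeteer from "puppeteer";

// Hypothetical helper: launch a browser, load some HTML, and return the page title.
// headless defaults to true here; pass false to watch the browser window.
async function getTitle(html: string, headless = true): Promise<string> {
  const browser = await puppeteer.launch({ headless });
  try {
    const page = await browser.newPage();
    // Load markup directly instead of navigating to a URL.
    await page.setContent(html);
    return await page.title();
  } finally {
    // Always close the browser, even if something above throws.
    await browser.close();
  }
}
```

Calling `getTitle(html, false)` opens a visible window instead, which can help when debugging selectors.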

In this project, we’ll build a Node application that scrapes data from a web-based e-library application using Puppeteer and a headless Chromium browser. Throughout the project, we’ll use several Puppeteer functions to select HTML elements by CSS class name and HTML tag.
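The following sketch shows both kinds of selection with `page.$$eval()`, which runs a callback inside the page on every element matching a CSS selector. The markup is hypothetical: it assumes each book is a `div` with class `book` containing an `h3` title and a `span` with class `author`. The real e-library's class names and structure may differ.

```typescript
import puppeteer from "puppeteer";

interface Book {
  title: string;
  author: string;
}

// Hypothetical scraper for markup shaped like:
//   <div class="book"><h3>Title</h3><span class="author">Name</span></div>
async function scrapeBooks(html: string): Promise<Book[]> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html);
    // Select every element with the CSS class "book".
    return await page.$$eval(".book", (cards) =>
      cards.map((card) => ({
        // Select by HTML tag inside each card...
        title: card.querySelector("h3")?.textContent?.trim() ?? "",
        // ...and by CSS class name.
        author: card.querySelector(".author")?.textContent?.trim() ?? "",
      }))
    );
  } finally {
    await browser.close();
  }
}
```

The callback executes in the browser, not in Node, so it can only return serializable data such as the plain objects above.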

Furthermore, we’ll write Node functions to automate interactions with the website.
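Automating an interaction follows the same pattern: select an element, then act on it. This sketch assumes a hypothetical search form (the `#search`, `#go`, and `#out` ids are illustrative, not taken from the e-library site) and uses `page.type()`, `page.click()`, and `page.$eval()` to fill it in, submit it, and read the result.

```typescript
import puppeteer from "puppeteer";

// Hypothetical page: clicking the button writes the query into #out.
const SEARCH_PAGE = `
  <input id="search">
  <button id="go" onclick="document.getElementById('out').textContent = 'Results for ' + document.getElementById('search').value">Search</button>
  <p id="out"></p>`;

async function runSearch(query: string): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(SEARCH_PAGE);
    // Type into the input one keystroke at a time, as a user would.
    await page.type("#search", query);
    // Click the button, which fires its onclick handler in the page.
    await page.click("#go");
    // Read back the text the handler wrote.
    return await page.$eval("#out", (el) => el.textContent ?? "");
  } finally {
    await browser.close();
  }
}
```

On a real site, a click usually triggers navigation or a network request, so you would typically wait for the result (for example with `page.waitForSelector()`) before reading it.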

1. Introduction
2. Introduction to Web Scraping
3. Puppeteer Fundamentals
4. Advanced Concepts
5. Storing Scraped Data
6. Scraping a Book Store
7. Best Practices for Web Scraping
8. Conclusion