Beautiful Soup (Scraping Data from Simple HTML)

In this lesson, we’ll discuss how Beautiful Soup is used to extract data from the public web.


Beautiful Soup

Have you ever wondered how people extract data from the public web and analyze it? They use libraries and other tools that simplify pulling data out of web pages, a process known in more technical terms as “web scraping.” Beautiful Soup is one such library. The data science community has long used it for scraping because it parses HTML and XML documents and lets you pull data out of them. It also has comprehensive documentation that makes it easy to learn.
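As a first look, here is a minimal sketch of parsing a page with Beautiful Soup. It assumes the `beautifulsoup4` package is installed (`pip install beautifulsoup4`) and uses a small hypothetical HTML string in place of a downloaded page. It relies on Python's built-in `"html.parser"` so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded web page.
html_doc = "<html><head><title>Sample Page</title></head><body><p>Hello, soup!</p></body></html>"

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(html_doc, "html.parser")

# Attribute-style access returns the first matching tag.
print(soup.title.string)  # Sample Page
print(soup.p.get_text())  # Hello, soup!
```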

Web pages are written in HTML (Hypertext Markup Language). Beautiful Soup extracts data directly from this markup by letting you select individual HTML elements, for example by tag name or by attribute. Below is an example of how HTML markup is built and an overview of its different tags.
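To make the idea of selecting elements concrete, the sketch below parses a small page and picks elements out by tag name, `id`, and CSS class. The page content, the `main-heading` id, and the `intro`/`book` class names are hypothetical, chosen only to illustrate common tags: `<head>`/`<title>` for metadata, `<h1>` for a heading, `<p>` for a paragraph, `<ul>`/`<li>` for a list, and `<a href="...">` for a link.

```python
from bs4 import BeautifulSoup

# Hypothetical page illustrating common HTML tags.
html_doc = """
<html>
  <head><title>Bookstore</title></head>
  <body>
    <h1 id="main-heading">Featured Books</h1>
    <p class="intro">Our picks for this week.</p>
    <ul>
      <li class="book"><a href="/books/1">Book One</a></li>
      <li class="book"><a href="/books/2">Book Two</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Select an element by tag name plus its id attribute.
heading = soup.find("h1", id="main-heading").get_text()

# Select by CSS class; class_ avoids clashing with Python's class keyword.
intro = soup.find("p", class_="intro").get_text()

# Collect every link's visible text and its href attribute.
links = [(a.get_text(), a["href"]) for a in soup.find_all("a")]

print(heading)  # Featured Books
print(intro)    # Our picks for this week.
print(links)    # [('Book One', '/books/1'), ('Book Two', '/books/2')]
```

`find` returns the first match (or `None`), while `find_all` returns a list of every match, which is the usual starting point when scraping repeated items such as the list entries above.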
