Introduction to Web Scraping

This lesson explains what web scraping is and how we can download files and parse them when needed.


Web scraping means writing an application that downloads web pages and parses specific information out of them. To scrape data, our application usually has to navigate the website programmatically. In this chapter, we will learn how to download files from the internet and parse them if needed. We will also learn how to create a simple spider that we can use to crawl a website.
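To make the download-and-parse idea concrete, here is a minimal sketch that uses only Python's standard library. The names `LinkCollector`, `extract_links`, and `download` are illustrative choices of ours, not part of any library, and the sample HTML is made up for the demonstration. Later examples may well use different tools; this only shows the general shape of the task.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return the link targets in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def download(url, timeout=10):
    """Fetch a page as text. Assumes UTF-8 when the server declares no charset."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Parsing works the same on a downloaded page or on a local string.
sample = '<p><a href="/about">About</a> <a href="https://example.com/">Home</a></p>'
print(extract_links(sample))  # ['/about', 'https://example.com/']
```

On a real site, we would pass the result of `download(url)` to `extract_links` instead of the sample string.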

Tips for scraping

There are a few tips that we need to go over before we start scraping.

  • Always check a website’s terms and conditions before scraping it. They usually include terms that limit how often we can scrape or what we can scrape.

  • Because our script will run much faster than a human can browse, we must make sure we don’t hammer the website with lots of requests. This may even be covered in the website’s terms and conditions.

  • We can get into legal trouble if we overload a website with requests or attempt to use it in a way that violates the terms and conditions we agreed to.

  • Websites change all the time, so our scraper will break someday. We will have to maintain it if we want it to keep working.

  • Unfortunately, the data we get from websites can be a mess. As with any data-parsing activity, we will need to clean it up to make it useful.
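The tip about not hammering a website can be sketched in code. The example below is one possible approach, not a complete politeness policy: it checks a site's robots.txt rules with the standard library's `urllib.robotparser` and enforces a minimum pause between requests. The `Throttle` class, the `build_rules` helper, the `my-scraper` user agent, and the robots.txt content are all assumptions made up for illustration. Note that robots.txt is separate from a site's terms and conditions, which we still need to read ourselves.

```python
import time
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Build a rule checker from the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


class Throttle:
    """Enforces a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_delay):
        self.min_delay = min_delay  # seconds
        self._last = None

    def wait(self):
        # Sleep only for the part of the delay that has not already elapsed.
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Made-up robots.txt content; a real scraper would download the site's own file.
robots_txt = """\
User-agent: *
Disallow: /private/
"""
rules = build_rules(robots_txt)
print(rules.can_fetch("my-scraper", "https://example.com/index.html"))         # True
print(rules.can_fetch("my-scraper", "https://example.com/private/data.html"))  # False
```

In a scraping loop, we would call `wait()` on a shared `Throttle` before each request and skip any URL for which `can_fetch` returns `False`.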

With that out of the way, let’s start scraping!
