Solution Review: Scrape the Web Page Using Beautiful Soup

Explore the process of scraping web pages using Python's Requests and Beautiful Soup libraries. Understand how to inspect web elements, extract images, ratings, titles, and prices from HTML structures, and handle common challenges in data retrieval.

Solution

We start by inspecting the web page and finding the elements we want.

Inspecting the DOM of the first page
Python 3.8
import requests
from requests.compat import urljoin
from bs4 import BeautifulSoup

base_url = "https://books.toscrape.com/"

titles = []
images = []
rates = []
prices = []

# Solution
response = requests.get(base_url)
soup = BeautifulSoup(response.content, 'html.parser')
articles = soup.find_all("article", {"class": "product_pod"})
for article in articles:
    image = urljoin(base_url,
                    article.find("div", {"class": "image_container"}).a.img['src'])
    rate = article.find("p", {"class": "star-rating"})['class'][1]
    title = article.find("h3").a['title']
    price = article.find("div", {"class": "product_price"}).p.string
    titles.append(title)
    images.append(image)
    rates.append(rate)
    prices.append(price)

print("Length of scraped titles: ", len(titles))
print("Length of scraped images: ", len(images))
print("Length of scraped rates: ", len(rates))
print("Length of scraped prices: ", len(prices))
print(titles)
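
The rating lookup in the loop relies on Beautiful Soup returning the class attribute as a list, so index 1 holds the rating word. The sketch below illustrates this on a one-line HTML fragment. The fragment and the RATING_WORDS mapping are assumptions added for illustration; they are not part of the original solution.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the rating markup the solution targets.
html = '<p class="star-rating Three"></p>'
tag = BeautifulSoup(html, "html.parser").find("p", {"class": "star-rating"})

# class is a multi-valued attribute, so Beautiful Soup returns a list of names.
classes = tag["class"]
print(classes)

# Assumed mapping from rating words to integers (illustrative only).
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

# .get() returns None for an unexpected word instead of raising KeyError.
rating = RATING_WORDS.get(classes[1])
print(rating)
```

Converting the word to a number is optional; it may help if the ratings are later sorted or averaged.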

Code explanation

  • Lines 13–14: We request the site URL using requests.get() and pass response.content to BeautifulSoup(). ...
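
The parsing step can also be tried offline, without a network request. The following is a minimal sketch that wraps the same selectors in a function and runs it on a small HTML fragment. The function name parse_books, the SAMPLE_HTML fragment, and its values are hypothetical; the fragment only mimics the structure the solution expects, and urljoin is taken from the standard library here instead of requests.compat.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://books.toscrape.com/"

# Hypothetical fragment shaped like one product_pod article.
SAMPLE_HTML = """
<article class="product_pod">
  <div class="image_container">
    <a href="catalogue/sample-book_1/index.html">
      <img src="media/cache/sample.jpg" alt="Sample Book">
    </a>
  </div>
  <p class="star-rating Four"></p>
  <h3><a href="catalogue/sample-book_1/index.html" title="Sample Book">Sample ...</a></h3>
  <div class="product_price"><p class="price_color">£10.00</p></div>
</article>
"""


def parse_books(html, base_url=BASE_URL):
    """Return one dict per product_pod article found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for article in soup.find_all("article", {"class": "product_pod"}):
        container = article.find("div", {"class": "image_container"})
        books.append({
            "title": article.find("h3").a["title"],
            # Image paths are relative, so join them with the base URL.
            "image": urljoin(base_url, container.a.img["src"]),
            "rate": article.find("p", {"class": "star-rating"})["class"][1],
            "price": article.find("div", {"class": "product_price"}).p.string,
        })
    return books


books = parse_books(SAMPLE_HTML)
print(books)
```

To run the same function against the live page, the bytes from requests.get(BASE_URL).content could be passed in place of SAMPLE_HTML, assuming the site's markup still matches these selectors.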