
Data APIs and Web Scraping

Explore methods for collecting real-world data by retrieving information from RESTful APIs, managing paginated data responses, and extracting product details from websites using Python tools like requests and BeautifulSoup. This lesson equips you to implement key techniques for building effective data pipelines and enriching datasets.

APIs and web scraping are essential techniques for gathering external data. Whether you're building a data pipeline or enriching your training dataset, you’ll need to interact with REST endpoints or parse HTML content. In this lesson, we’ll implement three common tasks: retrieving API data, handling pagination, and scraping product information from a website. Let’s get started.

Retrieving data from a RESTful API

An interviewer may ask, “How would you retrieve data from a RESTful API? Walk me through the steps and demonstrate with sample Python code.”

This question is frequently asked in entry-level data science interviews to check basic API literacy.

Python - RESTful API
import requests

def get_weather_data(city: str, api_key: str) -> dict:
    # TODO: your implementation

Sample answer

There are several ways to approach this. For our solution, let’s implement a simple Python function that uses the requests library to fetch weather data from an API (e.g., wttr.in) that does not require an API key. To adapt this snippet for an authenticated API, you would extend the function to accept an API key and send it with each request, typically as a query parameter or a header.
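As a quick sketch of what that authenticated variant could look like: the endpoint URL and the `appid` query parameter below are hypothetical placeholders, since each provider documents its own parameter names.

```python
import requests

def get_weather_data(city: str, api_key: str) -> dict:
    # Hypothetical endpoint and parameter names; check your provider's docs
    url = "https://api.example.com/v1/weather"
    params = {"q": city, "appid": api_key}  # many APIs accept the key as a query parameter
    headers = {"Accept": "application/json"}  # some instead expect an Authorization header
    try:
        response = requests.get(url, params=params, headers=headers, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.RequestException as e:
        return {"error": f"Unable to fetch data: {e}"}
```

The error-handling shape is identical either way; only how the credential travels with the request changes.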

Here’s a sample implementation:

Python - RESTful API
import requests

def get_weather_data(city: str) -> dict:
    """
    Fetch weather data for a given city using a public weather API.

    Args:
        city (str): Name of the city to get weather data for

    Returns:
        dict: Weather data or error message
    """
    url = f"https://wttr.in/{city}?format=j1"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # Raise an exception for bad responses (4xx/5xx)
        # Parse and return weather data
        return response.json()
    except requests.RequestException as e:
        return {"error": f"Unable to fetch data: {e}"}

def main():
    cities = ["New York", "London", "Tokyo", "Sydney"]
    for city in cities:
        print(f"\nWeather for {city}:")
        weather_data = get_weather_data(city)
        # Check for an error before indexing into the response
        if "error" in weather_data:
            print(weather_data["error"])
            continue
        # wttr.in's j1 format nests current readings under 'current_condition'
        current = weather_data["current_condition"][0]
        print(f"Temperature: {current['temp_F']}°F")
        print(f"Feels like: {current['FeelsLikeF']}°F")
        print(f"Humidity: {current['humidity']}%")
        print(f"Wind speed: {current['windspeedMiles']} mph")

if __name__ == "__main__":
    main()
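The lesson's second task, handling pagination, boils down to a loop: request page after page, accumulating results, until the API signals there is nothing left. Here is a minimal sketch with a stubbed fetch function so it runs without a network; the `fetch_page(page, per_page)` callable and the "empty batch means done" convention are assumptions for illustration, as a real API might instead return a next-page URL or cursor.

```python
def fetch_all_pages(fetch_page, page_size: int = 100) -> list:
    """Collect items from a page-numbered source until a page comes back empty.

    fetch_page(page, per_page) -> list of items for that page.
    """
    results = []
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        if not batch:  # empty page: no more data
            break
        results.extend(batch)
        page += 1
    return results

# Stub simulating an API that exposes 250 items in pages of up to 100
data = list(range(250))

def fake_fetch(page: int, per_page: int) -> list:
    start = (page - 1) * per_page
    return data[start:start + per_page]

print(len(fetch_all_pages(fake_fetch)))  # 250
```

With a real endpoint, `fetch_page` would wrap a `requests.get(url, params={"page": page, "per_page": per_page})` call, reusing the same error handling as the weather example.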
...