Data APIs and Web Scraping
Explore a variety of questions around APIs and web scraping in Python.
APIs and web scraping are essential techniques for gathering external data. Whether you’re building a data pipeline or enriching your training dataset, you’ll need to interact with REST endpoints or parse HTML content. In this lesson, we’ll implement three common tasks: retrieving API data, handling pagination, and scraping product information from a website. Let’s get started.
Retrieving data from a RESTful API
An interviewer may ask, “How would you retrieve data from a RESTful API? Walk me through the steps and demonstrate with sample Python code.”
This question is frequently asked in entry-level data science interviews to check basic API literacy.
import requests

def get_weather_data(city: str, api_key: str) -> dict:
    # TODO: your implementation
    pass
Sample answer
There are several ways to approach this. For our solution, let’s implement a simple Python function that uses the requests library to fetch weather data from an API (e.g., wttr.in) that does not require an API key for access. To adapt this snippet for an API that requires authentication, add an api_key parameter to the function and send the key with each request, typically as a query parameter or a header, depending on what the API expects.
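As a brief aside, the authenticated variant might look something like the sketch below. The endpoint URL, the q query parameter, and the Bearer header scheme are all assumptions made for illustration, and get_weather_data_with_key is a hypothetical name; a real provider's documentation determines where the key actually goes. The optional session argument is also an addition, included only so the function can be exercised without a network call.

```python
import requests

# Hypothetical endpoint, used only for illustration; substitute your provider's URL.
BASE_URL = "https://api.example.com/v1/weather"


def get_weather_data_with_key(city: str, api_key: str, session=None, timeout: float = 10.0) -> dict:
    """Fetch weather data from a key-protected API (illustrative sketch)."""
    # Use the injected session if one is given, otherwise the requests module itself.
    http = session if session is not None else requests
    try:
        response = http.get(
            BASE_URL,
            params={"q": city},  # assumed query parameter name
            headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
            timeout=timeout,  # avoid hanging forever on a slow server
        )
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return response.json()
    except requests.RequestException as e:
        return {"error": f"Unable to fetch data: {e}"}
```

Passing the city through params rather than formatting it into the URL lets requests handle the encoding of spaces and special characters. The rest of this answer uses the keyless wttr.in endpoint.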
Here’s a sample implementation:
import requests

def get_weather_data(city: str) -> dict:
    """Fetch weather data for a given city using a public weather API.
    Args:
        city (str): Name of the city to get weather data for
    Returns:
        dict: Weather data or error message
    """
    url = f"https://wttr.in/{city}?format=j1"
    try:
        response = requests.get(url, timeout=10)  # Time out instead of hanging on a slow server
        response.raise_for_status()  # Raise an exception for bad responses
        # Parse and return weather data
        weather_data = response.json()
        return weather_data
    except requests.RequestException as e:
        return {"error": f"Unable to fetch data: {str(e)}"}

def main():
    cities = ["New York", "London", "Tokyo", "Sydney"]
    for city in cities:
        print(f"\nWeather for {city}:")
        weather_data = get_weather_data(city)
        if "error" in weather_data:  # Check for failure before indexing into the payload
            print(weather_data["error"])
        else:
            current = weather_data["current_condition"][0]
            print(f"Temperature: {current['temp_F']}°F")
            print(f"Feels like: {current['FeelsLikeF']}°F")
            print(f"Humidity: {current['humidity']}%")
            print(f"Wind speed: {current['windspeedMiles']} mph")

main()
Let’s break down the key parts of the sample implementation:
Line 1: We import the requests library to make HTTP calls to a public weather API.
Line 3: We define a get_weather_data function.
Line 10: We construct a URL using the wttr.in API, requesting weather in JSON format for ...