Meet the Data
Learn how to fetch data from files, APIs, and the web using Python.
Retrieving data is the first and most critical step in any data analysis workflow. Whether it’s pulling a file from local storage, connecting to an external API, or scraping content from the web, everything starts with acquiring the data.
Why data fetching matters
Fetching data isn’t just a “first step”; it’s a foundational skill. Data doesn’t always arrive neatly packaged and waiting in a database. Sometimes we’re pulling logs from a server, reading raw files, calling APIs, or scraping a site for details. The more comfortable we are acquiring data from different sources, the more flexible and powerful our analysis becomes.
In this lesson, we’ll explore three common sources of data and how to fetch each using Python:
Files (like CSVs and JSON).
APIs (application programming interfaces).
Web pages (using scraping).
1. Working with files
Files are one of the most common ways to store and exchange data. Two popular formats we often encounter are CSV (comma-separated values) and JSON (JavaScript Object Notation).
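To make the JSON format concrete, here is a minimal sketch using Python’s built-in json module; the record shown is purely illustrative:

```python
import json

# A tiny JSON document as a string. json.loads parses it into
# native Python objects: objects become dicts, arrays become lists.
raw = '{"name": "Ada", "languages": ["Python", "SQL"]}'
record = json.loads(raw)

print(record["name"])       # Ada
print(record["languages"])  # ['Python', 'SQL']
```

For a JSON file on disk, json.load (without the “s”) does the same thing given an open file object.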
A simple way to read data from these files is by using pandas, a powerful Python library built for data manipulation and analysis. At the heart of pandas is the DataFrame, a two-dimensional, table-like structure with rows and columns, similar to a spreadsheet.
With pandas, we can easily load data into a DataFrame. Once the data is in this structure, it can be explored, modified, cleaned, and analyzed using pandas’ rich set of built-in functions.
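As a quick sketch of what a DataFrame looks like, one can be built directly from a dictionary mapping column names to values; the data here is purely illustrative:

```python
import pandas as pd

# Each key becomes a column; each list holds that column's values.
df = pd.DataFrame({
    "city": ["Lisbon", "Porto"],
    "visitors": [120, 85],
})

print(df)
print(df.shape)  # (2, 2) -> two rows, two columns
```

Once data is in this shape, the same exploration and cleaning tools apply regardless of whether it originally came from a CSV file, an API response, or a scraped page.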
Working with CSV files using pandas
CSV (comma-separated values) files store tabular data in plain text, with each line representing a row and commas separating the values. Python’s pandas library makes it simple to read, modify, and write CSV files ...
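A minimal sketch of reading a CSV with pandas follows; to keep it self-contained it parses an in-memory string via io.StringIO, but in practice you would pass a file path such as "data.csv" to pd.read_csv (the column names and values here are illustrative):

```python
import io
import pandas as pd

# The same plain-text layout a CSV file would contain:
# a header row, then one line per record.
csv_text = "name,score\nAda,91\nGrace,88\n"

# pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df["score"].mean())  # 89.5
```

The first line is treated as the header by default, and pandas infers column types (here, name as strings and score as integers) automatically.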