Meet the Data

Learn how to fetch data from files, APIs, and the web using Python.

Retrieving data is the foundational step of any data science project. Whether we’re reading a file stored locally, connecting to an external API, or collecting information from the web, our analysis can only be as strong as the data we bring to it. Accessing relevant, reliable data is where insight begins.

How to meet the data

Python gives us the tools to easily pull data from various sources. Libraries like pandas, requests, and Beautiful Soup make it simple to read files, make web requests, and parse content. Once we know how to gather data from different places, we open the door to deeper insights and more impactful analysis.

Let’s explore three common data sources and how to fetch data from each using Python.

1. Working with files

Files are one of the most common ways to store and exchange data. We often encounter two popular formats: CSV (Comma-Separated Values) and JSON (JavaScript Object Notation).

A simple way to read data from these files is to use pandas, a powerful Python library built for data manipulation and analysis. At the heart of pandas is the DataFrame, a two-dimensional, table-like structure with rows and columns, similar to a spreadsheet.

With pandas, we can easily load data into a DataFrame. Once the data is in this structure, it can be explored, modified, cleaned, and analyzed using pandas’ rich built-in functions.
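To make the idea of a DataFrame concrete, here is a minimal sketch that builds one directly from a Python dictionary instead of a file. The column names and values are invented for illustration and are not part of any dataset used in this lesson.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
orders = {
    "product": ["pen", "notebook", "stapler"],
    "quantity": [10, 4, 2],
    "price": [1.5, 3.0, 7.25],
}

# Each dictionary key becomes a column; each list position becomes a row
df = pd.DataFrame(orders)

# Inspect the structure: (number of rows, number of columns)
print(df.shape)

# List the column names
print(df.columns.tolist())

# Preview the first few rows
print(df.head())

# Filter rows with a boolean condition on a column
high_value = df[df["price"] > 2.0]
print(high_value)
```

The same inspection and filtering steps work no matter how the DataFrame was created, so they apply equally to data loaded from a file.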

Working with CSV files using pandas

CSV (Comma-Separated Values) files store tabular data in plain text, with each line representing a row and commas separating the values. The pandas library makes reading, modifying, and writing CSV files simple through its read_csv() function and the DataFrame’s to_csv() method.
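To see what this plain-text layout looks like, the sketch below parses CSV content from an in-memory string rather than a file on disk. The sample rows are made up for illustration; io.StringIO simply lets read_csv() treat a string as if it were a file.

```python
import io

import pandas as pd

# Hypothetical CSV content: a header line, then one line per row
csv_text = """product,quantity,price
pen,10,1.5
notebook,4,3.0
stapler,2,7.25
"""

# read_csv() accepts file-like objects as well as file paths
df = pd.read_csv(io.StringIO(csv_text))

# pandas infers a type for each column from the text values
print(df.dtypes)

# Calling to_csv() without a path returns the CSV text as a string
round_trip = df.to_csv(index=False)
print(round_trip)
```

Passing index=False keeps the DataFrame's row labels out of the output, so the text written matches the column layout that was read.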

Python 3.10.4
import pandas as pd
# Load the CSV into a DataFrame
df = pd.read_csv('sales_data.csv')
# Preview the first few rows
print("Initial data:\n", df.head())
# Add a new column for total order value
df['total_value'] = df['quantity'] * df['price']
# Save the updated DataFrame back to the same CSV file (this overwrites the original)
df.to_csv('sales_data.csv', index=False)
# Reload the CSV to confirm changes
df2 = pd.read_csv('sales_data.csv')
# Preview the updated data
print("Updated data:\n", df2.head())

The ...