Smartwatch data analysis using Python

Analyzing smartwatch data with Python involves several steps: loading the dataset, cleaning it, exploring it, visualizing it, and drawing useful insights from it.

  • Libraries to use: For this data analysis task, we’ll use pandas to load and prepare the data and Plotly Graph Objects to draw the charts.

  • Dataset: The dataset used in this analysis can be downloaded from Kaggle: https://www.kaggle.com/datasets/arashnic/fitbit

Step-by-step guide

Let’s start the analysis task by importing the required libraries or modules.

Import the libraries

To import the libraries, follow the code given below:

# import required libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

Import the dataset

To import the dataset, follow the code given below:

df = pd.read_csv("/content/dailyActivity_merged.csv")

Then, print the records:

df.head()

Data preprocessing

  1. See which columns have null values, then drop the rows that contain them:

Columns_with_null = df.isnull().sum()
print(Columns_with_null)
df = df.dropna()  # remove rows with missing values, if any
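  2. Change the data type of the TotalDistance column to integer (note that this truncates the fractional part of each distance):

df["TotalDistance"] = df["TotalDistance"].astype('int64')
df.info()  # info() prints its summary directly and returns None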
  3. Sum all of the activity minutes into a new column, Total_Minutes, and convert the total into hours in Total_Hours:

df["Total_Minutes"] = df["VeryActiveMinutes"] + df["FairlyActiveMinutes"] + df["LightlyActiveMinutes"] + df["SedentaryMinutes"]
df.info()  # info() prints its summary directly and returns None
df["Total_Hours"] = df["Total_Minutes"] / 60
print(df.head())
  4. Convert the ActivityDate column from object to datetime:

df['ActivityDate'] = pd.to_datetime(df['ActivityDate'] , format='%m/%d/%Y')
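
The preprocessing steps above can be tried on a small table before running them on the full file. The following is a minimal sketch; the sample DataFrame is hypothetical (its values are made up and only mimic the column names of dailyActivity_merged.csv), so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical miniature of dailyActivity_merged.csv (values are made up)
sample = pd.DataFrame({
    "ActivityDate": ["4/12/2016", "4/13/2016", "4/14/2016"],
    "TotalDistance": [8.5, 6.97, None],
    "VeryActiveMinutes": [25, 21, 30],
    "FairlyActiveMinutes": [13, 19, 11],
    "LightlyActiveMinutes": [328, 217, 181],
    "SedentaryMinutes": [728, 776, 1218],
})

# 1. Count missing values per column, then drop incomplete rows
null_counts = sample.isnull().sum()
sample = sample.dropna().reset_index(drop=True)

# 2. Casting to int64 truncates the fractional part (8.5 becomes 8)
sample["TotalDistance"] = sample["TotalDistance"].astype("int64")

# 3. Total tracked minutes per record, also expressed in hours
minute_cols = ["VeryActiveMinutes", "FairlyActiveMinutes",
               "LightlyActiveMinutes", "SedentaryMinutes"]
sample["Total_Minutes"] = sample[minute_cols].sum(axis=1)
sample["Total_Hours"] = sample["Total_Minutes"] / 60

# 4. Parse the date strings into datetime64 values
sample["ActivityDate"] = pd.to_datetime(sample["ActivityDate"], format="%m/%d/%Y")

print(null_counts)
print(sample.dtypes)
```

If the truncation in step 2 is not wanted, rounding first (for example with round()) or keeping the column as a float are possible alternatives.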

Analysis and visualization

Create a pie chart to see how an average day is split between active and inactive minutes:

labels = ['Very Active Minutes', 'Fairly Active Minutes', 'Lightly Active Minutes', 'Inactive Minutes']
# Average minutes per category across all records
counts = df[['VeryActiveMinutes', 'FairlyActiveMinutes', 'LightlyActiveMinutes', 'SedentaryMinutes']].mean()
colors = ['red', 'green', 'pink', 'blue']
fig = go.Figure(data=[go.Pie(labels=labels, values=counts)])
fig.update_layout(width=500, height=400,
                  paper_bgcolor="white", autosize=False, showlegend=True)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='black', width=1)))
fig.show()
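
The percentages a pie chart like this displays can also be computed directly, which makes it easy to check the figure. This is a minimal sketch on hypothetical, made-up minute values, assuming the average per category is the quantity of interest.

```python
import pandas as pd

# Hypothetical per-day minute records (made-up values)
minutes = pd.DataFrame({
    "VeryActiveMinutes": [20, 40],
    "FairlyActiveMinutes": [10, 30],
    "LightlyActiveMinutes": [200, 180],
    "SedentaryMinutes": [770, 750],
})

# Average minutes per category across all records
mean_minutes = minutes.mean()

# Share of the average day spent in each category, in percent
share = (mean_minutes / mean_minutes.sum() * 100).round(1)
print(share)
```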

Add the day of the week to the dataset:

df["Day"] = df["ActivityDate"].dt.day_name()
print(df)

See the very active minutes and fairly active minutes for each day of the week:

fig = go.Figure()
fig.add_trace(go.Bar(x=df['Day'],
                     y=df['VeryActiveMinutes'],
                     name='Very Active',
                     marker_color='red'))
fig.add_trace(go.Bar(x=df['Day'],
                     y=df['FairlyActiveMinutes'],
                     name='Fairly Active',
                     marker_color='blue'))
fig.update_layout(barmode='group', xaxis_tickangle=-45)
fig.show()
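
Because each weekday appears in many rows, it can help to summarize the data to one value per weekday before plotting. The sketch below is one possible approach, not the method used above: it averages the minutes per weekday on hypothetical records and puts the weekdays in calendar order. The names records, week_order, and by_day are illustrative.

```python
import pandas as pd

# Hypothetical records; dates and minutes are made up
records = pd.DataFrame({
    "ActivityDate": pd.to_datetime(
        ["2016-04-11", "2016-04-12", "2016-04-18", "2016-04-19"]),
    "VeryActiveMinutes": [10, 30, 20, 50],
    "FairlyActiveMinutes": [5, 15, 15, 25],
})
records["Day"] = records["ActivityDate"].dt.day_name()

week_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]

# Mean minutes per weekday, reindexed to calendar order;
# weekdays without any records are dropped
by_day = (records.groupby("Day")[["VeryActiveMinutes", "FairlyActiveMinutes"]]
          .mean()
          .reindex(week_order)
          .dropna())
print(by_day)
```

The index of by_day and either of its columns could then be passed as x and y to go.Bar, giving exactly one bar per weekday and trace.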

Count the total number of steps taken on each day of the week:

# Sum the steps per weekday so that each label has exactly one value
steps_per_day = df.groupby("Day")["TotalSteps"].sum()
label = steps_per_day.index
counts = steps_per_day.values
colors = ['gold', 'lightgreen', 'pink', 'blue', 'skyblue', 'cyan', 'orange']
fig = go.Figure(data=[go.Pie(labels=label, values=counts)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True)
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(colors=colors, line=dict(color='black', width=1)))
fig.show()
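
A chart of steps per weekday needs one value per weekday label. The following sketch, on hypothetical records, contrasts value_counts(), which counts rows per weekday, with a grouped sum, which totals the steps per weekday.

```python
import pandas as pd

# Hypothetical records (made-up values)
records = pd.DataFrame({
    "Day": ["Tuesday", "Tuesday", "Sunday"],
    "TotalSteps": [12000, 9000, 4000],
})

record_counts = records["Day"].value_counts()             # rows per weekday
step_totals = records.groupby("Day")["TotalSteps"].sum()  # steps per weekday

# Align both on the same weekday index so labels and values match
summary = pd.DataFrame({"records": record_counts, "steps": step_totals})
print(summary)
```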

Find how many calories were burned on each day of the week:

# Sum the calories per weekday (value_counts() would only count the records)
calorie_count = df.groupby("Day")["Calories"].sum()
label = calorie_count.index
counts = calorie_count.values
colors = ['blue', 'green', 'pink', 'purple', 'skyblue', 'orange', 'brown']
fig = go.Figure(data=[go.Bar(x=label, y=counts, marker_color=colors)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True, title="Calorie count per day", xaxis_title='Day', yaxis_title='Calories')
fig.show()

Create a pie chart to see the total distance covered on each day of the week, in integers:

# Sum the distance per weekday so that each label has exactly one value
distance_covered = df.groupby("Day")["TotalDistance"].sum()
labels = distance_covered.index
counts = distance_covered.values
color = ['blue', 'green', 'pink', 'purple', 'skyblue', 'orange', 'brown']
fig = go.Figure(data=[go.Pie(labels=labels, values=counts, marker_colors=color)])
fig.update_layout(width=500, height=400, paper_bgcolor="white", autosize=False, showlegend=True, title='Distance covered each day')
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=15,
                  marker=dict(line=dict(color='black', width=1)))
fig.show()

Report the insight

  1. After reviewing the dataset, Tuesday emerges as a particularly active day for individuals, displaying the highest calorie burn compared to other weekdays. However, it’s noteworthy that despite this heightened activity level, the total distance covered on Tuesdays appears comparatively lower.

  2. This incongruity might be attributed to potential inaccuracies in the smartwatch’s positioning data. Supplementary data on the precision of smartwatch recordings could provide additional insights.

  3. Among the days analyzed, Sunday emerges as the least active for individuals, evidenced by the lowest calorie burn and minimal step count. Interestingly, while Sunday registers low activity levels, it doesn’t consistently demonstrate the lowest total distance covered. This observation underscores the importance of scrutinizing the accuracy of smartwatch data recording mechanisms.
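
When comparing weekdays, it may be worth checking whether a high total comes from more activity per day or simply from more records for that weekday. This is a minimal sketch on hypothetical, made-up values, assuming weekdays can have different numbers of records; it is not a result from the dataset.

```python
import pandas as pd

# Hypothetical records (made-up values): Tuesday has more rows than Sunday
records = pd.DataFrame({
    "Day": ["Tuesday", "Tuesday", "Tuesday", "Sunday", "Sunday"],
    "Calories": [2000, 2100, 1900, 2400, 2600],
})

# Number of records, total calories, and mean calories per weekday
stats = records.groupby("Day")["Calories"].agg(["count", "sum", "mean"])
print(stats)
```

In this made-up example, Tuesday has the larger total but the smaller mean, which shows why totals and per-record averages can rank weekdays differently.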

Copyright ©2025 Educative, Inc. All rights reserved