Trusted answers to developer questions
Trusted Answers to Developer Questions

Related Tags

python
excel
automation

Why is Python better than Excel for Data analytics?

Educative Team

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.

There is no doubt that Excel has been an incredibly important tool for companies and still has a place in the toolkit of every data analyst and scientist, but for most of your work, you need to stop using Excel and upgrade to Python.

Why:

  • scale and automation
  • reproducibility
  • transferable skills
  • advanced capabilities

Scale and Automation

Excel is great when you want to do quick, ad-hoc analysis with small data, but once you want to move to a larger scale, it doesn’t work. Excel can support data up to 1,048,576 rows by 16,384 columns. On the other hand, Python can scale to the size of your memory and also has many tools that support out-of-memory computations.

For example, the Dask library allows you to scale your computations to run on a cluster of machines, not just your laptop. In fact, if you are familiar with Pandas, it’s almost exactly the same code to read in a CSV:

import dask.dataframe as dd
# Load the data with Dask instead of Pandas.
df = dd.read_csv(<file_path>)

One line of code and you are now reading in data larger than your computer’s memory – this is not possible in Excel.

Some other differences:

  • Python can scale when it comes to multiple data sources. While Excel is both the data store and the computation engine, Python is completely data-agnostic.If you can find a way to read your data into Python, you can use it. Since Python has so many great libraries, it is trivial to read in data from many sources such as CSV, Excel, JSON, and SQL databases.
  • It is very easy to automate updates in Python. Since you can connect Python directly to any data source, it is easy to schedule a job that will re-pull your data with any updates, run your calculations, and even create a report or dynamic dashboard saving you tons of time. Excel, on the other hand, requires far too much manual labor and cannot automate updates.
Python can connect directly to your database for automated updates. Source: NobleDesktop

Reproducibility

Reproducibility means that any analytics or visualizations you create should be easy and straightforward enough for someone else to reproduce. Not only does someone need to be able to re-run your process and end up with the same result; they should also be able to walk through your steps to ensure accuracy. This concept is extremely important once you begin relying on automation. Automation is amazing when it works correctly, but when it doesn’t, automated reports can be a nightmare.

Reproducibility with Excel is very challenging. Excel calculations in cells are all but impossible to check at any type of scale. Data types are extremely confusing because what you see isn’t always what is represented in the raw data. While VBA does make reproducibility slightly better, at that point, you’re much better off investing in learning Python.

Take a look at this Excel document:

You know the sum column should be the sum of a and b, but how do you confirm that? You could check one of the formulas and see that it is in fact the sum, but since every cell can be its own formula, how do we know they are all correct? If you were not paying attention, you might have missed that row x was wrong.

But in Python, your sum would look like this:

a = [1,2,3,4]
b = [5,6,7,8]
sum = []
for i in range(len(a)):
sum.append(a[i] + b[i])
print(sum)

This code is clear and it is easy to confirm that the sum is always calculated correctly.

With Python, you get all the tools developed to make reproducibility and collaboration better for software engineers. On top of that, Python is superior for data connectivity, which allows us to analyze data in the cloud and repeat a process instantly. Git, unit testing, documentation, and code formatting standards are all prevalent in the Python community.

With Python 3, you can even add static typing to make your code clearer. All of these tools make it easier to ensure that your code is written well and correctly, this way, the next time you look at your code, or someone else picks it up, it’s easy to reproduce and understand.

Transferable skills

If you know Excel, you know Excel. While the skills you learn are useful, they are not transferable to anything else.

Python is much closer to other programming languages than Excel, which makes it much easier to pick up other languages you may encounter along the way. When you learn Python, you open far more doors than you could with just Excel.

Plus, the demand for Python is incredibly high. According to StackOverflow, in 2019, it was ranked as the world’s 4th most popular programming language among professional software developers as well as the first most wanted programming language. Not bad.

Advanced capabilities

Excel has a lot of built-in formulas, but it pales in comparison to Python’s capabilities. Not only does Python offer hundreds of libraries to make advanced statistics and analytics easier, but it can also take your visualizations to another level. Tools like Matplotlib, Plotly, Streamlit, and Seaborn allow you to create beautiful visualizations of data, interactive dashboards, and plots.

Numpy and scipy have amazing support for scientific computations, linear algebra, and vectorized calculations. And, scikit-learn lets you train machine learning algorithms from decision trees to gradient boosting machines.

Written by Tyler Folkman

RELATED TAGS

python
excel
automation
Copyright ©2022 Educative, Inc. All rights reserved

Grokking Modern System Design Interview for Engineers & Managers

Ace your System Design Interview and take your career to the next level. Learn to handle the design of applications like Netflix, Quora, Facebook, Uber, and many more in a 45-min interview. Learn the RESHADED framework for architecting web-scale applications by determining requirements, constraints, and assumptions before diving into a step-by-step design process.