Data Science vs. Data Analysis vs. Data Engineering

In this lesson, you'll learn about Data Science, Data Analysis, and Data Engineering. The three fields are distinguished by the kinds of work done in each.

Data Science vs. Data Analysis

Data Science is about unearthing insights from data that can inform future decisions. It includes predictive analytics, which draws on Machine Learning and related fields.
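To make "predictive analytics" concrete, here is a minimal sketch in plain Python: fit a straight line to past observations and use it to estimate a value that has not been observed yet. The data and the helper names `fit_line` and `predict` are made up for illustration; in practice a library such as scikit-learn would typically handle this step.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: sales for months 1..5 (invented for this example).
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

model = fit_line(months, sales)
forecast = predict(model, 6)  # estimate for a month not yet observed
print(forecast)
```

The point of the sketch is the workflow (learn from past data, then predict), not the model itself; real projects usually need richer models and a held-out set to check the predictions.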

Required Skills

  • Python, R, and SQL
  • Machine Learning
  • Deep Learning
  • Natural Language Processing
  • Computer Vision
  • Information Retrieval
  • Python libraries for Machine Learning like scikit-learn, TensorFlow, Keras, and PyTorch
  • NoSQL databases and Elasticsearch

Data Analysis deals with extracting information from data by cleaning and transforming it. It overlaps with Data Science in many areas, such as presenting insights to the team and making visualizations, but it does not include the Machine Learning part.
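As a rough illustration of cleaning and transforming, the sketch below drops incomplete records, normalizes inconsistent text, and then summarizes the result. The records and the helper names `clean` and `total_by_city` are hypothetical; an analyst would more commonly do the same steps in a tool such as Excel or OpenRefine, or with a Python data library.

```python
# Hypothetical raw records with typical problems:
# stray whitespace, inconsistent casing, and a missing value.
raw_rows = [
    {"name": " Alice ", "city": "paris", "amount": "10.5"},
    {"name": "Bob", "city": "Paris", "amount": ""},  # missing amount
    {"name": "alice", "city": "PARIS", "amount": "4.5"},
    {"name": "Carol", "city": "Berlin", "amount": "7"},
]


def clean(rows):
    """Drop rows with a missing amount and normalize the text fields."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append({
            "name": row["name"].strip().title(),
            "city": row["city"].strip().title(),
            "amount": float(row["amount"]),
        })
    return cleaned


def total_by_city(rows):
    """Transform cleaned rows into a per-city total."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return totals


clean_rows = clean(raw_rows)
summary = total_by_city(clean_rows)
print(summary)
```

A summary like this is what would then feed a chart or a report for the team.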

Required Skills

  • Any open-source tool that comes with complete documentation
  • SAS
  • RapidMiner
  • Microsoft Excel
  • Python
  • Tableau
  • Power BI
  • OpenRefine

Data Engineering

Data Engineering is about supporting data scientists and data analysts in their day-to-day work. It involves building data pipelines, designing architectures, storing data efficiently, and making large volumes of data accessible to data scientists and data analysts in an optimized way. It also involves using tools to uncover hidden insights in your data.
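A data pipeline is often described as extract, transform, load. The following is a minimal sketch of that shape using only Python's standard library, with an in-memory SQLite database standing in for a real data store. The event records, the `daily_clicks` table, and the function names are assumptions made for the example; production pipelines would typically rely on tools such as Kafka, Spark, or Hive from the list below.

```python
import sqlite3


def extract():
    """Stand-in for reading from a source system (hypothetical events)."""
    return [
        ("2024-01-01", "click", 3),
        ("2024-01-01", "view", 10),
        ("2024-01-02", "click", 5),
    ]


def transform(records):
    """Keep only click events and reshape them to (day, count)."""
    return [(day, count) for day, kind, count in records if kind == "click"]


def load(conn, rows):
    """Store the transformed rows where analysts can query them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks "
        "(day TEXT PRIMARY KEY, clicks INTEGER)"
    )
    conn.executemany(
        "INSERT INTO daily_clicks (day, clicks) VALUES (?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# A data scientist or analyst can now query the prepared table with SQL.
result = conn.execute(
    "SELECT day, clicks FROM daily_clicks ORDER BY day"
).fetchall()
print(result)
```

Keeping the three stages as separate functions makes each one easy to test or replace on its own, which is the main idea the larger tools build on.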

Required Skills

  • Python, Scala, or Java
  • SQL
  • Sqoop
  • Flume
  • Hadoop
  • Spark
  • Hive and Drill
  • Kafka and Storm
  • Mahout and Spark MLlib
  • Pig
  • HBase
  • Solr and Lucene
  • KNIME, Splunk, and Neo4j
  • Docker and Kubernetes
  • Flask (for writing APIs)