Data Science vs. Data Analysis vs. Data Engineering
Explore the distinct roles of data science, data analysis, and data engineering. Understand how each discipline contributes to extracting insights, managing data, and supporting machine learning. This lesson clarifies the skills, tools, and responsibilities that define these interconnected fields.
Data Science vs. Data Analysis
Data Science is about unearthing insights from data that can inform future decisions. It covers predictive analytics, drawing on Machine Learning and related fields.
Required Skills
- Python, R, and SQL
- Machine Learning
- Deep Learning
- Natural Language Processing
- Computer Vision
- Information Retrieval
- Python libraries for Machine Learning, such as scikit-learn, TensorFlow, Keras, and PyTorch
- NoSQL databases and Elasticsearch
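To make the predictive side of Data Science concrete, here is a minimal sketch that fits a straight line to past observations and uses it to forecast an unseen value. The data, the `fit_line` and `predict` helpers, and the variable names are all hypothetical and exist only for illustration. In practice, you would more likely reach for a library such as scikit-learn; plain Python is used here so the arithmetic stays visible.

```python
# Hypothetical data: monthly advertising spend (x) and resulting sales (y).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# "Train" on past data, then forecast sales for a spend level not seen before.
model = fit_line(ad_spend, sales)
forecast = predict(model, 6.0)
print(forecast)
```

The split between fitting on historical data and predicting for new inputs is the part that distinguishes this kind of work from purely descriptive analysis.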
Data analysis deals with extracting information from data by cleaning and transforming it. It overlaps with Data Science in many areas, such as presenting insights to the team and building visualizations, but it does not include the Machine Learning part.
Required Skills
- Any well-documented open source tool
- SAS
- RapidMiner
- Microsoft Excel
- Python
- Tableau
- Power BI
- OpenRefine
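As an illustration of the cleaning and transforming that data analysis involves, the sketch below tidies a small, messy CSV export and summarizes it. The sample data, column names, and the `clean_rows` and `average_by_region` helpers are assumptions made up for this example. Only the Python standard library is used; a tool from the list above would typically do the same job with less code.

```python
import csv
import io
from statistics import mean

# Hypothetical raw export: stray whitespace, inconsistent casing, one missing value.
RAW_CSV = """region,revenue
 north ,100
South,250
north,
SOUTH,150
"""


def clean_rows(text):
    """Normalize region names and drop rows that have no revenue value."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        region = row["region"].strip().lower()
        revenue = row["revenue"].strip()
        if not revenue:
            continue  # skip incomplete records
        cleaned.append({"region": region, "revenue": float(revenue)})
    return cleaned


def average_by_region(rows):
    """Transform cleaned rows into the average revenue per region."""
    groups = {}
    for row in rows:
        groups.setdefault(row["region"], []).append(row["revenue"])
    return {region: mean(values) for region, values in groups.items()}


rows = clean_rows(RAW_CSV)
summary = average_by_region(rows)
print(summary)
```

Note that nothing here predicts anything: the output describes the data that already exists, which is the line this lesson draws between analysis and Data Science.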
Data Engineering
Data Engineering is about supporting data scientists and data analysts in their day-to-day work. It involves building data pipelines, designing architectures, storing data efficiently, and making large volumes of data accessible to data scientists and data analysts in an optimized way. It also involves using tools to uncover hidden insights in the data.
Required Skills
- Python, Scala, or Java
- SQL
- Sqoop
- Flume
- Hadoop
- Spark
- Hive and Drill
- Kafka and Storm
- Mahout and Spark MLlib
- Pig
- HBase
- Solr and Lucene
- KNIME, Splunk, and Neo4j
- Docker and Kubernetes
- Flask (for writing APIs)
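The idea of a data pipeline can be sketched as three small stages: extract, transform, and load. The example below is a toy version under stated assumptions: the `RAW_EVENTS` records, the `purchases` table, and the stage functions are hypothetical, and an in-memory SQLite database stands in for the distributed storage and processing tools listed above.

```python
import sqlite3

# Hypothetical raw events, standing in for records from an upstream source.
RAW_EVENTS = [
    {"customer": "alice", "amount": "19.99"},
    {"customer": "bob", "amount": "5.00"},
    {"customer": "alice", "amount": "0.01"},
]


def extract():
    """Extract: yield raw records one at a time."""
    yield from RAW_EVENTS


def transform(records):
    """Transform: convert amount strings to integer cents for exact storage."""
    for record in records:
        cents = round(float(record["amount"]) * 100)
        yield record["customer"], cents


def load(rows, conn):
    """Load: write transformed rows into an indexed table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer TEXT, cents INTEGER)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_customer ON purchases (customer)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    conn.commit()


# Run the pipeline end to end against an in-memory database.
conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)

# A downstream analyst or data scientist can now query the prepared data.
query = (
    "SELECT customer, SUM(cents) FROM purchases "
    "GROUP BY customer ORDER BY customer"
)
totals = dict(conn.execute(query))
print(totals)
```

Because the stages are generators, records flow through one at a time rather than being held in memory all at once, which is the same design pressure that motivates streaming tools at much larger scale.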