Data Science and YOU!

This lesson discusses the field of data science so that you understand why it is needed to solve industry problems. You will also learn about the different roles that exist in data science and the skills needed to perform those roles well.

Data science is a blend of inference, mathematics, algorithms, and technology, all combined to solve analytical problems. These problems exist across all domains.

For example, consider an online store that sells different types of shoes. It has a good number of customers and serves them very well. Now, suppose that it launches 20 new products. Not every customer is interested in buying every product; some are only interested in a specific product. So, in the best interests of the business, customers should receive personalized product recommendations.

So, how will the business know what each customer wants? Should they call every customer and ask about their specific preferences? This is not possible or scalable with a sizable customer base. Here, data science comes to the rescue. Using data science recommendation algorithms, we can produce individual product suggestions for each customer.
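As a rough illustration of the idea, here is a minimal sketch of one simple recommendation approach: suggest products bought by customers whose purchase history overlaps with yours. The customer names, products, and the recommend function are all hypothetical, and real recommendation systems are considerably more involved.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products already bought.
purchases = {
    "alice": {"running shoes", "sports socks"},
    "bob": {"running shoes", "trail shoes"},
    "carol": {"running shoes", "sports socks", "trail shoes"},
    "dave": {"sandals"},
}


def recommend(customer, purchases, top_n=2):
    """Suggest products bought by customers with overlapping purchases."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        # The number of shared products acts as a crude similarity weight.
        overlap = len(owned & items)
        if overlap == 0:
            continue
        # Score only the products this customer does not own yet.
        for item in items - owned:
            scores[item] += overlap
    # Sort by score (descending), then by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("alice", purchases))
```

A customer with no overlapping history, such as "dave" in this toy data, gets an empty list, which hints at why production systems need fallback strategies for new or unusual customers.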

Data science is a practice that uncovers insights from data. Data scientists use their techniques to discover the story behind the data. There is no one way to tell stories from data. We have to put in the time to understand the data first. Then, we can find insights by applying basic operations, use advanced or custom algorithms to get complex insights, and present the results in a way that everyone (especially non-data science people) can understand.

Common examples of data science usage

Here are a few examples of data science problems:

  • Finding the best route for a delivery person to deliver food on time.
  • Suggesting the best movies/series for a user based on their past ratings.
  • Providing relevant search results under price expectations and requirements.
  • Connecting like-minded people on social networks.
  • Suggesting an investment based on a person’s profile.
  • Finding the best market to sell fruits or grocery items based on demand, the cost of renting a shop, and distance.
  • Improving healthcare by suggesting a specific treatment to a patient.

Why is data science getting popular?

Data science has existed since the advent of data. Think of a local businessman who sells you something. When you buy the same things repeatedly, he starts to recognize you and gives you your products as soon as you enter his shop. He understands your expectations and requirements.

Now that businesses are moving online and organizations have millions of customers, simple tools that used to work on a smaller scale are no longer adequate and cannot provide the best recommendations. With the capability to process large amounts of data every second, we can build better systems to improve how businesses respond to their customers.

Initially, most of the incoming data was structured. We stored it in SQL databases or in files with a well-defined format. Now, we have realized that predictive algorithms can be powered by unstructured data as well. Unstructured data exists in many forms, such as text logs, machine logs, multimedia data, sensor data, and video streams.



Data science jobs

It is a common misunderstanding that only data scientists work in the field of data science. Data scientist is one type of role, but it does not cover everything that people in data science do. In this lesson, we will learn about the other roles in data science. This course is designed to support those interested in all types of roles.



Data analyst

Data analysts deal with the data in its raw form. They are mostly required to query data and convert it into the necessary format. For example, if the data is coming in the form of a time series from a sensor, their job is to store the data in a format that can be quickly accessed later. They also perform data cleaning, merging, and exploratory data analysis. Exploratory data analysis, or EDA, is a method used to produce visual or explainable insights from the data.
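To make cleaning, merging, and EDA concrete, here is a small sketch that uses only the Python standard library. The sensor readings, the sensor_locations lookup table, and the function names are hypothetical; in practice, an analyst would more likely use a dedicated data analysis library.

```python
from statistics import mean

# Hypothetical raw readings: (timestamp, sensor_id, value). None marks a missing value.
readings = [
    ("2024-01-01T00:00", "s1", 20.5),
    ("2024-01-01T00:01", "s1", None),
    ("2024-01-01T00:00", "s2", 18.0),
    ("2024-01-01T00:01", "s2", 19.0),
    ("2024-01-01T00:02", "s1", 21.5),
]

# Hypothetical lookup table to merge with the readings.
sensor_locations = {"s1": "warehouse", "s2": "office"}


def clean(rows):
    """Cleaning: drop rows whose value is missing."""
    return [row for row in rows if row[2] is not None]


def merge(rows, locations):
    """Merging: attach a location to each reading, like a left join on sensor_id."""
    return [
        {"time": t, "sensor": s, "value": v, "location": locations.get(s, "unknown")}
        for t, s, v in rows
    ]


def summarize(records):
    """Basic EDA: count, min, mean, and max of the values per location."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["location"], []).append(record["value"])
    return {
        loc: {"count": len(vals), "min": min(vals), "mean": mean(vals), "max": max(vals)}
        for loc, vals in grouped.items()
    }


summary = summarize(merge(clean(readings), sensor_locations))
for location, stats in summary.items():
    print(location, stats)
```

Dropping rows with missing values is only one possible cleaning choice; depending on the data, filling gaps with an estimate may be more appropriate.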

Business analyst

Business analysts have a good understanding of the business and of how to access data relevant to their needs. They are very domain-specific. For example, a business analyst might analyze bank branches in a city in terms of different measures, such as the number of customer ratings, customer satisfaction, new accounts opened, and availability of staff. Another analyst, in a medical company, might examine how the demand for different drugs changes over time, which could help manufacturing units plan well in advance.

Data engineer

Data engineers are required to build data-in and data-out systems. They have the skills necessary to load data into big data storage services, perform the operations needed to get the data into the required format, and send it back when requested. Data engineers know how to use data software such as Hadoop, Spark, MapReduce, Pig, and Hive to analyze and process large amounts of data. They also know how to program and have a good understanding of data structures and algorithms.

Data scientist

Data scientists connect the dots in the system. First, they understand the business’s values and requirements. Then, they start to work with small data and get help from others to access large data. They analyze data, prepare initial-level insights, and communicate where the business can improve. They also build predictive modules to satisfy the business’s requirements. They are responsible for preparing the results for stakeholders and for designing algorithms that help engineers build scalable, efficient systems.

Machine learning engineer

Machine learning engineers are required to take the results from data scientists and transform them into a product. They need strong skills in data structures and algorithms. They should understand the architecture of the existing system and how the prediction module can fit within that system. They usually prepare the data science process pipeline and make sure that the system remains available under a high volume of requests. Data scientists prepare algorithms in languages like R, SAS, or Python, and machine learning engineers implement them efficiently in the required language (with better use of memory, storage, etc.).

Statistician

Statisticians apply statistics to data to get insights. They prepare hypotheses and analyze data in order to support or refute those hypotheses. For example, a statistician might ask: is the behavior of the customers in one particular store similar to the behavior of all customers? Statisticians commonly work with tools such as R and SAS, and they prepare their results by using the UI plugins available in these tools.
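One way to approach a question like the store comparison is a permutation test on the difference in average spend. The lesson does not name a specific test, so this choice, the spend figures, and the permutation_test function are all assumptions made for illustration. The sketch is written in Python rather than R or SAS so that it can run with only the standard library.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means of two groups."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical average spend per customer: one store versus the other stores.
one_store = [52, 55, 58, 60, 61]
other_stores = [20, 22, 25, 27, 30, 31, 24, 26]

p_value = permutation_test(one_store, other_stores)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value suggests that the customers of this store spend differently from the rest, while a large one means the data gives no evidence of a difference.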

Data architect

Data architects create a strategy for data management: how the data will flow in the system, how we can access it, the different parameters of security, data storage location, accessibility within the system, etc. Data architects are, in general, people who have experience working with large amounts of data.

Interview question

Question

Do you consider yourself to be a data scientist or a data engineer?


Let’s get ready for this journey! 🚀

Are you ready for this?