Grokking Modern System Design Interview for Engineers & Managers
Analyzed data is used to discover and explore opportunities, improve decision-making processes, and build intelligence for future occurrences. Data is generated by every activity on online and offline platforms, and it can be structured, semi-structured, or unstructured. Data science has many branches, such as programming, artificial intelligence, web development, and data analysis.
These disciplines are interdependent and interrelated, so it is important to draw clear distinctions between the fields of data science. Data analysis is itself a broad field whose applications cut across many areas of study, such as engineering, business intelligence, data visualization, systems optimization, and data mining.
Data analysis is the process of collecting data from different sources, analyzing it (sorting, organizing, and visualizing it), and using the results to make decisions. It can follow two approaches: quantitative and qualitative. One of its most important benefits is that it helps evaluate customer behavior and improve market strategies and engagement. It also helps build decision processes that rest on facts rather than assumptions.
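To make the sort, organize, and visualize steps concrete, here is a minimal Python sketch on hypothetical sales records. The field names (`region`, `amount`) and the helpers `summarize_by` and `text_bar_chart` are illustrative assumptions, and a plain-text bar chart stands in for a real charting tool.

```python
from collections import defaultdict

# Hypothetical sales records (illustrative data only).
records = [
    {"region": "north", "amount": 120.0},
    {"region": "south", "amount": 80.0},
    {"region": "north", "amount": 40.0},
    {"region": "east", "amount": 95.0},
]

def summarize_by(rows, key, value):
    """Organize: group rows by `key` and total the `value` field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    # Sort: largest total first, so the summary reads like a ranked chart.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def text_bar_chart(summary, scale=10.0):
    """Visualize: one '#' per `scale` units of the total."""
    lines = []
    for label, total in summary:
        lines.append(f"{label:<6} {'#' * int(total // scale)} {total:.0f}")
    return "\n".join(lines)

summary = summarize_by(records, "region", "amount")
print(text_bar_chart(summary))
```

The same three steps apply whatever the tooling; a spreadsheet pivot table or a plotting library would replace the hand-written helpers in practice.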
There are several categories of data analysis:
Data mining is a form of exploratory data analysis used for engineering metrics and insight evaluation. It is the pattern-discovery step of knowledge discovery in databases (KDD): it digs through a data set to predict possible future outcomes. It searches a data set for loopholes (inconsistencies), hidden patterns or trends, and connections within the data, so that future occurrences can be predicted effectively. In practice, it examines analyzed data to discover how occurrences are patterned within that data set. Data mining draws on three disciplines: statistics, machine learning, and artificial intelligence. Industries use it for pricing and process optimization, promotion strategies, and business economics.
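As one illustration of uncovering hidden connections, the sketch below counts item pairs that frequently occur together, the core idea behind association rule mining. This is a common data mining technique chosen for illustration, not a method the text prescribes; the baskets, `min_support` threshold, and helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
    {"milk", "jam"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs found together in at least `min_support`
    (a fraction) of the transactions, mapped to their support."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    with_a = [t for t in transactions if antecedent in t]
    if not with_a:
        return 0.0
    return sum(consequent in t for t in with_a) / len(with_a)

print(frequent_pairs(baskets, min_support=0.4))
print(confidence(baskets, "bread", "milk"))
```

A pattern such as "every basket with bread also has milk" is exactly the kind of relationship that is not visible in a summary chart but can inform pricing or promotion decisions.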
The table below summarizes the differences between data analysis and data mining.

| Data analysis | Data mining |
|---|---|
| Data is extracted from semi-structured and unstructured forms. | Data is extracted from a structured form. |
| It is visualized using charts and tables and interpreted using statistical tools. | It is studied using equations (models) generated from the data set. |
| Data analysis is used in optimization processes. | Data mining investigates the hypotheses of data analysis. |
| A team is required for data analysis. | One specialist can perform data mining. |
| Data analysis interprets past and present occurrences. | Data mining uses models to predict future values. |
Data analysis involves extracting data from different sources (unstructured, semi-structured, and structured), then sorting, transforming, and visualizing it to support decision-making. Data mining is a subset of data analysis: it explores structured data to reveal the hidden patterns within it.
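The extract, transform, and sort steps can be sketched in Python with one input per data form. Everything here is hypothetical: the CSV and JSON payloads, the common `customer`/`rating` schema, and especially the regular expression, which assumes a fixed sentence pattern that real free text would not follow.

```python
import csv
import io
import json
import re

# Hypothetical inputs in the three forms the text describes.
structured_csv = "customer,rating\nAda,5\nBen,2\n"
semi_structured_json = '[{"customer": "Cy", "rating": 4, "note": "fast delivery"}]'
unstructured_text = "Dee rated us 1 star. Eli rated us 3 stars."

def from_csv(text):
    """Structured: columns are already defined."""
    return [{"customer": r["customer"], "rating": int(r["rating"])}
            for r in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Semi-structured: keep only the fields shared with other sources."""
    return [{"customer": r["customer"], "rating": int(r["rating"])}
            for r in json.loads(text)]

def from_text(text):
    """Unstructured: assumes the pattern '<Name> rated us <n> star(s)'."""
    pattern = re.compile(r"(\w+) rated us (\d) stars?")
    return [{"customer": name, "rating": int(n)} for name, n in pattern.findall(text)]

# Transform everything into one structured table, then sort it.
rows = from_csv(structured_csv) + from_json(semi_structured_json) + from_text(unstructured_text)
rows.sort(key=lambda r: r["rating"])
print(rows)
```

Once the three sources share one schema, the resulting table is the kind of structured data set that data mining then explores.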
Data analysis organizes data into a structured form, and with that knowledge, data mining studies the behavior of the data. In other words, data mining explores the meaning of a data set starting from the organized form produced by data analysis. The image below shows data mining as a subset of data analysis.
Data is sourced and processed in three forms: structured, semi-structured, and unstructured. The work descriptions of these data science disciplines show that data analysis conducts its searches on semi-structured and unstructured data sets, while data mining searches for patterns in semi-structured and structured data.
Data analysis asks what, why, how, and where in order to draw conclusions about a data set, and it studies how a data set affects a system. Data mining launches its search from how and why questions to provide evidence of occurrences and, where possible, to predict future values using generated models.
Data analysis requires visualization (charts and tables) and statistical tools to interpret data, while data mining requires models and statistical tools to unveil the patterns in data and predict future values. Modeling helps with prediction and with studying how data moves over time. Models are equations derived from studying the kinetics, or changes, in data. Data mining generates these models, and effective data visualization (graphs and charts) is what makes generating them possible.
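To show what "a model is an equation generated from changes in data" can look like, here is a minimal ordinary least-squares fit of a straight line, written in plain Python. The weekly sales figures are hypothetical, and a linear model is only one possible choice of equation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: average change in y per unit of x
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b

# Hypothetical weekly sales for weeks 1..5.
weeks = [1, 2, 3, 4, 5]
sales = [10, 13, 13, 17, 17]
a, b = fit_line(weeks, sales)
print(f"sales ≈ {a:.1f} + {b:.1f} * week")
print("week 6 forecast:", round(a + b * 6, 1))
```

The fitted equation summarizes the movement of the data (about 1.8 extra sales per week here), and evaluating it at a future week is what turns the pattern into a prediction.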
Consider a business data chart showing the number of sales and reviews over three months. Looking through the chart, you will notice that every week a different number of customers rate their experience as satisfactory, while others are unsatisfied and give reasons. If records also capture customers' names, you will realize that some customers come back to purchase again while others do not. Data analysis, through visualization and charts, reveals the trend in the number of weekly customers and daily reviews, but it does not capture why poor reviews were given. Through data mining, specific searches can be made to find out why there were poor reviews, and from the customers' responses, models can be generated to predict the future of the business if the organization's strategy does not change.
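The contrast in this example can be sketched in Python on a hypothetical review log. The analysis view counts satisfied and unsatisfied reviews per week; the mining view digs into the comments behind poor ratings and finds returning customers. The rating cut-off of 3, the stopword list, and word counting as a stand-in for real text mining are all simplifying assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical review log: (week, customer, rating 1-5, comment).
reviews = [
    (1, "Ada", 5, "great service"),
    (1, "Ben", 2, "late delivery"),
    (2, "Ada", 4, "good"),
    (2, "Cy", 1, "late delivery and rude staff"),
    (3, "Ben", 2, "late delivery"),
    (3, "Dee", 5, "great prices"),
]

# Data analysis view: satisfied vs unsatisfied reviews per week (assumed cut-off: 3).
weekly = defaultdict(lambda: {"satisfied": 0, "unsatisfied": 0})
for week, _, rating, _ in reviews:
    weekly[week]["satisfied" if rating >= 3 else "unsatisfied"] += 1

# Data mining view: why are ratings poor? Count words in poor-review comments.
STOPWORDS = {"and", "the"}  # assumed minimal stopword list
reasons = Counter(
    word
    for _, _, rating, comment in reviews if rating < 3
    for word in comment.split() if word not in STOPWORDS
)

# Which customers came back (reviewed in more than one week)?
weeks_seen = defaultdict(set)
for week, customer, _, _ in reviews:
    weeks_seen[customer].add(week)
returning = sorted(c for c, ws in weeks_seen.items() if len(ws) > 1)

print(dict(weekly))
print(reasons.most_common(2))
print(returning)
```

The weekly counts show only that some customers are unhappy; the word counts point to late delivery as the recurring reason, which is the information the chart alone could not reveal.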
The reasons for the customers' poor reviews, which were not shown on the chart, are the loopholes in the data. Searching the trend of the data reveals loopholes that make the data inconsistent with business expectations. The models that predict future values are generated from the loopholes discovered while visualizing the analyzed data. This example suggests that data mining is specific in nature, while data analysis is general.
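One way to make "loopholes that make data inconsistent with business expectations" operational is to compare each week against an explicit expectation and flag the weeks that violate it. The sketch below assumes a hypothetical rule that at most 20% of reviews should be unsatisfied; both the threshold and the weekly counts are illustrative.

```python
# Hypothetical weekly review counts.
weekly_counts = {
    1: {"satisfied": 18, "unsatisfied": 2},
    2: {"satisfied": 15, "unsatisfied": 5},
    3: {"satisfied": 19, "unsatisfied": 1},
    4: {"satisfied": 12, "unsatisfied": 8},
}
# Assumed business expectation: no more than 20% unsatisfied reviews per week.
EXPECTED_MAX_UNSATISFIED = 0.20

def find_loopholes(counts, threshold):
    """Return weeks whose unsatisfied share exceeds `threshold`, with that share."""
    flagged = {}
    for week, c in sorted(counts.items()):
        total = c["satisfied"] + c["unsatisfied"]
        share = c["unsatisfied"] / total if total else 0.0
        if share > threshold:
            flagged[week] = round(share, 2)
    return flagged

print(find_loopholes(weekly_counts, EXPECTED_MAX_UNSATISFIED))
```

The flagged weeks are where a data miner would look first for the reasons behind the inconsistency.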
Data analysis requires a team of operators or specialists to access data from different sources, transform it with visual aids for better comprehension, and draw conclusions, while data mining can be performed by one specialist. Because data mining works on semi-structured or structured data, a single specialist can study its predictions to unveil the data pattern.
Data mining optimizes the use of data through the models it generates, while data analysis relies on observation to interpret data. Optimization helps use resources efficiently. Data analysis is used to test hypotheses and translate search results into accessible information. Data mining, by contrast, does not start from preconceived hypotheses to explain or investigate data; rather, it investigates the hypotheses of data analysis.
Regression analysis is used to predict the future behavior of a data series. Data mining predicts the future after unveiling hidden patterns within a data set, while data analysis interprets the reasons for past and present occurrences. Many forecasting models can be used for business and economic data, such as linear, exponential, power, logarithmic, quadratic, cubic, and polynomial trends. Data analysis, on the other hand, is not used to predict future occurrences; instead, its results serve as a basis for further study.
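On the data analysis side, testing a hypothesis can be as simple as a permutation test. The sketch below checks the hypothetical claim that returning customers rate higher than one-time customers. A permutation test is one standard choice among many, and the ratings, resample count, and seed are assumptions made for illustration.

```python
import random

def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """One-sided permutation test of H1: mean(group_a) > mean(group_b).
    Returns the observed difference in means and an estimated p-value."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if diff >= observed:
            hits += 1
    # The +1 correction keeps the estimate from being exactly zero.
    return observed, (hits + 1) / (n_resamples + 1)

# Hypothetical ratings: returning customers vs one-time customers.
returning = [5, 4, 5, 4, 5, 4]
one_time = [2, 3, 1, 2, 3, 2]
diff, p = permutation_test(returning, one_time)
print(f"observed difference = {diff:.2f}, p ≈ {p:.4f}")
```

A small p-value supports the stated hypothesis; the hypothesis itself had to be formulated beforehand, which is the distinction the text draws with data mining.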
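Several of the listed model forms can be fitted with the same least-squares routine after transforming the data: an exponential trend by regressing ln y on x, and a logarithmic trend by regressing y on ln x. The sketch below fits three forms to hypothetical monthly sales and keeps the one with the smallest squared error. Fitting in log space and selecting on in-sample error are simplifications; a holdout period is a safer way to compare forecasting models.

```python
import math

def ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def fit_linear(xs, ys):
    a, b = ols(xs, ys)
    return lambda x: a + b * x

def fit_exponential(xs, ys):
    # y = A * exp(b x)  ->  ln y = ln A + b x  (requires y > 0)
    ln_a, b = ols(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(ln_a + b * x)

def fit_logarithmic(xs, ys):
    # y = a + b ln x  (requires x > 0)
    a, b = ols([math.log(x) for x in xs], ys)
    return lambda x: a + b * math.log(x)

def sse(model, xs, ys):
    """Sum of squared errors of a fitted model on the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical monthly sales that double each month.
months = [1, 2, 3, 4, 5]
sales = [3, 6, 12, 24, 48]
fitters = {"linear": fit_linear, "exponential": fit_exponential, "logarithmic": fit_logarithmic}
fitted = {name: fit(months, sales) for name, fit in fitters.items()}
best = min(fitted, key=lambda name: sse(fitted[name], months, sales))
print("best fit:", best, "| month 6 forecast:", round(fitted[best](6), 1))
```

Power, quadratic, cubic, and higher polynomial trends follow the same idea with other transformations or additional terms, at the cost of a more involved fitting step.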