Processing of data

Moving on from history, let’s now look at how we derive knowledge from data. To do so, we can adapt the scientific method, whose foundations have been studied and discussed for thousands of years. Principles of empirical observation and experimentation date back at least to the ancient Greeks, and the modern principles of empirical science emerged from the natural sciences in the seventeenth century.

The scientific method describes a specific process for generating knowledge in a way that ensures it’s testable and falsifiable. It’s of paramount importance that the conclusions we draw from data are as valid as possible. If we make guesses or know of biases in our data sets, we need to state this clearly. The scientific method prescribes several steps and considerations that must be followed for the resulting knowledge to be valid, falsifiable, and generalizable.

Valid broadly means measuring what we intend to measure. Falsifiable means that our conclusions are stated in a way that could, in principle, be shown to be wrong by new evidence, for example by applying the same method to a similar data set and failing to achieve the same result. Generalizable means that we can generalize from a specific data set or sample to the population we’re interested in saying something about. These short definitions gloss over many underlying complexities, and any data scientist needs to understand the principles of empirical science in more depth.
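Generalizability can be made a little more concrete with a minimal Python sketch. It assumes a hypothetical, synthetic "population" of 100,000 measurements, which in real work we would never see in full, and draws a simple random sample from it as the specific data set we actually analyze. The sample mean and an approximate 95% confidence interval (normal approximation, z = 1.96) are then our statement about the population. The numbers and the choice of statistic are illustrative only, and the inference is justified here only because the sample is random.

```python
import random
import statistics

# Hypothetical synthetic "population": 100,000 measurements.
# In practice the population is unknown; here we know it so we can compare.
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# A simple random sample: the specific data set we actually get to analyze.
sample = rng.sample(population, 500)

sample_mean = statistics.mean(sample)
# Standard error of the mean, estimated from the sample itself.
std_error = statistics.stdev(sample) / len(sample) ** 0.5
# Approximate 95% confidence interval using the normal approximation (z = 1.96).
lower, upper = sample_mean - 1.96 * std_error, sample_mean + 1.96 * std_error

print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample mean:     {sample_mean:.2f}  95% CI: ({lower:.2f}, {upper:.2f})")
```

If the sample were drawn in a biased way (say, only the largest values), the same arithmetic would still produce an interval, but it would no longer say anything trustworthy about the population.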

In this course, we assume that you’re already familiar with the fundamentals of empirical and experimental research.

Knowledge discovery process

In data science, the scientific method has been adapted for the analysis of data via a framework referred to as the knowledge discovery process. Within this process, there’s a constant shift between inductive and deductive reasoning, as well as between the particular algorithms used at each stage.

In deductive reasoning (top-down), a particular conclusion is reached from a general set of rules or a theory. In inductive reasoning (bottom-up), a conclusion is reached from specific observations, which can then be generalized into abstract rules or theories. Both approaches are necessary. Inductive reasoning is more open-ended and exploratory, which makes it especially useful at the beginning of the knowledge discovery process, during exploratory analysis. Deductive reasoning, on the other hand, is narrower and more concerned with testing or confirming hypotheses, which makes it useful later in the process, during confirmatory analysis. Throughout this course, we’ll introduce several methods for both types of analysis.
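To make the two modes concrete, here is a minimal Python sketch, assuming a hypothetical synthetic data set of (group, value) pairs with two groups, A and B. The exploratory (inductive) step looks at one half of the data and proposes a hypothesis from the observed group means; the confirmatory (deductive) step then tests that specific hypothesis on the other, held-out half with a one-sided permutation test. The data generator, the 50/50 split, the permutation test, and all function names are illustrative choices, not a method prescribed by this course.

```python
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical synthetic data: (group, value) pairs where group "B"
    is drawn with a slightly higher true mean than group "A"."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        group = rng.choice(["A", "B"])
        true_mean = 10.5 if group == "B" else 10.0
        data.append((group, rng.gauss(true_mean, 1.0)))
    return data


def group_means(data):
    """Return the observed mean value for each group."""
    by_group = {}
    for group, value in data:
        by_group.setdefault(group, []).append(value)
    return {group: statistics.mean(vals) for group, vals in by_group.items()}


def explore(data):
    """Inductive step: derive a hypothesis from observations.
    Returns (higher, lower): the groups with the highest and lowest observed mean."""
    means = group_means(data)
    return max(means, key=means.get), min(means, key=means.get)


def confirm(data, higher, lower, n_perm=2000, seed=1):
    """Deductive step: test the stated hypothesis mean(higher) > mean(lower)
    on new data with a one-sided permutation test; returns a p-value."""
    rng = random.Random(seed)
    pairs = [(g, v) for g, v in data if g in (higher, lower)]
    values = [v for _, v in pairs]
    labels = [g for g, _ in pairs]

    def mean_diff(lbls):
        hi = [v for v, g in zip(values, lbls) if g == higher]
        lo = [v for v, g in zip(values, lbls) if g == lower]
        return statistics.mean(hi) - statistics.mean(lo)

    observed = mean_diff(labels)
    shuffled = labels[:]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between group and value
        if mean_diff(shuffled) >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


data = make_data(400)
exploration, confirmation = data[:200], data[200:]  # hold out half for confirmation
higher, lower = explore(exploration)
p = confirm(confirmation, higher, lower)
print(f"hypothesis: mean({higher}) > mean({lower}); permutation p-value = {p:.4f}")
```

Holding out the confirmation half is the key design choice: testing a hypothesis on the same data that suggested it tends to overstate the evidence for it, so the deductive step gets fresh observations.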
