The Basic Idea of Machine Learning

Discover the main reason behind the emergence of machine learning.

This chapter provides a high-level overview of machine learning, particularly how it relates to building models from data.

We start with the basic idea in its historical context and phrase the learning problem in simple mathematical terms, both as function approximation and in a probabilistic framework. In contrast to more traditional models, machine learning can be characterized as nonlinear regression in high-dimensional spaces. This chapter also points out how diverse subareas, such as deep learning and Bayesian networks, fit into the larger scheme of things, and it aims to motivate further study with some examples of recent progress.
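To make "function approximation" concrete, here is a minimal sketch in Python with NumPy (a tool choice of ours, not one the course prescribes). It fits a polynomial to noisy samples of a sine curve by minimizing the mean squared error between predictions and labeled targets. The model is nonlinear in the input x, although still linear in its parameters w. The input is one-dimensional for readability, and the target function, noise level, polynomial degree, and the helper names `fit_polynomial` and `predict` are all illustrative assumptions.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit (hypothetical helper).

    Minimizes the mean squared error between predictions and targets y,
    which serves as a simple supervised-learning objective function.
    """
    # Design matrix with columns x**degree, ..., x, 1
    X = np.vander(x, degree + 1)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, x):
    """Evaluate the fitted polynomial at the inputs x."""
    return np.vander(x, len(w)) @ w

# Illustrative data: noisy samples of an unknown (to the learner) function
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 50)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(x_train.size)

w = fit_polynomial(x_train, y_train, degree=5)
mse = np.mean((predict(w, x_train) - y_train) ** 2)
print(f"training MSE: {mse:.4f}")
```

The polynomial is only one choice of function family. In deep learning a neural network plays the same role, and its parameters are typically found by gradient-based optimization instead of a closed-form least-squares solve.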

Emergence of machine learning

The recent importance of machine learning and its rapid development with new industrial applications have been breathtaking, and it is beyond the scope of this course to anticipate the multitude of developments that will occur. However, knowing the basic ideas behind machine learning (many of which have been around for some time) and how they are formalized to build probabilistic models of data is now an important basic skill.

Machine learning is about modeling data. Describing data and uncertainty has been the traditional domain of Bayesian statistics and probability theory. In contrast, it seems that many exciting recent techniques come from an area now called deep learning. The specific contribution of this course is its attempt to highlight the relationship between these areas.

Importance of data

We often simply say that we learn from data, but it is useful to realize that data can mean several things. In its most fundamental form, data usually consists of measurements. The following are a few examples of measurements:

  • The intensity of light in a digital camera
  • The measurement of electric potentials in electroencephalography (EEG)
  • The recording of stock-market data

However, what we need for learning is a teacher who provides us with information about what this data should predict. Such information can take many different forms. For example, we might have a form of data that we call labels, such as the identity of objects in a digital photograph. This is exactly the kind of information we need to learn optical object recognition. The teacher provides examples of the desired answers that the student (learner) should learn to predict for novel inputs.

Types of machine learning

The following are types of machine learning:

  • Supervised Learning: Learning will always involve optimizing an objective function, and we will see that the objective function can easily be formulated with specific examples of the desired answers for a learner. The desired answers are the labels. This kind of guidance in a learning algorithm is traditionally called supervised learning.

  • Unsupervised Learning: At the other extreme, we might not have any labels. This has traditionally been called unsupervised learning. However, a teacher still needs to provide some guidance in the form of an objective, such as organizing data according to certain rules. An example is clustering, where the teacher specifies a distance measure, such as the Euclidean distance between feature vectors. We will see that such methods are important for representation learning.

  • Reinforcement Learning: Finally, a much more general form of learning is when the teacher provides some guidance, but the learner, in addition, has to explore possible actions to find novel solutions. Such learning is formalized in reinforcement learning, where the objective functions take a slightly more general form than in the simpler supervised setting.
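The clustering example from the unsupervised-learning bullet can be sketched with k-means, one common clustering algorithm (the text does not name a specific one). No labels are used; the only guidance is the Euclidean distance between feature vectors. The function name `kmeans`, the two synthetic blobs of points, and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal k-means (hypothetical helper): the only 'teacher' is the Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Euclidean distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster becomes empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Illustrative unlabeled data: two blobs of 2-D feature vectors
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(3.0, 3.0), scale=0.3, size=(50, 2)),
])
labels, centroids = kmeans(X, k=2)
print(centroids)
```

Swapping in a different distance measure would change which points count as "similar", which is exactly the kind of guidance the teacher still provides in the unsupervised setting.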

While we will encounter all these different types of learning in this course, most of the fundamentals of learning theory and building models can be demonstrated in the simplest setting of supervised learning.

In machine learning, we are trying to solve problems with computers without explicitly programming them for specific tasks. We still need to program the learning machine, and we often have to adapt such programs to a specific task.

However, such an approach is more general than coding specific logic for a specific problem. Programming general learning machines instead of specific solutions is desirable, especially for tasks that would be difficult to program in an explicit rule-based system.

A classic example that we will discuss at some length is character recognition, as illustrated in the figure above. Writing a program that translates a visual representation of a character, say the letter A, into its computer-interpretable meaning, such as the 8-bit ASCII code 01000001, is not easy when one considers all the shapes and styles this character can take.
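The easy half of this mapping is trivial to program: once we know the symbol is the letter A, obtaining its ASCII code takes one line. The hard half, which motivates learning, is getting from pixels to the symbol in the first place. The helper name `to_ascii_bits` below is hypothetical.

```python
def to_ascii_bits(ch):
    """Return the 8-bit ASCII code of a single character as a bit string."""
    code = ord(ch)  # 'A' -> 65
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "08b")

print(to_ascii_bits("A"))  # prints 01000001
```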