Catalog: machine learning

Machine learning is now used in nearly every industry. Every time you browse Facebook, YouTube, or Amazon, your recommended feeds are generated using machine learning.

Our catalog of quick machine learning shots is ever-evolving. Our current selection of shots is organized by:

Theory
Code starters (Python)
Understanding results
Good to know
Application

Disclaimer: A catalog answer links together all the answers on a particular topic and outlines how they fit together. A catalog does not attempt to cover the scope of a topic. It is only a catalog of the answers we have on the topic thus far.

1. Theory

Machine learning leverages data to answer questions that may not be easy to define computationally. Instead of having to define for a computer what a cat looks like, we can have the computer learn on its own from many pictures of cats on the internet. Applications of this technology include self-driving cars, machines that can check for cancer, and more!

Here are some shots on the theory of machine learning to get you started:

Basic

These days, deep learning is the predominant approach to machine learning.
What is machine learning?
What is deep learning?
There are two broad types of learning techniques: supervised and unsupervised.
What is supervised learning?
What is unsupervised learning?
Difference between supervised and unsupervised learning
We approach supervised learning tasks in one of two ways: regression or classification.
What is the difference between regression and classification?
One supervised learning technique is k-nearest neighbors.
Using the k-nearest neighbors algorithm in Python
For classification problems, we can use Naive Bayes classifiers.
What are Naive Bayes classifiers?
These days, many machine learning models are neural networks. One of the basic neural network architectures is the multilayer perceptron.
What is a multilayer perceptron?
Clustering groups similar data points together, and the resulting groups are often shown in plots.
Definition: Data clustering
During training, your neural network gets better by adjusting the weights of the connections in the network. The gradients that drive these weight changes are computed in the backpropagation step.
What is backpropagation?
You can control how quickly the neural network converges by adjusting the learning rate.
Learning rate in machine learning
It is general practice to check the performance of your neural network (which has been trained on training data) against a cross-validation data set.
What is cross-validation?
When training your machine learning model, it may not give you the best predictions possible on real-world data because you may be overfitting or underfitting to your training data.
Overfitting and underfitting
You can reduce overfitting by employing a technique called regularization.
What is regularization in machine learning?
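
To make the learning-rate item above concrete, here is a minimal sketch in plain Python. It is not taken from any of the linked answers: the function name gradient_descent and the toy loss f(w) = (w - 3)^2 are assumptions chosen for illustration, and a real neural network would update many weights at once.

```python
def gradient_descent(learning_rate, steps=25, start=0.0):
    """Minimize the toy loss f(w) = (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)          # derivative of (w - 3)^2 with respect to w
        w -= learning_rate * gradient   # step against the gradient
    return w


# Compare how close each learning rate gets to the minimum at w = 3.
for lr in (0.01, 0.1, 0.5, 1.1):
    print(f"learning_rate={lr}: w after 25 steps = {gradient_descent(lr):.4f}")
```

For this particular loss, a very small rate creeps toward the minimum at w = 3, a moderate rate gets there quickly, and a rate above 1.0 overshoots further on every step and diverges.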

Advanced

Collaborative filtering is a machine learning technique that identifies relationships between pieces of data. It is frequently used in recommender systems to find similarities between users and items based on user data.
What is collaborative filtering?
Content-based filtering is a machine learning technique that uses similarities in features to make decisions. This technique is often used in recommender systems, which are algorithms designed to advertise or recommend things to users based on knowledge accumulated about the user.
What is content-based filtering?
Autoencoders are a type of neural network that can be used to reduce dimensionality or remove noise from a particular type of input. They have many real-world uses (e.g., facial recognition).
What is an autoencoder?
You will, more often than not, be able to use someone else’s pretrained model, such as VGG19 for computer vision, in your application. You can adapt the model to your own task in a process called transfer learning.
What are the strategies for using transfer learning? (CC)
What is transfer learning, and why is it needed? (CC)
One approach to giving answers on a spectrum (e.g., sure vs. somewhat sure) is fuzzy logic.
What is fuzzy logic?
Natural Language Processing (NLP) is the study of how machines analyze natural languages and produce meaningful information about the text.
What is Natural Language Processing?
What are the necessary tools and libraries for NLP? (CC)
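
As a rough illustration of the collaborative filtering idea mentioned above, the sketch below scores unseen items for one user by weighting other users' ratings by how similar those users are. Everything here is an assumption for demonstration: the ratings data, the user and item names, and the helper names cosine_similarity and recommend are hypothetical, and production recommender systems use far larger data and more robust methods.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ana":  {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben":  {"film_a": 4, "film_b": 5, "film_d": 5},
    "cara": {"film_a": 1, "film_c": 5, "film_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity between two rating dicts (missing items count as 0)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Predict ratings for items the user has not rated, best first."""
    weighted_sum, weight_total = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score items the user has not seen
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    predictions = [(item, weighted_sum[item] / weight_total[item]) for item in weighted_sum]
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


print(recommend("ana", ratings))
```

Because ana's ratings resemble ben's more than cara's, ben's high rating of film_d dominates the prediction.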

2. Code starters (Python)

Much of the code for machine learning applications is written in the Python programming language. With Python, you can use powerful machine learning libraries such as NumPy, pandas, TensorFlow, and PyTorch to abstract away most of the work (mathematics) for you.

Python libraries for machine learning

Learn more about popular Python libraries for machine learning:
What is NumPy?
What is pandas in Python?
What is PyTorch?
Essential Python libraries for machine learning

Jargon

Get familiar with the technical terms in the language:
Sparse matrices in Python
One-hot encoding in Python
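
The two terms above fit together nicely, so here is a small plain-Python sketch of both. The helper names one_hot_encode and to_sparse are hypothetical, and the dictionary-based sparse format is only one simple way to store nonzero entries; in practice you would likely reach for a library such as scikit-learn or SciPy instead.

```python
def one_hot_encode(values):
    """Turn a list of category labels into rows of 0s with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


def to_sparse(rows):
    """Keep only the nonzero entries as {(row, column): value}."""
    return {
        (r, c): value
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if value != 0
    }


colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
print(categories)          # ['blue', 'green', 'red']
print(encoded)             # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
print(to_sparse(encoded))  # {(0, 2): 1, (1, 1): 1, (2, 0): 1, (3, 1): 1}
```

One-hot encoded data is mostly zeros, which is why a sparse representation that stores only the nonzero positions can save a lot of memory.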

Key operations in Python for ML

Learn more about key operations in machine learning:

Data science is a broad field of study aimed at maintaining data sets and deriving meaning out of them before feeding this data to any machine learning model.
Data Science complete guide (CC)
To ensure that features measured on different scales (e.g., a scale of 1-100 vs. 1-5) do not skew results, we normalize the data.
Data normalization in Python
When working with text, you’ll want to vectorize it (convert it into numeric features) before further processing it.
CountVectorizer in Python
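
As a sketch of the normalization step described above, the snippet below applies min-max scaling, which is one common choice among several and not necessarily the method the linked answer uses. The function name min_max_normalize and the sample values are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # all values identical: nothing to spread out
    return [(value - lowest) / (highest - lowest) for value in values]


scores_out_of_100 = [1, 50, 100]
ratings_out_of_5 = [1, 3, 5]

# After scaling, both features live on the same 0-1 range.
print(min_max_normalize(scores_out_of_100))
print(min_max_normalize(ratings_out_of_5))   # [0.0, 0.5, 1.0]
```

With both features on the same range, the 1-100 feature no longer outweighs the 1-5 feature simply because its numbers are larger.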

3. Understanding results

After you have trained your model, you need to understand how it is performing:

Before applying any of the techniques mentioned, make sure they apply to your model.

One way to understand the trade-off between the true positive rate (TPR) and the false positive rate (FPR) of your model is to use Receiver Operating Characteristic (ROC) curves.
What are ROC curves?
You can get a tabular overview of the performance of your model by looking at its confusion matrix (how confused is your model?).
What is a confusion matrix?
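
To show how the confusion matrix and ROC curves relate, here is a minimal plain-Python sketch for a binary classifier. The labels, scores, and helper names (confusion_counts, roc_point) are made up for illustration, and the code assumes both classes appear in the labels so that neither rate divides by zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, true negatives, false negatives)."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_point(y_true, scores, threshold):
    """Compute one (FPR, TPR) point by thresholding the model's scores."""
    y_pred = [1 if score >= threshold else 0 for score in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn)  # share of actual positives that were caught
    fpr = fp / (fp + tn)  # share of actual negatives that were flagged
    return fpr, tpr


# Hypothetical labels and model scores
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

for threshold in (0.7, 0.5, 0.35):
    print(threshold, roc_point(y_true, scores, threshold))
```

Each threshold produces a different confusion matrix and therefore a different (FPR, TPR) pair; plotting those pairs across many thresholds traces out the ROC curve.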

4. Good to know

In this section, we will go over miscellaneous topics:

One possible source of error in your model is data leakage, which occurs when your model is trained on information it would not have access to at prediction time.
Data leakage in machine learning
You can forecast a time series from its own past values by using the ARIMA model.
What is an ARIMA model?
If you want to make inferences from small data samples, you can use the bootstrap method to produce these estimates.
What is the Bootstrap method in data science?
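
The bootstrap idea mentioned above can be sketched with the standard library alone. This example estimates a percentile confidence interval for a sample mean by resampling with replacement; the function name bootstrap_ci, the sample values, and the defaults (2,000 resamples, a 95% interval, a fixed seed) are all assumptions for illustration, not a recommendation from the linked answer.

```python
import random
from statistics import mean


def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    resampled_means = sorted(
        mean(rng.choices(sample, k=len(sample)))  # draw with replacement
        for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical small sample of measurements
sample = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2]
low, high = bootstrap_ci(sample)
print(f"sample mean = {mean(sample):.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

The spread of the resampled means gives a sense of how much the estimate could vary, even though only one small sample was collected.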

Comparisons

Certain languages may be better suited to a particular use case:

Python vs. R
Machine learning: Python vs. R

5. Application

We can use machine learning to automate interesting tasks.

Eye-blink detection
Eye blink detection using OpenCV (CC)
