What is Hugging Face?
Hugging Face is an AI community that promotes open-source contributions. It is a hub of open-source models for natural language processing, computer vision, and other fields where AI is applied. Even tech giants like Google, Facebook, AWS, and Microsoft use its models, datasets, and libraries.
Models
Hugging Face provides state-of-the-art pre-trained models for a wide variety of tasks. At the time of writing this article (August 2022), there were more than
NLP tasks
Hugging Face is famous for its contributions to the NLP domain. The supported NLP tasks include:
Text classification
Text generation
Translation
Summarization
Fill-mask
Question-Answering
Zero-shot classification
Sentence similarity
Computer vision tasks
The computer vision tasks are as follows:
Image classification
Image segmentation
Object detection
Audio tasks
The audio tasks are as follows:
Automatic speech recognition
Text-to-speech
Audio classification
Hugging Face's Transformers library allows us to use these models in a way that abstracts away unnecessary details.
Datasets
There are more than
The Datasets library by Hugging Face lets us load these datasets, as well as our own. It also provides the most commonly used operations for processing datasets, such as shuffling, sampling, and filtering. With the help of Apache Arrow, the library allows us to work with datasets that are larger than our memory.
Example
Here, we use the Transformers library with a pre-trained model to predict a missing word.
from transformers import pipeline

# specifying the pipeline
bert_unmasker = pipeline('fill-mask', model="bert-base-uncased")
text = "I have to wake up in the morning and [MASK] a doctor"
result = bert_unmasker(text)
for r in result:
    print(r)
Explanation
Line 4: In this line, we use pipeline to automatically configure a pipeline for our task, which is denoted as fill-mask. We have specified the bert-base-uncased model.
Line 5: The string variable text will be the input to our pipeline. Notice that we have placed a [MASK] token where we want our model to generate the actual word.
Line 6: To get the output from the model, we simply call the pipeline with the input.
Line 7–8: The output of the pipeline is in the form of a list of suggestions. Here we've used a loop to print them.
As we can see, by using pipeline we've abstracted away a lot of unnecessary details. Usage is similar for other tasks as well.
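For instance, a text-classification pipeline follows the same pattern: only the task name changes. This is a sketch, not the article's own example; the model name is an assumption, and pipeline() downloads it on first use.

```python
# A sketch of the same pipeline pattern for text classification.
# The model name below is an assumption for illustration.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
result = classifier("I love the Hugging Face ecosystem!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```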