What is Hugging Face?
Hugging Face is an AI community that promotes open-source contributions. It is a hub of open-source models for natural language processing, computer vision, and other fields where AI is applied. Even tech giants like Google, Facebook, AWS, and Microsoft use its models, datasets, and libraries.
Models
Hugging Face provides state-of-the-art pre-trained models for a wide variety of tasks. At the time of writing this article (August 2022), there were more than
NLP tasks
Hugging Face is famous for its contributions to the NLP domain. The supported NLP tasks include:
Text classification
Text generation
Translation
Summarization
Fill-mask
Question-Answering
Zero-shot classification
Sentence similarity
Computer vision tasks
The computer vision tasks are as follows:
Image classification
Image segmentation
Object detection
Audio tasks
The audio tasks are as follows:
Automatic speech recognition
Text-to-speech
Audio classification
Hugging Face's Transformers library allows us to use these models in a way that abstracts away unnecessary details.
Datasets
There are more than
The Datasets library by Hugging Face lets us load these datasets, as well as our own. It also provides the most commonly used operations for processing datasets, such as shuffling, sampling, and filtering. With the help of Apache Arrow, the library allows us to work with datasets that are larger than our memory.
Example
Here, we use the Transformers library with a pre-trained model to predict a missing word.
from transformers import pipeline

# specifying the pipeline
bert_unmasker = pipeline('fill-mask', model="bert-base-uncased")
text = "I have to wake up in the morning and [MASK] a doctor"
result = bert_unmasker(text)
for r in result:
    print(r)
Explanation
Line 4: In this line, we use pipeline to automatically configure a pipeline for our task, which is denoted as fill-mask. We have specified the bert-base-uncased model.
Line 5: The string variable text will be the input to our pipeline. Notice that we have placed a [MASK] token where we want our model to generate the actual word.
Line 6: To get the output from the model, we simply call the pipeline with the input.
Line 7–8: The output of the pipeline is in the form of a list of suggestions. Here we've used a loop to print them.
As we can see, by using pipeline we've abstracted away a lot of unnecessary details. Usage is similar for other tasks as well.
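For instance, a text-classification pipeline follows the same pattern: only the task name changes. This is a sketch, not the article's own example; the model name is an assumption, and pipeline() downloads it on first use.

```python
# A sketch of the same pipeline pattern for text classification.
# The model name below is an assumption for illustration.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
result = classifier("I love the Hugging Face ecosystem!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```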