Word2vec

In this lesson, we will discuss how word2vec embeddings work.

What is word2vec?

Word2vec is one of the most popular techniques for learning word embeddings using shallow neural networks (networks with only one or a few hidden layers). It was developed by a team led by Tomas Mikolov at Google in 2013. Some key points to know about word2vec:

  • The pretrained Google News model contains vector representations of around 3 million words and phrases, learned from a corpus of roughly 100 billion words.
  • Words that appear in similar contexts have similar vectors.
  • The similarity between two words can be measured using the cosine similarity of their vectors (see the code sketch after this list).
  • Each word is represented as a 300-dimensional vector.
  • To use this model, we suggest working in Google Colab: the pretrained model is around 1.5 GB, and you need to download it to move forward in this project.
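
To make these points concrete, here is a minimal sketch of loading the pretrained vectors and comparing words. The gensim library and its downloader module are assumptions on our part; the lesson does not prescribe them, but they are a common way to work with this model.

```python
# A minimal sketch, assuming gensim is installed (pip install gensim).
# The downloader fetches the pretrained Google News model (~1.5 GB),
# so running this in Google Colab is advisable.
import numpy as np
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")

# Each word is represented as a 300-dimensional vector.
print(wv["king"].shape)  # (300,)

# Words that appear in similar contexts have similar vectors;
# similarity is measured with cosine similarity.
print(wv.similarity("king", "queen"))   # relatively high
print(wv.similarity("king", "carrot"))  # much lower

# The same cosine similarity computed by hand for comparison:
a, b = wv["king"], wv["queen"]
print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# most_similar ranks the vocabulary by cosine similarity.
print(wv.most_similar("king", topn=3))
```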

In addition to producing word embeddings, the word2vec model has shown strong results in building recommendation engines and modeling other kinds of sequential data. Companies such as Airbnb, Alibaba, and Spotify have used it to build and improve their recommendation engines.
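
To illustrate the recommendation use case, the sketch below trains a word2vec model on made-up user sessions, treating each session as a "sentence" and each item ID as a "word". The session data and parameter values are hypothetical; this mirrors the general item2vec idea rather than any specific production system at the companies named above.

```python
# A hedged sketch of word2vec for recommendations: sessions act as
# sentences and item IDs act as words. The data below is made up.
from gensim.models import Word2Vec

sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9"],
    ["item_2", "item_1", "item_7"],
]

# sg=1 selects the skip-gram variant; vector_size and window are
# illustrative choices, not tuned values.
model = Word2Vec(sessions, vector_size=32, window=2, min_count=1, sg=1)

# Items that co-occur in sessions end up with similar vectors,
# so nearest neighbors can serve as recommendation candidates.
print(model.wv.most_similar("item_7", topn=2))
```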