Trusted answers to developer questions

Related Tags

machine learning
definition

What is TF-IDF?

Nouman Abbasi

TF-IDF stands for “Term Frequency – Inverse Document Frequency.” It is a numerical score that reflects how important a word is to a document in a collection or corpus. This technique is often used in information retrieval and text mining as a weighting factor.

TF-IDF is composed of two terms:

  • Term Frequency (TF):
    The number of times a word appears in a document divided by the total number of words in that document.
  • Inverse Document Frequency (IDF):
    The logarithm of the ratio between the total number of documents in the corpus and the number of documents that contain the term.
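The two definitions above can be written as formulas. This is a sketch of the notation only; the symbols are assumptions introduced here, not taken from the original: f_{t,d} is the number of times term t appears in document d, D is the corpus, and N is the number of documents in D. The base of the logarithm is not specified in the definitions; changing it only rescales every score by the same constant.

```latex
\[
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}},
\qquad
\mathrm{idf}(t, D) = \log \frac{N}{\left|\{\, d \in D : t \in d \,\}\right|},
\qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
\]
```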

So, essentially, the TF-IDF value increases as the word’s frequency in a document (TF) increases. However, this is offset by the number of documents in the corpus that contain the word (IDF): the more documents a word appears in, the lower its IDF.

IDF down-weights common words like “the” or “is” that would otherwise have a high term frequency but carry little information about any particular document.

Example

Let’s look at an example of how TF-IDF works.

Consider two sentences (or documents):

  1. “The cat is white”
  2. “The cat is black”

Notice that the two sentences differ only in the words “white” and “black”. These are the important words and should get a high TF-IDF value, while words like “the” and “cat”, which appear in both sentences, should get a low value.

Figure: TF-IDF value for the word "white"
Figure: TF-IDF value for the word "the"
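
The same calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names `tf`, `idf`, and `tf_idf` are hypothetical, tokenization is a plain lowercase whitespace split, and the natural logarithm is used (another base would only rescale the scores).

```python
import math


def tf(term, doc_tokens):
    # Share of the document's tokens that are `term`.
    return doc_tokens.count(term) / len(doc_tokens)


def idf(term, corpus_tokens):
    # log(N / df), where df is the number of documents containing `term`.
    # Assumes the term occurs in at least one document (otherwise df is 0).
    containing = sum(1 for doc in corpus_tokens if term in doc)
    return math.log(len(corpus_tokens) / containing)


def tf_idf(term, doc_tokens, corpus_tokens):
    return tf(term, doc_tokens) * idf(term, corpus_tokens)


documents = ["The cat is white", "The cat is black"]
# Lowercase and split on whitespace: a simplifying assumption, not a full tokenizer.
corpus = [doc.lower().split() for doc in documents]

white_score = tf_idf("white", corpus[0], corpus)
the_score = tf_idf("the", corpus[0], corpus)

print(f"white: {white_score:.4f}")  # 0.25 * ln(2/1) ≈ 0.1733
print(f"the:   {the_score:.4f}")    # 0.25 * ln(2/2) = 0.0000
```

With this plain form of IDF, any word that appears in every document (here “the”, “cat”, and “is”) scores exactly zero, while “white” gets a positive score because it appears in only one of the two documents.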

CONTRIBUTOR

Nouman Abbasi
Copyright ©2022 Educative, Inc. All rights reserved
