Training Data Generation

Let's collect and label training data for the feed ranking ML model.

Your user engagement prediction model’s performance will depend largely on the quality and quantity of the training data. So, let’s see how you can generate training data for your model.

📝 Note that the terms “training data row” and “training example” will be used interchangeably.

Training data generation through online user engagement

The users’ online engagement with Tweets can give us both positive and negative training examples. For instance, if you are training a single model to predict user engagement, then all the Tweets that received user engagement would be labeled as positive training examples. Similarly, the Tweets that were shown to users but received no engagement (that is, they only have impressions) would be labeled as negative training examples.

📝 Impression: If a Tweet is displayed on a user’s Twitter feed, it counts as an impression. The user does not need to read it or engage with it; scrolling past it also counts as an impression.
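
The labeling rule above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the `LogEvent` record, the `label_examples` function, and the set of engagement actions (`like`, `comment`, `retweet`) are all assumptions made for the example, and real engagement logs would look different.

```python
from dataclasses import dataclass

# Assumed engagement types; the lesson only speaks of "user engagement".
ENGAGEMENT_ACTIONS = {"like", "comment", "retweet"}


@dataclass(frozen=True)
class LogEvent:
    """Hypothetical log record: one action by one user on one Tweet."""
    user_id: str
    tweet_id: str
    action: str  # "impression" or one of ENGAGEMENT_ACTIONS


def label_examples(events):
    """Map each (user_id, tweet_id) pair to a label.

    1 = the user engaged with the Tweet (positive example)
    0 = the Tweet only received an impression (negative example)
    """
    labels = {}
    for event in events:
        key = (event.user_id, event.tweet_id)
        if event.action in ENGAGEMENT_ACTIONS:
            # Any engagement makes the example positive.
            labels[key] = 1
        elif event.action == "impression":
            # Negative only if no engagement has been recorded for this pair.
            labels.setdefault(key, 0)
    return labels


events = [
    LogEvent("u1", "t1", "impression"),
    LogEvent("u1", "t1", "like"),
    LogEvent("u1", "t2", "impression"),
]
print(label_examples(events))
# → {('u1', 't1'): 1, ('u1', 't2'): 0}
```

Here the Tweet `t1` becomes a positive example because the impression was followed by a like, while `t2` stays negative because it was only displayed.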
