Training Data Generation

Explore how to generate training data for user engagement prediction models in feed-based systems. Understand balancing positive and negative examples, the impact of sampling on model calibration, and effective train-test splitting based on time intervals to improve model performance in real scenarios.

We'll cover the following...

Training data generation through online user engagement
Balancing positive and negative training examples
Train test split

Your user engagement prediction model’s performance will depend largely on the quality and quantity of the training data. So, let’s see how you can generate training data for your model.

📝 Note that the terms “training data row” and “training example” are used interchangeably.

Training data generation through online user engagement

Users’ online engagement with Tweets gives us both positive and negative training examples. For instance, if you are training a single model to predict user engagement, every Tweet that received user engagement is labeled as a positive training example. Similarly, every Tweet that was shown to the user but has only an impression, with no engagement, is labeled as a negative training example.

📝 Impression: If a Tweet is displayed on a user’s Twitter feed, it counts as an impression. The user does not have to read it or engage with it; scrolling past it also counts as an impression.
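
The labeling rule above can be sketched in a few lines of Python. This is only a minimal illustration, not a production pipeline: the `Impression` record, its field names, and the set of actions treated as engagement (`like`, `reply`, `retweet`) are assumptions made for the example, since the log format is not specified here.

```python
from dataclasses import dataclass

# Assumption: which actions count as "engagement" for a single engagement model.
ENGAGEMENT_ACTIONS = frozenset({"like", "reply", "retweet"})


@dataclass(frozen=True)
class Impression:
    """Hypothetical log record: one Tweet shown on one user's feed."""
    user_id: str
    tweet_id: str
    actions: frozenset  # actions the user took on the Tweet; empty if none


def label_impression(impression):
    """Return 1 (positive) if the user engaged, 0 (negative) if impression only."""
    return 1 if impression.actions & ENGAGEMENT_ACTIONS else 0


def build_training_examples(impressions):
    """Turn impression logs into (user_id, tweet_id, label) training rows."""
    return [
        (imp.user_id, imp.tweet_id, label_impression(imp))
        for imp in impressions
    ]
```

Every impression yields exactly one training row, so a Tweet the user scrolled past without acting on still contributes a negative example.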
