Duplicates

This lesson will focus on how to deal with data that has duplicates.

Duplicates

Repeated data rows in a dataset are called duplicates. They can arise in a number of ways. The most common are:

  • The same data is entered twice by accident, for example when the same article is scraped twice or a booking for an online product is made twice.

  • Data is collected through online forms or surveys, and a user presses the submit button twice.

  • Data is collected from multiple sources whose records overlap.
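
To make the idea of a repeated row concrete, here is a minimal sketch of how duplicates might be detected and removed. It assumes the data sits in a pandas DataFrame. The `bookings` table, its column names, and its values are hypothetical and chosen only for illustration; `duplicated()` and `drop_duplicates()` are standard pandas methods.

```python
import pandas as pd

# Hypothetical booking data: the row at index 2 repeats the row at index 0 exactly,
# as might happen when a user presses the submit button twice.
bookings = pd.DataFrame(
    {
        "customer": ["ana", "ben", "ana", "cara"],
        "product": ["course", "book", "course", "book"],
        "price": [30, 12, 30, 12],
    }
)

# duplicated() flags each row that repeats an earlier row.
# The first occurrence is not flagged, only the later copies.
is_duplicate = bookings.duplicated()
n_duplicates = int(is_duplicate.sum())

# drop_duplicates() keeps the first occurrence of each row and discards the copies.
deduplicated = bookings.drop_duplicates().reset_index(drop=True)

print(f"duplicate rows found: {n_duplicates}")
print(deduplicated)
```

By default both methods compare all columns, so two rows count as duplicates only if every value matches. Whether that is the right definition depends on the dataset, for instance two genuine bookings by the same customer could look identical without a timestamp or booking ID.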
