Detect a Writer’s Fingerprints Using Machine Learning

In this project, we will study authors’ writing styles through quantitative analysis and learn how an author’s style evolves over time.

You will learn to:

Explore a dataset using Python packages.

Prepare texts for stylometric analysis.

Extract textual features that help establish authorship.

Use Burrows's Delta to compare authors’ writing styles.


Natural Language Processing

Machine Learning

Data Analysis


Basic understanding of Python

Intermediate knowledge of pandas

Intermediate knowledge of seaborn







Project Description

In this project, we will explore authorship attribution by analyzing the distinctive traits in an author’s written works. Our dataset is a collection of songs by well-known songwriters and includes song titles, lyrics, and author information. We will develop a model that attributes authorship to a given text. Such a model has applications in fields such as plagiarism detection and literary analysis.

To get started, we will load the dataset and a language model that will help us process the text. Then, we will preprocess the text to reduce noise and extract linguistic features that can help identify an author, such as word length distribution, word frequencies, and word co-occurrences (bigrams). Next, we will create a training corpus and use it to attribute authorship to a text using Burrows's Delta.
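The three features named above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's own code: the regex tokenizer stands in for the language model the project loads, and the function names and sample lyric are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_length_distribution(tokens):
    """Map each word length to the number of tokens with that length."""
    return Counter(len(token) for token in tokens)


def word_frequencies(tokens):
    """Map each word to its relative frequency in the token list."""
    total = len(tokens)
    return {word: count / total for word, count in Counter(tokens).items()}


def bigram_counts(tokens):
    """Count adjacent word pairs as a simple co-occurrence measure."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample lyric, used only to exercise the functions.
sample = "The night is long and the road is long"
tokens = tokenize(sample)

lengths = word_length_distribution(tokens)
frequencies = word_frequencies(tokens)
bigrams = bigram_counts(tokens)

print(lengths.most_common())
print(sorted(frequencies.items(), key=lambda item: -item[1])[:3])
print(bigrams.most_common(2))
```

In the project itself these counts would be computed per author over the preprocessed lyrics, so that the resulting distributions can be compared across songwriters.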

By the end of this project, we will have built a model that can attribute authorship with high accuracy. We will also explore how these techniques can be extended to analyze how an author’s style evolves over time.
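As a rough sketch of the attribution step, Burrows's Delta compares z-scored frequencies of the most frequent words: the candidate author with the smallest mean absolute difference from the disputed text is the most likely match. The version below is an assumption-laden illustration, not the project's implementation. It treats each author's combined lyrics as one subcorpus, computes the mean and standard deviation across authors (so it needs at least two), and uses a simple regex tokenizer; the function names, the `n_features` parameter, and the toy texts are all hypothetical.

```python
import re
from collections import Counter
from statistics import mean, stdev


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_freqs(tokens, features):
    """Relative frequency of each feature word in the token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: counts[word] / total for word in features}


def burrows_delta(train_texts, test_text, n_features=30):
    """Return {author: delta}; a lower delta means a closer style.

    train_texts maps each author to their combined training text.
    Assumes at least two authors and non-empty texts.
    """
    author_tokens = {a: tokenize(t) for a, t in train_texts.items()}

    # Features: the most frequent words across the whole training corpus.
    corpus_counts = Counter()
    for tokens in author_tokens.values():
        corpus_counts.update(tokens)
    features = [word for word, _ in corpus_counts.most_common(n_features)]

    author_freqs = {
        a: relative_freqs(tokens, features) for a, tokens in author_tokens.items()
    }

    # Mean and sample standard deviation of each feature across authors.
    mu = {w: mean([f[w] for f in author_freqs.values()]) for w in features}
    sigma = {w: stdev([f[w] for f in author_freqs.values()]) for w in features}

    # Words used at identical rates by every author cannot be z-scored.
    features = [w for w in features if sigma[w] > 0]

    test_freqs = relative_freqs(tokenize(test_text), features)
    z_test = {w: (test_freqs[w] - mu[w]) / sigma[w] for w in features}

    deltas = {}
    for author, freqs in author_freqs.items():
        z_author = {w: (freqs[w] - mu[w]) / sigma[w] for w in features}
        deltas[author] = mean([abs(z_test[w] - z_author[w]) for w in features])
    return deltas


# Toy data with hypothetical authors "A" and "B".
train = {
    "A": "the sun the sea the sky and the sand",
    "B": "my heart my soul my love and my pain",
}
deltas = burrows_delta(train, "the sea and the sky")
best_match = min(deltas, key=deltas.get)
print(deltas, best_match)
```

With real lyrics, the feature list would typically be dominated by function words such as "the" and "and", which is why the method tends to be insensitive to a song's topic.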

Project Tasks


Getting Started

Task 0: Introduction

Task 1: Import the Libraries

Task 2: Load the Dataset


Authorship Attribution

Task 3: Preprocess Song Lyrics for Analysis

Task 4: Get Word Lengths

Task 5: Get Word Frequencies

Task 6: Get Bigram Frequencies

Task 7: Create Test and Train Corpora

Task 8: Tokenize Both Corpora and Calculate the Distance


Author Evolution

Task 9: Split the Dataset into Early Songs and Late Songs

Task 10: Compare Word Length

Task 11: Compare Frequent Words

Task 12: Compare Lexical Diversity

Task 13: Compare Function Words