How to perform note tracking in librosa
In music theory, a note represents a musical sound and describes properties of that sound, such as its pitch and duration.
How it works
To track the different notes in an audio file, we first detect the onsets, the points where a new note begins. We can use the onset_detect() function to get the frames for all the onsets in the audio.
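To make the idea of an onset concrete, here is a deliberately simplified sketch that flags frames where the short-time energy jumps. It is not the method onset_detect() uses (librosa works from a spectral onset-strength envelope followed by peak picking). The names frame_energies and toy_onsets, the frame length, and the ratio threshold are all assumptions chosen for illustration.

```python
import math

def frame_energies(signal, frame_length):
    """Sum of squared samples in consecutive, non-overlapping frames."""
    return [
        sum(s * s for s in signal[i:i + frame_length])
        for i in range(0, len(signal) - frame_length + 1, frame_length)
    ]

def toy_onsets(signal, frame_length=64, ratio=4.0, floor=1e-6):
    """Frame indices whose energy is at least `ratio` times the previous frame's."""
    energies = frame_energies(signal, frame_length)
    onsets = []
    for k in range(1, len(energies)):
        previous = max(energies[k - 1], floor)  # floor avoids comparing against exact silence
        if energies[k] >= ratio * previous:
            onsets.append(k)
    return onsets

sr = 8000  # sample rate of the synthetic signal, in Hz

def tone(freq, n_samples):
    """A sine tone of `n_samples` samples at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * i / sr) for i in range(n_samples)]

# Synthetic signal: silence, a 440 Hz tone, silence, then a 660 Hz tone.
signal = [0.0] * 256 + tone(440.0, 512) + [0.0] * 256 + tone(660.0, 512)

onset_frames = toy_onsets(signal)
print(onset_frames)  # frames 4 and 16, where each tone starts
```

Real recordings rarely have clean silence between notes, which is why a spectral measure of change tends to work better than raw energy.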
We also need the chroma values, which we get from the chroma_stft() function provided by librosa. This function returns the chromagram computed from a short-time Fourier transform (STFT) representation of an audio signal. The chromagram represents the energy distribution of pitch classes over time. The mathematical equation for the short-time Fourier transform is as follows:
$$X(t, f) = \int_{-\infty}^{\infty} x(\tau)\, w(\tau - t)\, e^{-j 2 \pi f \tau}\, d\tau$$
In this equation:
$X(t, f)$ represents the short-time Fourier transform at time $t$ and frequency $f$.
$x(\tau)$ is the input signal.
$w(\tau - t)$ is the window function applied to the signal to create a short segment centered at time $t$.
$e^{-j 2 \pi f \tau}$ represents the complex exponential used to analyze the frequency content at frequency $f$.
$\int$ denotes the integral, indicating that the STFT is computed as the integral of the product of the signal, the window function, and the complex exponential over a small time window.
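On sampled audio, the integral becomes a sum over one short windowed frame. The sketch below evaluates that sum for a single frame in plain Python to show how the three factors (signal, window, complex exponential) combine. The names hann and stft_frame, the frame length, and the test tone are assumptions for illustration; librosa's own STFT is FFT-based and far faster.

```python
import cmath
import math

def hann(n_fft):
    """Periodic Hann window of length n_fft."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]

def stft_frame(x, start, n_fft):
    """Discrete STFT of one frame beginning at sample `start`.

    X[k] = sum_n x[start + n] * w[n] * exp(-2j * pi * k * n / n_fft)
    The phase is measured relative to the frame start, which does not
    affect the magnitudes inspected below.
    """
    w = hann(n_fft)
    frame = [x[start + n] * w[n] for n in range(n_fft)]
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft // 2 + 1)
    ]

sr = 8000      # sample rate in Hz (assumed)
n_fft = 256    # frame length in samples (assumed)
freq = 1000.0  # test tone, chosen to fall exactly on bin 1000 * 256 / 8000 = 32

x = [math.sin(2 * math.pi * freq * i / sr) for i in range(1024)]

spectrum = stft_frame(x, start=128, n_fft=n_fft)
magnitudes = [abs(c) for c in spectrum]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(peak_bin, peak_bin * sr / n_fft)  # bin index and its frequency in Hz
```

The strongest bin lands on the tone's frequency. A chromagram goes one step further and folds the energy of all such bins into 12 pitch classes.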
Once we have the chroma values, we can use the frames returned by the onset_detect() function to find the pitch class with the maximum chroma value at each onset. We can then use the frames_to_time() function to convert the onset frames to times and take the difference between consecutive onsets as the duration of a note. This way, we have each note's dominant pitch class and its duration.
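Both steps reduce to simple array operations, sketched here in plain Python on made-up data. The helper frames_to_seconds is a hypothetical stand-in that mirrors what librosa.frames_to_time() computes under librosa's defaults (hop_length=512, and sr=22050 as returned by librosa.load()). The small chroma matrix and the onset frames are invented values.

```python
def frames_to_seconds(frame, sr=22050, hop_length=512):
    """Convert a frame index to seconds: frame * hop_length / sr."""
    return frame * hop_length / sr

def strongest_pitch_class(chroma, frame):
    """Index (0-11) of the pitch class with the most energy in one chroma column."""
    column = [row[frame] for row in chroma]
    return max(range(len(column)), key=column.__getitem__)

# Invented 12 x 4 chroma matrix: rows are pitch classes, columns are frames.
chroma = [[0.1] * 4 for _ in range(12)]
chroma[9][1] = 1.0  # pitch class 9 dominates frame 1
chroma[4][3] = 1.0  # pitch class 4 dominates frame 3

onset_frames = [1, 3]  # invented onset frames
for frame in onset_frames:
    pitch = strongest_pitch_class(chroma, frame)
    seconds = frames_to_seconds(frame)
    print(pitch, round(seconds, 4))
```

With the default hop length, one frame is about 23 ms long, which bounds how precisely an onset time can be reported.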
Code
The following code tracks the notes in a trumpet audio clip:
```python
import librosa

# Loading the audio file
audio_file = '../trumpet.ogg'
y, sr = librosa.load(audio_file)

# Extracting the chroma features and onsets
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
onset_frames = librosa.onset.onset_detect(y=y, sr=sr)

first = True
notes = []
for onset in onset_frames:
    chroma_at_onset = chroma[:, onset]
    note_pitch = chroma_at_onset.argmax()
    # For all other notes
    if not first:
        note_duration = librosa.frames_to_time(onset, sr=sr)
        notes.append((note_pitch,onset, note_duration - prev_note_duration))
        prev_note_duration = note_duration
    # For the first note
    else:
        prev_note_duration = librosa.frames_to_time(onset, sr=sr)
        first = False

print("Note pitch \t Onset frame \t Note duration")
for entry in notes:
    print(entry[0],'\t\t',entry[1],'\t\t',entry[2])
```
Lines 4–5: We load the audio file we will use for this task. The trumpet clip is available for free as one of the example recordings provided with the librosa library.
Lines 8–9: We extract the chroma features and onset frames from the audio file. The onset_detect() function returns an array of frame indices at which a new musical note starts.
Lines 13–15: Here, we go through the onset frames and pick the index of the maximum chroma value at that frame. The note_pitch value is the index of the pitch class with the maximum chroma value, meaning it identifies one of the 12 musical notes.
Lines 17–24: If this is the first note that we track, we convert its frame to a time using the librosa.frames_to_time() function and set the first flag to False. If it is not the first note, we convert the current onset frame to a time and subtract the previous onset time from it. Each stored entry therefore holds the current note's pitch class, its onset frame, and the time elapsed since the previous onset. Because the first onset only initializes prev_note_duration, it does not get an entry of its own.
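One possible refinement, sketched below under stated assumptions, is to measure each note from its own onset to the next one, so that the duration is attached to the note that actually sounded during that interval, and to translate the pitch-class index into a note name. The function name track_notes and the clip_end argument are hypothetical. The index-to-name table assumes the default layout of chroma_stft(), in which row 0 corresponds to C.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def track_notes(pitch_classes, onset_times, clip_end):
    """Pair each onset with the time until the next onset (or the end of the clip)."""
    boundaries = list(onset_times) + [clip_end]
    notes = []
    for i, pitch in enumerate(pitch_classes):
        duration = boundaries[i + 1] - boundaries[i]
        notes.append((NOTE_NAMES[pitch], onset_times[i], duration))
    return notes

# Invented values standing in for the argmax indices and the onset times in seconds.
notes = track_notes([9, 4, 0], [0.5, 1.0, 1.75], clip_end=2.5)

print("Note\tOnset (s)\tDuration (s)")
for name, start, duration in notes:
    print(f"{name}\t{start:.2f}\t\t{duration:.2f}")
```

With real data, the pitch classes would come from the argmax over each chroma column and the onset times from librosa.frames_to_time(). The clip length could stand in for clip_end, although the last note may well stop sounding before the clip ends.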
Output
In the output, we get a table of the notes tracked in the audio, with one row for every onset after the first. The first column is the pitch class index (0 to 11) of the note, which lets us distinguish the different notes. The second column is the onset frame, which tells us the frame at which the note starts. The last column is the duration in seconds.
Conclusion
In conclusion, librosa provides the building blocks for basic note tracking. The onset_detect() function locates the frames where notes begin, the chroma_stft() function shows which pitch class dominates at each onset, and the frames_to_time() function converts frames to seconds so that we can measure durations.
This approach is a useful starting point for music transcription and music interpretation. Similar onset and spectral analysis also appears in areas such as speech analysis and acoustic event detection.