Project Creation: Part Two
In this lesson, we will perform some preprocessing on our dataset.
Padding
In the previous lesson, we preprocessed our data and created a numeric representation of the test sentences. We will be using the same function to work with our original dataset.
First, we will create the padding functionality.
Explanation:
- First, we imported the required packages.
- From line 4 to line 7, we defined a function that pads our data. It first finds the length of the longest sequence. It then calls the pad_sequences() function with the padding="post" parameter to append 0s to the end of every shorter sequence, and passes that maximum length as the target length. Because the target is the length of the longest sequence, no sequence exceeds it.
- On line 9, we called the pad() function on the sequences that we created in the previous lesson.
- Finally, we printed each sequence before padding and again after padding. Take a look at the output for one of the sequences below.
Sequence 1 in x
Input: [ 4 7 2 1 16 10 5 11 17 1 18 8 3 19 12 1 20 3 21 1 22 10 23 14 6 1 3 24 2 8 1 4 7 2 1 25 13 26 9 1 27 3 28 1 15]
Output: [ 4 7 2 1 16 10 5 11 17 1 18 8 3 19 12 1 20 3 21 1 22 10 23 14 6 1 3 24 2 8 1 4 7 2 1 25 13 26 9 1 27 3 28 1 15 0 0 0 0 0 0 0 0 0]
You can see that ...
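To make the padding step described above concrete, here is a minimal pure-Python sketch of what post-padding does. It is not the lesson's code: it stands in for a Keras call along the lines of pad_sequences(x, maxlen=length, padding="post"), and the toy sequences list and the value parameter are assumptions made for illustration.

```python
def pad(sequences, length=None, value=0):
    """Right-pad every sequence with `value` so all rows share one length."""
    # Default target: the length of the longest sequence in the batch.
    if length is None:
        length = max(len(seq) for seq in sequences)

    padded = []
    for seq in sequences:
        # Assumption: if a shorter `length` is passed, keep the first `length`
        # ids. Keras pad_sequences() truncates from the front by default.
        kept = list(seq[:length])
        # Append as many filler values as needed to reach the target length.
        padded.append(kept + [value] * (length - len(kept)))
    return padded


# Hypothetical toy input standing in for the tokenized sequences
# produced in the previous lesson.
sequences = [[4, 7, 2, 1], [16, 10], [5, 11, 17, 1, 18, 8]]
padded = pad(sequences)

for before, after in zip(sequences, padded):
    print("Input: ", before)
    print("Output:", after)
```

Every row in padded has the length of the longest input, and shorter rows end in zeros, which mirrors the Input and Output pair shown above. The zero filler is safe only if 0 is not used as a token id, which is worth checking against the tokenizer from the previous lesson.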