Project Creation: Part Two

In this lesson, we will perform some preprocessing on our dataset.

Padding

In the previous lesson, we preprocessed our data and created a numeric representation of the test sentences. We will use the same functions to work with our original dataset.

First, we will create the padding functionality.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def pad(x, length=None):
    # Pad every sequence in x with trailing zeros up to `length`
    # (by default, the length of the longest sequence).
    if length is None:
        length = max([len(sentence) for sentence in x])
    return pad_sequences(x, maxlen=length, padding='post')

test_pad = pad(text_tokenized)
for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(np.array(token_sent)))
    print('  Output: {}'.format(pad_sent))

Explanation:

  • First, we imported the required packages.

  • Next, we defined the pad() function. If no length is provided, it finds the length of the longest sequence in the batch. It then calls pad_sequences() with the padding='post' parameter, which appends extra 0's to the end of each sequence until it reaches that maximum length.

  • We then called the pad() function on the sequences that we created in the previous lesson.

  • Finally, we printed each sequence without padding and again after padding. Take a look at the output for one of the sequences below.

    Sequence 1 in x
    Input:  [ 4  7  2  1 16 10  5 11 17  1 18  8  3 19 12  1 20  3 21  1 22 10 23 14
    6  1  3 24  2  8  1  4  7  2  1 25 13 26  9  1 27  3 28  1 15]
    Output: [ 4  7  2  1 16 10  5 11 17  1 18  8  3 19 12  1 20  3 21  1 22 10 23 14
    6  1  3 24  2  8  1  4  7  2  1 25 13 26  9  1 27  3 28  1 15  0  0  0
    0  0  0  0  0  0]
    

    You can see that ...
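To make the padding behavior concrete without pulling in TensorFlow, here is a small pure-NumPy sketch of what pad_sequences(..., padding='post') does. The function name pad_post and the toy sequences are our own for illustration; note that, like Keras, longer sequences are truncated from the front by default (truncating='pre').

```python
import numpy as np

def pad_post(sequences, length=None):
    # Mimic pad_sequences(..., padding='post'): append 0's to the
    # end of each sequence until it reaches `length`.
    if length is None:
        length = max(len(seq) for seq in sequences)
    padded = np.zeros((len(sequences), length), dtype=int)
    for i, seq in enumerate(sequences):
        trunc = seq[-length:]  # Keras truncates from the front by default
        padded[i, :len(trunc)] = trunc
    return padded

toy = [[4, 7, 2], [1, 16], [10, 5, 11, 17]]
print(pad_post(toy))
# Every row now has the length of the longest sequence,
# with zeros filling the gap at the end.
```

Because the zeros always go at the end, the original token order is preserved, which is what the model will expect when it reads each sequence from left to right.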