Data Preprocessing
Learn how to clean and process data using NLTK before feeding it to the LSTM model.
Before designing a model, we need to preprocess the dataset introduced previously.
Text vectorization with scikit-learn
We'll use scikit-learn's TfidfVectorizer class to convert the text data into numerical TF-IDF representations. We pass it the max_features parameter, which limits the vocabulary to the most frequent terms across the corpus.
In the code above:
Lines 1–3: We import the required modules: numpy from jax as jnp, TfidfVectorizer from sklearn.feature_extraction.text, and train_test_split from sklearn.model_selection.
Line 5: We create an instance of the TfidfVectorizer class with max_features set to 10000.
Line 6: We call the fit_transform() method of the TfidfVectorizer instance to learn the vocabulary from docs and convert the documents into a sparse matrix of TF-IDF values. We then call toarray() to turn this sparse matrix into a dense array and store it in the X variable.
Line 7: We ...
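As a rough, self-contained sketch of the vectorization steps described above, the snippet below runs TfidfVectorizer on a tiny hypothetical corpus. The docs list is made up for illustration (the lesson's real docs come from the previously covered dataset), and max_features is set to 5 instead of 10000 so the vocabulary cap is visible. The jax import and the truncated steps after Line 6 are deliberately left out.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the lesson's preprocessed `docs`.
docs = [
    "the movie was great and the acting was great",
    "the movie was terrible",
    "great acting but a terrible plot",
]

# Keep only the 5 most frequent terms (the lesson itself uses max_features=10000).
vectorizer = TfidfVectorizer(max_features=5)

# fit_transform() learns the vocabulary and IDF weights and returns a sparse matrix;
# toarray() converts it into a dense NumPy array.
X = vectorizer.fit_transform(docs).toarray()

print(X.shape)                   # (3, 5): one row per document, one column per kept term
print(len(vectorizer.vocabulary_))  # 5: the vocabulary was capped by max_features
print(X.dtype)                   # float64: TF-IDF weights are real numbers, not integer counts

# With the default norm="l2", each document row has unit Euclidean length.
print(np.linalg.norm(X, axis=1))
```

Raising max_features keeps more of the vocabulary at the cost of a wider X: with max_features=10000, each row has 10,000 columns once the corpus contains at least that many distinct terms, and fewer columns otherwise.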