Model Application
Using logistic regression in MLlib to fit the training dataframe.
We'll cover the following...
Logistic regression
Now that we have prepared our training and test datasets, we can use the logistic regression algorithm provided by MLlib to fit the training dataframe.
- We first create a logistic regression object and define the columns to use as labels and features.
- Next, we use the fit function to train the model on the training dataset.
- In the last step in the snippet below, we use the transform function to apply the model to our test dataset.
    from pyspark.ml.classification import LogisticRegression

    # specify the columns for the model
    lr = LogisticRegression(featuresCol='features', labelCol='label')

    # fit on training data
    model = lr.fit(trainVec)

    # predict on test data
    predDF = model.transform(testVec)
Results
The resulting dataframe now has a probability column, as shown in the table below. This column is a ...
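To make the probability column more concrete, here is a minimal sketch in plain Python (no Spark required) of how a binary logistic regression model turns one row of features into class probabilities. The weights, intercept, and feature values are hypothetical numbers chosen for illustration, not values from the model trained above. The two-element list mirrors the way MLlib reports one probability per class for a binary label.

```python
import math

def predict_proba(features, weights, intercept):
    """Return [P(label=0), P(label=1)] for one row under a binary logistic model."""
    # linear score: dot product of weights and features, plus the intercept
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    # the sigmoid squashes the score into the (0, 1) range
    p1 = 1.0 / (1.0 + math.exp(-margin))
    return [1.0 - p1, p1]

# hypothetical coefficients and a hypothetical feature row (illustration only)
weights = [0.8, -0.4]
intercept = 0.1
row = [1.0, 2.0]

probability = predict_proba(row, weights, intercept)

# assumed decision rule: predict 1.0 when P(label=1) exceeds a 0.5 threshold
prediction = float(probability[1] > 0.5)

print(probability, prediction)
```

The two probabilities always sum to 1, so in the binary case the second element alone is enough to rank rows by how likely they are to belong to the positive class.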