
Zain Ali Babar

**Logistic regression** is a supervised classification algorithm.
An earlier answer covers what logistic regression is. Here, we implement it using the scikit-learn library.

We’ll train scikit-learn’s logistic regression model on the wine dataset. We split the data into training and test sets (an 80-20 split) so that we can check how well the classifier generalizes to unseen data.

We load the dataset from scikit-learn’s built-in datasets and use the `train_test_split` function to split it into training and test samples. For evaluation, we use scikit-learn’s `confusion_matrix` and `accuracy_score` functions. Finally, we import the `LogisticRegression` class from `sklearn.linear_model`, as shown below:

```
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
```

We load the wine dataset into a local variable called `dataset`:

```
dataset = load_wine()
```
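Before splitting, it can help to look at what `load_wine` returns. The following is a minimal sketch, not part of the original walkthrough, that prints the size of the feature matrix, the class names, and the number of samples per class.

```python
import numpy as np
from sklearn.datasets import load_wine

dataset = load_wine()

# Feature matrix: one row per wine sample, one column per feature
print(dataset.data.shape)            # (178, 13)

# Names of the three target classes
print(list(dataset.target_names))    # ['class_0', 'class_1', 'class_2']

# Number of samples in each class
class_counts = np.bincount(dataset.target)
print(class_counts)                  # [59 71 48]
```

The classes are not perfectly balanced, which is worth keeping in mind when reading the accuracy figure later.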

We split the data into training and test sets using the `train_test_split` function imported above. We use an 80-20 split, where 80% of the data is used for training and 20% for testing. `x_train` and `y_train` contain the training data and labels respectively, while `x_test` and `y_test` contain the testing data and labels.

```
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.20, random_state=15)
```
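To check that the split behaves as described, the sketch below prints the shapes of the resulting arrays. It also shows an optional variant using the `stratify` argument of `train_test_split`, which keeps the class proportions similar in both sets. The stratified variant and the `xs_`/`ys_` variable names are illustrative additions, not part of the original walkthrough.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

dataset = load_wine()

# Same split as in the walkthrough: 80% train, 20% test
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)
print(x_train.shape, x_test.shape)   # (142, 13) (36, 13)

# Optional variant (an assumption, not used in the walkthrough):
# stratify on the labels so each class is represented proportionally
xs_train, xs_test, ys_train, ys_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15,
    stratify=dataset.target,
)
print(xs_train.shape, xs_test.shape)
```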

We create a logistic regression model and call it `logistic_model`, as shown below:

```
logistic_model = LogisticRegression()
```
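Calling `LogisticRegression()` with no arguments uses the library defaults. As a rough sketch, the snippet below reads a few of those defaults with `get_params` and builds a second model with different settings. The name `tuned_model` and the values `C=0.5` and `max_iter=1000` are hypothetical choices for illustration only.

```python
from sklearn.linear_model import LogisticRegression

logistic_model = LogisticRegression()

# Inspect a few of the default hyperparameters
params = logistic_model.get_params()
print(params["C"], params["max_iter"], params["solver"])   # 1.0 100 lbfgs

# Hypothetical variant: stronger regularization (smaller C) and a
# larger iteration budget for the solver
tuned_model = LogisticRegression(C=0.5, max_iter=1000)
print(tuned_model.get_params()["max_iter"])                # 1000
```

A larger `max_iter` can be useful if the solver reports that it stopped before converging.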

We train the model on the training data and the training labels.

```
logistic_model.fit(x_train, y_train)
```
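After `fit`, the learned parameters are stored on the model. The sketch below, which repeats the earlier steps so it runs on its own, prints the class labels the model saw and the shapes of its coefficient and intercept arrays. With the default settings, fitting on the unscaled wine features may emit a convergence warning; the fitted attributes are still available.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

logistic_model = LogisticRegression()
logistic_model.fit(x_train, y_train)

# Class labels seen during training
print(logistic_model.classes_)           # [0 1 2]

# One row of weights per class, one column per feature
print(logistic_model.coef_.shape)        # (3, 13)

# One intercept per class
print(logistic_model.intercept_.shape)   # (3,)
```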

The model uses the parameters learned from the training data to predict the labels of the test data with `predict`.

```
y_pred = logistic_model.predict(x_test)
```
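`predict` returns only the most likely class for each sample. If the class probabilities are also of interest, `predict_proba` returns one probability per class. The following self-contained sketch (an addition to the walkthrough) shows that each row sums to 1 and that the class with the highest probability matches the output of `predict`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

logistic_model = LogisticRegression()
logistic_model.fit(x_train, y_train)

y_pred = logistic_model.predict(x_test)

# One row per test sample, one column per class
proba = logistic_model.predict_proba(x_test)
print(proba.shape)                    # (36, 3)
print(np.round(proba[:3], 3))         # probabilities for the first three samples

# The predicted label is the class with the highest probability
labels_from_proba = logistic_model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels_from_proba, y_pred))
```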

We pass the true and predicted labels to the `accuracy_score` function to find the accuracy of the model, and multiply the result by 100 to express it as a percentage.

Similarly, we use the true and predicted labels to compute the confusion matrix with `confusion_matrix`.

```
accuracy = accuracy_score(y_test,y_pred)*100
confusion_mat = confusion_matrix(y_test,y_pred)
```

```
print("Accuracy is",accuracy)
print("Confusion Matrix")
print(confusion_mat)
```
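The accuracy and the confusion matrix are closely related: in scikit-learn's confusion matrix, rows are true labels and columns are predicted labels, so the diagonal holds the correctly classified samples. As a small sketch, the code below recomputes the accuracy from the matrix and also derives a per-class recall. The names `manual_accuracy` and `per_class_recall` are illustrative, and `labels=` is passed explicitly so the matrix always has one row per class.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

dataset = load_wine()
x_train, x_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.20, random_state=15
)

logistic_model = LogisticRegression()
logistic_model.fit(x_train, y_train)
y_pred = logistic_model.predict(x_test)

accuracy = accuracy_score(y_test, y_pred) * 100
confusion_mat = confusion_matrix(y_test, y_pred, labels=logistic_model.classes_)

# Correct predictions sit on the diagonal, so accuracy = trace / total
manual_accuracy = np.trace(confusion_mat) / confusion_mat.sum() * 100
print(np.isclose(manual_accuracy, accuracy))

# Per-class recall: correct predictions divided by true samples per class
# (assumes every class appears at least once in y_test)
per_class_recall = np.diag(confusion_mat) / confusion_mat.sum(axis=1)
print(per_class_recall)
```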

The complete program is shown below:

```
# Importing the necessary libraries
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

# Importing the dataset from the sklearn library into a local variable called dataset
dataset = load_wine()

# Splitting the data into train (80%) and test (20%).
# x_train, y_train are training data and labels respectively
# x_test, y_test are testing data and labels respectively
x_train, x_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.20, random_state=15)

# Making the logistic regression model
logistic_model = LogisticRegression()

# Training the model on the training data and labels
logistic_model.fit(x_train, y_train)

# Using the model to predict the labels of the test data
y_pred = logistic_model.predict(x_test)

# Evaluating the accuracy of the model using the sklearn functions
accuracy = accuracy_score(y_test,y_pred)*100
confusion_mat = confusion_matrix(y_test,y_pred)

# Printing the results
print("Accuracy is",accuracy)
print("Confusion Matrix")
print(confusion_mat)
```

The logistic regression model defined above gives about 94% accuracy on the test split of the wine dataset. The confusion matrix supports this: nearly all test samples fall on the diagonal, where the predicted label matches the true label.
