Exercise: Linear Decision Boundary of Logistic Regression

Learn to fit a logistic regression model and visualize the decision boundary for synthetic data with two features.

Logistic regression and decision boundary visualization

In this exercise, we illustrate the concept of a decision boundary for a binary classification problem. We use synthetic data to create a clear example of where the decision boundary of logistic regression lies relative to the training samples. We start by randomly generating two features, $X_1$ and $X_2$. Because there are two features, the data for this problem is two-dimensional, which makes it easy to visualize. The concepts we illustrate here generalize to cases with more than two features, such as the real-world datasets you’re likely to see in your work; however, the decision boundary is harder to visualize in higher-dimensional spaces.
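As a brief aside, the following sketch shows why the boundary of a two-feature logistic regression is a straight line. The coefficient symbols $\theta_0$, $\theta_1$, $\theta_2$ and the 0.5 classification threshold are notation assumed here for illustration; they are not taken from the exercise text. The model's predicted probability of the positive class is

$$
p = \frac{1}{1 + e^{-(\theta_0 + \theta_1 X_1 + \theta_2 X_2)}}
$$

With a threshold of 0.5, a sample is predicted positive exactly when $\theta_0 + \theta_1 X_1 + \theta_2 X_2 \geq 0$. The boundary is where this expression equals zero, which (assuming $\theta_2 \neq 0$) can be solved for $X_2$:

$$
X_2 = -\frac{\theta_1}{\theta_2} X_1 - \frac{\theta_0}{\theta_2}
$$

This is the equation of a line in the ( $X_1$, $X_2$ ) plane, with slope $-\theta_1/\theta_2$ and intercept $-\theta_0/\theta_2$.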

Perform the following steps to complete the exercise:

Generate the features using the following code:
```
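from numpy.random import default_rng

# Seed the generator so the values below are reproducible
rg = default_rng(4)

# Feature 1: positive class drawn from [1, 7), negative class from [3, 10)
X_1_pos = rg.uniform(low=1, high=7, size=(20, 1))
print(X_1_pos[0:3])
X_1_neg = rg.uniform(low=3, high=10, size=(20, 1))
print(X_1_neg[0:3])

# Feature 2: same ranges as feature 1 for each class
X_2_pos = rg.uniform(low=1, high=7, size=(20, 1))
print(X_2_pos[0:3])
X_2_neg = rg.uniform(low=3, high=10, size=(20, 1))
print(X_2_neg[0:3])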
```
You don’t need to worry too much about why we selected these ranges; the plotting we do later should make it clear. Notice, however, that we have assigned the true class at the same time, by defining which points ($X_1$, $X_2$) belong to the positive class and which to the negative class. As a result, we have 20 samples in each class, for a total of 40 samples, with two features per sample. The code prints the first three values of each feature for both the positive and negative classes.
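To make the "40 samples, two features" structure concrete, here is a minimal sketch of one way the four arrays could be combined into a single feature matrix and label vector. The names `X` and `y`, the row ordering (positive samples first), and the 1/0 label encoding are assumptions for illustration, not necessarily what the exercise uses later.

```python
import numpy as np
from numpy.random import default_rng

# Regenerate the features in the same order as the exercise code
rg = default_rng(4)
X_1_pos = rg.uniform(low=1, high=7, size=(20, 1))
X_1_neg = rg.uniform(low=3, high=10, size=(20, 1))
X_2_pos = rg.uniform(low=1, high=7, size=(20, 1))
X_2_neg = rg.uniform(low=3, high=10, size=(20, 1))

# Assumed layout: one row per sample, columns are (X_1, X_2),
# with the 20 positive samples stacked above the 20 negative ones
X = np.block([[X_1_pos, X_2_pos],
              [X_1_neg, X_2_neg]])

# Assumed encoding: 1 = positive class, 0 = negative class
y = np.concatenate([np.ones(20, dtype=int), np.zeros(20, dtype=int)])

print(X.shape, y.shape)  # (40, 2) (40,)
```

Stacking the positive rows first keeps the labels trivially aligned with the rows; any other ordering works as long as `y` is built to match.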

The output should be the following:
```
#[[6.65833663]
# [4.06796532]
# [6.85746223]]
#[[7.93405322]
# [9.59962575]
# [7.65960192]]
#[[5.15531227]
# [5.6237829 ]
# [2.14473103]]
#[[6.49784918]
# [9.69185251]
# [9.32236912]]
```
Plot this data, coloring the positive samples as red squares and the negative samples as blue x’s. The plotting code is as follows:
```
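import matplotlib.pyplot as plt  # assumed import; not shown in this excerpt
plt.scatter(X_1_pos, X_2_pos, color='red', marker='s')   # positive class
plt.scatter(X_1_neg, X_2_neg, color='blue', marker='x')  # negative class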
```
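The stated goal of the exercise is to fit a logistic regression model and find its decision boundary. The following is a rough sketch of how that could be done with scikit-learn, not necessarily the approach the exercise itself takes. The default `LogisticRegression()` settings, the names `X`, `y`, `clf`, `theta_0`, `theta_1`, `theta_2`, `X_1_line`, and `X_2_line`, and the 1/0 label encoding are all assumptions made for illustration.

```python
import numpy as np
from numpy.random import default_rng
from sklearn.linear_model import LogisticRegression

# Regenerate the synthetic features in the same order as the exercise code
rg = default_rng(4)
X_1_pos = rg.uniform(low=1, high=7, size=(20, 1))
X_1_neg = rg.uniform(low=3, high=10, size=(20, 1))
X_2_pos = rg.uniform(low=1, high=7, size=(20, 1))
X_2_neg = rg.uniform(low=3, high=10, size=(20, 1))

# Assumed layout: rows are samples, columns are (X_1, X_2); 1 = positive
X = np.block([[X_1_pos, X_2_pos],
              [X_1_neg, X_2_neg]])
y = np.concatenate([np.ones(20, dtype=int), np.zeros(20, dtype=int)])

# Fit with scikit-learn's default settings (an assumption for this sketch)
clf = LogisticRegression()
clf.fit(X, y)

# Fitted intercept and the coefficients of X_1 and X_2
theta_0 = clf.intercept_[0]
theta_1, theta_2 = clf.coef_[0]

# The boundary is where theta_0 + theta_1*X_1 + theta_2*X_2 = 0.
# Solve for X_2 at the smallest and largest observed X_1 values.
X_1_line = np.array([X[:, 0].min(), X[:, 0].max()])
X_2_line = -(theta_0 + theta_1 * X_1_line) / theta_2

print("Training accuracy:", clf.score(X, y))
print("Boundary endpoints:", list(zip(X_1_line, X_2_line)))
```

Passing `X_1_line` and `X_2_line` to `plt.plot` on the same axes as the scatter plot would draw the boundary over the training samples. Because the two classes overlap in the region where both features lie between 3 and 7, a straight line cannot be expected to separate them perfectly.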

...
