Exercise: Linear Decision Boundary of Logistic Regression

Learn to fit a logistic regression model and visualize the decision boundary for synthetic data with two features.

Logistic regression and decision boundary visualization

In this exercise, we illustrate the concept of a decision boundary for a binary classification problem. We use synthetic data to create a clear example of how the decision boundary of logistic regression looks in comparison to the training samples. We start by generating two features, $X_1$ and $X_2$, at random. Because there are two features, we can say that the data for this problem is two-dimensional. This makes it easy to visualize the data. The concepts we illustrate here generalize to cases of more than two features, such as the real-world datasets you’re likely to see in your work; however, the decision boundary is harder to visualize in higher-dimensional spaces.
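To make the idea of a linear decision boundary concrete before working with the data, here is a minimal sketch. It assumes a logistic regression model with two features whose intercept and coefficients (`intercept`, `coef_1`, `coef_2`) are hypothetical values picked by hand for illustration; they are not fitted to the data in this exercise. The boundary is the set of points where the predicted probability equals 0.5, which for two features is a straight line.

```python
import numpy as np

# Hypothetical parameters, chosen by hand for illustration only (not fitted).
intercept = 10.0
coef_1, coef_2 = -1.0, -1.0

def predict_proba(x_1, x_2):
    """Probability of the positive class under the logistic model."""
    z = intercept + coef_1 * x_1 + coef_2 * x_2
    return 1.0 / (1.0 + np.exp(-z))

def boundary_x_2(x_1):
    """Value of X_2 on the decision boundary for a given X_1.

    Solves intercept + coef_1 * x_1 + coef_2 * x_2 = 0 for x_2,
    which requires coef_2 to be nonzero.
    """
    return -(intercept + coef_1 * x_1) / coef_2

x_1_grid = np.linspace(1, 10, 5)
x_2_boundary = boundary_x_2(x_1_grid)

# Every point on the boundary line has a predicted probability of 0.5.
boundary_probs = predict_proba(x_1_grid, x_2_boundary)
print(boundary_probs)

# Points on either side of the line fall into different predicted classes.
print(predict_proba(2.0, 2.0) >= 0.5)  # lower-left point: positive side
print(predict_proba(9.0, 9.0) >= 0.5)  # upper-right point: negative side
```

With these made-up parameters, points toward the lower left are predicted positive and points toward the upper right are predicted negative. A fitted model would supply its own intercept and coefficients, but the boundary would still be a straight line of this form.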

Perform the following steps to complete the exercise:

  1. Generate the features using the following code:

    from numpy.random import default_rng
    rg = default_rng(4)
    X_1_pos = rg.uniform(low=1, high=7, size=(20,1)) 
    X_1_neg = rg.uniform(low=3, high=10, size=(20,1))
    X_2_pos = rg.uniform(low=1, high=7, size=(20,1)) 
    X_2_neg = rg.uniform(low=3, high=10, size=(20,1))

    You don’t need to worry too much about why we selected the values we did; the plotting we do later should make it clear. Notice, however, that we have assigned the true class at the same time, by defining here which points ($X_1$, $X_2$) will be in the positive and negative classes. The result of this is that we have 20 samples each in the positive and negative classes, for a total of 40 samples, and that we have two features for each sample. We show the first three values of each feature for both the positive and negative classes.

    The output should be the following:

    # [4.06796532]
    # [6.85746223]]
    # [9.59962575]
    # [7.65960192]]
    # [5.6237829 ]
    # [2.14473103]]
    # [9.69185251]
    # [9.32236912]]
  2. Plot this data, coloring the positive samples as red squares and the negative samples as blue x’s. The plotting code is as follows:

    import matplotlib.pyplot as plt
    # Positive samples as red squares, negative samples as blue x's
    plt.scatter(X_1_pos, X_2_pos, color='red', marker='s')
    plt.scatter(X_1_neg, X_2_neg, color='blue', marker='x')
    plt.legend(['Positive class', 'Negative class'])

    The result should look like this:

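The steps above describe 40 samples with two features each, split evenly between the two classes. The sketch below shows one way to inspect the generated values and to assemble them into a single feature matrix and label vector. The names `X` and `y`, the row ordering (positive samples first), and the 1/0 label encoding are assumptions made here for illustration; the exercise text does not specify them.

```python
import numpy as np
from numpy.random import default_rng

# Regenerate the features exactly as in step 1.
rg = default_rng(4)
X_1_pos = rg.uniform(low=1, high=7, size=(20, 1))
X_1_neg = rg.uniform(low=3, high=10, size=(20, 1))
X_2_pos = rg.uniform(low=1, high=7, size=(20, 1))
X_2_neg = rg.uniform(low=3, high=10, size=(20, 1))

# Inspect the first three values of one feature for the positive class.
print(X_1_pos[0:3])

# Assumed layout: one row per sample, columns are (X_1, X_2),
# with the 20 positive samples stacked above the 20 negative samples.
X = np.block([[X_1_pos, X_2_pos], [X_1_neg, X_2_neg]])

# Assumed encoding: 1 for the positive class, 0 for the negative class.
y = np.concatenate([np.ones(20, dtype=int), np.zeros(20, dtype=int)])

print(X.shape, y.shape)  # (40, 2) (40,)
```

Arrays in this shape can be passed to most classifier implementations that expect a two-dimensional feature matrix and a one-dimensional label vector.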