LR Implementation Steps: 1 to 3
Explore how to start building a linear regression model: import the necessary Python libraries, load the dataset, and decide which variables to keep or remove to improve the model's performance. You'll also see why handling missing data and avoiding multicollinearity among your features matter.
1) Import libraries
Let’s begin by importing the following Python libraries:
Note: The code for later steps won't repeat the code from previous steps; that code is already included behind the scenes for you.
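The full library list isn't reproduced in this section. As a minimal sketch, the import below covers only pandas, which Step 2 relies on through `pd.read_csv`; any other libraries used later in the course would be imported alongside it.

```python
# pandas provides the DataFrame type and read_csv, which Step 2 uses.
# Only pandas is shown here; other libraries the course needs are not listed in this section.
import pandas as pd

print(pd.__version__)  # confirm that the import worked
```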
2) Import dataset
Use the pandas `pd.read_csv()` function to load the CSV dataset into a data frame, and assign that data frame to a variable called `df`.
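A minimal sketch of this step follows. The file name `melbourne_housing.csv` and the helper `load_dataset` are hypothetical placeholders, so substitute the path to your own copy of the dataset.

```python
import pandas as pd

# Hypothetical file name: replace it with the location of your Melbourne housing CSV.
CSV_PATH = "melbourne_housing.csv"


def load_dataset(path: str = CSV_PATH) -> pd.DataFrame:
    """Read the CSV file at `path` into a pandas DataFrame."""
    return pd.read_csv(path)
```

Once the file is in place, `df = load_dataset()` gives you the data frame, and `df.head()` previews its first five rows.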
Melbourne housing dataset variables
Feature | Data Type | Continuous/Discrete |
Suburb | String | Discrete |
Address | String | Discrete |
Rooms | Integer | Continuous |
Type | String | Discrete |
Price | Integer | Continuous |
Method | String | Discrete |
SellerG (seller's name) | String | Discrete |
Date | DateTime | Discrete |
Distance | Floating-point | Continuous |
Postcode | Integer | Discrete |
Bedroom2 | Integer | Continuous |
Bathroom | Integer | Continuous |
Car | Integer | Continuous |
Landsize | Integer | Continuous |
BuildingArea | Integer | Continuous |
YearBuilt | DateTime | Discrete |
CouncilArea | String | Discrete |
Latitude | Floating-point | Continuous |
Longitude | Floating-point | Continuous |
Regionname | String | Discrete |
Propertycount (number of properties in that suburb) | Integer | Continuous |
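To check how pandas actually stored these variables after loading, a small helper can report each column's dtype, number of distinct values, and missing-value count, where the last one flags the missing data mentioned at the start of this lesson. The names `summarize_columns` and `numeric_columns` are hypothetical. Keep in mind that pandas stores an integer-valued column that contains missing values as `float64`, so the reported dtypes may not match the table exactly.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: pandas dtype, distinct values, and missing values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # storage type pandas inferred
        "n_unique": df.nunique(),         # distinct non-missing values
        "n_missing": df.isna().sum(),     # count of NaN/None entries
    })


def numeric_columns(df: pd.DataFrame) -> list:
    """List the columns pandas stored as numbers (ints or floats), in column order."""
    return list(df.select_dtypes(include="number").columns)
```

Calling `summarize_columns(df)` on the loaded data lets you compare each column against the table above before deciding what to keep.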
Please note that the Latitude and Longitude variables are misspelled in this dataset, but this won’t affect our code, as you’ll remove these two variables in Step 3.
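As a minimal sketch of removing these two variables, pandas `DataFrame.drop` returns a copy of the frame without the named columns. The misspelled headers `Lattitude` and `Longtitude` are an assumption based on the publicly shared Kaggle version of this dataset, and `drop_coordinates` is a hypothetical helper; check `df.columns` against your copy.

```python
import pandas as pd

# Assumed misspelled headers (as in the public Kaggle release); confirm them with df.columns.
COORDINATE_COLUMNS = ["Lattitude", "Longtitude"]


def drop_coordinates(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the latitude and longitude columns."""
    # drop() raises KeyError if any listed column is absent, so a wrong name fails loudly.
    return df.drop(columns=COORDINATE_COLUMNS)
```

Because `drop` raises a `KeyError` for any name it can't find, a typo in the misspelled headers stops the code instead of silently leaving the columns in place.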
3) Remove variables
Regression models can be developed in two ways. The first is to follow the principle of ...