Scaling the Dataset
Explore how properly scaling a dataset influences gradient descent optimization by producing equally steep loss curves. Learn to standardize features using StandardScaler so that each feature has zero mean and unit variance, which makes it easier to choose a good learning rate and helps prevent vanishing gradients during model training.
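As a preview of that idea, here is a minimal sketch of standardization with scikit-learn's `StandardScaler`; the feature values below are made up purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training features on very different scales (illustrative values)
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])

scaler = StandardScaler()
scaler.fit(X_train)                 # learn each column's mean and std from training data
X_scaled = scaler.transform(X_train)

print(X_scaled.mean(axis=0))        # ~[0. 0.] -> zero mean per feature
print(X_scaled.std(axis=0))         # ~[1. 1.] -> unit variance per feature
```

Note that the scaler is fit on the training set only; the same fitted scaler is then reused to transform validation and test data.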
Overview of learning rate results
The conclusion we drew from the results of the different learning rates was that it is best if all the loss curves are equally steep, so that a single learning rate is close to optimal for all of them!
Achieving equally steep curves
How do we achieve equally steep curves, then? The short answer: you have to "correctly" scale your dataset. Let us now look in depth at how scaling your dataset accomplishes this.
“Bad” feature
First, let us take a look at a slightly modified example, which we will call the "bad" dataset (a minimal code sketch follows the list below):
- Over here, we multiplied our feature (`x`) by 10, so it is in the range [0, 10] now, and renamed it `bad_x`.
- But since we do not want the labels (`y`) to change, we also divided the original `true_w` parameter by 10 and renamed it `bad_w`. This way, both `bad_w * bad_x` and `w * x` yield the same results.
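Putting those two bullet points into code, here is a minimal sketch; the values of `true_b`, `true_w`, the noise level, and the number of points are assumptions for illustration, not taken from the lesson:

```python
import numpy as np

np.random.seed(42)

# Assumed parameters of the original synthetic dataset (illustrative)
true_b, true_w = 1.0, 2.0
N = 100

x = np.random.rand(N, 1)                  # original feature in [0, 1]
epsilon = 0.1 * np.random.randn(N, 1)     # Gaussian noise (assumed level)
y = true_b + true_w * x + epsilon         # labels stay the same throughout

# The "bad" dataset: feature scaled up, weight scaled down
bad_x = 10.0 * x                          # now in the range [0, 10]
bad_w = true_w / 10.0                     # compensates, so labels do not change

# Sanity check: both parameterizations produce the same product
assert np.allclose(bad_w * bad_x, true_w * x)
```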
Then, we performed the same split as before for both the original and bad datasets, and plotted the training sets side by side, as seen below:
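A sketch of that split and plot might look like the following; the split fraction and random seed are assumptions, chosen only so that both datasets are split at identical indices:

```python
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Same seed for both calls, so the two training sets contain the same rows
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.2, random_state=13)
bad_x_train, bad_x_val, _, _ = train_test_split(
    bad_x, y, test_size=0.2, random_state=13)

# Plot the original and "bad" training sets side by side
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax0.scatter(x_train, y_train)
ax0.set(title="Original dataset", xlabel="x", ylabel="y")
ax1.scatter(bad_x_train, y_train)
ax1.set(title='"Bad" dataset', xlabel="bad_x")
plt.show()
```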
The following figure shows the difference between the original training dataset and the bad one:
The only difference between the two plots is the scale of feature `x`: its range was [0, 1], and now it is [0, 10].