Data Generation
Learn about the process we use to generate synthetic data.
The data generation process
We know our model already. To generate synthetic data for it, we need to pick values for its parameters. In our case, we chose b = 1 and w = 2 (that is, in thousands of dollars).
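Assuming the model is the simple linear regression y = b + wx + ε (the equation itself is not restated in this lesson), substituting the chosen parameter values gives the line the synthetic points will scatter around:

$$
y = b + w x + \epsilon = 1 + 2x + \epsilon
$$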
First, we generate our feature (x), using NumPy's rand method to randomly draw N = 100 points uniformly between 0 and 1.
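A minimal sketch of this step, assuming NumPy's legacy np.random API named in the text. The seed value 42 and the (N, 1) column shape are illustrative choices, not taken from this lesson.

```python
import numpy as np

N = 100  # number of data points

# Assumption: a fixed seed so the same points are generated on every run
np.random.seed(42)

# rand draws uniformly from the half-open interval [0, 1);
# shape (N, 1) stores one feature value per row
x = np.random.rand(N, 1)
```

Fixing the seed makes the example reproducible. Newer NumPy code often uses the Generator API instead (rng = np.random.default_rng(42), then rng.random((N, 1))), which provides the same kind of uniform samples.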
Then, we plug our feature (x) and our parameters b and w into our equation to compute our labels (y). But we also need to add some Gaussian noise (ε); otherwise, our synthetic dataset would be a perfectly straight line. We can generate noise using NumPy's randn method, which draws samples from a normal distribution (of mean 0 and ...
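Continuing the sketch, the labels can be computed by plugging x, b, and w into the linear equation and adding noise. This assumes the model y = b + wx + ε; the 0.1 noise scale and the seed 42 are assumptions for illustration, chosen so the points scatter visibly around the line without hiding it.

```python
import numpy as np

true_b = 1  # intercept chosen in the lesson
true_w = 2  # slope chosen in the lesson
N = 100

# Assumption: fixed seed for reproducibility
np.random.seed(42)
x = np.random.rand(N, 1)

# randn samples the standard normal distribution (mean 0, standard deviation 1);
# the 0.1 factor is an assumed scale that shrinks the noise
epsilon = 0.1 * np.random.randn(N, 1)

# Labels: the straight line b + w * x, perturbed by Gaussian noise
y = true_b + true_w * x + epsilon
```

Without the epsilon term, every (x, y) pair would lie exactly on the line y = 1 + 2x, which would make the data too easy to fit and unrealistic.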