Trusted answers to developer questions

Related Tags

sklearn
machine learning
communitycreator

What is datasets load_linnerud() in sklearn?

Salman Yousaf

The load_linnerud() function in sklearn.datasets loads and returns the Linnerud dataset. Because it has three target variables, this small dataset is well suited to multi-output regression problems.

Dataset overview

This dataset consists of three exercise variables and three physiological variables, collected from 20 middle-aged men in a fitness club. The two groups are organized as follows:

  • Exercise (data): 20 observations of 3 exercise variables: Chins, Situps, and Jumps.
  • Physiological (target): 20 observations of 3 physiological variables: Weight, Waist, and Pulse.

Characteristic      Description
Sample count        20
Dimensionality      3
Features type       integer
Targets type        integer
Missing values      None
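The characteristics listed above can be checked directly on the loaded arrays. The following is a minimal sketch; the variable names (linnerud, n_missing, all_whole) are chosen here for illustration and are not part of sklearn.

```python
import numpy as np
from sklearn.datasets import load_linnerud

linnerud = load_linnerud()
X, Y = linnerud.data, linnerud.target

# 20 samples with 3 exercise features and 3 physiological targets
print(X.shape, Y.shape)

# Count missing values across both arrays
n_missing = int(np.isnan(X).sum() + np.isnan(Y).sum())
print("missing values:", n_missing)

# Check that every value is a whole number, whatever dtype the arrays use
all_whole = bool(np.all(X == np.round(X)) and np.all(Y == np.round(Y)))
print("all whole numbers:", all_whole)
```

The whole-number check is used instead of inspecting the dtype, because the arrays may be stored as floating-point values even though the measurements themselves are integers.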

Syntax

sklearn.datasets.load_linnerud(
    *,
    return_X_y=False,
    as_frame=False
)

Parameters

  • *: This is a marker in the signature, not a value you pass. It makes every parameter after it keyword-only, so return_X_y and as_frame must be passed by name.

  • return_X_y: If return_X_y=True, the function returns a (data, target) tuple instead of a Bunch object. It is False by default.

  • as_frame: If as_frame=True, the data and the target are returned as pandas DataFrames with columns of the appropriate dtypes (all numeric for this dataset). It is False by default.

Return value

  • data: A Bunch, which is a dictionary-like object with the following attributes:
    • data contains the data matrix. If as_frame is set to True, data is a pandas DataFrame.
    • target contains the regression targets. If as_frame is set to True, target is a pandas DataFrame.
    • feature_names is a list of the column names of the data.
    • target_names is a list of the column names of the target.
    • frame is a DataFrame that combines the data and target columns. It is available only if as_frame is set to True.
    • DESCR is the full description of the dataset as a string.
    • data_filename is the path to the location of the data as a string.
    • target_filename is the path to the location of the target as a string.
  • (data, target): A tuple that is returned instead of the Bunch if return_X_y is set to True.

Note: The frame and (data, target) return values were added in versions 0.23 and 0.20, respectively.
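The attributes of the returned object can be inspected as in the sketch below. The as_frame=True call needs pandas, so it is guarded here; the frame variable is set to None when pandas is not installed, which is a choice made for this example only.

```python
from sklearn.datasets import load_linnerud

bunch = load_linnerud()

# Column names of the exercise features and the physiological targets
print(bunch.feature_names)
print(bunch.target_names)

# The first part of the dataset description
print(bunch.DESCR[:200])

# as_frame=True requires pandas, so only request the frame if it is installed
try:
    import pandas  # noqa: F401
except ImportError:
    frame = None
else:
    frame = load_linnerud(as_frame=True).frame
    print(frame.head())
```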

Example
# Loading linnerud dataset from sklearn.datasets module
from sklearn.datasets import load_linnerud
data = load_linnerud()
# Exercise observations
X = data.data
# Physiological observations
Y = data.target
# printing dataset
print("Exercise observations")
print(X)
print("Physiological observations")
print(Y)

Explanation

  • Line 3: Calls the load_linnerud() function from the sklearn.datasets module and stores the returned dataset in the data variable.
  • Line 5: Stores the exercise observations, held in the data attribute, in X.
  • Line 7: Stores the physiological observations, held in the target attribute, in Y.

Application example

# Program to load linnerud dataset
# We will use the load_linnerud() function from sklearn
from sklearn.datasets import load_linnerud
from sklearn.cross_decomposition import PLSRegression
from numpy.testing import assert_array_almost_equal

def check_univariate_pls_regression():
    # Ensure 1d Y is correctly interpreted
    data = load_linnerud()
    X = data.data
    Y = data.target
    clf = PLSRegression()
    # Compare 1d to column vector
    model1 = clf.fit(X, Y[:, 0]).coef_
    model2 = clf.fit(X, Y[:, :1]).coef_
    result = assert_array_almost_equal(model1, model2)
    print(result)

if __name__ == "__main__":
    check_univariate_pls_regression()
Using partial least squares regression on the linnerud dataset

Explanation

  • Line 9: Loads the dataset into a variable named data.
  • Line 10: Assigns the independent (exercise) features to X.
  • Line 11: Assigns the dependent (target) features to Y.
  • Line 12: Creates an object of the PLSRegression class.
  • Line 14: Fits the model with the first target passed as a 1-D array, Y[:, 0], and stores the fitted coefficients in model1.
  • Line 15: Fits the model with the same target passed as a column vector, Y[:, :1], and stores the fitted coefficients in model2.
  • Line 16: Compares model1 and model2 element by element. The assert_array_almost_equal() function returns None when the arrays match to the default precision. Otherwise, it raises an AssertionError with a summary of the mismatch.
  • Line 17: Prints the result, which is None when the check passes, to the console.
