Trusted answers to developer questions

Related Tags

scikit learn
communitycreator

What is sklearn.datasets.load_svmlight_files in scikit-learn?

Salman Yousaf


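sklearn.datasets.load_svmlight_files loads a dataset from several files in the svmlight / libsvm text format. For each file it returns the feature matrix as a SciPy sparse (CSR) matrix together with the target array.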

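sklearn.datasets.load_svmlight_files is equivalent to mapping load_svmlight_file over a list of files, except that the results are concatenated into a single flat list and all the sample vectors are constrained to have the same number of features. This matters when a model is fitted on one file (for example, X_train) and evaluated on another (X_test): both matrices must have the same number of columns, which is not guaranteed if each file is loaded separately with load_svmlight_file.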
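
To see why the shared feature count matters, here is a minimal sketch using two small, made-up svmlight files held in memory with io.BytesIO (the data is hypothetical; the function also accepts file paths). Loaded one at a time, the files get different column counts; loaded together, they share one.

```python
import io

from sklearn.datasets import load_svmlight_file, load_svmlight_files

# Hypothetical data: the "test" file uses a higher column index (5)
# than anything in the "train" file.
train_bytes = b"1 1:0.5 2:1.0\n0 1:1.5 3:2.0\n"
test_bytes = b"1 2:0.3 5:4.0\n"

# Loading each file separately: the column count is inferred per file.
X_a, _ = load_svmlight_file(io.BytesIO(train_bytes))
X_b, _ = load_svmlight_file(io.BytesIO(test_bytes))
print(X_a.shape, X_b.shape)  # (2, 3) (1, 5)

# Loading them together: both matrices get the same number of columns.
X_train, y_train, X_test, y_test = load_svmlight_files(
    [io.BytesIO(train_bytes), io.BytesIO(test_bytes)]
)
print(X_train.shape, X_test.shape)  # (2, 5) (1, 5)
```

Loaded alone, the training matrix has only three columns, so a model fitted on it could not be applied to the five-column test matrix; loading both files in one call avoids that mismatch.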

Files may also contain pairwise preference constraints (called "qid" in the svmlight format). These constraints are ignored unless query_id=True.
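
A minimal sketch of the query_id behavior, assuming a small, made-up ranking file held in memory, where each line carries a qid:<id> token between the label and the features:

```python
import io

from sklearn.datasets import load_svmlight_files

# Hypothetical ranking data: "<label> qid:<id> <index>:<value> ..."
data = b"3 qid:1 1:0.2 2:0.9\n1 qid:1 1:0.7 2:0.1\n2 qid:2 1:0.4 2:0.4\n"

# Default (query_id=False): the qid tokens are skipped; only X and y return.
X, y = load_svmlight_files([io.BytesIO(data)])

# query_id=True: the file yields an (X, y, qid) triplet instead.
X_q, y_q, qid = load_svmlight_files([io.BytesIO(data)], query_id=True)
print(X.shape, qid)
```

The qid array can then be used to group samples by query, for example when building pairwise training data for a ranking model.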

Syntax


sklearn.datasets.load_svmlight_files(files, *, n_features=None, dtype=<class 'numpy.float64'>, multilabel=False, zero_based='auto', query_id=False, offset=0, length=-1)

Parameters

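  • files: the files to load, given as a list of paths (str or path-like), binary-mode file-like objects, or integer file descriptors. Paths ending in ".gz" or ".bz2" are decompressed on the fly.
  • n_features: the number of features (columns) to use. If None, it is inferred from the largest column index found in any of the files. It may be set higher than the actual number of features, but a lower value raises an exception.
  • dtype: the data type of the loaded dataset, used for the output arrays X and y.
  • multilabel: set to True when each sample may carry several labels.
  • zero_based: whether the column indices in the files are zero-based (True) or one-based (False). One-based indices are converted to zero-based. With the default 'auto', a heuristic decides from the file contents; if offset or length is set, 'auto' falls back to True.
  • query_id: when True, the query_id array of each file is also returned.
  • offset: skips the first offset bytes, then discards the following bytes up to the next newline character.
  • length: if strictly positive, stops reading new lines of data once the position in the file reaches the (offset + length) bytes threshold. When offset or length is used, n_features must also be specified.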
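
The sketch below exercises n_features, zero_based, and multilabel on tiny in-memory inputs (hypothetical data). It only illustrates how these parameters change the output shape and the label format; it is not a reference for every option.

```python
import io

from sklearn.datasets import load_svmlight_files

# Hypothetical data with column indices 1, 2 and 3 (no index 0).
data = b"1 1:0.5 2:1.0\n0 3:2.0\n"

# n_features pads the matrix beyond the largest index found in the file.
X, y = load_svmlight_files([io.BytesIO(data)], n_features=10)
print(X.shape)  # (2, 10)

# zero_based=True reads index 1 as the second column, so column 0 stays
# empty and the matrix has 4 columns instead of 3.
X0, y0 = load_svmlight_files([io.BytesIO(data)], zero_based=True)
print(X0.shape)  # (2, 4)

# multilabel=True: comma-separated labels come back as tuples of floats.
ml = b"0,2 1:1.0\n1 2:3.0\n"
X_ml, y_ml = load_svmlight_files([io.BytesIO(ml)], multilabel=True)
print(y_ml)
```

With the default zero_based='auto', the same file is detected as one-based (no index 0 appears), which is why it yields three columns rather than four.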

Return value

  • A flat list [X1, y1, ..., Xn, yn], where each (Xi, yi) pair is the result of load_svmlight_file(files[i]).
  • If query_id is True, the list is [X1, y1, q1, ..., Xn, yn, qn] instead: each file contributes an (Xi, yi, qi) triplet, where qi is that file's query ID array.
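
A minimal sketch of the returned list layout, assuming two made-up one-line files held in memory, each carrying a qid token:

```python
import io

from sklearn.datasets import load_svmlight_files

# Two hypothetical one-line files, each with a qid token.
contents = [b"1 qid:1 1:1.0\n", b"0 qid:2 2:2.0\n"]

# Default: a flat list [X1, y1, X2, y2].
out = load_svmlight_files([io.BytesIO(c) for c in contents])
Xs, ys = out[0::2], out[1::2]  # every file's matrix, every file's targets

# query_id=True: a flat list [X1, y1, q1, X2, y2, q2].
X1, y1, q1, X2, y2, q2 = load_svmlight_files(
    [io.BytesIO(c) for c in contents], query_id=True
)
print(len(out), q1, q2)
```

Because the list is flat, slicing with a step of 2 (or 3 when query_id=True) is a convenient way to collect all matrices or all targets.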

Example

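```python
import os
import tempfile
import numpy as np
from numpy.testing import assert_array_equal
from sklearn import datasets
from sklearn.datasets import dump_svmlight_file, load_svmlight_files

# Load iris (4 features) and write it to a temporary svmlight file,
# since load_svmlight_files expects files, not a Bunch object
df = datasets.load_iris()
X = df.data
Y = df.target
path = os.path.join(tempfile.mkdtemp(), "iris.svmlight")
dump_svmlight_file(X, Y, path)

def svmlgt_loadfiles_test():
    # Loading the same file twice yields two identical (X, y) pairs
    X_trn, y_trn, X_tst, y_tst = load_svmlight_files([path] * 2, dtype=np.float32)
    assert_array_equal(X_trn.toarray(), X_tst.toarray())
    assert_array_equal(y_trn, y_tst)
    assert X_trn.dtype == np.float32
    assert X_tst.dtype == np.float32

    # Three files give three (X, y) pairs, all with the requested dtype
    X1, y1, X2, y2, X3, y3 = load_svmlight_files([path] * 3, dtype=np.float64)
    assert X1.dtype == X2.dtype
    assert X2.dtype == X3.dtype
    assert X3.dtype == np.float64

svmlgt_loadfiles_test()
print(X)
```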