Classification using SVM, KNN, RandomForestClassifier, and PCA

Helper functions

Let’s create some helper functions to load the datasets and models.

Function to get the dataset

Let’s create a function named return_data() that loads the dataset the user selects and splits it into training and testing sets.

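import pandas as pd
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.model_selection import train_test_split

def return_data(dataset):
    # Load the dataset matching the selected name; any other name falls back to breast cancer.
    if dataset == 'Wine':
        data = load_wine()
    elif dataset == 'Iris':
        data = load_iris()
    else:
        data = load_breast_cancer()
    # Build a DataFrame of the features plus a 'Type' column holding the class labels.
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df['Type'] = data.target
    # Hold out 20% of the samples for testing; random_state=1 makes the split reproducible.
    X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=1, test_size=0.2)
    return X_train, X_test, y_train, y_test, df, data.target_names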
  • The function return_data(dataset) takes a string that contains the name of the dataset the user selects.
  • It loads the matching dataset: 'Wine' or 'Iris'. Any other name loads the breast cancer dataset.
  • We create a DataFrame df, with the features as columns and the class labels in a Type column, that we can show in our UI.
  • We use sklearn’s train_test_split() function to create the training set (X_train, y_train) and the testing set (X_test, y_test). With test_size=0.2, 20% of the samples are held out for testing.
  • The function returns, in this order, X_train, X_test, y_train, y_test, the DataFrame df, and the target class names (data.target_names).
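
To see what the split produces, here is a minimal sketch that applies the same train_test_split() call directly to the Iris dataset. Choosing Iris and the variable name X_train_again are only for illustration. The sketch prints the shapes of the two feature sets and checks that repeating the call with the same random_state gives the same training set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris has 150 samples with 4 features each.
data = load_iris()

# Same arguments as in return_data(): 20% test, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=1, test_size=0.2
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Repeating the call with the same random_state reproduces the split.
X_train_again, _, _, _ = train_test_split(
    data.data, data.target, random_state=1, test_size=0.2
)
print((X_train == X_train_again).all())  # True
```

Fixing random_state is what lets different models be compared later on exactly the same training and testing samples.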

Let’s run the following code to load the datasets and display them on the console.
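
A minimal sketch of such a run is given below. It assumes return_data() as described in this lesson and repeats the definition so that the snippet runs on its own. The name 'Breast Cancer' and the summary dictionary are assumptions made for illustration: any string other than 'Wine' or 'Iris' selects the breast cancer dataset.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.model_selection import train_test_split


def return_data(dataset):
    # Same logic as the helper in this lesson.
    if dataset == 'Wine':
        data = load_wine()
    elif dataset == 'Iris':
        data = load_iris()
    else:
        data = load_breast_cancer()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df['Type'] = data.target
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, random_state=1, test_size=0.2
    )
    return X_train, X_test, y_train, y_test, df, data.target_names


# Hypothetical bookkeeping: (train size, test size, DataFrame shape) per dataset.
summary = {}

# 'Breast Cancer' is an assumed label; it reaches the else branch.
for name in ('Wine', 'Iris', 'Breast Cancer'):
    X_train, X_test, y_train, y_test, df, classes = return_data(name)
    summary[name] = (len(X_train), len(X_test), df.shape)

    print(f"Dataset: {name}")
    print(f"Training samples: {len(X_train)}, testing samples: {len(X_test)}")
    print(f"Target classes: {list(classes)}")
    print(df.head())  # first five rows, including the 'Type' column
    print()
```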
