Explanation
Line 1–2: First, we import the necessary modules and functions: the xgboost module (conventionally aliased as xgb) and the load_iris function from scikit-learn’s datasets module, which loads the famous Iris dataset.
Line 3–4: Next, we import the train_test_split function from scikit-learn’s model_selection module to split the dataset into training and test sets, and the accuracy_score and classification_report functions from scikit-learn’s metrics module to evaluate the model’s performance.
Line 7: Now, we load the Iris dataset using load_iris() and store it in the data variable.
Line 8: Here, we separate the features X and the target labels y from the loaded dataset.
Line 11: Here, we split the data into training and test sets using train_test_split, which takes the features X and target labels y as input. Setting test_size=0.2 reserves 20% of the dataset for testing, and random_state=42 makes the split reproducible.
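A self-contained sketch of the loading and splitting steps (the Iris dataset has 150 samples, so a 20% test split leaves 120 training and 30 test samples):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 150 samples, 4 features, 3 classes
data = load_iris()
X, y = data.data, data.target

# Hold out 20% of the samples for testing;
# random_state=42 makes the split reproducible across runs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```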
Line 14: We create an XGBoost classifier using the XGBClassifier class with default hyperparameters.
Line 17: We train the XGBoost classifier on the training data X_train, y_train using the fit method.
Line 20: Next, we predict target labels on the test set X_test using our trained model and the predict method.
Line 23: Moving on, we calculate the model’s accuracy by comparing the predicted labels predictions with the true labels from the test set y_test.
Line 25–27: Finally, we print the model’s accuracy on the test set and the classification report, which contains precision, recall, F1-score, and support for each class in the Iris dataset. Passing the target names instead of numeric labels makes the report display the species names for each class.
Output
Upon execution, the code will show the model’s accuracy on the test set and the detailed classification report with precision, recall, F1-score, and support for each class.
The output shows that the model achieved an accuracy of 100%, meaning it correctly classified every test sample. The precision, recall, and F1-score are also perfect, i.e., 1.00 for each class, indicating that the model made no mistakes on any class. This result shows that the model performed exceptionally well on this dataset, a common outcome on Iris, which is small and has well-separated classes.
Conclusion
To conclude, XGBoost is a powerful library for machine learning tasks, especially classification. It offers high performance and built-in regularization strategies that make it suitable for various applications. Using XGBoost’s capabilities, we obtained 100% (1.0) accuracy in classifying Iris flowers into their respective species. XGBoost’s versatility and efficiency make it a potent tool for a wide range of real-world classification problems.
If you’re curious to learn more about how XGBoost is used in machine learning, check out these helpful resources: