The best model for churn prediction varies by dataset; common candidates include logistic regression, decision trees, random forests, and gradient boosting machines.
Develop a model for customer churn prediction using a decision tree
Key takeaways
Customer churn prediction helps businesses identify which customers are likely to stop using their products or services. This is done by analyzing past behavior and patterns in customer data to understand signs of potential churn.
By predicting churn, businesses can take proactive measures like targeted campaigns and personalized offers to retain at-risk customers, boosting overall customer satisfaction and profitability.
Machine learning models such as decision trees can efficiently predict churn, offering actionable insights into customer retention strategies, while model tuning can further improve prediction accuracy and business outcomes.
Customer churn prediction involves identifying individuals who may discontinue their usage of a product or service. This is achieved by analyzing past customer data to recognize patterns and behaviors indicating potential churn. By utilizing machine learning algorithms, businesses can predict which customers are at risk of churning. The objective is to implement preemptive measures, like targeted marketing campaigns and personalized offers, to retain customers and enhance satisfaction, thereby bolstering business profitability.
Step-by-step guide
Developing the model involves the following steps.
Initialize DecisionTreeClassifier
We import the DecisionTreeClassifier from scikit-learn and train_test_split for data splitting, then initialize a DecisionTreeClassifier object, and finally display the first few rows of the DataFrame df.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# df is assumed to be a pandas DataFrame that already holds the churn data
dectree = DecisionTreeClassifier()
df.head()
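The snippet above assumes that `df` already exists and that its feature columns are numeric, because scikit-learn's decision trees cannot train on text values. As a minimal sketch, assuming a small made-up table (every column name other than `Exited` is hypothetical), a text column could be one-hot encoded with `pd.get_dummies` before training:

```python
import pandas as pd

# Hypothetical toy data; only the 'Exited' target name comes from the article.
raw = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850, 645],
    "Geography": ["France", "Spain", "France", "France", "Spain", "Spain"],
    "Age": [42, 41, 42, 39, 43, 44],
    "Balance": [0.0, 83807.86, 159660.80, 0.0, 125510.82, 113755.78],
    "Exited": [1, 0, 1, 0, 0, 1],
})

# One-hot encode the text column so that every feature is numeric.
df_encoded = pd.get_dummies(raw, columns=["Geography"])

print(df_encoded.head())
```

The encoded frame has one indicator column per category (`Geography_France`, `Geography_Spain`) in place of the original text column.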
Split data into training and testing sets
We split the dataset into features (X) and the target variable (y), then split both into training and testing sets. Setting test_size=0.3 reserves 30% of the data for testing and leaves the remaining 70% for training, and random_state=101 makes the split reproducible. We then fit the DecisionTreeClassifier to the training data and make predictions on the test data, storing them in the variable dectree_predict.
X = df.drop('Exited', axis=1)
y = df['Exited']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
dectree.fit(X_train, y_train)

# Making predictions
dectree_predict = dectree.predict(X_test)
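Churn data is often imbalanced, with far fewer churners than retained customers. One optional variation, not used in the article's code, is to pass `stratify=y` so that both splits keep roughly the same churn rate. The sketch below uses synthetic data from `make_classification` as a stand-in for the real dataset; the 80/20 class balance and the `_demo` variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data: roughly 20% positives (an assumption).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=101
)

# stratify=y_demo keeps the class proportions similar in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101, stratify=y_demo
)

tree = DecisionTreeClassifier(random_state=101)
tree.fit(X_tr, y_tr)
pred = tree.predict(X_te)

print(X_tr.shape, X_te.shape)
print("Churn rate (train/test):", y_tr.mean(), y_te.mean())
```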
Evaluate classifier performance
We compute and print a classification report, which includes precision, recall, F1-score, and support for each class, based on the predictions made by the decision tree model (dectree_predict) on the test data (y_test). Additionally, we calculate and print the accuracy and f1_score for the test set predictions.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score

print(f" Classification report :\n {classification_report(y_test, dectree_predict)}")
print("Accuracy (Test Set): %.2f" % accuracy_score(y_test, dectree_predict))
print("F1-Score (Test Set): %.2f" % f1_score(y_test, dectree_predict))
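To make the reported numbers easier to interpret, here is a small worked example of how precision, recall, F1-score, and accuracy follow from confusion-matrix counts. The counts are invented for illustration and are not results from the article's model.

```python
# Hypothetical counts for the churn class (label 1); not real results.
tp, fp, fn, tn = 60, 40, 20, 180

precision = tp / (tp + fp)   # share of predicted churners who actually churned
recall = tp / (tp + fn)      # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)           # share of all correct predictions

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

With these counts, accuracy is 0.80 even though precision is only 0.60, which is why the F1-score is worth reporting alongside accuracy when churners are the minority class.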
Visualize confusion matrix
We create a DataFrame matrix_df containing the confusion matrix computed from the actual labels (y_test) and the predictions (dectree_predict). We then plot the confusion matrix as a heatmap using Seaborn, annotating each cell with its count. Finally, we set the title, x-axis label, and y-axis label, and display the plot.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

matrix_df = pd.DataFrame(confusion_matrix(y_test, dectree_predict))

# Create one figure and draw the heatmap on its axes
sns.set(font_scale=1.3)
fig, ax = plt.subplots(figsize=(10, 7))
sns.heatmap(matrix_df, annot=True, fmt="g", ax=ax, cmap="magma")

# Set the title and axis labels
ax.set_title('Confusion Matrix - Decision Tree')
ax.set_xlabel("Predicted label", fontsize=15)
ax.set_ylabel("True label", fontsize=15)
plt.show()
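When reading the heatmap, it helps to know that scikit-learn puts the true labels on the rows and the predicted labels on the columns. The sketch below uses a short, made-up pair of label lists to show the layout, plus the optional `normalize='true'` argument, which turns each row into rates instead of counts.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = churned, 0 = stayed.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred_demo = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true_demo, y_pred_demo)

# Each row divided by its total: the bottom-right cell is the recall for class 1.
cm_norm = confusion_matrix(y_true_demo, y_pred_demo, normalize="true")

print(cm)
print(cm_norm)
```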
Tuning the parameters
We initialize a new DecisionTreeClassifier with specified hyperparameters (criterion='entropy', min_samples_split=10, min_samples_leaf=6, max_features='sqrt', random_state=1), train it on the training data, and make predictions on the test data. Then, we print the classification report, confusion matrix, accuracy, and F1-score for the new decision tree classifier (dectreeclasfier_new).
dectreeclasfier_new = DecisionTreeClassifier(criterion='entropy', min_samples_split=10, min_samples_leaf=6, max_features='sqrt', random_state=1)
dectreeclasfier_new.fit(X_train, y_train)
dectreeclasfier_predict = dectreeclasfier_new.predict(X_test)

print(f" Classification report :\n {classification_report(y_test, dectreeclasfier_predict)}")
print(f" Confusion Matrix :\n {confusion_matrix(y_test, dectreeclasfier_predict)}")
print("Accuracy (Test Set): %.2f" % accuracy_score(y_test, dectreeclasfier_predict))
print("F1-Score (Test Set): %.2f" % f1_score(y_test, dectreeclasfier_predict))
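The hyperparameters above are set by hand. A common alternative, which the article itself does not use, is to let `GridSearchCV` try several combinations with cross-validation and keep the best one. The sketch below runs on synthetic data as a stand-in for the churn dataset; the parameter grid and the choice of F1 as the scoring metric are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (an assumption, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.8, 0.2], random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# Illustrative grid; a real search would be guided by the dataset at hand.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 6],
}

# 5-fold cross-validation on the training set, scored by F1 on the churn class.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1), param_grid, scoring="f1", cv=5
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Test F1: %.2f" % search.score(X_te, y_te))
```

Because the search only sees the training folds, the held-out test set still gives an unbiased check of the selected tree.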
Frequently asked questions
What is the best model for customer churn prediction?
What regression model is used for churn prediction?
What is a decision tree?