How to perform e-payments fraud detection with machine learning

E-payments have made transactions easier for users, but they have also created more opportunities for fraud. Online payment fraud can happen at any time, so credit card companies need to keep track of fraudulent transactions. Studying past fraudulent transactions helps detect future fraud, so that users don’t get drawn into unwanted transactions. Because transaction fraud can cause significant losses, it must be monitored.

We’ll use a machine learning classifier, trained on a history of transactions, to classify new transactions as fraudulent or not. The dataset records several metrics for each transaction, and the similarity between the metrics of a new transaction and those of known fraudulent ones helps flag it as fraudulent.

Defining the dataset

The dataset used to classify fraudulent transactions contains a history of transactions, both authentic and fraudulent. The authenticity of a transaction is calculated from the relation between the available balance and the used balance. The metrics used for classification are the amount, the names of the customer starting the transaction and of the recipient, the old and new bank balances of the customer and the recipient, and the type of online transaction. The machine learning algorithm then sets the value of the metric isFraud to mark a transaction as fraudulent. The training data is stored in transaction_sample_logs.csv.

Fraud detection process

The detection process is a series of steps: installing the dependencies, preparing the dataset, training the model, and using the trained model to check the authenticity of a transaction. Here is the step-by-step process for fraud detection:

Installing dependencies: The fraud detection process requires certain dependencies. In Python 3, we use pip3 to install them. For this specific process, we require numpy, pandas, and scikit-learn, which we install with the command pip3 install numpy pandas scikit-learn. Once installed, the libraries are imported at the top of the script.

In the code:

Line 1: Import the pandas library to read the dataset for training the model.
Line 2: Import the train_test_split function to split the dataset into two sets: testing and training.
Line 3: Import the numpy library to use arrays in your code.
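
The original listing is not reproduced here. A minimal sketch of the three imports described above, assuming the packages are already installed, might look like this:

```python
import pandas as pd  # reads the CSV dataset into a DataFrame
from sklearn.model_selection import train_test_split  # splits data into training and testing sets
import numpy as np  # provides the arrays passed to the model
```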

Dataset usage: For model training, we first read the dataset and then divide it into training and test sets. Before dividing, we must identify the value to predict, y, and the values used for classification, x. We use multiple metrics as input to classify the transaction as fraudulent and set the value of the output metric, isFraud. To perform the process, we use the following code:

In the code:

Line 1: Read the dataset from the .csv file, transaction_sample_logs.csv.
Lines 2-8: Map the values of the type column to numeric codes. CASH_OUT, PAYMENT, CASH_IN, TRANSFER, and DEBIT are mapped to 1, 2, 3, 4, and 5, respectively.
Lines 9-13: Map the values of the isFraud column to labels. 0 and 1 are mapped to Not Fraud and Fraud, respectively.
Line 15: Define the input array, x, from the dataset columns used as metrics.
Line 16: Create the output variable, y, with the data from the isFraud column.
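
The steps described above can be sketched as follows. This is an illustration rather than the original listing, so the line numbers in the notes will not match it. The column names other than type and isFraud (amount, oldbalanceOrg, newbalanceOrig) are assumptions about the CSV header, as is the choice of these four columns as inputs. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

# Numeric codes for the transaction types, as listed in the notes above.
TYPE_CODES = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}
# Readable labels for the isFraud flag.
FRAUD_LABELS = {0: "Not Fraud", 1: "Fraud"}
# Assumed column names; adjust them to match the actual CSV header.
FEATURE_COLUMNS = ["type", "amount", "oldbalanceOrg", "newbalanceOrig"]


def prepare_dataset(data):
    """Return the (x, y) arrays built from a transactions DataFrame."""
    data = data.copy()  # leave the caller's DataFrame untouched
    data["type"] = data["type"].map(TYPE_CODES)
    data["isFraud"] = data["isFraud"].map(FRAUD_LABELS)
    x = np.array(data[FEATURE_COLUMNS])  # input metrics
    y = np.array(data["isFraud"])  # output label
    return x, y


def load_dataset(path):
    """Read the CSV file at path and return the prepared (x, y) arrays."""
    return prepare_dataset(pd.read_csv(path))
```

Calling load_dataset with the path to the training CSV file returns the two arrays used in the next step.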

Model training: Divide the dataset into testing and training sets. The training set is used to train the model on the transaction history, and the testing set is used to check the classifications made by the trained model. The model used is DecisionTreeClassifier(), and it is trained on the training set. A decision tree classifies a transaction by applying a sequence of threshold checks on its metrics, which makes classification fast once the tree is trained.

In the code:

Line 1: Import the classifier DecisionTreeClassifier() from scikit-learn.
Line 2: Use the function train_test_split to divide the dataset so that 10 percent is in the testing set to test the model and the remaining 90 percent is in the training set to train the model.
Line 3: Define the DecisionTreeClassifier() model.
Line 4: Train the defined model on the training set.
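
A minimal sketch of the training step described above is shown here. It is not the original listing, so the line numbers in the notes will not match it. The function name train_model is hypothetical, and random_state=42 is an assumption added only to make the split reproducible. The sketch also returns the accuracy on the testing set, computed with the model's score method.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_model(x, y, test_size=0.10, random_state=42):
    """Split (x, y) 90/10, fit a decision tree, and return it with its test accuracy."""
    xtrain, xtest, ytrain, ytest = train_test_split(
        x, y, test_size=test_size, random_state=random_state
    )
    model = DecisionTreeClassifier()
    model.fit(xtrain, ytrain)  # learn from the training set only
    accuracy = model.score(xtest, ytest)  # fraction of test rows classified correctly
    return model, accuracy
```

Here x and y are the input and output arrays prepared in the dataset step.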

Now, the model is trained and ready to classify transactions. The model determines the value of the output variable, y, which indicates whether a transaction is fraudulent or not.

Testing example: We can now test the trained model by classifying sample transactions.

In the code:

Line 1: Define the metrics array with two sample transactions, one representing a fraudulent transaction and the other a non-fraudulent one.
Line 2: Use the predict function of the model to predict the value of the isFraud variable for the first transaction.
Line 3: Print the value of the isFraud variable for the first transaction.
Line 5: Use the predict function of the model to predict the value of the isFraud variable for the second transaction.
Line 6: Print the value of the isFraud variable for the second transaction.
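
The prediction step described above can be sketched as follows. This is an illustration rather than the original listing, so the line numbers in the notes will not match it. The helper classify_transactions is hypothetical, the sample values are made up, and the column order [type, amount, old balance, new balance] is an assumption that must match the order used when the model was trained.

```python
import numpy as np

# Hypothetical sample rows in the assumed order [type, amount, old balance, new balance].
SAMPLE_METRICS = np.array([
    [4, 9000.60, 9000.60, 0.0],    # a TRANSFER that empties the account
    [2, 150.00, 5000.00, 4850.0],  # a small PAYMENT
])


def classify_transactions(model, metrics):
    """Return the predicted isFraud value for each row of metrics."""
    metrics = np.asarray(metrics, dtype=float)
    if metrics.ndim == 1:
        # A single transaction was passed; predict expects a 2-D array.
        metrics = metrics.reshape(1, -1)
    return model.predict(metrics)
```

With a trained model, classify_transactions(model, SAMPLE_METRICS) returns one predicted value per row, which can then be printed.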

We can add further values in the metric array representing sample transactions. The transactions are represented in the format:
