How to perform e-payments fraud detection with machine learning

E-payments have made transactions easier for users, but they have also created more opportunities for fraud. Online payment fraud can happen at any time, so credit card companies need to keep track of fraudulent transactions. A record of past fraud helps detect similar attempts in the future, protecting users from unwanted transactions and from the significant losses that fraud can cause.

We’ll use a machine learning classification algorithm that learns from a history of labeled transactions and then classifies future transactions as fraudulent or not. The dataset records certain metrics for each transaction, and similarities between the metrics of a new transaction and those of past fraudulent ones help the model flag it.

Defining the dataset

The dataset used to classify transactions contains a history of transactions, mixing authentic and fraudulent ones. The authenticity of a transaction is judged from the relation between the available balance and the amount used. The metrics used for classification are the amount, the names of the customer starting the transaction and of the recipient, the old and new bank balances of the customer and the recipient, and the type of online transaction. The isFraud column marks each transaction as fraudulent or not, and this is the value the machine learning model learns to predict. The training data is stored in transaction_sample_logs.csv.
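As a rough illustration of the balance relation mentioned above, the sketch below checks whether a customer's new balance equals the old balance minus the amount. The helper name balance_gap and the sample rows are hypothetical, and this is only one plausible reading of "the relation between the available balance and the amount used", not the rule used to label the dataset.

```python
def balance_gap(amount, old_balance, new_balance):
    """Return how far new_balance is from (old_balance - amount).

    A gap of 0 means the balances are consistent with the amount.
    Hypothetical helper, not part of the original dataset or code.
    """
    return (old_balance - amount) - new_balance


# Made-up rows that reuse the column names from the dataset.
rows = [
    {"amount": 100.0, "oldbalanceOrg": 500.0, "newbalanceOrig": 400.0},
    {"amount": 250.0, "oldbalanceOrg": 250.0, "newbalanceOrig": 0.0},
    {"amount": 300.0, "oldbalanceOrg": 1000.0, "newbalanceOrig": 1000.0},
]

for row in rows:
    gap = balance_gap(row["amount"], row["oldbalanceOrg"], row["newbalanceOrig"])
    status = "consistent" if gap == 0 else "inconsistent"
    print(row["amount"], status, gap)
```

A nonzero gap does not prove fraud on its own. It only shows the kind of signal a classifier could pick up from these columns.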

Fraud detection process

The detection process is a series of steps: installing the dependencies, training the model, and using the trained model to check the authenticity of a transaction. Here is a step-by-step process for fraud detection:

  1. Installing dependencies: The fraud detection process requires a few dependencies. In Python 3, we use pip3 to install them. For this process, we need numpy, pandas, and scikit-learn. To install them, we use the following commands:

pip3 install numpy
pip3 install pandas
pip3 install scikit-learn
  1. Importing: The next step is to import libraries in the Python code. To import the dependencies, use the following statements:

import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np

In the code:

  • Line 1: Import the pandas library to read the dataset for training the model.

  • Line 2: Import the train_test_split function to split the dataset into two parts: a training set and a testing set.

  • Line 3: Import the numpy library to use arrays in your code.

  1. Dataset usage: For model training, we read the dataset and divide it into training and test sets. Before dividing, we must define the value to predict, y, and the values used for classification, x. We use multiple metrics as input to classify a transaction, and the output is the value of the isFraud metric. To perform the process, we use the following code:

data = pd.read_csv('transaction_sample_logs.csv')
data_type = {
"CASH_OUT":1,
"PAYMENT":2,
"CASH_IN":3,
"TRANSFER":4,
"DEBIT":5
}
data["type"] = data["type"].map(data_type)
fraud_valid = {
0: "Not Fraud",
1: "Fraud"
}
data["isFraud"] = data["isFraud"].map(fraud_valid)
x = np.array(data[["type","amount","oldbalanceOrg", "newbalanceOrig"]])
y = np.array(data["isFraud"])

In the code:

  • Line 1: Read the dataset from the .csv file, transaction_sample_logs.csv.

  • Lines 2-8: Map the string values of the type column to integer codes. CASH_OUT, PAYMENT, CASH_IN, TRANSFER, and DEBIT map to 1, 2, 3, 4, and 5, respectively.

  • Lines 9-13: Map the values of the isFraud column to text labels. 0 and 1 map to Not Fraud and Fraud, respectively.

  • Line 14: Define the input array, x, from the type, amount, oldbalanceOrg, and newbalanceOrig columns.

  • Line 15: Create the output array, y, from the isFraud column.
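The mapping step can be tried without the .csv file. The sketch below uses a small in-memory DataFrame with made-up rows, including a hypothetical "REFUND" type that is not in the mapping, to show that pandas `map` leaves unmapped values as NaN. Dropping those rows before building x and y is one possible way to handle them, not something the original code does.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for transaction_sample_logs.csv.
sample = pd.DataFrame({
    "type": ["CASH_OUT", "PAYMENT", "TRANSFER", "REFUND"],  # "REFUND" is hypothetical
    "amount": [9000.6, 120.0, 500.0, 42.0],
    "oldbalanceOrg": [9000.6, 1000.0, 500.0, 100.0],
    "newbalanceOrig": [0.0, 880.0, 0.0, 142.0],
    "isFraud": [1, 0, 1, 0],
})

data_type = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}
fraud_valid = {0: "Not Fraud", 1: "Fraud"}

# Values missing from the dictionary become NaN after map().
sample["type"] = sample["type"].map(data_type)
unmapped = int(sample["type"].isna().sum())
print("Rows with an unmapped type:", unmapped)

# Assumption: rows with an unknown type are dropped before training.
clean = sample.dropna(subset=["type"])
x = np.array(clean[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
y = np.array(clean["isFraud"].map(fraud_valid))
print(x.shape, y.shape)
```

Checking for NaN values at this point matters because scikit-learn estimators may reject or mishandle inputs with missing values, depending on the version.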

  1. Model training: Divide the dataset into training and testing sets. The training set is used to train the model on the transaction history, and the testing set is used to check how well the trained model classifies transactions it has not seen. The model used is DecisionTreeClassifier(), trained on the training set. A decision tree classifies a transaction by applying a sequence of threshold tests to the input metrics, which keeps prediction fast.

from sklearn.tree import DecisionTreeClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)

In the code:

  • Line 1: Import the classifier DecisionTreeClassifier from scikit-learn.

  • Line 2: Use the function train_test_split to divide the dataset so that 10 percent goes to the testing set and the remaining 90 percent goes to the training set. Setting random_state makes the split reproducible.

  • Line 3: Create the DecisionTreeClassifier() model.

  • Line 4: Train the model on the training set.

Now, the model is trained and ready to classify transactions. It predicts the value of the output variable, y, which indicates whether a transaction is Fraud or Not Fraud.
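The training step can also be run end to end on synthetic data. The sketch below generates made-up transactions, labels the first 40 as Fraud using an invented rule (the whole balance is spent), and trains the same classifier. The `stratify=y` argument and the classifier's `random_state` are additions that the original code does not use. They keep the class ratio equal in both sets and make the tree reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_rows, n_fraud = 200, 40  # made-up sizes

# Synthetic columns in the order: type, amount, oldbalanceOrg, newbalanceOrig.
tx_type = rng.integers(1, 6, size=n_rows).astype(float)  # codes 1..5
old_balance = rng.uniform(100.0, 10_000.0, size=n_rows)
amount = rng.uniform(0.1, 0.9, size=n_rows) * old_balance
amount[:n_fraud] = old_balance[:n_fraud]  # invented rule: fraud rows drain the account
new_balance = old_balance - amount

x = np.column_stack([tx_type, amount, old_balance, new_balance])
y = np.array(["Fraud"] * n_fraud + ["Not Fraud"] * (n_rows - n_fraud))

# stratify=y keeps the Fraud / Not Fraud ratio the same in both sets.
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.10, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42)
model.fit(xtrain, ytrain)

print(xtrain.shape, xtest.shape)
print(model.classes_)
```

Because the labels here follow a simple invented rule, any score obtained on this data says nothing about performance on real transactions.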

  1. Testing example: Here, we can test the accuracy of the model using:

print(model.score(xtest,ytest))

The ideal value for accuracy is 1.0, which corresponds to a model that classifies every transaction in the testing set correctly. The goal is to get the accuracy as close to 100 percent as possible. Next, the metrics array holds two sample inputs representing transactions, and the fraud array holds the predicted value of the isFraud metric for each input.

metrics = np.array([[[4, 9000.6,9000.0, 0.6]],[[1, 9000.6,900.6, 8100.0]]])
fraud = model.predict(metrics[0])
print("The case is = ",fraud[0])
fraud = model.predict(metrics[1])
print("The case is = ",fraud[0])

In the code:

  • Line 1: Define the metrics array with two sample transactions, one intended as a fraudulent transaction and the other as a non-fraudulent one.

  • Line 2: Use the predict function of the model to predict the value of the isFraud variable for the first transaction.

  • Line 3: Print the value of the isFraud variable for the first transaction.

  • Line 4: Use the predict function of the model to predict the value of the isFraud variable for the second transaction.

  • Line 5: Print the value of the isFraud variable for the second transaction.
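Accuracy alone can hide weak fraud detection when fraudulent rows are rare, which is an assumption about the data rather than something stated about this dataset. The sketch below uses hand-written labels (not output from the model above) to show how a classifier that never predicts Fraud can still score 0.8 accuracy while catching no fraud at all.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels: 8 legitimate transactions and 2 fraudulent ones.
y_true = ["Not Fraud"] * 8 + ["Fraud"] * 2
# A hypothetical model that labels everything as Not Fraud.
y_pred = ["Not Fraud"] * 10

acc = accuracy_score(y_true, y_pred)
# Recall for the Fraud class: the share of real fraud cases that were caught.
rec = recall_score(y_true, y_pred, pos_label="Fraud")
# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_true, y_pred, labels=["Fraud", "Not Fraud"])

print("accuracy:", acc)
print("fraud recall:", rec)
print(cm)
```

With a trained model, the same functions can be called on ytest and model.predict(xtest) to see how many fraudulent transactions are missed.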

We can add further rows to the metrics array to represent more sample transactions. Each transaction is represented in the format:

["type","amount","oldbalanceOrg", "newbalanceOrig"]

The fraud variable will have a value of either Fraud or Not Fraud, indicating the model classification of the transaction.
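Several transactions can also be classified in one call by passing a two-dimensional array with one row per transaction. The sketch below trains a tiny tree on four made-up rows so that it runs on its own. The training rows and the third sample transaction are invented for illustration, so the printed labels carry no real meaning.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up training set: type, amount, oldbalanceOrg, newbalanceOrig.
xtrain = np.array([
    [4, 9000.6, 9000.6, 0.0],
    [1, 500.0, 500.0, 0.0],
    [2, 120.0, 1000.0, 880.0],
    [3, 50.0, 200.0, 250.0],
])
ytrain = np.array(["Fraud", "Fraud", "Not Fraud", "Not Fraud"])
model = DecisionTreeClassifier(random_state=0).fit(xtrain, ytrain)

# One row per transaction; predict() returns one label per row.
metrics = np.array([
    [4, 9000.6, 9000.0, 0.6],
    [1, 9000.6, 900.6, 8100.0],
    [2, 75.0, 300.0, 225.0],  # extra made-up transaction
])
labels = model.predict(metrics)

for row, label in zip(metrics, labels):
    print(row, "->", label)
```

Passing all rows at once avoids the nested array and the repeated predict calls, and the result lines up with the input rows by position.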


Copyright ©2024 Educative, Inc. All rights reserved