E-payments have made transactions easier for users, but they have also created more opportunities for fraud. Online payment fraud can happen at any time, so credit card companies need to keep track of fraudulent transactions. A record of past fraud helps detect similar activity in the future, which protects users from unwanted transactions and from the significant losses they can cause.
We’ll use a machine learning classifier trained on a history of transactions to classify future transactions as fraudulent or not. The dataset records certain metrics for each transaction, and similarities between the metrics of past fraudulent transactions and those of a new transaction help classify it.
The dataset used to classify fraudulent transactions contains a history of transactions, with a mix of authentic and fraudulent ones. The authenticity of a transaction is judged from the relation between the available balance and the used balance. The metrics used for classification are the amount, the names of the customer starting the transaction and of the recipient, the old and new bank balances of the customer and recipient, and the type of online transaction. Each transaction also carries the metric isFraud, which marks fraudulent transactions in the training data and is the value the model learns to predict. The training data is stored in transaction_sample_logs.csv.
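To make the column layout concrete, here is a minimal sketch of what a few rows of such a log might look like. The rows are invented for illustration, only the columns that appear in the code later in this article are included, and share_of_balance is a hypothetical helper column showing one way to relate the used balance to the available balance.

```python
import pandas as pd

# Invented rows for illustration only; a real log will differ.
sample = pd.DataFrame(
    {
        "type": ["PAYMENT", "TRANSFER", "CASH_OUT"],
        "amount": [120.0, 5000.0, 5000.0],
        "oldbalanceOrg": [1000.0, 5000.0, 5000.0],
        "newbalanceOrig": [880.0, 0.0, 0.0],
        "isFraud": [0, 1, 1],
    }
)

# Hypothetical helper column: fraction of the available balance that
# the transaction used (1.0 means the account was emptied).
sample["share_of_balance"] = sample["amount"] / sample["oldbalanceOrg"]

print(sample)
print(sample["isFraud"].value_counts())
```

In this made-up sample, the two rows flagged as fraud are the ones that drain the whole balance, which is the kind of pattern a classifier can pick up from the balance columns.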
The detection process is a series of steps, from installing the dependencies and training the model to using the trained model to check the authenticity of a transaction. Here is a step-by-step process for fraud detection:
Installing dependencies: The fraud detection process requires certain dependencies. With Python 3, we use pip3 to install them. For this specific process, we require numpy, pandas, and scikit-learn. To install the dependencies, we use the following commands:
pip3 install numpy
pip3 install pandas
pip3 install scikit-learn
Importing: The next step is to import libraries in the Python code. To import the dependencies, use the following statements:
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
In the code:
Line 1: Import the pandas library to read the dataset for training the model.
Line 2: Import the train_test_split function to split the dataset into two sets: training and testing.
Line 3: Import the numpy library to use arrays in your code.
Dataset usage: For model training, we read the dataset and divide it into training and test sets. Before dividing, we must define the target value to predict, y, and the input values used for classification, x. We use multiple metrics as input to classify the transaction and predict the value of the output metric, isFraud. To perform the process, we use the following code:
data = pd.read_csv('transaction_sample_logs.csv')
data_type = {
    "CASH_OUT": 1,
    "PAYMENT": 2,
    "CASH_IN": 3,
    "TRANSFER": 4,
    "DEBIT": 5,
}
data["type"] = data["type"].map(data_type)
fraud_valid = {
    0: "Not Fraud",
    1: "Fraud",
}
data["isFraud"] = data["isFraud"].map(fraud_valid)

x = np.array(data[["type", "amount", "oldbalanceOrg", "newbalanceOrig"]])
y = np.array(data["isFraud"])
In the code:
Line 1: Read the dataset from the .csv file, transaction_sample_logs.csv.
Lines 2-9: Map the values of the type column to integer codes. CASH_OUT, PAYMENT, CASH_IN, TRANSFER, and DEBIT become 1, 2, 3, 4, and 5, respectively.
Lines 10-14: Map the values of the isFraud column to readable labels. 0 becomes Not Fraud and 1 becomes Fraud.
Line 16: Define the input array x from the type, amount, oldbalanceOrg, and newbalanceOrig columns of the dataset.
Line 17: Create the output array y from the isFraud column.
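One thing worth checking after the mapping step is that every transaction type was actually covered. pandas' Series.map returns NaN for any value missing from the dictionary, which is easy to overlook. A minimal sketch of such a check, using an invented frame in which "REFUND" is a made-up type that the dictionary does not contain:

```python
import pandas as pd

data_type = {"CASH_OUT": 1, "PAYMENT": 2, "CASH_IN": 3, "TRANSFER": 4, "DEBIT": 5}

# Invented rows; "REFUND" is a made-up type that is not in the dictionary.
data = pd.DataFrame({"type": ["PAYMENT", "REFUND", "TRANSFER"]})

data["type_code"] = data["type"].map(data_type)

# Rows whose type had no entry in data_type come back as NaN.
unmapped = data[data["type_code"].isna()]
print(unmapped["type"].tolist())
```

If the list is not empty, the dictionary can be extended or the affected rows dropped before building x.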
Model training: Divide the dataset into training and testing sets. The training set is used to train the model on transaction history, and the testing set is used to check the classifications made by the trained model. The model used is DecisionTreeClassifier(), and it is trained using the training set. A decision tree classifies a transaction through a sequence of threshold checks on the input metrics, which keeps prediction fast and the learned rules easy to inspect.
from sklearn.tree import DecisionTreeClassifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
model = DecisionTreeClassifier()
model.fit(xtrain, ytrain)
In the code:
Line 1: Import the classifier DecisionTreeClassifier from scikit-learn.
Line 2: Use the function train_test_split to divide the dataset so that 10 percent is in the testing set to test the model and the remaining 90 percent is in the training set to train the model.
Line 3: Define the DecisionTreeClassifier() model.
Line 4: Train the defined model on the training set.
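Fraudulent rows are usually a small minority, so a purely random 10 percent split can leave the test set with very few of them. One possible refinement, not used in the code above, is the stratify argument of train_test_split, which keeps the class proportions roughly equal in both sets. A minimal sketch on synthetic data (the 5 percent fraud rate and the random features are arbitrary assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 1000 rows, 50 of them labeled "Fraud" (assumed 5% rate).
rng = np.random.default_rng(0)
x = rng.random((1000, 4))
y = np.array(["Fraud"] * 50 + ["Not Fraud"] * 950)

xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.10, random_state=42, stratify=y
)

# Stratification keeps the fraud share of the test set close to 5%.
print((ytest == "Fraud").sum(), "fraud rows out of", len(ytest))
```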
Now, the model is trained and ready to classify transactions. It is used to determine the value of the output variable, y, which indicates whether a transaction is fraudulent or not.
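One practical benefit of a decision tree is that its learned rules can be printed and read. As a rough sketch, scikit-learn's export_text renders a fitted tree as nested threshold checks. The training data here is synthetic, and the labeling rule (large amounts that nearly empty the account count as fraud) is invented purely to give the tree something to learn; max_depth=3 is an arbitrary choice to keep the output short.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["type", "amount", "oldbalanceOrg", "newbalanceOrig"]

# Synthetic transactions; the labeling rule below is invented for the demo.
rng = np.random.default_rng(0)
n = 500
tx_type = rng.integers(1, 6, size=n)
old_balance = rng.uniform(100, 10_000, size=n)
amount = old_balance * rng.uniform(0.05, 1.0, size=n)
new_balance = old_balance - amount
x = np.column_stack([tx_type, amount, old_balance, new_balance])
y = np.where((amount > 5_000) & (new_balance < 500), "Fraud", "Not Fraud")

model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(x, y)

# Print the learned splits as indented text rules.
rules = export_text(model, feature_names=feature_names)
print(rules)
```

Reading the printed rules is a quick way to sanity-check which metrics the model relies on before trusting its predictions.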
Testing example: Here, we can test the accuracy of the model using:
print(model.score(xtest,ytest))
The ideal value for accuracy is 1.0, which corresponds to a model that classifies 100 percent of the test transactions correctly. The goal is to get the accuracy as close to 100 percent as possible. The metrics array holds two sample inputs representing transactions, and the fraud variable holds the predicted value of the isFraud metric.
metrics = np.array([[[4, 9000.6, 9000.0, 0.6]], [[1, 9000.6, 900.6, 8100.0]]])
fraud = model.predict(metrics[0])
print("The case is = ", fraud[0])

fraud = model.predict(metrics[1])
print("The case is = ", fraud[0])
In the code:
Line 1: Define the metrics array with two sample transactions, one pointing to a fraudulent transaction and the other to a non-fraudulent one.
Line 2: Use the predict function of the model to predict the value of the isFraud variable for the first transaction.
Line 3: Print the value of the isFraud variable for the first transaction.
Line 5: Use the predict function of the model to predict the value of the isFraud variable for the second transaction.
Line 6: Print the value of the isFraud variable for the second transaction.
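The two predict calls above could also be folded into one. predict accepts a 2D array with one row per transaction and returns one label per row, so a batch can be classified in a single call. A minimal sketch, with a tiny invented training set standing in for the real transaction log:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Tiny invented training set, only so the example runs on its own.
# Columns: type, amount, oldbalanceOrg, newbalanceOrig
xtrain = np.array([
    [4, 9000.0, 9000.0, 0.0],
    [1, 8000.0, 8000.0, 0.0],
    [2, 100.0, 5000.0, 4900.0],
    [3, 250.0, 1000.0, 1250.0],
])
ytrain = np.array(["Fraud", "Fraud", "Not Fraud", "Not Fraud"])
model = DecisionTreeClassifier(random_state=0).fit(xtrain, ytrain)

# One row per transaction; predict returns one label per row.
metrics = np.array([
    [4, 9000.6, 9000.0, 0.6],
    [1, 9000.6, 900.6, 8100.0],
])
predictions = model.predict(metrics)
for row, label in zip(metrics, predictions):
    print(row, "->", label)
```

The labels printed here depend entirely on the invented training rows, so they say nothing about how the model trained on the real dataset would classify the same inputs.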
We can add further values to the metrics array to represent more sample transactions. Each transaction is represented in the format:
["type", "amount", "oldbalanceOrg", "newbalanceOrig"]
The fraud variable will have a value of either Fraud or Not Fraud, indicating the model's classification of the transaction.
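Because fraud is usually rare, the accuracy score on its own can look high even when many fraudulent transactions are missed. As a hedged sketch of a complementary check, the snippet below applies scikit-learn's confusion_matrix and classification_report to hand-written labels. Both ytest and predictions are invented here; in practice they would come from the test split and from model.predict(xtest).

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented labels: 18 legitimate and 2 fraudulent transactions.
ytest = np.array(["Not Fraud"] * 18 + ["Fraud"] * 2)
# An imagined model that catches only one of the two fraud cases.
predictions = np.array(["Not Fraud"] * 19 + ["Fraud"])

labels = ["Fraud", "Not Fraud"]
accuracy = accuracy_score(ytest, predictions)
matrix = confusion_matrix(ytest, predictions, labels=labels)

print("accuracy:", accuracy)
# Rows are true classes, columns are predicted classes, in `labels` order.
print(matrix)
print(classification_report(ytest, predictions, labels=labels))
```

In this made-up case, 19 of 20 predictions are correct, yet half of the fraud cases are missed, which the Fraud row of the report makes visible through its recall.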