Addressing Imbalance in Credit Card Fraud Detection Data

Over time, credit cards have experienced a notable rise in popularity as the favored choice for transactions, primarily due to their widespread acceptance and convenience. Nevertheless, the widespread adoption of credit cards has brought new challenges, with fraudulent transactions emerging as a notable issue. As the number of credit cards issued by banks continues to grow, automated fraud detection methods, including the utilization of machine learning, have been introduced as a proactive measure to address this growing concern.

This project aims to illustrate how machine learning can enhance banks' ability to identify transaction anomalies. We employed the Kaggle Credit Card Fraud Detection for this purpose. However, upon classifying the data, it became evident that the "Not Fraud" category comprises only 0.2%, making it the overwhelming majority class at over 99%.

Training the model quickly using logistic regression yields a remarkably high accuracy of 0.999 (99.9%). At first glance, one might conclude that our model is exceptionally accurate. However, considering our awareness of the highly imbalanced nature of the data, it is crucial not to take this accuracy at face value.

In the case of highly imbalanced data, we often observe that the model's accuracy is high. However, a model trained on imbalanced data tends to be biased toward the majority class (Not Fraud in this case). Consequently, the model's predictions are skewed toward the majority class.

Imagine a scenario where someone is using your credit card for a transaction, but the model fails to detect an anomaly in the transaction. Similarly, envision a model that overlooks a customer about to churn or fails to identify cardiovascular disease. In such cases, the limitations of relying solely on accuracy become apparent.

Evaluating Models Trained on Imbalanced Data Using Precision, Recall and F1-Score

Accuracy is frequently used as a metric to evaluate a model. It represents the ratio of the total number of correct predictions to the total number of predictions. As mentioned earlier, the trained model achieved 99.9% accuracy. But how did the model achieve this level of accuracy? Let's plot the confusion matrix,

True Negative is 56,855
False Positive is 9
False Negative is 41
True Positive is 57

The model correctly predicts class 0 (negative) in 56,855 samples but fails to do so when the actual class is 1 (positive) in 41 samples. Additionally, the model falsely predicts class 1 when the actual class is 0 in 9 samples, while it correctly predicts class 1 in 57 samples. If accuracy is denoted as,

Then it becomes,

After looking at it, we realize that the model tends to favor majority class which is made of 99% of total sample. As mentioned earlier in the above section, the model tends to predict the majority class hence for this case, we cannot simply use accuracy as evaluation metrics. Hence we need others metrics, which in this case we will look at, precision, recall and F1-Score that give us,

Precision 0.86
Recall 0.58
f1 score is 0.69

The follow-up question now is: which metrics should be used? It's hard to answer specifically since different metrics are influenced by different businesses or problems. For instance, correctly classifying patients with cancer is much more important than not, so we can consider using recall. On the other hand, banks might find it better to predict more non-default credit card users and risk losing potential customers due to denial; in this case, precision might be suitable.

Addressing Imbalanced Data Through Class Weight Adjustment

In this particular project, we will explore methods for addressing imbalanced data through algorithmic modification (using the Scikit-learn library). For the machine learning model itself, our emphasis will be on logistic regression, with the addition of XGBoost for the purpose of comparison. The approach involves simply incorporating different class weights into the cost function of the algorithm.

Cost function is defined as measurement of how well a machine learning model is performing in terms of its ability to predict the desired output. The cost function quantifies the difference between the predicted values and the actual values, providing a numerical value that the optimization algorithm seeks to minimize during the training process.

In logistic regression, we utilize log loss as the cost function, defined by the formula,

With Class Weight defined by the formula,

By adding class weight into the function, the log loss becomes,

This modified log loss incorporates the class weights wo and w1 to adjust the contribution of each instance to the overall loss. The goal is to penalize misclassifications of the minority class (positive class) more heavily, addressing the imbalance in the dataset.

Implementing Class Weights in Python

Scikit Learn provides us a way to adjust class weight through arguments class_weight. When not stated class_weight by default for all class_weight is equal to 1. When you set class_weight to 'balanced', the model automatically it adjusts the weights inversely proportional to class frequencies. This means that the model gives more weight to the minority class, effectively addressing the imbalance.

def log_reg_balance(X_train_scaled, y_train, X_test_scaled, y_test):
    logreg_balance = LogisticRegression(class_weight='balanced',max_iter=1000)
    logreg_balance.fit(X_train_scaled,y_train)
    y_pred=logreg_balance.predict(X_test_scaled)
    print(confusion_matrix(y_test,y_pred))
    #plot confusion matrix
    plt.figure(figsize=(10,8))
    sns.heatmap(confusion_matrix(y_test,y_pred),annot=True)
    plt.xlabel('Predict')
    plt.ylabel('Actual')
    plt.show()

If you want to adjust manually (by giving each class its own weight), you can achieve it by using this formula,

In this case, we have 227,451 samples for class 0 and 394 samples for class 1. Therefore, w0 is set to 0.5, and w1 is set to 289.

Plug in the class weight of each class in class_weight, will achieve the same result as set class_weight to 'balanced'

def train_class_weight(X_train_scaled,y_train,X_test_scaled,y_test):
    class_weight={0:0.5,1:289}
    logreg_class_weight=LogisticRegression(class_weight=class_weight,solver='newton-cg',max_iter=1000)
    #train model
    logreg_class_weight.fit(X_train_scaled,y_train)
    #predict
    y_pred=logreg_class_weight.predict(X_test_scaled)
    #print confusion matrix
    print(confusion_matrix(y_test,y_pred))
    #plot confusion matrix
    plt.figure(figsize=(10,8))
    sns.heatmap(confusion_matrix(y_test,y_pred),annot=True)
    plt.xlabel('Predict')
    plt.ylabel('Actual')
    plt.show()

The updated model, after adjusting the class_weight, shows an improvement in terms of correctly predicting true positives, as observed from this confusion matrix,

As evident, the number of true positives increases from 57 samples to 90 samples. However, conversely, the count of false negatives also rises significantly, escalating from 9 samples to 1,300 samples. This suggests that the model now places more emphasis on the minority class, even though it incorrectly predicts class 1 when the reality is class 0, resulting in an increase in false positives. This implies a trade-off between sensitivity and specificity, which is a common observation when dealing with imbalanced datasets and adjusting class weights. The changes in the number of true positives, false positives, true negatives, and false negatives automatically update the precision, recall, and F1 score, respectively.

Precision 0.06
Recall 0.92
f1 score is 0.12

Another approach involves determining optimal weights through hyperparameter tuning, specifically employing grid search. This process utilizes fractions, such as 0:0.1 and 1:0.99, to represent the assigned weights for each class,

def param_log_reg(X_train_scaled, y_train, X_test_scaled, y_test):
    logreg = LogisticRegression(solver='newton-cg', max_iter=1000)
    # create class weight
    weights = {'class_weight': [{0: x, 1: 1.0 - x} for x in np.linspace(0.0, 0.99, 200)]}
    grid_search = GridSearchCV(logreg, weights, cv=5, scoring='f1', n_jobs=-1)
    # Perform the grid search
    grid_search.fit(X_train_scaled, y_train)

    # Get the best model from the search
    best_model = grid_search.best_estimator_

    # Evaluate the best model on the test set
    y_pred = best_model.predict(X_test_scaled)

    # Print confusion matrix
    print(confusion_matrix(y_test, y_pred))
    # Plot confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True)
    plt.xlabel('Predict')
    plt.ylabel('Actual')
    plt.show()

Hyperparameter tuning resulted in the optimal class weights are 0:0.095, 1:0.905, or class 1 is approximately 9.59 higher than class 0.

The results are intriguing as the number of true positives increases to 79 samples, while the false positives also increase, albeit not as significantly as when using class_weight='balanced'. The false negatives decrease to 19. The updated precision, recall, and F1 score are now as follows:

Precision: 0.79
Recall: 0.81
F1 Score: 0.80

This approach, involving hyperparameter tuning to find optimal class weights, seems to offer a more controlled and effective way to balance the trade-off between sensitivity and specificity compared to the automatic adjustment with 'balanced' class weights. The model in the second statement achieves better precision, recall, and F1 score, indicating a more balanced performance in handling imbalanced datasets.

Cost-Sensitive Learning

In credit card fraud detection, failing to classify the minority class (fraud) is more detrimental than incorrectly classifying actual observations of the majority class (non-fraud). This implies that when we fail to classify fraud, the consequences are more costly than incorrectly classifying non-fraudulent transactions.

In cost-sensitive learning, a common heuristic approach involves using the imbalance ratio, which is defined as the number of samples in the minority class divided by the number of samples in the majority class.

The other approach is to strongly penalize mistakes in fraud classification. This is achieved by assigning a higher cost for misclassifying non-fraud transactions. This encourages the model to prioritize correct predictions for the minority class.

For example we can say that the cost of misclassifying a fraudulence (Class 1) as legitimate (Class 0) is 10 times higher than the cost of misclassifying a legitimate transaction as fraudulent, then we can set the class weight as follow,

class_weight = {0: 1, 1: 10}  # Assuming class 1 (fraudulent) is 10 times more important

Here, the weight for Class 1 is set to 10, indicating that misclassifying a fraudulent transaction carries 10 times the penalty compared to misclassifying a legitimate transaction.

By using these class weights in logistic regression, the model is encouraged to pay more attention to correctly classifying fraudulent transactions, even if they are fewer in number. This is a form of cost-sensitive learning where the model's training is influenced by the specified class weights.

def cons_learn_logreg(X_train_scaled, y_train, X_test_scaled, y_test):
    class_weight={0:1,1:10}
    logreg = LogisticRegression(class_weight=class_weight, solver='newton-cg', max_iter=1000)
    logreg.fit(X_train_scaled, y_train)
    y_pred = logreg.predict(X_test_scaled)
    print(confusion_matrix(y_test, y_pred))
    # Plot confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True)
    plt.xlabel('Predict')
    plt.ylabel('Actual')
    plt.show()

Adjusting Class Weight In XGBoost

For comparison, we also include modifying class weight in XGBoost. In XGBoost, you can utilize the scale_pos_weight parameter to address imbalanced class distribution. This parameter plays a role similar to adjusting class weights in logistic regression, where the goal is to mitigate the impact of imbalanced classes on the model's training process.

In logistic regression, you can assign different weights to classes using the class_weight parameter. This allows the algorithm to give more importance to the minority class during optimization. Similarly, in XGBoost, the scale_pos_weight parameter serves a comparable purpose by adjusting the weights assigned to positive instances. The value of scale_pos_weight is typically set based on the ratio of negative to positive instances in the dataset.

Both methods aim to address the challenge of imbalanced data by influencing the learning algorithm to pay more attention to the minority class. However, the specific implementation details and parameter tuning processes may differ between logistic regression and XGBoost.

It's worth noting that while logistic regression directly incorporates class weights during optimization, XGBoost, being a tree-based ensemble method, uses boosting to adaptively assign weights to instances during each iteration. Cross-validation is often employed in both cases to fine-tune the weight parameters for optimal model performance.

def param_xgb(X_train,y_train,X_test,y_test):
    # Calculate class weights
    class_counts = y_train.value_counts()
    scale_pos_weight = class_counts[0] / class_counts[1]

    # Set up XGBoost classifier with scale_pos_weight
    clf = XGBClassifier(scale_pos_weight=scale_pos_weight)

    # Train model
    clf.fit(X_train, y_train)

    # Predict
    y_pred = clf.predict(X_test)

    print(confusion_matrix(y_test, y_pred))
    # Plot confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True)
    plt.xlabel('Predict')
    plt.ylabel('Actual')
    plt.show()

Now we can use a confusion matrix to compare with the methods discussed in the previous sections, specifically focusing on logistic regression.

And respective precision, recall and f1 score,

Precision: 0.96
Recall: 0.83
F1 Score: 0.90

Summary

Both XGBoost and logistic regression with hyperparameter tuning offer a more controlled and effective way to balance the trade-off between sensitivity and specificity compared to automatic adjustment with 'balanced' class weights. The model in the second statement achieves better precision, recall, and F1 score, indicating a more balanced performance in handling imbalanced datasets. However, the decision remains dependent on business requirements. For instance, if correctly identifying the positive class is more crucial and the cost of misclassifying the negative class when it's actually the positive class is higher, then a cost-sensitive approach by setting class_weight = balanced could be a good option as well.

Some points that we can offer as follow:

XGBoost and Logistic Regression with Hyperparameter Tuning:Offer a more controlled and effective way to balance sensitivity and specificity.Achieve better precision, recall, and F1 score, indicating a more balanced performance with imbalanced datasets.
'Balanced' Class Weights:Automatic adjustment with 'balanced' class weights may not perform as well in achieving a balance between precision, recall, and F1 score.
Decision Considerations:The decision on which approach to choose depends on business requirements.For instance, if correctly identifying the positive class is more important and misclassifying the negative class as positive is costlier, a cost-sensitive approach by setting class_weight = balanced might be a good option.

You can access the source code here: Credit Card Fraud Detection Source

And I love to share these great reading here

Fraud Detection Handbook

Cost Sensitive Learning for Imbalanced Classification

Improve Class Imbalance in Machine Learning

Addressing Imbalance in Credit Card Fraud Detection Data

Rian Rachmanto

Subsurface Domain Machine Learning Engineer || Integrating A.I/ML with oil and gas operation, bringing data driven decision making and streamlining analysis process.

Evaluating Models Trained on Imbalanced Data Using Precision, Recall and F1-Score

Addressing Imbalanced Data Through Class Weight Adjustment

Implementing Class Weights in Python

Recommended by LinkedIn

Cost-Sensitive Learning

Adjusting Class Weight In XGBoost

Summary

More articles by this author

Insights from the community

Others also viewed

AI Revolutionizes Finance: Stopping Fraud & Maximizing Returns

There’s no silver bullet for fraud prevention

The Invisible Shield

2025 Global Payments & Fraud Survey Open, Mastercard Master Class, and More!

Fraud prevention: holistic & integrated approach

How AI is Already Protecting Your Payments

FT Partners Advises NeuroID on its Sale to Experian

The biggest identity market is the one you’ve never heard of

Outsmarting Scammers: The Power of AI & ML in Banking Fraud Prevention

Results obtained with machine learning models to predict credit card fraud

Explore topics

Evaluating Models Trained on Imbalanced Data Using Precision, Recall and F1-Score

Addressing Imbalanced Data Through Class Weight Adjustment

Implementing Class Weights in Python

Recommended by LinkedIn

Cost-Sensitive Learning

Adjusting Class Weight In XGBoost

Summary

Building Machine Learning Products Using Lending Club Case

Jul 9, 2023

Insights from the community

Others also viewed

AI Revolutionizes Finance: Stopping Fraud & Maximizing Returns

There’s no silver bullet for fraud prevention

The Invisible Shield

2025 Global Payments & Fraud Survey Open, Mastercard Master Class, and More!

Fraud prevention: holistic & integrated approach

How AI is Already Protecting Your Payments

FT Partners Advises NeuroID on its Sale to Experian

The biggest identity market is the one you’ve never heard of

Outsmarting Scammers: The Power of AI & ML in Banking Fraud Prevention

Results obtained with machine learning models to predict credit card fraud

Explore topics