
Identifying Key Predictors of Loan Fraud: A Machine Learning Approach

Shama Rani & Prof. Anil Kumar Mittal

Loan fraud detection is a critical task for financial institutions seeking to minimize risk and operational costs. This study revisits a comparative analysis of feature importance across four supervised machine learning models (Decision Tree, Random Forest, Logistic Regression, and XGBoost) trained on a cleaned dataset of 32,580 loan applications. Key preprocessing steps included encoding categorical variables, handling missing values, and ensuring a consistent feature set across models. Each model was trained to predict Current Loan Status as the dependent variable, and feature importance was systematically extracted and compared. The results indicate that ‘Historical Default’, ‘Loan Grade’, and ‘Credit History Length’ consistently emerged as top predictors across multiple models. Logistic Regression demonstrated the highest sensitivity to all input features, while the tree-based models (Decision Tree, Random Forest, and XGBoost) highlighted specific high-impact variables, particularly historical default and loan grade. These findings clarify the critical determinants of loan fraud and support the use of supervised learning models for fraud detection, enabling more effective model selection and targeted fraud prevention strategies in financial institutions.
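The cross-model comparison described in the abstract can be illustrated with a minimal sketch. It assumes each model's feature importances are already available as a dictionary (impurity-based importances for the tree models, signed coefficients for Logistic Regression); the function names, the feature name loan_amount, and every number below are hypothetical placeholders for illustration, not values or methods taken from the paper.

```python
from collections import Counter


def normalize(scores):
    """Scale absolute importance scores so that they sum to 1."""
    total = sum(abs(v) for v in scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: abs(v) / total for name, v in scores.items()}


def top_k(scores, k):
    """Return the k feature names with the largest normalized importance."""
    norm = normalize(scores)
    return sorted(norm, key=norm.get, reverse=True)[:k]


def consensus_features(importances_by_model, k=3, min_models=3):
    """Features that rank in the top k for at least min_models models."""
    counts = Counter()
    for scores in importances_by_model.values():
        counts.update(top_k(scores, k))
    return sorted(name for name, n in counts.items() if n >= min_models)


# Placeholder scores only (NOT results from the study).
# Logistic Regression entries stand in for signed coefficients,
# which is why absolute values are used when normalizing.
importances_by_model = {
    "decision_tree": {
        "historical_default": 0.50, "loan_grade": 0.30,
        "credit_history_length": 0.15, "loan_amount": 0.05,
    },
    "random_forest": {
        "historical_default": 0.40, "loan_grade": 0.35,
        "credit_history_length": 0.15, "loan_amount": 0.10,
    },
    "logistic_regression": {
        "historical_default": -1.2, "loan_grade": 0.9,
        "credit_history_length": 0.2, "loan_amount": 0.6,
    },
    "xgboost": {
        "historical_default": 0.45, "loan_grade": 0.40,
        "credit_history_length": 0.10, "loan_amount": 0.05,
    },
}

consensus = consensus_features(importances_by_model, k=3, min_models=3)
print(consensus)
```

Normalizing before ranking matters because coefficients and impurity-based importances are on different scales; comparing ranks rather than raw magnitudes is one simple way to put the four models side by side.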

Rani, S., & Mittal, A. (2025). Identifying Key Predictors of Loan Fraud: A Machine Learning Approach. International Journal of Global Research Innovations & Technology, 03(04), 31–35. https://doi.org/10.62823/ijgrit/03.04.8120

DOI:

Article DOI: 10.62823/IJGRIT/03.04.8120

DOI URL: https://doi.org/10.62823/IJGRIT/03.04.8120

