The best classifier - A real Machine Learning project

Erickson Figueroa

Data & Database Analyst | SQL Server Specialist | Data Science | ML | AI | Python | Power BI | Tableau | Azure

Published Dec 9, 2020

Hi there, this is my final project for the course "Machine Learning with Python", part of the "IBM Data Science Professional Certificate". It was the last course in the program, but I still need to complete the "Data Science Capstone" to get the professional certificate. In this article I'll try to explain what this project is about.

The project is a banking problem: predict whether a customer is going to pay off their loan or not, based on a historical dataset of 346 customers whose loans were either paid off or defaulted (that is, the loan fell into collection status). Let me show you an example of this dataset:

This is a classification problem. In machine learning, a classification problem is one where you need to predict the class of given data points. Classes are sometimes called target labels or categories. In our case, the question is: will the loan be paid off or not?

Classification belongs to the category of supervised learning, where the targets are provided along with the input data.

I used four algorithms from this category:

K-Nearest Neighbors (KNN)
Decision Tree
Support Vector Machine (SVM)
Logistic Regression
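
As a rough illustration of how these four algorithms can be trained side by side, here is a minimal sketch using scikit-learn. It is not the project's actual notebook: the data is a synthetic stand-in generated with `make_classification` (346 rows, to mirror the dataset size), and the hyperparameters (`n_neighbors=7`, `max_depth=4`, `C=0.01`) are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data: 346 rows, binary target (assumption).
X, y = make_classification(n_samples=346, n_features=8, random_state=4)

# Hold out 20% of the rows to check each model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y
)

# Scale features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Hyperparameters below are illustrative guesses, not tuned values.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=4),
    "SVM": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(C=0.01, solver="liblinear"),
}

# Fit every model on the same split and keep its test predictions.
predictions = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    predictions[name] = model.predict(X_test_s)
    print(name, "test accuracy:", round(model.score(X_test_s, y_test), 3))
```

Using the same train/test split for all four models keeps the comparison fair, since every algorithm is judged on exactly the same unseen rows.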

Why four algorithms and not only one?

When you try to solve any machine learning problem, you need to test different algorithms from the same category, because you don't know in advance which of them fits the problem and the dataset best. To compare them, you need evaluation metrics such as:

Jaccard Index
F1-Score
Log Loss
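
To make these three metrics more concrete, here is a small sketch that computes them by hand for a binary problem, using only the Python standard library. The function names, the tiny label lists, and the probabilities are made up for illustration; in practice a library such as scikit-learn provides equivalent functions.

```python
import math


def _counts(y_true, y_pred, positive=1):
    """Return true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def jaccard_index(y_true, y_pred, positive=1):
    """Size of the intersection over size of the union of the positive sets."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true 0/1 labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


# Invented example: 1 = loan paid off, 0 = loan in collection.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.6, 0.7]  # predicted probability of class 1

print("Jaccard:", jaccard_index(y_true, y_pred))  # 0.6
print("F1-score:", f1_score(y_true, y_pred))      # 0.75
print("Log loss:", round(log_loss(y_true, y_prob), 3))
```

For the Jaccard index and the F1-score, higher is better. Log loss works the other way around: it uses the predicted probabilities rather than the predicted labels, and lower values mean better predictions.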

Note: These metrics apply to classification algorithms. There are many metrics for each problem category in machine learning. The goal is to measure how accurate each algorithm is and choose the best one, the most accurate, in our case at predicting the label "Loan_status": PAIDOFF or COLLECTION.

Let me show you another sample. This time it is the "Final Report" with all the metrics for each of the algorithms I listed above.

When you have this report, it is easier to choose the algorithm for your problem. For now, don't worry about the values, code, and libraries used in this project, because the important thing is to understand the whole process from beginning to end.

Remember the entire process to solve a machine learning problem:

Identify the data
Prepare the data
Select the algorithm
Train the algorithm
Evaluate the algorithm
Deploy it
Predict
Evaluate predictions
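
The steps above can be sketched end to end in a few lines of plain Python. This is only a toy walk-through under stated assumptions: the data is two invented clusters of points (not the real loan data), the algorithm is a hand-written k-nearest-neighbours vote, and "deployment" is reduced to wrapping the model in a hypothetical `predict` function.

```python
import math
import random
from collections import Counter

random.seed(0)

# 1. Identify the data: two invented clusters of 2-feature points.
data = [([random.gauss(0, 1), random.gauss(0, 1)], "PAIDOFF") for _ in range(60)]
data += [([random.gauss(3, 1), random.gauss(3, 1)], "COLLECTION") for _ in range(60)]

# 2. Prepare the data: shuffle, then hold out 25% for evaluation.
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]


# 3. Select the algorithm: a plain k-nearest-neighbours majority vote.
def knn_predict(train_rows, point, k=5):
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# 4. Train the algorithm: KNN simply memorises the training rows.
model = train

# 5. Evaluate: share of held-out rows that are classified correctly.
accuracy = sum(knn_predict(model, x) == y for x, y in test) / len(test)
print("Held-out accuracy:", round(accuracy, 3))


# 6. Deploy it: here, just expose the model behind one function.
def predict(point):
    return knn_predict(model, point)


# 7. Predict: classify a new, unseen point.
new_prediction = predict([3.0, 3.0])
print("Prediction for new point:", new_prediction)

# 8. Evaluate predictions: once the real outcome is known, compare the two.
actual_outcome = "COLLECTION"  # assumed outcome, for illustration only
print("Prediction was correct:", new_prediction == actual_outcome)
```

In a real project each step is much larger, but the order stays the same: the model is only evaluated on data it did not see during training, and its predictions keep being checked after deployment.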

So, here is the public link to my final project:

I hope this helps you!

Best regards,
