The best classifier - A real Machine Learning project

Hi there, this is my final project for the course "Machine Learning with Python" from the "IBM Data Science Professional Certificate". It was the last course in the program, but I still need to complete the "Data Science Capstone" to get the professional certificate. Let me explain what this project is about.

The project addresses a banking problem: predicting whether a customer will pay off a loan or not, based on a historical dataset of 346 customers whose loans have already been paid off or have defaulted (that is, the loan has gone into collection status). Let me show you an example of this dataset:

[Image: sample rows from the loan dataset]

This is a classification problem. In machine learning, a classification problem is one where you need to predict the class of given data points. Classes are sometimes called target labels or categories. In our case, the question is: will the loan be paid off or not?

Classification belongs to the category of supervised learning, where the targets are provided along with the input data.

I used four algorithms of this category:

  • K-Nearest Neighbors (KNN)
  • Decision Tree
  • Support Vector Machine (SVM)
  • Logistic Regression
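
To make the idea of a classifier concrete, here is a minimal sketch of the first algorithm in the list, KNN, written in plain Python. The function name `knn_predict`, the two features, and the six training rows are hypothetical toy values chosen for illustration; they are not taken from the project dataset, and a real project would normally rely on a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count how often each label appears
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two already-scaled features per customer (NOT the real dataset)
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9), (0.9, 0.8), (0.85, 0.7)]
train_labels = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION", "COLLECTION"]

print(knn_predict(train_points, train_labels, (0.2, 0.2)))  # PAIDOFF
print(knn_predict(train_points, train_labels, (0.9, 0.9)))  # COLLECTION
```

The new customer simply receives the label that is most common among the closest known customers, which is why the features should be on comparable scales before distances are computed.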

Why four algorithms and not only one?

When you try to solve any machine learning problem, you need to test different algorithms from the same category, because you don't know in advance which of them fits the problem and the dataset best. To compare them, you need evaluation metrics such as:

  • Jaccard Index
  • F1-Score
  • Log Loss

Note: These metrics apply to classification algorithms. Each problem category in machine learning has its own metrics. The goal is to measure how accurate each algorithm is and choose the best one, in our case for predicting the label "Loan_status": PAIDOFF or COLLECTION.
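
As a rough illustration of what these three metrics compute, here is a small sketch in plain Python for the two-class case. The helper names, the five example labels, and the predicted probabilities are made up for illustration and are not results from the project. The F1 here is the score for a single positive class, which is a simplification compared with averaged variants offered by libraries.

```python
import math


def _counts(y_true, y_pred, positive):
    """Return true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def jaccard_index(y_true, y_pred, positive):
    """Size of the intersection over size of the union of true and predicted positives."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    union = tp + fp + fn
    return tp / union if union else 0.0


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn = _counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def log_loss(y_true, prob_positive, positive, eps=1e-15):
    """Average negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, prob_positive):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p) if t == positive else -math.log(1 - p)
    return total / len(y_true)


# Hypothetical example values (NOT from the project)
y_true = ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "COLLECTION"]
y_pred = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION"]
probs = [0.9, 0.8, 0.6, 0.7, 0.2]  # assumed predicted probability of PAIDOFF

print(jaccard_index(y_true, y_pred, "PAIDOFF"))  # 0.75
print(f1_score(y_true, y_pred, "PAIDOFF"))       # 6/7, about 0.857
print(log_loss(y_true, probs, "PAIDOFF"))
```

For the Jaccard index and the F1-score, higher is better, with 1.0 as the best value. For log loss, lower is better, because it penalizes confident predictions that turn out to be wrong.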

Let me show you another sample. This time it is the "Final Report" with all the metrics for each of the algorithms I listed above.

[Image: final report of the metrics for each algorithm]

Once you have this report, it is easier to choose the algorithm for your problem. For now, don't worry about the values, code, or libraries used in this project; the important thing is to understand the whole process from beginning to end.

Remember the entire process to solve a machine learning problem:

  • Identify the data
  • Prepare the data
  • Select the algorithm
  • Train the algorithm
  • Evaluate it
  • Deploy it
  • Predict
  • Evaluate predictions
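
The steps above can be sketched end to end with scikit-learn. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with `make_classification` (346 rows to mirror the dataset size, but not the real loan data), the encoding 1 = PAIDOFF and 0 = COLLECTION is assumed, and the hyperparameters are arbitrary choices rather than the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, jaccard_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Identify the data: ASSUMPTION, synthetic stand-in, not the real loan dataset
X, y = make_classification(
    n_samples=346, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Prepare the data: hold out a test set, then scale with training statistics only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Select the algorithms (hyperparameters are illustrative, not tuned)
models = {
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Logistic Regression": LogisticRegression(C=0.01, solver="liblinear"),
}

# Train each algorithm and evaluate it on the held-out data
report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    report[name] = {
        "jaccard": jaccard_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="weighted"),
    }
    # Log loss needs probabilities, so it is computed only for logistic regression here
    if name == "Logistic Regression":
        report[name]["log_loss"] = log_loss(y_test, model.predict_proba(X_test))

for name, scores in report.items():
    print(name, scores)

# Pick the model with the best F1-score and use it to predict one sample
best_name = max(report, key=lambda n: report[n]["f1"])
print("Best by F1:", best_name, "->", models[best_name].predict(X_test[:1]))
```

Fitting the scaler on the training split only keeps information from the test set out of training, so the evaluation stays honest.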

So, here is the public link to my final project:

I hope this helps you!

Best regards,




More articles by Erickson Figueroa
