A Novel Approach to Heart Failure Prediction and Classification through Advanced Deep Learning Model
1. Introduction
Cardiovascular diseases (CVDs), such as heart attacks and heart failure, are among the leading causes of death globally. The World Health Organization (WHO) estimates that over 17.9 million people die from cardiovascular-related diseases every year, representing 32% of all deaths worldwide [2]. This constitutes a substantial portion of global mortality, underscoring the importance of fighting and preventing heart disease. These conditions encompass irregularities in blood flow, disorders of the blood vessels, and heart function disorders. Data Science and Machine Learning have been applied extensively in the medical sector, aiding doctors in predicting outcomes and making crucial decisions using large datasets. This not only preserves lives but also yields improved machine-learning models through the research that stems from these predictions.
A number of machine learning algorithms can be used to predict heart disease from a given dataset, including K-means clustering, Random Forest, logistic regression, SVM, and ensemble and neural network algorithms.
2. Summary
In the manuscript titled “A Novel Approach to Heart Failure Prediction and Classification through Advanced Deep Learning Model”, the objective of this scientific project is to utilize various machine learning techniques and algorithms to achieve better results when selecting the best features.
The successful implementation of this project will hopefully aid in detecting heart disease symptoms and categorizing it, leading to earlier diagnosis for patients and further scientific advancement.
3. Research Questions
· Can standard classification algorithms, such as logistic regression, Naïve Bayes, support vector machines, and decision trees, be used to increase accuracy?
· Can deep learning algorithms consistently produce better results?
4. Literature Reviews
In her paper published on 11 November 2020 [3], Pooja Anbuselvan employed a range of supervised classification algorithms to address the challenge of achieving improved results and accuracy. The algorithms tested included Logistic Regression, Naïve Bayes, Support Vector Machine, K-Nearest Neighbor, Decision Tree, Random Forest, and XGBoost. Reported accuracies ranged from 57.83% to 86.89%. I have some reservations about this paper, as some of these algorithms could be omitted in favor of more complementary choices. For instance, the Decision Tree could be replaced with the Random Forest algorithm, as Random Forest is essentially a collection of Decision Trees working in tandem. Apart from the XGBoost algorithm used for the optimization process, the author did not explore other optimization alternatives. The test and training data were divided in an 80/20 ratio.
As for the scientific paper obtained from HAL Science, written by Anna Karen Gárate-Escamila, Amir Hajjam El Hassani, and Emmanuel Andres [4], which was submitted and published on 22 August 2022, the authors constructively used different feature selection processes to select relevant features for the study of heart diseases. They successfully showcased the use of Chi-square and PCA techniques to reduce the number of features, and objectively arrived at a suitable dataset that could be validated by six machine learning algorithms: logistic regression, decision tree, gradient-boosted tree, multilayer perceptron, Naïve Bayes, and random forest. The authors achieved an impressive average accuracy between 98.7% and 99.4%, considering they reduced the total number of features from 74 to 17 and 13, respectively. The evaluation metric used to assess the performance of these six machine learning classifiers was the confusion matrix. The only drawback of this paper is that the authors did not offer a broader view of their cleaning and wrangling techniques, apart from reducing the number of features in the dataset. Additionally, their literature review and related work were not comprehensive enough to draw a conclusive assessment of the overall subject at hand. Other related literature was studied and reviewed [5] [6] [7] [8].
5. Dataset Description
The dataset is multi-dimensional, consisting of 76 attributes, including but not limited to age, sex, chest pain readings (abbreviated “cp”), resting blood pressure (abbreviated “trestbps”), etc. However, based on previous experiments by other scientists who utilized this dataset, only 14 features were selected for this project. Therefore, I will use the same features to determine whether better results can be achieved.
The following is a summary of the dataset description for the 14 features that will be used, presented in Table 1 below.
Table 1. List of features & definitions—This list contains all of the features in the dataset and provides insight into the behavior of patients that either resulted in a heart attack or not.
The dataset has a shape of (1025, 14), with 14 columns representing the total number of features and 1025 rows representing the results and outcomes of 1025 patients. The data is in numerical form, facilitating its cleaning and manipulation in Python. Cleaning the dataset can be a daunting task, but in this paper, we will thoroughly examine it as part of the scientific study. The following is a screenshot taken from a Jupyter Notebook showing the dataset being displayed using a pandas DataFrame, as illustrated in Figure 1.
Figure 1. List of features on pandas data frame—As shown above, the heart disease dataset presents all the features for 1025 different cases.
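For reproducibility, here is a minimal sketch of loading and inspecting the dataset with pandas. The file name heart.csv is an assumption; use the name of your downloaded Kaggle copy.

```python
import pandas as pd

# Load the heart disease dataset (file name assumed from the Kaggle download)
df = pd.read_csv("heart.csv")

print(df.shape)    # expected: (1025, 14)
print(df.head())   # first patient records, as displayed in Figure 1
```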
6. Ethical Considerations
When it comes to the implementation of regulations and ethical reviews in AI-driven applications within the healthcare industry, few topics are more important to discuss. It is crucial to ensure that the data and the functionality of data-science-based applications operate within specific parameters of data protection and confidentiality, free from any breaches [9]. In my opinion, the following ethical considerations deserve the utmost attention:
· Data Protection Rules: The healthcare industry, being responsible for handling and managing data, must prioritize safeguarding it to prevent any breaches or hacking attempts.
· Data Quality: The data used in the application should be of the highest quality, ensuring it is easily accessible and usable for all relevant parties.
· Complete Transparency: Researchers, scientists, and scholars should have access to the data, enabling them to contribute to scientific advancements that can benefit society as a whole [10].
7. Methodology & Model Evaluation
The methodology that I will follow to conduct a thorough analysis and draw a conclusion on the best model to use for the Heart Disease application is presented in the following cycle of steps:
a) Data collection
b) Data cleaning and pre-processing
c) Feature selection
d) Exploratory data analysis (EDA)
e) Model selection and training
f) Model evaluation
1) Data collection
As mentioned earlier, the data was obtained from the world-renowned website “Kaggle” [11].
The dataset was originally created by the following contributors:
· Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D. [12].
· University Hospital, Zurich, Switzerland: William Steinbrunn, M.D. [12].
· University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D. [12].
· V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D. [12].
Furthermore, the dataset contains 76 attributes drawn from experiments on heart disease and failure. However, only 14 attributes were selected, following the approach of the Cleveland medical database researchers, as these are the attributes relevant for determining and predicting the presence of heart failure in a patient. Therefore, it can be concluded that the dataset represents a binary classification problem. The dataset is considered well structured.
2) Data cleaning and pre-processing
All missing values were handled using standard Python techniques, and rows with missing data were subsequently removed. As a result of our cleaning and processing efforts, the dataset includes 14 features.
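A minimal sketch of this cleaning step, assuming the DataFrame df loaded earlier (the paper does not list the exact calls used):

```python
# Inspect and remove missing values (assumed to mirror the cleaning described above)
print(df.isnull().sum())   # count missing values per feature
df = df.dropna()           # drop any rows with missing data
print(df.shape)            # confirm the cleaned shape
```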
3) Feature selection
The cleaning process mentioned in step 2) reduced the number of relevant features from 76 to 14, which are the most important features for predicting and evaluating heart failure or heart attacks.
4) Exploratory data analysis (EDA)
As we have established, the cleaned dataset contains over 1000 cases of heart readings, with the target variable output determining whether a particular patient suffered from heart failure or not. Upon examining and conducting various data analyses on the dataset, we arrived at the following findings.
From Figure 2 we can establish that, in absolute numbers, more males suffer from heart-related diseases such as heart attacks and heart failure than females. However, proportionally, the males in this dataset are less prone to heart failure than the females. The graph illustrates that the number of males without heart disease exceeds 400, whereas the count for males with heart disease is around 300.
On the other hand, the data for females indicates that fewer than 100 females are without heart-related diseases, while over 200 have some form of heart failure or disease. It is important to note that the total numbers of males and females presented in this paper are 713 and 312, respectively.
Figure 2. Graph of gender vs target values—Total number of gender patients vs target values.
Figure 3. Graph of target values between male and female patients—Target values between heart and non-heart disease patients.
As shown in Figure 3, we can observe that the data illustrates a slightly higher number of individuals with heart disease compared to those without heart failure or disease.
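The count plots in Figures 2 and 3 can be reproduced with a short sketch like the following. Seaborn is an assumed choice, and the column encoding (sex: 1 = male, 0 = female) follows the standard UCI convention rather than anything stated in the paper.

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Figure 2: heart-disease outcomes per gender
# (sex: 1 = male, 0 = female; target: 1 = disease, 0 = no disease -- assumed encoding)
sns.countplot(data=df, x="sex", hue="target")
plt.show()

# Figure 3: overall counts of patients with and without heart disease
sns.countplot(data=df, x="target")
plt.show()
```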
Figure 4. Chest pain vs gender—A graph showing different types of chest pain between males and females.
Chest pain types consist of typical angina, atypical angina, non-anginal pain, and asymptomatic. The term “angina” refers to cardiac chest pain, where “Typical (classic) angina chest pain consists of (1) Substernal chest pain or discomfort that is (2) Provoked by exertion or emotional stress and (3) relieved by rest or nitroglycerine (or both). Atypical (probable) angina chest pain applies when 2 out of 3 criteria of classic angina are present” [13]. Non-anginal chest pain, by contrast, is simply non-cardiac chest pain, which consists of a feeling of heaviness and shortness of breath, often described as a painful squeezing sensation in the chest. As for asymptomatic chest pain, it is defined as silent, i.e., an absence of chest pain, known as SMI (silent (asymptomatic) myocardial ischemia). Simply put, chest pain refers to pain that occurs in the heart muscles, leading to discomfort in breathing and pain in the chest. This discomfort can be accompanied by pain in other areas of the body, such as the shoulders, neck, arms, and jaw.
Taking Figure 4 and Figure 2 into consideration, in conjunction with Figure 5 (chest pain types vs. target values), we can also establish that females experience heart failure more frequently than males.
Cholesterol is simply a fatty substance that travels through the bloodstream. There are two ways in which cholesterol can travel: via high-density lipoproteins (HDL) and low-density lipoproteins (LDL). The difference between the two lies in the fact that HDL is considered good cholesterol because it picks up fatty substances and transports them to the liver for disposal. On the other hand, LDL is considered bad cholesterol because an excess amount of it in the bloodstream can lead to clogged arteries, ultimately resulting in heart failures and heart attacks [14]. As shown in Figure 6, we can observe that as the cholesterol level in the bloodstream increases, the heart rate may also rise even with minimal exercise. This indicates a higher susceptibility to heart failure and heart attacks.
Figure 5. Chest pain types vs target values—A graph showing different types of chest pain vs target values.
Figure 6. Cholesterol vs heart rate—Here the graph shows patients with different cholesterol levels and their relation to heart rates.
5) Model selection and training
The features and variables in the dataset are labelled with a known set of target values, indicating that we are dealing with a classification problem. Since the target values are known and discrete, supervised learning algorithms such as logistic regression, decision trees, and k-nearest neighbors are well suited to building a supervised classifier that predicts outcomes for unseen patient data.
A data split of 30/70 was implemented, with 30% of the data allocated to the test dataset and 70% assigned to the training dataset.
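A sketch of this split using scikit-learn; the random_state and stratification below are assumed choices, not stated in the paper:

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns="target")   # the 13 predictor features
y = df["target"]                # binary outcome: heart disease or not

# 70% training / 30% testing, as described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)
```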
To efficiently test over 25 different machine learning classifiers, a practical tool was employed: the Lazy Predict project [15]. It is distributed as a Python library that can be installed in your working environment, such as Jupyter Notebook, VS Code, or any similar interactive development platform; a usage sketch follows below.
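A minimal usage sketch of the Lazy Predict library [15] (installable with pip install lazypredict; the option values below are assumptions):

```python
from lazypredict.Supervised import LazyClassifier

# Fit and score dozens of standard classifiers in a single call
clf = LazyClassifier(verbose=0, ignore_warnings=True)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)   # a leaderboard of accuracy, F1 score, and training time per model
```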
6) Model Evaluation
From these, I selected the results of six machine learning algorithms for prediction and testing purposes. I also utilized the confusion matrix for model evaluation, as it is the best fit for our classification challenge. Table 2 below lists the accuracy outcomes obtained from the chosen algorithms.
Table 2. List of machine learning algorithm results—Here is a list of the results obtained from various machine learning models for our dataset.
As observed in Table 2, these traditional machine learning algorithms demonstrated several remarkable accuracies, averaging 87.1%. If we exclude the Decision Tree algorithm, the average accuracy drops to 84.8%. The Decision Tree was excluded because its 97% accuracy appears to be an overfitting scenario for the provided dataset.
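As an illustration of how one of the six chosen classifiers can be evaluated, here is a hedged sketch for the SVM; the paper does not state hyperparameters, so scikit-learn defaults are assumed:

```python
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

svm = SVC()                      # default RBF kernel (assumed)
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

print(accuracy_score(y_test, y_pred))    # the paper reports roughly 89% for SVM
print(confusion_matrix(y_test, y_pred))  # the evaluation metric used throughout
```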
Deep Learning/Neural Network
I have also opted to utilize a comprehensive algorithm in the realm of Deep Learning or Neural Network algorithms for the following reasons:
Deep learning has gained tremendous popularity in the field of scientific computing, and industries heavily depend on its algorithms to tackle complex problems. Deep learning algorithms employ various types of neural networks to address specific tasks [16].
Deep learning algorithms encompass a wide range of types, with each type being suitable for specific datasets and problem statements. The list below highlights some of the types of deep learning algorithms:
· Generative Adversarial Networks (GANs)
· Recurrent Neural Networks (RNNs)
· Deep Belief Networks (DBNs)
· Multilayer Perceptrons (MLPs)
· Long Short Term Memory Networks (LSTMs)
· Autoencoders
· Convolutional Neural Networks (CNNs)
While many of these deep learning algorithms can yield a satisfactory outcome, in our case the most suitable deep learning algorithm for our classification problem is the Multilayer Perceptron (MLP).
Multilayer Perceptrons (MLPs)
MLPs are designed to address machine learning problems, especially those related to image recognition models. When you think of a perceptron, I want you to envision a brain neuron. The purpose of each neuron is to analyze the provided data and transmit information to another neuron.
In the field of deep learning, a perceptron or neuron analyzes this data using an activation function and then forwards the output to the next neuron.
Figure 7. Neural network node structure—A simple architecture of a neural network with a single node.
As depicted in Figure 7 above, the data is fed into the input layer of the network. Each neuron is initialized with its own weights and a bias term. After computing its output using an activation function, each layer passes that output to the neurons in the subsequent layer.
Once the final output is determined, the model can train itself by initiating a backpropagation algorithm.
This algorithm involves reassessing the weights by traversing back through the network to determine the extent of error responsibility for each node. The model is then retrained to provide improved results and outcomes [17].
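In its standard form (a textbook formulation, not spelled out in the paper), the backpropagation weight update for each connection can be written as

$$ w_{ij} \leftarrow w_{ij} - \eta \, \frac{\partial E}{\partial w_{ij}} $$

where $w_{ij}$ is the weight between neurons $i$ and $j$, $E$ is the error at the output, and $\eta$ is the learning rate.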
Define MLP-Neural Network Architecture
1) Network type: Multilayer Perceptron (MLP).
2) Number of hidden layers: 3.
3) Total layers: 5 (three hidden layers + input layer + output layer).
4) Input layer: between 14 and 20 nodes.
5) Hidden layer 2: between 10 and 15 nodes.
6) Output layer: between 8 and 12 nodes.
7) Activation functions: ReLU, Tanh, and Logistic, each tested independently (see the sketch below).
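The following sketch instantiates an MLP matching this architecture using scikit-learn's MLPClassifier. The exact layer widths (16, 12, 8) are assumptions drawn from the ranges listed above:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Three hidden layers, widths chosen from the ranges above (assumed values)
mlp = MLPClassifier(
    hidden_layer_sizes=(16, 12, 8),
    activation="relu",        # also run with "tanh" and "logistic"
    max_iter=500,             # iteration counts varied across experiments
    random_state=42,
)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)

print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```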
Model Evaluation—Deep Learning
An MLP classifier was established with the above parameters. Different iteration counts were used to test not just one activation function but ReLU, Tanh, Logistic, and others. Table 3 below summarizes the results of running numerous training and testing rounds on our MLP classifier.
Table 3. MLP algorithm results—Here is a summary of the results obtained from the neural networks (MLP) algorithm classifier.
The developed MLP classifier yielded a satisfactory set of results. Various activation functions, including ReLU, Tanh, and Logistic, were utilized during testing, as illustrated in Table 3. Multiple hidden layers were also incorporated to ensure random distribution of initial weights and their subsequent updates through the backpropagation technique. The average accuracy rates for ReLU, Tanh, and Logistic were approximately 85.49%, 81.06%, and 86.24%, respectively, resulting in an overall accuracy of 84.26%, which corresponds to an error rate of about 15.7%.
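The loss curves shown in Figures 8-10 can be reproduced from the fitted classifier; scikit-learn's MLPClassifier exposes the per-iteration training loss via its loss_curve_ attribute. A sketch, assuming the mlp model fitted above:

```python
import matplotlib.pyplot as plt

# Plot training loss per iteration for the fitted MLP
plt.plot(mlp.loss_curve_)
plt.xlabel("Iteration")
plt.ylabel("Training loss")
plt.title("MLP loss curve")
plt.show()
```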
8. Discussion of Results (Model Analysis and Reporting)
All the machine learning algorithms and techniques we have introduced provided highly accurate evaluation results for our heart failure dataset. Among the conventional machine learning algorithms, our models can predict outputs from unseen data with an average accuracy exceeding 86%. Notably, the Support Vector Machine (SVM) emerges as the top-performing machine learning model, with the exception of the decision tree algorithm. SVM produced a predictive model with 89% accuracy and an error rate of 11%.
Consequently, we have excluded the decision tree algorithm, which achieved 99% accuracy, from our considerations due to its tendency to overfit. An overfit model is not suitable for real-world scenarios and is prone to making inaccurate predictions on unseen data. Nevertheless, there are potential solutions to tackle the overfitting problem in decision tree algorithms. These include training the model on even larger datasets or implementing regularization techniques to penalize excessive parameter values. These solutions are designed to decrease the overall complexity of the model.
On the other hand, the usage of the MLP algorithm has produced similar results to the traditional machine learning algorithms. As shown in Figure 8, the average accuracy percentages for the ReLU, Tanh, and Logistic activation functions are 85%, 81%, and 86%, respectively, with an overall average of 84% and a loss value of 16%.
Figure 8. Loss curve graph using logistic or sigmoid activation function—The graph depicts the loss curve of the MLP model with the Logistic (Sigmoid) activation function.
In my opinion, employing the MLP classifier in our classification scenario is the superior approach, for the following reasons. First, the MLP algorithm continuously learns in real-life environments by adjusting its weights during each iteration, enhancing the overall prediction model and generating accurate outputs. Additionally, as the dataset grows in size, the model can be retrained on new data, further improving its performance.
The second reason why the MLP algorithm is the most suitable choice in this case is its handling of unseen data. Based on the loss curve results obtained from all the classifiers used in this scientific paper, the loss curve of MLPs shows signs of progressive improvement during each iteration phase, particularly when the MLP classifier uses logistic-based activation functions.
This theory has been tested using three different activation functions for the MLP classifier.
Let’s begin with the MLP classifier using the ReLU activation function. As illustrated in Figure 9, we can observe that the model quickly adapted to the given dataset and successfully dealt with test data, producing accurate predictions within the first 10 iterations.
Figure 9. Loss Curve graph using ReLU Activation function—The graph depicts the loss curve of the MLP model with the ReLU activation function.
The utilization of the ReLU activation function significantly improved the overall performance. However, after the 10th iteration, the model’s training progress became more static with no visible improvement.
This limitation of the model can be attributed to the use of ReLU, as it assigns a zero value to inactive elements on the negative axis. Consequently, it failed to recognize data in that region and was unable to effectively train our data to gradually recognize unseen or new data.
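This follows directly from the definition of ReLU, a standard result rather than a finding of this paper:

$$ \mathrm{ReLU}(x) = \max(0, x), \qquad \frac{d}{dx}\,\mathrm{ReLU}(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases} $$

so any neuron whose weighted input stays negative outputs zero, receives zero gradient during backpropagation, and effectively stops learning (the so-called “dying ReLU” problem).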
Moving on to the next model, which implements the Tanh activation function: as depicted in Figure 10, we can observe that this model performs better than the previous one. The model demonstrates gradual improvement with each iteration, indicating that it is learning from past data and utilizing that knowledge to predict future data points.
Figure 10. Loss Curve graph using Tanh Activation function—The graph depicts the loss curve of the MLP model with the Tanh activation function.
Our last test used the Logistic (Sigmoid) activation function, shown in Figure 8 above. We can see a smooth descent with every iteration, from iteration 1 to 20, ending with a loss of 55%. The model's accuracy was tested at every iteration, and each iteration can be considered a progressive one: the more the model is trained and tested against the test dataset, the better it recognizes the given data and performs. So far, the use of the Sigmoid activation function has produced a better and more stable model.
An MLP classifier with the Logistic activation function should perform best in real life; moreover, the computational cost of this method will not only produce optimal results but also leave room for improvement.
We can also see that, with an MLP classifier, the model's predictions improve gradually, and the model learns the given dataset more efficiently than other machine learning algorithms such as Naïve Bayes or decision trees, which do not perform as well on real-life datasets.
9. Conclusion and Future Work
In this paper, we propose two different comparative machine learning approaches to predict heart disease cases in unseen data. These approaches encompass traditional machine learning concepts and a neural network algorithm.
The results from both approaches are promising, particularly for the neural network or deep learning based algorithms.
Previous research literature suggests that classification algorithms like decision trees are not ideal for this challenge due to their tendency to overfit [18]. However, the use of deep learning algorithms proves to be the best approach for predicting unseen data and handling large datasets in real life. We evaluate the models using a confusion matrix and F1 Score metrics.
As predicted, the MLP algorithm outperforms other classification algorithms such as decision trees, random forest, KNN algorithm, and regression when predicting unseen heart patient data.
To enhance and build upon this research, we may explore the adaptation of various feature selection methods and preprocessing techniques, including data binning and the removal of irrelevant features that may lack significance. Furthermore, should a larger dataset become available, we can delve into additional optimization techniques.
Furthermore, we can test different neural network algorithms, including convolutional neural networks (CNNs), which can also be adapted to binary classification tasks.
Acknowledgements
I would like to express my deep appreciation to my wife, Nozima Nazarova, whose inspiration has fuelled my desire to contribute to the medical community and advance knowledge in the field of data science research.
I am also grateful to my alma mater at Goldsmiths, University of London, for providing me with the foundation in Data Science and Artificial Intelligence. I extend special thanks to Prof. Zimmer and all the faculty and lecturers at the university for their invaluable roles in stimulating our intellectual growth.
Author's Contribution Statement
The author conceived and designed the study, collected and analyzed the data, interpreted the results, drafted the manuscript, and carried out critical revisions throughout the research process. The author read and approved the final version of the manuscript.
Data Availability and Access
The datasets generated during and/or analyzed during the current study are available in the Zenodo repository at the following link: DOI-link, A Novel Approach to Heart Failure Prediction and Classification through Advanced Deep Learning Model|Zenodo.
Funding
The author would like to acknowledge that this research project did not receive any external funding or financial support. All aspects of this study, including data collection, analysis, and manuscript preparation, were self-funded.
Author’s Information
Abdalla Ali Mahgoub holds a Master of Science degree in Data Science and Financial Technology from Goldsmiths, University of London. He is currently an Investment Ops/Data Analyst at Ghobash Trading and Investment Co Limited, specializing in financial operations and financial data analysis. Abdalla has published extensively in the fields of finance, investment, healthcare, and retail, and has presented his work at numerous conferences and on social media channels.
Additionally, he helped build various startups related not only to e-commerce businesses but also to AI-based ventures.
List of Abbreviations
Adaboost Adaptive Boosting Algorithm
AI Artificial Intelligence
ca The Number of Major Vessels
chol Cholesterol
CNNs Convolutional Neural Networks
cp Chest Pain
CVDs CardioVascular Diseases
DBNs Deep Belief Networks
DL Deep Learning
EDA Exploratory Data Analysis
exang Exercise Induced Angina
fbs Fasting Blood Sugar
GANs Generative Adversarial Networks
HDL High-Density Lipoproteins
KNN K-Nearest Neighbors Algorithm
LDL Low-Density Lipoproteins
LSTMs Long Short Term Memory Networks
M.D. Doctor of Medicine
ML Machine Learning
MLPs Multilayer Perceptrons
NB Naive Bayes algorithm
No. Number
oldpeak ST Depression Induced by Exercise Relative to Rest
Ph.D Doctor of Philosophy
ReLU Rectified Linear Unit
restecg Resting Electrocardiography or Resting ECG
RNNs Recurrent Neural Networks
slope The Slope of the Peak Exercise ST Segment
SMI Silent (Asymptomatic) Myocardial Ischemia
SVM Support Vector Machine Algorithm
Tanh Hyperbolic Tangent
thal A Blood Disorder Called Thalassemia
thalach Maximum Heart Rate Achieved
trestbps Resting Blood Pressure
V.A. Medical Center Department of Veterans Affairs Medical Center
vs Versus
VS Code Visual Studio Code
WHO World Health Organization