Titanic - Predicting Survivors with Machine Learning
In the quiet hours of the night and the stillness of a full weekend, I embarked on a voyage through the vast ocean of data, a journey that would lead me to explore one of the most legendary datasets in the realm of data science. This dataset, set adrift on the virtual shores of Kaggle, held the key to a tantalizing question that has piqued the curiosity of many:
"Would I have been among the passengers who had a better chance of surviving the 1912 Titanic sinking?"
A Glimpse into History:
In the early hours of April 15, 1912, the grand and ostensibly unsinkable RMS Titanic met her tragic end, having struck an iceberg during her maiden voyage. The disaster claimed 1,502 of the 2,224 passengers and crew on board. In the heart-wrenching chaos that ensued, the scarcity of lifeboats became a symbol of the unforgiving nature of the sea, where luck alone was insufficient to guarantee salvation.
Yet, amidst the turmoil of that night, there was a glimmer of hope. It appeared that certain groups of passengers were more likely to defy the cruel hand of fate and emerge as survivors. The challenge, then, was to create a predictive model that could illuminate the characteristics of those who stood a better chance of survival. It was a journey into the past, a quest to uncover the story of who lived to see another day.
Revealing Insights from the Depths:
Unveiling the Predictive Model:
As the journey progressed, I enlisted Recursive Feature Elimination (RFE) to discern the most influential factors in predicting survival. Ten features emerged at the top: 'passenger_id', 't_class', 'sex', 'age', 'sib_sp', 'par_ch', 'fare', 'embarked', 'age_group_Child', and 'age_group_Adult'.
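For readers who want to retrace this step, here is a minimal sketch of how the RFE ranking might look in scikit-learn. It assumes a preprocessed DataFrame named train_df with numeric, encoded columns and a binary survived target; those names, and the choice of a decision tree as the base estimator, are my assumptions rather than an excerpt from the original notebook.

```python
# Minimal RFE sketch, assuming `train_df` is already encoded and imputed
# and contains a binary `survived` column (hypothetical names).
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import RFE

X = train_df.drop(columns=['survived'])   # all candidate features
y = train_df['survived']

# Recursively drop the weakest feature until only the ten strongest remain.
rfe = RFE(estimator=DecisionTreeClassifier(random_state=42),
          n_features_to_select=10)
rfe.fit(X, y)

top_ten = X.columns[rfe.support_].tolist()
print(top_ten)
```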
In the first round, the model embraced these features and showed an improvement in precision, accuracy, and F1 score, while surrendering some ground in recall. The second decision tree model, born from a refined feature set, outshone the first, displaying higher precision, accuracy, and F1 score at the cost of reduced recall.
In the second round, the model demonstrated its prowess, correctly identifying 495 non-survivors and 245 survivors. Yet, it also erred, predicting survival for 54 passengers who did not survive and overlooking 97 passengers who did. The voyage continued, deeper into the heart of the data, guided by the light of exploration.
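As a rough illustration of how such counts can be produced, the sketch below fits the refined decision tree on the selected features and reads the confusion matrix from out-of-fold predictions. The max_depth value and the five-fold cross-validation are illustrative assumptions, not the article's exact setup.

```python
# Confusion-matrix sketch for the refined decision tree (hyperparameters assumed).
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

X_top = X[top_ten]                         # features kept by RFE in the sketch above
tree = DecisionTreeClassifier(max_depth=4, random_state=42)

# Out-of-fold predictions so every passenger is predicted exactly once.
y_pred = cross_val_predict(tree, X_top, y, cv=5)

# Rows are actual classes, columns predicted (0 = did not survive, 1 = survived).
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
print(f"correct non-survivors: {tn}, correct survivors: {tp}")
print(f"false survivors: {fp}, missed survivors: {fn}")
```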
The decision tree, when examined, took an unexpected course, choosing to split the data on the 'sex' variable before turning to 'ticket_class', a choice that hinted at the intricate web of factors influencing survival.
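One way to confirm which variable the tree splits on first is to print its learned structure; the snippet below is a sketch using scikit-learn's export_text on the tree from the previous step (refitted here, since cross-validation works on copies).

```python
# Print the tree's split structure; the root split should appear on 'sex'.
from sklearn.tree import export_text

tree.fit(X_top, y)   # refit on the full training data
print(export_text(tree, feature_names=list(X_top.columns)))
```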
The Random Forest, wielding its forest of features, sang a different song. In its melody, 'sex,' 't_class,' 'fare,' and 'age' took center stage, exhibiting the highest feature importance. These were the pillars of prediction, closely tied to the target variable and confirmed by the correlation heatmap.
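That ranking can be read directly from a fitted forest. The sketch below shows the usual impurity-based importances; the number of trees is an illustrative choice, not necessarily the article's setting.

```python
# Sketch of the random forest and its impurity-based feature importances.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X_top, y)

importances = pd.Series(forest.feature_importances_, index=X_top.columns)
print(importances.sort_values(ascending=False))   # expect 'sex', 't_class', 'fare', 'age' near the top
```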
The Symphony of Evaluation Metrics:
In our quest, we tuned our ears to the symphony of evaluation metrics. Each note in this ensemble, from the AUC (the area under the ROC curve) to precision, recall, accuracy, and the F1-score, offered a glimpse into the model's performance. It was through these harmonious measures that we could gauge the success of our predictive voyage.
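In scikit-learn terms, that ensemble of metrics can be gathered roughly as follows; the 80/20 held-out split and the reuse of the random forest are assumptions made for the sake of a self-contained example.

```python
# Sketch of the evaluation metrics on a held-out split (split ratio assumed).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

X_train, X_test, y_train, y_test = train_test_split(
    X_top, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]   # survival probability, for the ROC/AUC

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
```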
Conclusion:
In the wake of our data-driven expedition, we emerged with profound insights. Gender, ticket class, fare, and age proved to be the guiding stars of the Titanic's passenger survival. Gender and ticket class, in particular, shone brightly, guiding many to safety.
Our journey was marked by the presence of two models – the decision tree and the random forest. The latter, with its rich canopy of features, outshone its counterpart, displaying a superior performance in most evaluation metrics. Yet, the persistent refrain of low recall reminded us that even the finest models can overlook some who survived.
In closing, the voyage is far from over. There is a sea of opportunity for model refinement, much like the waiting lifeboats on that historic night. With gender, ticket class, fare, and age as our guiding stars, we embark on a new leg of this journey. Through enhancements and adjustments, we aim to raise our chances of predicting who survived that unforgettable night, and in doing so, we continue to honor the memory of those who sailed on the RMS Titanic.
Curious folks can view the competition on Kaggle here.