Titanic - Predicting survivors with Machine learning
Thank you, Kaggle.com, for the challenge.

Access the details of my analysis either through GitHub or Kaggle.

In the quiet hours of the night and the stillness of a full weekend, I embarked on a voyage through the vast ocean of data, a journey that would lead me to explore one of the most legendary datasets in the realm of data science. This dataset, set adrift on the virtual shores of Kaggle, held the key to a tantalizing question that has piqued the curiosity of many:

"Would I have been among the passengers who had a better chance of surviving the 1912 Titanic sinking?"

A Glimpse into History:

On the fateful night of April 15, 1912, the grand and ostensibly unsinkable RMS Titanic met her tragic fate, colliding with an iceberg during her maiden voyage. This ill-fated voyage resulted in the loss of 1,502 of the 2,224 passengers and crew on board. In the heart-wrenching chaos that ensued, the scarcity of lifeboats would become a symbol of the unforgiving nature of the sea, where luck alone was insufficient to guarantee salvation.

Yet, amidst the turmoil of that night, there was a glimmer of hope. It appeared that certain groups of passengers were more likely to defy the cruel hand of fate and emerge as survivors. This challenge, then, was to create a predictive model that could illuminate the characteristics of those who stood a better chance of survival. It was a journey into the past, a quest to uncover the story of who lived to see another day.

Revealing Insights from the Depths:

  1. As I delved into the passenger data, a narrative began to emerge. Most passengers were between 20 and 40 years old, with a median age of 28, a poignant reminder of the youth that was lost on that icy night.
  2. The price of a ticket, a seemingly mundane detail, held secrets of its own. The 75th percentile of the fare was a modest $31, yet a solitary ticket carried a staggering value of $512, a stark contrast echoing the profound disparity in socio-economic classes among the passengers.
  3. The data presented a subtle imbalance in the target variable 'survived,' though not grave enough to demand special treatment. It whispered that, in this machine learning odyssey, the choice of evaluation metrics would be vital to navigate the treacherous waters, and so we would turn to precision, recall, F1-score, and ROC-AUC for guidance.
  4. In the tale of the Titanic, 38.38% of passengers defied the odds and survived, a testament to human resilience in the face of disaster.
  5. Amongst the passengers, the dominance of the male contingent was unmistakable, with a staggering 64.76% majority. The ship's gender dynamics reflected the social norms of the time, and this played a role in survival.
  6. The third-class ticket was the most common, held by 55.11% of the passengers, underscoring the prevalence of the working class on board.
  7. Southampton, a port of dreams and departures, stood as the point of origin for the majority, with 72.44% embarking from its shores.
  8. The average number of parents or children accompanying each passenger was a modest 0.39, a hint that relatively few were bound by such familial ties.
  9. In the tapestry of relationships, siblings and spouses averaged 0.52 per passenger, weaving the story of those who shared their voyage with brothers, sisters, or life partners.
  10. In a cruel twist of fate, the larger the number of siblings, spouses, or parents and children on board, the lower the chances of survival. (A sketch of how these summary figures can be reproduced follows this list.)
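
For the curious, the figures above are routine summary statistics. Here is a minimal sketch of how they could be reproduced with pandas; the file name and the renamed snake_case columns are assumptions for illustration, not an excerpt from my notebook.

```python
import pandas as pd

# Assumed: the Kaggle training file, with columns renamed to the
# snake_case names used in this post.
df = pd.read_csv("train.csv").rename(columns={
    "Survived": "survived", "Pclass": "t_class", "Sex": "sex",
    "Age": "age", "SibSp": "sib_sp", "Parch": "par_ch",
    "Fare": "fare", "Embarked": "embarked",
})

# Age and fare distributions (median age ~28, 75th-percentile fare ~$31).
print(df[["age", "fare"]].describe())

# Shares behind the percentages quoted above.
print(df["survived"].value_counts(normalize=True))   # ~38% survived
print(df["sex"].value_counts(normalize=True))        # ~65% male
print(df["t_class"].value_counts(normalize=True))    # ~55% third class
print(df["embarked"].value_counts(normalize=True))   # ~72% Southampton

# Average number of relatives travelling with each passenger
# (the figures quoted in points 8 and 9).
print(df[["sib_sp", "par_ch"]].mean())
```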

Unveiling the Predictive Model:

As the journey progressed, I enlisted Recursive Feature Elimination (RFE) to discern the most influential factors in predicting survival. The top ten features emerged: 'passenger_id,' 't_class,' 'sex,' 'age,' 'sib_sp,' 'par_ch,' 'fare,' 'embarked,' 'age_group_Child,' and 'age_group_Adult.'
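
For readers who want to see what that step looks like in code, here is a minimal sketch of RFE with scikit-learn. The file name, the use of a decision tree as the eliminating estimator, and the already-encoded feature matrix are assumptions for illustration, not a copy of my notebook.

```python
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

# Assumed: a preprocessed frame with the snake_case columns named in this
# post, already numerically encoded (sex/embarked mapped to integers,
# age_group one-hot encoded) and with missing values filled.
df = pd.read_csv("train_prepared.csv")
X = df.drop(columns=["survived"])
y = df["survived"]

# Recursively drop the weakest feature until ten remain.
selector = RFE(estimator=DecisionTreeClassifier(random_state=42),
               n_features_to_select=10)
selector.fit(X, y)

# The boolean support mask marks the retained features.
print(X.columns[selector.support_].tolist())
```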

In the first round, the model embraced these features, revealing an improvement in precision, accuracy, and F1 score while surrendering some ground in recall. The second decision tree model, born from a refined feature set, outshone the first, displaying higher precision, accuracy, and F1 score at the cost of reduced recall.

In the second round, the model demonstrated its prowess, correctly identifying 495 non-survivors and 245 survivors. Yet it also erred, flagging 54 passengers as survivors who were not and overlooking 97 passengers who did survive. The voyage continued, deeper into the heart of the data, guided by the light of exploration.
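
Those four counts are simply the cells of a confusion matrix. A minimal sketch of how such a matrix is obtained, under the same assumed preprocessing as above (the split, tuning, and exact model of my notebook are not reproduced here):

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier

# Assumed: the same preprocessed, numerically encoded training frame
# as in the RFE sketch (hypothetical file name).
df = pd.read_csv("train_prepared.csv")
X, y = df.drop(columns=["survived"]), df["survived"]

tree = DecisionTreeClassifier(random_state=42).fit(X, y)
pred = tree.predict(X)

# Rows are actual classes, columns are predictions. In the post's second
# model the cells were roughly TN=495, FP=54, FN=97, TP=245.
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```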

The decision tree, when examined, took an unexpected course, choosing to split the data on the 'sex' variable before turning to 't_class,' an enigma that hinted at the intricate web of factors influencing survival.
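
One way to examine that split order is to print the fitted tree's rules. A minimal sketch, again under the assumed preprocessing above; the first lines of the printout show the root split.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed: the same preprocessed, encoded training frame as above.
df = pd.read_csv("train_prepared.csv")
X, y = df.drop(columns=["survived"]), df["survived"]

tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# In the post's model, 'sex' appeared at the root, followed by 't_class'.
print(export_text(tree, feature_names=list(X.columns), max_depth=2))
```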

The Random Forest, wielding its forest of features, sang a different song. In its melody, 'sex,' 't_class,' 'fare,' and 'age' took center stage, exhibiting the highest feature importance. These were the pillars of prediction, closely tied to the target variable and confirmed by the correlation heatmap.
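
The ranking described here comes from the forest's impurity-based feature importances. A minimal sketch, under the same assumptions as the earlier snippets (the number of trees is arbitrary):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed: the same preprocessed, encoded training frame as above.
df = pd.read_csv("train_prepared.csv")
X, y = df.drop(columns=["survived"]), df["survived"]

forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances; in the post, 'sex', 't_class', 'fare' and
# 'age' ranked highest.
importances = pd.Series(forest.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```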

The Symphony of Evaluation Metrics:

In our quest, we tuned our ears to the symphony of evaluation metrics. Each note in this ensemble, from AUC's serenade under the ROC curve to the precision, recall, accuracy, and F1-score, offered a glimpse into the model's performance. It was through these harmonious measures that we could gauge the success of our predictive voyage.
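
Concretely, each of those measures is a one-liner in scikit-learn. A minimal sketch, assuming the same prepared data and an illustrative held-out split rather than the exact evaluation used in my notebook:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Assumed: the same preprocessed, encoded training frame as above.
df = pd.read_csv("train_prepared.csv")
X, y = df.drop(columns=["survived"]), df["survived"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
proba = model.predict_proba(X_te)[:, 1]   # predicted probability of survival

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("f1       :", f1_score(y_te, pred))
print("roc_auc  :", roc_auc_score(y_te, proba))
```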

Conclusion:

In the wake of our data-driven expedition, we emerged with profound insights. Gender, ticket class, fare, and age proved to be the guiding stars of the Titanic's passenger survival. Gender and ticket class, in particular, shone brightly, guiding many to safety.

Our journey was marked by the presence of two models – the decision tree and the random forest. The latter, with its rich canopy of features, outshone its counterpart, displaying a superior performance in most evaluation metrics. Yet, the persistent refrain of low recall reminded us that even the finest models can overlook some who survived.

In closing, the voyage is far from over. There is a sea of opportunity for model refinement, much like the waiting lifeboats on that historic night. With gender, ticket class, fare, and age as our guiding stars, we embark on a new leg of this journey. Through enhancements and adjustments, we aim to raise our chances of predicting who survived that unforgettable night, and in doing so, we continue to honor the memory of those who sailed on the RMS Titanic.

Curious folks can view the competition here.
