Google Capstone Project: Cyclistic Ride Share Analysis
For the past few months, I, Ali Rizvi, have been learning about and training in Data Analytics especially through the Google Data Analytics Certification offered via Coursera. Demonstrated below is my final capstone project; the culmination of my learning throughout the entire eight-course series. Through analyzing the differences in the bike usage habits of casual and annual members in the fictional “Cyclistic” bike-share program, I will demonstrate my acumen in Excel, SQL, and Tableau.
Links to the code I used and the visualizations I created:
Background
Cyclistic is a fictional bike-share offering with over 5,800 bicycles and 600 docking stations in the Chicago area. They have built a client base by marketing to broad consumer segments and offering a variety of bike styles to accommodate all riders. Casual riders in the system purchase either single or full-day passes while members hold annual passes. The finance analysts have discovered that annual members are more profitable than casual riders and wish to increase the number of members.
I am a junior data analyst working in the marketing analyst team at Cyclistic. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, I was asked to answer the question: "How do annual and casual riders use Cyclistic bikes differently?"
Note: Two other questions were also present, but I was not directly assigned to answer them. My plan-of-action recommendations will take these questions into account:
- Depending on their habits regarding bike usage, how could casual riders be persuaded to buy annual Cyclistic memberships?
- How can Cyclistic use digital media to influence casual riders to become members?
Ask: Business Task
I will analyze data on the bike usage habits of current customers of Cyclistic bikes for trends such as usage patterns, frequency of usage, etc. to decipher the consumer characteristics of the two customer groups (annual and casual) so that our marketing team can develop and implement optimal marketing strategies.
Prepare: Description of Data Source
The data used to help answer the business question came directly from Cyclistic’s own collection of metrics from stations and bikes owned by it. The data collected covered a 6-month period from January 2021 to June 2021 and was split into separate Excel files by month. Each table contained 13 columns including ride IDs, bike type, start and end times of each ride, start and end station identifiers such as name and location, and member type of the rider (annual members were simply mentioned as “members”).
The data collected meets the ROCCC requirements for identifying good data sources and appears free of bias; however, there was some missing data for starting and ending locations for many rides. I downloaded each month’s CSV file and saved it as an .xlsx (Excel Workbook) file as well.
Process: Cleaning the Data
After obtaining the data, I opened each file individually in Excel to check the structure of the data. Due to the large nature of the data, I chose to clean each month's information separately in Excel before attempting to combine the data in SQL. Below is a summary of the cleaning done:
- Formatted all columns so that all were adjusted for column width depending on the cell values. Also center-aligned all cell values.
- For each file, checked all rows for duplicate values using the built-in feature in Excel. None were found.
- Applied Conditional Formatting to color blank cells in dark red to check for missing data. Several such colored cells appeared in, both, the start and end station locations’ columns; all such rows were removed.
- Created a column called “ride_length” to calculate the length of each bike ride by subtracting the end time from the start time. Then, changed the resultant numbers' format to hh:mm:ss.
- Separated the start date and start time, and end date and end time, for each ride into distinct-yet-adjacent columns.
- Rows with a zero or negative ("###...") value for ride_length were filtered out and removed.
- Created a column named "day_of_week" to extract information about the day name from each start time. Each day was saved as a number between 1 (Sunday) and 7 (Saturday).
- Created Pivot Tables to analyze different aspects of each file, such as the average overall ride length for each group: casuals and members.
NOTE: Additional cleaning was done in SQL. Please refer to the SQL code mentioned in the next section - and in the beginning - to see any cleaning that was done with the combined data for all six months.
Analyze & Share: Analysis of Data Through SQL & Tableau
To be able to handle the large volume of rows (> 1,000,000), I moved my analysis from Excel to SQL. I used the open-source platform DB Browser with SQLite to perform a UNION of all the tables from each month - January through June, 2021 - into one amalgamated table. Using SQLite, I explored the data even further. The full list of SQL queries I used can be found here.
Total Number of Rides by Each Member Type
Firstly, the total number of rides by each type of member - casual and annual - for the six-month-span was visualized in a pie chart and is depicted below:
From the pie chart, it is clear that annual members are in the majority for overall bike usage, with 56.19% of all rides having been undertaken by annual members. In order to uncover the patterns behind Cyclistic bike usage more specifically, the data was more incisively examined to unveil trends of weekly bike usage, with results depicted in the graph below:
Recommended by LinkedIn
From the illustration above, it is quite clear that, for casuals, the weekend days are the ones on which they ride most; for members, usage is far more evenly distributed with all seven days of the week having a lot of usage and this can be attributed to the simple fact that there are just more of member riders. In summary, the weekends were deduced to be the time of the week most worthy of targeting.
Digging Deeper
Now, to determine what types of bikes each group - especially casuals - prefers, I analyzed the usage of each bike type categorized by the type of rider and the number of trips taken on it, in total. The results were:
As can be seen, “members” were ahead in the usage of classic and electronic bikes but trailed immensely in docked bikes’ usage; only one member rode a docked bike, ever! Whereas 138,813 casual members used docked bikes in total.
Overall, casual riders were not lagging too far behind classic and electric bike usage, especially when the fact is considered that they are in the minority of all types of riders. What was most striking was the difference in the share of docked bikes’ usage, and this led me to conclude that casuals highly preferred docked bikes overall and, since the aim of the project is to understand bike usage patterns to develop strategies for converting casual riders into annual members, I decided to analyze patterns of docked bikes’ usage by day-of-the-week. The results of my analysis are as follows:
The findings, once again, illustrate the significance of the weekends, Saturdays and Sundays, for casual riders. Altogether, 99.99% of all docked rides were ridden by casual riders and 48.67% of all docked rides were ridden on weekends.
To identify the locations from which the most docked rides start, I deemed it fit to determine the most popular starting locations for said bike type on weekends for casual riders. Using SQLite, I was able to narrow down the five most popular docked ride-start locations by measuring the amount of trips started and arranging the list in descending order, and then went on to visualize this finding in histogram form:
Subsequently, I planned to dissect the hours of the weekend during which the most docked rides are checked out by casual users. This effort is depicted below:
Probing the unearthed patterns, it can be concluded that the mid-afternoon to early-evening - specifically 12 PM to 6 PM - time period is the one in which the most docked rides are started.
On top of all this, in Excel, as mentioned earlier, I constructed Pivot Tables for each month to calculate the average ride length for each member type. Upon comprehensive analysis for the whole six-month timespan, I found that casual riders spent about 37 minutes per ride whereas members spent only about 14 minutes per ride. This was a crucial finding, as this shows that members might lead in the amount of bikes used, but casuals are far ahead in time spent per bike ride. This may be attributed to the fact that casuals are the type of people who like to ride at their own leisure and are not enforced by any membership commitments and the like; they, thus, ride when they are most free and have a fun time during their trips, which are possibly with other casual family members or friends.
I also came to the conclusion that mobile applications are key in the modern era, and so Cyclistic should be utilizing software technologies to create its own application on which potential users can easily log in and book their rides at their own convenience.
Checking Back on Generality
Additionally, I decided to gather data on when the most rides are undertaken in general, in order to get a holistic view of the time frames in which bikes should be made most available for usage. Therefore, I tailored my analysis for this specific goal and attained these results:
The illustration, yet again, emphasizes the significance of the mid-afternoon to early-evening time period and, this time, it is clear that even until 7 PM, many rides are undertaken by members of the Cyclistic program. Even though this examination included both types of members, casual and annual, and all three types of bikes - classic, electrical, and docked - across all seven days of the week, the importance of the 12PM-to-6PM period is further confirmed.
It is understandable that this period includes the most started rides, as it is during these hours that college students need to travel to attend classes or go back home, and this is also the period during which work hours end and, thus, bikers might start traveling back home. In addition, much of the evening hours of leisure time start during this period, and so many people might check out bikes to head to the park with family or to simply exercise on their own, etc.
Act: Findings & Subsequent Recommendations
Comprehensively, the major insights from all the analysis performed are as follows:
- Members made up the majority of all riders, but almost the entirety of all docked bike riders were casuals.
- Members ride more bikes, but casuals spend far more time per ride than members.
- Classic bikes are ridden the most, in total. Members also use classic bikes far more than casuals, but electric bike usage is much closer in quantity between the two rider types.
- Members appear to undertake a lot of trips each day of the week whereas casuals seem to have a strong preference for riding on weekends.
- Both types of riders have a preference for starting to bike in the mid-afternoon to early-evening hours of the day.
Based on these findings, I would recommend the following actions plans to the marketing team:
- As the aim of the project is to understand bike usage patterns to develop better strategies for converting casual riders into annual members, more bikes should be made available on Saturdays and Sundays during the 12 PM to 6 PM time period. This will optimally match ride availability with casual customers’ preferences and habits.
- Make a software application to make it easier for customers to subscribe and book rides.
- Make each bike ride available for longer time periods, especially for casual riders.
- Along with classic and electric bikes, docked bikes should be targeted for the marketing campaign and should be offered most at locations Streeter Dr & Grand Ave, Lake Shore Dr & Monroe St, and Millennium Park.
Thank you very much for reading through my final project! Please feel free to reach out to me as well. And a very special and magnanimous thank you to Mr. John Anderton for being an exemplary mentor to me throughout this journey.
Entry-Level Virtual Assistant | Admin Support, CRM, & Calendar Management | A1 German Online Tutor | Aspiring Data Scientist
1moAs a person who wants to start her journey in data science, this gives me great insight into what this field is about
Medical Oncologist/Hematologist at Oncology San Antonio
1yGreat work