Data Science Roadmap for 2024

This article provides a learning framework, resources, and project ideas to help you build a robust portfolio of work that demonstrates data science ability.

Just a note: I created this roadmap based on my own data science experience. This isn't a comprehensive learning strategy. This roadmap can be customised to fit any topic or field of study that interests you. Also, because Python is my preferred programming language, this was built with it in mind.

What is the purpose of a learning roadmap?

A learning roadmap is an extension of a curriculum. It creates a multi-level skills map covering the skills you want to improve, how you'll measure progress at each level, and approaches for mastering each skill.

Each stage in my roadmap is given a weighting based on its difficulty and how commonly it is applied in the real world. I've also included an estimate of how long it would take a beginner to finish each level's exercises and projects.

The roadmap is organised around the Data Science Hierarchy of Needs (diagram not reproduced here).

This will serve as the foundation for our framework. To complete our framework with more specific, measurable details, we'll need to delve deeper into each of these strata.

Specificity is gained by investigating the critical topics in each layer as well as the resources required to master those topics.

You can assess your progress by applying what you've learned to a variety of real-world projects. I've included a few project ideas, portals, and platforms for you to test your knowledge.

Take it one day at a time, one video/blog/chapter per day. There is a lot of ground to cover. Don't overburden yourself!

Let's take a closer look at each of these layers, beginning at the bottom.


1. How to Educate Yourself on Programming and Software Engineering (Estimated time: 2-3 months)

First and foremost, ensure that you have solid programming skills. At least one programming language will be required in every data science job description.

Programming topics to be familiar with include common data structures (data types, lists, dictionaries, sets, and tuples), writing functions, logic and control flow, searching and sorting algorithms, object-oriented programming, and working with external libraries.
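
A quick self-check: you should be comfortable writing something like the sketch below on your own. It touches dictionaries, functions, a searching algorithm, and a small class; the names are just illustrative.

```python
from bisect import bisect_left


def word_counts(words):
    """Count occurrences of each word using a dictionary."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def binary_search(sorted_items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    i = bisect_left(sorted_items, target)
    return i if i < len(sorted_items) and sorted_items[i] == target else -1


class Stack:
    """A minimal stack built on top of a Python list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()


print(word_counts(["data", "science", "data"]))  # {'data': 2, 'science': 1}
print(binary_search([1, 3, 5, 7, 9], 7))         # 3
```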

SQL scripting: Querying databases with joins, aggregations, and subqueries. Experience with the Terminal, Git version control, and GitHub.
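
To give a rough idea of the level to aim for with SQL, here's a minimal sketch using Python's built-in sqlite3 module; the customers/orders schema is invented purely for illustration.

```python
import sqlite3

# In-memory database with two toy tables (schema invented for this example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'Ben', 'US');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

# A join plus an aggregation: total order amount per customer, highest first.
query = """
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total_spent DESC;
"""
for name, total in conn.execute(query):
    print(name, total)
```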

Resources to learn Python

learnpython.org [free] is a good place to start learning Python. It's a no-cost resource for newcomers that starts from the beginning and covers all of the fundamental programming topics, with an interactive shell where you can practise the concepts side by side.

Kaggle [free] offers a free, interactive Python tutorial. It's a quick course that covers the essential Python topics for data science.

Python certifications on freeCodeCamp [free] — these cover scientific computing, data analysis, and machine learning.

Python Course on YouTube by freeCodeCamp [free]

This is a 5-hour training that will help you practise the fundamental concepts.

Learn SQL with these resources

1. Solve a number of problems and develop at least two projects to demonstrate your expertise:

Here's where you can solve a lot of problems:

  • LeetCode and HackerRank (both beginner-friendly) – solve easy or medium-level questions.
  • Data extraction from a website/API endpoints – try writing Python scripts to extract data from scrapable websites such as soundcloud.com, and save the extracted data to a CSV file or a SQL database (see the sketch after this list).
  • Games such as rock-paper-scissors, spin a yarn, hangman, a dice-rolling simulator, and tic-tac-toe.
  • Simple web tools such as a YouTube video downloader, website blocker, music player, or plagiarism checker.
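
For the data-extraction idea above, a minimal sketch could look like the following. The API URL and field names are placeholders (not a real endpoint), so adapt them to whatever API you actually use and respect its terms of service.

```python
import csv

import requests

# Placeholder endpoint – swap in a real public API you're allowed to use.
API_URL = "https://api.example.com/v1/tracks"

response = requests.get(API_URL, params={"limit": 100}, timeout=30)
response.raise_for_status()
records = response.json()               # assumed to be a list of flat dicts

# Save the extracted records to a CSV file.
fieldnames = ["id", "title", "plays"]   # assumed keys in each record
with open("tracks.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)

print(f"Wrote {len(records)} rows to tracks.csv")
```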

Create GitHub pages for these projects or simply host the code on GitHub to learn how to use Git.

2. How to Learn About Data Collection and Wrangling (Cleaning)

(Estimated time: 2 months)

Finding appropriate data to help you solve your problem is an important part of a data science job. Data can be gathered from a variety of acceptable sources, including scraping (if the website permits it), APIs, databases, and publicly accessible repositories.

Once they have data, an analyst will frequently find themselves cleaning dataframes, working with multi-dimensional arrays, performing descriptive/scientific computations, and manipulating dataframes to aggregate data.

In the real world, data is rarely clean and ready to use. Pandas and NumPy are the two libraries you can use to take your data from dirty to ready-to-analyse.

As you gain confidence in building Python programmes, you can begin learning how to use libraries such as pandas and numpy.
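
Here's a minimal cleaning sketch with pandas and NumPy. The sales.csv file and its price/quantity/city columns are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw file; the column names are assumptions for this sketch.
df = pd.read_csv("sales.csv")

# Standardise column names and fix obvious type issues.
df.columns = df.columns.str.strip().str.lower()
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Handle duplicates and missing values.
df = df.drop_duplicates()
df["price"] = df["price"].fillna(df["price"].median())

# Add a derived column, then aggregate with groupby.
df["revenue"] = df["price"] * df["quantity"]
summary = df.groupby("city")["revenue"].agg(["count", "sum", "mean"])

print(np.round(summary, 2))
```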

Resources to learn data collection and data cleaning:

3. How to Learn About Exploratory Data Analysis, Business Acumen, and Storytelling

(Estimated time: 2–3 months)

Data analysis and storytelling are the next areas to grasp. A Data Analyst's main role is to extract insights from data and then communicate them to management in simple terms and visualisations.

The storytelling aspect necessitates data visualisation expertise as well as great communication abilities.

Topics to learn in exploratory data analysis and storytelling include:

  • Exploratory data analysis – defining questions, dealing with missing values, outliers, formatting, filtering, and univariate and multivariate analysis.
  • Data visualisation – plotting data with libraries such as matplotlib, seaborn, and plotly, and understanding how to select the appropriate chart to communicate the data's conclusions (see the sketch after this list).
  • Creating dashboards – many analysts only use Excel or a specialist application like Power BI or Tableau to create dashboards that summarise and aggregate data to assist management in making choices.
  • Work on asking the correct questions to answer, ones that are directly related to the business metrics. Practice writing reports, blogs, and presentations that are clear and succinct.
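
As mentioned in the list above, here's a minimal EDA and plotting sketch. The customers.csv file and its age/churned columns are placeholders for whichever dataset you're analysing.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical dataset – replace with whatever you're analysing.
df = pd.read_csv("customers.csv")

# Univariate analysis: summary statistics and missing-value counts.
print(df["age"].describe())
print(df.isna().sum())

# Flag potential outliers with the usual 1.5 * IQR rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers")

# Multivariate view: how age differs between churned and retained customers.
sns.boxplot(data=df, x="churned", y="age")
plt.title("Age by churn status")
plt.tight_layout()
plt.savefig("age_by_churn.png")
```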

Resources to learn about Data Analysis:

Data Analysis Project Ideas:

  • Do exploratory analysis on a movies dataset to discover the formula for making lucrative movies (as inspiration), or use datasets from healthcare, finance, the WHO, past censuses, e-commerce, and so on.
  • Create dashboards using the tools listed above (Jupyter notebooks, Excel, Tableau).

4. How to Learn About Data Engineering

At large data-driven companies, data engineering supports R&D teams by making clean data available to research engineers and scientists. It is a separate field, and if you only want to focus on the statistical algorithm side of the problems, you may want to skip this section.

Building efficient data architectures, simplifying data processing, and maintaining large-scale data systems are all responsibilities of a data engineer.

Using Shell (CLI), SQL, and Python/Scala, engineers develop ETL pipelines, automate file-system tasks, and optimise database operations for high performance.
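
As a toy illustration of that workflow, here's a small ETL sketch in Python. The file name, column names, and the SQLite "warehouse" are all assumptions; real pipelines are usually scheduled with an orchestrator such as Airflow.

```python
import sqlite3

import pandas as pd


def extract(path: str) -> pd.DataFrame:
    """Extract: read raw events from a CSV file (placeholder path)."""
    return pd.read_csv(path)


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: clean types and aggregate events per user per day."""
    raw["event_time"] = pd.to_datetime(raw["event_time"], errors="coerce")
    raw = raw.dropna(subset=["event_time", "user_id"])
    raw["day"] = raw["event_time"].dt.date.astype(str)
    return raw.groupby(["user_id", "day"]).size().reset_index(name="events")


def load(table: pd.DataFrame, db_path: str = "warehouse.db") -> None:
    """Load: write the aggregated table into a SQLite 'warehouse'."""
    with sqlite3.connect(db_path) as conn:
        table.to_sql("daily_user_events", conn, if_exists="replace", index=False)


if __name__ == "__main__":
    load(transform(extract("raw_events.csv")))
```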

Another important skill is the ability to deploy these data architectures, which requires knowledge of cloud service providers such as Amazon Web Services, Google Cloud Platform, Microsoft Azure, and others.

Resources to learn Data Engineering:

Data Engineering project ideas/certifications to prepare for:

  • AWS Certified Machine Learning (300 USD) – AWS offers a proctored exam that adds some weight to your profile (but doesn't guarantee anything), and it demands a good understanding of AWS services as well as machine learning.
  • GCP offers a certification called Professional Data Engineer. This is a proctored exam that tests your ability to develop data processing systems, deploy machine learning models in a production setting, and ensure the quality and automation of your solutions.

5. How to Study Applied Statistics and Mathematics (Estimated time: 4–5 months)

Data science relies heavily on statistical approaches. The majority of data science interviews focus on descriptive and inferential statistics.

People frequently begin coding machine learning algorithms without first gaining a thorough understanding of the statistical and mathematical principles that explain how the algorithms function. Of course, this isn't the most efficient method.

In Applied Statistics and Math, you should concentrate on the following topics:

  • Descriptive statistics – being able to summarise data is powerful, but a single summary rarely tells the whole story. To characterise the data, learn about estimates of location (mean, median, mode, weighted statistics, trimmed statistics) and of variability.
  • Inferential statistics – designing hypothesis tests and A/B tests, creating business metrics, and assessing the collected data and experiment results using confidence intervals, p-values, and alpha levels.
  • Linear algebra and single- and multivariate calculus – needed to understand loss functions, gradients, and optimizers in machine learning. (A small sketch of the statistical pieces follows this list.)
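
Here is that small sketch of the statistical building blocks, using NumPy and SciPy; the sample values are made up.

```python
import numpy as np
from scipy import stats

# Made-up sample, e.g. daily revenue in some unit.
sample = np.array([102, 98, 110, 95, 120, 99, 105, 115, 92, 108])

# Estimates of location and variability.
print("mean:", sample.mean())
print("median:", np.median(sample))
print("trimmed mean (10%):", stats.trim_mean(sample, 0.1))
print("sample std dev:", sample.std(ddof=1))

# Inference: a 95% confidence interval for the mean, and a one-sample t-test
# against a hypothesised mean of 100 at alpha = 0.05.
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=sample.mean(), scale=stats.sem(sample))
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print("95% CI:", ci, "| t:", round(t_stat, 3), "| p:", round(p_value, 3))
```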

Resources to learn about statistics and math:

  • Take this free 8-hour course on the freeCodeCamp YouTube channel to study college-level statistics.
  • [Book] (Highly recommended) Practical Statistics for Data Science — A comprehensive reference to all of the most important statistical approaches, with clear and straightforward examples and applications.
  • [Book] Naked Statistics is a non-technical but comprehensive guide to understanding the impact of statistics on everyday occurrences, sports, and recommendation systems, among other things.
  • Statistical Thinking in Python is a fundamental course that will teach you how to think statistically. There is also a second section to this course.
  • Udacity offers a course called Introduction to Descriptive Statistics. It consists of video lectures that teach commonly used location and variability metrics (standard deviation, variance, median absolute deviation).
  • Udacity's Inferential Statistics course consists of video lessons that teach you how to draw conclusions from data that aren't immediately obvious. It focuses on developing hypotheses and using popular tests like t-tests, ANOVA, and regression.
  • Here's a guide on data science statistics to get you started on the correct track.

Project ideas for statistics:

  • Solve the exercises in the preceding courses, and then try your hand at a few public datasets to see how you can apply these statistical concepts. Ask questions like, "At the 0.05 level of significance, is there sufficient evidence to establish that the mean age of mothers giving birth in Boston is above 25 years old?" (A worked sketch of this kind of test follows this list.)
  • Ask your peers/groups/classes to participate in mini experiments by interacting with an app or answering a question. Once you get a good amount of data after a period of time, run statistical algorithms on it. This could be difficult to pull off, but it should be fascinating.
  • Investigate stock prices and cryptocurrencies, and form hypotheses about the average return or any other statistic. Using critical values, see whether you can reject or fail to reject the null hypothesis.
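
And here's the worked sketch referenced above: a one-sided, one-sample t-test decided with a critical value, in the spirit of the Boston question and the last idea. The ages are simulated, not real data.

```python
import numpy as np
from scipy import stats

# Simulated ages of mothers giving birth (illustrative only, not real Boston data).
rng = np.random.default_rng(42)
ages = rng.normal(loc=26.0, scale=5.0, size=200)

# H0: mean age <= 25 vs. H1: mean age > 25, at alpha = 0.05 (one-sided test).
alpha = 0.05
t_stat, _ = stats.ttest_1samp(ages, popmean=25)
critical_value = stats.t.ppf(1 - alpha, df=len(ages) - 1)

print(f"t = {t_stat:.3f}, critical value = {critical_value:.3f}")
if t_stat > critical_value:
    print("Reject H0: evidence that the mean age is above 25.")
else:
    print("Fail to reject H0.")
```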

6. Machine Learning and AI: How to Get Started

(Estimated Time: 4–5 Months)

After drilling yourself on all of the important principles above, you should now be ready to get started with the more advanced ML algorithms.

Learning can be divided into three categories:

Supervised Learning: Regression and classification problems are included in supervised learning. Simple linear regression, multiple regression, polynomial regression, naive Bayes, logistic regression, KNNs, tree models, and ensemble models are all things to look at. Learn about the different types of evaluation measures.
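
A minimal supervised-learning sketch with scikit-learn, using a built-in dataset so it runs as-is; the choice of models and metrics here is just one reasonable example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in binary classification dataset, split into train and held-out test sets.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each model and compare more than one evaluation metric on the test set.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f}, "
          f"f1={f1_score(y_test, pred):.3f}")
```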

Unsupervised Learning: The two most common applications of unsupervised learning are clustering and dimensionality reduction. Learn everything there is to know about PCA, K-means clustering, hierarchical clustering, and Gaussian mixtures.
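
And a minimal unsupervised sketch combining PCA and K-means, again on a built-in dataset; the silhouette score is used because no labels are assumed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Built-in dataset, used without its labels.
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Dimensionality reduction: keep two principal components.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Clustering: K-means with k=3, judged by silhouette score (no labels required).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)
print("silhouette score:", round(silhouette_score(X_2d, labels), 3))
```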

Reinforcement Learning: Reinforcement learning helps you build agents that learn by maximising rewards. Learn how to use the TF-Agents library, build Deep Q-networks, and more.
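
TF-Agents has its own API, so rather than guess at it, here's a library-free taste of the core idea (an agent learning to maximise reward): a tiny tabular Q-learning sketch on a made-up corridor environment. It is not a Deep Q-network, just the underlying update rule.

```python
import numpy as np

# Toy environment: states 0..5 along a corridor; reaching state 5 pays reward 1.
N_STATES, N_ACTIONS = 6, 2            # action 0 = step left, action 1 = step right
GOAL = N_STATES - 1


def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


rng = np.random.default_rng(0)
q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

for _ in range(300):                   # episodes
    state = 0
    for _ in range(100):               # cap steps per episode
        # Epsilon-greedy action selection (random when Q-values are tied).
        if rng.random() < epsilon or q[state, 0] == q[state, 1]:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(q[state].argmax())
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted best next value.
        q[state, action] += alpha * (reward + gamma * q[nxt].max() - q[state, action])
        state = nxt
        if done:
            break

print("Greedy action per state:", q.argmax(axis=1))  # expect 1 (right) for states 0..4
```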

Machine Learning Resources:

Deep Learning Specialization:

If you want to learn more about deep learning, you can start by completing the specialization offered by deeplearning.ai and reading the Hands-On book. From a data science standpoint, deep learning isn't as essential unless you're solving a computer vision or natural language processing problem.

Deep learning deserves a roadmap of its own. I'll create one soon with all of the key concepts.

This is simply a high-level summary of data science's vast scope. You may want to delve deeper into each of these subjects and develop a low-level, concept-based plan for each category.

Thank you for spending your valuable time on my article.
