Becoming a data scientist
This article outlines my journey from Academia to Data Science:
Why I left academia, and
How I became a data scientist.
A lot of the content here has come out of my numerous conversations with people who were curious why I decided to leave academia, and also wanted to know how I did it, and what advice I have in hindsight. This article contains a lot of links to resources that I think are very helpful in getting you started to "think like a data scientist" which in my opinion is the most important step of the transition. I hope that you find this useful.
Please feel free to leave comments about questions that you have and don't find here, about new resources that you think I should add, and of course any ideas about how to make things more clear and accessible.
Why did I leave academia?
Because it felt right to do so.
Here is a hypothetical conversation between me and the devil's advocate (inspired by many times I've had this conversation):
So, what did you like about academia?
Problem solving, ability to focus on cutting edge problems
That sounds awesome, what did you not like about it, then?
Well, I was enjoying what I was doing up to that point, but I decided that I didn't really want academia as a career. I didn't really like teaching or writing grant applications, and writing papers was OK, but I wasn't a big fan of that either. Also, the instability of doing many postdocs before getting a permanent job, followed by tenure-track pressure, didn't seem appealing (see The Future of the Postdoc). I understand that to achieve anything major you have to work very, very hard, and even if I want to have my own business I have to deal with the same sort of instability and pressure, but the reward of hard work looks much more appealing on the non-academic side.
Do you regret doing a PhD?
Not at all. I am the person I am today because of that experience. I enjoyed what I did; the fact that I didn't want to keep doing it as a career doesn't mean that it wasn't worth it.
But, you're not using your PhD in what you do!
I'm not using my PhD knowledge, but I'm using a fantastic set of skills that I developed during my PhD. Many of these skills I would have learned even had I not done a PhD, but some were unique to that particular experience. In fact, if I want to summarise the skills that I learned, beyond deep knowledge in a particular field, here is my list (thanks to those who brainstormed with me to prep this list):
HINT: writing a list like this for yourself is important, because it gives you an idea of where you stand and it helps you a lot when you want to rewrite/re-imagine your resume/skillset.
So, you think I should do a PhD?
It depends. If you're hoping that doing a PhD would help you land a better job or anything like that, then no. Go out and take a job now and in 6 years you'd be pretty awesome. But if you want to do it as a unique character-building experience, then go for it. Chances are that the skillset you build, and the way you learn to approach problem-solving, would look very appealing for a large number of jobs in the future.
How to become a data scientist?
I prepared myself for the transition for almost a year (on average 7-8 hours a week commitment). I had a decent (technical) resume, as many people do, but I lacked the business insight and experience, and importantly the right language to express the ideas I had.
So, I needed to learn to think like a data scientist
In my opinion, the first and foremost hurdle is to stop thinking like a physicist, chemist, biologist, software developer, ..., and start thinking like a data scientist. The good news is that if you are used to thinking about things scientifically, you have the thinking-like-a-data-scientist part mostly nailed down. Pretty much all you need is to re-calibrate what "good enough" means in data science and in your particular industry of interest. You also need to learn the vocabulary of the field. A lot of the time you know the concepts, just with different names, or something "doesn't feel right" but you don't know how to explain it, or you don't know how to talk about something without flooding the audience with technical jargon. So, you need to learn how data scientists talk about their work.
Another important thing is to know what data scientists experience every day, so that you know what parts of your prior experience are relevant and useful to you as a data scientist. This helps you rethink your skillset in the new context and find where you fit, what you have nailed down, and what you need to improve. This also helps when you're rewriting your resume.
So, how do you achieve this, you ask? The idea here is to immerse yourself in data science/machine learning related topics. The following is a list of resources I used (or later learned that I could have):
Talk to data scientists
Gee, do I really have to say this? People who are already doing what you want to do are the most valuable resource you can find. Figure out where they are, approach them, and see if they can share their experience. A lot of the time they might ignore you, especially if they don't know you, but who knows, they might turn out to be nice. You don't lose much if they ignore you, anyway. Maybe get intros through your common connections? I try to answer most people who approach me, even if it's just a simple "I'm not the right person to answer this", but in general people's response rates depend on how busy or interested they are. I bet if you send a message that is personal and sounds genuine, there's a good chance they'll respond. On the other hand, if your message essentially reads "hey, i don't give a darn about you or what you do, I'm just hoping that you might refer me" or alternatively, "copy-paste bs ... cliche bs ... cold ice ... brrrr", then well, it's all your own fault.
Oh, here's a hint:
don't ask them to refer you without warming up into it!
Ask them about their experience instead:
Guess what, that information is very critical when you're deciding where to work and how you fit there; trust me, they'll offer to refer you when/if they see your interest.
NOTE: notice that the questions I'm suggesting to ask are mostly anecdotal rather than asking their opinion. You want to know what their environment is really like rather than what they think it should be in principle.
This becomes extra important when you start looking at job ads and realise that they contain near zero information about what you'd be doing in that role, and when you realise that data science means many things to many people. So, you really need to find out what the role entails by asking the people who work there, the people interviewing you, etc., and asking them to give you examples. There's no way around this: you either have to get a job and potentially learn that it's not what you wanted or imagined, or talk to people and have a better idea of what to expect.
I understand that talking to people might not necessarily be your thing, so at least read or listen to them through blogs or podcasts.
Podcasts
You can convert commute time, your time at the gym, etc. to useful time by listening to these podcasts.
Blogs
You should keep up to date with the latest news and techniques to be aware of what's happening and what's available, and also you might be asked in interviews what your favorite blogs are.
Books
This is the good old way of learning, but there's a reason it's still around.
Meetups
Meetups are the best place to meet like-minded people, and also people who like to hire people who think like you.
These are the best platforms to meet people involved in the startup scene, though I've been noticing that big companies (aka corporates) are also realising the value of Meetups and are sponsoring more and more meetup groups. Here are my favorite ones in Toronto:
There are many more, but these are the ones that I either go to frequently, or have been to once or twice but really enjoyed it.
Skills
The most important skill you need to develop is the ability to work with data
This includes learning the tools you need to deal with data, but also learning to understand data. You need to know what issues you run into when working with data and what solutions there are out there to deal with them. You need to be able to explain what approach/tool/technique you'd use in a variety of scenarios. In order to achieve this you could take online courses like
or programming courses if that's what you need to improve. However, once you have taken one or two of these, the better approach is to do data projects. The important advantage of this is that you have things to talk about in your interviews, where you will often be asked to talk about projects that you have done.
See also: What IBM looks for in a data scientist
Soft Skills
You need to remember that most people you compete with for a role are most probably very technically capable. What could give you an advantage over everyone else is your soft skills. You need to demonstrate them throughout your resume and interviews.
Communication is one of the most important assets of a data scientist
You would often spend time with business people who have no idea what the fancy technical terms you're using mean, and your fascinating results mean nothing to them if you can't communicate those results to them. In interviews you would often be given a data scenario and asked to talk about the pipeline that you'd use to obtain the particular insight they want. You would be expected to brainstorm through the steps of the pipeline with the interviewer. This tests your technical ability, as well as your ability to work through problems and communicate with people (well, at least with the one who's interviewing you).
Resume
Your resume needs to be reformatted and tuned to the job you're applying for. Just a few quick tips (based on my own experience and what I see in the resumes people send me):
Decide what you wanna be when you grow up!
One of the problems that I ran into was the breadth of data-related positions in job ads; at the beginning it was quite overwhelming to figure out what's what and what I wanted to be. I could've been a data scientist, data analyst, data engineer, quantitative analyst, business intelligence engineer, and the list goes on and on. The biggest problem is that each company calls what you want a slightly different thing, and all those roles overlap in a nontrivial way. So you need to educate yourself about what those are, and be able to decipher the job ads to understand what the role actually is regardless of the title (this is especially important because sometimes companies call the role "data scientist" to make it sound sexier, but when you read the job ad carefully you see there's not really much science involved in what they want). The following is my take and a starting point; you need to come up with your own understanding of what each term means:
Also check out:
Data Science Bootcamps
Data science bootcamps are supposed to help you fill the gap between your academic training and industry experience requirements. This is usually done by helping/mentoring you in doing a data science related project, rewriting and formatting your resume, and helping you prep for interviews. Most importantly, data science bootcamps are fantastic networking opportunities where you get to know people who have done what you are trying to do, people who are in the same boat as you, and people who like to hire people like you. It is important, however, to remember that these bootcamps are businesses at the end of the day and YOU are their source of revenue. What they are trying to optimise is not always the same as what you are trying to achieve. Therefore, you need to try to formulate and clarify what you want for yourself out of this experience. These programs are fantastic opportunities to accelerate/facilitate your transition if you enter them prepared, and the other items on this list can help you get there. There are many expensive options but here are four 'free' ones that you might want to look into:
How big of a deal are bootcamps/certifications/courses?
My personal opinion is the following: bootcamps, certifications, and even courses are by themselves nothing valuable. I have been seeing an increasing number of people posting their certifications for all sorts of online courses and degrees online, and "feeling proud of them" as if they were so difficult to achieve. Every year 1000000 people do the Coursera machine learning courses and data science specialisations. Having those certificates does not make you unique. The same goes for bootcamps: the fact that you've spent a lot of money to obtain a certificate means nothing.
Is doing these things useful? Yes. Are they enough? No.
These things are good if they help you build a portfolio for yourself, if you just treat them as a work experience, if you have something in your skillset that you can point to and be like "this is what I learned while I did that" and if you can put it in the context of everything else you have to show and claim why you are unique. I think the mentality is kinda coming from finance and other fields where certification is a big deal. I have done a bootcamp myself, but I never included it in my resume or interviews as something that "hey, i have done this, so I'm automatically qualified", but rather talked about it as an "internship" or an opportunity to learn something new.
Job hunting strategy
I rarely applied to jobs online directly, because that is a waste of time. I used popular job posting websites like Glassdoor to find job postings, then used LinkedIn to find my connections who work for that company or know someone who does, and tried to reach out, grab a coffee with them, and get a referral (see Talk To A Data Scientist for the do's and don'ts of doing this). This worked very well for me. It didn't always convert, but getting interviews like this is a breeze.
Panel discussion on "Transitioning to Data Science"
Here is the video of a panel discussion I hosted. There is lots of good advice on what you will be asked in interviews and what you should ask about new roles/teams.