The Ultimate Interview Prep Guide for Data Scientists and Data Analysts
What helped me interview successfully with FANG as well as unicorns
Preparing for data scientist/data analyst interviews is time-consuming, but you can cut the prep time significantly if you have prior experience in the field and/or a list of the right resources for each topic that is likely to show up, so that you can focus your efforts. Even though data scientists and data analysts are different career paths in the data world (if you don’t know the differences yet, check out my post on the subject), there is a lot of overlap in the topics covered in their interviews.
In my previous post, the first part of my interview prep guide, I outlined the commonly tested areas that appear in most data-related job interviews (those questions are commonly asked in interviews for data scientists, data analysts, data engineers as well as technical product managers). In this post, I’m going to zoom in and focus specifically on the interviews for data scientists and data analysts; I have been interviewed by dozens of companies in the Valley for those roles and have compiled a list of useful resources along the way. If you are interviewing for a data scientist/data analyst position, make sure you brush up on these topics in addition to the ones I outlined in my previous post.
First things first, determine what topics to focus on
As I mentioned in my previous post, if you have limited time to prepare for your interviews, prioritization is important. The set of topics that will show up in your interviews (and their weights in the final decision) largely depends on the job description. If the job you applied for is mostly about building ML models, modeling skills will definitely be what interviewers focus on; if the job description mentions A/B tests and metrics analytics, then it shouldn’t be a surprise when statistical knowledge and A/B test design are weighted heavily in the judging criteria.
Statistical/Mathematical Interview
Common for: Data Scientist, Data Analyst
Don’t be discouraged by the mention of “math”. Nobody will ask you about derivatives, integrals, or matrix multiplications in data science interviews, as those are most likely not applicable in your day-to-day job. By “math”, I mostly mean knowledge of basic statistical concepts and basic probability theory.
Unfortunately, knowing the difference between mean and median won’t cut it for this part of the interview. Most interviewers will be testing your knowledge and understanding of things like the Law of Large Numbers, the Central Limit Theorem as well as your familiarity with Bayesian probability and calculation.
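To make these concepts concrete, here is a minimal sketch using only Python’s standard library. The numbers are made up for illustration (a 1% base rate, 95% sensitivity, and a 5% false positive rate), and the function names are hypothetical; it is not taken from any specific interview. The first function applies Bayes’ theorem, and the second draws repeated samples from a skewed distribution so you can watch the sample means settle around the true mean (Law of Large Numbers) with a spread that shrinks as the sample size grows (Central Limit Theorem).

```python
import random
import statistics


def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' theorem."""
    # Total probability of a positive result, with or without the condition.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def sample_means(n_samples, sample_size, rng):
    """Means of repeated samples from a skewed (exponential, mean 1.0) distribution."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical numbers: 1% base rate, 95% sensitivity, 5% false positives.
print(f"P(condition | positive) = {posterior(0.01, 0.95, 0.05):.3f}")  # 0.161

rng = random.Random(0)  # fixed seed so the run is repeatable
for size in (5, 50, 500):
    means = sample_means(1000, size, rng)
    # The average of the means stays near 1.0 while their spread
    # shrinks roughly like 1 / sqrt(size).
    print(size, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

The Bayes result is a common interview talking point: even with a fairly accurate test, a positive result on a rare condition only implies about a 16% chance of actually having it, because false positives from the large healthy group dominate.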
Brilliant (freemium model): a website recommended by FANG recruiters for statistical interview preparation. There are courses for various levels, so make sure you don’t go down a rabbit hole of statistical concepts. Knowing the fundamentals should be sufficient for most interviews.
MIT OpenCourseWare (free): outlines most fundamental statistical concepts in detail with in-class slides. It starts from the most fundamental concepts such as mean and variance and slowly builds up from there. However, if you majored in math or stats in school, these slides might be too fundamental for you.
Khan Academy (free): the Statistics and Probability course provides a detailed breakdown of fundamental concepts in statistics and probability; it also has practice questions to test your understanding of the topics.
Coding Interviews (Python, R, and others)
Common for: Data Scientist, some data analyst roles for smaller companies
In my last post, I covered the prepping material for SQL coding interviews. In addition to SQL, most of the data scientist roles would require a basic level of familiarity with at least one scripting language; the most common ones are Python and R.
In contrast to most SQL interviews, some interviewers will ask you to run your Python/R code. However, the interviewers’ goal is not to test your familiarity with a specific package or the exact syntax of a function; they usually care more about whether you know the basics of programming (for loops, while loops, etc.) and whether you can debug your code when it throws an error. So make sure you ask your interviewer whether they can give you a hint if you don’t remember whether the random() function is in the random package or the NumPy package (that was a trick question: Python’s built-in random module and NumPy’s numpy.random module both have a random() function).
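As a rough illustration of the level of coding this usually means, here is a short Python sketch with a for loop, a while loop, and an explicit error that is easy to debug. The function names are hypothetical and the exercise is invented for this example. The last few lines show the two random() functions side by side; the NumPy part is wrapped in a try/except on the assumption that NumPy may not be installed.

```python
import random


def running_mean(values):
    """Return the running mean after each element, using a plain for loop."""
    means = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means


def first_draw_above(threshold, rng, max_tries=1000):
    """Count draws from rng.random() until one exceeds threshold (while loop)."""
    tries = 0
    while tries < max_tries:
        tries += 1
        if rng.random() > threshold:
            return tries
    # Failing loudly with a clear message makes the problem easy to debug.
    raise RuntimeError(f"no draw above {threshold} in {max_tries} tries")


rng = random.Random(42)  # standard-library generator, seeded for repeatability
print(running_mean([2, 4, 6]))  # [2.0, 3.0, 4.0]
print(first_draw_above(0.9, rng))

# NumPy exposes a function with the same name under numpy.random.
try:
    import numpy as np

    print(np.random.random())  # a float in [0.0, 1.0), like random.random()
except ImportError:
    print("NumPy is not installed; random.random() still works")
```

In an interview, talking through why the while loop has a max_tries guard (so it cannot run forever) tends to matter more than remembering which module a function lives in.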
Udemy: there are a lot of Python and R courses to choose from on the platform, so it’s easy to find a course that aligns with your experience and the level of familiarity with the language of your choice.
Hackr.io has an article ranking top Python courses/resources you can find online; this is a good place to start if you are looking for free resources to learn Python.
Theoretical/Modeling/ML Knowledge Interview
Common for: Data Scientist, Applied Scientist
This part of the interview is not something you can easily pick up by watching several videos, so you should definitely budget more time to prepare for it if the job you applied to has a modeling component. Unless you are interviewing for a core data scientist role on an R&D team, you usually won’t be asked about things like NLP (natural language processing) or deep learning. However, you are expected to know the fundamental modeling concepts such as regression, classification, clustering, and more. There are two books I find useful for picking up those concepts if you don’t have a deep background or much experience with modeling.
Disclaimer: This section contains affiliate links, meaning I get a commission if you decide to make a purchase through the links below, at no cost to you. All the books recommended here are books I used and liked during my own interview prep. (The affiliate links are marked with “*”, and you can always bypass them by searching for the book names directly on Google.)
An Introduction to Statistical Learning*: this book covers the most common modeling topics with the assumption that the reader already has some statistics background and knowledge.
The Analytics Edge*: this book is written by the superstar professor Dimitris Bertsimas at MIT’s operations research department and it’s used as the textbook in MIT’s business analytics program. McKinsey also uses it as training material for its analytics bootcamp for data scientists. This book lays out the details of different models in the ML world and compares them, from the most fundamental linear regression to the more advanced machine learning models. You definitely can’t finish this book from cover to cover in a week, so instead of using it as a last-minute crash course, I would read through it in detail over a longer time span and flip through the important chapters before interviews as a refresher.
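As a small refresher on the most fundamental of the modeling concepts mentioned above, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. It is an illustration with made-up data and hypothetical function names, not an excerpt from either book; in practice you would more likely use a library, but being able to explain where the slope and R-squared come from is the kind of understanding interviewers probe.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")  # intercept = 0.05, slope = 1.99
print(f"R^2 = {r_squared(xs, ys, intercept, slope):.4f}")
```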
A/B Test Knowledge
Common for: Data Scientist
A lot of data science job descriptions mention A/B testing since it’s one of the most common analyses you need to do as a data scientist. Understanding A/B test design and analysis relies on an understanding of basic stats, so if you need a refresher on statistical concepts, definitely go through the resources for statistical/mathematical interviews before this part of the prep.
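To show how the basic stats feed into an A/B test analysis, here is a minimal sketch of a two-sided, two-proportion z-test using only Python’s standard library. The conversion counts are made up, the function name is hypothetical, and a real analysis would also cover things like sample size planning and test duration; treat it as one common way to compare two conversion rates, not the only one.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between two groups."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a z at least this extreme in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: 200 of 2,000 control users converted vs. 250 of 2,000 in treatment.
z, p = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With these made-up counts the p-value falls below the conventional 0.05 threshold, so you would reject the hypothesis that both versions convert at the same rate. Being able to explain what that statement does and does not mean is usually the real interview question.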
A/B testing course on Udacity (free): I used this course to brush up on my understanding of A/B testing before my interviews and it’s super effective. This course goes into great detail about A/B testing design, analysis, and caveats. The best part is that it’s led by Google engineering VPs who have abundant experience with A/B testing and can shed light on the topic with real-life examples.
Other Skills to Brush Up On
Communication skills: Besides the “hard” skills mentioned above, the single most important skill for data scientists, data analysts, or really anyone is communication. Most interviewers believe technical skills can be picked up on the job as long as you have a certain foundation of technical background/understanding, but it’s tough to teach communication skills to people who are super strong on the technical front but can’t express their thoughts well. To demonstrate your communication skills during the interview, it’s important to walk interviewers through your thinking process, especially in the live coding and case interview sections. Note that walking people through your thinking process doesn’t mean you should ramble; take a little time to structure your thoughts or code, and always try to adopt a top-down, structured communication style. The best way to practice this is to do mock interviews with a friend and walk them through a coding question by explaining what you did in the code.
Familiarity with cloud platforms/data warehouses: being familiar with Google Cloud Platform, Azure, or other cloud platforms is usually a plus since most companies nowadays use some kind of cloud platform for data storage. If you don’t know where to get hands-on experience with those platforms, there are countless certificates out there for them that you can pick up during your free time.
Familiarity with data visualization tools: most companies use Looker, Tableau, or the equivalent to visualize data. So familiarity with them means it will take you less time to ramp up when you are brought on board and that’s valuable to employers. Again, even if you can’t find opportunities to develop hands-on work experience with those tools, you can find certificates out there that can prove your competency.
Finally, I want to add that interview prep takes time and work. While it might occasionally feel like you are going down a rabbit hole with preparation materials, it’s important not to get frustrated during those times, and to leave enough time to go through the basics of all possible topics before spending all your time filling the knowledge gap for one topic.
After all, interview prep is just like the test prep you did for school: you can NEVER anticipate and prepare for every question in an interview (if you could, it would defeat the purpose of an interview, wouldn’t it?). When you encounter a question you have not prepared for, believe in your ability to think on your feet and work out the problem with all the materials you HAVE prepared.
I hope this series of articles is helpful for your interview preparation; feel free to comment and let me know if there are any other topics in the DS world you would like to know more about.