30 Data Scientist Interview Questions & Answers
Behavioral
1. Describe to me a data project you worked on in the past that you would do differently with the knowledge/experience you have acquired up to this point and/or new technology that was not available at the original time of the project.
How to Answer
This is a Behavioral question, which the interviewer will use to determine how you dealt with a specific issue in the past and how you would deal with it if you encountered it again while working for their organization. Behavioral questions are best answered by using the STAR format. STAR stands for Situation, Task, Action, and Results. Using this format will help you organize your answer and lead the interviewer through your story in a systematic manner.
1st Answer Example
"In my current position, we performed analysis on a large set of data to determine why the company didn't achieve the results they expected from a major initiative they had implemented (Situation). Management had a hypothesis and expected the analysis to confirm it (Task). We completed the analysis, but the data led us to reach a different conclusion (Action). My colleagues and I repeated the analysis but used a modified data set and different analytical tools, and this time, the results matched the original assumptions. However, when management implemented changes based on this study, the results were the same as before (Results). I learned to be true to the methodology and let the results speak for themselves so the problem can be accurately identified and addressed."
2nd Answer Example
"Experience has taught me that as hard as we try, we will still occasionally make mistakes. What's important is turning a failure into a learning opportunity. I once was assigned a project on which the data set was small and the anticipated results were easy to predict. I opted to utilize a simple approach and manual analysis methods instead of following protocol and using automated tools. I thought this would save time since the outcome was predictable. As I progressed through the analysis, it became apparent that the project was more complex than originally thought and the outcome was not as predictable. I had to restart the project, losing time and the resources I had already invested in the project. I learned to not cut corners and always follow protocol because eventually, I would have to anyway."
Behavioral
2. Data Scientists do a lot of exploring and testing of hypotheses. Tell me about a time you were given freedom to explore a business problem with very few parameters. What was your initial approach in attacking this project?
How to Answer
This is another question that is meant to help the interviewer determine your level of creativity and initiative. These are qualities not typically associated with Data Scientists but which are key to the results they produce. Organizations value employees who are willing to work independently with little supervision and are confident enough to be responsible for their actions. The best way to respond to this question is with confidence and a straightforward answer.
1st Answer Example
"One of the most rewarding parts of my job is the ability to work on a project with few parameters and the freedom to explore and experiment. I was recently assigned to determine why sales of one of our products had fallen off. I decided to approach the problem from two perspectives - those of the sales team and the customers. Using text analysis, data visualization, and other analytical techniques, then comparing the results from the two groups, I determined that the issue was one of communication. I recommended that the sales team modify their messaging and follow up with the customers in a more timely fashion. This helped to reverse the decline in sales."
2nd Answer Example
"Many people believe that Data Scientists are very structured, disciplined individuals with little creativity or imagination. In my experience, just the opposite is true. Successful Data Scientists know that many times the answers they are seeking lie outside of the parameters which define the project they are working on. They realize that they need to be open to exploring new ideas without imposing any restrictions on their thinking or the analysis they perform. While working on a project to help improve the process for onboarding new customers, I was given a set of parameters for the project. Instead of limiting my analysis to these, I took the initiative to explore what other issues were involved with bringing a new customer on board. I discovered six other processes that were not part of the original plan for the project. After careful analysis, it turned out that two of these were critical. By adjusting and improving these, along with some other changes, the onboarding process was accelerated and customer retention was increased."
Behavioral
3. Describe a time when you had to present findings/recommendations to a non-technical audience. What strategies did you use to ensure the audience did not get confused and clearly understood the message?
How to Answer
This is another example of a Behavioral Question, and you can use the STAR framework to organize your answer. The interviewer is interested in learning about your communication skills and style. Walk them through your answer systematically, describing how you communicate complex issues in a clear and non-technical manner.
1st Answer Example
"While Data Science is a complex and highly technical field, the people who use the information I provide are usually experts in other fields. Therefore, when I present my findings, I work hard to communicate them in terminology the audience is familiar with, focusing on the conclusions and recommendations rather than the data, statistics, and analysis methodology. This approach results in the organization attaining the business objective of the analysis. I also prepare myself to answer questions about the science behind the analysis if necessary."
2nd Answer Example
"It is easy to get distracted by the nuances of data analysis when presenting findings of a project I've worked on to the decision-makers from other parts of the organization. This is because I'm most comfortable with the data and the analysis and may not be familiar with the other operational aspects of the company. When preparing to present to a non-technical audience, I take the time to understand how my findings and recommendations will be used, the priorities of my listeners, and the desired outcome of the meeting. I then focus on communicating clearly, and concisely, using terminology I know the audience is familiar with and understands."
Behavioral
4. Describe a project where you had a surprisingly difficult time dealing with unstructured data. How did you overcome the obstacles and what tools did you use?
How to Answer
By asking this question, the interviewer is revealing that their organization deals with unstructured data and needs to hire someone who has experience with this and can organize this type of data to make it usable. Your answer needs to include a specific example of how you accomplished this in a previous role to prove your ability to do it again.
1st Answer Example
"Unstructured data is difficult, but not impossible to work with while performing an analysis. The key is to utilize tools and techniques designed to effectively analyze this type of data. On a recent project, I was tasked with helping the sales team improve its customer relationship management process. I utilized a combination of a NoSQL database and Amazon's Simple Storage Service (S3) to collect and analyze the data, which produced the results the sales team needed."
2nd Answer Example
"Unstructured data, which makes up a large share of what is called Big Data, is the fastest-growing type of data in the industry. This has presented Data Scientists with a challenge when attempting to use this type of data to analyze trends, processes and customer behavior. I have learned that special techniques and tools are required to incorporate unstructured data into the analysis I'm performing. These include looking for patterns, keywords, and sentiment in textual data, and using natural language processing technology. I recently worked on a project involving employee communications and attempting to improve individual productivity. Using the tools and techniques I previously mentioned, I was able to determine the specific behaviors the most productive employees exhibited. This enabled the management team to train and coach the other employees, thereby improving their productivity."
Behavioral
5. Many companies rely on Data Scientists to tell them what analysis is possible with the data available. Talk about a time when you took the initiative to recommend a new business measure for the company to track.
How to Answer
Data Science is a disciplined practice with little opportunity for creativity or changes. However, organizations like their employees to be able to take the initiative and innovate to improve processes, reduce costs, or increase the outcomes of the actions they take. When preparing for an interview, you should have a few stories available that demonstrate initiative and out-of-the-box thinking.
1st Answer Example
"During one of my previous jobs, I was on a team that focused on customer satisfaction scores. We determined this by analyzing data involving product returns, repeat purchases, customer referrals, and text analysis. While performing this work, I noticed a correlation between customer satisfaction and a specific feature of one of our products. I knew the feature couldn't be altered, but I thought that if it was highlighted in the product documentation and suggestions for its use were emphasized, the customers would use it more often. I recommended this to the product manager who implemented my idea. The result was an increase in the use of the feature and a corresponding reduction in customer complaints."
2nd Answer Example
"As a Data Scientist, my job is to analyze data sets and report the results. The other company stakeholders use this information to improve products, services, and processes or to make better business decisions. However, I like to look beyond the data and identify trends that are more subjective in nature. An example of this was when I was working on a project involving employee absenteeism at one of the company's manufacturing facilities. The data identified some factors contributing to the absenteeism patterns, but I sensed there was something else not indicated by our analysis. I took the initiative to informally interview a sample set of the employees and found that they had strong opinions about the food served in the cafeteria. I then added this factor to the study and confirmed that increases in absenteeism correlated with specific menu items."
General
6. To be a successful Data Scientist, many in the industry believe it is important to stay up-to-date on the newest technologies and methodologies. What new data-related technology/methodology have you heard of that you wish you could learn more about?
How to Answer
In many fields, people become complacent the longer they are in the field, and they tend not to stay abreast of developments in their field. However, Data Science is a relatively new and quickly developing industry, and Data Scientists need to stay abreast of emerging technologies and methodologies. You should be able to demonstrate your curiosity and efforts to stay up to date on developments in the industry. You also need to make sure that you can discuss the technologies you mention if the interviewer asks you any follow-up questions.
1st Answer Example
"Since our field is moving so quickly, it is very important to take the time to learn about new developments and technologies that will help me do my job better. Key trends I'm currently following include Augmented Analytics, Continuous Intelligence, Explainable AI, and Data Fabrics. The one technology I'm most interested in is Blockchain. While this is mostly associated with cryptocurrency, there are many other applications for blockchain technology, and the applications for this are still in their infancy."
2nd Answer Example
"One of the benefits of being a Data Scientist is the ability to learn something new every day. Our field is evolving so quickly that there is always a new technology or methodology to explore. The challenge is to not get distracted by the shiny objects one finds in the literature and on the internet, offering new and better ways to analyze data. Some of the technologies I'm currently exploring include Persistent Memory Servers, which offer better performance at a lower cost, and Graph Analytics, which allows for comparing the relationships between organizations, people and transactions."
General
7. The work of a Data Scientist can have a large impact on the strategy, and ultimately success, of a business. Is there a time you felt your work as a Data Scientist had a profound impact on strategy development? Explain your role and contribution.
How to Answer
Organizations generally hire workers for one reason - their ability to contribute to the attainment of the company's business objectives. This question is meant to determine if you understand the impact your role has on the organizations you work for. It will also help the interviewer learn about your contributions to your previous employers. You should provide specific examples and quantify the benefits.
1st Answer Example
"Data Science has had a profound impact on businesses and their decision-making process. This practice helps businesses make quicker and more accurate decisions, communicate their products' benefits better, encourage innovation, and explore new ideas. In my current role, I was involved in a prototyping project that employed data science methodology to quickly investigate new revisions of the software we were developing without having to take the time to write, debug, and test the code. This led us to determine the best path to take to develop the software our customers needed and would be willing to pay for and reduced the development cycle by over 50%."
2nd Answer Example
"Data Science is revolutionizing the way businesses develop strategies, create new products, make decisions and interact with their customers. Data Science and Machine Learning reduce product development cycles and accelerate the decision-making process. In my last job we were faced with a situation in which there were three distinct new business opportunities the company could pursue. Historically, analyzing, discussing, and choosing the option to pursue would have taken months, during which time the situation would have changed and the original data and assumptions may no longer have been valid. However, by using data science methodologies and machine learning technology, we were able to analyze each option and present our findings and recommendations to senior management within 4 weeks. The company adopted our suggestions and the new business unit was established and became profitable within its first year of operations."
General
8. When your job requires you to be immersed in data, you can discover some interesting patterns or trends. What is the most interesting discovery you made through the mining/exploration of data?
How to Answer
An interviewer may ask this type of question to learn more about you as a person rather than simply exploring your technical skills. People hire people, and one of the main purposes of an interview, besides confirming your qualifications for the job, is to determine if you would be a good fit for the organization. This question will provide some insight into both your personality and your ability to communicate your ideas.
1st Answer Example
"Being immersed in the data is one of my favorite parts of being a Data Scientist. I often discover things I didn't expect or wasn't even looking for. Once, while parsing data to determine how consumers go about shopping for a new car, I discovered that the most important criterion most people used was color. Respondents to the survey we used listed color more often than make, model, features, or even price. This enabled the organization to redesign their marketing campaign to emphasize the variety of color options available for their vehicles."
2nd Answer Example
"While people outside of the field of Data Science believe our profession is dry and mundane, I find it fascinating. It is called 'data mining' for a reason because buried in the piles of data we examine as part of our job are nuggets of gold. These are in the form of the results we predicted and are supported by the data, as well as discoveries we didn't expect. An example of this was learning that a process improvement our organization made on its manufacturing line not only solved the problem it was meant to address but also resulted in fewer defects in the final product. When we explored this in more detail, using data analytics, we discovered that the new process helped operators detect and remediate a chronic issue earlier, thereby eliminating the need for rework. This 'ah-ha moment' is one of the perks of the job I do."
General
9. In your past positions, have you had experience contributing to the improvement of data analysis processes, database management, data infrastructure, or anything along those lines? If so, please explain your contributions.
How to Answer
Keep in mind that organizations hire people to help them achieve their objectives. This question seeks to learn what past contributions you made in your previous positions and to explore whether these are similar to the challenges their company is facing. You should understand the issues the company routinely faces from your pre-interview research. Make sure your answer to this question aligns with the needs of the employer.
1st Answer Example
"In my most recent role, I introduced my team to new data visualization techniques to enable us to perform the data analysis faster and more accurately. It also enabled us to communicate our findings to the other business stakeholders in a manner they could easily understand and relate to."
2nd Answer Example
"One of the key objectives for any job I have had is continuous process improvement. Early on I realized that investing some of my time in pursuit of making the work I did more productive would benefit both me and the organization I worked for. An example of this was while working with my previous employer. After learning how they currently collected, processed, cleaned and analyzed their data, I researched different tools and methodologies that would improve the process. I developed a set of recommendations and presented them to senior management. They agreed with my assessment and moved forward to implement my suggestions. As a result, the time to complete an average data analysis project was reduced by 40% and the results were more accurate."
General
10. How have past positions unrelated to data analysis helped you in your current profession as a Data Scientist?
How to Answer
The purpose of this question is to gain a broader picture of your background and experience. In addition to being able to perform the tasks related to the job you are interviewing for, organizations prefer to hire individuals who can expand the role of the job to accomplish other organizational objectives. They are also interested in your fit and how you may improve the company culture.
1st Answer Example
"When I was young and still in school, I didn't have the goal of becoming a data scientist. However, I was naturally curious and enjoyed learning new things. I also liked solving problems, especially using numbers and sets of data. One of my first jobs was in a library where I had to reshelve books. I quickly learned that if I spent some time organizing the books before I began placing them on the shelves, I could reduce the amount of time the job took. I challenged myself to continually beat my previous times by devising new ways to organize the books and navigate through the library. I tracked this and charted my progress. This made a routine job more interesting and enjoyable."
2nd Answer Example
"Data Science involves identifying an issue, determining a methodology to analyze it, collecting data, performing the analysis, and interpreting the results. My first experience with this type of process was while working in a grocery store during high school. My job was to replace items that customers either returned or decided not to purchase while they were checking out. At first, I considered this to be a boring and repetitive job, but then I started to notice patterns in the products I was working with. I began keeping track of the products and looking for patterns. I found that many of the products not purchased or returned by customers had ambiguous pricing information. I shared my charts and conclusions with the store manager, who agreed with my results and took steps to clarify the pricing information. I continued to track the products and found that the changes we made resulted in fewer returns and non-purchases. This sparked my interest in data science and led me to this career."
Operational
11. Can you describe some of the steps you take to ensure that a regression model fits the data?
How to Answer
This is an example of a technical question. As a data scientist, you can anticipate that most of the questions you will be asked during a job interview will be technical. Technical questions should be answered succinctly and directly, with no embellishment. You should also anticipate that the interviewer will ask follow-up questions to learn more about the topic or clarify your answer.
Answer Example
"One of the key steps I take to ensure that the regression model fits the data is employing the R-squared methodology. This addresses the relative measure of fit. Another is to use the F-test to evaluate the null hypothesis that the model explains no more variance than an intercept-only model. One last methodology is the root-mean-square error, or RMSE, which provides the absolute measure of fit."
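To make these measures concrete, here is a minimal sketch in plain Python that fits a one-variable least-squares line and computes R-squared, RMSE, and the overall F-statistic. The function names and the sample numbers are illustrative assumptions, not part of the original answer, and a real project would more likely rely on a statistics library.

```python
import math


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_metrics(x, y):
    """Return R-squared, RMSE, and the F-statistic for a one-predictor fit."""
    slope, intercept = fit_simple_ols(x, y)
    n = len(x)
    mean_y = sum(y) / n
    predictions = [intercept + slope * xi for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot      # relative measure of fit
    rmse = math.sqrt(ss_res / n)         # absolute measure of fit, in units of y
    # One predictor: explained SS has 1 degree of freedom, residual SS has n - 2.
    f_stat = (ss_tot - ss_res) / (ss_res / (n - 2))
    return {"r_squared": r_squared, "rmse": rmse, "f_stat": f_stat}


# Hypothetical sample data, roughly following y = 2x.
sample_x = [1, 2, 3, 4, 5]
sample_y = [2.1, 3.9, 6.2, 7.8, 10.1]
metrics = fit_metrics(sample_x, sample_y)
print(metrics)
```

A large F-statistic argues against the null hypothesis that the slope is zero; turning it into a p-value requires the F distribution, which this sketch leaves out.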
Operational
12. As a Data Scientist, how do you employ statistics to analyze data and develop business recommendations?
How to Answer
Data Scientists use a variety of tools, statistics being one of the most used and commonly employed. An interviewer will ask this question early in the interview to set the stage, learn more about your skills and experience, and guide you toward other, more specific questions. Keep this in mind when responding to this question because it will provide you with the opportunity to move the interview in a direction that you are comfortable with and can easily address.
1st Answer Example
"Statistics are probably one of the strongest tools a Data Scientist has in their arsenal. They help us to identify patterns, find hidden insights, and quickly analyze large data sets. Statistics provide information about consumer behavior, interests, engagement, and other aspects of the shopping and purchase process. They also allow for the quick development of models that validate assumptions and inferences."
2nd Answer Example
"Of all the tools I use in the process of analyzing data, statistics is my favorite. This is the most mature methodology in the field of data science and there are a great many programs at our disposal. Statistical analysis is a straightforward way to identify trends, confirm a hypothesis, expose hidden insights and develop models business users need to make intelligent decisions. Statistics can be used to narrow the focus of an analysis and provide the users with the exact information they are looking for."
Operational
13. Do you perform data wrangling and data cleaning before applying machine learning algorithms to your data analysis?
How to Answer
This is an operational question. The interviewer will ask operational questions to learn more about how you go about doing your job. One of the key responsibilities of a data scientist is to ensure that the data sets they are using are appropriate for the analysis they are performing. Data wrangling and cleaning are two processes used to accomplish this. You should be familiar with these and able to explain them. As with any operational question, keep your answer direct and to the point and anticipate a follow-up question or two.
Answer Example
"I believe that it is important to perform both data wrangling and cleaning before applying any machine learning algorithms. This ensures that the data set is appropriate and is the one I intended to work with for my analysis, that the standard deviations meet the study guidelines, that the relationships between the data are valid, and that the data is normalized and standardized. It also lets me identify and address any outliers or variables that would potentially skew the results I obtain."
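As a rough illustration of what cleaning and standardizing can involve, the sketch below drops rows with missing values, removes exact duplicates, and converts a numeric column to z-scores using only the Python standard library. The row layout, field names, and helper names are hypothetical, chosen just for this example.

```python
import statistics


def clean_rows(rows, required):
    """Drop rows with missing required fields, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(row.get(key) is None for key in required):
            continue  # incomplete record
        fingerprint = tuple(row[key] for key in required)
        if fingerprint in seen:
            continue  # duplicate of a record already kept
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize a constant column")
    return [(value - mean) / spread for value in values]


# Hypothetical raw records: one missing amount and one duplicate.
raw_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 10.0},
    {"id": 3, "amount": 14.0},
]
cleaned_rows = clean_rows(raw_rows, required=("id", "amount"))
z_scores = standardize([row["amount"] for row in cleaned_rows])
print(cleaned_rows, z_scores)
```

Whether to drop, impute, or flag problem records depends on the study, so treat the drop-everything policy here as one simple choice among several.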
Operational
14. Can you describe how Data Analysis is used by businesses and other organizations?
How to Answer
While this appears to be another Technical Question, it is more of a General Question. The interviewer is likely to ask this early in the interview to establish a conversational tone for the interview and develop some avenues to follow-up questions. As with any interview question, your answer should relate to the company's operations and how you believe they use data analytics to run their business. You can usually determine this from the information provided on their website and in the job posting.
1st Answer Example
"As a Data Scientist, I've come across many examples of how businesses use data analysis to improve the results of their operations. For example, eCommerce firms can use data analysis to understand customer behavior, reduce churn, and better target their marketing. Financial organizations use it to evaluate investment opportunities and detect fraud. Healthcare companies employ data analysis to develop treatments for specific groups of patients."
2nd Answer Example
"Data analytics is one of the most significant developments in making businesses and other organizations more efficient and effective. The insight data science provides helps virtually every organization to improve its operations through a better focus on outcomes and more targeted information used for intelligent decision making. Examples of this include search engines ranking pages depending on the specific interests of the user, and social media filtering information which the user is not interested in. Another use of data analytics is in robotics, which uses machine learning to handle new situations not previously encountered. Finally, businesses can extract information from large and unstructured sets of data which can then be used to develop products and target their marketing."
Anonymous Interview Answers with Professional Feedback
Anonymous Answer
"In simple words, any piece of data that could be used to make a business decision is potentially within the spectrum of a data analysis."
Operational
15. In your opinion, is mean square error a good or bad measure of model performance?
How to Answer
As with any career or profession, the techniques and methodologies used may differ between individuals. Interviewers will ask you questions similar to this one to learn more about your expertise and how you go about doing your job. They also are interested in your opinion on certain topics, some of which may be controversial. When you answer this type of question, it is best to give them your honest opinion. Trying to please the interviewer may work during the interview, but it may cause issues once you are hired.
Answer Example
"I believe that the mean square error, or MSE, is a flawed measure of a decision model's performance. The problem with using MSE for this purpose is that it squares each error, so it weights larger errors more heavily than smaller ones. This places extra emphasis on the large deviations in the data. I prefer to use mean absolute error, or MAE, which is a more robust metric and is less sensitive to outliers when measuring a model's performance."
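The difference between the two metrics is easy to see on a small, made-up example. The sketch below computes MSE and MAE for two sets of predictions that differ only in one large miss; the numbers and function names are assumptions for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of squared errors; large misses dominate."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of absolute errors; every unit of error counts equally."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values: both prediction sets miss by 1 on the first three
# points, but the second set misses by 10 on the last point.
actual = [10, 12, 11, 13]
steady_predictions = [11, 11, 12, 12]
one_big_miss = [11, 11, 12, 23]

for name, predicted in [("steady", steady_predictions), ("one big miss", one_big_miss)]:
    mse = mean_squared_error(actual, predicted)
    mae = mean_absolute_error(actual, predicted)
    print(f"{name}: MSE={mse}, MAE={mae}")
```

A single large miss raises MSE far more than MAE, which is the behavior the answer describes. Whether that sensitivity is a flaw or a feature depends on how costly large errors are for the business problem.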
Operational
16. What are some of the assumptions required to accurately perform a linear regression analysis?
How to Answer
As a data scientist, you use many different methodologies to analyze the data sets you are working with. Often these methodologies involve several steps, assumptions, or other items. A common practice during an interview is providing the interviewer with a list of items in your answer. Organize your answer and make sure that none of the items are repeated in your answer.
Answer Example
"There are several assumptions that I use when performing a linear regression analysis. Some of these include:
o Ensuring that the data I use in the sample is representative of the population
o Determining if the variance of the residual is the same for any value of X
o Confirming that the relationship between X and the mean of Y is linear
o Reviewing the observations to ensure that they are unique and independent of each other"
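Two of the assumptions listed above, constant residual variance and independent observations, can be screened with simple residual diagnostics. The sketch below is one possible approach in plain Python: a variance ratio between the high-X and low-X halves of the residuals, and the Durbin-Watson statistic for first-order autocorrelation. The helper names and the sample residuals are hypothetical, and these checks are rough screens, not formal tests.

```python
import statistics


def variance_ratio(x, residuals):
    """Compare residual variance in the upper half of X to the lower half.

    A ratio far from 1 hints that the variance is not constant across X.
    """
    ordered = [residual for _, residual in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]
    return statistics.pvariance(high) / statistics.pvariance(low)


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(residual ** 2 for residual in residuals)
    return numerator / denominator


# Hypothetical residuals whose spread grows with X.
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
print(variance_ratio(x_values, residuals))
print(durbin_watson(residuals))
```

Linearity and representativeness are usually judged by plotting residuals against fitted values and by reviewing how the sample was collected, which code alone cannot settle.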
Operational
17. Do you follow the hypothesis that many small decision trees are more accurate than one large one?
How to Answer
This is an example of a follow-up question that the interviewer asks to expand on a previous question you answered. You should always anticipate follow-up questions during an interview. Keeping your answers short and to the point will encourage the interviewer to ask follow-up questions before moving on to the next question. This is beneficial if you have in-depth knowledge about a topic and want to spend some time on it.
Answer Example
"No. I believe just the opposite. The larger a decision tree is, the more accurate it is as a decision process model. Small decision trees can cause problems because the options are limited, and the model may not fit the problem. If possible, I would create a model that looks more like a forest than a tree with multiple options and a distinct direction to help you navigate the woodland."
Operational
18. How do you deal with an unbalanced binary classification when analyzing a data set?
How to Answer
This is yet another operational question asking you about how you react to a specific situation that may occur during a data analysis exercise. You should be able to answer this question easily as an experienced data scientist. Your answer should address the use of metrics and how they impact the analysis. Knowing this and explaining it to the interviewer will reinforce your qualifications for a data scientist's position.
Answer Example
"The easiest way to address an unbalanced binary classification is to review the metrics you are using in your model. A metric such as overall accuracy can look strong while hiding poor performance on the minority class, so it may give a skewed picture of the results. Another way you can neutralize this issue is to increase the penalty the model incurs for misclassifying minority class data. This results in a superior model, which produces more accurate results. Another solution is to oversample some of the minority class data or under-sample some of the majority class data, which will balance the binary classification."
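A short sketch can show why the choice of metric matters and what random oversampling looks like. The example below uses only the standard library; the 95/5 class split, the always-predict-majority "model", and the helper names are assumptions made for illustration.

```python
import random


def accuracy(actual, predicted):
    """Share of labels predicted correctly."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision_recall(actual, predicted, positive=1):
    """Precision and recall for the positive (minority) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def oversample_minority(rows, labels, minority=1, seed=0):
    """Randomly duplicate minority rows until both classes have equal counts."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(rows, labels) if label == minority]
    majority_count = sum(1 for label in labels if label != minority)
    extra = max(majority_count - len(minority_rows), 0)
    sampled = [rng.choice(minority_rows) for _ in range(extra)]
    return rows + sampled, labels + [minority] * len(sampled)


# Hypothetical data: 95 negatives, 5 positives, and a model that always says 0.
labels = [0] * 95 + [1] * 5
rows = [[float(i)] for i in range(100)]
always_negative = [0] * 100

print(accuracy(labels, always_negative))          # looks good on paper
print(precision_recall(labels, always_negative))  # but finds no positives
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))
```

Oversampling should be applied only to the training split; duplicating minority rows before splitting leaks copies into the evaluation data.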
Operational
19. Data visualization is an important skill that will be used often when communicating results with stakeholders. Describe to me one of your most innovative data visualization ideas that went beyond pie and bar charts.
How to Answer
Data Science and analysis of complex data sets is a very technical discipline. However, the organization's stakeholders who use the results of the analysis need to be able to clearly understand what the data is telling them and use it to improve their operations and help them make business decisions. You need to be able to present your work in a manner that is easy to understand and utilize. This is known as data visualization. The interviewer is seeking to understand how you organize and present the data to accomplish this objective.
1st Answer Example
"While the process of analyzing data is important, the results of the analysis must be useful to the stakeholders in the organization. It is important to understand what the stakeholders' objectives are when deciding how to present my results. One method I've found to be effective is to add graphs, pictures, and illustrations to my presentations and reports. Once, when presenting an analysis of customer usage trends for a product our organization sold, I incorporated images of the product and animated them to expand in size in relation to the growth in customer adoption, adding the statistics and actual growth numbers. The audience remarked how clear this made the information."
2nd Answer Example
"I am always cognizant of how my data analysis will be used by the stakeholders in our organization and seek to present my results in a manner that is relevant to the audience and in terms they can easily understand. I often use visual representations of the data along with the actual numbers to achieve this. Recently, when presenting a study on the implementation of process improvement and the results it generated, I used a photo of the actual production line, and animated it, increasing the speed of the line as the process was improved and the production times were reduced. I included the actual percentages of the time saved above the line so the management team could easily see the results which were achieved."
Operational
20. Describe to me your experience with machine learning methods. Is there a particular method you have more experience with than others?
How to Answer
The purpose of this question is to explore your knowledge and experience with machine learning. The interviewer may want to confirm not only your skills in this area but also your direct knowledge of the machine learning methodologies their company utilizes. Be prepared to provide a concrete example and the rationale behind using the methodologies you chose.
1st Answer Example
"Much of my experience with machine learning is in the area of medical imaging. Our team employed machine learning methodologies including classification, clustering, and regression analysis to help improve the accuracy of the assessment of the images our equipment produced."
2nd Answer Example
"Since most of my work was focused on filtering spam within the email application we developed, I utilized the Classification methodology of machine learning, specifically the NaiveBayes algorithm. This allowed us to address the large data set we used for training and the multiple attributes we filtered for. "
Technical
21. Can you discuss some of the weaknesses of a linear analysis model?
How to Answer
The interviewer is asking another technical question, but this time in a back-handed manner. By asking you to discuss the negative aspects of the topic, they anticipate that you will demonstrate your knowledge of the topic by providing both the negative and positive aspects. Be sure to stay as positive as possible when you answer this question. Being negative will reflect badly on you, even though you were asked to discuss the shortcomings.
Answer Example
"While it is an effective methodology, there are also several drawbacks to a linear analysis model. One issue is that the linear analysis model makes strong assumptions that may not apply to the application used. Also, there is an assumption of a linear relationship, normality between the variables, minimal multicollinearity, and homoscedasticity. Finally, a linear model cannot be used for discrete or binary outcomes. As long as you avoid these pitfalls, the linear analysis model can be very useful to a data scientist."
Technical
22. What are some of the differences between a histogram and a box plot?
How to Answer
The interviewer is asking a technical question, which requires you to compare different types of visual models used to analyze data. Knowing the differences between two related but distinct techniques used to illustrate the results of data analysis will help confirm that you are qualified for this role. Technical questions like this are best answered by comparing the terms presented by the interviewer and possibly providing an example of how they are used in your profession. Your answer should also be brief and to the point to provide the interviewer the opportunity to ask a follow-up question.
Answer Example
"Boxplots and histograms are both visualizations used to illustrate a data distribution that communicates information in different ways. Histograms are bar charts that illustrate the frequency of a numerical variable's values. I use this to understand the shape of the distribution, the variation, and any potential outliers that may skew the data. While boxplots don't illustrate the shape of the distribution, they enable you to view information such as the quartiles, the range, and outliers. I believe boxplots are more useful than histograms when comparing multiple charts."
Technical
23. What statistical software programs do you have experience using in past positions in this field? Which one do you have the most experience with or feel the most confident using?
How to Answer
This is a technical question, the purpose of which is to determine your familiarity with and knowledge of the software used by data scientists. The interviewer is also interested in learning if you are adept at working with the software and tools their organization utilizes. The best way to answer this question is to first state the names of the software you have used and are familiar with, and then explain that most statistical analysis software products have similar features and that you've been able to easily transition from one to another when necessary.
1st Answer Example
"In my current position, we use Tableau for the majority of our work. We also have licenses for Statgraphics and JMP Statistical Software, but these are only used in circumstances in which their unique features are more suited for the task at hand. I've also used Salesforce Analytics Cloud and MATLAB in previous roles. I've found that transitioning to a new software analytics tool is relatively easy due to the similarity in the features and user interface between the different packages."
2nd Answer Example
"The primary software tools I use for statistical analysis are Tableau, Statgraphics, and JMP. Each of these performs similar functions, but each also offers unique features that make specific types of analysis easier and more accurate. My experience has taught me that once you are familiar with the basic functions of statistical analysis software, you can move between different tools by simply learning the user interface and the functions each one offers. I'm curious about which tools your organization prefers and why."
Technical
24. Can you define cross-validation and describe how you use this process when analyzing a data set?
How to Answer
This is a technical question asking for both the definition of the term and an explanation of how you use it in your work as a data scientist. During an interview, you should make sure you always listen carefully to the complete question. Many candidates will begin formulating their answers as soon as the interviewer begins asking the question. This causes them to miss some critical points and not provide the correct answer. A useful technique to counter this is to pause for two seconds before beginning to answer the interviewer's question. This also ensures you will not 'step on' the interviewer when they are still talking, which is a critical mistake during an interview (or any conversation).
Answer Example
"As a data scientist, I use cross-validation to assess how well the analysis model I am using will perform on a new and independent dataset. A typical way to use cross-validation is to split the data into two sets. You then use one data set to build the model and the second one to test your analysis. This helps to improve the accuracy of and my trust in the results of the analysis."
Technical
25. What is a decision tree, and how do you use this in your job as a data scientist?
How to Answer
Data scientists use various tools and methodologies in their work to accomplish their tasks and ensure the results they produce are accurate. You should be able to discuss these with authority during your interview for a data scientist job. You may have also recognized this as a technical question. This is a typical structure of a technical question - asking you to define a term and provide an example of how it is used in your profession. Your answer should address the definition first, then provide an example of how you would use this item in your job.
Answer Example
"Decision trees are graphical models used to illustrate the options available and the choices made during a decision process. A decision tree is intuitive and easy to build. However, it lacks accuracy, so it should only be used to illustrate the process. Like a tree, it begins with a base, or trunk, and then grows. Each decision option is called a node. The last decision options are at the top of the tree and are known as leaves."
Technical
26. What is Data Cleansing and why is it important in Data Analysis?
How to Answer
Technical questions like this are straightforward ways for the interviewer to explore and confirm your technical competencies related to the position for which they are interviewing you. Your preparation for an interview should include researching and practicing technical questions in addition to general and behavioral questions. Always answer technical questions succinctly without embellishment or additional information.
1st Answer Example
"Data cleansing is the process of ensuring that data obtained from a wide variety of sources is suitable for analysis. It involves a high-level review of the data set, detection of any anomalies or inaccuracies, and correcting these to ensure the data is correct and accurate. It can also be used to eliminate components of the data that are irrelevant to the analysis being performed."
2nd Answer Example
"It is always a good idea to cleans data before analyzing it. This involves reviewing the data for inaccuracies, irrelevant information or other items that will skew the analysis and result in conclusions that are incorrect or not usable. When performing a data cleansing operation, the Data Scientist looks for outliers or information that doesn't fit the pattern of the majority of the data. Inaccuracies are corrected and information not relevant to the analysis being performed is removed."
Anonymous Interview Answers with Professional Feedback
Anonymous Answer
Data cleaning involves a high-level review of the data set to look for missing and inconsistent data, outliers, noise, and inaccuracies, and correcting them to make sure the data is accurate. For me, data cleansing is the most crucial step in data analysis, and I spend much of my analysis time on it because it is the foundation of the whole analysis. As I read somewhere and also believe, "Garbage data gives garbage results; good data gives good results."
Technical
27. Can you compare SAS, R, and Python programming tools and describe their use in Data Analytics?
How to Answer
This is a Technical Question that seeks to determine your technical capabilities and knowledge of the common tools used by Data Scientists. By specifying these tools, the interviewer is indicating that these are what their organization uses and expects you to be competent in. You should be able to compare these and state their purpose in analyzing data, even if you don't regularly use them.
1st Answer Example
"Sas, R, and Python are probably the most commonly used tools for data analytics. Sas has a wide array of functions, a user-friendly graphical interface, and strong reporting features. R's strength is that it is an open-sourced tool and is widely used in academic and research environments. Python is also an open-sourced product but is more widely used and supported. It is easy to learn and interfaces well with other tools. The best part about Python is its large portfolio of libraries and modules."
2nd Answer Example
"While there are many data analytics tools available, Sas, R and Python are probably the most popular and widely used. Of these, I prefer Python. This is due to its large number of user-created libraries and modules, its ease of use, and its robustness in areas such as statistical operations and model building. R is also open-sourced but is more popular with the academic and scientific community. Sas is by far the most widely used data analytics tool and has an easy-to-use graphic interface and probably the strongest statistical functions. Sas' only drawback is its licensing cost, which can be prohibitive for smaller organizations."
Technical
28. What experience do you have conducting text analytics? Describe a project you worked on that required text analytics.
How to Answer
This is another technical question which the interviewer will ask to confirm your skills and experience as a Data Scientist. They want to ensure that you are qualified for the job and are familiar with a specific process that they use to analyze data to improve the results of their operations.
1st Answer Example
"Text Analytics is the process of creating meaning out of written communications. A common usage of this is in a customer experience context - examining text that was written by, to, or about customers. This helps find patterns and topics of interest and then enables the organization to take action based on what it learns. While working on a recent project involving the review of a service our organization provides, I examined the email communications between our support team and the customers. My analysis identified a specific issue that customers inquired about frequently. We then reviewed the documentation related to this and realized that it was vague and somewhat confusing. After updating the information and performing subsequent text analysis, I confirmed that the number of customer inquiries about this issue had dropped by 70%."
2nd Answer Example
"Text analysis is the process of examining written communications to discover trends or create meaning. It involves sentiment analysis, key phrase extraction, and entity recognition. It usually is used with customer communications to improve a product or service. While working on a recent project to determine which of our company's services customers found the easiest to use, I analyzed email exchanges between the sales team and their customers. The results revealed that while customer surveys indicated that one service was the most popular, the email exchanges found that the customers actually enjoyed using another service more. This enabled the company to adopt some of the features of this service to improve customer satisfaction with the other ones we offered."
Technical
29. What data visualization tools do you have experience using? Which one is your favorite to use and why?
How to Answer
This question is similar to the one about statistical software programs in that it is attempting to discover your technical knowledge and your familiarity with the tools the company you are interviewing with uses. Again, the best response is a direct one, stating your knowledge of software tools, your preference, and the reason for your opinions.
1st Answer Example
"The data visualization tools I use include Google Charts, Tableau, Grafana, Chartist.js, FusionCharts, Datawrapper, Infogram, ChartBlocks, and D3.js. I prefer Tableau because it offers a variety of visualization styles, is easy to use, and can handle large data sets. I also like Tableau because their help desk is very responsive and open to suggestions from the user community. It isn't a true open-source software, but the product is continuously being improved by the developers and the company's customers."
2nd Answer Example
"There are many data visualization tools available and I've used quite a few of them. They include Google Charts, Tableau, Grafana, Chartist.js, FusionCharts, Datawrapper, Infogram, ChartBlocks, and D3.js. My favorite is Google Charts. This is because it is easy to use, robust, widely accepted and the licensing fees are low. Another reason I prefer Google Charts is that it can be customized to make it better suited to the specific project I'm working on. Since most of the work I do is similar, once I have tailored the tool to my needs I don't have to reconfigure it each time I use it."
Technical
30. What programming languages do you have experience using? Of these, which do you have the most experience with? Which do you have the least experience with?
How to Answer
This is another technical question geared toward determining your hard skills and qualifications for the data scientist position. This is relatively easy to answer since you need these skills and experience to work in this industry. Your answer should be honest, straightforward, and brief. This 'Tick the Box' question is required, but likely won't differentiate you from other candidates.
1st Answer Example
"I'm experienced in the majority of the major programming languages including Python, R, Java SQL, C++, and Scala. Of these, I prefer Python due to its applicability to a wide variety of tasks and its general acceptance in the data science community. I also like the fact that it is an open-source language and the syntax is easy to understand."
2nd Answer Example
"While Python is my favorite programing language to work with, I'm also skilled in R, Java, SQL, C++, Tableau and most of the other popular languages. While each of these has its merits, Python offers the most flexibility, is easy to learn and understand, and is popular among the data science community. Another language that is becoming ubiquitous in the field is RapidMiner. I've recently learned this and I'm starting to use it more due to its robust features."