Last updated October 19, 2019
10 Most Frequently Asked Questions In Data Science Interview

Published on May 2, 2019

by Ambika Choudhury

Data Science, Machine Learning and Artificial Intelligence are broad fields, and a candidate needs a firm grasp of the core concepts in each. In this article, we jot down the 10 most frequently asked questions in a data science interview.

1| What is regularisation? Explain L1 and L2 regularisation.

Regularisation is a mathematical technique for reducing overfitting. It modifies a learning algorithm, usually by adding a penalty on the size of the model's weights to the loss function, so that the algorithm favours “simpler” prediction rules. This helps control model complexity, so that the model predicts better on unseen data.

L1 regularisation is also known as Lasso. It penalises the L1 norm of the weights, that is, the sum of their absolute values, and under this penalty some weights are shrunk to exactly zero. L1 regularisation therefore performs feature selection: insignificant input features receive zero weight and useful features keep non-zero weights.

On the other hand, L2 regularisation, or Ridge regularisation, penalises the sum of the squared weights and so spreads the shrinkage among all the features. It forces the weights to be small but does not make them exactly zero, so the solution is non-sparse. With a squared-error loss it is also not robust to outliers: the square terms blow up the errors of the outlying points, and the regularisation term can only limit the damage by penalising the weights. Ridge regression performs better when all the input features influence the output and their weights are of roughly equal size.
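The contrast between the two penalties can be sketched in a few lines of Python. This is a minimal illustration, assuming the special case of an orthonormal design matrix, where both penalised solutions have a closed form in terms of the unpenalised least-squares weights. The weights, the penalty strength `lam` and the function names are made up for the example.

```python
# Assumption: orthonormal design matrix (X^T X = I), so each weight can be
# treated independently. The starting weights below are illustrative only.

def lasso_shrink(weights, lam):
    """Minimiser of 0.5*||y - Xw||^2 + lam*||w||_1 (soft-thresholding)."""
    shrunk = []
    for w in weights:
        if w > lam:
            shrunk.append(w - lam)
        elif w < -lam:
            shrunk.append(w + lam)
        else:
            # Weights smaller in magnitude than lam are set exactly to zero.
            shrunk.append(0.0)
    return shrunk


def ridge_shrink(weights, lam):
    """Minimiser of 0.5*||y - Xw||^2 + 0.5*lam*||w||^2 (uniform scaling)."""
    return [w / (1.0 + lam) for w in weights]


ols_weights = [3.0, -0.4, 0.1, 1.5]  # hypothetical unpenalised weights
lam = 0.5                            # hypothetical penalty strength

lasso_weights = lasso_shrink(ols_weights, lam)
ridge_weights = ridge_shrink(ols_weights, lam)

print("Lasso:", lasso_weights)
print("Ridge:", ridge_weights)
```

In this sketch the two small weights become exactly zero under the L1 penalty, while the L2 penalty scales every weight down by the same factor and leaves all of them non-zero.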

2| How does Data Science differ from Big Data and Data Analytics?

Data Science is a field that brings together tools and algorithms for extracting useful insights from raw data. It covers data modelling and related tasks such as data cleansing, preprocessing and analysis. Big Data refers to the enormous volumes of structured, semi-structured and unstructured data generated through various channels and organisations. Data Analytics focuses on providing operational insights into complex business situations, and it is also used to predict upcoming opportunities that the organisation can exploit.

3| How do Data Scientists use statistics?

Statistics plays a powerful role in Data Science. It is one of the most important disciplines, providing tools and methods to find structure in data and to give deeper insight into it. It has a major influence on data acquisition, exploration, analysis, validation, etc.

4| Why is data cleansing important?

Data cleansing is a process in which you go through all of the data within a database and either remove or update information that is incomplete, incorrect, improperly formatted, duplicated, or irrelevant. It usually involves cleaning up data compiled in one area. In the case of an organisation, data cleansing is important because it improves data quality and, in doing so, increases overall productivity.
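A cleansing pass of this kind can be sketched as follows. The records, the field names and the rules (in particular the crude "contains @" check for a well-formed email) are hypothetical and chosen only to show one case each of incomplete, improperly formatted and duplicated data.

```python
# Hypothetical raw records; every value here is made up for illustration.
raw_records = [
    {"name": " Alice ", "email": "ALICE@example.com"},
    {"name": "Bob", "email": "bob@example.com"},
    {"name": "alice", "email": "alice@example.com "},  # duplicate of row 1
    {"name": "Carol", "email": ""},                    # incomplete
    {"name": "Dave", "email": "dave-at-example.com"},  # improperly formatted
]


def clean(records):
    """Normalise formatting, then drop incomplete, malformed and duplicate rows."""
    cleaned = []
    seen_emails = set()
    for record in records:
        # Update: fix whitespace and letter case.
        name = record.get("name", "").strip().title()
        email = record.get("email", "").strip().lower()
        # Remove: rows with a missing field.
        if not name or not email:
            continue
        # Remove: rows whose email fails a deliberately simple format check.
        if "@" not in email:
            continue
        # Remove: rows whose email has already been seen.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


clean_records = clean(raw_records)
print(clean_records)
```

Normalising before deduplicating matters in this sketch: the first and third rows only compare as equal once whitespace and letter case have been fixed.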

5| What are Linear and Logistic Regression?

Linear regression models a continuous dependent variable. Simple Linear Regression uses a single independent variable, while Multiple Linear Regression uses several. The outcome (dependent variable) is continuous and can take any one of an infinite number of possible values, for instance weight, height or number of hours. Simple linear regression gives an equation of the form Y = mX + C, that is, an equation of degree 1.

In logistic regression, by contrast, the outcome (dependent variable) has only a limited number of possible values. Logistic regression is used when the response variable is categorical in nature, for instance yes/no, true/false or red/green/blue. For a binary outcome, the method gives an equation of the form P(Y = 1) = 1 / (1 + e^-(mX + C)), which maps the linear term mX + C to a probability between 0 and 1.
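The two kinds of output can be sketched in Python. The first function fits Y = mX + C by ordinary least squares on made-up data. The second applies the standard logistic function 1 / (1 + e^-z) to a linear term; its coefficients are hypothetical rather than fitted, so this illustrates only the shape of the model, not a full logistic regression fit.

```python
import math


def fit_simple_linear(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance
    c = mean_y - m * mean_x
    return m, c


def logistic(z):
    """Standard logistic function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Made-up data: a continuous response that grows linearly with the input.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [3.0, 5.0, 7.0, 9.0]
m, c = fit_simple_linear(hours, scores)
print("linear model: Y =", m, "* X +", c)

# Hypothetical (not fitted) coefficients for a yes/no outcome.
slope, intercept = 1.2, -3.0
pass_probability = logistic(slope * 3.0 + intercept)
print("P(yes | X = 3):", pass_probability)
```

The linear model returns an unbounded continuous value, whereas the logistic output is always a probability that can be thresholded (for example at 0.5) to pick a category.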

6| What is Normal Distribution?

The Normal Distribution is a very common distribution; in statistics it is also known as the Gaussian distribution. A normal distribution has the following characteristics: the mean, median and mode of the distribution coincide; the curve of the distribution is bell-shaped and symmetrical about the line x=μ; the total area under the curve is 1; and exactly half of the values lie to the left of the centre and the other half to the right.
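These properties can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean of 170 and standard deviation of 10 are arbitrary values chosen for the sketch.

```python
import math
from statistics import NormalDist

# Arbitrary illustrative parameters: mu = 170, sigma = 10.
heights = NormalDist(mu=170.0, sigma=10.0)

# Mean, median and mode coincide.
centre_values = (heights.mean, heights.median, heights.mode)

# The curve is symmetrical about x = mu: equal density at mu - 10 and mu + 10.
is_symmetric = math.isclose(heights.pdf(160.0), heights.pdf(180.0))

# Half of the probability mass lies to the left of the centre.
left_half = heights.cdf(heights.mean)

print(centre_values, is_symmetric, left_half)
```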

7| Difference between Interpolation and Extrapolation

Extrapolation and interpolation are both used to estimate hypothetical values for a variable based on other observations. Interpolation estimates a value that lies between two known values in a sequence. Extrapolation estimates a value by extending a known sequence of values or facts beyond the range that is known with certainty.
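A minimal sketch of the difference, assuming a straight line through two known points (real data may call for a different model). The points and the helper names are made up for the example.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


def kind_of_estimate(x0, x1, x):
    """Label the estimate by whether x falls inside the known range."""
    low, high = min(x0, x1), max(x0, x1)
    return "interpolation" if low <= x <= high else "extrapolation"


# Two known observations (made-up values).
x0, y0 = 2.0, 10.0
x1, y1 = 4.0, 20.0

inside = linear_estimate(x0, y0, x1, y1, 3.0)   # x = 3 lies between 2 and 4
outside = linear_estimate(x0, y0, x1, y1, 6.0)  # x = 6 lies beyond 4

print(kind_of_estimate(x0, x1, 3.0), inside)
print(kind_of_estimate(x0, x1, 6.0), outside)
```

The same formula produces both numbers; the extrapolated one is less trustworthy because nothing observed confirms that the line continues beyond the known range.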

8| What is a recommender system?

Recommender systems are one of the most widespread applications of machine learning technologies in organisations. Such a system helps a user find relevant items among a large number of candidates. The machine learning algorithms in recommender systems are typically classified into two categories: content-based and collaborative filtering methods, although modern recommenders combine both approaches. Content-based methods are based on the similarity of item attributes, while collaborative methods calculate similarity from user interactions.
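The two families can be sketched with the same similarity measure applied to different inputs. The example below uses cosine similarity, which is one common choice and is assumed here; the films, attribute vectors and interaction vectors are all made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, vectors):
    """Name of the item whose vector is closest to the target item's vector."""
    candidates = [name for name in vectors if name != target]
    return max(
        candidates,
        key=lambda name: cosine_similarity(vectors[target], vectors[name]),
    )


# Content-based input: item attributes as [action, comedy, romance].
item_attributes = {
    "film_a": [1, 0, 0],
    "film_b": [1, 1, 0],
    "film_c": [0, 0, 1],
}

# Collaborative input: one column per user, 1 = the user watched the film.
item_interactions = {
    "film_a": [1, 1, 0, 0],
    "film_b": [0, 0, 1, 1],
    "film_c": [1, 1, 0, 1],
}

content_pick = most_similar("film_a", item_attributes)
collab_pick = most_similar("film_a", item_interactions)

print("content-based:", content_pick)
print("collaborative:", collab_pick)
```

In this toy data the two methods disagree: by attributes, film_a is closest to film_b, but by viewing behaviour it is closest to film_c. That kind of disagreement is one reason to combine both signals.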

9| Between R and Python, which one would you choose for text analysis?

Between R and Python, Python would be the better choice, as it has the Pandas library, which provides high-performance data analysis tools and easy-to-use data structures. However, you can go with either language depending on the complexity of the data being analysed.

10| Explain A/B Testing

A/B testing is a statistical method of comparing two or more versions in order to determine which version works better and whether the difference between the versions is statistically significant. It is a powerful tool for product development. In technical terms, "A/B test" refers to any experiment in which random assignment is used to tease out a causal relationship between a treatment, typically some change to a website, and an outcome, often a metric that the business is interested in changing.
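One common way to check significance when the metric is a conversion rate is a two-proportion z-test. The sketch below assumes that test (the article does not prescribe one) and uses made-up visitor and conversion counts; the 0.05 threshold is a conventional choice, not a rule.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Probability of a gap at least this large if there were no real difference.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: visitors were randomly assigned to version A or B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
is_significant = p_value < 0.05  # conventional threshold, assumed here

print("z =", round(z, 3), "p =", round(p_value, 4), "significant:", is_significant)
```

The test only supports a causal reading because assignment to A or B is random; without that, a difference in conversion rates could come from differences between the two groups of visitors.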
