Interview Questions on Statistics for Data Scientists, 04 Apr, 2025

Here's a list of 50 statistics-related interview questions and answers tailored for Data Scientist roles.

These cover fundamentals, probability, distributions, sampling, hypothesis testing, and practical applications.


🔹 Basic Statistics

  1. Q: What is the difference between population and sample?
    A: A population is the complete set of elements of interest, while a sample is a subset of that population used to make inferences about it.

  2. Q: Define descriptive vs inferential statistics.
    A: Descriptive statistics summarize the data at hand (mean, median); inferential statistics use a sample to draw conclusions about a population (hypothesis testing, confidence intervals).

  3. Q: What are the types of data?
    A: Nominal, ordinal, interval, and ratio.

  4. Q: What are measures of central tendency?
    A: Mean, median, and mode.

  5. Q: What are measures of dispersion?
    A: Range, variance, standard deviation, IQR.
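
As a quick illustration of the measures listed above, here is a minimal sketch using Python's standard `statistics` module. The `data` list is a made-up sample, not taken from any real dataset.

```python
import statistics

# Hypothetical sample, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion
data_range = max(data) - min(data)
sample_variance = statistics.variance(data)  # sample variance, divides by n - 1
sample_std = statistics.stdev(data)          # square root of the sample variance
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1

print(mean, median, mode, data_range, round(sample_variance, 3), iqr)
```

Note that `statistics.variance` treats the data as a sample; `statistics.pvariance` would treat it as the whole population.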


🔹 Probability & Distributions

  1. Q: Define probability.
    A: Probability is a number between 0 and 1 that measures how likely an event is to occur.

  2. Q: What is conditional probability?
    A: The probability of event A given that B has occurred: P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.

  3. Q: What is Bayes' Theorem?
    A: It calculates the probability of a hypothesis based on prior knowledge:
    P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}

  4. Q: Name some probability distributions.
    A: Normal, Binomial, Poisson, Bernoulli, Exponential, Uniform.

  5. Q: What is the Central Limit Theorem?
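    A: It states that, for independent draws from a distribution with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the underlying distribution.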
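
Two of the answers above lend themselves to a short numerical check. The sketch below first applies Bayes' Theorem to a hypothetical screening test (the 1% prior and the 95% / 5% test rates are invented for illustration), then simulates the Central Limit Theorem by averaging draws from a skewed exponential distribution.

```python
import random
import statistics

# --- Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B) ---
# Hypothetical numbers: A = "has the condition", B = "test is positive".
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # P(B|A)
p_b_given_not_a = 0.05  # P(B|not A)

# The law of total probability gives the denominator P(B).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(f"P(A|B) = {p_a_given_b:.3f}")

# --- Central Limit Theorem: means of skewed draws cluster around the true mean ---
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 50                  # size of each sample
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(2000)
]
# Theory for Exponential(1): mean 1, standard error 1 / sqrt(n).
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```

Even with a 95% hit rate, the posterior is only about 0.16 because the condition is rare, which is the usual point interviewers look for.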


🔹 Hypothesis Testing

  1. Q: What is a null and alternative hypothesis?
    A: H₀: no effect or status quo; H₁: effect or difference exists.

  2. Q: What is a p-value?
    A: The probability of obtaining results at least as extreme as those actually observed, assuming the null hypothesis is true.

  3. Q: When do you reject the null hypothesis?
    A: When p-value < significance level (e.g., 0.05).

  4. Q: What are Type I and Type II errors?
    A: Type I: Rejecting true H₀ (false positive), Type II: Failing to reject false H₀ (false negative).

  5. Q: What is a confidence interval?
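    A: A range computed from sample data by a procedure that captures the true population parameter in a specified proportion (e.g., 95%) of repeated samples. The parameter is fixed; it is the interval that varies from sample to sample.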
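
The pieces above (test statistic, p-value, rejection rule, confidence interval) fit together as in the following sketch of a two-sided one-sample z-test. All numbers are hypothetical, and the population standard deviation is assumed known, which is rarely true in practice.

```python
import math
from statistics import NormalDist

# Hypothetical setup: H0 says the population mean is 50; sigma is assumed known.
mu_0, sigma = 50.0, 10.0
n, sample_mean = 100, 52.0
alpha = 0.05

std_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / std_error

# Two-sided p-value: probability of a |Z| at least this large if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

# Confidence interval at level 1 - alpha for the population mean.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Here the interval excludes 50 exactly when the test rejects at the same level, which is a handy way to explain the link between the two ideas.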


🔹 Sampling Techniques

  1. Q: What is sampling?
    A: Selecting a subset from a population to estimate characteristics.

  2. Q: Name different sampling methods.
    A: Random, stratified, cluster, systematic, convenience.

  3. Q: What is sampling bias?
    A: It occurs when the sample isn't representative of the population.

  4. Q: What is the law of large numbers?
    A: As the sample size increases, the sample mean converges to the population mean.

  5. Q: What is oversampling and undersampling?
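    A: Techniques to balance classes in imbalanced datasets: oversampling adds (duplicated or synthetic) minority-class examples, while undersampling removes majority-class examples.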
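
To make three of the sampling methods above concrete, here is a minimal sketch using only the standard `random` module. The `population` records and the helper `stratified_sample` are hypothetical names invented for this example.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 80 units in group "A", 20 in group "B".
population = [{"id": i, "group": "A" if i < 80 else "B"} for i in range(100)]

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(population, 10)

# Systematic sampling: random start, then every k-th unit.
step = len(population) // 10
start = rng.randrange(step)
systematic = population[start::step]

def stratified_sample(units, key, fraction, rng):
    """Sample the same fraction from each stratum defined by `key`."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[key], []).append(unit)
    picked = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        picked.extend(rng.sample(members, k))
    return picked

stratified = stratified_sample(population, "group", 0.1, rng)
print(len(simple), len(systematic), len(stratified))
```

Stratified sampling guarantees that the small group "B" is represented in proportion, whereas a simple random sample of 10 could miss it entirely.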


🔹 Correlation & Regression

  1. Q: Difference between correlation and causation?
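    A: Correlation means two variables tend to move together; causation means a change in one produces a change in the other. Correlation alone does not establish causation.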

  2. Q: What is multicollinearity?
    A: When independent variables in regression are highly correlated.

  3. Q: What is R² in regression?
    A: Proportion of variance in the dependent variable explained by the model.

  4. Q: What are residuals?
    A: Differences between observed and predicted values.

  5. Q: What is adjusted R²?
    A: R² penalized for the number of predictors in the model, so it rises only when an added predictor improves the fit by more than chance alone would.
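
Residuals, R², and adjusted R² can all be computed by hand for a simple linear regression. The sketch below fits ordinary least squares to a small made-up dataset; the `x` and `y` values are hypothetical.

```python
import statistics

# Hypothetical data for a one-predictor regression.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 1  # p = number of predictors

# Ordinary least squares estimates for slope and intercept.
x_bar, y_bar = statistics.mean(x), statistics.mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residuals: observed minus predicted.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

# R² and adjusted R².
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2={r2:.4f}, adj R2={adj_r2:.4f}")
```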


🔹 Advanced Concepts

  1. Q: What is ANOVA?
    A: Analysis of variance: tests whether the means of three or more groups differ by comparing between-group variance to within-group variance.

  2. Q: When to use t-test vs z-test?
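    A: Use a t-test when the population variance is unknown and must be estimated from the sample (especially with small samples); use a z-test when the population variance is known or the sample is large enough for the normal approximation.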

  3. Q: What is a Chi-square test?
    A: Tests the association between categorical variables.

  4. Q: What is logistic regression used for?
    A: Predicting binary outcomes (0 or 1).

  5. Q: What is heteroscedasticity?
    A: When the variance of the residuals is not constant across the range of fitted values or predictors.
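
To show what ANOVA actually compares, the following sketch computes the one-way F statistic by hand for three small hypothetical groups. It stops at the F statistic; turning it into a p-value needs the F distribution, which the standard library does not provide (SciPy's `scipy.stats.f_oneway` is one option).

```python
import statistics

# Hypothetical measurements for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [v for g in groups for v in g]
grand_mean = statistics.mean(all_values)
k, n_total = len(groups), len(all_values)

# Between-group sum of squares: how far group means sit from the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)

ms_between = ss_between / (k - 1)       # df_between = k - 1
ms_within = ss_within / (n_total - k)   # df_within = N - k
f_stat = ms_between / ms_within

print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the within-group noise would explain.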


🔹 Real-world Application

  1. Q: How do you handle missing data?
    A: Imputation (mean, median, mode), removal, or prediction.

  2. Q: How do you check if data is normally distributed?
    A: Histograms, Q-Q plots, Shapiro-Wilk test.

  3. Q: Why normalize or standardize data?
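    A: To put variables on a comparable scale so that features with large ranges do not dominate distance-based or gradient-based models, which often improves convergence and performance.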

  4. Q: How to detect outliers?
    A: Z-score, IQR method, boxplot.

  5. Q: What is the purpose of A/B testing?
    A: Comparing two versions (A vs B) in a randomized experiment to determine whether one performs statistically significantly better on a chosen metric.
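
Several of the steps above can be chained in a few lines. The sketch below imputes a missing value with the median, flags outliers with the IQR method, and standardizes the result. The `raw` list is hypothetical, and the 1.5 × IQR fences are the conventional choice rather than a fixed rule.

```python
import statistics

# Hypothetical raw column with one missing value and one extreme value.
raw = [10, 12, None, 11, 13, 12, 11, 14, 95]

# 1) Missing data: impute with the median of the observed values.
observed = [x for x in raw if x is not None]
fill_value = statistics.median(observed)
imputed = [fill_value if x is None else x for x in raw]

# 2) Outliers: IQR method, flag values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR.
q1, _, q3 = statistics.quantiles(imputed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in imputed if x < lower or x > upper]

# 3) Standardization: subtract the mean, divide by the standard deviation.
mu, sd = statistics.mean(imputed), statistics.stdev(imputed)
z_scores = [(x - mu) / sd for x in imputed]

print(fill_value, outliers)
```

The median is used for imputation here because, unlike the mean, it is not pulled toward the extreme value.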


🔹 Practical Scenarios

  1. Q: You get a p-value of 0.07. What do you conclude?
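    A: Fail to reject the null at the 5% significance level. This does not prove the null is true; the evidence is simply not strong enough at that threshold.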

  2. Q: What does a high variance in data indicate?
    A: More spread out data; potential model instability.

  3. Q: When is a non-parametric test used?
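    A: When the data doesn't meet the distributional assumptions of a parametric test (e.g., normality), or when the data is ordinal or the sample is very small.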

  4. Q: What is bootstrapping?
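    A: A resampling technique that repeatedly draws samples with replacement from the observed data to estimate the sampling distribution of a statistic (e.g., its standard error or confidence interval).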

  5. Q: Explain the difference between parametric and non-parametric tests.
    A: Parametric tests assume the data follows a specific distribution (e.g., normal); non-parametric tests make no such assumption.
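
Bootstrapping is easier to explain with code than with words. Below is a minimal sketch of a percentile bootstrap confidence interval; the helper `bootstrap_ci` and the `sample` values are hypothetical, and the percentile method is only one of several bootstrap interval variants.

```python
import random
import statistics

def bootstrap_ci(data, stat, rng, n_resamples=5000, alpha=0.05):
    """Percentile bootstrap confidence interval for stat(data)."""
    estimates = sorted(
        stat(rng.choices(data, k=len(data)))  # resample WITH replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical sample.
sample = [12, 15, 9, 20, 17, 14, 11, 18, 16, 13]
rng = random.Random(0)  # fixed seed so the run is repeatable

ci_low, ci_high = bootstrap_ci(sample, statistics.mean, rng)
print(f"sample mean = {statistics.mean(sample)}, 95% CI = ({ci_low}, {ci_high})")
```

Because `stat` is passed in as a function, the same helper works for the median or any other statistic, which is the main appeal of the bootstrap when no closed-form standard error exists.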


🔹 Miscellaneous

  1. Q: What is skewness?
    A: Measure of data asymmetry.

  2. Q: What is kurtosis?
    A: Measure of how heavy the tails of a distribution are relative to a normal distribution; high kurtosis indicates more extreme values.

  3. Q: What is a time series?
    A: Data collected at successive points in time, usually at equally spaced intervals.

  4. Q: What is autocorrelation?
    A: Correlation of a variable with itself over successive time intervals.

  5. Q: What is cross-validation?
    A: Technique to assess model performance on unseen data by repeatedly training on one part of the data and evaluating on the held-out part (e.g., k-fold).
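
Skewness, kurtosis, and autocorrelation all reduce to short formulas. The sketch below uses the simple moment-based (population) definitions; libraries often apply small-sample corrections, so their values may differ slightly. The function names and both data lists are hypothetical.

```python
import statistics

def central_moment(data, order):
    """Average of (x - mean) ** order over the data."""
    m = statistics.mean(data)
    return sum((x - m) ** order for x in data) / len(data)

def skewness(data):
    # Third standardized moment: > 0 means a longer right tail.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Fourth standardized moment minus 3, so a normal distribution scores 0.
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

def autocorrelation(series, lag):
    """Correlation of a series with itself shifted by `lag` steps."""
    m = statistics.mean(series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, len(series)))
    den = sum((x - m) ** 2 for x in series)
    return num / den

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # one large value stretches the right tail
trend = list(range(1, 11))                # steadily increasing series

print(skewness(right_skewed), excess_kurtosis(trend), autocorrelation(trend, 1))
```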


🔹 Critical Thinking

  1. Q: If a model has high accuracy but low precision, what does it mean?
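    A: Many of its positive predictions are false positives. Accuracy can still be high when the negative class dominates, so this usually points to class imbalance.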

  2. Q: How do you handle class imbalance?
    A: Resampling, SMOTE, change evaluation metric (e.g., F1-score).

  3. Q: What is the difference between recall and precision?
    A: Precision = TP / (TP + FP); Recall = TP / (TP + FN)

  4. Q: What metric is best for imbalanced classes?
    A: F1-score, precision-recall AUC, or AUC-ROC; accuracy alone can be misleading.

  5. Q: How do you ensure statistical significance in results?
    A: Choose a test appropriate to the data, ensure an adequate sample size (e.g., via power analysis), and report p-values together with confidence intervals.
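
The "high accuracy, low precision" scenario is easy to reproduce from a confusion matrix. The sketch below uses the precision and recall formulas given above; the function `classification_metrics` and the counts are hypothetical, chosen to mimic an imbalanced dataset with 10 positives in 1,000 cases.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced case: only 10 of 1,000 examples are truly positive.
metrics = classification_metrics(tp=5, fp=20, fn=5, tn=970)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy comes out at 0.975 while precision is only 0.2, which is why F1 or a similar metric gives a more honest picture for imbalanced classes.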