4. ML Fundamentals (Breadth)

As the name suggests, this interview evaluates your general knowledge of ML concepts from both theoretical and practical perspectives. Unlike ML depth interviews, breadth interviews tend to follow a similar structure and cover similar topics across interviewers and interviewees.

The best way to prepare for this interview is to review your notes from ML courses as well as some high-quality online courses and material. In particular, I found the following resources pretty helpful.

1. Courses and review material:

Andrew Ng's Machine Learning Course (you can also find the lectures on YouTube)
Structuring Machine Learning Projects
Udacity's deep learning nanodegree or Coursera's Deep Learning Specialization (for deep learning)

If you already know the concepts, the following resources are pretty useful for a quick review of different concepts:

StatQuest Machine Learning videos
StatQuest Statistics (for statistics review - most useful for Data Science roles)
Machine Learning cheatsheets
Chris Albon's ML flashcards

2. ML Fundamentals Topics

Below are the most important topics to cover:

1. Classic ML Concepts

ML Algorithms' Categories

Supervised, unsupervised, and semi-supervised learning (with examples)
- Classification vs regression vs clustering
Parametric vs non-parametric algorithms
Linear vs Nonlinear algorithms

Supervised learning

Linear Algorithms
- Linear regression
  - least squares, residuals, simple vs multiple linear regression
- Logistic regression
  - cost function (equation, code), sigmoid function, cross entropy
- Support Vector Machines
- Linear discriminant analysis
Decision Trees
- Logits
- Leaves
- Training algorithm
  - stop criteria
- Inference
- Pruning
Ensemble methods
- Bagging and boosting methods (with examples)
- Random Forest
- Boosting
  - AdaBoost
  - GBM
  - XGBoost
Comparison of different algorithms
- [TBD: LinkedIn lecture]
Optimization
- Gradient descent (concept, formula, code)
- Other variations of gradient descent
  - SGD
  - Momentum
  - RMSprop
  - Adam
Loss functions
- Logistic Loss function
- Cross Entropy (remember formula as well)
- Hinge loss (SVM)
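
Since the list above asks for the logistic regression cost function and gradient descent in both formula and code, here is a minimal sketch of how they fit together. The binary cross-entropy loss is L = -(1/n) * sum(y*log(p) + (1-y)*log(1-p)) with p = sigmoid(w*x + b), and each gradient descent step moves the parameters against the gradient. The function names, the single-feature setup, the learning rate, and the step count are my own illustrative assumptions, not a reference implementation.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def fit_logistic_1d(xs, ys, lr=0.1, steps=500):
    """Batch gradient descent for the model p = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        probs = [sigmoid(w * x + b) for x in xs]
        # Gradients of the mean cross-entropy loss w.r.t. w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = fit_logistic_1d(xs, ys)
print(w, b, binary_cross_entropy(ys, [sigmoid(w * x + b) for x in xs]))
```

SGD would compute the same gradients on one example (or a mini-batch) per step instead of the full dataset.
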
Feature selection
- Feature importance
Model evaluation and selection
- Evaluation metrics
  - TP, FP, TN, FN
  - Confusion matrix
  - Accuracy, precision, recall/sensitivity, specificity, F-score
    - how do you choose among these? (imbalanced datasets)
    - precision vs TPR (why precision)
  - ROC curve (TPR vs FPR, threshold selection)
  - AUC (model comparison)
  - Extension of the above to multi-class (n-ary) classification
  - algorithm specific metrics [TBD]
- Model selection
  - Cross validation
    - k-fold cross validation (what's a good k value?)
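
For the evaluation metrics listed above, it helps to be able to derive each one from the four confusion-matrix counts. The following is a small sketch for binary labels in {0, 1}. The helper names are hypothetical, and returning 0.0 when a denominator is zero is a convention I chose for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels in {0, 1}."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity or TPR
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),  # equals 1 - FPR
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Imbalanced example: 2 positives, 8 negatives, and a model that always predicts 0.
y_true = [1, 1] + [0] * 8
y_pred = [0] * 10
print(classification_metrics(y_true, y_pred))
```

In the imbalanced example the accuracy is 0.8 even though recall is 0.0, which is the usual argument for looking at precision and recall rather than accuracy alone.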

Unsupervised learning

Clustering
- Centroid models: k-means clustering
- Connectivity models: Hierarchical clustering
- Density models: DBSCAN
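
As a quick refresher on the centroid model above, k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points (Lloyd's algorithm). The sketch below is a bare-bones version. Initializing from the first k points and running a fixed number of iterations are simplifying assumptions; practical implementations usually use random or k-means++ initialization and a convergence check.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    # Naive initialization (assumption): use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```
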
Gaussian Mixture Models
Latent semantic analysis
Hidden Markov Models (HMMs)
- Markov processes
- Transition probability and emission probability
- Viterbi algorithm [Advanced]
Dimension reduction techniques
- Principal Component Analysis (PCA)
- Independent Component Analysis (ICA)
- t-SNE

Bias / Variance (Underfitting/Overfitting)

Regularization techniques
- L1/L2 (Lasso/Ridge)
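
One way to see the difference between L1 and L2 penalties is the one-dimensional case, where both have closed-form minimizers. The sketch below assumes the objectives 0.5*(w - w_ols)^2 + lam*|w| (Lasso-style) and 0.5*(w - w_ols)^2 + 0.5*lam*w^2 (Ridge-style), where w_ols is the unregularized solution. The function names and this simplified setting are my own choices for illustration.

```python
def ridge_1d(w_ols, lam):
    """Minimizer of 0.5*(w - w_ols)**2 + 0.5*lam*w**2: proportional shrinkage."""
    return w_ols / (1.0 + lam)


def lasso_1d(w_ols, lam):
    """Minimizer of 0.5*(w - w_ols)**2 + lam*abs(w): soft-thresholding."""
    if w_ols > lam:
        return w_ols - lam
    if w_ols < -lam:
        return w_ols + lam
    return 0.0  # small coefficients are set exactly to zero


for w_ols in (2.0, 0.3, -0.3):
    print(w_ols, ridge_1d(w_ols, 0.5), lasso_1d(w_ols, 0.5))
```

The L2 penalty shrinks every coefficient toward zero without reaching it, while the L1 penalty zeroes out coefficients whose magnitude is below the threshold, which is why Lasso is associated with sparse models.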

Sampling

Sampling techniques
- Uniform sampling
- Reservoir sampling
- Stratified sampling
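
Reservoir sampling is the technique from this list most likely to come up as a coding question, so here is a short sketch of the classic single-pass version (often called Algorithm R). It draws k items uniformly at random from a stream of unknown length. The function name and the optional rng parameter are assumptions made for the example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


print(reservoir_sample(iter(range(1000)), k=5, rng=random.Random(0)))
```

If the stream has fewer than k items, this version simply returns all of them.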

Handling data

Missing data
Imbalanced data
Data distribution shifts

Computational complexity of ML algorithms

[TBD]

2. Deep learning

Feedforward NNs
- In depth knowledge of how they work
- [EX] activation function for classes that are not mutually exclusive
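
For the [EX] item above: when classes are not mutually exclusive (multi-label classification), the usual choice is an independent sigmoid per class rather than a softmax over all classes. The sketch below contrasts the two on the same logits. The function names and example logits are illustrative assumptions.

```python
import math


def softmax(logits):
    """Probabilities that compete with each other and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_each(logits):
    """Independent per-class probabilities; they need not sum to 1."""
    out = []
    for z in logits:
        if z >= 0:
            out.append(1.0 / (1.0 + math.exp(-z)))
        else:
            ez = math.exp(z)
            out.append(ez / (1.0 + ez))
    return out


logits = [3.0, 3.0, -1.0]  # two classes are both strongly indicated
print(softmax(logits))
print(sigmoid_each(logits))
```

With softmax, the two strongly indicated classes split the probability mass and each ends up below 0.5. With per-class sigmoids, both can be predicted with high probability at the same time.
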
RNN
- backpropagation through time (BPTT)
- vanishing/exploding gradient problem
LSTM
- vanishing/exploding gradient problem
- gradient?
Dropout
- how to apply dropout to LSTM?
Seq2seq models
Attention
- self-attention
- Transformer architecture (in detail, no kidding!)
- Illustrated transformer
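
To make the self-attention item above concrete, here is a stripped-down sketch of scaled dot-product self-attention. Each token's output is a weighted average of all token vectors, with weights softmax(q·k / sqrt(d)). As a loud simplification, this version sets Q = K = V = the input and has no learned projection matrices, no masking, and no multiple heads, all of which a real Transformer layer includes.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """x is a list of token vectors; assumes Q = K = V = x (no learned weights)."""
    d = len(x[0])
    outputs = []
    for q in x:
        # Scaled dot-product score of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0]]
print(self_attention(tokens))
```
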
Embeddings (word embeddings)

3. Statistical ML

Bayesian algorithms

Naive Bayes
Maximum a posteriori (MAP) estimation
Maximum Likelihood (ML) estimation
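
A compact way to remember the difference between ML and MAP estimation is the coin-flip case. The sketch below estimates the heads probability of a Bernoulli variable, with a Beta(alpha, beta) prior for the MAP version. The function names and the choice of prior are assumptions made for the example, and the MAP formula used here requires alpha + heads > 1 and beta + n - heads > 1.

```python
def bernoulli_mle(heads, n):
    """Maximum likelihood estimate: the observed frequency of heads."""
    return heads / n


def bernoulli_map(heads, n, alpha, beta):
    """MAP estimate: mode of the Beta(alpha + heads, beta + n - heads) posterior."""
    return (heads + alpha - 1) / (n + alpha + beta - 2)


# Three flips, all heads.
print(bernoulli_mle(3, 3))        # likelihood alone says heads is certain
print(bernoulli_map(3, 3, 2, 2))  # a Beta(2, 2) prior pulls the estimate toward 0.5
```

With a uniform Beta(1, 1) prior the two estimates coincide, and as n grows the prior's influence on the MAP estimate fades.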

Statistical significance

R-squared
P-values

4. Other topics:

Outliers
Similarity/dissimilarity metrics
- Euclidean, Manhattan, Cosine, Mahalanobis (advanced)
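
The first three metrics above are short enough to write from memory, as sketched below. Mahalanobis distance is omitted because it also needs the inverse covariance matrix of the data. The function names are illustrative, and cosine_similarity assumes neither vector is all zeros.

```python
import math


def euclidean(a, b):
    """L2 distance: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (ignores magnitude)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(euclidean([0, 0], [3, 4]))          # 5.0
print(manhattan([0, 0], [3, 4]))          # 7
print(cosine_similarity([1, 0], [0, 1]))  # 0.0
```

Cosine similarity is a similarity (higher means closer), while the other two are distances (lower means closer), so 1 - cosine_similarity is commonly used when a dissimilarity is needed.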

3. ML Fundamentals Sample Questions

What is machine learning and how does it differ from traditional programming?
What are different types of machine learning techniques?
What is the difference between supervised and unsupervised learning?
What is semi-supervised learning?
What are stages of building machine learning models?
Can you explain the bias-variance trade-off in machine learning?
What is overfitting and how do you prevent it?
Why and how do you split data into train, test, and validation set?
What is cross-validation and why is it important?
Can you explain the concept of regularization and its types (L1, L2, etc.)?
How do you handle missing or corrupted data in a dataset?
What is a decision tree and how does it work?
Can you explain logistic regression?
Can you explain the K-Nearest Neighbors (KNN) algorithm?
Compare K-means and KNN algorithms.
Explain decision-tree based algorithms (random forest, GBDT).
What is gradient descent and how does it work?
Can you explain the support vector machine (SVM) algorithm? What is kernel SVM?
Can you explain neural networks and how they work?
What is deep learning and how does it differ from traditional machine learning?
Can you explain the backpropagation algorithm and its role in training neural networks?
What is a convolutional neural network (CNN) and how does it work?
What is transfer learning and how is it used in practice?

45 ML interview questions
