Machine Learning

Welcome to the introductory course on Classical Machine Learning. This course is designed for complete beginners to machine learning who already have some experience with Python. No experience with any ML technique or library is required.

Most algorithms covered here require only Python's built-in data types (like list and dict) and functions. All the techniques will be taught through hands-on exercises and challenges.

Tools
  • NumPy

  • pandas

  • Matplotlib

  • scikit-learn

Evaluation Metrics
  • Accuracy

  • F1 score

  • AUC

  • Precision, Recall → built from true/false positives/negatives (see the sketch at the end of this list)

  • Sensitivity vs specificity

  • Confusion Matrix

  • …
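
A minimal sketch of how most of these metrics fall out of the four confusion-matrix counts, in plain Python (the toy labels below are made up for illustration):

    def confusion_counts(y_true, y_pred):
        # Binary setting: 1 marks the positive class, 0 the negative class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
        return tp, fp, fn, tn

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)

    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)    # of predicted positives, how many are real
    recall = tp / (tp + fn)       # sensitivity: of real positives, how many were found
    specificity = tn / (tn + fp)  # of real negatives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    print(accuracy, precision, recall, specificity, f1)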

Concepts
  • Underfitting/overfitting ⇒ the bias vs variance trade-off

  • Initialization techniques

  • Hyperparameter tuning

  • Optimizers (Gradient descent, GD with momentum, RMSProp, Adam)

  • Regularization (L1, L2 norms, dropout, early stopping)

  • Normalization

  • Cross-Validation
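
As a taste of the optimizer material, a minimal sketch of gradient descent with momentum on the one-dimensional toy loss f(w) = (w - 3)^2 (the learning rate and momentum values are illustrative, not tuned):

    def grad(w):
        return 2 * (w - 3)       # derivative of the toy loss (w - 3)^2

    w, velocity = 0.0, 0.0
    lr, beta = 0.1, 0.9          # hyperparameters: step size and momentum decay
    for _ in range(200):
        # Momentum: an exponential moving average of past gradients.
        velocity = beta * velocity + (1 - beta) * grad(w)
        w -= lr * velocity
    print(w)                     # approaches the minimum at w = 3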

Chapters

Introduction to predictive algorithms
  • Majority class/frequency baseline for classification

  • Mean/median baseline for regression

  • Basic feature engineering

  • Numerical transformations
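
Baselines like these need nothing beyond built-in types, and any real model must beat them to justify its complexity. A minimal sketch with invented toy data:

    from collections import Counter

    # Majority-class baseline for classification.
    train_labels = ["spam", "ham", "ham", "ham", "spam"]
    majority = Counter(train_labels).most_common(1)[0][0]
    print(majority)              # predicts "ham" for every input

    # Mean/median baselines for regression.
    train_targets = [2.0, 3.5, 4.0, 10.0]
    mean_baseline = sum(train_targets) / len(train_targets)
    median_baseline = sorted(train_targets)[len(train_targets) // 2]  # upper median
    print(mean_baseline, median_baseline)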

KNN
  • KNN Classification

  • Distance/Similarity metrics (L1 vs L2 distances)

  • Multi-dimensional cases

  • Weighted KNN

  • KNN Regression
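
A minimal KNN classifier over built-in types, using L2 distance and a majority vote; the 2-D toy points are illustrative, and the same code handles any number of dimensions:

    import math
    from collections import Counter

    def euclidean(a, b):
        # L2 distance; zip makes it work for any number of dimensions.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def knn_predict(train, query, k=3):
        # train is a list of (point, label) pairs; take the k nearest points.
        neighbours = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
        votes = Counter(label for _, label in neighbours)
        return votes.most_common(1)[0][0]   # majority vote among the neighbours

    train = [((0, 0), "blue"), ((0, 1), "blue"), ((5, 5), "red"), ((6, 5), "red")]
    print(knn_predict(train, (1, 1)))       # "blue": two of the three nearest are blue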

Data Handling
  • Different features have different magnitudes

  • Normalization

  • Feature Engineering
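
A sketch of the two most common rescalings, min-max and z-score, in plain Python:

    def min_max_scale(values):
        # Maps values into [0, 1].
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]

    def z_score_scale(values):
        # Rescales to zero mean and unit standard deviation.
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        return [(v - mean) / std for v in values]

    incomes = [20_000, 35_000, 90_000]   # toy feature on a large scale
    ages = [25, 40, 60]                  # toy feature on a small scale
    print(min_max_scale(incomes))
    print(z_score_scale(ages))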

K-Means Clustering
  • Supervised vs unsupervised learning

  • Initialization: multiple restarts

  • Handling outliers
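
A sketch of one restart of Lloyd's algorithm on 1-D toy data; a real run repeats it from several random initializations and keeps the clustering with the lowest total distance:

    import random

    def kmeans_1d(points, k=2, iters=20, seed=0):
        random.seed(seed)
        centers = random.sample(points, k)       # random initialization
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:                     # assignment step
                nearest = min(range(k), key=lambda i: abs(p - centers[i]))
                clusters[nearest].append(p)
            for i, cluster in enumerate(clusters):
                if cluster:                      # update step: move to the mean
                    centers[i] = sum(cluster) / len(cluster)
        return centers

    print(kmeans_1d([1.0, 1.2, 0.8, 8.0, 8.5, 7.9]))   # centers near 1 and 8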

Evaluation
  • Train set vs test set

  • Validation Accuracy

  • Cross-Validation

  • Underfitting/overfitting

  • Hyperparameter tuning: picking the best k (sketched below)
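
A sketch of picking the best k for a KNN classifier with 5-fold cross-validation in scikit-learn (the dataset and the grid of k values are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    scores_by_k = {}
    for k in [1, 3, 5, 7, 9]:
        model = KNeighborsClassifier(n_neighbors=k)
        # Mean validation accuracy over 5 folds.
        scores_by_k[k] = cross_val_score(model, X, y, cv=5).mean()

    best_k = max(scores_by_k, key=scores_by_k.get)
    print(best_k, scores_by_k[best_k])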

Linear Regression
  • Simple case

  • Multivariate regression
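
For the simple (one-feature) case the least-squares fit has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A sketch on toy points drawn near the line y = 2x:

    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]

    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Least-squares slope: covariance(x, y) divided by variance(x).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    print(slope, intercept)      # close to 2 and 0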

Logistic Regression
  • Classification problems

  • Class imbalance → fixed by reweighting the loss function, optionally combined with under-/over-sampling (see the sketch after this list)

  • Lasso, Ridge: L1, L2 regularization
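
A sketch of the reweighting fix in scikit-learn: class_weight="balanced" rescales each class's contribution to the loss by inverse class frequency, and penalty="l2" gives Ridge-style regularization (the synthetic imbalanced dataset is illustrative):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Synthetic 90/10 imbalanced binary problem.
    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    model = LogisticRegression(penalty="l2", C=1.0, class_weight="balanced", max_iter=1000)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))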

ML workflows and data leakage
  • Train/validation/test splits vs cross-validation

  • Stratified and group splits

  • Leakage pitfalls

  • Reproducibility
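
One classic leakage pitfall is fitting a scaler on all the data before splitting, which leaks test-set statistics into training. A sketch of the standard fix: wrap preprocessing in a Pipeline so it is refit inside each cross-validation fold:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)

    # The scaler is fit on the training folds only, never on the held-out fold.
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    print(cross_val_score(pipe, X, y, cv=5).mean())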

Decision tree classification
  • Visual intuition

  • Gini index

  • Entropy

  • Heuristics to prune the tree
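
Both impurity measures are a few lines of plain Python; a tree chooses the split that reduces them the most:

    import math
    from collections import Counter

    def gini(labels):
        # 1 minus the sum of squared class proportions; 0 for a pure node.
        n = len(labels)
        return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

    def entropy(labels):
        # Shannon entropy in bits; 1 for a 50/50 binary split.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    print(gini(["a", "a", "b", "b"]), entropy(["a", "a", "b", "b"]))  # 0.5 1.0
    print(gini(["a", "a", "a", "a"]))                                 # 0.0 (pure node)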

Naive Bayes
  • Bayes' theorem

  • Bag of words

  • Spam detection
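
A sketch of the classic pipeline, bag-of-words counts feeding a multinomial Naive Bayes classifier, in scikit-learn (the four toy messages are made up):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    messages = ["win a free prize now", "free money click now",
                "lunch at noon tomorrow", "see you at the meeting"]
    labels = ["spam", "spam", "ham", "ham"]

    # CountVectorizer builds the bag-of-words counts; MultinomialNB applies
    # Bayes' theorem with per-word likelihoods (Laplace-smoothed by default).
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(messages, labels)
    print(model.predict(["free prize money"]))   # expected: ['spam']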

Polynomial Regression
  • Relationship to linear regression

  • Choosing the degree

  • Regularization for polynomial regression
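
Polynomial regression is linear regression on expanded features, which is also why it regularizes the same way. A scikit-learn sketch with a degree-2 expansion and Ridge (the degree and alpha values are illustrative):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = np.linspace(-3, 3, 50).reshape(-1, 1)
    y = X.ravel() ** 2 + rng.normal(scale=0.5, size=50)   # noisy quadratic

    # Expand x into [1, x, x^2], then fit a regularized linear model.
    model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
    model.fit(X, y)
    print(model.predict([[2.0]]))   # should land near 4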

Random forest
  • Relationship to decision trees

  • Picking the number of estimators
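
A sketch of sweeping the number of trees and watching validation accuracy (the grid and dataset are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    for n in [1, 10, 100]:
        forest = RandomForestClassifier(n_estimators=n, random_state=0)
        print(n, cross_val_score(forest, X, y, cv=5).mean())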

Support Vector Machines (SVM)
  • Support vectors

  • Cross-Validation for SVM
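
A sketch of cross-validating the SVM regularization strength C, and of reading off the fitted support vectors (the grid and dataset are illustrative):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    for C in [0.1, 1.0, 10.0]:
        print(C, cross_val_score(SVC(C=C), X, y, cv=5).mean())

    model = SVC(C=1.0).fit(X, y)
    print(len(model.support_vectors_))   # the training points that define the margin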

Ensemble of models
  • Boosting vs bagging

  • Random Forests as an ensemble of models
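
A sketch contrasting the two strategies over the same kind of base learner, bagging (parallel, variance-reducing) vs boosting (sequential, bias-reducing); the dataset and settings are illustrative:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "bagging": BaggingClassifier(n_estimators=50, random_state=0),
        "boosting": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=5).mean())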

Dimensionality reduction
  • Curse of dimensionality

  • PCA
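
A sketch of PCA projecting the four iris features down to two principal components:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(X)
    # Shape (150, 2), plus the fraction of variance each component keeps.
    print(X_2d.shape, pca.explained_variance_ratio_)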

Wrap up with mini-projects
  • Custom datasets

  • Resources