Course Outline

Machine Learning Introduction

  • Types of machine learning – supervised vs unsupervised
  • From statistical learning to machine learning
  • The data mining workflow: business understanding, data preparation, modeling, deployment
  • Choosing the right algorithm for the task
  • Overfitting and the bias-variance tradeoff

Python and ML Libraries Overview

  • Why use programming languages for ML
  • Choosing between R and Python
  • Python crash course and Jupyter Notebooks
  • Python libraries: pandas, NumPy, scikit-learn, matplotlib, seaborn

Testing and Evaluating ML Algorithms

  • Generalization, overfitting, and model validation
  • Evaluation strategies: holdout, cross-validation, bootstrapping
  • Metrics for regression: ME, MSE, RMSE, MAPE
  • Metrics for classification: accuracy, confusion matrix, and handling unbalanced classes
  • Model performance visualization: profit curve, ROC curve, lift curve
  • Model selection and grid search for tuning

Data Preparation

  • Data import and storage in Python
  • Exploratory analysis and summary statistics
  • Handling missing values and outliers
  • Standardization, normalization, and transformation
  • Qualitative data recoding and data wrangling with pandas

Classification Algorithms

  • Binary vs multiclass classification
  • Logistic regression and discriminant functions
  • Naïve Bayes, k-nearest neighbors
  • Decision trees and tree ensembles: CART, random forests, bagging, boosting, XGBoost
  • Support Vector Machines and kernels
  • Ensemble learning techniques

Regression and Numerical Prediction

  • Least squares and variable selection
  • Regularization methods: L1 (lasso), L2 (ridge)
  • Polynomial regression and nonlinear models
  • Regression trees and splines

Unsupervised Learning

  • Clustering techniques: k-means, k-medoids, hierarchical clustering, self-organizing maps (SOMs)
  • Dimensionality reduction: PCA, factor analysis, SVD
  • Multidimensional scaling

Text Mining

  • Text preprocessing and tokenization
  • Bag-of-words, stemming, and lemmatization
  • Sentiment analysis and word frequency
  • Visualizing text data with word clouds

Recommendation Systems

  • User-based and item-based collaborative filtering
  • Designing and evaluating recommendation engines

Association Pattern Mining

  • Frequent itemsets and Apriori algorithm
  • Market basket analysis and lift ratio

Outlier Detection

  • Extreme value analysis
  • Distance-based and density-based methods
  • Outlier detection in high-dimensional data

Machine Learning Case Study

  • Understanding the business problem
  • Data preprocessing and feature engineering
  • Model selection and parameter tuning
  • Evaluation and presentation of findings
  • Deployment

Summary and Next Steps

Requirements

  • Basic understanding of statistics and linear algebra
  • Familiarity with data analysis or business intelligence concepts
  • Some exposure to programming (preferably Python or R) is recommended
  • Interest in learning applied machine learning for data-driven projects

Audience

  • Data analysts and scientists
  • Statisticians and research professionals
  • Developers and IT professionals exploring machine learning tools
  • Anyone involved in data science or predictive analytics projects

Duration: 21 hours
