Statistics and Statistical Data Mining

Module information

Academic Direction
Goldsmiths, University of London
Also part of
MSc Data Science
Modes of Study

This module aims to cover the key statistical concepts and techniques you will need to interpret the results you might generate through data analysis.

The areas covered in this module include probability theory, likelihood, common distributions, confidence intervals, hypothesis tests, and parametric and non-parametric tests.

Upon successful completion of this module, you will be able to:

  • demonstrate the ability to critically appraise and evaluate mathematical and statistical techniques for a given empirical or data analysis.
  • understand the physical significance of a given mathematical or statistical technique.
  • use optimisation techniques in decision making.
  • draw statistically significant conclusions from sample data.

Topics covered

  • Exploratory Data Analysis (EDA)
  • Data Pre-processing, Correlation and Probability Overview
  • Sampling and Hypothesis Tests
  • Significance Tests
  • Linear Regression
  • Logistic Regression (LR)
  • Extreme Gradient Boosting (XGBoost)
  • Working with Imbalanced Data
  • Unsupervised Learning and Feature Selection
  • Machine Learning on the Cloud (AWS as an example)


15 (150 hours)


  • Coursework (50%)
  • Written examination (50%)