© 2019, DISCnet. DISCnet is the Data Intensive Science Centre in SEPnet and an STFC Centre for Doctoral Training: a collaboration between the Universities of Southampton, Sussex and Portsmouth, Queen Mary University of London, and the Open University.


Statistics and Data Analysis

This 3-day residential DISCnet event (DISC6004) will cover the theory and techniques of statistics and data analysis.

Aim

To acquire the skills needed for analysis of experimental data and model fitting.

 

Objectives

At the end of this course, a successful student will be able to:

  • Fit models to data, with robust estimates of model parameters, incorporating prior information.

  • Efficiently explore model parameter spaces.

  • Make informed choices between different possible models.

 

Part 1: Statistics

The first part of the school will cover the following statistical methods:

  1. Basics of Bayesian statistics, including rules of probability, Bayesian reasoning and priors, moments and cumulants, and common 1-D distributions.

  2. Multivariate distributions, including multivariate Gaussian, marginalisation, principal components analysis (PCA), changing variables.

  3. Estimator theory, including bias and variance, Fisher matrices, and the Cramér-Rao bound.

  4. Applications of Bayesian methods, including template fitting, Wiener filtering, marginalisation over nuisance parameters.

  5. Model selection, including Bayesian evidence and proxies. 

  6. Monte Carlo methods, including Markov chain Monte Carlo (MCMC) techniques, pseudo-random number generators, the theory of finite Markov chains in a nutshell, applications of Monte Carlo (integrals and the Ising model), and a survey of Monte Carlo methods.
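
As a flavour of the MCMC material in item 6, here is a minimal sketch of a random-walk Metropolis sampler in pure Python. It is not taken from the course materials: the function names `metropolis` and `log_gaussian`, the Gaussian target (mean 1, standard deviation 2), the step size, and the burn-in length are all assumptions chosen purely for illustration.

```python
import math
import random


def metropolis(log_target, x0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler for a 1-D target (illustrative helper)."""
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_samples


def log_gaussian(x, mu=1.0, sigma=2.0):
    """Unnormalised log-density of an assumed Gaussian target."""
    return -0.5 * ((x - mu) / sigma) ** 2


samples, acceptance = metropolis(log_gaussian, x0=0.0, step=2.5, n_samples=50000)
kept = samples[5000:]  # discard an assumed burn-in period
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / (len(kept) - 1)
print(f"mean={mean:.2f}  variance={var:.2f}  acceptance={acceptance:.2f}")
```

Only ratios of the target density enter the acceptance test, so the normalisation constant is never needed; this is what makes the method practical for posterior distributions whose evidence is unknown.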

 

Part 2: Data analysis

The second part of the school covers data analysis methods:

  1. Treatment of errors, including experimental errors, weighted averages, covariance matrix, combining errors, non-linear functions of several variables, change of variables.

  2. Maximum likelihood methods, including least-squares fitting, linear least squares (uncorrelated measurements), full linear least squares (correlated measurements), non-linear fitting.

  3. Chi-squared (χ²) distribution, including χ² testing, the error ellipse, and comparing mean and variance between samples.
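
To make the least-squares and χ² topics above concrete, here is a minimal sketch of a straight-line fit y = a + b x to uncorrelated measurements with Gaussian errors, using the standard closed-form weighted sums. The function name `fit_line` and the data values are invented for illustration and do not come from the course.

```python
import math


def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x (uncorrelated errors)."""
    w = [1.0 / s ** 2 for s in sigma]  # inverse-variance weights
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return {
        "a": a,
        "b": b,
        "var_a": Sxx / delta,   # parameter covariance matrix elements
        "var_b": S / delta,
        "cov_ab": -Sx / delta,
        "chi2": chi2,
    }


# Hypothetical measurements, each with an assumed error of 0.2.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
sigma = [0.2] * len(x)

fit = fit_line(x, y, sigma)
dof = len(x) - 2  # two fitted parameters
print(f"a = {fit['a']:.3f} +/- {math.sqrt(fit['var_a']):.3f}")
print(f"b = {fit['b']:.3f} +/- {math.sqrt(fit['var_b']):.3f}")
print(f"chi2/dof = {fit['chi2']:.3f}/{dof}")
```

A χ² value comparable to the number of degrees of freedom suggests the model and the quoted errors are mutually consistent. The non-zero `cov_ab` term is what tilts the error ellipse in the (a, b) plane.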

 

Learning & Teaching Resources

Monte Carlo Simulations:

  1. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. "Equation of State Calculations by Fast Computing Machines". Journal of Chemical Physics 21 (6): 1087, (1953).

  2. Metropolis, N.; Ulam, S., "The Monte Carlo Method". Journal of the American Statistical Association (American Statistical Association) 44 (247): 335–341, (1949).

  3. Madras, N. "Lectures on Monte Carlo Methods". American Mathematical Society (2002).

 

Data Analysis:

  1. Barlow, R.J.: Statistics (Wiley)

  2. Robinson, E.L.: Data Analysis for Scientists and Engineers (Princeton)

  3. Hogg, Bovy & Lang: Data analysis recipes: Fitting a model to data (http://arxiv.org/abs/1008.4686)

 

Examples

Examples will be given during the course.

 

Prerequisites / Linked Modules

It is recommended that students have the following software installed on their laptops:

 

Approximate hours: taught material + exercises + self-study

Each morning and afternoon session will start with a 1-hour lecture followed by 2 hours of hands-on exercises.