Module I. R Programming and Mixture models

Lecturer:

*Jochen Einbeck*, Department of Mathematical Sciences, Durham University, UK.

Duration: 6 hours.

Material:

Slides,

Task Sheet,

Web repository

This module first refreshes generic R programming skills and then proceeds with a practical implementation of the EM algorithm and related inferential tools for finite Gaussian mixture modelling.

The tutorial assumes a basic working knowledge of R but does not require advanced programming skills. More specifically:

**Session 1.1** recalls basic tools and concepts that are useful for R programming in general. These include workspace handling, reading in data files, extracting information from vectors and matrices, and basic operations with data frames (such as ordering), as well as basic programming constructs such as if/else, while, for, and apply, and the construction of functions. Experienced R users can skip this session.
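
As a rough illustration of the kind of constructs listed above, the following R sketch uses a small made-up data frame. The object names (`df`, `count_above`, and so on) and all values are hypothetical and are not taken from the course material.

```r
# Hypothetical toy data frame (names and values are illustrative only)
df <- data.frame(id = 1:5,
                 x  = c(2.3, 0.7, 1.9, 3.1, 0.2),
                 y  = c(10, 12, 9, 15, 11))

# Extracting information from a vector and a matrix
x_large   <- df$x[df$x > 1]            # logical indexing of a vector
m         <- as.matrix(df[, c("x", "y")])
first_row <- m[1, ]                    # first row of the matrix

# Ordering a data frame by one of its columns
df_sorted <- df[order(df$x), ]

# apply: MARGIN = 2 applies the function to each column
col_means <- apply(m, 2, mean)

# A small function combining a for loop with an if statement
count_above <- function(v, threshold) {
  n <- 0
  for (value in v) {
    if (value > threshold) {
      n <- n + 1
    }
  }
  n
}

# A while loop: halve z until it drops below 1, counting the steps
z <- 10
steps <- 0
while (z >= 1) {
  z <- z / 2
  steps <- steps + 1
}
```

Calling `count_above(df$x, 1)` counts how many entries of `df$x` exceed 1. In practice a vectorised expression such as `sum(df$x > 1)` would usually be preferred, but the explicit loop shows the control structures.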

**Session 1.2** gives an introduction to finite Gaussian mixture models. The idea of 'complete data' (treating the unknown component memberships as if they were known) is explained; on this basis, the complete-data likelihood is constructed and maximized. It is then demonstrated how the resulting estimators can be incorporated into an EM algorithm, which is used to estimate all parameters of the mixture model.
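
For orientation, one common way of writing these quantities is sketched below. The notation is an assumption made here and may differ from that used in the slides: $y_1, \dots, y_n$ are the observations, $K$ is the number of components, $\pi_k$, $\mu_k$ and $\sigma_k^2$ are the component weights, means and variances (component-specific variances are assumed), $\phi$ is the Gaussian density, and $z_{ik} = 1$ if observation $i$ belongs to component $k$ and $0$ otherwise.

$$
\ell_c = \sum_{i=1}^{n} \sum_{k=1}^{K} z_{ik} \left[ \log \pi_k + \log \phi(y_i; \mu_k, \sigma_k^2) \right]
$$

If the $z_{ik}$ were known, maximizing $\ell_c$ would give

$$
\hat{\pi}_k = \frac{1}{n} \sum_{i=1}^{n} z_{ik}, \qquad
\hat{\mu}_k = \frac{\sum_{i=1}^{n} z_{ik}\, y_i}{\sum_{i=1}^{n} z_{ik}}, \qquad
\hat{\sigma}_k^2 = \frac{\sum_{i=1}^{n} z_{ik}\, (y_i - \hat{\mu}_k)^2}{\sum_{i=1}^{n} z_{ik}}.
$$

Since the $z_{ik}$ are in fact unknown, the E-step replaces them by their conditional expectations given the data and the current parameter values,

$$
w_{ik} = \frac{\pi_k\, \phi(y_i; \mu_k, \sigma_k^2)}{\sum_{l=1}^{K} \pi_l\, \phi(y_i; \mu_l, \sigma_l^2)},
$$

and the M-step evaluates the estimators above with $w_{ik}$ in place of $z_{ik}$. The two steps are alternated until convergence.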

**Sessions 1.3 and 1.4:** Using the skills acquired in Session 1.1, the EM algorithm is implemented in R in the practical part of these sessions. Depending on progress, more advanced students can proceed to implement a bootstrap test for the number of mixture components and/or a multivariate version of the mixture model. The techniques are illustrated with, and applied to, real data sets from astronomy and the energy sector.
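
To give a flavour of the practical, here is a minimal sketch of an EM algorithm for a univariate Gaussian mixture in base R. It is not the implementation developed in the sessions. The function name `em_gauss_mix`, its arguments, the quantile-based starting values, the component-specific variances and the stopping rule are all assumptions made for illustration. The data are simulated here and are not the astronomy or energy data sets.

```r
# Minimal EM sketch for a univariate K-component Gaussian mixture.
# All names and defaults are illustrative assumptions.
em_gauss_mix <- function(y, K = 2, max_iter = 500, tol = 1e-8) {
  n <- length(y)

  # Crude starting values: quantile-based means, common sd, equal weights
  mu    <- as.numeric(quantile(y, probs = (seq_len(K) - 0.5) / K))
  sigma <- rep(sd(y), K)
  p     <- rep(1 / K, K)
  loglik_old <- -Inf

  for (iter in seq_len(max_iter)) {
    # E-step: n x K matrix of weighted component densities
    dens <- sapply(seq_len(K), function(k) p[k] * dnorm(y, mu[k], sigma[k]))
    mix  <- rowSums(dens)
    w    <- dens / mix   # posterior membership probabilities; rows sum to 1

    # Observed-data log-likelihood at the parameters used in this E-step
    loglik <- sum(log(mix))

    # M-step: weighted versions of the complete-data estimators
    nk    <- colSums(w)
    p     <- nk / n
    mu    <- colSums(w * y) / nk
    sigma <- sqrt(colSums(w * outer(y, mu, "-")^2) / nk)

    # Stop once the log-likelihood no longer changes noticeably
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }

  list(p = p, mu = mu, sigma = sigma, loglik = loglik,
       iterations = iter, w = w)
}

# Simulated example: two well-separated components (not course data)
set.seed(1)
y   <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))
fit <- em_gauss_mix(y, K = 2)
```

The estimated weights, means and standard deviations are returned in `fit$p`, `fit$mu` and `fit$sigma`, and `fit$w` holds the posterior membership probabilities. A more careful implementation would try several starting values and guard against a component variance collapsing towards zero.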