Students and Early Career Investigators (who obtained their PhD degree in 2010 or later) can apply for a limited number of grants of 500 Euro for accommodation and travel.
Saturday, 8th April 2017
Sunday, 9th April 2017
Monday, 10th April 2017
This module initially refreshes generic R programming skills, and proceeds with a practical implementation of the EM algorithm and related inferential tools for finite Gaussian mixture modelling.
The tutorial assumes a basic working knowledge of R, though it does not require advanced programming skills. More specifically,
Session 1.1 recalls basic tools and concepts which are useful for R programming in general; these include: workspace handling, reading in data files, extracting information from vectors and matrices, basic operations with data frames (such as ordering); basic programming skills such as if/then, while, for, and apply, and the construction of functions. Experienced R users can skip this session.
Session 1.2 gives an introduction to finite Gaussian mixture models. The idea of 'complete data' (treating the unknown component memberships as known) is explained, and on this basis the complete-data likelihood is constructed and maximized. It is demonstrated how the resulting estimators can be incorporated into an EM algorithm, which is used to estimate all parameters of the mixture model.
Sessions 1.3 and 1.4: In the practical part of these sessions, the EM algorithm is implemented in R using the skills acquired in Session 1.1. Depending on progress, the more advanced students can proceed to implement a bootstrap test for the number of mixture components, and/or a multivariate version of the mixture model. The techniques are illustrated with, and applied to, real data sets taken from astronomy and the energy sector.
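To give a flavour of the practical part, the following is a minimal base-R sketch of an EM algorithm for a univariate Gaussian mixture. It is not the tutorial's own code: the function name `em_gauss_mix`, the quantile-based starting values, the stopping rule, and the simulated data are all assumptions made for illustration.

```r
# EM for a univariate Gaussian mixture with k components (illustrative sketch).
# em_gauss_mix and its arguments are hypothetical names, not from the tutorial.
em_gauss_mix <- function(x, k = 2, max_iter = 200, tol = 1e-8) {
  n <- length(x)
  # Crude starting values (an assumption): equal weights, means spread over
  # the sample quantiles, common standard deviation.
  p     <- rep(1 / k, k)
  mu    <- as.numeric(quantile(x, probs = (seq_len(k) - 0.5) / k))
  sigma <- rep(sd(x), k)
  loglik_old <- -Inf

  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that observation i belongs to component j.
    dens <- sapply(seq_len(k), function(j) p[j] * dnorm(x, mu[j], sigma[j]))
    w <- dens / rowSums(dens)

    # M-step: weighted versions of the complete-data estimators.
    nj    <- colSums(w)
    p     <- nj / n
    mu    <- colSums(w * x) / nj
    sigma <- sqrt(colSums(w * outer(x, mu, "-")^2) / nj)

    # Observed-data log-likelihood at the parameters used in this E-step.
    loglik <- sum(log(rowSums(dens)))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(p = p, mu = mu, sigma = sigma, loglik = loglik, iterations = iter)
}

# Simulated data (assumed for illustration): two well-separated components.
set.seed(1)
x <- c(rnorm(300, mean = 0, sd = 1), rnorm(200, mean = 5, sd = 1.5))
fit <- em_gauss_mix(x, k = 2)
print(fit[c("p", "mu", "sigma")])
```

The E-step replaces the unknown memberships by their posterior probabilities, and the M-step reuses the complete-data estimators with these probabilities as weights. In practice, several starting values would usually be tried, since EM can converge to a local maximum.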
Session 2.1: The Vector Autoregressive (VAR) model for time series
The Vector Autoregressive (VAR) model is a popular model for the analysis of multivariate time series. The VAR model makes it possible to investigate the impact that changes in one time series have on the others. In the first session we present this model to an audience of non-experts. We discuss estimation, impulse response functions, and cointegration. Examples in R will be given.
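As a small illustration of the ideas in this session, the sketch below simulates a bivariate VAR(1) process, estimates its coefficient matrix by ordinary least squares, and computes simple (non-orthogonalized) impulse responses. The coefficient matrix `A`, the sample size, and the helper `irf` are assumptions chosen for the example; this is not the presenters' code, and it uses base R only.

```r
# Simulate a zero-mean bivariate VAR(1): y_t = A y_{t-1} + e_t.
set.seed(42)
n <- 500
A <- matrix(c(0.5, 0.2,
              0.0, 0.3), nrow = 2, byrow = TRUE)  # assumed true coefficients
y <- matrix(0, nrow = n, ncol = 2)
for (t in 2:n) {
  y[t, ] <- A %*% y[t - 1, ] + rnorm(2)
}

# OLS estimation: regress y_t on y_{t-1} (no intercept, the process has mean 0).
Y <- y[2:n, ]
X <- y[1:(n - 1), ]
B <- solve(crossprod(X), crossprod(X, Y))  # B estimates t(A)
A_hat <- t(B)

# Impulse responses of a VAR(1): the response at horizon h to a unit shock is A^h.
# irf is a hypothetical helper written for this sketch.
irf <- function(A, horizons = 5) {
  out <- vector("list", horizons + 1)
  out[[1]] <- diag(nrow(A))                 # horizon 0: the shock itself
  for (h in seq_len(horizons)) out[[h + 1]] <- A %*% out[[h]]
  out
}
responses <- irf(A_hat, horizons = 5)
print(round(A_hat, 2))
```

Entry (i, j) of `responses[[h + 1]]` is the estimated effect on series i, h periods ahead, of a unit shock to series j.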
Session 2.2: Sparse estimation of the VAR model
A drawback of the VAR model is the risk of overparametrization. This undermines the ability to identify important relationships in the data and to make accurate forecasts. In high dimensions, we therefore use sparse estimation. The approach is sparse in the sense that some of the parameters are estimated as exactly zero, which makes interpretation easier. A network analysis follows from the sparse estimation and visualizes the dependencies. We illustrate the ideas with two applications: (i) identifying demand effects in a large network of product categories, and (ii) identifying spill-over effects in the volatilities of a large number of assets.
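To show what "estimated as exactly zero" means, here is a simplified sketch that fits each equation of a VAR(1) with a plain lasso, solved by coordinate descent in base R. It is a stand-in for illustration only and not the estimator presented in the session: the functions `soft_threshold` and `lasso_cd`, the penalty value `lambda = 0.15`, and the simulated six-dimensional process are all assumptions.

```r
# Soft-thresholding operator: shrinks z towards zero and sets small values to 0.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Lasso by cyclic coordinate descent for the objective
#   (1 / (2n)) * ||y - X beta||^2 + lambda * ||beta||_1   (no intercept).
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  n <- nrow(X)
  p <- ncol(X)
  beta <- rep(0, p)
  col_ss <- colSums(X^2) / n
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      # Partial residual: remove the fit of all predictors except the j-th.
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]
      z <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z, lambda) / col_ss[j]
    }
  }
  beta
}

# Simulated sparse VAR(1) (assumed): own lags plus one cross effect.
set.seed(7)
k <- 6
n <- 400
A <- diag(0.5, k)
A[1, 2] <- 0.3
y <- matrix(0, n, k)
for (t in 2:n) y[t, ] <- A %*% y[t - 1, ] + rnorm(k)

X <- y[1:(n - 1), ]
Y <- y[2:n, ]
# Row i of A_sparse holds the lasso coefficients of equation i.
A_sparse <- t(sapply(seq_len(k), function(i) lasso_cd(X, Y[, i], lambda = 0.15)))
print(round(A_sparse, 2))
```

The non-zero entries of `A_sparse` can be read as the edges of a directed network between the series. The penalty `lambda` controls how many entries are set to zero and would normally be chosen by a data-driven criterion rather than fixed by hand.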
Session 2.3: Aspects of robust estimation
The sparse estimation of the VAR model relies on two sparse estimators: the lasso for regression and the glasso for inverse covariance matrix estimation. The lasso and glasso are, however, not robust to outliers. We therefore discuss robust versions of these estimators that can be used in high dimensions. We also discuss the selection of the tuning parameter, which controls the degree of sparsity. R code and examples will be presented.
Session 2.4: Robust estimation for time series
Several types of outliers can occur in time series, making a robust analysis quite cumbersome. We give a review of several methods presented in the literature for robust estimation of the VAR model and outlier detection for time series. We present in more detail an R-package we developed for robust automatic forecasting.
After a short review of density-oriented classification methods, an unsupervised and a supervised classification approach for Hilbert random curves will be discussed. Both approaches are based on a surrogate of the probability density which is defined, in a distribution-free mixture context, from an asymptotic factorization of the Small-Ball Probability (SmBP). Some asymptotic factorizations will be derived for the SmBP of a Hilbert-valued random element X. Then, the classification algorithms and their computational implications are illustrated, with particular attention to the tuning of the parameters involved. Asymptotic results are sketched. Applications to simulated and real datasets show how the proposed methods work.
Fuzzy clustering is used extensively in several domains of research. A milestone is the well-known fuzzy k-means (fkm) clustering algorithm. The aim of fkm is to discover a limited number of homogeneous clusters in such a way that the objects are assigned to the clusters according to so-called membership degrees, which range in the interval [0, 1]. Starting from the fkm clustering algorithm, an increasing number of papers devoted to fkm and its extensions can be found in the literature.
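The role of the membership degrees can be seen in the following minimal base-R sketch of the standard fkm iteration, which alternates between updating the cluster centres and the memberships. The function name `fuzzy_kmeans`, the fuzzifier `m = 2`, the random initialization, and the simulated data are assumptions for illustration; the sketch is independent of any particular package.

```r
# Fuzzy k-means (illustrative sketch; fuzzy_kmeans is a hypothetical name).
fuzzy_kmeans <- function(X, k, m = 2, max_iter = 200, tol = 1e-6) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Random initial membership degrees; each row sums to 1.
  U <- matrix(runif(n * k), n, k)
  U <- U / rowSums(U)
  for (iter in seq_len(max_iter)) {
    # Centre update: means weighted by the memberships raised to the power m.
    Um <- U^m
    centers <- crossprod(Um, X) / colSums(Um)          # k x p matrix
    # Squared Euclidean distances of every object to every centre (n x k).
    D <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
    D <- pmax(D, .Machine$double.eps)                  # guard against division by 0
    # Membership update: inversely related to the distance, rows normalized to 1.
    U_new <- D^(-1 / (m - 1))
    U_new <- U_new / rowSums(U_new)
    converged <- max(abs(U_new - U)) < tol
    U <- U_new
    if (converged) break
  }
  list(U = U, centers = centers, iterations = iter)
}

# Simulated data (assumed): two groups of 50 points in the plane.
set.seed(3)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
fit_fkm <- fuzzy_kmeans(X, k = 2)
print(round(head(fit_fkm$U), 3))
```

Objects close to one centre receive a membership degree near 1 for that cluster, while objects lying between the two groups receive intermediate degrees in both.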
The aim is to introduce and discuss the main fuzzy clustering algorithms: fkm, the Gustafson-Kessel extension of fkm, the entropic extension of fkm, fuzzy clustering with a polynomial fuzzifier function, fuzzy k-medoids, and fuzzy k-means with a noise cluster.
A toolbox for fuzzy clustering using the R programming language is presented through simulated and real case studies. The toolbox, called fclust, contains a suite of fuzzy clustering algorithms, fuzzy cluster validity indices and visualization tools for fuzzy clustering results.