CRoNoS: Limassol 2017

2017 CRoNoS Spring Course on Multivariate methods with R

Dates: 8-10 April 2017.
Venue: Cyprus University of Technology
Room: Pefkios Georgiades Amphitheatre
Registration: Registration is free and open to all researchers, but pre-registration is mandatory. Participants must bring their own laptop with R installed.
Additional venue information and accommodation form

Grants

Students and Early Career Investigators (those who obtained their PhD in 2010 or later) can apply for a limited number of grants of 500 Euro towards accommodation and travel.

To apply for a grant, candidates should submit their CV by e-mail to cronos.cost@gmail.com. COST policies on geographical distribution and gender balance will be taken into account when awarding the grants.
Deadline for applications: 1st March 2017.
Successful candidates will be informed by e-mail and must send their flight tickets and accommodation booking to cronos.cost@gmail.com within a week of the notification to secure their grants. Otherwise, their grants will be revoked and reassigned to other candidates.
Grant recipients must attend all sessions of the course in order to receive their grants.

Organizers

Organized by the CRoNoS COST Action IC1408, represented by
Erricos J. Kontoghiorghes and Ana Colubi.

Tentative programme

Saturday, 8th April 2017

14:30 – 16:30 Session 1.1 - Module I
16:30 – 17:00 Coffee break
17:00 – 18:00 Session 1.2 - Module I

Sunday, 9th April 2017

09:30 – 11:00 Session 1.3 - Module I
11:00 – 11:30 Coffee break
11:30 – 13:00 Session 1.4 - Module I
13:00 – 15:00 Lunch break
15:00 – 16:30 Session 2.1 - Module II
16:30 – 17:00 Coffee break
17:00 – 18:30 Session 2.2 - Module II

Monday, 10th April 2017

09:00 – 10:30 Session 2.3 - Module II
10:30 – 11:00 Coffee break
11:00 – 12:30 Session 2.4 - Module II
12:30 – 14:30 Lunch break
14:30 – 16:30 Module III
16:30 – 17:00 Coffee break
17:00 – 19:00 Module IV

Module I: R programming and mixture models

Lecturers: Jochen Einbeck, Department of Mathematical Sciences, Durham University, UK.
Duration: 6 hours.
Material: Slides, Task Sheet, Web repository

This module initially refreshes generic R programming skills, and proceeds with a practical implementation of the EM algorithm and related inferential tools for finite Gaussian mixture modelling.

The tutorial assumes a basic working knowledge of R, though it does not require advanced programming skills. More specifically:

Session 1.1 recalls basic tools and concepts that are useful for R programming in general. These include workspace handling, reading in data files, extracting information from vectors and matrices, basic operations with data frames (such as ordering), basic programming constructs such as if/else, while, for, and apply, and the construction of functions. Experienced R users can skip this session.

Session 1.2 gives an introduction to finite Gaussian mixture models. The idea of 'complete data' (treating the unknown component memberships as if they were known) is explained, and on this basis the complete-data likelihood is constructed and maximized. It is then demonstrated how the resulting estimators can be incorporated into an EM algorithm, which is used to estimate all parameters of the mixture model.

Sessions 1.3 and 1.4: Using the skills acquired in the previous sessions, the EM algorithm is implemented in R in the practical part of these sessions. Depending on progress, the more advanced students can proceed with implementing a bootstrap test for the number of mixture components, and/or with a multivariate version of the mixture model. The techniques are illustrated with, and applied to, real data sets taken from astronomy and the energy sector.
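For readers who want a preview of the EM algorithm described for Sessions 1.2 to 1.4, the following is a minimal sketch for a univariate Gaussian mixture. It is not taken from the course material: it is written in Python rather than R, and the function names, the quantile-based starting values, the fixed number of iterations and the simulated data are all assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def em_gaussian_mixture(data, k=2, iterations=200):
    """EM for a univariate k-component Gaussian mixture (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: starting means spread over the data quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sds = [overall_sd] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: posterior probability that each observation belongs to each component.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: complete-data estimators, with memberships replaced by posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-12))
    return weights, means, sds

# Simulated example: two well-separated components centred at 0 and 6.
random.seed(1)
sample = ([random.gauss(0.0, 1.0) for _ in range(300)]
          + [random.gauss(6.0, 1.0) for _ in range(300)])
weights, means, sds = em_gaussian_mixture(sample)
```

With components this well separated, the estimated means should land close to 0 and 6; a convergence check on the log-likelihood would normally replace the fixed iteration count.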

Module II: Multivariate time series models: sparse estimation and robustness aspects

Lecturers: Christophe Croux, KU Leuven, Belgium.
Duration: 6 hours.
Material: Material I, Material II

Session 2.1: The Vector Autoregressive (VAR) model for time series
The Vector Autoregressive (VAR) model is a popular model for the analysis of multivariate time series. The VAR model makes it possible to investigate the impact that changes in one time series have on the others. In the first session we present this model to an audience of non-experts. We discuss estimation, impulse response functions, and cointegration. Examples in R will be given.
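As a rough illustration of the estimation and impulse response ideas mentioned above, the sketch below simulates a bivariate VAR(1) without intercept, estimates its coefficient matrix by least squares, and computes an impulse response. It is not part of the session material: the language is Python rather than R, and the coefficient matrix, sample size and function names are assumptions chosen for the example.

```python
import random

def simulate_var1(A, n_obs, seed=0):
    """Simulate y_t = A y_{t-1} + e_t with independent standard normal errors."""
    rng = random.Random(seed)
    series = [[0.0, 0.0]]
    for _ in range(n_obs):
        prev = series[-1]
        series.append([A[i][0] * prev[0] + A[i][1] * prev[1] + rng.gauss(0.0, 1.0)
                       for i in range(2)])
    return series

def fit_var1(series):
    """Least-squares estimate of A in a bivariate VAR(1) without intercept."""
    s_yx = [[0.0, 0.0], [0.0, 0.0]]  # sum over t of y_t y_{t-1}'
    s_xx = [[0.0, 0.0], [0.0, 0.0]]  # sum over t of y_{t-1} y_{t-1}'
    for prev, cur in zip(series[:-1], series[1:]):
        for i in range(2):
            for j in range(2):
                s_yx[i][j] += cur[i] * prev[j]
                s_xx[i][j] += prev[i] * prev[j]
    det = s_xx[0][0] * s_xx[1][1] - s_xx[0][1] * s_xx[1][0]
    inv = [[s_xx[1][1] / det, -s_xx[0][1] / det],
           [-s_xx[1][0] / det, s_xx[0][0] / det]]
    # A_hat = S_yx * S_xx^{-1}
    return [[sum(s_yx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def impulse_response(A, horizon, shock):
    """Response of both series `horizon` steps after a one-off shock."""
    response = list(shock)
    for _ in range(horizon):
        response = [A[i][0] * response[0] + A[i][1] * response[1] for i in range(2)]
    return response

# Assumed example: the second series feeds into the first, but not vice versa.
A_true = [[0.5, 0.2], [0.0, 0.3]]
A_hat = fit_var1(simulate_var1(A_true, 5000))
irf_1 = impulse_response(A_true, 1, [0.0, 1.0])
```

Here `irf_1` is the effect, one period later, of a unit shock to the second series; for a VAR(1) it is simply the second column of the coefficient matrix.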

Session 2.2: Sparse estimation of the VAR model
A drawback of the VAR model is the risk of overparametrization, which undermines the ability to identify important relationships in the data and to make accurate forecasts. In high dimensions, we therefore use sparse estimation. The approach is sparse in the sense that some of the parameters are estimated as exactly zero, which makes interpretation easier. A network analysis follows from the sparse estimation and visualizes the dependencies. We illustrate the ideas with two applications: (i) identifying demand effects in a large network of product categories, and (ii) identifying spill-over effects in the volatilities of a large number of assets.
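To show how a penalized estimator can set coefficients to exactly zero, here is a minimal lasso sketch using coordinate descent with soft-thresholding on a generic regression. In a VAR, each equation is a regression of one series on lagged values of all series, so the same mechanism applies; the estimator actually taught in the session may differ. The sketch is in Python rather than R, and the simulated data, the penalty value `lam` and the function names are assumptions.

```python
import random

def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso(X, y, lam, sweeps=100):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X beta, kept up to date after each coordinate update
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(v * v for v in col) / n
            # Covariance of column j with the residual that excludes coordinate j.
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, lam) / scale
            residual = [r - v * (new_beta - beta[j]) for v, r in zip(col, residual)]
            beta[j] = new_beta
    return beta

# Assumed example: only the first of three predictors matters.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.5) for row in X]
beta_hat = lasso(X, y, lam=0.2)
```

The two irrelevant coefficients come out as exact zeros, while the relevant one is shrunk somewhat below its true value of 2; this bias is the price paid for sparsity and is one reason the choice of the tuning parameter matters.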

Session 2.3: Aspects of robust estimation
The sparse estimation of the VAR model relies on two sparse estimators: the lasso for regression and the glasso for inverse covariance matrix estimation. The lasso and glasso are, however, not robust to outliers. We therefore discuss robust versions of these estimators that can be used in high dimensions. We also discuss the selection of the tuning parameter, which controls the degree of sparsity. R code and examples will be presented.

Session 2.4: Robust estimation for time series
Several types of outliers can occur in time series, making a robust analysis quite cumbersome. We review several methods presented in the literature for robust estimation of the VAR model and for outlier detection in time series. We present in more detail an R package we developed for robust automatic forecasting.

Module III: Clustering functional data

Lecturers: Enea Bongiorno, University of Piemonte Orientale, Italy.
Duration: 2 hours.
Material

After a short review of density-oriented classification methods, an unsupervised and a supervised classification approach for Hilbert random curves will be discussed. Both approaches are based on a surrogate of the probability density which is defined, in a distribution-free mixture context, from an asymptotic factorization of the Small-Ball Probability (SmBP). Some asymptotic factorizations will be derived for the SmBP of a Hilbert-valued random element X. Then, the classification algorithms and their computational implications are illustrated, with particular attention to the tuning of the parameters involved. Asymptotic results are sketched. Applications to simulated and real datasets show how the proposed methods work.

Module IV: An introduction to fuzzy clustering

Lecturers: M. Brigida Ferraro, La Sapienza University of Rome, Italy.
Duration: 2 hours.
Material

Fuzzy clustering is used extensively in several domains of research. A milestone is the well-known fuzzy k-means (fkm) clustering algorithm. The aim of fkm is to discover a limited number of homogeneous clusters in such a way that the objects are assigned to the clusters according to so-called membership degrees, which range in the interval [0, 1]. Since the introduction of the fkm clustering algorithm, an increasing number of papers devoted to fkm and its extensions have appeared in the literature.
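To make the notion of membership degrees concrete, the following is a minimal sketch of fuzzy k-means on one-dimensional data. It does not use the toolbox covered in this module: it is written in Python rather than R, and the fuzzifier `m = 2`, the quantile-based starting centroids, the fixed number of iterations and the toy data are all assumptions.

```python
def membership_degrees(data, centroids, m):
    """Membership of each point in each cluster; every row sums to 1."""
    rows = []
    for x in data:
        d2 = [(x - c) ** 2 for c in centroids]
        if min(d2) == 0.0:
            # A point sitting exactly on a centroid belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows

def fuzzy_k_means(data, k=2, m=2.0, iterations=100):
    """Alternate membership and centroid updates (illustrative only)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: starting centroids spread over the data quantiles.
    centroids = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(iterations):
        u = membership_degrees(data, centroids, m)
        for j in range(k):
            w = [row[j] ** m for row in u]
            centroids[j] = sum(wi * x for wi, x in zip(w, data)) / sum(w)
    return centroids, membership_degrees(data, centroids, m)

# Toy data: two tight groups plus one point halfway between them.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 3.0]
centroids, u = fuzzy_k_means(points)
```

The point at 3.0 lies midway between the two groups and so receives a membership of about 0.5 in each cluster, whereas the points inside a group have a membership close to 1 in their own cluster. A hard clustering method would have to assign the middle point to one side.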

The aim is to introduce and discuss the main fuzzy clustering algorithms: fkm, Gustafson-Kessel extension of fkm, entropic extension of fkm, fuzzy clustering with polynomial fuzzifier function, fuzzy k-medoids and fuzzy k-means with noise cluster.

A toolbox for fuzzy clustering using the R programming language is presented through simulated and real case studies. The toolbox, called fclust, contains a suite of fuzzy clustering algorithms, fuzzy cluster validity indices and visualization tools for fuzzy clustering results.