Anne Ruiz-Gazen, Toulouse School of Economics, France.
After a practical introduction to the general use of R for multivariate data analysis, the course presents the Invariant Coordinate Selection (ICS) method as a tool for multivariate outlier detection. ICS was proposed by Tyler et al. (2009) and has remarkable properties for revealing data structures such as outliers or clusters. It is based on the simultaneous spectral decomposition of two scatter matrices and leads to an affine invariant coordinate system in which the Euclidean distance corresponds to a Mahalanobis Distance (MD) in the original system. Unlike MD, however, ICS makes it possible to select the relevant components. This proves useful for detecting outliers that lie in a low-dimensional subspace of a high-dimensional data set. This situation arises in particular in fields with high reliability standards, such as automotive, avionics or aerospace, where ICS can help detect anomalies with a small proportion of false positives. The method will be illustrated on several artificial and real data sets using the recent R packages ICSOutlier and ICSShiny. The ICSOutlier package allows the user to choose the scatter matrices, automatically select the most relevant components, calculate an outlierness index and identify potentially outlying observations. The ICSShiny package provides a user-friendly application for ICS, in particular for outlier detection.
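In symbols, the construction described in this abstract can be sketched as follows. The notation (V_1, V_2, T, B, D, S) is introduced here for illustration only and is not taken from the abstract; T is assumed to be the location estimate that accompanies V_1.

```latex
% Assumed notation: V_1, V_2 are two affine equivariant scatter matrices
% of the p-variate data, T is the location estimate associated with V_1.

% Simultaneous diagonalisation of the two scatter matrices:
B V_1 B^\top = I_p, \qquad
B V_2 B^\top = D = \operatorname{diag}(\rho_1, \dots, \rho_p), \qquad
\rho_1 \ge \dots \ge \rho_p .

% Invariant coordinates of observation x_i:
z_i = B \, (x_i - T) .

% Since V_1^{-1} = B^\top B, using all p components recovers the squared MD:
\| z_i \|^2 = (x_i - T)^\top V_1^{-1} (x_i - T) .

% Keeping only a selected index set S of components gives a distance
% restricted to the subspace of interest:
d_i^2 = \sum_{j \in S} z_{ij}^2 .
```

The last line is what distinguishes ICS from MD: when the outliers live in a few components, summing over S alone avoids diluting their signal with the remaining coordinates.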
Simon Caton, National College of Ireland, Ireland.
Model scaling is becoming increasingly necessary as datasets increase in size, but also to facilitate core aspects of model building, model prototyping, and model selection. In this session, we will explore the application of h2o to facilitate the parallelisation of R models. The session will begin with parallelising a selection of multivariate methods to use multiple cores on participants' machines. From here, it will move towards leveraging cloud resources to further increase model scalability and correspondingly reduce runtimes. It will culminate with advice on appropriate uses of cloud and other parallel architectures for model building.
We report progress on our R package robts, which is available from R-Forge. The package works under the assumption of short-range dependence and provides different techniques for robust estimation of autocorrelations, partial autocorrelations and spectral densities, for robust fitting of autoregressive time series models, and for model diagnostics and prediction. Since many time series models assume second-order stationarity, we include robust tests for checking the stationarity of the mean, the variance and the autocovariances. Extensions to multivariate time series analysis are a task for future work.
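To give a feel for what robust estimation of autocorrelations means, here is a minimal Python sketch of one well-known approach: a Gnanadesikan-Kettenring type estimator built on the median absolute deviation (MAD). It is an illustration under stated assumptions, not the robts implementation; the function names `mad`, `acf_classical` and `acf_gk`, the AR(1) simulation and the contamination scheme are all hypothetical choices made for this example.

```python
import random
import statistics


def mad(values):
    """Median absolute deviation, scaled to be consistent for Gaussian data."""
    values = list(values)
    med = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - med) for v in values)


def acf_classical(x, lag):
    """Ordinary sample autocorrelation at the given lag."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[lag:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def acf_gk(x, lag):
    """Robust autocorrelation from robust scales of sums and differences.

    For a stationary series, Var(x_t + x_{t+h}) and Var(x_t - x_{t+h})
    are proportional to (1 + rho) and (1 - rho); replacing the variances
    by squared MADs gives an estimate that resists isolated outliers.
    """
    plus = [a + b for a, b in zip(x, x[lag:])]
    minus = [a - b for a, b in zip(x, x[lag:])]
    s_plus, s_minus = mad(plus) ** 2, mad(minus) ** 2
    return (s_plus - s_minus) / (s_plus + s_minus)


# Hypothetical demo: AR(1) series with coefficient 0.7, then 1% additive outliers.
random.seed(1)
phi, n = 0.7, 2000
x = [0.0]
for _ in range(n - 1):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))
contaminated = [v + 50.0 if t % 100 == 0 else v for t, v in enumerate(x)]

print("classical, clean:       ", round(acf_classical(x, 1), 3))
print("classical, contaminated:", round(acf_classical(contaminated, 1), 3))
print("robust, contaminated:   ", round(acf_gk(contaminated, 1), 3))
```

On the contaminated series the classical estimate is dragged towards zero by the inflated variance, while the MAD-based estimate should stay close to the true lag-one autocorrelation.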
Cristian Gatu, Alexandru Ioan Cuza University of Iasi, Romania.
Computational strategies for computing the best-subset regression models are proposed. The algorithms are based on a regression tree structure that generates all possible subset models. An efficient branch-and-bound algorithm that finds the best submodels without generating the entire tree is described. Specifically, the computational burden is reduced by pruning the non-optimal subtrees. Strategies and approximate algorithms that improve the computational performance are investigated. Further, these strategies are adapted to solve the problem of regression subset selection under the condition of non-negative coefficients. The solution is based on an alternative approach to quadratic programming that derives the non-negative least squares solution by solving the normal equations for a number of unrestricted least squares subproblems. This approach is computationally superior to the straightforward method, which would estimate the non-negative least squares solution of every possible submodel in order to select the best one. The R package "lmSubsets" for regression subset selection is introduced and described. The package aims to provide a versatile tool for subset regression.
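The pruning idea can be illustrated with a small, self-contained Python sketch. It is a simplified illustration under assumptions, not the algorithm implemented in lmSubsets: the helper names `solve`, `rss` and `best_subsets`, the model without intercept, and the simulated data are all hypothetical. Each node of the tree holds a set of variables; its children drop one variable each, and because dropping variables can never decrease the residual sum of squares (RSS), the RSS of a node is a lower bound for every model in its subtree.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rss(X, y, cols):
    """RSS of the least-squares fit of y on the given columns (no intercept)."""
    A = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    beta = solve(A, b)
    return sum((yi - sum(bj * row[j] for bj, j in zip(beta, cols))) ** 2
               for row, yi in zip(X, y))


def best_subsets(X, y):
    """Best subset of every size, found by branch and bound on the subset tree."""
    p = len(X[0])
    best = {k: (float("inf"), None) for k in range(1, p + 1)}
    visited = 0

    def visit(cols, k):
        # cols[:k] are fixed; each child drops one of the remaining columns.
        nonlocal visited
        visited += 1
        r = rss(X, y, cols)
        if r < best[len(cols)][0]:
            best[len(cols)] = (r, tuple(cols))
        for j in range(k, len(cols)):
            # The subtree below this child only holds models of these sizes,
            # all with RSS >= r, so skip it if none of them can improve.
            sizes = range(max(j, 1), len(cols))
            if any(r < best[s][0] for s in sizes):
                visit(cols[:j] + cols[j + 1:], j)

    visit(list(range(p)), 0)
    return best, visited


# Hypothetical demo: only columns 0 and 2 carry signal.
random.seed(0)
n, p = 30, 6
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 3.0 * row[2] + random.gauss(0.0, 0.1) for row in X]
best, visited = best_subsets(X, y)
for size in sorted(best):
    print(size, best[size][1], round(best[size][0], 4))
print("nodes visited:", visited, "of", 2 ** p - 1)
```

The order in which variables are dropped strongly affects how much of the tree is pruned; this sketch uses the natural column order and makes no attempt to optimise it.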
The registration fee includes participation in all sessions of the Spring Course, course material, coffee breaks and a welcome reception (pre-registration is mandatory). The registration also includes attendance at the CRoNoS Workshop on Multivariate Data Analysis and Software.
Early bird registration: until February 15, 2018
Standard registration: until March 9, 2018
Late registration: until March 23, 2018
Cash registration: after March 23, 2018
Attendees are responsible for making their own lodging and travel arrangements.
The Poseidonia Beach Hotel, the venue of the events, is offering special prices to the Spring Course and Workshops participants. To book at the special prices, register for the events to obtain your code and enter it on the hotel's reservation page.