23rd October 2020
Speaker: Yoav Zemel (University of Cambridge)
Title: Probabilistic approximations to optimal transport
Abstract: Optimal transport is now a popular tool in statistics, machine learning, and data science. A major challenge in applying optimal transport to large-scale problems is its excessive computational cost. We propose a simple subsampling scheme for fast randomized approximate computation of optimal transport distances on finite spaces. This scheme operates on a random subset of the full data and can use any exact algorithm as a black-box back-end, including state-of-the-art solvers and entropically penalized versions. We give non-asymptotic deviation bounds for its accuracy in the case of discrete optimal transport problems, and show that in many important instances, including images (2D-histograms), the approximation error is independent of the size of the full problem. We present numerical experiments demonstrating that very good approximations can be obtained while decreasing the computation time by several orders of magnitude.
We will also discuss further, recently obtained results on the limiting distribution of the optimal transport plan.
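The subsampling idea in the abstract above can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes a one-dimensional finite space with sorted points so that the exact "black-box" solver can be the closed-form CDF formula for the Wasserstein-1 distance, and the function names (wasserstein1_line, empirical_weights, subsampled_w1) are hypothetical.

```python
import random
from collections import Counter
from itertools import accumulate


def wasserstein1_line(points, r, s):
    """Exact W1 distance between weight vectors r and s on sorted 1D points.

    Stands in for the black-box exact solver (assumption: 1D ground space).
    """
    cdf_r = list(accumulate(r))
    cdf_s = list(accumulate(s))
    return sum(
        abs(cdf_r[i] - cdf_s[i]) * (points[i + 1] - points[i])
        for i in range(len(points) - 1)
    )


def empirical_weights(weights, sample_size, rng):
    """Draw sample_size indices i.i.d. from weights; return {index: frequency}."""
    draws = rng.choices(range(len(weights)), weights=weights, k=sample_size)
    return {i: c / sample_size for i, c in Counter(draws).items()}


def subsampled_w1(points, r, s, sample_size, repeats, rng):
    """Average the exact distance between empirical measures over several repeats."""
    total = 0.0
    for _ in range(repeats):
        r_hat = empirical_weights(r, sample_size, rng)
        s_hat = empirical_weights(s, sample_size, rng)
        # The solver only sees the sampled support, not the full space.
        support = sorted(set(r_hat) | set(s_hat))
        sub_points = [points[i] for i in support]
        sub_r = [r_hat.get(i, 0.0) for i in support]
        sub_s = [s_hat.get(i, 0.0) for i in support]
        total += wasserstein1_line(sub_points, sub_r, sub_s)
    return total / repeats


if __name__ == "__main__":
    rng = random.Random(1)
    n = 2000
    points = [i / n for i in range(n)]
    r = [1.0 / n] * n                      # uniform weights
    raw = [i + 1.0 for i in range(n)]      # linearly increasing weights
    s = [v / sum(raw) for v in raw]
    print("exact:     ", wasserstein1_line(points, r, s))
    print("subsampled:", subsampled_w1(points, r, s, sample_size=200, repeats=5, rng=rng))
```

In this sketch each solver call works on at most 2 x sample_size support points regardless of the size of the full space, which is the source of the computational saving described in the abstract.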
6th November 2020
Speaker: Alex Aue (UC Davis)
Title & Abstract to Follow
20th November 2020
Speaker: Florian Pein (University of Cambridge)
Title & Abstract to Follow
4th December 2020
Speaker: Priyang Dilini Talagala (University of Moratuwa)
Title & Abstract to Follow
To register your interest in accessing the StatScale Seminars, contact Dr Hyeyoung Maeng
9th October 2020
Speaker: Solt Kovacs (ETH Zurich)
Title: Optimistic search strategy: change point detection for large-scale data via adaptive logarithmic queries
Abstract: Change point detection is often formulated as a search for the maximum of a gain function describing improved fits when segmenting the data. Searching through all candidate split points on the grid to find the best one requires O(T) evaluations of the gain function for an interval with T observations. If each evaluation is computationally demanding (e.g. in high-dimensional models), this can be computationally infeasible. Instead, we propose “optimistic” strategies with O(log T) evaluations that exploit the specific structure of the gain function. Towards a solid understanding of our strategies, we investigate in detail the classical univariate Gaussian change in mean setup. For some of our proposals we prove asymptotic minimax optimality for single and multiple change point scenarios, for the latter in combination with the computationally efficient seeded binary segmentation algorithm. In simulations we demonstrate competitive estimation performance with significantly reduced computational complexity. Our search strategies generalize far beyond the theoretically analyzed univariate setup. As a promising example, we demonstrate massive computational speedup in change point detection for high-dimensional Gaussian graphical models. This talk is based on joint work with Housen Li (University of Göttingen), Lorenz Haubner (ETH Zurich), Axel Munk (University of Göttingen) and Peter Bühlmann (ETH Zurich).
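To make the contrast between O(T) and O(log T) gain evaluations concrete, here is a simplified sketch for the univariate change-in-mean setup. It is an illustration in the spirit of the abstract, not the speakers' exact procedure: the gain is the standard CUSUM statistic, the search is a simple interval-shrinking rule that probes the midpoint of the larger sub-interval, and the names make_cusum_gain and optimistic_search are hypothetical.

```python
import math


def make_cusum_gain(x):
    """Return gain(t): CUSUM statistic for splitting x into x[:t] and x[t:]."""
    T = len(x)
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)

    def gain(t):
        left_mean = prefix[t] / t
        right_mean = (prefix[T] - prefix[t]) / (T - t)
        return math.sqrt(t * (T - t) / T) * abs(left_mean - right_mean)

    return gain


def optimistic_search(gain, left, right):
    """Search split points strictly between left and right with few gain calls.

    Keeps a current best point s, probes a new point w inside the larger of
    (left, s) and (s, right), and discards the part that looks less promising.
    """
    s = (left + right) // 2
    gain_s = gain(s)
    while right - left > 5:
        if right - s > s - left:
            w = right - (right - s) // 2      # probe between s and right
            gain_w = gain(w)
            if gain_w >= gain_s:
                left, s, gain_s = s, w, gain_w
            else:
                right = w
        else:
            w = left + (s - left) // 2        # probe between left and s
            gain_w = gain(w)
            if gain_w >= gain_s:
                right, s, gain_s = s, w, gain_w
            else:
                left = w
    # Only a handful of candidates remain: check them exhaustively.
    return max(range(left + 1, right), key=gain)


if __name__ == "__main__":
    x = [0.0] * 60 + [1.0] * 40               # noise-free change in mean after 60 points
    calls = []
    base_gain = make_cusum_gain(x)

    def counted_gain(t):
        calls.append(t)
        return base_gain(t)

    estimate = optimistic_search(counted_gain, 0, len(x))
    print("estimated change point:", estimate)
    print("gain evaluations:", len(calls), "out of", len(x) - 1, "candidates")
```

The sketch relies on the gain being roughly unimodal around the change point, which holds for the noise-free CUSUM used in the demo; with noisy data the discarded part may occasionally contain the true maximum.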
17th July 2020
Speaker: Dr Tobias Kley (University of Bristol)
Title: A new approach for open-end sequential change point monitoring
Abstract: We propose a new sequential monitoring scheme for changes in the parameters of a multivariate time series. In contrast to procedures proposed in the literature, which compare an estimator from the training sample with an estimator calculated from the remaining data, we suggest dividing the sample at each time point after the training sample. Estimators from the sample before and after all separation points are then continuously compared by calculating a maximum of norms of their differences. For open-end scenarios our approach yields an asymptotic level $\alpha$ procedure, which is consistent under the alternative of a change in the parameter. A simulation study demonstrates that the new method outperforms the commonly used procedures with respect to power, and the feasibility of our approach is illustrated by analyzing two data examples. This is joint work with Josua Gösmann and Holger Dette.
3rd July 2020
Speaker: Prof. Dr. Claudia Kirch (Otto-von-Guericke University)
Title: Functional change point detection for fMRI data
Abstract: Functional magnetic resonance imaging (fMRI) is now a well-established technique for studying the brain. However, in many situations, such as when data are acquired in a resting state, the statistical analysis depends crucially on stationarity, which could easily be violated. We introduce tests for the detection of deviations from this assumption by making use of change point alternatives, where changes in the mean as well as the covariance structure of functional time series are considered. Because of the very high dimensionality of the data, an approach based on a general covariance structure is not feasible, so computations are conducted using a multidimensional separable functional covariance structure. Using the developed methods, a large study of resting state fMRI data is conducted to determine whether the subjects undertaking the resting scan have nonstationarities present in their time courses. It is found that a sizeable proportion of the subjects studied are not stationary. This is joint work with Christina Stoehr (Ruhr-Universität Bochum) and John Aston (University of Cambridge).
19th June 2020
Speaker: Martin Tveten (Dept. of Mathematics, University of Oslo)
Title: Scalable changepoint and anomaly detection in cross-correlated data
Abstract: In the seminar, I will present ongoing work in collaboration with the StatScale group on detecting changes or anomalies in the mean of a subset of variables in cross-correlated data. The maximum likelihood solutions of both problems scale exponentially in the number of variables, so not many variables are needed before an approximation is necessary. We propose an approximation in terms of a binary quadratic program and derive a dynamic programming algorithm for computing its solution in linear time in the number of variables, given that the precision matrix is banded. Our simulations indicate that little power is lost by using the approximation in place of the exact maximum likelihood, and that our method performs well even if the sparsity structure of the precision matrix estimate is misspecified. Through the simulation study, we also aim to understand when it is worth the effort to incorporate correlations rather than assuming all variables to be independent, and to find out how our method compares to competing methods in terms of power and estimation accuracy in a range of scenarios. Finally, results from an application of the method to detect known faults on a pump monitored by sensors will be shown.
5th June 2020
Speaker: Yudong Chen (University of Cambridge)
Title: High-dimensional, multiscale online changepoint detection
Abstract: We introduce a new method for high-dimensional, online changepoint detection in settings where a p-variate Gaussian data stream may undergo a change in mean. The procedure works by performing likelihood ratio tests against simple alternatives of different scales in each coordinate, and then aggregating test statistics across scales and coordinates. The algorithm is online in the sense that its worst-case computational complexity per new observation, namely O(p^2 log(ep)), is independent of the number of previous observations; in practice, it may even be significantly faster than this. We prove that the patience, or average run length under the null, of our procedure is at least at the desired nominal level, and provide guarantees on its response delay under the alternative that depend on the sparsity of the vector of mean change. Simulations confirm the practical effectiveness of our proposal. This talk is based on joint work with Tengyao Wang and Richard Samworth.
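As a rough illustration of what "online" means in the abstract above, the sketch below keeps one running likelihood ratio statistic per coordinate and per candidate mean shift b, updated with a Page-type recursion for unit-variance Gaussian data. It is a simplified stand-in, not the speakers' algorithm: the grid of scales is supplied by the user, the aggregation is a plain maximum (an assumption made for brevity), and the class name MultiscaleCusum is hypothetical.

```python
class MultiscaleCusum:
    """Per-coordinate, per-scale running likelihood ratio statistics.

    For a candidate mean shift b in a N(0, 1) stream, the log likelihood ratio
    increment of one observation x is b * (x - b / 2); resetting at zero gives
    the Page-type recursion used in update().
    """

    def __init__(self, p, scales):
        self.scales = list(scales)            # candidate shifts, positive and negative
        self.stats = [[0.0] * len(self.scales) for _ in range(p)]

    def update(self, x):
        """Process one p-variate observation; return the largest statistic."""
        best = 0.0
        for j, xj in enumerate(x):
            row = self.stats[j]
            for k, b in enumerate(self.scales):
                row[k] = max(row[k] + b * (xj - b / 2.0), 0.0)
                if row[k] > best:
                    best = row[k]
        return best


if __name__ == "__main__":
    detector = MultiscaleCusum(p=3, scales=[0.5, -0.5, 1.0, -1.0, 2.0, -2.0])
    threshold = 8.0                           # illustrative value, not calibrated
    stream = [(0.0, 0.0, 0.0)] * 50 + [(2.0, 0.0, 0.0)] * 10
    for n, obs in enumerate(stream, start=1):
        if detector.update(obs) > threshold:
            print("change declared at observation", n)
            break
```

Each call to update touches p x len(scales) numbers and nothing else, so the cost per new observation does not grow with the number of observations already seen.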