StatScale Workshop 2018 Abstracts

Alex Aue, University of California, Davis

Title: Testing for stationarity of functional time series in the frequency domain
Abstract: Interest in functional time series has spiked in the recent past with papers covering both methodology and applications being published at a much increased pace. Contributing to research in this area, this talk discusses stationarity tests for functional time series based on frequency domain methods. Setting up the tests requires a delicate understanding of periodogram- and spectral density operators that are the functional counterparts of periodogram- and spectral density matrices in the multivariate world. Two sets of statistics are proposed. One is based on the eigendecomposition of the spectral density operator, the other on a fixed projection basis. Their properties are derived both under the null hypothesis of stationary functional time series and under the smooth alternative of locally stationary functional time series. The methodology is theoretically justified through asymptotic results. Evidence from simulation studies and an application to annual temperature curves suggests that the tests work well in finite samples. The talk is based on joint work with Anne van Delft (Bochum).

 

 

 

Sabyasachi Chatterjee, University of Illinois at Urbana-Champaign

 

Title: Risk bounds for matrix total variation denoising

Abstract: Total Variation Denoising (TVD) for matrices is a very successful and widely used technique in image processing. However, nuanced statistical risk bounds for the TVD estimator are sparsely available. We study the constrained form of the TVD estimator for matrices and obtain risk bounds from a worst-case and an adaptive perspective. Firstly, we show the minimax rate optimality of this estimator for matrices with total variation bounded by some number V > 0. We then show the rate of convergence can be faster than the minimax rate when the true image is piecewise constant on rectangles and the correct tuning parameter is chosen. This fact gives statistical backing to the folklore that the TVD estimator preserves sharp edges in an image. Lastly, we explore the existence of estimators which are completely tuning parameter free yet can still achieve the minimax rate in this problem.
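
For illustration only, here is a minimal Python sketch of matrix TV denoising on a piecewise-constant image; it uses the penalised form available in scikit-image as a stand-in for the constrained estimator analysed in the talk, and the noise level and tuning weight are arbitrary choices.

```python
# Illustrative sketch only: the talk analyses the *constrained* TVD estimator,
# whereas scikit-image exposes the penalised (Lagrangian) form. The two are
# closely related but not identical; the tuning parameter `weight` below plays
# the role of the penalty level, not the total-variation budget V.
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)

# A piecewise-constant "image" on rectangles, observed with Gaussian noise.
truth = np.zeros((64, 64))
truth[16:48, 16:48] = 1.0
noisy = truth + 0.3 * rng.standard_normal(truth.shape)

# Penalised matrix TVD; larger `weight` means more smoothing.
denoised = denoise_tv_chambolle(noisy, weight=0.15)

print("noisy MSE   :", np.mean((noisy - truth) ** 2))
print("denoised MSE:", np.mean((denoised - truth) ** 2))
```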

 

 

 

Hao Chen, University of California, Davis
 

Title: Asymptotic distribution-free change-point detection for modern data
Abstract: We consider the testing and estimation of change-points, locations where the distribution abruptly changes, in a sequence of multivariate or non-Euclidean observations. We study a nonparametric framework that utilizes similarity information among observations and can thus be applied to any data type as long as an informative similarity measure on the sample space can be defined. The existing approach along this line has low power and/or biased estimates for change-points under some common scenarios. We address these problems by considering new tests based on the similarity information. Simulation studies show that the new approaches exhibit substantial improvements in detecting and estimating change-points. In addition, the new test statistics are asymptotically distribution free under the null hypothesis of no change with very mild conditions on the similarity graph. Accurate analytic p-value approximations to the significance of the new test statistics for both single change-point and changed interval alternatives are provided, making the new approaches easy off-the-shelf tools for large datasets.
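
As a rough illustration of the graph-based idea (not the new statistics proposed in the talk), the following Python sketch scans the edge counts of a k-nearest-neighbour similarity graph; the sample sizes, the choice of k and the simple centring are illustrative assumptions.

```python
# A minimal sketch of a graph-based change-point scan: build a k-nearest-neighbour
# similarity graph and, for each candidate split t, count the edges joining
# observations {1,...,t} to {t+1,...,n}. Few cross-edges suggest a change near t.
# This is only the basic edge-count idea; the talk's new statistics and analytic
# p-value approximations are not reproduced here.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(1)
n, d, k = 200, 10, 5
x = rng.standard_normal((n, d))
x[100:] += 0.8                      # mean shift at t = 100

G = kneighbors_graph(x, n_neighbors=k, mode="connectivity")
edges = np.transpose(G.nonzero())   # directed k-NN edges (i, j)
m = edges.shape[0]

def cross_edge_count(t):
    """Number of k-NN edges with one endpoint at or before t and one after."""
    return int(np.sum((edges[:, 0] < t) != (edges[:, 1] < t)))

stats = []
for t in range(20, n - 20):
    expected = m * 2 * t * (n - t) / (n * (n - 1))   # mean under the permutation null
    stats.append(expected - cross_edge_count(t))     # large value => likely change near t
t_hat = 20 + int(np.argmax(stats))
print("estimated change-point:", t_hat)
```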

 

 

 

Haeran Cho, University of Bristol

 

Title: Multiple change-point detection via multiscale MOSUM procedure with localised pruning
Abstract: In this work, we investigate the problem of multiple change-point detection where the changes occur in the mean of univariate data. The proposed methodology combines the multiscale Moving SUM (MOSUM) procedure and localised pruning based on an information criterion. The MOSUM procedure estimates the location of each change-point in a local environment defined by the bandwidth, which is utilised at the localised pruning stage. The combined approach achieves consistency in estimating the total number and locations of the change-points as well as being computationally efficient, and hence is easily scalable to handle long series. 
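
For illustration, a minimal Python sketch of a single-bandwidth MOSUM statistic for changes in the mean; the multiscale combination and the localised pruning step of the proposed method are not reproduced, and the variance estimator used is a crude placeholder.

```python
# A minimal sketch of the (single-bandwidth) MOSUM statistic for changes in the
# mean; the talk combines several bandwidths and adds localised pruning driven
# by an information criterion, neither of which is reproduced here.
import numpy as np

def mosum_statistic(x, G):
    """|sum of next G obs - sum of previous G obs| / (sigma * sqrt(2G)) at each k."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma = np.std(np.diff(x)) / np.sqrt(2)          # crude noise-level estimate
    s = np.concatenate(([0.0], np.cumsum(x)))
    stat = np.full(n, np.nan)
    for k in range(G, n - G):
        right = s[k + G] - s[k]
        left = s[k] - s[k - G]
        stat[k] = abs(right - left) / (sigma * np.sqrt(2 * G))
    return stat

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(1.5, 1, 300)])
stat = mosum_statistic(x, G=50)
print("peak of MOSUM statistic near:", int(np.nanargmax(stat)))
```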

This is joint work with Claudia Kirch (OvGU Magdeburg)

 

 

 

Holger Dette, Ruhr-Universität Bochum

 

Title: Functional data analysis for continuous functions
Abstract: Functional data analysis is typically conducted within the $L^2$-Hilbert space framework. There is by now a fully developed statistical toolbox allowing for the principled application of the functional data machinery to real-world problems, often based on dimension reduction techniques such as functional principal component analysis. At the same time, there have recently been a number of publications that sidestep dimension reduction steps and focus on a fully functional $L^2$-methodology. This paper goes one step further and develops data analysis methodology for functional time series in the space of all continuous functions. The work is motivated by the fact that objects with rather different shapes may still have a small $L^2$-distance and are therefore identified as similar when using an $L^2$-metric. However, in applications it is often desirable to use metrics reflecting the visualization of the curves in the statistical analysis. The methodological contributions are focused on developing two-sample and change-point tests as well as confidence bands, as these procedures appear to be conducive to the proposed setting. Particular interest is put on relevant differences; that is, on not trying to test for exact equality, but rather for pre-specified deviations under the null hypothesis.
The procedures are justified through large-sample theory. To ensure practicability, non-standard bootstrap procedures are developed and investigated addressing particular features that arise  in the problem of testing relevant hypotheses. The finite sample properties are explored through a simulation study and an application to annual temperature profiles.

John Duchi, Stanford University

 

Title: Some statistical aspects of distributional robustness

Abstract: In this talk, I will give an overview of work that my students, colleagues, and I have been doing on distributional robustness, in which we assume that we observe data from a distribution that may not be the “true” distribution against which we measure performance. I will discuss methods for solving distributionally robust problems, asymptotic approximations that allow one to regularize stochastic optimization problems precisely by the variance of the losses one suffers, and connections to empirical likelihood estimators. I will show how these types of robustness can protect against gross errors in future (unobserved) data. As this talk will also include preliminary and ongoing work, I hope to make connections to some hidden variable models.

 

 

 

Paul Fearnhead, Lancaster University

Title: Efficient algorithms for changepoint detection

Abstract: One approach to detecting multiple changepoints is by minimising an objective function, for example one defined as the negative log-likelihood for a model for the data plus a penalty that is linear in the number of changepoints. In some cases, dynamic programming can be used to minimise such an objective function, with the standard dynamic programming approaches based on a recursion that conditions on the time of the most recent changepoint. I will show how it is possible to derive an alternative dynamic programming formulation for minimising this cost function (first developed in Rigaill 2015). Whilst slightly more complicated to solve, this formulation has many advantages: it leads to algorithms that can be much faster (for example, analysing a million data points to find changes in mean in less than a second), and to algorithms that can exactly minimise the cost in situations where there are dependencies between parameters associated with different segments. In particular we will present the first efficient algorithm that can exactly solve a formulation of the change-in-slope problem.
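
For reference, a minimal Python sketch of the standard penalised dynamic programme (the recursion conditioning on the most recent changepoint); the alternative functional formulation discussed in the talk, which yields the much faster and change-in-slope algorithms, is not shown.

```python
# A minimal sketch of the *standard* dynamic programme for penalised change-in-mean
# detection (the recursion that conditions on the most recent changepoint). The
# alternative functional formulation discussed in the talk is not reproduced here.
import numpy as np

def optimal_partitioning(x, penalty):
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def seg_cost(a, b):                 # squared-error cost of segment x[a:b]
        m = b - a
        return s2[b] - s2[a] - (s[b] - s[a]) ** 2 / m

    F = np.full(n + 1, np.inf)          # F[t] = optimal penalised cost of x[:t]
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        cands = [F[u] + seg_cost(u, t) + penalty for u in range(t)]
        last[t] = int(np.argmin(cands))
        F[t] = cands[last[t]]

    cps, t = [], n                      # back-track the changepoint locations
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(m, 1, 200) for m in (0, 2, 0.5)])
print(optimal_partitioning(x, penalty=2 * np.log(len(x))))
```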

 

Joint work with Guillem Rigaill, Robert Maidstone, Toby Hocking and Adam Letchford

 

 

 

Arnoldo Frigessi, University of Oslo

 

Title: Personalized computer simulation of breast cancer treatment: Scaling up a multiscale dynamic model informed by multi-source patient data

Abstract: Mathematical modelling and simulation have emerged as a potentially powerful, time- and cost-effective approach to personalised cancer treatment. In order to predict the effect of a therapeutic regimen for an individual patient, it is necessary to initialize and parametrize the model so as to mirror exactly this patient’s tumor. I will present a comprehensive approach to model and simulate a breast tumor treated by two different chemotherapies, in combination or not. In the multiscale model we represent individual tumor and normal cells, with their cell cycle and other intracellular processes (depending on key molecular characteristics), the formation of blood vessels and their disruption, and extracellular processes such as the diffusion of oxygen, drugs and important molecules (including VEGF, which modulates vascular dynamics). The model is informed by data estimated from routinely acquired measurements of the patient’s tumor, including histopathology, imaging, and molecular profiling. We implemented a computer system which simulates a cross-section of the tumor under a 12-week therapy regimen. We show how the model is able to reproduce patients from a clinical trial, both responders and non-responders. We show, by scenario simulation, that other drug regimens might have led to a different outcome. I will then discuss how we were able to scale the simulator to simulate larger portions of the tumor.

 

 

 

Azadeh Khaleghi, Lancaster University

Title: Approximations of the restless bandit problem

Abstract: The multi-armed restless bandit problem is studied in the case where the pay-off distributions are jointly stationary and ϕ-mixing. This version of the problem provides a more realistic model for most real-world applications, but cannot be optimally solved in practice. Our objective is to characterize a sub-class of the problem where good approximate solutions can be found using tractable approaches. Specifically, it is shown that under some conditions on the ϕ-mixing coefficients, a modified version of UCB can prove effective. The main challenge is that, unlike in the i.i.d. setting, the distributions of the sampled pay-offs may not have the same characteristics as those of the original bandit arms. In particular, the ϕ-mixing property does not necessarily carry over. This is overcome by carefully controlling the effect of a sampling policy on the pay-off distributions. Some of the proof techniques developed in this paper can be more generally used in the context of online sampling under dependence. The proposed algorithms are accompanied by a corresponding regret analysis.
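
For orientation, a minimal Python sketch of the standard UCB index policy for independent rewards; the modified confidence widths needed under ϕ-mixing, which are the subject of the talk, are not reproduced.

```python
# For reference, a minimal sketch of the standard UCB index policy for i.i.d.
# rewards. The talk's contribution is a *modified* UCB whose confidence widths
# account for phi-mixing dependence; that modification is not reproduced here.
import numpy as np

def ucb(pull, n_arms, horizon, rng):
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    rewards = []
    for t in range(horizon):
        if t < n_arms:                      # pull each arm once to initialise
            a = t
        else:
            bonus = np.sqrt(2 * np.log(t + 1) / counts)
            a = int(np.argmax(means + bonus))
        r = pull(a, rng)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
        rewards.append(r)
    return np.array(rewards)

# Toy example: two Bernoulli arms with success probabilities 0.4 and 0.6.
rng = np.random.default_rng(4)
pull = lambda a, rng: float(rng.random() < (0.4, 0.6)[a])
r = ucb(pull, n_arms=2, horizon=5000, rng=rng)
print("average reward:", r.mean())
```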

 

 

Rebecca Killick, Lancaster University

Title: Estimating periodicities in high frequency data
Abstract: Determining the periodicity of seasonal components (if any) within a time series is an important initial step in analysing data. Traditionally the seasonal scale is either known, e.g. quarterly, or is estimated using peaks in a periodogram or dummy variables in a regression. When estimating seasonality using the Fourier periodogram we have to determine which peaks are "significant". There are several methods that attempt to automate peak determination but all have parameters that the user has to set to encode significance. Additionally, in a world of increasing granularity of our observations, longer cycles are often undetected due to the conflation between the cycle and the trend components.
We will present an alternative approach for automatically determining periodicity using wavelets. Wavelets are alternative basis functions to the Fourier sinusoids that give a decomposition over frequency bands and time (compared to the Fourier transform, which gives only a frequency decomposition). Using the theoretical relationship between wavelet coefficients over these frequency bands we can determine the periodicity of a time series automatically, whether it has an hourly, weekly, monthly or decadal season. We will illustrate our methodology for automatic seasonality detection using simulated and economic data.
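
As a crude illustration of the wavelet idea (not the automatic rule of the talk), the following Python sketch compares the energy of the detail coefficients across scales using PyWavelets; the choice of wavelet, number of levels and simulated series are arbitrary.

```python
# A crude sketch of the idea: decompose the series into wavelet detail bands and
# look at the energy per band; a strong seasonal cycle of period roughly 2^j shows
# up as elevated energy at level j. The talk's precise theoretical relationship
# between wavelet coefficients and its automatic decision rule are not shown here.
import numpy as np
import pywt

rng = np.random.default_rng(5)
n, period = 2048, 32
t = np.arange(n)
x = np.sin(2 * np.pi * t / period) + 0.5 * rng.standard_normal(n)

coeffs = pywt.wavedec(x, "db4", level=8)      # [cA_8, cD_8, cD_7, ..., cD_1]
details = coeffs[1:][::-1]                    # reorder so index j-1 holds level j
energy = [np.mean(c ** 2) for c in details]

for j, e in enumerate(energy, start=1):
    print(f"level {j} (scale ~{2**j:4d}): mean squared detail coefficient {e:.3f}")
best = 1 + int(np.argmax(energy))
print("dominant level:", best, "-> period of order", 2 ** best)
```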

Claudia Kirch, Otto-von-Guericke-Universität Magdeburg

 

Title: Frequency domain likelihood approximations for time series bootstrapping and Bayesian nonparametrics

Abstract: A large class of time series methods are based on Fourier analysis, which can be considered a whitening of the data, giving rise, for example, to the famous Whittle likelihood. In particular, frequency domain bootstrap methods have been successfully applied in a large range of situations. In this talk, we will first review existing frequency domain bootstrap methodology for stationary time series before generalizing it to locally stationary time series. To this end, we first introduce a moving Fourier transformation that captures the time-varying spectral density in a similar manner as the classical Fourier transform does for stationary time series. We obtain consistent estimators for the local spectral densities and show that the corresponding bootstrap time series correctly mimics the covariance behavior of the original time series. The approach is illustrated by means of some simulations.

All time series bootstrap methods implicitly use a likelihood approximation, which could be used explicitly in a Bayesian nonparametric framework for time series. So far, only the Whittle likelihood has been used in this context to obtain a nonparametric Bayesian estimate of the spectral density of stationary time series. In the second part of this talk we generalize this approach based on the implicit likelihood from the autoregressive-aided periodogram bootstrap introduced by Kreiss and Paparoditis (2003). This likelihood combines a parametric approximation with a nonparametric correction, making it particularly attractive for Bayesian applications. Some theoretical results about this likelihood approximation, including posterior consistency in the Gaussian case, are given. The performance is illustrated in simulations and an application to LIGO gravitational wave data.
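
As background, a minimal Python sketch of the classical Whittle negative log-likelihood for a stationary AR(1) model; the moving Fourier transform for locally stationary series and the autoregressive-aided likelihood used in the Bayesian part are not reproduced.

```python
# Background sketch: the classical Whittle (negative log-)likelihood for a
# stationary series, evaluated here for an AR(1) spectral density. The moving
# Fourier transform and the autoregressive-aided likelihood from the talk are
# not reproduced.
import numpy as np

def periodogram(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    freqs = 2 * np.pi * np.arange(1, (n - 1) // 2 + 1) / n   # Fourier frequencies in (0, pi)
    I = np.abs(np.fft.fft(x - x.mean())[1:(n - 1) // 2 + 1]) ** 2 / (2 * np.pi * n)
    return freqs, I

def ar1_spectral_density(freqs, phi, sigma2):
    return sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(-1j * freqs)) ** 2)

def whittle_neg_loglik(params, freqs, I):
    phi, sigma2 = params
    f = ar1_spectral_density(freqs, phi, sigma2)
    return np.sum(np.log(f) + I / f)

# Toy example: simulate an AR(1) and evaluate the Whittle criterion on a grid of phi.
rng = np.random.default_rng(6)
n, phi_true = 2000, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.standard_normal()

freqs, I = periodogram(x)
grid = np.linspace(-0.9, 0.9, 181)
phi_hat = grid[np.argmin([whittle_neg_loglik((p, 1.0), freqs, I) for p in grid])]
print("Whittle estimate of phi:", round(float(phi_hat), 3))
```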

George Michailidis, University of Michigan

Title: Change point problems for large scale problems

Abstract: In this talk, we address change point estimation in the following two settings that involve large data.

In the first setting, we consider an exceedingly long time series comprising several million time points. We propose a two-stage estimation procedure, which in the first stage uses a sparse subsample to obtain pilot estimates of the underlying change points, and in the second stage refines these estimates by sampling densely in appropriately defined neighborhoods around them. We establish that this method achieves the same rate of convergence, and even virtually the same asymptotic distribution, as the analysis of the full data, while reducing computational complexity to O(N^0.5) time (N being the length of the data set), as opposed to at least O(N) time for all current procedures. The main results are established under a signal-plus-noise model with independent and identically distributed error terms, but extensions to dependent data settings, as well as multiple-stage (>2) procedures, are also provided. The procedure is illustrated on an Internet data set.
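
A minimal Python sketch of the two-stage idea for a single change in mean, with illustrative choices of subsample spacing and neighbourhood width; the multiple change-point and dependent-data versions are not shown.

```python
# A minimal sketch of the two-stage idea for a *single* change in mean: estimate
# the change point on a sparse subsample of size ~ sqrt(N), then re-estimate on a
# dense sample restricted to a neighbourhood of the pilot estimate. The multiple
# change-point and dependent-data versions discussed in the talk are not shown.
import numpy as np

def cusum_changepoint(y, index):
    """Least-squares change-point estimate from observations y at positions `index`."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    k = np.arange(1, n)
    s = np.cumsum(y)
    stat = (s[:-1] - k * s[-1] / n) ** 2 * n / (k * (n - k))
    return index[int(np.argmax(stat))]    # last position of the first segment

rng = np.random.default_rng(7)
N, tau = 1_000_000, 637_204
x = np.concatenate([rng.normal(0, 1, tau), rng.normal(0.5, 1, N - tau)])

# Stage 1: pilot estimate from a sparse grid of ~sqrt(N) points.
grid = np.arange(0, N, int(np.sqrt(N)))
pilot = cusum_changepoint(x[grid], grid)

# Stage 2: dense re-estimation in a neighbourhood of width a multiple of sqrt(N).
w = 10 * int(np.sqrt(N))
lo, hi = max(0, pilot - w), min(N, pilot + w)
dense = np.arange(lo, hi)
refined = cusum_changepoint(x[dense], dense)
print("pilot:", pilot, " refined:", refined, " truth:", tau)
```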

 

In the second setting, we consider a very large random graph (comprising several million edges), where the status of each edge evolves in time (present/absent) with its own probability parameter. At an unknown time point, the probability parameter changes for a certain unknown proportion of edges, and the graph keeps evolving as before. With such large graphs, the change-point in time can often be identified exactly with probability going to 1, but a statistical analysis of the behavior of the entire graph over the observed time domain (say, via least squares) can be computationally expensive. Under certain mild conditions, we develop an inference scheme that allows accurate detection of the change-point via randomly sampling an appropriate number of edges from the graph (depending on its intrinsic parameters, which are estimated); in particular, we provide an analytical formula that guarantees a pre-specified level of precision for the inference. We illustrate the proposed method on synthetic and real data sets.

 

The common theme is on the role of sampling procedures to address change point problems for very large data sets.

 

 

 

Brendan Murphy, University College Dublin

 

Title: Exploring the social media relationships of Irish politicians using a latent space stochastic blockmodel

Abstract: Dáil Éireann is the principal chamber of the Irish parliament. The 31st Dáil was in session from March 11th, 2011 to February 6th, 2016. Many of the members of the Dáil were active on social media and many were Twitter users who followed other members of the Dáil. The pattern of following amongst these politicians is of interest to see how these relationships relate to political alignment within the Dáil. We propose a new model, called the latent position stochastic block model, which extends and generalizes both the latent space model and the stochastic block model to study social media connections between members of the Dáil. The probability of an edge between two nodes in a network depends on their respective class labels as well as their latent positions in an unobserved latent space. The proposed model is capable of representing transitivity and clustering, as well as disassortative mixing. A Bayesian method with Markov chain Monte Carlo sampling is proposed for estimation of the model parameters. Model selection is performed using the WAIC criterion, and models with different numbers of classes or dimensions of the latent space can be compared. The model is applied to studying the Twitter following relationships of members of the Dáil and interesting structures in these relationships are found. Extensions of the modeling approach to larger scale networks will also be discussed.
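
As an illustrative and simplified generative sketch of a latent position stochastic block model, the following Python code assumes a logistic link in which the log-odds of an edge are a block-level intercept minus the distance between latent positions; the exact parametrisation and the MCMC estimation scheme of the talk may differ.

```python
# A simplified generative sketch of a latent position stochastic block model:
# each node has a class label z_i and a latent position u_i, and the log-odds of
# an edge combine a block-level intercept with (minus) the latent distance.
# The exact parametrisation and estimation scheme of the talk may differ.
import numpy as np

rng = np.random.default_rng(12)
n, K, d = 100, 3, 2
z = rng.integers(K, size=n)                       # class labels
u = rng.standard_normal((n, d))                   # latent positions
B = np.array([[ 1.0, -1.5, -1.5],
              [-1.5,  1.0, -1.5],
              [-1.5, -1.5,  1.0]])                # block intercepts (assortative here)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

A = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        p = sigmoid(B[z[i], z[j]] - np.linalg.norm(u[i] - u[j]))
        A[i, j] = A[j, i] = float(rng.random() < p)

print("observed edge density:", A.sum() / (n * (n - 1)))
```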

 

This work is in conjunction with Tin-Lok James Ng, Tyler McCormick, Bailey Fosdick,  and Ted Westling.

Jonas Peters, University of Copenhagen

 

Title: Invariances in ODE-based systems

Abstract: Recent methods in causal discovery are based on the principle of invariance. We apply this methodology to complex systems, in which variables follow a dynamical process that is governed by ordinary differential equations. We assume that we are given several noisy repetitions of the system, for each of which we observe measurements at different time points. We then aim to estimate the underlying structure of the system. In particular, we consider a target variable and try to infer which other variables influence the differential equation of the target. The procedure is based on the observation that the ODE of the target variable remains invariant under perturbations of the system at places other than the target variable.

Katharina Proksch, University of Göttingen

 

Title: Statistical inference for molecules

Abstract: Nowadays it is possible to image fluorescent probes at sub-diffraction spatial resolution due to super-resolving scanning microscopes such as STED or RESOLFT. This recent increase in resolution makes it possible to visualize biochemical processes on a nanometer scale, such that inference at the level of single molecules is now a reasonable goal in fluorescence microscopy. However, these new imaging techniques do not provide directly accessible information about the (spatial and numerical) distribution of molecules, but only about the total brightness distribution. In this talk, we discuss options to infer on the number and locations of molecules in a given sample based on a hybrid algorithm, which offers both the segmentation of an image as well as estimates of the local numbers of molecules, while preserving uniform confidence about all statements. The proposed method allows for the construction of a novel, automatized statistical analysis tool for scanning microscopy via a molecular map, that is, a graphical presentation of the locations as well as local numbers of molecules and corresponding uniform confidence statements. It is based on a sound statistical model, which connects both the local brightness and molecule distributions with the fact that a single molecule can emit only one photon at a time (antibunching). More precisely, our method is built on rigorous statistical convolution modeling of higher order photon coincidences and an approach to hot spot detection in heterogeneous data via multiscale scan statistics. We demonstrate the functionality of the molecular map by means of data examples from STED fluorescence microscopy.

 

joint work with Jan Keller, Axel Munk and Frank Werner

Aaditya Ramdas, UC Berkeley

Title: Uniform nonasymptotic confidence sequences, with applications to sequential testing and estimation

Abstract: We develop nonasymptotic, distribution-free confidence sequences holding over an unbounded time horizon and achieving arbitrary precision. Our technique draws a connection between the Cramér-Chernoff method, the law of the iterated logarithm (LIL), and the sequential probability ratio test, extending the first to produce time-uniform concentration bounds, characterizing the second in a nonasymptotic fashion, and generalizing the third to nonparametric settings, including sub-Gaussian and Bernstein conditions, self-normalized processes, and matrix martingales. We tighten and substantially generalize several existing constructions of nonasymptotic iterated logarithm bounds, illustrating the generality of our proof techniques by deriving a novel upper LIL bound for the maximum eigenvalue of a sum of random matrices. Finally, we demonstrate the utility of our approach with applications to covariance matrix estimation and to estimation of the sample average treatment effect under the Neyman-Rubin potential outcomes model.
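
As a deliberately crude illustration of time-uniform coverage (much looser than the LIL-rate constructions of the talk), the following Python sketch builds a confidence sequence for the mean of 1-sub-Gaussian observations by a per-time union bound.

```python
# A deliberately crude illustration of a time-uniform confidence sequence for the
# mean of 1-sub-Gaussian observations: apply a per-time sub-Gaussian tail bound
# with error budget alpha * 6 / (pi^2 t^2) and take a union bound over all t.
# This is valid but much looser than the LIL-rate constructions of the talk.
import numpy as np

def confidence_sequence(x, alpha=0.05, sigma=1.0):
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    mean = np.cumsum(x) / t
    delta_t = alpha * 6 / (np.pi ** 2 * t ** 2)          # sums to alpha over t = 1, 2, ...
    width = sigma * np.sqrt(2 * np.log(2 / delta_t) / t)
    return mean - width, mean + width

rng = np.random.default_rng(8)
x = rng.standard_normal(100_000)                         # true mean 0
lo, hi = confidence_sequence(x)
print("covers 0 at every time:", bool(np.all((lo <= 0) & (0 <= hi))))
print("interval width at t = 10^2, 10^3, 10^5:",
      [round(float(hi[t - 1] - lo[t - 1]), 3) for t in (100, 1000, 100000)])
```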

 

Joint work with Steve Howard, Jas Sekhon, Jon McAuliffe

 

 

 

Guillem Rigaill, Institut National de la Recherche Agronomique

Title: Changepoint detection in the presence of outliers
Abstract: Many traditional methods for identifying changepoints can struggle in the presence of outliers, or when the noise is heavy-tailed. Often they will infer additional changepoints in order to fit the outliers. To overcome this problem, data often needs to be pre-processed to remove outliers, though this is difficult for applications where the data needs to be analysed online. We present an approach to changepoint detection that is robust to the presence of outliers. The idea is to adapt existing penalised cost approaches for detecting changes so that they use loss functions that are less sensitive to outliers. We argue that loss functions that are bounded, such as the classical biweight loss, are particularly suitable -- as we show that only bounded loss functions are robust to arbitrarily extreme outliers. We present an efficient dynamic programming algorithm that can find the optimal segmentation under our penalised cost criteria. Importantly, this algorithm can be used in settings where the data needs to be analysed online. We show that we can consistently estimate the number of changepoints, and accurately estimate their locations, using the biweight loss function. We demonstrate the usefulness of our approach for applications such as analysing well-log data, detecting copy number variation, and detecting tampering of wireless devices.
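
For concreteness, a minimal Python sketch of the bounded biweight loss referred to above (squared error capped at K²); plugging it into an efficient penalised-cost dynamic programme, which is the substantive contribution, is not reproduced here.

```python
# The bounded "biweight" loss referred to in the abstract (squared error capped at
# K^2), which replaces the squared-error segment loss inside a penalised cost so
# that arbitrarily extreme outliers contribute at most K^2 each.
import numpy as np

def biweight_loss(residual, K=3.0):
    r = np.asarray(residual, dtype=float)
    return np.where(np.abs(r) <= K, r ** 2, K ** 2)

r = np.array([-0.5, 1.0, 2.5, 10.0, -50.0])
print("squared error :", r ** 2)
print("biweight (K=3):", biweight_loss(r))
```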

Joint work with Paul Fearnhead

 

 

 

Stéphane Robin, AgroParisTech

Title: Inference of adaptive shifts for multivariate correlated traits

Abstract: Comparative and evolutionary ecologists are interested in the distribution of quantitative traits between related species. The classical framework for these distributions consists of a (multivariate) random process running along the branches of a phylogenetic tree relating the species. We are interested in the detection of change-points in the past history of extant species, typically changes of ecological niches. We model these change-points as shifts in the process parameters, as they reveal fast adaptation to ecological changes.

 

The Ornstein-Uhlenbeck (OU) process is sometimes preferred to the simple Brownian Motion (BM) as it models stabilizing selection toward an optimum. The optimum for each trait is likely to be changing over the long periods of time spanned by large modern phylogenies. Our goal is to automatically detect the position of these shifts on a phylogenetic tree, while accounting for correlations between traits, which might exist because of structural or evolutionary constraints. We also aim at providing an efficient inference algorithm capable of dealing with increasingly large sets of species and/or traits.

 

We first show that models with shifts are not identifiable in general. Constraining the models to be parsimonious in the number of shifts partially alleviates the problem, but several evolutionary scenarios can still provide the same joint distribution for the extant species. In such cases, we are able to list the scenarios that cannot be distinguished based on the available data.

 

To infer the parameter of the process and the shift locations, we introduce a simplification of the full multivariate OU model, named scalar OU (scOU), which allows for noncausal correlations and is still computationally tractable. We describe an Expectation Maximization (EM) algorithm that allows for a maximum likelihood estimation of the shift positions. We extend the equivalence between the OU and a BM on a re-scaled tree to the multivariate framework, making the M step fully explicit. We also derive a new model selection criterion, accounting for the identifiability issues for the shift localization on the tree. In the univariate case, we prove that this model selection procedure satisfies an oracle inequality.

 

The method, freely available as an R package (PhylogeneticEM), is fast and can deal with missing values. We demonstrate its efficiency and accuracy compared to another state-of-the-art method (l1ou) on a wide range of simulated scenarios, and use this new framework to re-analyze recently published datasets.

 

Paul Bastide, Cécile Ané, Stéphane Robin & Mahendra Mariadassou

 

References:

* Bastide, P., Mariadassou, M., & Robin, S. (2017). Detection of adaptive shifts on phylogenies by using shifted stochastic processes on a tree. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(4), 1067-1093.

* Bastide, P., Ané, C., Robin, S., & Mariadassou, M. (2018). Inference of Adaptive Shifts for Multivariate Correlated Traits. Systematic Biology. https://doi.org/10.1093/sysbio/syy005.

 


 

Rajen Shah, University of Cambridge

Title: Conditional Independence Testing: Hardness, the Generalised Covariance Measure and Double Estimation Friendly Inference
Abstract: It is a common saying that testing for conditional independence, i.e., testing whether X is conditionally independent of Y, given Z, is a hard statistical problem if Z is a continuous random variable. We provide a formalisation of this result and show that a test with correct size does not have power against any alternative.

Given the non-existence of a uniformly valid conditional independence test, we argue that tests must be designed so their suitability for a particular problem setting may be judged easily. To address this need, we propose to nonlinearly regress X on Z and Y on Z, and then compute a test statistic based on the sample covariance between the residuals, which we call the generalised covariance measure (GCM). We prove that the validity of this form of test relies almost entirely on the weak requirement that the regression procedures are able to estimate the conditional means of X given Z and of Y given Z at a slow rate. While our general procedure can be tailored to the setting at hand by combining it with any regression technique, we develop the theoretical guarantees for kernel ridge regression. A simulation study shows that the test based on the GCM is competitive with state-of-the-art conditional independence tests.
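
A minimal Python sketch of the GCM test, using kernel ridge regression as in the theory of the talk; the regularisation and kernel parameters below are arbitrary illustrative choices and would need to be tuned in practice.

```python
# A minimal sketch of the generalised covariance measure (GCM): regress X on Z and
# Y on Z, then form a normalised sample covariance of the two residual series and
# compare it to a standard normal.
import numpy as np
from scipy.stats import norm
from sklearn.kernel_ridge import KernelRidge

def gcm_test(X, Y, Z):
    # Kernel ridge regressions; the regularisation and kernel parameters are
    # illustrative and should be tuned in practice.
    rx = X - KernelRidge(kernel="rbf", alpha=0.1).fit(Z, X).predict(Z)
    ry = Y - KernelRidge(kernel="rbf", alpha=0.1).fit(Z, Y).predict(Z)
    prod = rx * ry
    T = np.sqrt(len(prod)) * prod.mean() / prod.std()
    return T, 2 * (1 - norm.cdf(abs(T)))

rng = np.random.default_rng(9)
n = 1000
Z = rng.standard_normal((n, 1))
X = np.sin(Z[:, 0]) + 0.5 * rng.standard_normal(n)
Y = np.cos(Z[:, 0]) + 0.5 * rng.standard_normal(n)   # X and Y are conditionally independent given Z
T, p = gcm_test(X, Y, Z)
print("GCM statistic:", round(float(T), 3), " p-value:", round(float(p), 3))
```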

 

Finally, we present a framework for testing whether X is conditionally independent of Y given Z where either the conditional distribution of X given Z or that of Y given Z follows a (high-dimensional) generalised linear model, and where it can be unknown to the user which of the two models is correctly specified.

  

Weijie Su, University of Pennsylvania

Title: Statistical inference for online learning and stochastic approximation via hierarchical incremental gradient descent

Abstract: Stochastic gradient descent (SGD) is an immensely popular approach for online learning in settings where data arrives in a stream or data sizes are very large. However, despite an ever-increasing volume of work on SGD, much less is known about the statistical inferential properties of SGD-based predictions. Taking a fully inferential viewpoint, this talk introduces a novel procedure termed HiGrad to conduct statistical inference for online learning, without incurring additional computational cost compared with SGD. The HiGrad procedure begins by performing SGD updates for a while and then splits the single thread into several threads; it then operates hierarchically in this fashion along each thread. With predictions provided by multiple threads in place, a t-based confidence interval is constructed by decorrelating predictions using covariance structures given by the Ruppert–Polyak averaging scheme. Under certain regularity conditions, the HiGrad confidence interval is shown to attain asymptotically exact coverage probability. Finally, the performance of HiGrad is evaluated through extensive simulation studies and a real data example. An R package higrad has been developed to implement the method.

 

This is joint work with Yuancheng Zhu


Tengyao Wang, University of Cambridge

 

Title: High-dimensional changepoint estimation via sparse projection

Abstract: Changepoints are a very common feature of Big Data that arrive in the form of a data stream.  In this paper, we study high-dimensional time series in which, at certain time points, the mean structure changes in a sparse subset of the coordinates.  The challenge is to borrow strength across the coordinates in order to detect smaller changes than could be observed in any individual component series.  We propose a two-stage procedure called 'inspect' for estimation of the changepoints: first, we argue that a good projection direction can be obtained as the leading left singular vector of the matrix that solves a convex optimisation problem derived from the CUSUM transformation of the time series.  We then apply an existing univariate changepoint estimation algorithm to the projected series.  Our theory provides strong guarantees on both the number of estimated changepoints and the rates of convergence of their locations, and our numerical studies validate its highly competitive empirical performance for a wide range of data generating mechanisms.  Software implementing the methodology is available in the R package 'InspectChangepoint'.
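
As a simplified stand-in for 'inspect' in the single change-point case (the actual method solves a convex optimisation problem and is implemented in the R package 'InspectChangepoint'), the following Python sketch soft-thresholds the CUSUM matrix and projects along its leading left singular vector; the threshold used is an arbitrary illustrative choice.

```python
# A simplified stand-in for the 'inspect' idea for a single change point: form the
# p x (n-1) CUSUM matrix, soft-threshold it entrywise (a crude surrogate for the
# convex optimisation used by inspect), take the leading left singular vector as
# the projection direction, and locate the change on the projected series.
import numpy as np

def cusum_matrix(X):
    """X is p x n; returns the p x (n-1) CUSUM transformation."""
    p, n = X.shape
    S = np.cumsum(X, axis=1)
    t = np.arange(1, n)
    return (S[:, :-1] - np.outer(S[:, -1], t) / n) * np.sqrt(n / (t * (n - t)))

def inspect_single(X, threshold):
    T = cusum_matrix(X)
    T_soft = np.sign(T) * np.maximum(np.abs(T) - threshold, 0.0)   # soft thresholding
    U, _, _ = np.linalg.svd(T_soft, full_matrices=False)
    v = U[:, 0]                               # estimated (sparse-ish) projection direction
    proj = v @ T                              # univariate CUSUM of the projected series
    return int(np.argmax(np.abs(proj))) + 1   # estimated change location

rng = np.random.default_rng(10)
p, n, tau = 200, 500, 300
X = rng.standard_normal((p, n))
X[:10, tau:] += 1.0                           # change in a sparse subset of 10 coordinates
print("estimated change point:", inspect_single(X, threshold=2 * np.sqrt(np.log(p * n))))
```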

 

 

 

Asaf Weinstein, Stanford University

 

Title: Compound decision with heteroscedastic data

Abstract: A compound decision problem arises when n distinct but similar statistical problems are to be solved simultaneously under a cumulative loss. The theory for compound decision problems, one of Herbert Robbins’s main contributions to statistics, makes the connection between such problems and a single Bayesian problem with an unspecified prior, thereby motivating asymptotically subminimax rules. This connection relies heavily on the symmetry between the n individual problems, which is broken when heteroscedastic errors (with known variances) are involved. In this talk I’ll argue that symmetry can—by change of viewpoint—be restored in the heteroscedastic case, which reveals again a compound decision problem and hints at an “adequate” empirical Bayes rule. The problem of estimating the mean of a heteroscedastic normal vector will be used as an illustration and analyzed. 

 

 

 

Min Xu, University of Pennsylvania
 

Title: Optimal rates for community estimation in the weighted stochastic block model

Abstract: Community identification in a network is an important problem in fields such as social science, neuroscience, and genetics. Over the past decade, stochastic block models (SBMs) have emerged as a popular statistical framework for this problem. However, SBMs have an important limitation in that they are suited only for networks with unweighted edges; disregarding the edge weights may result in a loss of valuable information in various scientific applications. We propose a weighted generalization of the SBM, in which the observations consist of a weighted adjacency matrix: the edges are generated from an SBM, and the weight of each edge is generated independently from one of two unknown probability densities, depending on whether the edge is within-community or between-community. We characterize the optimal rate of misclustering error of the weighted SBM in terms of the Rényi divergence between the edge weight distributions of within-community and between-community edges, substantially generalizing existing results for unweighted SBMs. Furthermore, we present a computationally tractable algorithm that is adaptive to the unknown edge weight probability distributions in the sense that it achieves the same optimal error rate as if it had perfect knowledge of the probability distributions of the edge weights.

 

 

Yi Yu, University of Bristol

 

Title: Optimal change point detection in dynamic networks

Abstract: We study change point detection in dynamic networks. For t = 1, …, T, we observe an adjacency matrix A_t from an underlying sparse low-rank graphon model, and the underlying models are time-wise independent and piece-wise constant. Our task is to recover with high accuracy the number and locations of the change points, which are assumed unknown. In this talk, I will explore the theoretical limits in terms of the sparsity of the networks, the dimension of the networks, the minimal spacing between consecutive change points and the magnitude of the smallest changes.
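
As a naive baseline only (not the optimal procedures of the talk), the following Python sketch scans a Frobenius-norm CUSUM statistic over a simulated sequence of sparse graphs with a change in a planted community.

```python
# A naive baseline for change point detection in a dynamic network: vectorise each
# adjacency matrix and scan a CUSUM statistic in Frobenius norm over time. The
# optimal (sparsity- and rank-adaptive) procedures and the theoretical limits
# discussed in the talk are not reproduced by this sketch.
import numpy as np

rng = np.random.default_rng(11)
n_nodes, T, tau = 60, 80, 50
P1 = np.full((n_nodes, n_nodes), 0.05)
P2 = P1.copy()
P2[:20, :20] = 0.15                      # a denser community appears after time tau

def sample_graph(P, rng):
    U = rng.random((P.shape[0], P.shape[0]))
    A = (np.triu(U, 1) < np.triu(P, 1)).astype(float)
    return A + A.T

A = np.stack([sample_graph(P1 if t < tau else P2, rng) for t in range(T)])
X = A.reshape(T, -1)                     # T x n_nodes^2 matrix of vectorised graphs

S = np.cumsum(X, axis=0)
stats = []
for t in range(1, T):
    diff = S[t - 1] / t - (S[-1] - S[t - 1]) / (T - t)
    stats.append(t * (T - t) / T * np.sum(diff ** 2))
print("estimated change point:", 1 + int(np.argmax(stats)))
```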

 
