##### STATSCALE SEMINARS

##### PREVIOUS SEMINAR - 20th November 2020

Speaker: Florian Pein (University of Cambridge)

Title: About the loss function for cross-validation in change-point regression

Abstract: Cross-validation is a major tool in non-parametric regression, high-dimensional regression and machine learning for model selection, tuning parameter selection and assessing estimation accuracy. In contrast, cross-validation has rarely been used in change-point regression. A main reason is the strong interest in estimating the number of change-points accurately, whereas cross-validation focuses on minimizing the prediction error. Thus, it is widely believed that cross-validation tends to overestimate the number of change-points. However, Zou et al. (2020) recently showed that the cross-validation procedure COPPS estimates the number of change-points consistently under certain assumptions. In this work, we show that cross-validation using L2-loss can be problematic. Not only does it tend to overestimate the number of change-points in some examples, it also underestimates the number in other examples. Consequently, even L2-consistency cannot be guaranteed. These flaws can be explained by the fact that we have no information to identify where a change-point lies between two observations, and hence out-of-sample prediction errors can be large around change-points. We will discuss these points theoretically and in simulated examples. We will then propose a modified cross-validation criterion for which consistent estimation of the change-points can again be shown. Moreover, we will argue and verify by simulations that cross-validation using L1-loss can be a good alternative.

This is joint work with Rajen Shah.
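
The point about large out-of-sample errors around change-points can be illustrated with a minimal sketch. It assumes a simple two-fold split into even- and odd-indexed observations and a piecewise-constant least-squares fit with a fixed number of change-points; this is an illustration only, not the COPPS procedure or the modified criterion from the talk. The names `segment_costs`, `fit_changepoints` and `cv_criterion` are hypothetical helpers introduced here.

```python
import numpy as np

def segment_costs(y):
    """Return cost(i, j): residual sum of squares of y[i:j] around its mean."""
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(i, j):
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    return cost

def fit_changepoints(y, k):
    """Least-squares fit with exactly k change-points (dynamic programming).

    Returns the sorted start indices of segments 2, ..., k + 1.
    """
    n = len(y)
    cost = segment_costs(y)
    # best[m, j]: minimal cost of splitting y[:j] into m + 1 segments
    best = np.full((k + 1, n + 1), np.inf)
    arg = np.zeros((k + 1, n + 1), dtype=int)
    for j in range(1, n + 1):
        best[0, j] = cost(0, j)
    for m in range(1, k + 1):
        for j in range(m + 1, n + 1):
            cands = [best[m - 1, i] + cost(i, j) for i in range(m, j)]
            pos = int(np.argmin(cands))
            best[m, j] = cands[pos]
            arg[m, j] = pos + m
    # Backtrack from the full sample to recover the change-point locations.
    cps, j = [], n
    for m in range(k, 0, -1):
        j = arg[m, j]
        cps.append(j)
    return sorted(cps)

def cv_criterion(y, k, loss):
    """Two-fold criterion (assumed even/odd split): fit on one half,
    predict the neighbouring held-out observations, then swap roles."""
    y = np.asarray(y, dtype=float)
    if len(y) % 2 != 0:
        raise ValueError("this sketch assumes an even number of observations")
    total = 0.0
    for parity in (0, 1):
        train, test = y[parity::2], y[1 - parity::2]
        bounds = [0] + fit_changepoints(train, k) + [len(train)]
        for a, b in zip(bounds[:-1], bounds[1:]):
            # Held-out point t is predicted by the fitted mean of the
            # training segment that contains training point t.
            total += loss(test[a:b] - train[a:b].mean())
    return total

def l2_loss(r):
    return float(np.sum(r ** 2))

def l1_loss(r):
    return float(np.sum(np.abs(r)))

# Noise-free toy signal with a single jump of size 10 (illustrative values).
y = np.array([0.0] * 5 + [10.0] * 5)
for k in (0, 1, 2):
    print(k, cv_criterion(y, k, l2_loss), cv_criterion(y, k, l1_loss))
```

In this noise-free toy signal, each fold with `k = 1` assigns one held-out observation next to the jump to the wrong segment, because the training half cannot tell on which side of that observation the change lies. That single point contributes the squared jump under L2-loss but only the absolute jump under L1-loss, which is one way to see why the choice of loss matters near large changes.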