Cai, H., and Wang, T. (2021). Estimation of high-dimensional change-points under a group sparsity structure. Electronic Journal of Statistics.
Gösmann, J., Stoehr, C., and Dette, H. (2020). Sequential change point detection in high dimensional time series. Electronic Journal of Statistics.
Liu, H., Gao, C., and Samworth, R. J. (2019). Minimax rates in sparse, high-dimensional change point detection. The Annals of Statistics.
Jirak, M. (2015). Uniform change point tests in high dimension. Annals of Statistics, 43.
Enikeeva, F., and Harchaoui, Z. (2013). High-dimensional change-point detection with sparse alternatives. arXiv: Statistics Theory.
Csörgő, M., and Horváth, L. (1997). Limit Theorems in Change-Point Analysis.
Kaul, A., Fotopoulos, S., Jandhyala, V., and Safikhani, A. (2021). Inference on the change point under a high dimensional sparse mean shift. Electronic Journal of Statistics, 15.
Dette, H. (2019). Online supplement to: A likelihood ratio approach to sequential change point detection for a general class of parameters.
Follain, B., Wang, T., and Samworth, R. J. (2021). High-dimensional changepoint estimation with heterogeneous missingness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84.
Padilla, O., Yu, Y., Wang, D., and Rinaldo, A. (2019). Optimal nonparametric multivariate change point detection and localization. IEEE Transactions on Information Theory, 68.
Yu, Y., Bradic, J., and Samworth, R. J. (2018). Confidence intervals for high-dimensional Cox models. Statistica Sinica.
Javanmard, A., and Montanari, A. (2013). Confidence intervals and hypothesis testing for high-dimensional regression. arXiv, abs/1306.3171.
Siegmund, D. (1986). Boundary crossing probabilities and statistical applications. Annals of Statistics, 14.
(2020). quantmod: quantitative financial modelling framework. R package, available at https://cran.r-project.org/package=quantmod.
(2007). Concentration Inequalities and Model Selection: École d'Été de Probabilités de Saint-Flour XXXIII, 2003.
Chen, Y., Wang, T., and Samworth, R. J. (2020). High-dimensional, multiscale online changepoint detection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84.
Duncan, A. (1975). Quality Control and Industrial Statistics. Applied Statistics, 24.
Tartakovsky, A., Nikiforov, I., and Basseville, M. (2014). Sequential Analysis: Hypothesis Testing and Changepoint Detection.
Zhang, C.-H., and Zhang, S. (2011). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76.
Kirch, C., and Stoehr, C. (2019). Sequential change point tests based on U-statistics. Scandinavian Journal of Statistics, 49.
van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2013). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42.
Yu, B. (1997). Assouad, Fano, and Le Cam.
Cho, H. (2016). Change-point detection in panel data via double CUSUM statistic. Electronic Journal of Statistics, 10.
Yu, Y., Padilla, O., Wang, D., and Rinaldo, A. (2021). Optimal network online change point localisation. arXiv, abs/2101.05477.
Cho, H., and Fryzlewicz, P. (2015). Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77.
Horváth, L., and Rice, G. (2014). Extensions of some classical methods in change point analysis. TEST, 23.
Hausman, A., and Johnston, W. (2014). Timeline of a financial crisis: introduction to the special issue. Journal of Business Research, 67.
Chan, H., and Walther, G. (2015). Optimal detection of multi-sample aligned sparse signals. Annals of Statistics, 43.
Londschien, M., Kovács, S., and Bühlmann, P. (2019). Change-point detection for graphical models in the presence of missing values. Journal of Computational and Graphical Statistics, 30.
Fisher, N., and Sen, P. (1994). Probability Inequalities for Sums of Bounded Random Variables.
Rinaldo, A., Wang, D., Wen, Q., Willett, R., and Yu, Y. (2020). Localizing changes in high-dimensional regression models.
Wang, T., and Samworth, R. J. (2016). High dimensional change point estimation via sparse projection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80.
Soh, Y., and Chandrasekaran, V. High-dimensional change-point estimation: combining filtering with convex optimization. Applied and Computational Harmonic Analysis.
Janková, J., and van de Geer, S. (2014). Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics, 9.
Young, G. (2020). High-Dimensional Statistics: A Non-Asymptotic Viewpoint, Martin J. Wainwright, Cambridge University Press, 2019, xvii + 552 pages, £57.99, hardback, ISBN: 978-1-1084-9802-9. International Statistical Review.
Page, E. (1954). Continuous inspection schemes. Biometrika, 41.
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2023, VOL. 00, NO. 0, 1–12: Theory and Methods. https://doi.org/10.1080/01621459.2023.2199962

Inference in High-Dimensional Online Changepoint Detection

Yudong Chen, Tengyao Wang and Richard J. Samworth
Statistical Laboratory, University of Cambridge, Cambridge, UK; London School of Economics and Political Science, London, UK

ABSTRACT: We introduce and study two new inferential challenges associated with the sequential detection of change in a high-dimensional mean vector. First, we seek a confidence interval for the changepoint, and second, we estimate the set of indices of coordinates in which the mean changes. We propose an online algorithm that produces an interval with guaranteed nominal coverage, and whose length is, with high probability, of the same order as the average detection delay, up to a logarithmic factor. The corresponding support estimate enjoys control of both false negatives and false positives. Simulations confirm the effectiveness of our methodology, and we also illustrate its applicability on the U.S. excess deaths data from 2017 to 2020. The supplementary material, which contains the proofs of our theoretical results, is available online.

ARTICLE HISTORY: Received November 2021; Accepted April 2023.
KEYWORDS: Confidence interval; Sequential method; Sparsity; Support estimate.
CONTACT: Richard J. Samworth, r.samworth@statslab.cam.ac.uk, Statistical Laboratory, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, UK.

1. Introduction

The real-time monitoring of evolving processes has become a characteristic feature of 21st century life. Watches and defibrillators track health data, Covid-19 case numbers are reported on a daily basis and financial decisions are made continuously based on the latest market movements. Given that changes in the dynamics of such processes are frequently of great interest, it is unsurprising that the area of changepoint detection has undergone a renaissance over the last 5–10 years.

One of the features of modern datasets that has driven much of the recent research in changepoint analysis is high dimensionality, where we monitor many processes simultaneously, and seek to borrow strength across the different series to identify changepoints. The nature of the series that are tracked in applications, as well as the desire to evade to the greatest extent possible the curse of dimensionality, means that it is commonly assumed that the signal of interest is relatively sparse, in the sense that only a small proportion of the constituent series undergo a change.
Furthermore, the large majority of these works have focused on the retrospective (or offline) challenges of detecting and estimating changes after seeing all of the available data (e.g., Chan and Walther 2015; Cho and Fryzlewicz 2015; Jirak 2015; Cho 2016; Soh and Chandrasekaran 2017; Wang and Samworth 2018; Enikeeva and Harchaoui 2019; Padilla et al. 2021; Kaul et al. 2021; Liu, Gao, and Samworth 2021; Londschien, Kovács, and Bühlmann 2021; Rinaldo et al. 2021; Follain, Wang, and Samworth 2022). Nevertheless, the related problem where one observes data sequentially and seeks to declare changes as soon as possible after they have occurred is nowadays receiving increasing attention (e.g., Kirch and Stoehr 2019; Dette and Gösmann 2020; Gösmann et al. 2020; Chen, Wang, and Samworth 2022; Yu et al. 2021). Although the focus of our review here has been on recent developments, including finite-sample results in multivariate and high-dimensional settings, we also mention that changepoint analysis has a long history (e.g., Page 1954). Entry points to this classical literature include Csörgő and Horváth (1997) and Horváth and Rice (2014). For univariate data, sequential changepoint detection is also studied under the banner of statistical process control (Duncan 1952; Tartakovsky, Nikiforov, and Basseville 2014). In the field of high-dimensional statistical inference more generally, uncertainty quantification has become a major theme over the last decade, originating with influential work on the debiased Lasso in (generalized) linear models (Javanmard and Montanari 2014; van de Geer et al. 2014; Zhang and Zhang 2014), and subsequently developed in other settings (e.g., Janková and van de Geer 2015; Yu, Bradic, and Samworth 2021).

The aim of this article is to propose methods to address two new inferential challenges associated with the high-dimensional, sequential detection of a sparse change in mean. The first is to provide a confidence interval for the location of the changepoint, while the second is to estimate the set of indices of coordinates that undergo the change. Despite the importance of uncertainty quantification and signal support recovery in changepoint applications, neither of these problems has previously been studied in the multivariate sequential changepoint detection literature, to the best of our knowledge. Of course, one option here would be to apply an offline confidence interval construction (e.g., Kaul et al. 2021) after a sequential procedure has declared a change. However, this would be to ignore the essential challenge of the sequential nature of the problem, whereby one wishes to avoid storing all historical data, to enable inference to be carried out in an online manner. By this we mean that the computational complexity for processing each new observation, as well as the storage requirements, should depend only on the number of bits needed to represent the new data point observed. (Here, we ignore the errors involved in rounding real numbers to machine precision; thus, we do not distinguish between continuous random variables and quantized versions where the data have been rounded to machine precision.) The online requirement turns out to impose severe restrictions on the class of algorithms available to the practitioner, and lies at the heart of the difficulty of the problem.

To give a brief outline of our construction of a confidence interval with guaranteed $(1-\alpha)$-level coverage, consider for simplicity the univariate setting, where $(X_n)_{n \in \mathbb{N}}$ form a sequence of independent random variables with $X_1, \ldots, X_z \overset{\mathrm{iid}}{\sim} N(0,1)$ and $X_{z+1}, X_{z+2}, \ldots \overset{\mathrm{iid}}{\sim} N(\theta,1)$. Without loss of generality, we assume that $\theta > 0$. Suppose that $\theta$ is known to be at least $b > 0$ and, for $n \in \mathbb{N}$, define residual tail lengths

$$t_{n,b} := \operatorname*{argmax}_{0 \le h \le n} \sum_{i=n-h+1}^{n} (X_i - b/2). \qquad (1)$$

In the case of a tie, we choose the smallest $h$ achieving the maximum. Since $\sum_{i=n-h+1}^{n} (X_i - b/2)$ can be viewed as the likelihood ratio statistic for testing the null of $N(0,1)$ against the alternative of $N(b,1)$ using $X_{n-h+1}, \ldots, X_n$, the quantity $t_{n,b}$ is the tail length for which the likelihood ratio statistic is maximized. If $N$ is the stopping time defining a good sequential changepoint detection procedure, then, intuitively, $N - t_{N,b}$ should be close to the true changepoint location $z$, and almost pivotal. This motivates the construction of a confidence interval of the form $\bigl[\max\{N - t_{N,b} - g(\alpha, b), 0\},\, N\bigr]$, where we control the tail probability of the distribution of $N - t_{N,b}$ to choose $g(\alpha, b)$ so as to ensure the desired coverage.
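To make this univariate construction concrete, here is a minimal Python sketch (ours, not part of the original methodology) that maintains the pair $(t_{n,b}, A_{n,b})$ online via a CUSUM-style recursion, declares a change once the maximal tail sum crosses a threshold, and returns an interval of the form above. The detection threshold `thresh` and the width `g` are illustrative placeholders; in the paper, $g(\alpha, b)$ is calibrated so that the coverage guarantee holds.

```python
import numpy as np

def update(state, x, b):
    """One online step: (t, A) tracks the maximizing tail length t_{n,b} and
    the maximal tail sum A = max_{0<=h<=n} sum_{i=n-h+1}^n (X_i - b/2)."""
    t, A = state
    t, A = t + 1, A + x - b / 2.0
    if A <= 0.0:   # maximum attained by the empty tail: reset (smallest maximizer)
        t, A = 0, 0.0
    return t, A

def sequential_ci(xs, b, thresh, g):
    """Declare a change once A crosses `thresh` (stopping time N), then return
    [max(N - t_{N,b} - g, 0), N].  `thresh` and `g` are placeholder values."""
    state = (0, 0.0)
    for n, x in enumerate(xs, start=1):
        state = update(state, x, b)
        t, A = state
        if A >= thresh:
            return max(n - t - g, 0), n
    return None

rng = np.random.default_rng(0)   # toy stream: change at z = 200
xs = np.concatenate([rng.normal(0, 1, 200), rng.normal(1.0, 1, 300)])
print(sequential_ci(xs, b=0.5, thresh=10.0, g=30))
```

The reset rule `A <= 0` implements the smallest-maximizer tie-breaking convention, and each step costs $O(1)$ time and memory, which is what makes the procedure online.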
In the multivariate case, considerable care is required to handle the post-selection nature of the inferential problem, as well as to determine an appropriate left endpoint for the confidence interval. For this latter purpose, we only assume a lower bound on the Euclidean norm of the vector of mean change, and employ a delicate multivariate and multiscale aggregation scheme; see Section 2 for details, as well as Section 3.5 for further discussion.

The procedure for the inference tasks discussed above, which we call ocd_CI (short for online changepoint detection Confidence Intervals), can be run in conjunction with any base sequential changepoint detection procedure. However, we recommend using the ocd algorithm introduced by Chen, Wang, and Samworth (2022), or its variant ocd′, which provides guarantees on both the average and worst-case detection delays, subject to a guarantee on the patience, or average false alarm rate under the null hypothesis of no change. Crucially, these are both online algorithms. The corresponding inferential procedures inherit this same online property, thereby making them applicable even in very high-dimensional settings and where changes may be rare, so we may need to see many new data points before declaring a change.

In Section 3 we study the theoretical performance of the ocd_CI procedure. In particular, we prove in Theorem 1 that, whenever the base sequential detection procedure satisfies a patience and detection delay condition, the confidence interval has at least nominal coverage for a suitable choice of input parameters. Theorem 2 provides a corresponding guarantee on the length of the interval. In Section 3.3, we show that by using ocd′ as the base procedure, the aforementioned patience and detection delay condition is indeed satisfied. As a result, the output confidence interval has guaranteed nominal coverage, and the length of the interval is of the same order as the average detection delay for the base ocd′ procedure, up to a poly-logarithmic factor. This is remarkable in view of the intrinsic challenge that the better such a changepoint detection procedure performs, the fewer post-change observations are available for inferential tasks.
A very useful byproduct of our ocd_CI methodology is that we obtain a natural estimate of the set of signal coordinates (i.e., those that undergo change). In Theorem 3, we prove that, with high probability, it is able both to recover the effective support of the signal (see Section 3.1 for a formal definition), and to avoid noise coordinates. We then broaden the scope of applicability of our methodology in Section 3.4 by relaxing our distributional assumptions to deal with sub-Gaussian or sub-exponential data. Finally, in Section 3.5, we introduce a modification of our algorithm that permits an arbitrarily loose lower bound $\beta > 0$ on the Euclidean norm of the vector of mean change to be employed, with only a logarithmic increase in the confidence interval length guarantee and the computational cost.

An attraction of our theoretical results is that we are able to handle arbitrary spatial (cross-sectional) dependence between the different coordinates of our data stream. On the other hand, two limitations of our analysis for practical use are that real data may exhibit both heavier than sub-exponential tails and temporal dependence. While a full theoretical analysis of the ocd_CI algorithm in these contexts appears to be challenging, we have made some practical suggestions regarding these issues in Sections 3.4 and 4.4, respectively.

Section 4 is devoted to a study of the numerical performance of our methodological proposals. Our simulations confirm that the ocd_CI methodology (with the ocd base procedure) attains the desired coverage level across a wide range of parameter settings, that the average confidence interval length is of comparable order to the average detection delay, and that our support recovery guarantees are validated empirically. We further demonstrate the way in which naive application of offline methods may lead to poor performance in this problem. Moreover, in Section 4.4, we apply our methods to excess death data from the Covid-19 pandemic in the United States. Proofs, auxiliary results, extensions to sub-Gaussian and sub-exponential settings and additional simulation results are provided in the supplementary material.
We conclude this introduction with some notation used throughout the article. We write $\mathbb{N}$ for the set of all nonnegative integers. For $d \in \mathbb{N}$, we write $[d] := \{1, \ldots, d\}$. Given $a, b \in \mathbb{R}$, we denote $a \vee b := \max(a, b)$ and $a \wedge b := \min(a, b)$. For a set $S$, we use $\mathbb{1}_S$ and $|S|$ to denote its indicator function and cardinality, respectively. For a real-valued function $f$ on a totally ordered set $S$, we write $\operatorname{sargmax}_{x \in S} f(x) := \min \operatorname{argmax}_{x \in S} f(x)$ and $\operatorname{largmax}_{x \in S} f(x) := \max \operatorname{argmax}_{x \in S} f(x)$ for the smallest and largest maximizers of $f$ in $S$, and define $\operatorname{sargmin}_{x \in S} f(x)$ and $\operatorname{largmin}_{x \in S} f(x)$ analogously. For a vector $v = (v^1, \ldots, v^M)^\top \in \mathbb{R}^M$, we define $\|v\|_0 := \sum_{i=1}^M \mathbb{1}_{\{v^i \neq 0\}}$, $\|v\|_2 := \bigl(\sum_{i=1}^M (v^i)^2\bigr)^{1/2}$ and $\|v\|_\infty := \max_{i \in [M]} |v^i|$. In addition, for $j \in [M]$, we define $\|v^{-j}\|_2 := \bigl(\sum_{i : i \neq j} (v^i)^2\bigr)^{1/2}$. For a matrix $A = (A^{i,j}) \in \mathbb{R}^{d_1 \times d_2}$ and $j \in [d_2]$, we write $A^{\cdot,j} := (A^{1,j}, \ldots, A^{d_1,j})^\top \in \mathbb{R}^{d_1}$ and $A^{-j,j} := (A^{1,j}, \ldots, A^{j-1,j}, A^{j+1,j}, \ldots, A^{d_1,j})^\top \in \mathbb{R}^{d_1-1}$. We use $\Phi(\cdot)$, $\bar{\Phi}(\cdot)$ and $\phi(\cdot)$ to denote the distribution function, survivor function and density function of the standard normal distribution, respectively. For two real-valued random variables $U$ and $V$, we write $U \geq_{\mathrm{st}} V$ or $V \leq_{\mathrm{st}} U$ if $\mathbb{P}(U \le x) \le \mathbb{P}(V \le x)$ for all $x \in \mathbb{R}$. We adopt the conventions that an empty sum is 0 and that $\min \emptyset := \infty$, $\max \emptyset := -\infty$.

2. Confidence Interval Construction and Support Estimation Methodology

In the multivariate sequential changepoint detection problem, we observe $p$-variate observations $X_1, X_2, \ldots$ in turn, and seek to report a stopping time $N$ by which we believe a change has occurred. Here and throughout, a stopping time is understood to be with respect to the natural filtration, so that the event $\{N = n\}$ belongs to the $\sigma$-algebra generated by $X_1, \ldots, X_n$. The focus of this work is on changes in the mean of the underlying process, and we denote the time of the changepoint by $z$. Moreover, since our primary interest is in high-dimensional settings, we will also seek to exploit sparsity in the vector of mean change. Given $\alpha \in (0,1)$, then, our primary goal is to construct a confidence interval $C \equiv C(X_1, \ldots, X_N, \alpha)$ with the property that $z \in C$ with probability at least $1 - \alpha$.

For $i \in \mathbb{N}$ and $j \in [p]$, let $X_i^j$ denote the $j$th coordinate of $X_i$. The ocd_CI algorithm relies on a lower bound $\beta > 0$ for the Euclidean norm of the vector of mean change, sets of signed scales $B_0$ and $B$ defined in terms of $\beta$, and a base sequential changepoint detection procedure CP. As CP processes each new data vector, we update the matrix of residual tail lengths $(t_{n,b}^j)_{j \in [p],\, b \in B \cup B_0}$ with $t_{n,b}^j := \operatorname{sargmax}_{0 \le h \le n} \sum_{i=n-h+1}^n (X_i^j - b/2)$, as well as the corresponding tail partial sum vectors $(A_{n,b}^{\cdot,j})_{j \in [p],\, b \in B \cup B_0}$, where $A_{n,b}^{j',j} := \sum_{i=n-t_{n,b}^j+1}^n X_i^{j'}$.

After the base procedure CP declares a change at a stopping time $N$, we identify an "anchor" coordinate $\hat{j} \in [p]$ and a signed anchor scale $\hat{b} \in B$, where

$$(\hat{j}, \hat{b}) := \operatorname*{argmax}_{(j,b) \in [p] \times B} \; \sum_{j' \in [p] \setminus \{j\}} \frac{(A_{N,b}^{j',j})^2}{t_{N,b}^j \vee 1}\, \mathbb{1}_{\bigl\{|A_{N,b}^{j',j}| \ge a \sqrt{t_{N,b}^j \vee 1}\bigr\}}.$$

The intuition is that the anchor coordinate and signed anchor scale are chosen so that the final $t_{N,\hat{b}}^{\hat{j}}$ observations provide the best evidence among all of the residual tail lengths against the null hypothesis of no change. Meanwhile, $A_{N,\hat{b}}^{\cdot,\hat{j}}$ aggregates the last $t_{N,\hat{b}}^{\hat{j}}$ observations in each coordinate, providing a measure of the strength of this evidence against the null.
The main idea of our confidence interval construction is to seek to identify coordinates with large post-change signal. To this end, observe that when $t_{N,\hat{b}}^{\hat{j}}$ is not too much larger than $N - z$, the quantity $E_{N,\hat{b}}^{j,\hat{j}} := A_{N,\hat{b}}^{j,\hat{j}} / (t_{N,\hat{b}}^{\hat{j}} \vee 1)^{1/2}$ should be centered close to $\theta^j (t_{N,\hat{b}}^{\hat{j}})^{1/2}$ for $j \in [p] \setminus \{\hat{j}\}$, with variance close to 1. Indeed, if $\hat{b}$, $N$ and $t_{N,\hat{b}}^{\hat{j}}$ were fixed, and if $0 < t_{N,\hat{b}}^{\hat{j}} \le N - z$, then the former quantity would have unit variance around this centering value. The random nature of these quantities, however, introduces a post-selection inference aspect to the problem. Nevertheless, by choosing an appropriate threshold value $d_1 > 0$, we can ensure that with high probability, when $j \neq \hat{j}$ is a noise coordinate, we have $|E_{N,\hat{b}}^{j,\hat{j}}| < d_1$, and when $j \neq \hat{j}$ is a coordinate with sufficiently large signal, there exists a signed scale $b \in (B \cup B_0) \cap [-|\theta^j|, |\theta^j|]$, having the same sign as $\theta^j$, for which $\operatorname{sgn}(b)\, E_{N,\hat{b}}^{j,\hat{j}} - |b| (t_{N,\hat{b}}^{\hat{j}})^{1/2} \ge d_1$. In fact, such a signed scale, if it exists, can always be chosen to be from $B_0$. As a convenient byproduct, the set of indices $j$ for which the latter inequality holds, which we denote as $\hat{S}$, forms a natural estimate of the set of coordinates in which the mean change is large.

For each $j \in \hat{S}$, there exists a largest scale $b \in (B \cup B_0) \cap (0, \infty)$ for which $|E_{N,\hat{b}}^{j,\hat{j}}| - b (t_{N,\hat{b}}^{\hat{j}})^{1/2} \ge d_1$. We denote the signed version of this quantity, where the sign is chosen to agree with that of $E_{N,\hat{b}}^{j,\hat{j}}$, by $\tilde{b}_j$; this can be regarded as a shrunken estimate of $\theta^j$, so plays the role of the lower bound $b$ from the univariate problem discussed in the introduction. Finally, then, our confidence interval is constructed as the intersection over indices $j \in \hat{S}$ of the confidence interval from the univariate problem in coordinate $j$, with signed scale $\tilde{b}_j$.

As a device to facilitate our theory, the ocd_CI algorithm allows the practitioner the possibility of observing a further $\ell$ observations after the time of changepoint declaration, before constructing the confidence interval. The additional observations are used to determine the anchor coordinate $\hat{j}$ and scale $\hat{b}$, as well as the estimated support $\hat{S}$ and the estimated scale $\tilde{b}_j$ for each $j \in \hat{S}$. Thus, the extra sampling is used to guard against an unusually early changepoint declaration that leaves very few post-change observations for inference. Nevertheless, we will see in Theorem 1 that the output confidence interval has guaranteed nominal coverage even with $\ell = 0$, so that additional observations are only used to control the length of the interval. In fact, even for this latter aspect, the numerical evidence presented in Section 4 indicates that $\ell = 0$ provides confidence intervals of reasonable length in practice. Similarly, Theorem 3 ensures that with high probability, our support estimate $\hat{S}$ contains no noise coordinates (i.e., has false positive control) even with $\ell = 0$, so that the extra sampling is only used to provide false negative control.

Pseudo-code for this ocd_CI confidence interval construction is given in Algorithm 1, where we suppress the $n$-dependence of quantities that are updated at each time step. The computational complexity per new observation, as well as the storage requirements, of this algorithm is equal to the sum of the corresponding quantities for the CP base procedure and $O\bigl(p^2 \log(ep)\bigr)$, regardless of the observation history. Thus, the ocd_CI method inherits the property of being an online algorithm, as discussed in the introduction, from any online CP base procedure.

A natural choice for the base online changepoint detection procedure CP is the ocd algorithm, or its variant ocd′, introduced by Chen, Wang, and Samworth (2022). Both are online algorithms, with computational complexity per new observation and storage requirements of $O\bigl(p^2 \log(ep)\bigr)$. The ocd′ base procedure is considered for the theoretical analysis in Section 3 due to its known patience and detection delay guarantees, while we prefer ocd for numerical studies and practical use. For the reader's convenience, the ocd and ocd′ algorithms are provided as Algorithms S1 and S2, respectively, in Section S3 of the supplementary materials.

Algorithm 1: Pseudo-code for the confidence interval construction algorithm ocd_CI

Input: $X_1, X_2, \ldots \in \mathbb{R}^p$ observed sequentially, $\beta > 0$, $a \ge 0$, an online changepoint detection procedure CP, $d_1, d_2 > 0$ and $\ell \in \mathbb{N}$.
Set: $b_{\min} = \beta / \sqrt{2 \log_2(2p)}$, $B_0 = \{\pm b_{\min}\}$, $B = \bigl\{\pm 2^{m/2} b_{\min} : m = 1, \ldots, \lceil \log_2(2p) \rceil \bigr\}$, $n = 0$, $A_b = 0 \in \mathbb{R}^{p \times p}$ and $t_b = 0 \in \mathbb{R}^p$ for all $b \in B \cup B_0$.
repeat
    $n \leftarrow n + 1$; observe new data vector $X_n$ and update CP with $X_n$
    for $(j, b) \in [p] \times (B \cup B_0)$ do
        $t_b^j \leftarrow t_b^j + 1$ and $A_b^{\cdot,j} \leftarrow A_b^{\cdot,j} + X_n$
        if $b A_b^{j,j} - b^2 t_b^j / 2 \le 0$ then $t_b^j \leftarrow 0$ and $A_b^{\cdot,j} \leftarrow 0$
until CP declares a change
Observe new data vectors $X_{n+1}, \ldots, X_{n+\ell}$
Set $E_b^{j',j} \leftarrow \bigl(A_b^{j',j} + \sum_{i=n+1}^{n+\ell} X_i^{j'}\bigr) \big/ \sqrt{(t_b^j + \ell) \vee 1}$ for $j', j \in [p]$, $b \in B \cup B_0$
Compute $Q_b^j \leftarrow \sum_{j' \in [p] \setminus \{j\}} (E_b^{j',j})^2\, \mathbb{1}_{\{|E_b^{j',j}| \ge a\}}$ for $j \in [p]$, $b \in B$
$(\hat{j}, \hat{b}) \leftarrow \operatorname{argmax}_{(j,b) \in [p] \times B} Q_b^j$
$\hat{S} \leftarrow \bigl\{j \in [p] \setminus \{\hat{j}\} : |E_{\hat{b}}^{j,\hat{j}}| - b_{\min} \sqrt{t_{\hat{b}}^{\hat{j}} + \ell} \ge d_1 \bigr\}$
for $j \in \hat{S}$ do
    $\tilde{b}_j \leftarrow \operatorname{sgn}(E_{\hat{b}}^{j,\hat{j}}) \max\bigl\{b \in (B \cup B_0) \cap (0, \infty) : |E_{\hat{b}}^{j,\hat{j}}| - b \sqrt{t_{\hat{b}}^{\hat{j}} + \ell} \ge d_1 \bigr\}$
Output: Confidence interval $C = \Bigl[\max\Bigl\{n - \min_{j \in \hat{S}} \Bigl(t_{\tilde{b}_j}^j + \frac{d_2}{\tilde{b}_j^{\,2}}\Bigr),\, 0\Bigr\},\; n\Bigr]$
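The repeat-loop of Algorithm 1 admits a compact vectorized implementation. The sketch below is our own illustration for the Gaussian setting (it is not the authors' code): it updates, for every scale, the $p$ residual tail lengths and the $p \times p$ matrix of tail partial sums in $O\bigl(p^2 \log(ep)\bigr)$ time per observation.

```python
import numpy as np

def make_scales(p, beta):
    """Signed scale grids B0 and B from Algorithm 1."""
    b_min = beta / np.sqrt(2 * np.log2(2 * p))
    L = int(np.ceil(np.log2(2 * p)))
    B0 = [b_min, -b_min]
    B = [sgn * 2 ** (m / 2) * b_min for m in range(1, L + 1) for sgn in (1, -1)]
    return B0, B

def ocd_ci_step(A, t, x):
    """Per-observation update: for each scale b, t[b] holds the p residual tail
    lengths and A[b] is p x p, with column j aggregating the last t[b][j]
    observations; column j resets when its scale-b likelihood ratio hits 0."""
    for b in A:
        t[b] += 1
        A[b] += x[:, None]                                  # add x to every column
        dead = b * A[b].diagonal() - b * b * t[b] / 2.0 <= 0.0
        A[b][:, dead] = 0.0
        t[b][dead] = 0.0

p, beta = 5, 1.0
B0, B = make_scales(p, beta)
A = {b: np.zeros((p, p)) for b in B0 + B}
t = {b: np.zeros(p) for b in B0 + B}
rng = np.random.default_rng(1)
for _ in range(10):                                         # feed ten observations
    ocd_ci_step(A, t, rng.normal(size=p))
```

The per-step cost and storage are independent of the time index $n$, which is the defining online property discussed above.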
3. Theoretical Analysis

Throughout this section, we will assume that the sequential observations $X_1, X_2, \ldots$ are independent, and that for some unknown covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$ whose diagonal entries are all equal to 1, there exist $z \in \mathbb{N}$ and $\theta = (\theta^1, \ldots, \theta^p)^\top \neq 0$ for which $X_1, \ldots, X_z \sim N_p(0, \Sigma)$ and $X_{z+1}, X_{z+2}, \ldots \sim N_p(\theta, \Sigma)$. We let $\vartheta := \|\theta\|_2$, and write $\mathbb{P}_{z,\theta,\Sigma}$ for probabilities computed under this model, though in places we omit the subscripts for brevity. Define the effective sparsity of $\theta$, denoted $s(\theta)$, to be the smallest $s \in \{2^0, 2^1, \ldots, 2^{\lceil \log_2 p \rceil}\}$ such that the corresponding effective support $S(\theta) := \bigl\{j \in [p] : |\theta^j| \ge \|\theta\|_2 / \sqrt{s \log_2(2p)}\bigr\}$ has cardinality at least $s$. Thus, the sum of squares of coordinates in the effective support of $\theta$ has the same order of magnitude as $\|\theta\|_2^2$, up to logarithmic factors. Moreover, if at most $s$ components of $\theta$ are nonzero, then $s(\theta) \le s$, and the equality is attained when, for example, all nonzero coordinates have the same magnitude.

For $r > 0$ and an online changepoint detection procedure CP characterized by an extended stopping time $N$, we define

$$g(r; N) := \sup_{z \in \mathbb{N}} \mathbb{P}_{z,\theta,\Sigma}(N > z + r). \qquad (2)$$
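The definition of effective sparsity above is easy to transcribe directly; the following sketch (our illustration, not from the paper) returns $s(\theta)$ and the corresponding effective support by scanning the dyadic grid.

```python
import numpy as np

def effective_sparsity(theta):
    """Smallest dyadic s in {1, 2, ..., 2^ceil(log2 p)} whose effective support
    S(theta) = {j : |theta_j| >= ||theta||_2 / sqrt(s * log2(2p))} has size >= s."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    norm = np.linalg.norm(theta)
    for m in range(int(np.ceil(np.log2(p))) + 1):
        s = 2 ** m
        support = np.flatnonzero(np.abs(theta) >= norm / np.sqrt(s * np.log2(2 * p)))
        if support.size >= s:
            return s, support
    return 2 ** int(np.ceil(np.log2(p))), np.flatnonzero(theta)

theta = np.zeros(100)
theta[:16] = 0.5                      # 16 equal nonzero coordinates
print(effective_sparsity(theta)[0])   # a dyadic value no larger than 16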
z,θ, N (θ1 , I ).For r > 0and m ∈[p]∪{0},write p {n>z} p As mentioned following Theorem 1,wecan take r to be √ p j := θ ∈ R : |{j ∈[p] : |θ |≤ 1/(8 r)}| ≥ m . r,m the maximum of an appropriate quantile of the detection delay distribution of CP and a quantity that is of order s/β up to Den fi e T to be the set of stopping times with respect to the logarithmic factors. The main conclusion of Theorem 2 is that, natural filtration (F ) ,and set n n∈N with high probability, the length of the confidence interval is then of this same order r. Whenever the quantile of the detection T := N ∈ T:sup P (N > z + r) ≤ . delay distribution achieves the maximum above—which is the r,m z,θ z∈N∪{0},θ ∈ r,m case, up to logarithmic factors, for the ocd base procedure (see Proposition 5)—we can conclude that with high probability, the [p] Write 2 for the power set of [p], equipped with the symmetric length of the ocd_CI confidence interval is of the same order as difference metric d : (A, B) →|(A \ B) ∪ (B \ A)|.For any this detection delay quantile (which is the best one could hope stopping time N, denote for).Notethatthe choicesofinputsin Theorem 2 are identical to those in Theorem 1, except that we now ask for order r additional p ∞ [p] J :={ψ : (R ) → 2 : ψ is F -measurable}, N N observations aer ft the changepoint declaration. wherewerecallthat ψ is said to be F -measurable if for any [p] −1 A ∈ 2 and n ∈ N ,wehavethat ψ (A) ∩{N = n} is F - 3.2. Support Recovery 0 n measurable. Recall the den fi ition of S(θ ) from the beginning of this section, and denote S (θ ) := j ∈[p] : |θ |≥ b ,where b , β min min Proposition 4. For r > 0and m ≥ 15, we have defined in Algorithm 1, is the smallest positive scale in B ∪ B , We will suppress the dependence on θ of both these quantities in inf inf sup E d ψ(X , X , ...),supp(θ ) ≥ . z,θ 1 2 N∈T ψ ∈J 32 this section. Theorem 3 provides a support recovery guarantee r,m N z∈N ,θ ∈ 0 r,m ˆ ˆ for S, defined in Algorithm 1.Since neither S nor the anchor This proposition considers any support estimation algorithm coordinate j defined in the algorithm depend on d ,weomit obtained from a stopping time in T , and we note that such r,m its specification; the choices of other tuning parameters mimic a competing procedure is even allowed to store all data up to those in Theorems 1 and 2. this stopping time, in contrast to our online algorithm. This result can be interpreted as an optimality guarantee for the Theorem 3. Let p ≥ 2and xfi α ∈ (0, 1).Suppose that ϑ ≥ support recovery property of the ocd_CI algorithm presented β> 0. Let CP be an online changepoint detection procedure in Theorem 3(b), provided that the base procedure N belongs to characterized by an extended stopping time N that satisfies ( 3) the class T ,and that N and r satisfy (3). See Section 3.3 for for some r ≥ 1. r,m further discussion. (a) Then, with inputs (X ) , β> 0, a ≥ 0, CP, d = t t∈N 1 5rβ , ≥ 0, we have 9s log (2p) 3.3. Using ocd as the Base Procedure P (S ⊆ S ) ≥ 1 − α. z,θ, β In this section, we provide a value of r that suffices for condi- tion (3)toholdwhenwetakeour base proceduretobe ocd . (b) Now assume that θ has an eeff ctive sparsity of s := s(θ ) ≥ 2. 
3.2. Support Recovery

Recall the definition of $S(\theta)$ from the beginning of this section, and denote $S_\beta(\theta) := \{j \in [p] : |\theta^j| \ge b_{\min}\}$, where $b_{\min}$, defined in Algorithm 1, is the smallest positive scale in $B \cup B_0$. We will suppress the dependence on $\theta$ of both these quantities in this section. Theorem 3 provides a support recovery guarantee for $\hat{S}$, defined in Algorithm 1. Since neither $\hat{S}$ nor the anchor coordinate $\hat{j}$ defined in the algorithm depends on $d_2$, we omit its specification; the choices of other tuning parameters mimic those in Theorems 1 and 2.

Theorem 3. Let $p \ge 2$ and fix $\alpha \in (0,1)$. Suppose that $\vartheta \ge \beta > 0$. Let CP be an online changepoint detection procedure characterized by an extended stopping time $N$ that satisfies (3) for some $r \ge 1$.
(a) Then, with inputs $(X_t)_{t \in \mathbb{N}}$, $\beta > 0$, $a \ge 0$, CP, $d_1 = \sqrt{\frac{5 r \beta^2}{9 s \log_2(2p)}}$ and $\ell \ge 0$, we have $\mathbb{P}_{z,\theta,\Sigma}(\hat{S} \subseteq S_\beta) \ge 1 - \alpha$.
(b) Now assume that $\theta$ has an effective sparsity of $s := s(\theta) \ge 2$. Then there exists a universal constant $C_2 > 0$ such that, with inputs $(X_t)_{t \in \mathbb{N}}$, $\beta > 0$, $a = C_2 \sqrt{\log(rp/\alpha)}$, CP, $d_1 = \sqrt{\frac{5 r \beta^2}{9 s \log_2(2p)}}$ and $\ell \ge 80 r$, we have $\mathbb{P}_{z,\theta,\Sigma}(\hat{S} \cup \{\hat{j}\} \supseteq S) \ge 1 - \alpha$.

Note that $S \subseteq S_\beta \subseteq \{j \in [p] : \theta^j \neq 0\}$. Thus, part (a) of the theorem reveals that with high probability, our support estimate $\hat{S}$ does not contain any noise coordinates. Part (b) offers a complementary guarantee on the inclusion of all "big" signal coordinates, provided we augment our support estimate with the anchor coordinate $\hat{j}$. See also the further discussion of this result following Proposition 4 and in Section 3.3.

We now turn our attention to the optimality of our support recovery algorithm, by establishing a complementary minimax lower bound on the performance of any support estimator. In fact, we can already establish this optimality by restricting the cross-sectional covariance matrix to be the identity matrix. Thus, given $\theta \in \mathbb{R}^p$ and $z \in \mathbb{N}_0$, we write $\mathbb{P}_{z,\theta}$ for a probability measure under which $(X_n)_{n \in \mathbb{N}}$ are independent with $X_n \sim N_p(\theta \mathbb{1}_{\{n > z\}}, I_p)$. For $r > 0$ and $m \in [p] \cup \{0\}$, write
$$\Theta_{r,m} := \bigl\{\theta \in \mathbb{R}^p : |\{j \in [p] : |\theta^j| \le 1/(8\sqrt{r})\}| \ge m\bigr\}.$$
Define $\mathcal{T}$ to be the set of stopping times with respect to the natural filtration $(\mathcal{F}_n)_{n \in \mathbb{N}}$, and set
$$\mathcal{T}_{r,m} := \Bigl\{N \in \mathcal{T} : \sup_{z \in \mathbb{N} \cup \{0\},\, \theta \in \Theta_{r,m}} \mathbb{P}_{z,\theta}(N > z + r) \le \frac{1}{4}\Bigr\}.$$
Write $2^{[p]}$ for the power set of $[p]$, equipped with the symmetric difference metric $d : (A, B) \mapsto |(A \setminus B) \cup (B \setminus A)|$. For any stopping time $N$, denote
$$\mathcal{J}_N := \bigl\{\psi : (\mathbb{R}^p)^\infty \to 2^{[p]} : \psi \text{ is } \mathcal{F}_N\text{-measurable}\bigr\},$$
where we recall that $\psi$ is said to be $\mathcal{F}_N$-measurable if for any $A \in 2^{[p]}$ and $n \in \mathbb{N}_0$, we have that $\psi^{-1}(A) \cap \{N = n\}$ is $\mathcal{F}_n$-measurable.

Proposition 4. For $r > 0$ and $m \ge 15$, we have
$$\inf_{N \in \mathcal{T}_{r,m}} \; \inf_{\psi \in \mathcal{J}_{N}} \; \sup_{z \in \mathbb{N}_0,\, \theta \in \Theta_{r,m}} \mathbb{E}_{z,\theta}\, d\bigl(\psi(X_1, X_2, \ldots), \operatorname{supp}(\theta)\bigr) \ge \frac{m}{32}.$$

This proposition considers any support estimation algorithm obtained from a stopping time in $\mathcal{T}_{r,m}$, and we note that such a competing procedure is even allowed to store all data up to this stopping time, in contrast to our online algorithm. This result can be interpreted as an optimality guarantee for the support recovery property of the ocd_CI algorithm presented in Theorem 3(b), provided that the base procedure $N$ belongs to the class $\mathcal{T}_{r,m}$, and that $N$ and $r$ satisfy (3). See Section 3.3 for further discussion.

3.3. Using ocd′ as the Base Procedure

In this section, we provide a value of $r$ that suffices for condition (3) to hold when we take our base procedure to be ocd′. For the convenience of the reader, this algorithm is presented as Algorithm S2 in Section S3 of the supplementary materials, where we also provide interpretation of the input parameters $\tilde{a}$, $T^{\mathrm{diag}}$ and $T^{\mathrm{off}}$.

Proposition 5. Fix $\alpha \in (0,1)$ and $\gamma > 0$. Assume that $\theta$ has an effective sparsity of $s := s(\theta) \ge 2$, that $\vartheta \ge \beta > 0$ and that $z \le 2\alpha\gamma$. Then with inputs $(X_t)_{t \in \mathbb{N}}$, $\beta > 0$, $\tilde{a} = \sqrt{2 \log\{16 p^2 \gamma \log_2(2p)\}}$, $T^{\mathrm{diag}} = \log\{16 p \gamma \log_2(4p)\}$ and $T^{\mathrm{off}} = 8 \log\{16 p \gamma \log_2(2p)\}$ in the ocd′ procedure, there exists a universal constant $C_1 > 0$ such that for all

$$r \ge \frac{C_1 s \log_2(2p) \log\{p \gamma \alpha^{-1} (\beta^{-2} \vee 1)\}}{\beta^2} + 2 =: r_1, \qquad (4)$$

the output $N$ satisfies (3).
By combining Proposition 5 with Theorems 1–3, respectively, we immediately arrive at the following corollary.

Corollary 6. Fix $\alpha \in (0,1)$ and $\gamma > 0$. Assume that $\theta$ has an effective sparsity of $s := s(\theta) \ge 2$, that $\vartheta \ge \beta > 0$ and that $z \le 2\alpha\gamma$. Let $(X_t)_{t \in \mathbb{N}}$, $\beta > 0$, $\tilde{a} = \sqrt{2 \log\{16 p^2 \gamma \log_2(2p)\}}$, $T^{\mathrm{diag}} = \log\{16 p \gamma \log_2(4p)\}$ and $T^{\mathrm{off}} = 8 \log\{16 p \gamma \log_2(2p)\}$ be the inputs of the ocd′ procedure. Then the following statements hold:
(a) With extra inputs $a \ge 0$, CP = ocd′, $d_1 = \sqrt{\frac{5 r_1 \beta^2}{9 s \log_2(2p)}}$, $d_2 = 4 d_1$ and $\ell \ge 0$ for Algorithm 1, the output confidence interval $C$ and the support estimate $\hat{S}$ satisfy $\mathbb{P}_{z,\theta,\Sigma}(z \in C) \ge 1 - \alpha$ and $\mathbb{P}_{z,\theta,\Sigma}(\hat{S} \subseteq S_\beta) \ge 1 - \alpha$.
(b) There exists a universal constant $C_0 > 0$ such that, with extra inputs $a = C_0 \sqrt{\log(r_1 p/\alpha)}$, CP = ocd′, $d_1 = \sqrt{\frac{5 r_1 \beta^2}{9 s \log_2(2p)}}$, $d_2 = 4 d_1$ and $\ell \ge 80 r_1$ for Algorithm 1, the length $L$ of the output confidence interval $C$ and the support estimate satisfy $\mathbb{P}_{z,\theta,\Sigma}(L > 8 r_1) \le \alpha$ and $\mathbb{P}_{z,\theta,\Sigma}(\hat{S} \cup \{\hat{j}\} \supseteq S) \ge 1 - \alpha$.

Corollary 6 reveals that, when ocd′ is used as the base procedure, the ocd_CI methodology provides guaranteed confidence interval coverage. Moreover, up to poly-logarithmic factors, with an additional $O\bigl(1 \vee (s/\beta^2)\bigr)$ post-change observations, the ocd_CI interval length is of the same order as the average detection delay. In terms of signal recovery, Corollary 6(b) shows that with high probability, ocd_CI with inputs as given in that result selects all signal coordinates whose magnitude exceeds $\vartheta/s^{1/2}$, up to logarithmic factors. Focusing on the case $\beta = \vartheta$ for simplicity of discussion (though see also Section 3.5 for discussion of the effect of the choice of $\beta$), Proposition S3 in the supplementary materials also reveals that the ocd′ base procedure belongs to $\mathcal{T}_{r,m}$ with $r$ of order $s/\vartheta^2$, up to logarithmic factors, and $m = |\{j : |\theta^j| \le 1/(8\sqrt{r})\}|$. On the other hand, Proposition 4 shows that any such support estimation algorithm makes on average a nonvanishing fraction of errors in distinguishing between noise coordinates and signals that are below the level $\vartheta/s^{1/2}$, again up to logarithmic factors. In other words, with high probability, the ocd_CI algorithm with base procedure ocd′ selects all signals that are strong enough (up to logarithmic factors) to be reliably detected, while at the same time including no noise coordinates (see Corollary 6(a)).

3.4. Relaxing the Gaussianity Assumption

It is natural to ask to what extent the theory of Sections 3.1–3.3 can be generalized beyond the Gaussian setting. The purpose of this section, then, is to describe how our earlier results can be modified to handle both sub-Gaussian and sub-exponential data. Recall that we say a random variable $Z$ with $\mathbb{E}Z = 0$ is sub-Gaussian with variance parameter $\sigma^2 > 0$ if $\mathbb{E} e^{\lambda Z} \le e^{\sigma^2 \lambda^2/2}$ for all $\lambda \in \mathbb{R}$, and is sub-exponential with variance parameter $\sigma^2 > 0$ and rate parameter $A > 0$ if $\mathbb{E} e^{\lambda Z} \le e^{\sigma^2 \lambda^2/2}$ for all $|\lambda| \le A$.

We first consider the sub-Gaussian setting where $X_1, \ldots, X_z, X_{z+1} - \theta, X_{z+2} - \theta, \ldots$ are independent, each having sub-Gaussian components with variance parameter 1. Note that this data generating mechanism no longer requires all pre-change observations to be identically distributed, and likewise the post-change observations need not all have the same distribution. We assume that the base changepoint procedure, characterized by an extended stopping time $N$, satisfies a slightly strengthened version of (3), namely that

$$\mathbb{P}_{z,\theta,\Sigma}(N \le z) + g(r; N) + 100 r p^3 \log_2(4p)\,(p \beta^{-2} \vee 1)\, e^{-r\beta^2/(8 s \log_2(2p))} \le \alpha \qquad (5)$$

for some $r \ge 1$. Under (5), Theorems 1–3 hold with the same choices of input parameters. Moreover, the ocd′ base procedure satisfies the conclusion of Proposition 5; that is, there exists a universal constant $C_1 > 0$ such that (5) holds for $r \ge r_1 = r_1(C_1)$ in (4), provided that we use the modified input $\tilde{a} = \sqrt{2 \log\{32 p^2 \gamma \log_2(2p)\}}$.
Generalizing these ideas further, now consider the model where $X_1, \ldots, X_z, X_{z+1} - \theta, X_{z+2} - \theta, \ldots$ are independent, each having sub-exponential components with variance parameter 1 and rate parameter $A > 0$. In this setting, provided the base procedure satisfies (5) for some $r \ge 1$ and $\vartheta \le 2A \log_2(2p)$, Theorems 1–3 hold when we redefine $a := C_0 \max\bigl\{\sqrt{\log(rp/\alpha)},\, A^{-1} \log(rp/\alpha)\bigr\}$ and $d_1 := \max\Bigl\{\sqrt{\frac{5 r \beta^2}{9 s \log_2(2p)}},\, \frac{5 r \beta^2}{9 A s \log_2(2p)}\Bigr\}$. Furthermore, with the modified input $\tilde{a} = \sqrt{2 \log\{32 p^2 \gamma \log_2(2p)\}} \vee A^{-1} \log\{32 p^2 \gamma \log_2(2p)\}$, the ocd′ base procedure satisfies the conclusion of Proposition 5 for
$$r \ge \frac{C_1 s \log_2(2p) \log\{p \gamma \alpha^{-1} (\beta^{-2} \vee 1)\} \max\{1, A^{-1} \log(p\gamma)\}}{\beta^2} + 2,$$
where $C_1 > 0$ is a universal constant.

The claims made in the previous two paragraphs are justified in Section S4 of the supplementary materials. These results confirm the flexible scope of the ocd_CI methodology beyond the original Gaussian setting, at least as far as sub-exponential tails are concerned. Where data may exhibit heavier tails than this, clipping (truncation) and quantile transformations may represent viable ways to proceed, though further research is needed to confirm the theoretical validity of such approaches.

3.5. Confidence Interval Construction with Unknown Signal Size

In some settings, an experienced practitioner may have a reasonable idea of the magnitude $\vartheta$ of the Euclidean norm of the vector of mean change that would be of interest to them, and this would facilitate a choice of a lower bound $\beta$ for $\vartheta$ in Algorithm 1. However, it is also worth considering the effect of this choice, and the extent to which its impact can be mitigated.

We first remark that the coverage probability guarantee for the ocd_CI interval in Corollary 6 remains valid for any (arbitrarily loose) lower bound $\beta$ on $\vartheta$. The only issue is in terms of power: if $\beta$ is chosen to be too small, then both the average detection delay and the high-probability bound on the length of the confidence interval may be inflated. In the remainder of this section, then, we describe a simple modification to Algorithm 1 that permits a loose lower bound $\beta$ to be employed that retains coverage validity with only a logarithmic effect on the high-probability bound on the length of the confidence interval. The only other price we pay is that the computational cost increases as $\beta$ decreases, as we describe below.

Our change to Algorithm 1 is as follows: we replace the definition of $b_{\min}$ by setting
$$b_{\min} = \frac{\beta}{\sqrt{\log_2(2p)}} \wedge \frac{1}{\sqrt{2 \log_2(2p)}},$$
set $M = \lceil 2 \log_2(1/b_{\min}) \rceil$ and define $B = \{\pm 2^{m/2} b_{\min} : m \in [M]\}$ in both the ocd′ base procedure and in Algorithm 1. The rest of the algorithm remains as previously stated. Thus, if we choose a conservative (very small) $\beta$, then the effect of the modification is to increase the number of scales on which we search for a change, so that the largest element of $B$ is of order 1.
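The modified scale grid just described is mechanical to construct; in this sketch (our illustration, not from the paper), the number of scales grows like $\log(1/\beta)$ while the largest scale is of order 1, as described above.

```python
import numpy as np

def modified_scales(p, beta):
    """Section 3.5 grid: b_min shrinks with beta, M = ceil(2*log2(1/b_min)),
    and B = {±2^(m/2) b_min : m in [M]}, so the largest scale is of order 1."""
    b_min = min(beta / np.sqrt(np.log2(2 * p)), 1 / np.sqrt(2 * np.log2(2 * p)))
    M = int(np.ceil(2 * np.log2(1 / b_min)))
    B = [sgn * 2 ** (m / 2) * b_min for m in range(1, M + 1) for sgn in (1, -1)]
    return b_min, M, B

b_min, M, B = modified_scales(p=100, beta=1e-4)   # deliberately loose beta
print(M, max(B))                                  # grid grows only like log(1/beta)
```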
In order to state our theoretical results for this modified algorithm, first define $b_{\mathrm{opt}} := \max\bigl\{b \in B \cap (0,\infty) : b \le \frac{\vartheta}{\sqrt{s \log_2(2p)}}\bigr\}$, which satisfies $b_{\mathrm{opt}} \ge \frac{\vartheta}{\sqrt{2 s \log_2(2p)}} \wedge 1$. Under the same assumptions as in Proposition 5, and modifying the inputs to $(X_t)_{t \in \mathbb{N}}$, $\beta > 0$, $\tilde{a} = \sqrt{2 \log(16 p^2 \gamma M)}$, $T^{\mathrm{diag}} = \log\{16 p \gamma (M+1)\}$ and $T^{\mathrm{off}} = 8 \log(16 p \gamma M)$, it can be shown using very similar arguments to those in the proof of Proposition 5 that there exists a universal constant $C > 0$ such that with $r \ge C b_{\mathrm{opt}}^{-2} \log\{p \gamma \alpha^{-1} (\beta^{-2} \vee 1)\} =: r_2$, the output $N$ satisfies
$$\mathbb{P}_{z,\theta,\Sigma}(N \le z) + g(r; N) + 4 r p^2 (M+1)^2\, e^{-r b_{\mathrm{opt}}^2/8} \le \alpha.$$
With this in place, we can derive Corollary 7, which is the analog of Corollary 6 for our modified algorithm.

Corollary 7. Fix $\alpha \in (0,1)$ and $\gamma > 0$. Assume that $\theta$ has an effective sparsity of $s := s(\theta) \ge 2$, that $\vartheta \ge \beta > 0$ and that $z \le 2\alpha\gamma$. Let $(X_t)_{t \in \mathbb{N}}$, $\beta > 0$, $\tilde{a} = \sqrt{2 \log(16 p^2 \gamma M)}$, $T^{\mathrm{diag}} = \log\{16 p \gamma (M+1)\}$ and $T^{\mathrm{off}} = 8 \log(16 p \gamma M)$ be the inputs of the modified ocd′ procedure. Let $d_1 = \sqrt{5 C \log\{p \gamma \alpha^{-1} (\beta^{-2} \vee 1)\}/9}$ and $d_2 = 4 d_1$. Then the following statements hold:
(a) With extra inputs CP = modified ocd′, $a \ge 0$ and $\ell \ge 0$ for the modified Algorithm 1, the output confidence interval $C$ and the support estimate $\hat{S}$ satisfy $\mathbb{P}_{z,\theta,\Sigma}(z \in C) \ge 1 - \alpha$ and $\mathbb{P}_{z,\theta,\Sigma}(\hat{S} \subseteq S_\beta) \ge 1 - \alpha$.
(b) There exists a universal constant $C_0 > 0$ such that, with extra inputs CP = modified ocd′, $a = C_0 \sqrt{\log\{p \gamma \alpha^{-1} (\beta^{-2} \vee 1)\}}$ and $\ell \ge 80 r_2$ for the modified Algorithm 1, the length $L$ of the output confidence interval $C$ and the support estimate satisfy
$$\mathbb{P}_{z,\theta,\Sigma}\Bigl(L > 8 C \max\Bigl\{\frac{2 s \log_2(2p)}{\vartheta^2}, 1\Bigr\} \log\{p \gamma \alpha^{-1} (\beta^{-2} \vee 1)\}\Bigr) \le \alpha \qquad (6)$$
and $\mathbb{P}_{z,\theta,\Sigma}(\hat{S} \cup \{\hat{j}\} \supseteq S) \ge 1 - \alpha$.

The main difference between Corollaries 7 and 6 concerns the high-probability guarantees on the length of the confidence interval. Ignoring logarithmic factors, with high probability the length of the confidence interval in the modified algorithm is at most of order $(s/\vartheta^2) \vee 1$, whereas for the original algorithm it was of order $(s/\beta^2) \vee 1$. Thus, the modified algorithm has the significant advantage of enabling a conservative choice of $\beta$, with only a logarithmic effect on the length guarantee relative to an oracle procedure with knowledge of $\|\theta\|_2$. The computational complexity per new observation and the storage requirements of this modified algorithm are $O\bigl(p^2 \{\log(ep) + \log(1/\beta)\}\bigr)$, so the order of magnitude is increased relative to the original algorithm only in an asymptotic regime where $\beta$ is small by comparison with $1/p^K$ for every $K > 0$. Moreover, the modified algorithm still does not require storage of historical data, and the computational time per new observation after observing $n$ observations does not increase with $n$. Nevertheless, since the computational complexity now depends on $\beta$, the modified algorithm does not strictly satisfy our definition of an online algorithm given in the introduction.

4. Numerical Studies

In this section, we study the empirical performance of the ocd_CI algorithm. Throughout this section, by default, the ocd_CI algorithm is used in conjunction with the recommended base online changepoint detection procedure CP = ocd.

4.1. Tuning Parameters

Chen, Wang, and Samworth (2022) found that the theoretical choices of thresholds $T^{\mathrm{diag}}$ and $T^{\mathrm{off}}$ for the ocd procedure were a little conservative, and therefore recommended determining these thresholds via Monte Carlo simulation; we replicate the method for choosing these thresholds described in their Section 4.1. Likewise, as in Chen, Wang, and Samworth (2022), we take $a = \tilde{a} = \sqrt{2 \log p}$ in our simulations.

For $d_1$ and $d_2$, as suggested by our theory, we take $d_2 = 4 d_1$, and take $d_1$ to be of the form $d_1 = c \sqrt{\log(p/\alpha)}$. Here, we tune the parameter $c > 0$ through Monte Carlo simulation, as we now describe. We considered the parameter settings $p \in \{100, 500\}$, $s \in \{2, \lceil \sqrt{p} \rceil, p\}$, $\vartheta \in \{2, 1, 1/2\}$, $\Sigma = I_p$, $\alpha = 0.05$, $\beta \in \{2\vartheta, \vartheta, \vartheta/2\}$, $\gamma = 30{,}000$ and $z = 500$. Then, with $\theta$ generated as $\vartheta U$, where $U$ is uniformly distributed on the union of all $s$-sparse unit spheres in $\mathbb{R}^p$, independent of our data (a sketch of this mechanism appears at the end of this subsection), we studied the coverage probabilities, estimated over 2000 repetitions as $c$ varies, of the ocd_CI confidence interval for data generated according to the Gaussian model defined at the beginning of Section 3. Figure 1 displays a subset of the results (the omitted curves were qualitatively similar). On this basis, we recommend $c = 0.5$ as a safe choice across a wide range of data generating mechanisms, and we used this value of $c$ throughout our confidence interval simulations.

[Figure 1. Coverage probabilities of the ocd_CI confidence interval as the parameter $c$, involved in the choice of tuning parameter $d_1$, varies.]

The previous two paragraphs, in combination with Algorithms 1 and S1, provide the practical implementation of the ocd_CI algorithm that we use in our numerical studies and that we recommend for practitioners. The only quantity that remains for the practitioner to input (other than the data) is $\beta$, which represents a lower bound on the Euclidean norm of the vector of mean change. Fortunately, this description makes $\beta$ easily interpretable by practitioners. In cases where an informed default choice is not available, practitioners can make a conservative (very small) choice and use an increased grid of scales with only a small inflation in the confidence interval length guarantee and computational cost; see Section 3.5.
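The data generating mechanism for $\theta$ used in these simulations is simple to reproduce; a sketch (the function name `sparse_signal` is ours):

```python
import numpy as np

def sparse_signal(p, s, vartheta, rng):
    """theta = vartheta * U with U uniform on the union of s-sparse unit
    spheres: a uniformly chosen support and a uniform direction on it."""
    theta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    u = rng.normal(size=s)
    theta[support] = vartheta * u / np.linalg.norm(u)
    return theta

rng = np.random.default_rng(2024)
print(sparse_signal(p=10, s=3, vartheta=2.0, rng=rng))
```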
4.2. Coverage Probability and Interval Length

In Table 1, we present the detection delay of the ocd procedure, as well as the coverage probabilities and average confidence interval lengths of the ocd_CI procedure, all estimated over 2000 repetitions, with the same set of parameter choices and data generating mechanism as in Section 4.1. From this table, we see that the coverage probabilities are at least at the nominal level (up to Monte Carlo error) across all settings considered. Underspecification of $\beta$ means that the grid of scales that can be chosen for indices in $\hat{S}$ is shifted downwards, and therefore increases the probability that $\tilde{b}_j$ will underestimate $\theta^j$ for $j \in \hat{S}$. In turn, this leads to a slight conservativeness for the coverage probability (and a corresponding increase in average confidence interval length). On the other hand, overspecification of $\beta$ yields a shorter interval on average, though these were nevertheless able to retain the nominal coverage in all cases considered.

Another interesting feature of Table 1 is to compare the average confidence interval lengths with the corresponding average detection delays. Corollary 6(b), as well as Chen, Wang, and Samworth (2022, Theorem 4), indicates that both of these quantities are of order $(s/\beta^2) \vee 1$, up to polylogarithmic factors in $p$ and $\gamma$, but of course whenever the confidence interval includes the changepoint, its length must be at least as long as the detection delay. Nevertheless, in most settings, it is only 2–3 times longer on average, and in all cases considered was less than seven times longer on average. Moreover, we can also observe that the confidence interval length increases with $s$ and decreases with $\beta$, as anticipated by our theory.

For comparison, we also present the corresponding coverage probabilities and average lengths of confidence intervals obtained using an offline procedure as described in the introduction. More precisely, after the ocd algorithm has declared a change, we treat the data up to the stopping time as an offline dataset, and apply the inspect algorithm (Wang and Samworth 2018), followed by the one-step refinement of Kaul et al. (2021), to construct an estimate, $\hat{z}^{\mathrm{KFJS}}$, of the changepoint location. As recommended by Kaul et al. (2021), we obtain an estimator $\hat{\vartheta}^{\mathrm{KFJS}}$ of $\vartheta$ using the Euclidean norm of the soft-thresholded difference in empirical mean vectors before and after $\hat{z}^{\mathrm{KFJS}}$, with the soft-thresholding parameter chosen via the Bayesian Information Criterion. The final confidence interval is then of the form $\bigl[\hat{z}^{\mathrm{KFJS}} - q^{\alpha/2}/(\hat{\vartheta}^{\mathrm{KFJS}})^2,\; \hat{z}^{\mathrm{KFJS}} + q^{\alpha/2}/(\hat{\vartheta}^{\mathrm{KFJS}})^2\bigr]$, where $q^{\alpha/2}$ is the $1 - \alpha/2$ quantile of the distribution of the (almost surely unique) maximizer of a two-sided Brownian motion with a triangular drift, as given by Kaul et al. (2021, Theorem 3.1). In particular, we have $q^{0.025} = 11.03$. The last two columns of Table 1 reveal that both the coverage probabilities and confidence interval lengths from this procedure are disappointing and not competitive with those of the ocd_CI algorithm. There are two main reasons for this: first, the nature of the online problem means that the changepoint is often located near the right-hand end of the dataset up to the stopping time; on the other hand, the theoretical guarantees of Kaul et al. (2021) are obtained under an asymptotic setting where the fraction of data either side of the change is bounded away from zero. Thus, the estimated changepoint from the one-step refinement is often quite poor. Moreover, the estimated magnitude of change, $\hat{\vartheta}^{\mathrm{KFJS}}$, is often a significant underestimate of $\vartheta$ due to the soft-thresholding operation, and this can lead to substantially inflated confidence interval lengths. We emphasize that the Kaul et al. (2021) procedure was not designed for use in this online setting, but we nevertheless present these results to illustrate the fact that the naive application of offline methods in sequential problems may fail badly.
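The comparator interval has a closed form given the offline estimates; a sketch, where `z_hat` and `vartheta_hat` would come from inspect plus the one-step refinement (not implemented here):

```python
def kaul_interval(z_hat, vartheta_hat, q=11.03):
    """[z_hat - q/vartheta_hat^2, z_hat + q/vartheta_hat^2]; q = 11.03 is the
    0.975 quantile quoted above for alpha = 0.05."""
    half_width = q / vartheta_hat ** 2
    return z_hat - half_width, z_hat + half_width

print(kaul_interval(z_hat=500, vartheta_hat=0.5))  # small vartheta_hat inflates the interval
```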
While Table 1 covers the most basic setting for our methodology, our theory in Section 3 applies equally well to data with spatial dependence across different coordinates. To assess whether this theory carries over to empirical performance, Table S1 in the supplementary materials presents corresponding coverage probabilities and lengths for the ocd_CI procedure with the cross-sectional covariance matrix $\Sigma = (\Sigma_{jk})_{j,k \in [p]}$ taken to be Toeplitz with parameter $\rho \in \{0.5, 0.75\}$; in other words, $\Sigma_{jk} = \rho^{|j-k|}$. The results are again encouraging: the coverage remains perfectly satisfactory in all settings considered, and moreover, the lengths of the confidence intervals are very similar to those in Table 1.

Table 1. Estimated coverage and average length of the ocd_CI confidence interval and average detection delay over 2000 repetitions, with standard errors in brackets. For comparison, we also present the corresponding estimated coverage probabilities and average lengths of the procedure based on Kaul et al. (2021), as described in Section 4.2.

                                      ocd_CI                       Kaul et al.
  p     s    ϑ    β     Delay        Coverage (%)  CI Length     Coverage (%)  CI Length
 100    2    2    4      9.8 (0.1)    96.2 (0.4)     20.1 (0.7)    83.2 (0.8)    732.7 (9.6)
 100    2    2    2     12.6 (0.1)    97.0 (0.4)     33.7 (0.7)    82.5 (0.8)    474.9 (11.0)
 100    2    2    1     14.1 (0.1)    97.9 (0.3)     80.8 (1.0)    83.5 (0.8)    341.4 (10.4)
 100    2    1    2     34.2 (0.3)    95.8 (0.4)     66.1 (1.0)    76.6 (0.9)    399.3 (10.5)
 100    2    1    1     44.2 (0.3)    97.5 (0.4)    122.0 (1.4)    80.5 (0.9)    123.8 (6.5)
 100    2    1    0.5   52.0 (0.4)    97.4 (0.4)    309.1 (2.0)    81.5 (0.9)     90.8 (5.4)
 100   10    2    4     14.7 (0.1)    96.0 (0.4)     32.5 (0.8)    80.2 (0.9)    636.4 (10.5)
 100   10    2    2     15.7 (0.1)    97.4 (0.4)     38.4 (0.8)    77.4 (0.9)    537.9 (10.9)
 100   10    2    1     15.9 (0.1)    97.0 (0.4)     80.2 (1.1)    80.8 (0.9)    542.4 (10.9)
 100   10    1    2     52.6 (0.5)    96.2 (0.4)    114.0 (1.5)    75.8 (1)      342.4 (10.1)
 100   10    1    1     56.9 (0.4)    97.1 (0.4)    142.5 (1.8)    73.9 (1)      262.6 (9.1)
 100   10    1    0.5   60.2 (0.4)    98.2 (0.3)    301.1 (1.6)    75.9 (1)      248.3 (8.9)
 100  100    2    4     27.2 (0.2)    96.1 (0.4)     77.6 (0.9)    68.2 (1.0)    533.9 (10.7)
 100  100    2    2     27.7 (0.2)    96.0 (0.4)     81.8 (1.0)    71.3 (1.0)    537.7 (10.8)
 100  100    2    1     28.2 (0.2)    97.5 (0.3)     99.4 (1.3)    71.8 (1.0)    556.0 (10.7)
 100  100    1    2    100.7 (0.8)    94.7 (0.5)    292.8 (3.5)    87.7 (0.7)    850.5 (9.5)
 100  100    1    1    100.5 (0.9)    96.3 (0.4)    296.0 (3.4)    88.0 (0.7)    863.7 (9.3)
 100  100    1    0.5  103.2 (0.8)    97.3 (0.4)    365.9 (2.8)    89.3 (0.7)    876.8 (9.1)
 500    2    2    4     11.3 (0.1)    97.2 (0.4)     23.1 (0.7)    92.0 (0.6)    958.7 (4.3)
 500    2    2    2     15.8 (0.1)    97.7 (0.3)     45.2 (0.9)    83.3 (0.8)    806.4 (8.7)
 500    2    2    1     17.7 (0.1)    97.5 (0.4)    117.3 (1.0)    79.9 (0.9)    624.9 (10.7)
 500    2    1    2     41.5 (0.3)    97.3 (0.4)     81.8 (1.2)    80.0 (0.9)    774.9 (9.4)
 500    2    1    1     55.0 (0.4)    96.8 (0.4)    168.9 (1.5)    73.0 (1)      275.9 (9.4)
 500    2    1    0.5   64.6 (0.5)    98.1 (0.3)    445.0 (1.7)    75.6 (1)      186.1 (8.0)
 500   22    2    4     23.6 (0.2)    96.3 (0.4)     55.4 (1.0)    87.0 (0.8)    884.9 (7.3)
 500   22    2    2     25.0 (0.2)    97.0 (0.4)     60.3 (0.8)    85.5 (0.8)    864.2 (7.8)
 500   22    2    1     25.5 (0.2)    98.1 (0.3)    119.7 (0.8)    83.6 (0.8)    823.0 (8.6)
 500   22    1    2     88.1 (0.7)    97.0 (0.4)    203.5 (2.1)    77.2 (0.9)    645.0 (11.0)
 500   22    1    1     91.9 (0.6)    97.8 (0.3)    229.7 (2.2)    76.2 (1)      562.8 (11.1)
 500   22    1    0.5   94.9 (0.6)    98.3 (0.3)    462.8 (1.4)    75.5 (1)      538.3 (11.2)
 500  500    2    4     79.8 (0.6)    95.0 (0.5)    238.9 (2.7)    88.5 (0.7)    913.0 (8.0)
 500  500    2    2     80.3 (0.6)    95.8 (0.4)    245.7 (2.6)    90.3 (0.7)    928.8 (7.7)
 500  500    2    1     80.9 (0.6)    97.5 (0.4)    250.2 (2.5)    90.6 (0.7)    928.3 (7.7)
 500  500    1    2    290.5 (2.3)    94.5 (0.5)    819.7 (7.9)    95.2 (0.5)   1189.4 (7.3)
 500  500    1    1    291.4 (2.3)    95.2 (0.5)    831.1 (7.5)    94.3 (0.5)   1204.9 (7.0)
 500  500    1    0.5  297.3 (2.3)    98.1 (0.3)    875.0 (6.7)    94.6 (0.5)   1207.4 (6.8)

NOTE: Other parameters: γ = 30,000, z = 1000, Σ = I_p, α = 0.05, a = ã = √(2 log p), c = 0.5, d₁ = c √(log(p/α)), d₂ = 4d₁.
4.3. Support Recovery

We now turn our attention to the empirical support recovery properties of the quantity $\hat{S}$ (in combination with the anchor coordinate $\hat{j}$) computed in the ocd_CI algorithm. In Table 2, we present the probabilities, estimated over 500 repetitions, that $\hat{S} \subseteq S_\beta$ and that $\hat{S} \cup \{\hat{j}\} \supseteq S$ for $p = 100$, $s \in \{5, 50\}$, $\vartheta \in \{1, 2\}$, $\Sigma = I_p$, and for three different signal shapes: in the uniform, inverse square root and harmonic cases, we took $\theta \propto (\mathbb{1}_{\{j \in [s]\}})_{j \in [p]}$, $\theta \propto (j^{-1/2} \mathbb{1}_{\{j \in [s]\}})_{j \in [p]}$ and $\theta \propto (j^{-1} \mathbb{1}_{\{j \in [s]\}})_{j \in [p]}$, respectively (see the code sketch at the end of this subsection). As inputs to the algorithm, we set $a = \tilde{a} = \sqrt{2 \log p}$, $\alpha = 0.05$, $d_1 = \sqrt{2 \log(p/\alpha)}$, $\beta = \vartheta$, and, motivated by Corollary 6, took an additional $\ell = \lceil 2 s \beta^{-2} \log_2(2p) \log p \rceil$ post-declaration observations in constructing the support estimates. The results reported in Table 2 provide empirical confirmation of the support recovery properties claimed in Corollary 6.

Table 2. Estimated support recovery probabilities (with standard errors in brackets).

  s    ϑ    Signal shape    P(Ŝ ⊆ S_β) (%)    P(Ŝ ∪ {ĵ} ⊇ S) (%)
  5    2    uniform          99.8 (0.2)         97.6 (0.7)
  5    1    uniform         100.0 (0.0)         97.6 (0.7)
 50    2    uniform         100.0 (0.0)         95.6 (0.9)
 50    1    uniform         100.0 (0.0)         97.8 (0.7)
  5    2    inv sqrt         99.6 (0.3)         96.6 (0.8)
  5    1    inv sqrt        100.0 (0.0)         98.8 (0.5)
 50    2    inv sqrt        100.0 (0.0)         99.8 (0.2)
 50    1    inv sqrt        100.0 (0.0)        100.0 (0.0)
  5    2    harmonic        100.0 (0.0)         97.6 (0.7)
  5    1    harmonic         99.6 (0.3)         97.8 (0.7)
 50    2    harmonic        100.0 (0.0)         99.4 (0.3)
 50    1    harmonic        100.0 (0.0)        100.0 (0.0)

NOTE: Other parameters: p = 100, Σ = I_p, α = 0.05, a = ã = √(2 log p), d₁ = √(2 log(p/α)), β = ϑ, and with an additional ℓ = ⌈2s log₂(2p) log(p) β⁻²⌉ post-declaration observations.

Finally in this section, we consider the extent to which the additional $\ell$ observations are necessary in practice to provide satisfactory support recovery. In the left panel of Figure 2, we plot Receiver Operating Characteristic (ROC) curves to study the estimated support recovery probabilities with $\ell = 0$ as a function of the input parameter $d_1$, which can be thought of as controlling the tradeoff between $\mathbb{P}(\hat{S} \cup \{\hat{j}\} \supseteq S)$ and $\mathbb{P}(\hat{S} \subseteq S_\beta)$. The fact that the triangles in this plot are all to the left of the dotted vertical line confirms the theoretical guarantee provided in Corollary 6(a), which holds with $d_1 = \sqrt{2 \log(p/\alpha)}$, and even with $\ell = 0$; the less conservative choice $d_1 = \sqrt{2 \log p}$, which roughly corresponds to an average of one noise coordinate included in $\hat{S}$, allows us to capture a larger proportion of the signal. From this panel, we also see that additional sampling is needed to ensure that, with high probability, we recover all of the true signals. This is unsurprising: for instance, with a uniform signal shape and $s = 50$, it is very unlikely that all 50 signal coordinates will have accumulated such similar levels of evidence to appear in $\hat{S} \cup \{\hat{j}\}$ by the time of declaration. The right panel confirms that, with an inverse square root signal shape, the probability that we capture each signal increases with the signal magnitude, and that even small signals tend to be selected with higher probability than noise coordinates.

[Figure 2. Support recovery properties of ocd_CI. In the left panel, we plot ROC curves for three different signal shapes and for sparsity levels $s \in \{5, 50\}$. The triangles and circles correspond to points on the curves with $d_1 = \sqrt{2 \log(p/\alpha)}$ (with $\alpha = 0.05$) and $d_1 = \sqrt{2 \log p}$, respectively. The dotted vertical line corresponds to $\mathbb{P}(\hat{S} \subseteq S_\beta) = 1 - \alpha$. In the right panel, we plot the proportion of 500 repetitions for which each coordinate belongs to $\hat{S} \cup \{\hat{j}\}$ with $d_1 = \sqrt{2 \log p}$; here, the $s = 20$ signals have an inverse square root shape, and are plotted in red; noise coordinates are plotted in black. Other parameters for both panels: $p = 100$, $\Sigma = I_p$, $\beta = \vartheta = 2$, $\ell = 0$, $a = \tilde{a} = \sqrt{2 \log p}$.]
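The three signal shapes in Table 2 can be generated as follows (a sketch; the name `shaped_signal` is ours):

```python
import numpy as np

def shaped_signal(p, s, vartheta, shape):
    """Table 2 shapes: theta proportional to 1, j^(-1/2) or j^(-1) on the
    support [s], rescaled so that ||theta||_2 = vartheta."""
    j = np.arange(1, p + 1, dtype=float)
    base = {"uniform": np.ones(p), "inv_sqrt": j ** -0.5, "harmonic": 1.0 / j}[shape]
    theta = np.where(j <= s, base, 0.0)
    return vartheta * theta / np.linalg.norm(theta)

for shape in ("uniform", "inv_sqrt", "harmonic"):
    print(shape, np.round(shaped_signal(p=8, s=4, vartheta=2.0, shape=shape), 3))
```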
4.4. US Covid-19 Data Example

In this section, we apply ocd_CI to a dataset of weekly deaths in the United States between January 2017 and June 2020 (available at https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm). The data up to 29 June 2019 are treated as our training data. An obvious discrepancy between the underlying dynamics of these weekly deaths and the conditions assumed in our theory in Section 3 is temporal dependence, particularly that induced by seasonal and weather effects. Although we can never hope to remove this dependence entirely, we seek to mitigate its impact by pre-processing the data as follows. For each of the 50 states, as well as Washington, D.C. (p = 51), we first estimate the "seasonal death curve," that is, the mean number of deaths for each day of the year in that state. The seasonal death curve is estimated by first splitting the weekly death numbers evenly across the seven relevant days, and then estimating the average number of deaths on each day of the year from these derived daily death numbers using a Gaussian kernel with a bandwidth of 20 days. As the death numbers follow an approximate Poisson distribution, we apply a square-root transformation to stabilize the variance; more precisely, the transformed weekly excess deaths are computed as the difference between the square roots of the observed weekly deaths and of the weekly deaths predicted by the seasonal death curve. Finally, we standardize the transformed weekly excess deaths using the mean and standard deviation of the transformed data over the training period. The standardized, transformed data are plotted in Figure 3 for 12 states.
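The pre-processing pipeline just described can be summarized in the following Python sketch. This is an illustrative reconstruction rather than the authors' code: the function names are ours, and the circular day-of-year distance and the kernel-weighted-average form of the smoother are our own assumptions about details not spelled out above.

```python
import numpy as np

def split_weekly_to_daily(weekly_deaths):
    """Spread each weekly death total evenly across its seven days."""
    return np.repeat(np.asarray(weekly_deaths, dtype=float) / 7.0, 7)

def seasonal_death_curve(daily_deaths, day_of_year, bandwidth=20.0):
    """Estimate the mean number of deaths on each day of the year.

    daily_deaths: derived daily counts over the training period
    day_of_year:  day of the year (1..365) for each entry of daily_deaths
    """
    day_of_year = np.asarray(day_of_year)
    grid = np.arange(1, 366)
    diff = np.abs(grid[:, None] - day_of_year[None, :])
    dist = np.minimum(diff, 365 - diff)           # circular distance (our assumption)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)    # Gaussian kernel, 20-day bandwidth
    return (w @ daily_deaths) / w.sum(axis=1)     # kernel-weighted average per day

def transformed_excess(weekly_deaths, predicted_weekly):
    """Variance-stabilizing square-root transformation of weekly excess deaths."""
    return np.sqrt(weekly_deaths) - np.sqrt(predicted_weekly)

def standardize(x, training_mask):
    """Standardize using the mean and standard deviation over the training period."""
    mu, sd = x[training_mask].mean(), x[training_mask].std()
    return (x - mu) / sd
```

Here predicted_weekly would be obtained by summing the fitted seasonal death curve over the seven days of each week, and the pipeline is applied separately to each of the p = 51 series before monitoring begins.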
When applying ocd_CI to these data, we take a = ã = √(2 log p), T^{diag} = log{16pγ log₂(4p)}, T^{off} = 8 log{16pγ log₂(2p)}, d₁ = 0.5 log(p/α) and d₂ = 4d₁, with α = 0.05, β = 50 and γ = 1000. On the monitoring data (from 30 June 2019), the ocd_CI algorithm declares a change on the week ending 28 March 2020, and provides a confidence interval from the week ending 21 March 2020 to the week ending 28 March 2020. This coincides with the beginning of the first wave of Covid-19 deaths in the United States. The algorithm also identifies New York, New Jersey, Connecticut, Michigan, and Louisiana as the estimated support of the change. Interestingly, if we run the ocd_CI procedure from the beginning of the training data period (while still standardizing as before, owing to the lack of available data prior to 2017), it identifies a subtler change on the week ending January 6, 2018, with a confidence interval of [December 17, 2017, January 6, 2018]. This corresponds to a severe influenza season at the end of 2017 (see https://www.cdc.gov/flu/about/season/flu-season-2017-2018.htm).

Figure 3. Standardized, transformed weekly excess death data from 12 states (including Washington, D.C.). The monitoring period starts from 30 June 2019 (dashed line). The data from the states in the support estimate are shown in red. The confidence interval [March 8, 2020, March 28, 2020] is shown in the light blue shaded region.

Despite the natural interpretation of these findings, we recognize that the model in Section 3, under which we proved our theoretical results, cannot capture the full complexity of the temporal dependence in this dataset, even after our pre-processing transformations. A complete theoretical analysis of the performance of ocd_CI in time-dependent settings is challenging and beyond the scope of the current work; in practical applications, we advise careful modeling of this dependence to facilitate the construction of appropriate residuals from which its main effects have been removed.

Supplementary Materials

The supplementary materials contain the proofs of our main theoretical results and auxiliary results, as well as pseudocode for the ocd and ocd_CI procedures, extensions of our results to sub-Gaussian and sub-exponential settings, and additional simulation results.

Acknowledgments

We are grateful to the Editor, the Associate Editor and two anonymous referees for their constructive feedback, which helped to improve the article.

Disclosure Statement

The authors report there are no competing interests to declare.

Funding

The research of the first and second authors was supported by Engineering and Physical Sciences Research Council (EPSRC) grant EP/T02772X/1, and that of the third author was supported by EPSRC grants EP/P031447/1 and EP/N031938/1, as well as a European Research Council Advanced Grant.

ORCID

Richard J. Samworth https://orcid.org/0000-0003-2426-4679

References

Chan, H. P., and Walther, G. (2015), "Optimal Detection of Multi-Sample Aligned Sparse Signals," Annals of Statistics, 43, 1865–1895.
Chen, Y., Wang, T., and Samworth, R. J. (2022), "High-Dimensional, Multiscale Online Changepoint Detection," Journal of the Royal Statistical Society, Series B, 84, 234–266.
Cho, H. (2016), "Change-Point Detection in Panel Data via Double CUSUM Statistic," Electronic Journal of Statistics, 10, 2000–2038.
Cho, H., and Fryzlewicz, P. (2015), "Multiple-Change-Point Detection for High Dimensional Time Series via Sparsified Binary Segmentation," Journal of the Royal Statistical Society, Series B, 77, 475–507.
Csörgő, M., and Horváth, L. (1997), Limit Theorems in Change-Point Analysis, New York: Wiley.
Dette, H., and Gösmann, J. (2020), "A Likelihood Ratio Approach to Sequential Change Point Detection for a General Class of Parameters," Journal of the American Statistical Association, 115, 1361–1377.
Duncan, A. J. (1952), Quality Control and Industrial Statistics, Chicago: Richard D. Irwin Professional Publishing Inc.
Enikeeva, F., and Harchaoui, Z. (2019), "High-Dimensional Change-Point Detection Under Sparse Alternatives," Annals of Statistics, 47, 2051–2079.
Follain, B., Wang, T., and Samworth, R. J. (2022), "High-Dimensional Changepoint Estimation With Heterogeneous Missingness," Journal of the Royal Statistical Society, Series B, 84, 1023–1055.
Gösmann, J., Stoehr, C., Heiny, J., and Dette, H. (2020), "Sequential Change Point Detection in High Dimensional Time Series," arXiv preprint, arXiv:2006.00636.
Horváth, L., and Rice, G. (2014), "Extensions of Some Classical Methods in Change Point Analysis," TEST, 23, 219–255.
Janková, J., and van de Geer, S. (2015), "Confidence Intervals for High-Dimensional Inverse Covariance Estimation," Electronic Journal of Statistics, 9, 1205–1229.
Javanmard, A., and Montanari, A. (2014), "Confidence Intervals and Hypothesis Testing for High-Dimensional Regression," Journal of Machine Learning Research, 15, 2869–2909.
Jirak, M. (2015), "Uniform Change Point Tests in High Dimension," Annals of Statistics, 43, 2451–2483.
Kaul, A., Fotopoulos, S. B., Jandhyala, V. K., and Safikhani, A. (2021), "Inference on the Change Point Under a High Dimensional Sparse Mean Shift," Electronic Journal of Statistics, 15, 71–134.
Kirch, C., and Stoehr, C. (2019), "Sequential Change Point Tests Based on U-Statistics," arXiv preprint, arXiv:1912.08580.
Liu, H., Gao, C., and Samworth, R. J. (2021), "Minimax Rates in Sparse, High-Dimensional Changepoint Detection," Annals of Statistics, 49, 1081–1112.
Londschien, M., Kovács, S., and Bühlmann, P. (2021), "Change-Point Detection for Graphical Models in the Presence of Missing Values," Journal of Computational and Graphical Statistics, 30, 1–12.
Padilla, O. H. M., Yu, Y., Wang, D., and Rinaldo, A. (2021), "Optimal Nonparametric Multivariate Change Point Detection and Localization," IEEE Transactions on Information Theory, 68, 1922–1944.
Page, E. S. (1954), "Continuous Inspection Schemes," Biometrika, 41, 100–115.
Rinaldo, A., Wang, D., Wen, Q., Willett, R., and Yu, Y. (2021), "Localizing Changes in High-Dimensional Regression Models," Proceedings of Machine Learning Research, pp. 2089–2097.
Soh, Y. S., and Chandrasekaran, V. (2017), "High-Dimensional Change-Point Estimation: Combining Filtering With Convex Optimization," Applied and Computational Harmonic Analysis, 43, 122–147.
Tartakovsky, A., Nikiforov, I., and Basseville, M. (2014), Sequential Analysis: Hypothesis Testing and Changepoint Detection, London: Chapman and Hall.
van de Geer, S., Bühlmann, P., Ritov, Y., and Dezeure, R. (2014), "On Asymptotically Optimal Confidence Regions and Tests for High-Dimensional Models," Annals of Statistics, 42, 1166–1202.
Wang, T., and Samworth, R. J. (2018), "High Dimensional Change Point Estimation via Sparse Projection," Journal of the Royal Statistical Society, Series B, 80, 57–83.
Yu, Y., Bradic, J., and Samworth, R. J. (2021), "Confidence Intervals for High-Dimensional Cox Models," Statistica Sinica, 31, 243–267.
Yu, Y., Padilla, O. H. M., Wang, D., and Rinaldo, A. (2021), "Optimal Network Online Change Point Localisation," arXiv preprint, arXiv:2101.05477.
Zhang, C.-H., and Zhang, S. S. (2014), "Confidence Intervals for Low Dimensional Parameters in High Dimensional Linear Models," Journal of the Royal Statistical Society, Series B, 76, 217–242.
Journal of the American Statistical Association – Taylor & Francis
Published: May 24, 2023
Keywords: Confidence interval; Sequential method; Sparsity; Support estimate