Wild Binary Segmentation for multiple change-point detection
1 Wild Binary Segmentation for multiple change-point detection. Piotr Fryzlewicz, Department of Statistics, London School of Economics, UK. Isaac Newton Institute, 14 January 2014.
2 Segmentation in a simple function + noise model
We consider the canonical function + noise model $X_t = f_t + \varepsilon_t$, $t = 1, \ldots, T$, where $f_t$ is piecewise constant with an unknown number $N$ of change-points, possibly increasing with $T$, and the $\varepsilon_t$ are iid Gaussian (for simplicity; this can be extended to various more complex settings). Objective: estimating the number and the locations of (any) change-points in $f_t$.
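A minimal sketch of the model above, assuming NumPy; `simulate_piecewise_constant` is a hypothetical helper, not part of the talk. Change-points are recorded as the last index $b$ of each constant segment (0-based in the code), which matches the split point $b$ used for $\psi^{s,b,e}$ on slide 7.

```python
import numpy as np


def simulate_piecewise_constant(levels, change_points, T, sigma=1.0, seed=None):
    """Draw X_t = f_t + eps_t, t = 1, ..., T, with f_t piecewise constant.

    change_points: 0-based last indices of every segment except the final one,
    so len(levels) must equal len(change_points) + 1.
    """
    if len(levels) != len(change_points) + 1:
        raise ValueError("need exactly one more level than change-points")
    rng = np.random.default_rng(seed)
    bounds = [0, *[b + 1 for b in change_points], T]
    f = np.empty(T)
    for level, start, end in zip(levels, bounds[:-1], bounds[1:]):
        f[start:end] = level
    eps = sigma * rng.standard_normal(T)  # iid Gaussian noise
    return f, f + eps


# Three segments: indices 0..99, 100..249 and 250..399.
f, x = simulate_piecewise_constant([0.0, 2.0, -1.0], [99, 249], T=400, seed=1)
```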
3 Segmentation in a simple function + noise model (continued)
[Figure: log-returns on daily closing values of the S&P 500 over approximately 8 trading years ending 26 October, with volatility removed via a GARCH(1,1) fit; x-axis: Days.] Any change-points here?
4 Existing approaches
A substantial number of techniques exist. A brief literature review:
- Least squares (or, more generally, a likelihood-type fit) + AIC- or BIC-type penalty: Yao (1988), Yao and Au (1989), Lee (1995), Lavielle (1999, 2005), Lavielle & Moulines (2000), Lebarbier (2005), Pan & Chen (2006), Boysen et al. (2009).
- Minimum Description Length: Davis et al. (2006).
- $L_1$-type penalties: Davies & Kovac (2001), Rinaldo (2009), Harchaoui & Lévy-Leduc (2010).
- Classical wavelet transform: Wang (1995).
- Binary Segmentation: Vostrikova (1981), Venkatraman (1992), Bai (1997), Chen et al. (2011), Fryzlewicz & Subba Rao (2012), Cho & Fryzlewicz (2012, 2013).
5 Existing approaches: criticisms
No technique is perfect. Some comments and criticisms:
- Least squares (or a likelihood-type fit) + AIC- or BIC-type penalty: slow computational speed, typically of order $O(T^2)$. There have been some efforts to reduce this, e.g. Rigaill (2010) (but still $O(T^2)$ in the worst case) and Killick et al. (2012) (PELT). Both will be revisited in the simulation study.
- MDL: the minimisation is not obvious; it is carried out via a genetic algorithm in Davis et al. (2006), often with (very) random output.
- $L_1$-type penalties: not optimal for change-point detection, see Brodsky & Darkhovsky (1993); they often lead to spurious detections.
- Classical wavelets: hopeless in noisy settings.
- Binary Segmentation: more details soon.
6 Focus on Binary Segmentation
Generic algorithm for Binary Segmentation (BS):
1. Find $\tilde{f}_t$, a step function with one change-point, minimising $\sum_{t=1}^{T} (X_t - \tilde{f}_t)^2$.
2. Denote the location of the change-point in $\tilde{f}_t$ by $b$.
3. Perform similar fitting on $1, \ldots, b$ and $b + 1, \ldots, T$.
4. Continue in the same manner until a certain criterion is satisfied.
In principle, Binary Segmentation is fast (typically $O(T \log T)$), conceptually simple, easy to code, theoretically tractable (with some effort), and easy to transfer to other, more complex settings.
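A minimal NumPy sketch of the four steps above. Step 1 is done in linear time per interval using cumulative sums; step 4 uses a hypothetical stopping rule (stop when the best reduction in the residual sum of squares falls below `min_gain`), since the criterion is left open at this point. Function names are illustrative.

```python
import numpy as np


def best_single_split(x, s, e):
    """Least-squares fit of one step on x[s..e] (0-based, inclusive).

    Returns (b, gain): b is the last index of the left piece, gain is the drop in
    the residual sum of squares (RSS) relative to fitting a single constant.
    """
    seg = x[s:e + 1]
    n = len(seg)
    if n < 2:
        return None, 0.0
    csum = np.cumsum(seg)
    n_left = np.arange(1, n)          # left piece covers s..b for b = s..e-1
    n_right = n - n_left
    left = csum[:-1]
    right = csum[-1] - left
    # RSS(constant) - RSS(step at b) = L^2/n_L + R^2/n_R - (L + R)^2/n
    gain = left**2 / n_left + right**2 / n_right - csum[-1]**2 / n
    k = int(np.argmax(gain))
    return s + k, float(gain[k])


def binary_segmentation(x, min_gain):
    """Recursive BS; min_gain is a hypothetical stopping threshold on the RSS drop."""
    x = np.asarray(x, dtype=float)
    found = []

    def recurse(s, e):
        b, gain = best_single_split(x, s, e)
        if b is None or gain < min_gain:
            return
        found.append(b)
        recurse(s, b)        # fit on s..b
        recurse(b + 1, e)    # fit on b+1..e

    recurse(0, len(x) - 1)
    return sorted(found)


x = np.r_[np.zeros(50), 3.0 * np.ones(50), np.zeros(50)]
print(binary_segmentation(x, min_gain=1.0))  # [49, 99]
```

On noisy data `min_gain` has to be calibrated to the noise level; slide 12 gives the thresholding rule actually used in the talk.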
7 Binary Segmentation: Haar wavelet interpretation
Denote by $f^{s,b,e}_t$ a step function (vector) starting at index $s$, with a change-point at $b$, ending at $e$. We have
$$b_0 := \arg\min_b \sum_{t=s}^{e} \left(X_t - f^{s,b,e}_t\right)^2 = \arg\max_b \left|\langle X, \psi^{s,b,e} \rangle\right|,$$
where $\psi^{s,b,e}$ is an Unbalanced Haar vector, i.e. a vector which is constant positive for $i = s, \ldots, b$, constant negative for $i = b + 1, \ldots, e$, sums to zero and sums to one when squared. Thus, change-point candidates are located by inspecting the maxima of $|\langle X, \psi^{s,b,e} \rangle|$ over $b$.
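A small NumPy check of the identity above, assuming $f^{s,b,e}$ is the least-squares step fit with its change-point at $b$; the function names are illustrative. The identity holds because the drop in RSS from splitting $[s, e]$ at $b$ equals $\langle X, \psi^{s,b,e} \rangle^2$, so minimising the RSS and maximising the absolute inner product select the same $b$.

```python
import numpy as np


def unbalanced_haar(s, b, e):
    """psi^{s,b,e} on indices s..e (0-based, inclusive), with s <= b < e."""
    n = e - s + 1
    n_left, n_right = b - s + 1, e - b
    psi = np.empty(n)
    psi[:n_left] = np.sqrt(n_right / (n * n_left))    # constant positive on s..b
    psi[n_left:] = -np.sqrt(n_left / (n * n_right))   # constant negative on b+1..e
    return psi


def haar_stat(x, s, e):
    """|<X, psi^{s,b,e}>| for every candidate b = s, ..., e-1."""
    seg = np.asarray(x[s:e + 1], dtype=float)
    return np.array([abs(seg @ unbalanced_haar(s, b, e)) for b in range(s, e)])


def rss_step(x, s, b, e):
    """RSS of the best step function on s..e with its change-point at b."""
    left, right = x[s:b + 1], x[b + 1:e + 1]
    return ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()


rng = np.random.default_rng(0)
x = np.r_[np.zeros(40), np.ones(40)] + 0.5 * rng.standard_normal(80)
s, e = 5, 70
b_haar = s + int(np.argmax(haar_stat(x, s, e)))
b_rss = s + int(np.argmin([rss_step(x, s, b, e) for b in range(s, e)]))
print(b_haar, b_rss)  # both criteria select the same b
```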
8 Binary Segmentation: when can we expect good performance?
Since BS fits a one-step function to the current interval $[s, e]$, we can expect good performance if $[s, e]$ contains no more than one change-point. However, things can go disastrously wrong if this is not the case. The following example demonstrates how BS can (spectacularly) fail when the interval $[s, e]$ contains more than one change-point.
9 Binary Segmentation: good versus bad performance
[Figure: global (blue) and local (red) CUSUM statistics $|\langle X, \psi^{s,b,e} \rangle|$ as a function of $b$, for the data $X$ (black); x-axis: Time.]
10 Main idea of WBS
Clearly, it would have been preferable to use the maximum of the red curve as a locator for a change-point candidate. However, it is not clear a priori which start-point $s$ and end-point $e$ to choose. Motivated by this, we propose the following Wild Binary Segmentation (WBS) locator statistic:
$$\arg\max_{m,\, b} \left|\langle X, \psi^{s_m, b, e_m} \rangle\right|,$$
where the start- and end-points $s_m, e_m$ are drawn uniformly from the current data segment $[s, e]$ a suitable number of times. Checking all possible start- and end-points would have resulted in cubic computational complexity, which would be prohibitive; hence the random draws. The $b$ that achieves the above maximum is taken as a change-point candidate.
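A minimal sketch of the WBS locator on data resembling the slide 9 situation (a short bump inside long flat stretches). Assumptions not taken from the slides: each interval is formed from two independent uniform draws on $\{s, \ldots, e\}$, sorted; `cusum_all` and `wbs_locator` are illustrative names.

```python
import numpy as np


def cusum_all(x, s, e):
    """|<X, psi^{s,b,e}>| for b = s, ..., e-1 (0-based, inclusive), via cumulative sums."""
    seg = np.asarray(x[s:e + 1], dtype=float)
    n = len(seg)
    csum = np.cumsum(seg)
    n_left = np.arange(1, n)
    n_right = n - n_left
    left = csum[:-1]
    right = csum[-1] - left
    return np.abs(np.sqrt(n_right / (n * n_left)) * left
                  - np.sqrt(n_left / (n * n_right)) * right)


def wbs_locator(x, s, e, M, rng):
    """Return (s_m, b, e_m, value) maximising |<X, psi^{s_m,b,e_m}>| over M random draws."""
    best = (None, None, None, -np.inf)
    for _ in range(M):
        # Assumption: two independent uniform draws from {s, ..., e}, sorted.
        sm, em = sorted(int(v) for v in rng.integers(s, e + 1, size=2))
        if em - sm < 1:
            continue  # at least two points are needed to place a split
        stat = cusum_all(x, sm, em)
        k = int(np.argmax(stat))
        if stat[k] > best[3]:
            best = (sm, sm + k, em, float(stat[k]))
    return best


# A short bump (indices 140..159) inside long flat stretches.
rng = np.random.default_rng(42)
x = np.zeros(300)
x[140:160] = 2.0
x += 0.3 * rng.standard_normal(300)

global_max = cusum_all(x, 0, 299).max()            # BS on the full interval
sm, b, em, local_max = wbs_locator(x, 0, 299, M=500, rng=rng)
print(f"global max {global_max:.2f}; WBS max {local_max:.2f} at b = {b} on [{sm}, {em}]")
```

On the full interval the two jumps partly cancel, so the global CUSUM stays small; intervals that contain only one edge of the bump give a much larger maximum, located at that edge.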
11 Motivation for WBS
If the number of draws is large enough, we can guarantee, with high probability, particularly favourable draws for which, e.g., $[s_m, e_m]$ contains only one change-point (or is sufficiently close to this situation, as in the example above). The number of draws needed to guarantee this is not large, as will be shown later.
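A rough Monte Carlo illustration of why a modest number of draws suffices. It assumes uniformly drawn intervals and uses "contains exactly one change-point" as a stand-in for "favourable"; the consistency theorem on slide 15 relies on its own, more specific notion. If a single draw is favourable with probability $p$, all $M$ independent draws miss with probability $(1 - p)^M$, which decays geometrically in $M$. `frac_isolating` is a hypothetical name.

```python
import numpy as np


def frac_isolating(change_points, T, M, rng):
    """Monte Carlo share of draws [s_m, e_m] containing exactly one change-point.

    A change-point b (last index of its segment) counts as inside [s_m, e_m]
    when s_m <= b < e_m, i.e. both b and b + 1 lie in the interval.
    """
    cps = np.asarray(change_points)
    draws = np.sort(rng.integers(0, T, size=(M, 2)), axis=1)
    sm, em = draws[:, :1], draws[:, 1:]                 # column vectors, shape (M, 1)
    counts = ((cps >= sm) & (cps < em)).sum(axis=1)
    return float(np.mean(counts == 1))


rng = np.random.default_rng(0)
T = 1000
cps = np.arange(49, T - 1, 50)          # a change-point every 50 observations
p = frac_isolating(cps, T, M=20_000, rng=rng)
for M in (100, 1000, 5000):
    # chance that none of M independent draws contains exactly one change-point
    print(M, (1 - p) ** M)
```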
12 Stopping criteria for BS and WBS
There are two different approaches.
1. Thresholding. In BS combined with thresholding, we stop on the current interval $[s, e]$ when $\max_b |\langle X, \psi^{s,b,e} \rangle| < \zeta_T$. In WBS, we stop when $\max_{m,\, b} |\langle X, \psi^{s_m,b,e_m} \rangle| < \zeta_T$. The threshold $\zeta_T$ will be different for the two algorithms.
2. New information criterion for WBS. Alternatively, for WBS, we propose what we call the strengthened Schwarz Information Criterion (sSIC). It works by running WBS to the end, then pruning back to retain only those estimated change-points that correspond to the $k_0$ largest statistics $\max_b |\langle X, \psi^{s,b,e} \rangle|$, where
$$k_0 = \arg\min_{k = 0, \ldots, K} \frac{T}{2} \log \hat\sigma_k^2 + k \log^{\alpha} T,$$
with $\hat\sigma_k^2$ being the MLE of the residual variance for the fit with $k$ change-points, $K$ the largest number of change-points considered, and $\alpha > 1$.
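A minimal sketch of the sSIC pruning step, assuming the WBS candidates have already been ranked by decreasing statistic (for example, by the maxima recorded while running WBS to the end). Here $\hat\sigma_k^2$ is taken as RSS$/T$ of the piecewise-constant least-squares fit using the top $k$ candidates, $K$ is simply the number of candidates supplied, and candidates are assumed distinct. Function names and the ranking in the example are hypothetical.

```python
import numpy as np


def rss_piecewise_constant(x, cps):
    """RSS of the least-squares piecewise-constant fit with change-points cps
    (0-based; each is the last index of its segment)."""
    bounds = [0, *sorted(c + 1 for c in cps), len(x)]
    return float(sum(((x[a:b] - x[a:b].mean())**2).sum()
                     for a, b in zip(bounds[:-1], bounds[1:])))


def ssic_select(x, ranked_cps, alpha=1.01):
    """Keep the k0 top-ranked candidates, where
    k0 = argmin_{k=0..K} (T/2) log(sigma2_k) + k (log T)^alpha."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    scores = []
    for k in range(len(ranked_cps) + 1):
        sigma2 = rss_piecewise_constant(x, ranked_cps[:k]) / T  # MLE of residual variance
        scores.append(T / 2 * np.log(sigma2) + k * np.log(T) ** alpha)
    k0 = int(np.argmin(scores))
    return sorted(ranked_cps[:k0]), k0


rng = np.random.default_rng(0)
x = np.r_[np.zeros(100), 2.0 * np.ones(100), np.zeros(100)] + 0.5 * rng.standard_normal(300)
# Hypothetical ranking: the two true change-points first, then two spurious ones.
kept, k0 = ssic_select(x, [99, 199, 50, 250])
print(kept, k0)
```

The penalty $k \log^{\alpha} T$ with $\alpha$ slightly above 1 is a little heavier than the standard SIC penalty $k \log T$, which is what discourages the spurious tail of the ranking.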
13 Comparison of BS and WBS in theory
Assumption 1.
1. The random sequence $\{\varepsilon_t\}$ is iid Gaussian with mean zero and variance 1.
2. The sequence $\{f_t\}$ is bounded, i.e. $|f_t| < \bar{f} < \infty$.
3. The magnitudes of the change-points are bounded from below, i.e. $\min_{i=1,\ldots,N} |f_{\eta_i} - f_{\eta_i - 1}| > \underline{f} > 0$.
Assumption 2 (for BS). The minimum spacing between change-points satisfies $\min_{i=1,\ldots,N+1} |\eta_i - \eta_{i-1}| > \delta_T$, where $\delta_T$ is of order $T^{\Theta}$ with $\Theta \in (3/4, 1]$.
Assumption 3 (for WBS). The minimum spacing between change-points satisfies $\min_{i=1,\ldots,N+1} |\eta_i - \eta_{i-1}| > \delta_T$, where $\delta_T \geq C \log T$ for a large enough $C$.
14 Consistency of the BS algorithm
Theorem (BS). Suppose Assumptions 1 and 2 hold. Let $N$ and $\eta_1, \ldots, \eta_N$ denote, respectively, the number and locations of change-points. Let $\hat N$ denote the number, and $\hat\eta_1, \ldots, \hat\eta_{\hat N}$ the locations, sorted in increasing order, of the change-point estimates obtained by the standard Binary Segmentation algorithm with the thresholding stopping criterion. Let the threshold parameter satisfy $\zeta_T = c_1 T^{\theta}$ with $\theta \in (1 - \Theta, \Theta - 1/2)$ if $\Theta \in (3/4, 1)$, or $\zeta_T \geq c_2 \log^p T$ ($p > 1/2$) and $\zeta_T \leq c_3 T^{\theta}$ ($\theta < 1/2$) if $\Theta = 1$, for any positive constants $c_1, c_2, c_3$. Then there exists a positive constant $C$ such that $P(A_T) \to 1$, where
$$A_T = \Big\{ \hat N = N;\ \max_{i=1,\ldots,N} |\hat\eta_i - \eta_i| \leq C \epsilon_T \Big\}$$
with $\epsilon_T = \lambda_2^2 T^2 \delta_T^{-2}$, where $\lambda_2$ is such that $P(\tilde A_T) \to 1$, with
$$\tilde A_T = \Big\{ \max_{1 \leq b \leq e \leq T} \Big| (e - b + 1)^{-1/2} \sum_{i=b}^{e} \varepsilon_i \Big| < \lambda_2 \Big\}. \qquad (1)$$
15 Consistency of the WBS algorithm
Theorem (WBS). Suppose Assumptions 1 and 3 hold. Let $N$ and $\eta_1, \ldots, \eta_N$ denote, respectively, the number and locations of change-points. Let $\hat N$ denote the number, and $\hat\eta_1, \ldots, \hat\eta_{\hat N}$ the locations, sorted in increasing order, of the change-point estimates obtained by the WBS algorithm with the thresholding stopping criterion. There exist two constants $\underline{C}, \overline{C}$ such that if $\underline{C} \log^{1/2} T \leq \zeta_T \leq \overline{C} \delta_T^{1/2}$, then $P(A_T) \to 1$, where
$$A_T = \Big\{ \hat N = N;\ \max_{i=1,\ldots,N} |\hat\eta_i - \eta_i| \leq C \log T \Big\}$$
for a certain positive $C$, and where the guaranteed speed of convergence of $P(A_T)$ to 1 is no faster than $T \delta_T^{-1} (1 - \delta_T^2 T^{-2}/9)^M$, with $M$ denoting the overall number of random draws.
Note: similar results hold for sSIC-BS and sSIC-WBS.
16 Choice of the number of draws M
Note that only one set of $M$ intervals needs to be drawn: we do not need to draw new intervals at each binary stage, as we can just as well reuse the previously drawn intervals that fall within each current interval $[s, e]$. Considering the bound from the WBS consistency theorem, suppose we wish to have $T \delta_T^{-1} (1 - \delta_T^2 T^{-2}/9)^M \leq T^{-\alpha}$ for a certain positive $\alpha$. Taking logarithms and using $\log(1 - u) \approx -u$ for small $u$, this is practically equivalent to
$$M \geq 9 T^2 \delta_T^{-2} \log\left(T^{1+\alpha} \delta_T^{-1}\right).$$
In the easy case of $\delta_T$ being of order $T$, this results in a logarithmic number of draws. Naturally, $M$ progressively increases as $\delta_T$ decreases.
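A short numerical check of the calculation above, using only the Python standard library; the helper names are illustrative. `draws_needed_exact` solves the inequality directly for the smallest integer $M$, and `draws_needed_approx` evaluates the closed-form approximation. With $\delta_T = T/10$ the printed values grow only logarithmically in $T$.

```python
import math


def draws_needed_exact(T, delta, alpha):
    """Smallest integer M with (T / delta) * (1 - delta**2 / (9 * T**2))**M <= T**(-alpha)."""
    log_q = math.log1p(-delta**2 / (9.0 * T**2))               # log(1 - delta^2 T^-2 / 9) < 0
    log_target = -(1 + alpha) * math.log(T) + math.log(delta)  # log(T^-alpha * delta / T)
    return math.ceil(log_target / log_q)


def draws_needed_approx(T, delta, alpha):
    """9 T^2 delta^-2 log(T^(1+alpha) / delta), using log(1 - u) ~ -u."""
    return 9.0 * T**2 / delta**2 * math.log(T ** (1 + alpha) / delta)


for T in (1_000, 10_000, 100_000):
    delta = T / 10        # spacing of order T: M grows only logarithmically in T
    print(T, draws_needed_exact(T, delta, alpha=1.0),
          round(draws_needed_approx(T, delta, alpha=1.0)))
```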
17 Parameter choice in practice
Choice of $M$: we have tested, and recommend, $M = 5000$ or $M = \ldots$ for datasets of length $T$ not exceeding a few thousand. Part of the algorithm is coded in C, so it takes a fraction of a second on a standard PC. Note that WBS can be fully parallelized, e.g. on a GPU, as each interval can be drawn and processed independently of the others. In this sense, in a parallel computing environment, WBS is actually faster than BS!
Choice of threshold $\zeta_T$: we use multiples of the universal threshold, i.e. $\zeta_T = C \hat\sigma (2 \log T)^{1/2}$, with $C = 1.0$ (which tends to perform well or slightly over-estimate $N$) or $C = 1.3$ (which tends to perform well or slightly under-estimate $N$).
Choice of the $\alpha$ parameter in sSIC-WBS: we use $\alpha = 1.01$ in order to stay close to the standard SIC.
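Putting slides 10, 12, 16 and 17 together, a compact NumPy sketch of WBS with the thresholding stopping rule and the universal threshold. Assumptions that do not come from the slides: interval end-points are two sorted uniform draws; the full current interval $[s, e]$ is also examined at every stage; $\hat\sigma$ is the median absolute deviation (MAD) of first differences, rescaled for Gaussian noise. This is pure Python plus NumPy, so it is far slower than the partly-C implementation mentioned above, and the names are illustrative rather than the API of any package.

```python
import numpy as np


def cusum_all(x, s, e):
    """|<X, psi^{s,b,e}>| for b = s, ..., e-1 (0-based, inclusive)."""
    seg = x[s:e + 1]
    n = len(seg)
    csum = np.cumsum(seg)
    n_left = np.arange(1, n)
    n_right = n - n_left
    left = csum[:-1]
    right = csum[-1] - left
    return np.abs(np.sqrt(n_right / (n * n_left)) * left
                  - np.sqrt(n_left / (n * n_right)) * right)


def mad_sigma(x):
    """Assumed noise-scale estimate: MAD of first differences, rescaled for Gaussian noise."""
    d = np.diff(x)
    return 1.4826 * np.median(np.abs(d - np.median(d))) / np.sqrt(2)


def wbs_threshold(x, M=5000, C=1.3, seed=0):
    """WBS with threshold zeta_T = C * sigma_hat * sqrt(2 log T); returns (change-points, zeta_T)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    rng = np.random.default_rng(seed)
    draws = np.sort(rng.integers(0, T, size=(M, 2)), axis=1)  # drawn once, reused below
    draws = draws[draws[:, 1] > draws[:, 0]]
    zeta = C * mad_sigma(x) * np.sqrt(2 * np.log(T))
    found = []

    def recurse(s, e):
        if e - s < 1:
            return
        inside = draws[(draws[:, 0] >= s) & (draws[:, 1] <= e)]
        best_val, best_b = -np.inf, None
        for sm, em in np.vstack([[s, e], inside]):
            stat = cusum_all(x, sm, em)
            k = int(np.argmax(stat))
            if stat[k] > best_val:
                best_val, best_b = stat[k], int(sm) + k
        if best_val < zeta:      # stop on this interval
            return
        found.append(best_b)
        recurse(s, best_b)
        recurse(best_b + 1, e)

    recurse(0, T - 1)
    return sorted(found), zeta


rng = np.random.default_rng(1)
f = np.r_[np.zeros(150), 3.0 * np.ones(150), np.zeros(150), -2.0 * np.ones(150)]
x = f + rng.standard_normal(600)
cps, zeta = wbs_threshold(x, M=2000, C=1.3)
print(cps, round(zeta, 2))
```

Drawing the $M$ intervals once and filtering them by containment at each stage follows the reuse argument on slide 16; the loop over intervals is independent per interval, which is what makes the GPU parallelization mentioned above possible.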
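18 Simulation study (1): the blocks signal. [Figure: two panels; x-axis: Time.]
19 Simulation study (2): the fms signal. [Figure: two panels; x-axis: Time.]
20 Simulation study (3): the mix signal. [Figure: two panels; x-axis: Time.]
21 Simulation study (4): the teeth10 signal. [Figure: two panels; x-axis: Time.]
22 Simulation study (5): the stairs10 signal. [Figure: two panels; x-axis: Time.]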
23 Simulation study
Best available competitors from R packages publicly available on CRAN:
- PELT: method from the changepoint package, see Killick et al. (2012);
- B&P: method from the strucchange package, see Bai and Perron (2003);
- cumseg: method from the cumseg package, see Muggeo and Adelfio (2011);
- S3IB: method from the Segmentor3IsBack package, see Rigaill (2010).
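24 Simulation study: results for the blocks signal (model (1)). [Table: for each method (PELT, B&P, cumseg, S3IB, WBS C = …, WBS C = …, WBS sSIC, BS C = …, BS C = …): distribution of $\hat N - N$ and MSE.]
25 Simulation study: results for the fms signal (model (2)). [Table: for each method (PELT, B&P, cumseg, S3IB, WBS C = …, WBS C = …, WBS sSIC, BS C = …, BS C = …): distribution of $\hat N - N$ and MSE.]
26 Simulation study: results for the mix signal (model (3)). [Table: for each method (PELT, B&P, cumseg, S3IB, WBS C = …, WBS C = …, WBS sSIC, BS C = …, BS C = …): distribution of $\hat N - N$ and MSE.]
27 Simulation study: results for the teeth10 signal (model (4)). [Table: for each method (PELT, B&P, cumseg, S3IB, WBS C = …, WBS C = …, WBS sSIC, BS C = …, BS C = …): distribution of $\hat N - N$ and MSE.]
28 Simulation study: results for the stairs10 signal (model (5)). [Table: for each method (PELT, B&P, cumseg, S3IB, WBS C = …, WBS C = …, WBS sSIC, BS C = …, BS C = …): distribution of $\hat N - N$ and MSE.]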
29 Real data example
We now revisit the example from the start of the talk. The time-threshold map below shows the estimated change-points depending on the threshold chosen. [Figure: time-threshold map; blue line: C = ….]
30 Real data example (continued)
[Figure: cumulative sum of $X_t$ against Time, with change-points corresponding to sSIC (thick solid vertical lines), $\zeta_T = 3.83$ (thin and thick solid vertical lines) and $\zeta_T = 3.1$ (all vertical lines).]
31 Some final thoughts
Change-point detection is neither an entirely global problem nor an entirely local one, so a multiscale approach, such as that offered by WBS (in that both short and long intervals are used), appears to be helpful. Can similar local-global randomised approaches be used in other nonparametric problems?
32 References
P. Fryzlewicz (2013). Wild Binary Segmentation for multiple change-point detection. Under revision. Available from …
R. Baranowski & P. Fryzlewicz (2014). Package wbs. Available from …
Statistical Consulting Topics The Bootstrap... The bootstrap is a computer-based method for assigning measures of accuracy to statistical estimates. (Efron and Tibshrani, 1998.) What do we do when our
More informationTesting Restrictions and Comparing Models
Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationLTI Systems, Additive Noise, and Order Estimation
LTI Systems, Additive oise, and Order Estimation Soosan Beheshti, Munther A. Dahleh Laboratory for Information and Decision Systems Department of Electrical Engineering and Computer Science Massachusetts
More informationy Xw 2 2 y Xw λ w 2 2
CS 189 Introduction to Machine Learning Spring 2018 Note 4 1 MLE and MAP for Regression (Part I) So far, we ve explored two approaches of the regression framework, Ordinary Least Squares and Ridge Regression:
More informationAdaptive Detection of Multiple Change Points in Asset Price Volatility
Adaptive Detection of Multiple Change Points in Asset Price Volatility Marc Lavielle 1 and Gilles Teyssière 2 1 Université René Descartes and Université Paris Sud, Laboratoire de Mathématiques. Marc.Lavielle@math.u-psud.fr
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationTuning Parameter Selection in L1 Regularized Logistic Regression
Virginia Commonwealth University VCU Scholars Compass Theses and Dissertations Graduate School 2012 Tuning Parameter Selection in L1 Regularized Logistic Regression Shujing Shi Virginia Commonwealth University
More informationBIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation
BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation Yujin Chung November 29th, 2016 Fall 2016 Yujin Chung Lec13: MLE Fall 2016 1/24 Previous Parametric tests Mean comparisons (normality assumption)
More informationUsing CART to Detect Multiple Change Points in the Mean for large samples
Using CART to Detect Multiple Change Points in the Mean for large samples by Servane Gey and Emilie Lebarbier Research Report No. 12 February 28 Statistics for Systems Biology Group Jouy-en-Josas/Paris/Evry,
More informationA simple nonparametric test for structural change in joint tail probabilities SFB 823. Discussion Paper. Walter Krämer, Maarten van Kampen
SFB 823 A simple nonparametric test for structural change in joint tail probabilities Discussion Paper Walter Krämer, Maarten van Kampen Nr. 4/2009 A simple nonparametric test for structural change in
More informationUncertainty. Jayakrishnan Unnikrishnan. CSL June PhD Defense ECE Department
Decision-Making under Statistical Uncertainty Jayakrishnan Unnikrishnan PhD Defense ECE Department University of Illinois at Urbana-Champaign CSL 141 12 June 2010 Statistical Decision-Making Relevant in
More informationarxiv: v2 [stat.co] 1 Jul 2013
Fast estimation of the Integrated Completed Likelihood criterion for change-point detection problems with applications to Next-Generation Sequencing data arxiv:1211.3210v2 [stat.co] 1 Jul 2013 A. Cleynen
More informationHow the mean changes depends on the other variable. Plots can show what s happening...
Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How
More informationLearning Sparse Penalties for Change-Point Detection using Max Margin Interval Regression
Learning Sparse Penalties for Change-Point Detection using Max Margin Interval Regression Guillem Rigaill rigaill@evry.inra.fr Unité de Recherche en Génomique Végétale INRA-CNRS-Université d Evry Val d
More informationHow New Information Criteria WAIC and WBIC Worked for MLP Model Selection
How ew Information Criteria WAIC and WBIC Worked for MLP Model Selection Seiya Satoh and Ryohei akano ational Institute of Advanced Industrial Science and Tech, --7 Aomi, Koto-ku, Tokyo, 5-6, Japan Chubu
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationRegression, Ridge Regression, Lasso
Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.
More informationComputational methods for mixed models
Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationPerformance of Autoregressive Order Selection Criteria: A Simulation Study
Pertanika J. Sci. & Technol. 6 (2): 7-76 (2008) ISSN: 028-7680 Universiti Putra Malaysia Press Performance of Autoregressive Order Selection Criteria: A Simulation Study Venus Khim-Sen Liew, Mahendran
More informationISSN Article. Selection Criteria in Regime Switching Conditional Volatility Models
Econometrics 2015, 3, 289-316; doi:10.3390/econometrics3020289 OPEN ACCESS econometrics ISSN 2225-1146 www.mdpi.com/journal/econometrics Article Selection Criteria in Regime Switching Conditional Volatility
More information