Applying the $Q_n$ Estimator Online

Robin Nunkesser (1), Karen Schettlinger (2), Roland Fried (2)
(1) Department of Computer Science, University of Dortmund
(2) Department of Statistics, University of Dortmund
GfKl 2007

Outline
1. Introduction
2. Computation: Offline Computation, Online Computation, Running Time Simulations
3. Application in a Simulation Study

Motivation: Intensive Care Online Monitoring
[Figure: arterial mean pressure of two patients; two panels, pressure in mmHg against time in minutes]
Goal: fast detection of relevant patterns (level or trend changes) while resisting outliers.

Signal and Noise
Model for $(y_t)$: $y_t = \mu_t + u_t + v_t$
$\mu_t$ (signal): smooth with a few sudden shifts
$u_t$ (observational noise): symmetric, mean zero
$v_t$ (spiky noise): measurement artifacts
Moving window $(y_{t-m+1}, \ldots, y_t, \ldots, y_{t+h})$ to approximate $\mu_t$
Choice of width $n = m + h$: a trade-off between variance and robustness on the one hand and bias, computation time, and delay on the other

The $Q_n$ Scale Estimator
Definition (Rousseeuw and Croux (1993)): for $n$ points $x_i \in \mathbb{R}$ and $h = \lfloor n/2 \rfloor + 1$, the estimator $Q_n$ is defined as
$$Q_n = d_n \cdot 2.2219 \cdot \{|x_i - x_j|;\ i < j\}_{(k)}, \qquad k = \binom{h}{2}.$$
[Figure: Gaussian efficiencies of different robust scale estimators; efficiency (0 to 100) against sample size (0 to 50) for Qn, Sn, length of the shortest half (LSH), interquartile range (IQR), and median absolute deviation (MAD)]
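As a point of reference for the definition, here is a minimal brute-force sketch in Python. The function name `qn_naive` is hypothetical, and the finite-sample factor $d_n$ is simply assumed to be 1 because the slide does not give its values. The sketch sorts all pairwise differences, so it is far slower than the algorithms discussed in the talk.

```python
from itertools import combinations
from math import comb


def qn_naive(x, d_n=1.0):
    """Brute-force Q_n: scaled k-th smallest pairwise absolute difference.

    d_n = 1.0 is an assumption; the slide does not list the
    finite-sample correction factors.
    """
    n = len(x)
    if n < 2:
        raise ValueError("Q_n needs at least two observations")
    h = n // 2 + 1            # h = floor(n/2) + 1
    k = comb(h, 2)            # rank of the order statistic
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return d_n * 2.2219 * diffs[k - 1]


clean = [1, 2, 3, 4, 5]
spiked = [1, 2, 3, 4, 500]    # one gross outlier
print(qn_naive(clean), qn_naive(spiked))
```

In this small example the single outlier leaves the estimate unchanged, because the selected order statistic sits among the small differences.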

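Problem Transformation
Transformation of the $Q_n$:
$$Q_n = c \cdot \{|x_i - x_j|;\ i < j\}_{(k)} = c \cdot \{x_{(i)} - x_{(n-j+1)};\ 1 \le i, j \le n\}_{(k + n + \binom{n}{2})}$$
$Q_n$ computation by selecting a rank in a matrix of type $X + Y$, with entry $x_{(i)} + (-x_{(n-j+1)})$ in row $i$ and column $j$:
$$\begin{pmatrix} x_{(1)} + (-x_{(n)}) & \cdots & x_{(1)} + (-x_{(2)}) & 0 \\ \vdots & & 0 & x_{(2)} + (-x_{(1)}) \\ x_{(n-1)} + (-x_{(n)}) & 0 & & \vdots \\ 0 & x_{(n)} + (-x_{(n-1)}) & \cdots & x_{(n)} + (-x_{(1)}) \end{pmatrix}$$
Definition (Hodges-Lehmann estimator):
$$\hat{\mu} = \operatorname{median}\left\{ \frac{x_i + x_j}{2},\ 1 \le i \le j \le n \right\}$$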
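The rank shift by $n + \binom{n}{2}$ can be checked numerically: the $n^2$ signed differences consist of $\binom{n}{2}$ negative values, $n$ zeros, and the $\binom{n}{2}$ absolute differences. The Python sketch below (all function names are hypothetical) compares both selections by brute force and also evaluates the Hodges-Lehmann estimator directly from its definition.

```python
from itertools import combinations, combinations_with_replacement
from math import comb
from statistics import median


def kth_abs_difference(x, k):
    """k-th smallest |x_i - x_j| over i < j (brute force)."""
    return sorted(abs(a - b) for a, b in combinations(x, 2))[k - 1]


def kth_via_x_plus_y(x, k):
    """Same value, selected from the n*n sums of X and Y = -X."""
    n = len(x)
    xs = sorted(x)
    ys = sorted(-v for v in x)           # ys[j-1] = -x_(n-j+1)
    sums = sorted(a + b for a in xs for b in ys)
    return sums[k + n + comb(n, 2) - 1]  # shifted rank, 1-based


def hodges_lehmann(x):
    """Median of the pairwise averages (x_i + x_j) / 2 with i <= j."""
    return median((a + b) / 2
                  for a, b in combinations_with_replacement(x, 2))


data = [3, 1, 4, 1, 5, 9, 2, 6]
assert all(kth_abs_difference(data, k) == kth_via_x_plus_y(data, k)
           for k in range(1, comb(len(data), 2) + 1))
print(hodges_lehmann([1, 2, 3, 4, 100]))
```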

The Offline Algorithm of Johnson and Mizoguchi
1. Select one of the (remaining) matrix elements.
2. Divide the remaining elements in $O(n)$ time into three parts by
   (a) finding elements certainly smaller than the selected element,
   (b) finding elements certainly greater than the selected element.
3. Determine one or two of the parts to exclude from the search.
4. If fewer than $n$ elements remain, determine the sought-after element; otherwise go to step 1.
If the selection in step 1 is done by means of the weighted median, the algorithm takes optimal $O(n \log n)$ time.
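The $O(n)$ partition in step 2 relies on the rows and columns of an $X + Y$ matrix being sorted. The following Python sketch shows only that building block, not the full Johnson and Mizoguchi algorithm: it counts the entries below a pivot with a single staircase walk, without ever forming the matrix. The function name `count_less_than` is hypothetical.

```python
def count_less_than(xs, ys, pivot):
    """Count pairs (i, j) with xs[i] + ys[j] < pivot.

    xs and ys must be sorted in ascending order. The column bound j
    only ever moves left while the row value grows, so the walk takes
    O(len(xs) + len(ys)) steps.
    """
    count = 0
    j = len(ys)                    # ys[:j] are the candidates for this row
    for a in xs:
        while j > 0 and a + ys[j - 1] >= pivot:
            j -= 1                 # entry too large for this and later rows
        count += j
    return count


xs = [1, 2, 4, 7]
ys = sorted(-v for v in xs)        # Y = -X, as in the Q_n transformation
brute = sum(1 for a in xs for b in ys if a + b < 0)
print(count_less_than(xs, ys, 0), brute)
```

Comparing this count with the sought rank tells which side of the pivot can be excluded from the search.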

Concept of the Online Algorithm
Use a buffer of size $O(n)$ and data structures supporting fast updates.
Indexed AVL tree (IAVL): getting, inserting, removing, and rank information are available in $O(\log n)$.
The data structure used: each element in $X$, $Y$, and the buffer is stored in an IAVL and supported by two pointers.
Updating needs $O(|\text{inserted/deleted buffer elements}| \cdot \log n)$ time per update.
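To make the required operations concrete, here is a small Python stand-in with the same interface as an indexed AVL tree: get by position, insert, remove, and rank. It is an assumption-laden sketch, not the talk's data structure: it uses a plain sorted list, so lookups are $O(\log n)$ but insertion and removal cost $O(n)$. The class name `SortedWindow` is hypothetical.

```python
from bisect import bisect_left, insort


class SortedWindow:
    """Order-statistic container backed by a sorted Python list."""

    def __init__(self):
        self._items = []

    def insert(self, value):
        insort(self._items, value)          # keeps the list sorted

    def remove(self, value):
        i = bisect_left(self._items, value)
        if i == len(self._items) or self._items[i] != value:
            raise KeyError(value)
        del self._items[i]

    def rank(self, value):
        """Number of stored elements strictly smaller than value."""
        return bisect_left(self._items, value)

    def get(self, index):
        """Element at the given 0-based position in sorted order."""
        return self._items[index]

    def __len__(self):
        return len(self._items)


window = SortedWindow()
for y in [5, 1, 3]:
    window.insert(y)
window.remove(1)                            # oldest observation leaves
window.insert(4)                            # newest observation enters
print(window.get(0), window.rank(4), len(window))
```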

Example: Column Deletion
1. Search in $X$ for the element that is to be deleted ($O(\log n)$).
2. Follow pointers to the largest/smallest element in the column ($O(1)$).
3. Determine the rows of these elements ($O(1)$).
4. Compute and delete all elements in between from the buffer ($O(|\text{deleted elements}| \cdot \log n)$).

Properties of the Online Algorithm
Drawback: in a worst-case scenario we still need $O(n + k \log n)$ accumulated time per update, with $k = O(n^2)$.
But: if every update position has equal probability, the expected accumulated time per update is $O(\log n)$. This case occurs, e.g., for a constant signal with stationary noise.
General bound: $O(\mathrm{Prob}(\text{most probable update position}) \cdot n \log n)$.
We expect good behaviour in practice.

Running Time Simulations: Simulated Datasets
1. Constant signal with standard normal noise and 10% outliers
2. Additional shifts and trends
[Figure: the two simulated datasets, x against time (0 to 200)]
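A dataset of the first kind can be generated along the following lines. This Python sketch rests on assumptions the slide does not state: outliers are modelled as additive spikes of a fixed size, and the names `simulate_constant_signal`, `outlier_size`, and `seed` are hypothetical.

```python
import random


def simulate_constant_signal(length=200, level=0.0, outlier_prob=0.10,
                             outlier_size=10.0, seed=0):
    """Constant level + standard normal noise + occasional spikes.

    Returns the series and a list of flags marking the spiked points.
    The fixed additive outlier_size is an assumption for illustration.
    """
    rng = random.Random(seed)
    series, is_outlier = [], []
    for _ in range(length):
        u = rng.gauss(0.0, 1.0)               # observational noise u_t
        spike = rng.random() < outlier_prob   # spiky noise v_t present?
        v = outlier_size if spike else 0.0
        series.append(level + u + v)
        is_outlier.append(spike)
    return series, is_outlier


y, flags = simulate_constant_signal()
print(len(y), sum(flags))
```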

Running Time Simulations: Steps Performed per Update
[Figure: measured buffer positions; steps per update (0 to 500) against $n$ (0 to 500)]


Shift Detection
Comparison of two window level estimates right and left of time $t$.
[Figure: local constant level in the windows left and right of $t$; time axis marked at $t-m+1$, $t$, and $t+h$]
Test statistic:
$$T(y_t) = \frac{\hat{\mu}_{t+} - \hat{\mu}_{t-}}{\sqrt{c \left( \dfrac{\hat{\sigma}_{t+}^2}{h} + \dfrac{\hat{\sigma}_{t-}^2}{m} \right)}}$$
Two-sample t-test (Stein & Follow, 1985)
Trimmed two-sample t-test (Hou & Koh, 2003)
Median comparison (Bovik & Munson, 1986; Hwang & Haddad, 1994)
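A standardized comparison of the two half-window levels might be sketched as follows in Python. Several points are assumptions rather than the authors' stated method: the statistic is taken to be the level difference divided by $\sqrt{c\,(\hat{\sigma}_{t+}^2/h + \hat{\sigma}_{t-}^2/m)}$, the constant $c$ and the finite-sample factor of $Q_n$ are both set to 1, and the names `qn` and `shift_statistic` are hypothetical. Levels are estimated by half-window medians and scales by a brute-force $Q_n$.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import median


def qn(x):
    """Brute-force Q_n with the finite-sample factor assumed to be 1."""
    h = len(x) // 2 + 1
    diffs = sorted(abs(a - b) for a, b in combinations(x, 2))
    return 2.2219 * diffs[comb(h, 2) - 1]


def shift_statistic(window, m, h, c=1.0):
    """Median difference of right and left half-window, scaled by Q_n.

    window holds y_{t-m+1}, ..., y_t, y_{t+1}, ..., y_{t+h};
    c = 1.0 is an assumed scaling constant.
    """
    left, right = window[:m], window[m:m + h]
    scale = sqrt(c * (qn(right) ** 2 / h + qn(left) ** 2 / m))
    if scale == 0:
        raise ValueError("scale estimate is zero (too many ties)")
    return (median(right) - median(left)) / scale


left = [0, 1, 2, 3, 4, 5, 6]
right = [10, 11, 12, 13, 14, 15, 1000]   # level shift plus one outlier
print(shift_statistic(left + right, m=7, h=7))
```

The statistic would then be compared against a threshold; how that threshold is chosen is outside this sketch.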

Power in the Standard Normal Case
[Figure: power (0 to 100) against shift size for widths $m = h = 7$ and $m = h = 15$; methods shown: t-test, 30%-trimmed t-test, Qn, Sn, LSH, MAD, IQR]

Power in Case of a Single Outlier of Increasing Size
[Figure: power (0 to 100) against outlier size (0 to 20) for widths $m = h = 7$ and $m = h = 15$; methods shown: t-test, 30%-trimmed t-test, Qn, Sn, IQR]

Increasing Number of Outliers
[Figure: power (0 to 100) against number of outliers for widths $m = h = 7$ (0 to 7 outliers) and $m = h = 15$ (0 to 14 outliers); methods shown: Qn, Sn, IQR]

Example: Piecewise Constant Signal
[Figure: moving window filters (width $m = h = 9$), values in mmHg against time in minutes (0 to 300); methods shown: modified trimmed mean, median with MAD, median with Qn]

Summary
New online algorithm which is substantially faster in practice.
Shift detection based on half-window medians and the $Q_n$ scale estimator.
Choose thresholds for shift detection based on the estimated lag-1 autocorrelation, derived using the $Q_n$ once more.

Thank you!
