Monitoring Random Start Forward Searches for Multivariate Data


Anthony C. Atkinson¹, Marco Riani² and Andrea Cerioli²

¹ Department of Statistics, London School of Economics, London WC2A 2AE, UK, a.c.atkinson@lse.ac.uk
² Dipartimento di Economia, Università di Parma, Parma, Italy, mriani@unipr.it, andrea.cerioli@unipr.it

Abstract. During a forward search from a robustly chosen starting point the plot of maximum Mahalanobis distances of observations in the subset may provide a test for outliers. This is not the customary test. We obtain distributional results for this distance during the search and exemplify its use. However, if clusters are present in the data, searches from random starts are required for their detection. We show that our new statistic has the same distributional properties whether the searches have random or robustly chosen starting points.

Keywords: clustering, Mahalanobis distance, order statistic, outlier detection, robustness

1 Introduction

The forward search is a powerful general method for detecting systematic or random departures from statistical models, such as those caused by outliers and the presence of clusters. The forward search for multivariate data is given book-length treatment by Atkinson, Riani and Cerioli (2004). To detect outliers they study the evolution of Mahalanobis distances calculated during a search through the data that starts from a carefully selected subset of observations. More recently Atkinson and Riani (2007) suggested the use of many searches starting from random starting points as a tool in the detection of clusters. An important aspect of this work is the provision of bounds against which to judge the observed values of the distances. Atkinson and Riani (2007) use simulation for this purpose as well as providing approximate numerical values for the quantiles of the distribution.
These theoretical results are for the minimum Mahalanobis distance of observations not in the subset used for fitting when the starting point of the search is robustly selected. In this paper we consider instead the alternative statistic of the maximum Mahalanobis distance amongst observations in the subset. We derive good approximations to its distribution during the forward search and empirically compare its distribution to that of the minimum distance, both for random and robust starts. We find for the maximum distance, but not for the minimum, that the distribution of the distance does not depend on how the search starts. Our ultimate purpose is a more automatic method of outlier and cluster identification.

We start in §2 with an introduction to the forward search that emphasises the importance of Mahalanobis distances in outlier detection. Some introductory theoretical results for the distributions of distances are in §3. Section 4 introduces the importance of random start searches in cluster detection. Our main theoretical results are in §5 where we use results on order statistics to derive good approximations to the distribution of the maximum distance during the search. Our methods are exemplified in §6 by the analysis of data on horse mussels. The comparison of distributions for random and elliptical starts to the search is conducted by simulation in §7.

2 Mahalanobis distances and the forward search

The tools that we use for outlier detection and cluster identification are plots of various Mahalanobis distances. The squared distances for the sample are defined as

$$d_i^2(\hat\mu, \hat\Sigma) = \{y_i - \hat\mu\}^T \hat\Sigma^{-1} \{y_i - \hat\mu\}, \tag{1}$$

where $\hat\mu$ and $\hat\Sigma$ are estimates of the mean and covariance matrix of the n observations. In the forward search the parameters $\mu$ and $\Sigma$ are replaced by their standard unbiased estimators from a subset of m observations, yielding estimates $\hat\mu(m)$ and $\hat\Sigma(m)$. From this subset we obtain n squared Mahalanobis distances

$$d_i^2(m) = \{y_i - \hat\mu(m)\}^T \hat\Sigma^{-1}(m) \{y_i - \hat\mu(m)\}, \quad i = 1, \dots, n. \tag{2}$$

We start with a subset of $m_0$ observations which grows in size during the search. When a subset $S(m)$ of m observations is used in fitting, we order the squared distances and take the observations corresponding to the m + 1 smallest as the new subset $S(m+1)$. In what we call normal progression this process augments the subset by one observation, but sometimes two or more observations enter as one or more leave.
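The search just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `squared_distances` and `forward_search` are hypothetical, ties in the ordering are broken by a stable sort, and the starting subset is simply passed in by the caller.

```python
import numpy as np

def squared_distances(Y, subset):
    """Squared Mahalanobis distances of all n rows of Y, as in (2),
    with mean and covariance estimated from the rows indexed by `subset`."""
    mu = Y[subset].mean(axis=0)
    # np.cov uses the divisor m - 1, i.e. the standard unbiased estimator.
    sigma = np.cov(Y[subset], rowvar=False)
    resid = Y - mu
    # Solve sigma * x = resid^T instead of forming the inverse explicitly.
    return np.einsum("ij,ji->i", resid, np.linalg.solve(sigma, resid.T))

def forward_search(Y, start):
    """Return the list of subsets S(m0), S(m0 + 1), ..., S(n)."""
    n = Y.shape[0]
    subset = np.sort(np.asarray(start))
    subsets = [subset]
    while subset.size < n:
        d2 = squared_distances(Y, subset)
        # S(m + 1): the m + 1 observations with the smallest distances.
        # Recomputing the whole subset lets units leave as well as enter.
        subset = np.sort(np.argsort(d2, kind="stable")[: subset.size + 1])
        subsets.append(subset)
    return subsets

# Illustrative data only: 30 simulated observations in 3 dimensions.
rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 3))
subsets = forward_search(Y, start=np.arange(6))
print([s.size for s in subsets][:3], subsets[-1].size)  # [6, 7, 8] 30
```

Because each new subset is rebuilt from the full ordering rather than by appending one unit, the sketch also covers the case in which two or more observations enter as one or more leave.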
In our examples we look at forward plots of quantities derived from the distances $d_i(m)$. These distances tend to decrease as m increases. If interest is in the latter part of the search we may use scaled distances

$$d_i^{sc}(m) = d_i(m) \left( |\hat\Sigma(m)| / |\hat\Sigma(n)| \right)^{1/2v}, \tag{3}$$

where v is the dimension of the observations y and $\hat\Sigma(n)$ is the estimate of $\Sigma$ at the end of the search. To detect outliers Atkinson et al. (2004) and Atkinson and Riani (2007) examined the minimum Mahalanobis distance amongst observations not in the subset

$$d_{\min}(m) = \min d_i(m), \quad i \notin S(m), \tag{4}$$

or its scaled version $d^{sc}_{\min}(m)$. In either case let this be observation $i_{\min}(m)$. If observation $i_{\min}(m)$ is an outlier relative to the other m observations, the distance (4) will be large compared to its reference distribution. In this paper we investigate instead the properties of the maximum Mahalanobis distance amongst the m observations in the subset

$$d_{\max}(m) = \max d_i(m), \quad i \in S(m), \tag{5}$$

letting this be observation $i_{\max}(m)$. Whether we monitor $d_{\max}(m)$ or $d_{\min}(m)$ the search is the same, progressing through the ordering of $d_i^2(m)$.

3 Minimum and maximum Mahalanobis distances

We now consider the relationship between $d_{\min}(m)$ and $d_{\max}(m)$ as outlier tests. This relationship depends on the subsets $S(m)$ and $S(m+1)$. Let the kth ordered Mahalanobis distance, in increasing order, be $d_{[k]}(m)$ when estimation is based on the subset $S(m)$. In normal progression

$$d_{[m+1]}(m) = d_{\min}(m) \tag{6}$$

and $S(m+1)$ is formed from $S(m)$ by the addition of observation $i_{\min}(m)$. Likewise, in normal progression this will give rise to the largest distance within the new subset, that is

$$d_{[m+1]}(m+1) = d_{\max}(m+1) = d_{i_{\min}(m)}(m+1). \tag{7}$$

The distance $d_{i_{\min}(m)}(m+1)$ is that for the new observation $i_{\min}(m)$ when the parameters are estimated from $S(m+1)$. The consequence of (7) is that both $d_{\max}(m+1)$ and $d_{\min}(m)$ are tests of the outlyingness of observation $i_{\min}(m)$.

Although both statistics are testing the same hypothesis they do not have the same numerical value and should be referred to different null distributions. In §5 we discuss the effect of the ranking of the observations on these distributions as well as the consequence of estimating $\mu$ and $\Sigma$ from a subset of the observations. For estimates using all n observations $d_{\max}(n)$ is one of the distances in (1). Standard distributional results in, for example, Atkinson et al. (2004, §2.6) show that

$$d_i^2(\hat\mu, \hat\Sigma) = d_i^2(n) \sim \frac{(n-1)^2}{n} \, \mathrm{Beta}\left( \frac{v}{2}, \frac{n-v-1}{2} \right). \tag{8}$$

On the other hand, $d^2_{\min}(n-1)$ is a deletion distance in which the parameters are estimated, in general, with the omission of observation i. The distribution of such distances is

$$d_{(i)}^2 \sim \frac{n}{(n-1)} \, \frac{v(n-2)}{(n-v-1)} \, F_{v,\,n-v-1}, \tag{9}$$

although the distribution of $d^2_{\min}(n-1)$ depends on the order statistics from this distribution. For moderate n the range of the distribution of $d_i^2(n)$ in (8) is approximately (0, n) rather than the unbounded range for the F distribution of the deletion distances. As we shall see, the consequence is that the distribution of $d^2_{\max}(m)$ has much shorter tails than that of $d^2_{\min}(m)$, particularly for small m.

Our argument has been derived assuming normal progression. This occurs under the null hypothesis of a single multivariate normal population when there are no outliers or clusters in the data, so that the ordering of the observations by closeness to the fitted model does not alter appreciably during the search. Then we obtain very similar forward plots of $d_{\max}(m)$ and $d_{\min}(m)$, even if they have to be interpreted against different null distributions. In fact, we do not need the order to remain unchanged, but only that $i_{\min}(m)$ and $i_{\max}(m+1)$ are the same observation and that the other observations in $S(m+1)$ are those that were in $S(m)$. Dispersed outliers likewise do not appreciably affect the ordering of the data. The ordering is however affected by clusters of observations that cause appreciable changes in the parameter estimates as they enter $S(m)$. A discussion of the ordering of observations within and without $S(m)$ is given by Atkinson et al. (2004).

4 Elliptical and random starts

To find the starting subset for the search Atkinson et al. (2004) use the robust bivariate boxplots of Zani, Riani and Corbellini (1998) to pick a starting set $S(m_0)$ that excludes any two-dimensional outliers. The boxplots have elliptical contours, so we refer to this method as the elliptical start. However, if there are clusters in the data, the elliptical start may lead to a search in which observations from several clusters enter the subset in sequence in such a way that the clusters are not revealed.
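The alternative to the elliptical start is the random start mentioned in §1. The text does not specify how random starting subsets are drawn, so the sketch below assumes the simplest reading: $m_0$ distinct observation indices sampled uniformly without replacement, once per search. The function name `random_starts`, the choice $m_0 = 10$ and the seed are illustrative assumptions only.

```python
import numpy as np

def random_starts(n, m0, n_starts, seed=0):
    """Draw `n_starts` starting subsets, each of m0 distinct indices
    sampled uniformly without replacement from 0, ..., n - 1."""
    rng = np.random.default_rng(seed)
    return [np.sort(rng.choice(n, size=m0, replace=False))
            for _ in range(n_starts)]

# Illustrative call: 250 random starts for a data set with n = 82 units.
starts = random_starts(n=82, m0=10, n_starts=250, seed=1)
print(len(starts), starts[0].size)
```

Each subset must contain more than v observations for the covariance estimate of the first step to be non-singular.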
Searches from more than one starting point are then needed to reveal the clustering structure. If we start with an initial subset of observations from each cluster in turn, the other clusters are revealed as outliers. However, such a procedure is not suitable for automatic cluster detection. Atkinson and Riani (2007) therefore instead run many forward searches from randomly selected starting points, monitoring the evolution of the values of $d_{\min}(m)$ as the searches progress. Here we monitor $d_{\max}(m)$. As the search progresses, the examples of Atkinson and Riani (2007) show that the effect of the starting point decreases. Once two searches have the same subsets $S(m)$ for some m, they will have the same subsets for all successive m. Typically, in the last third of the search the individual searches from random starts converge to that from the elliptical start. The implication is that the same envelopes can be used, except in the very early stages of the search, whether we use random or elliptical starts. If we are looking for a few

outliers, we will be looking at the end of the search. However, the envelopes for $d_{\min}(m)$ and $d_{\max}(m)$ will be different.

5 Envelopes from order statistics

For relatively small samples we can use simulation to obtain envelopes for $d_{\max}(m)$ during the search. For larger samples we adapt the method of Riani, Atkinson and Cerioli (2007) who find very good approximations to the envelopes for $d_{\min}(m)$ using order statistics and a result of Tallis (1963) on truncated multivariate normal distributions.

Let $Y_{[m]}$ be the mth order statistic from a sample of size n from a univariate distribution with c.d.f. $G(y)$. From, for example, Lehmann (1991, p. 353) and Guenther (1977), the required quantile of order $\gamma$ of the distribution of $Y_{[m]}$, say $y_{m,n;\gamma}$, can be obtained as

$$y_{m,n;\gamma} = G^{-1}\left( \frac{m}{m + (n-m+1)\, x_{2(n-m+1),\,2m;\,1-\gamma}} \right), \tag{10}$$

where $x_{2(n-m+1),\,2m;\,1-\gamma}$ is the quantile of order $1-\gamma$ of the F distribution with $2(n-m+1)$ and $2m$ degrees of freedom. Riani et al. (2007) comment that care needs to be taken to ensure that the numerical calculation of this inverse distribution is sufficiently accurate as $m \to n$, particularly for large n and extreme $\gamma$.

We now consider our choice of $G(x)$, which is different from that of Riani et al. (2007). We estimate $\Sigma$ on $m-1$ degrees of freedom. The distribution of the m distances in the subset can, from (8), be written as

$$d_i^2(m) \sim \frac{(m-1)^2}{m} \, \mathrm{Beta}\left( \frac{v}{2}, \frac{m-v-1}{2} \right), \quad i \in S(m). \tag{11}$$

The estimate of $\Sigma$ that we use is biased since it is calculated from the m observations in the subset that have been chosen as having the m smallest distances. However, in the calculation of the scaled distances (3) we approximately correct for this effect by multiplication by a ratio derived from estimates of $\Sigma$. So the envelopes for the scaled Mahalanobis distances derived from $d_{\max}(m)$ are given by

$$V_{m,\gamma} = \frac{(m-1)^2}{m} \, y_{m,n;\gamma}, \tag{12}$$

with G the beta distribution in (11). For unscaled distances we need to correct for the bias in the estimate of $\Sigma$.
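Equations (10) to (12) translate directly into a few SciPy calls. The following is a minimal sketch under stated assumptions rather than the authors' code: the function names are hypothetical, and the value returned is exactly the right-hand side of (12) as written above, which is on the scale of the squared distances in (11); whether a square root is then taken before comparison with $d_{\max}(m)$ is not stated in the text and is left to the reader.

```python
import numpy as np
from scipy import stats

def order_stat_level(m, n, gamma):
    """Argument of G^{-1} in (10): the probability level giving the
    quantile of order gamma of the m-th order statistic out of n."""
    x = stats.f.ppf(1.0 - gamma, 2 * (n - m + 1), 2 * m)
    return m / (m + (n - m + 1) * x)

def scaled_envelope(m, n, v, gamma):
    """V_{m,gamma} of (12), with G the Beta(v/2, (m - v - 1)/2)
    distribution of (11). Requires m > v + 1."""
    p = order_stat_level(m, n, gamma)
    y = stats.beta.ppf(p, v / 2.0, (m - v - 1) / 2.0)
    return (m - 1) ** 2 / m * y

# Illustrative values: 1%, 50% and 99% points half-way through a
# search with n = 1000 and v = 5.
for gamma in (0.01, 0.5, 0.99):
    print(gamma, scaled_envelope(m=500, n=1000, v=5, gamma=gamma))
```

As a check on (10), the level computed from the F quantile coincides with the quantile of order $\gamma$ of a Beta(m, n − m + 1) distribution, which is the distribution of $G(Y_{[m]})$ for a continuous G.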
We follow Riani et al. (2007) and consider elliptical truncation in the multivariate normal distribution. From the results of Tallis (1963) they obtain the large-sample correction factor

$$c_{FS}(m) = \frac{m/n}{P\left( X^2_{v+2} < \chi^2_{v,\,m/n} \right)}, \tag{13}$$

with $\chi^2_{v,\,m/n}$ the m/n quantile of $\chi^2_v$ and $X^2_{v+2}$ a chi-squared random variable on v + 2 degrees of freedom. Envelopes for unscaled distances are then obtained by scaling up the values of the order statistics

$$V^{*}_{m,\gamma} = c_{FS}(m) \, V_{m,\gamma}.$$

Figure 1 shows the agreement between simulated envelopes (continuous lines) and theoretical envelopes (dotted lines) for $d_{\max}(m)$ when n = 1000 and v = 5. Scaled distances are in the upper panel; agreement between the two sets of envelopes is excellent throughout virtually the whole range. Agreement for the unscaled distances in the lower panel of the figure is less good, but is certainly more than satisfactory for inferences about outliers at least in the last half of the search.

Unfortunately, the inclusion of $\hat\Sigma(n)$ in the expression for scaled distances (3) results in small distances in the presence of outliers, owing to the inflation of the variance estimate, and in consequent difficulties of interpretation. For practical data analysis we have to use the unscaled distances, which are less well approximated.

6 Horse mussels

As an example of the uses of elliptical and random starts in the analysis of multivariate data we look at measurements on horse mussels from New Zealand introduced by Cook and Weisberg (1994, p. 161) who treat them as a regression with muscle mass, the edible portion of the mussel, as response. They focus on independent transformations of the response and of one of the explanatory variables. Atkinson et al. (2004, §4.9) consider multivariate normality obtained by joint transformation of all five variables.

There are 82 observations on five variables: shell length, width, height and mass and the mass of the mussel's muscle, which is the edible part. We begin with an analysis of the untransformed data using a forward search with an elliptical start. The left-hand panel of Figure 2 monitors $d_{\min}(m)$, whereas the right-hand panel monitors $d_{\max}(m)$.
The two sets of simulation envelopes were found by direct simulation of 5,000 forward searches. The figure shows how very different the two distributions are at the beginning of the search. That in the left-hand panel for $d_{\min}(m)$ is derived from the unbounded F distribution (9) whereas that for $d_{\max}(m)$ in the right-hand panel is derived from the beta distribution (11). The two traces are very similar once they are calibrated by the envelopes. They both show appreciable departure from multivariate normality in the last one third of the search. Since we are selecting observations by their closeness to the multivariate normal model, we expect departure, if any, to be at the end of the search. Even allowing for the scaling of the two plots, the maximum distances seem to show less fluctuation at the beginning of the search. For much of the rest of the paper we focus on plots of the maximum Mahalanobis distances $d_{\max}(m)$.

Fig. 1. Envelopes for Mahalanobis distances $d_{\max}(m)$ when n = 1000 and v = 5. Dotted lines from order statistics, continuous lines from 5,000 simulations. Upper panel scaled distances (axis label: Scaled MD), lower panel unscaled distances (axis label: Unscaled MD). Elliptical starts. 1%, 50% and 99% points.

Fig. 2. Horse mussels: forward search on untransformed data. Left-hand panel $d_{\min}(m)$, right-hand panel $d_{\max}(m)$. Elliptical starts; 1%, 50% and 99% points from 5,000 simulations.

We now analyse the data using a multivariate version of the parametric transformation of Box and Cox (1964). As a result of their analysis Atkinson et al. (2004) suggest the vector of transformation parameters $\lambda = (0.5, 0, 0.5, 0, 0)^T$; that is, the square root transformation for $y_1$ and $y_3$ and the logarithmic transformation for the other three variables. We look at forward plots of $d_{\max}(m)$ to see whether this transformation yields multivariate normality.

The upper-left panel of Figure 3 shows the maximum distance for all n = 82 observations for the transformed data. The contrast with the right-hand panel of Figure 2 is informative. The plot still goes out of the 99% envelope at the end of the search, but the number of outliers is much smaller, now only around 5. The last five units to enter are those numbered 37, 16, 78, 8 and finally 48. The plot of the maximum distance in the upper-right panel of Figure 3 shows that, with these five observations deleted, the last value just lies below the 99% point of the distribution. We have found a multivariate normal sample, after transformation, with five outliers. That there are five outliers, not four, is confirmed in the lower panel of Figure 3 where observation 37 has been reincluded. Now the plot of maximum distances goes outside the 99% envelope at the end of the search.

The limits in figures such as Figure 3 have been simulated to have the required pointwise level, that is they are correct for each m considered independently. However, the probability that the observed trace of values of $d_{\max}(m)$ exceeds a specific bound at least once during the search is much greater than the pointwise value.
Atkinson and Riani (2006) evaluate such simultaneous probabilities; they are surprisingly high.
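To see why a pointwise level understates the chance of at least one exceedance, consider a deliberately crude calculation. It assumes that exceedances at different steps are independent, which is not true of a forward search, where successive distances are strongly dependent. It is therefore only an illustration of the direction of the effect and not the calculation of Atkinson and Riani (2006). The function name is hypothetical.

```python
def prob_at_least_one_exceedance(pointwise_level, n_steps):
    """P(at least one exceedance in n_steps steps) when each step
    exceeds its bound independently with probability pointwise_level."""
    return 1.0 - (1.0 - pointwise_level) ** n_steps

# A 99% pointwise envelope leaves 1% above the bound at each step.
for k in (1, 10, 50):
    print(k, round(prob_at_least_one_exceedance(0.01, k), 3))
```

Under this independence assumption the probability rises from 0.01 at a single step to about 0.096 over 10 steps and about 0.395 over 50 steps.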

Fig. 3. Horse mussels: forward search on transformed data. Plots of $d_{\max}(m)$ for three different sample sizes. Upper-left panel n = 82; upper-right panel, observations 37, 16, 78, 8 and 48 removed (n = 77). Lower panel, observation 37 re-included (n = 78). Elliptical starts; 1%, 50% and 99% points from 5,000 simulations. There are five outliers.

7 Elliptical and random starts

We have analysed the mussels data and investigated the properties of $d_{\max}(m)$ and $d_{\min}(m)$ using searches with elliptical starts. Finally we look at the properties of plots of both distances when random starts are used, for example to aid in the identification of clusters.

The upper panel of Figure 4 presents a comparison of simulated envelopes for random and elliptical starts for $d_{\max}(m)$ from data with the same dimensions as the mussels data. For this small data set there is no operationally important difference between the two envelopes. The important conclusion is that, for larger data sets, we can use the approximations of §5 based on order statistics whether we are using random or elliptical starts.

The surprising conclusion that we obtain the same envelopes for searches from elliptical or random starts however does not hold when instead we monitor $d_{\min}(m)$. The lower panel of Figure 4 repeats the simulations for $d_{\min}(m)$. Now there is a noticeable difference, during the first half of the search, between the envelopes for random and those from elliptical starts.

Fig. 4. Simulated envelopes for Mahalanobis distances when n = 82 and v = 5. Upper panel, $d_{\max}(m)$ (axis label: Max. MD); lower panel, $d_{\min}(m)$ (axis label: Min. MD). Dotted lines from elliptical starts, continuous lines from random starts. 1%, 50% and 99% points from 5,000 simulations.

We now consider the implications of this difference on the properties of individual searches. The left-hand panel of Figure 5 repeats the simulated envelopes for elliptical starts from the upper panel of Figure 4 and adds 250 trajectories of distances for $d_{\max}(m)$ for simulations from random starting points. The simulated values nicely fill the envelope, although there are a surprising number of transient peaks above the envelope. The right-hand panel of the figure repeats this process of envelopes from elliptical starts and trajectories from random starts but for the minimum distances $d_{\min}(m)$. Now, as we would expect from Figure 4, the simulated values sit a little low in the envelopes. If a subset contains one or more outliers, these will give rise to a too large estimate of $\Sigma$. As a consequence, some of the distances of units not included in the subset will be too small and the smallest of these will be selected as $d_{\min}(m)$. On the contrary, if outliers are present in $S(m)$ when we calculate $d_{\max}(m)$, the distance that we look at will be that for one of the outliers and so will not be shrunken due to the too-large estimate of $\Sigma$.

Fig. 5. Simulated envelopes from elliptical starts for Mahalanobis distances when n = 82 and v = 5 with 250 trajectories from random starts. Left-hand panel $d_{\max}(m)$, right-hand panel $d_{\min}(m)$.

Our justification for the use of random start forward searches was that searches from elliptical starts may not detect clusters in the data if these start from a subset of units in more than one cluster. We have however analysed the mussels data using values of $d_{\max}(m)$ from elliptical starts. Our conclusion, from Figure 3, was that after transformation there were 77 units from a multivariate normal population and five outliers. We checked this conclusion using random start forward searches with $d_{\max}(m)$ and failed to detect any clusters.

The purpose of this paper has been to explore the properties of the maximum distance $d_{\max}(m)$.
We have found its null distribution and obtained good approximations to this distribution for use in the forward search. The lack of dependence of this distribution on the starting point of the search is an appealing feature. However, we need to investigate the properties of this measure when the null distribution does not hold. One particular question is whether use of $d_{\max}(m)$ provides tests for outliers and clusters that are as powerful as those using the customary minimum distance $d_{\min}(m)$.
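The simulation envelopes used throughout are described only as pointwise 1%, 50% and 99% points from 5,000 simulated searches. A minimal sketch of such a simulation for $d_{\max}(m)$ from random starts is given below. It rests on several assumptions that are not in the text: the function names are hypothetical, random starts are drawn uniformly without replacement, $m_0 = 10$ is an arbitrary choice, data are simulated from the standard normal distribution (sufficient under the null because Mahalanobis distances are invariant to affine transformations), and only 200 searches are run to keep the example fast.

```python
import numpy as np

def dmax_trajectory(Y, start):
    """d_max(m) of (5) for m = m0, ..., n along one forward search."""
    n = Y.shape[0]
    subset = np.asarray(start)
    traj = []
    while True:
        mu = Y[subset].mean(axis=0)
        sigma = np.cov(Y[subset], rowvar=False)  # divisor m - 1
        resid = Y - mu
        d2 = np.einsum("ij,ji->i", resid, np.linalg.solve(sigma, resid.T))
        # Largest distance among the units currently in the subset.
        traj.append(np.sqrt(d2[subset].max()))
        if subset.size == n:
            return np.array(traj)
        # Next subset: the m + 1 units with the smallest distances.
        subset = np.argsort(d2, kind="stable")[: subset.size + 1]

def simulated_envelopes(n, v, m0, n_sim, probs=(0.01, 0.5, 0.99), seed=0):
    """Pointwise quantiles of d_max(m) over n_sim random-start searches
    on independent samples of n standard normal observations."""
    rng = np.random.default_rng(seed)
    trajs = np.empty((n_sim, n - m0 + 1))
    for s in range(n_sim):
        Y = rng.standard_normal((n, v))
        start = rng.choice(n, size=m0, replace=False)
        trajs[s] = dmax_trajectory(Y, start)
    return np.quantile(trajs, probs, axis=0)

# Same dimensions as the mussels data: n = 82, v = 5.
env = simulated_envelopes(n=82, v=5, m0=10, n_sim=200)
print(env.shape)  # (3, 73)
```

Each row of `env` is one envelope, indexed by m from $m_0$ to n. Replacing the random start by a fixed robust start would give the corresponding elliptical-start envelopes for comparison.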

References

ATKINSON, A.C. and RIANI, M. (2000): Robust Diagnostic Regression Analysis. Springer-Verlag, New York.
ATKINSON, A.C. and RIANI, M. (2006): Distribution theory and simulations for tests of outliers in regression. Journal of Computational and Graphical Statistics 15.
ATKINSON, A.C. and RIANI, M. (2007): Exploratory tools for clustering multivariate data. Computational Statistics and Data Analysis 52. doi: /j.csda
ATKINSON, A.C., RIANI, M. and CERIOLI, A. (2004): Exploring Multivariate Data with the Forward Search. Springer-Verlag, New York.
BOX, G.E.P. and COX, D.R. (1964): An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B 26.
COOK, R.D. and WEISBERG, S. (1994): An Introduction to Regression Graphics. Wiley, New York.
GUENTHER, W.C. (1977): An easy method for obtaining percentage points of order statistics. Technometrics 19.
LEHMANN, E. (1991): Point Estimation, 2nd edition. Wiley, New York.
RIANI, M., ATKINSON, A.C. and CERIOLI, A. (2007): Results in finding an unknown number of multivariate outliers in large data sets. Research Report 140, Department of Statistics, London School of Economics.
TALLIS, G.M. (1963): Elliptical and radial truncation in normal samples. Annals of Mathematical Statistics 34.
ZANI, S., RIANI, M. and CORBELLINI, A. (1998): Robust bivariate boxplots and multiple outlier detection. Computational Statistics and Data Analysis 28.


More information

In most cases, a plot of d (j) = {d (j) 2 } against {χ p 2 (1-q j )} is preferable since there is less piling up

In most cases, a plot of d (j) = {d (j) 2 } against {χ p 2 (1-q j )} is preferable since there is less piling up THE UNIVERSITY OF MINNESOTA Statistics 5401 September 17, 2005 Chi-Squared Q-Q plots to Assess Multivariate Normality Suppose x 1, x 2,..., x n is a random sample from a p-dimensional multivariate distribution

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Chapter 2: Inference about the mean vector(s) Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 In this chapter we will discuss inference

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE

A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE Sankhyā : The Indian Journal of Statistics 2000, Volume 62, Series A, Pt. 1, pp. 144 149 A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE By M. MERCEDES SUÁREZ RANCEL and MIGUEL A. GONZÁLEZ SIERRA University

More information

New robust dynamic plots for regression mixture detection

New robust dynamic plots for regression mixture detection Adv Data Anal Classif (2009) 3:263 279 DOI 10.1007/s11634-009-0050-y REGULAR ARTICLE New robust dynamic plots for regression mixture detection Domenico Perrotta Marco Riani Francesca Torti Received: 12

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Identification of Multivariate Outliers: A Performance Study

Identification of Multivariate Outliers: A Performance Study AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

The power of (extended) monitoring in robust clustering

The power of (extended) monitoring in robust clustering Statistical Methods & Applications manuscript No. (will be inserted by the editor) The power of (extended) monitoring in robust clustering Alessio Farcomeni and Francesco Dotto Received: date / Accepted:

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1 Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

More information

of the Citizens, Support to External Security Unit, Ispra, Italy

of the Citizens, Support to External Security Unit, Ispra, Italy Robust Methods for Complex Data ( ) Metodi robusti per l analisi di dati complessi Marco Riani 1, Andrea Cerioli 1, Domenico Perrotta 2, Francesca Torti 2 1 Dipartimento di Economia, Università di Parma,

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12)

Prentice Hall Stats: Modeling the World 2004 (Bock) Correlated to: National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) National Advanced Placement (AP) Statistics Course Outline (Grades 9-12) Following is an outline of the major topics covered by the AP Statistics Examination. The ordering here is intended to define the

More information

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information

Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Testing Equality of Two Intercepts for the Parallel Regression Model with Non-sample Prior Information Budi Pratikno 1 and Shahjahan Khan 2 1 Department of Mathematics and Natural Science Jenderal Soedirman

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

CS 195-5: Machine Learning Problem Set 1

CS 195-5: Machine Learning Problem Set 1 CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 9 for Applied Multivariate Analysis Outline Two sample T 2 test 1 Two sample T 2 test 2 Analogous to the univariate context, we

More information

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus

Multivariate Capability Analysis Using Statgraphics. Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Using Statgraphics Presented by Dr. Neil W. Polhemus Multivariate Capability Analysis Used to demonstrate conformance of a process to requirements or specifications that

More information

Regression III Lecture 1: Preliminary

Regression III Lecture 1: Preliminary Regression III Lecture 1: Preliminary Dave Armstrong University of Western Ontario Department of Political Science Department of Statistics and Actuarial Science (by courtesy) e: dave.armstrong@uwo.ca

More information

Diagnostics for Linear Models With Functional Responses

Diagnostics for Linear Models With Functional Responses Diagnostics for Linear Models With Functional Responses Qing Shen Edmunds.com Inc. 2401 Colorado Ave., Suite 250 Santa Monica, CA 90404 (shenqing26@hotmail.com) Hongquan Xu Department of Statistics University

More information

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R.

Wiley. Methods and Applications of Linear Models. Regression and the Analysis. of Variance. Third Edition. Ishpeming, Michigan RONALD R. Methods and Applications of Linear Models Regression and the Analysis of Variance Third Edition RONALD R. HOCKING PenHock Statistical Consultants Ishpeming, Michigan Wiley Contents Preface to the Third

More information

On the Distribution of Hotelling s T 2 Statistic Based on the Successive Differences Covariance Matrix Estimator

On the Distribution of Hotelling s T 2 Statistic Based on the Successive Differences Covariance Matrix Estimator On the Distribution of Hotelling s T 2 Statistic Based on the Successive Differences Covariance Matrix Estimator JAMES D. WILLIAMS GE Global Research, Niskayuna, NY 12309 WILLIAM H. WOODALL and JEFFREY

More information

-However, this definition can be expanded to include: biology (biometrics), environmental science (environmetrics), economics (econometrics).

-However, this definition can be expanded to include: biology (biometrics), environmental science (environmetrics), economics (econometrics). Chemometrics Application of mathematical, statistical, graphical or symbolic methods to maximize chemical information. -However, this definition can be expanded to include: biology (biometrics), environmental

More information

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX STATISTICS IN MEDICINE Statist. Med. 17, 2685 2695 (1998) ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX N. A. CAMPBELL *, H. P. LOPUHAA AND P. J. ROUSSEEUW CSIRO Mathematical and Information

More information

Later in the same chapter (page 45) he asserted that

Later in the same chapter (page 45) he asserted that Chapter 7 Randomization 7 Randomization 1 1 Fisher on randomization..................... 1 2 Shoes: a paired comparison................... 2 3 The randomization distribution................. 4 4 Theoretical

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

arxiv: v1 [stat.me] 5 Apr 2013

arxiv: v1 [stat.me] 5 Apr 2013 Logistic regression geometry Karim Anaya-Izquierdo arxiv:1304.1720v1 [stat.me] 5 Apr 2013 London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK e-mail: karim.anaya@lshtm.ac.uk and Frank Critchley

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

Computational Connections Between Robust Multivariate Analysis and Clustering

Computational Connections Between Robust Multivariate Analysis and Clustering 1 Computational Connections Between Robust Multivariate Analysis and Clustering David M. Rocke 1 and David L. Woodruff 2 1 Department of Applied Science, University of California at Davis, Davis, CA 95616,

More information

Outlier Detection via Feature Selection Algorithms in

Outlier Detection via Feature Selection Algorithms in Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS032) p.4638 Outlier Detection via Feature Selection Algorithms in Covariance Estimation Menjoge, Rajiv S. M.I.T.,

More information

Measurement And Uncertainty

Measurement And Uncertainty Measurement And Uncertainty Based on Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, NIST Technical Note 1297, 1994 Edition PHYS 407 1 Measurement approximates or

More information

Bootstrapping Hypotheses Tests

Bootstrapping Hypotheses Tests Southern Illinois University Carbondale OpenSIUC Research Papers Graduate School Summer 2015 Bootstrapping Hypotheses Tests Chathurangi H. Pathiravasan Southern Illinois University Carbondale, chathurangi@siu.edu

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4.

1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4. Multivariate normal distribution Reading: AMSA: pages 149-200 Multivariate Analysis, Spring 2016 Institute of Statistics, National Chiao Tung University March 1, 2016 1. Density and properties Brief outline

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Finite Population Sampling and Inference

Finite Population Sampling and Inference Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY SECOND YEAR B.Sc. SEMESTER - III SYLLABUS FOR S. Y. B. Sc. STATISTICS Academic Year 07-8 S.Y. B.Sc. (Statistics)

More information

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

More information

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1

Variable Selection in Restricted Linear Regression Models. Y. Tuaç 1 and O. Arslan 1 Variable Selection in Restricted Linear Regression Models Y. Tuaç 1 and O. Arslan 1 Ankara University, Faculty of Science, Department of Statistics, 06100 Ankara/Turkey ytuac@ankara.edu.tr, oarslan@ankara.edu.tr

More information

4 Statistics of Normally Distributed Data

4 Statistics of Normally Distributed Data 4 Statistics of Normally Distributed Data 4.1 One Sample a The Three Basic Questions of Inferential Statistics. Inferential statistics form the bridge between the probability models that structure our

More information

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes

More information

Regression diagnostics

Regression diagnostics Regression diagnostics Kerby Shedden Department of Statistics, University of Michigan November 5, 018 1 / 6 Motivation When working with a linear model with design matrix X, the conventional linear model

More information

For Bonnie and Jesse (again)

For Bonnie and Jesse (again) SECOND EDITION A P P L I E D R E G R E S S I O N A N A L Y S I S a n d G E N E R A L I Z E D L I N E A R M O D E L S For Bonnie and Jesse (again) SECOND EDITION A P P L I E D R E G R E S S I O N A N A

More information

Cross-Validation with Confidence

Cross-Validation with Confidence Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation

More information

An Introduction to Multivariate Statistical Analysis

An Introduction to Multivariate Statistical Analysis An Introduction to Multivariate Statistical Analysis Third Edition T. W. ANDERSON Stanford University Department of Statistics Stanford, CA WILEY- INTERSCIENCE A JOHN WILEY & SONS, INC., PUBLICATION Contents

More information

Reliability of inference (1 of 2 lectures)

Reliability of inference (1 of 2 lectures) Reliability of inference (1 of 2 lectures) Ragnar Nymoen University of Oslo 5 March 2013 1 / 19 This lecture (#13 and 14): I The optimality of the OLS estimators and tests depend on the assumptions of

More information

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL

A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl

More information

TESTING FOR CO-INTEGRATION

TESTING FOR CO-INTEGRATION Bo Sjö 2010-12-05 TESTING FOR CO-INTEGRATION To be used in combination with Sjö (2008) Testing for Unit Roots and Cointegration A Guide. Instructions: Use the Johansen method to test for Purchasing Power

More information

Detection of outliers in multivariate data:

Detection of outliers in multivariate data: 1 Detection of outliers in multivariate data: a method based on clustering and robust estimators Carla M. Santos-Pereira 1 and Ana M. Pires 2 1 Universidade Portucalense Infante D. Henrique, Oporto, Portugal

More information

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication

G. S. Maddala Kajal Lahiri. WILEY A John Wiley and Sons, Ltd., Publication G. S. Maddala Kajal Lahiri WILEY A John Wiley and Sons, Ltd., Publication TEMT Foreword Preface to the Fourth Edition xvii xix Part I Introduction and the Linear Regression Model 1 CHAPTER 1 What is Econometrics?

More information

YUN WU. B.S., Gui Zhou University of Finance and Economics, 1996 A REPORT. Submitted in partial fulfillment of the requirements for the degree

YUN WU. B.S., Gui Zhou University of Finance and Economics, 1996 A REPORT. Submitted in partial fulfillment of the requirements for the degree A SIMULATION STUDY OF THE ROBUSTNESS OF HOTELLING S T TEST FOR THE MEAN OF A MULTIVARIATE DISTRIBUTION WHEN SAMPLING FROM A MULTIVARIATE SKEW-NORMAL DISTRIBUTION by YUN WU B.S., Gui Zhou University of

More information

The S-estimator of multivariate location and scatter in Stata

The S-estimator of multivariate location and scatter in Stata The Stata Journal (yyyy) vv, Number ii, pp. 1 9 The S-estimator of multivariate location and scatter in Stata Vincenzo Verardi University of Namur (FUNDP) Center for Research in the Economics of Development

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

EXAMINERS REPORT & SOLUTIONS STATISTICS 1 (MATH 11400) May-June 2009

EXAMINERS REPORT & SOLUTIONS STATISTICS 1 (MATH 11400) May-June 2009 EAMINERS REPORT & SOLUTIONS STATISTICS (MATH 400) May-June 2009 Examiners Report A. Most plots were well done. Some candidates muddled hinges and quartiles and gave the wrong one. Generally candidates

More information

PAijpam.eu ADAPTIVE K-S TESTS FOR WHITE NOISE IN THE FREQUENCY DOMAIN Hossein Arsham University of Baltimore Baltimore, MD 21201, USA

PAijpam.eu ADAPTIVE K-S TESTS FOR WHITE NOISE IN THE FREQUENCY DOMAIN Hossein Arsham University of Baltimore Baltimore, MD 21201, USA International Journal of Pure and Applied Mathematics Volume 82 No. 4 2013, 521-529 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v82i4.2

More information

FAST CROSS-VALIDATION IN ROBUST PCA

FAST CROSS-VALIDATION IN ROBUST PCA COMPSTAT 2004 Symposium c Physica-Verlag/Springer 2004 FAST CROSS-VALIDATION IN ROBUST PCA Sanne Engelen, Mia Hubert Key words: Cross-Validation, Robustness, fast algorithm COMPSTAT 2004 section: Partial

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Geochemical Data Evaluation and Interpretation

Geochemical Data Evaluation and Interpretation Geochemical Data Evaluation and Interpretation Eric Grunsky Geological Survey of Canada Workshop 2: Exploration Geochemistry Basic Principles & Concepts Exploration 07 8-Sep-2007 Outline What is geochemical

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Modified confidence intervals for the Mahalanobis distance

Modified confidence intervals for the Mahalanobis distance Modified confidence intervals for the Mahalanobis distance Paul H. Garthwaite a, Fadlalla G. Elfadaly a,b, John R. Crawford c a School of Mathematics and Statistics, The Open University, MK7 6AA, UK; b

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares Many economic models involve endogeneity: that is, a theoretical relationship does not fit

More information

6348 Final, Fall 14. Closed book, closed notes, no electronic devices. Points (out of 200) in parentheses.

6348 Final, Fall 14. Closed book, closed notes, no electronic devices. Points (out of 200) in parentheses. 6348 Final, Fall 14. Closed book, closed notes, no electronic devices. Points (out of 200) in parentheses. 0 11 1 1.(5) Give the result of the following matrix multiplication: 1 10 1 Solution: 0 1 1 2

More information

Forecasting Levels of log Variables in Vector Autoregressions

Forecasting Levels of log Variables in Vector Autoregressions September 24, 200 Forecasting Levels of log Variables in Vector Autoregressions Gunnar Bårdsen Department of Economics, Dragvoll, NTNU, N-749 Trondheim, NORWAY email: gunnar.bardsen@svt.ntnu.no Helmut

More information

Introduction to statistics. Laurent Eyer Maria Suveges, Marie Heim-Vögtlin SNSF grant

Introduction to statistics. Laurent Eyer Maria Suveges, Marie Heim-Vögtlin SNSF grant Introduction to statistics Laurent Eyer Maria Suveges, Marie Heim-Vögtlin SNSF grant Recent history at the Observatory Request of something on statistics from PhD students, because of an impression of

More information