Robust Statistics. Frank Klawonn

Robust Statistics
Frank Klawonn
Data Analysis and Pattern Recognition Lab, Department of Computer Science, University of Applied Sciences Braunschweig/Wolfenbüttel, Germany
Bioinformatics & Statistics, Helmholtz Centre for Infection Research, Braunschweig, Germany
Robust Statistics p.1/98

Outline
• Motivation: Mean or median
• What is robust statistics?
• M-estimators
• Robust regression
• Median polish
• Summary and references

Motivation: Mean or median
• Imagine a small town with 20 thousand inhabitants.
• On average, each inhabitant has a capital of 10 thousand $.
• Assume a very rich man named Bill G., owning a capital of 20 billion $, decides to move to this town.
• After Bill G. has settled there, the inhabitants own an average capital of roughly one million $.
• And all inhabitants but one might own less capital than the average.

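(Empirical) median
\(x_{(1)}, \ldots, x_{(n)}\) denotes a sample in ascending order.
Definition. The (sample or empirical) median, denoted by \(\tilde{x}\), is given by
\[ \tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[4pt] \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases} \]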
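As a rough sketch of the definition above, the following R snippet computes the median directly from the ordered sample and applies it to the town example. The function name `sample_median` and the simplification that every original inhabitant owns exactly 10 thousand $ are illustrative assumptions, not part of the slides.

```r
# Sample median computed directly from the definition
# (sample_median is an illustrative name, not a base R function).
sample_median <- function(x) {
  xs <- sort(x)                        # x_(1) <= ... <= x_(n)
  n <- length(xs)
  if (n %% 2 == 1) {
    xs[(n + 1) / 2]                    # n odd: the middle value
  } else {
    (xs[n / 2] + xs[n / 2 + 1]) / 2    # n even: mean of the two middle values
  }
}

# Town example, assuming (for simplicity) that each of the 20,000
# inhabitants owns exactly 10,000 $ before Bill G. arrives.
capital <- c(rep(1e4, 20000), 2e10)

town_mean   <- mean(capital)            # pulled up to roughly one million $
town_median <- sample_median(capital)   # stays at 10,000 $
print(c(mean = town_mean, median = town_median))
```

The built-in `median()` gives the same result; the hand-written version is only meant to mirror the two cases of the definition.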

(Empirical) median
[Figure not preserved in this transcript.]

Motivation: Mean or median
A less extreme example: [figures not preserved in this transcript]

What is a good estimator?
Assume we want to estimate the expected value of a normal distribution from which a sample was generated.
For symmetric distributions like the normal distribution, the expected value and the median are equal.
The median \(q_{0.5}\) of a (continuous) probability distribution, representing the random variable \(X\), is the 50%-quantile, i.e.
\[ P(X \le q_{0.5}) = 0.5 = P(X \ge q_{0.5}). \]

What is a good estimator?
Classical statistics:
(a) The estimator should be correct on average (unbiased), at least for large sample sizes (asymptotically unbiased).
(b) The estimator should have a small variance (efficiency).
(c) With increasing sample size, the variance of the estimator should tend to zero.
(a) and (c) together guarantee consistency: With increasing sample size, the estimator converges in probability to the true value of the parameter to be estimated.

What is a good estimator?
Should we choose the mean or the median to estimate the expected value \(\mu\) of our normal distribution?
Both estimators are consistent.

Mean or median
[Figures: pairs of histograms of the estimates (x-axis: estimated value, y-axis: frequency), one pair per slide]
• Estimation of the mean vs. estimation of the median, n = 20
• Estimation of the mean vs. estimation of the median, n = 100
• Estimation of the mean vs. mean with 5% noise, n = 20
• Estimation of the mean vs. mean with 5% noise, n = 100
• Estimation of the median vs. median with 5% noise, n = 20
• Estimation of the median vs. median with 5% noise, n = 100

Mean or median
• Under the ideal assumption that the data were sampled from a normal distribution, the mean is a more efficient estimator than the median.
• If a small fraction of the data is for some reason erroneous or generated by another distribution, the mean can even become a biased estimator and lose consistency.
• The median is hardly affected if a small fraction of the data is corrupted.
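The comparison can be reproduced approximately with a small simulation. The sketch below is not the code behind the original histograms: the outlier distribution N(20, 1), the number of repetitions and the seed are assumptions chosen only for illustration.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

# Draw n values: a fraction eps comes from N(20, 1) instead of N(0, 1).
# The N(20, 1) outlier distribution is an assumption of this sketch.
draw_sample <- function(n, eps = 0) {
  is_outlier <- runif(n) < eps
  ifelse(is_outlier, rnorm(n, mean = 20), rnorm(n, mean = 0))
}

n_rep <- 2000   # number of simulated samples
n     <- 100    # sample size

clean <- replicate(n_rep, { x <- draw_sample(n);       c(mean(x), median(x)) })
noisy <- replicate(n_rep, { x <- draw_sample(n, 0.05); c(mean(x), median(x)) })

# Row 1 holds the means, row 2 the medians.
var_clean  <- apply(clean, 1, var)    # variance of both estimators, clean data
bias_noisy <- abs(rowMeans(noisy))    # distance from the true value 0, noisy data

names(var_clean) <- names(bias_noisy) <- c("mean", "median")
print(var_clean)
print(bias_noisy)
```

On clean data the median should show the larger variance (lower efficiency); with 5% contamination the mean should be shifted clearly away from 0, while the median moves only slightly.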

Robust statistics
Hampel et al. (1986): In a broad informal sense, robust statistics is a body of knowledge, partly formalized into theories of robustness, relating to deviations from idealized assumptions in statistics.

Robust statistics
Idealized assumption: The data are sampled from the (possibly multivariate) random variable \(X\) with cumulative distribution function \(F_X\).
Modified assumption: The data are sampled from a random variable with \(\varepsilon\)-contaminated cumulative distribution function
\[ F_\varepsilon = (1 - \varepsilon) F_X + \varepsilon F_{\text{outliers}}. \]
• \(F_X\): the assumed ideal model distribution
• \(\varepsilon\): (small) probability for outliers
• \(F_{\text{outliers}}\): unknown and unspecified distribution

Estimators (Statistics)
Statistics is concerned with functionals \(t\) (or better \(t_n\)) called statistics, which are used for parameter estimation and other purposes.
The mean
\[ t_n(X_1, \ldots, X_n) = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \]
the median or the (empirical) variance
\[ t_n(X_1, \ldots, X_n) = s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]
are typical examples of estimators.

Estimators (Statistics)
Two views of estimators:
• Applied to (finite) samples \((x_1, \ldots, x_n)\), resulting in a concrete estimate (a realization of a random experiment consisting of the drawn sample).
• As random variables (applied to random variables). This enables us to investigate the (theoretical) properties of estimators. Samples are not needed for this purpose.

Estimators (Statistics)
Assuming an infinite sample size, the limit in probability
\[ t(F_X) = \lim_{n \to \infty} t_n(X_1, \ldots, X_n) \]
can be considered (in case it exists).
\(t(F_X)\) is then again a random variable. For typical estimators, \(t(F_X)\) is a constant random variable, i.e. the limit converges (with probability 1) to a unique value.

Fisher consistency
An estimator \(t\) is called Fisher consistent for a parameter \(\theta\) of a probability distribution \(F_X\) if
\[ t(F_X) = \theta, \]
i.e. for large (infinite) sample sizes, the estimator converges with probability 1 to the true value of the parameter to be estimated.

Empirical influence function
Given a sample \((x_1, \ldots, x_n)\) and an estimator \(t_n(x_1, \ldots, x_n)\), what is the influence of a single observation on \(t\)?
Empirical influence function:
\[ \mathrm{EIF}(x) = t_{n+1}(x_1, \ldots, x_n, x) \]
Vary \(x\) between \(-\infty\) and \(\infty\).

Empirical influence function
Consider the (ordered) sample
0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2
\[ \bar{x} = 2.29, \qquad \mathrm{med}(x) = 1.85, \qquad \bar{x}_{10\%} = 2.2875 \]
The (\(\alpha\)-)trimmed mean is the mean of the sample from which the lowest and highest \(100 \cdot \alpha\)% of the values are removed. (For the mean: \(\alpha = 0\), for the median: \(\alpha = 0.5\).)
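The three values quoted for this sample can be checked in R. This is a minimal sketch using only base functions; `mean()` with its `trim` argument removes the given fraction of observations from each end before averaging.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

x_mean    <- mean(x)               # arithmetic mean
x_median  <- median(x)             # median
x_trimmed <- mean(x, trim = 0.1)   # 10%-trimmed mean: drops 0.4 and 4.2

print(c(mean = x_mean, median = x_median, trimmed = x_trimmed))
```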

Empirical influence function
[Figure: empirical influence functions; curves labelled mean(x), median(x), trimmedmean(x)]

Sensitivity curve
The (empirical) sensitivity curve is a normalized EIF (centred around 0 and scaled according to the sample size):
\[ \mathrm{SC}(x) = \frac{t_{n+1}(x_1, \ldots, x_n, x) - t_n(x_1, \ldots, x_n)}{\frac{1}{n+1}} \]
[Figure: sensitivity curves; curves labelled mean(x), median(x), trimmedmean(x)]
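A possible way to tabulate both curves for the ten-value sample used earlier is sketched below. The helper names `eif`, `sc` and `trimmed_mean`, as well as the grid of added observations, are illustrative choices and not taken from the slides.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

# Empirical influence function: the estimate after adding one observation x0.
eif <- function(est, x0, sample) est(c(sample, x0))

# Sensitivity curve: change of the estimate, divided by 1 / (n + 1).
sc <- function(est, x0, sample) {
  n <- length(sample)
  (est(c(sample, x0)) - est(sample)) * (n + 1)
}

trimmed_mean <- function(v) mean(v, trim = 0.1)   # 10%-trimmed mean

grid <- c(-100, -10, 0, 10, 100)   # values of the added observation
sc_table <- sapply(
  list(mean = mean, median = median, trimmed = trimmed_mean),
  function(est) sapply(grid, function(x0) sc(est, x0, x))
)
rownames(sc_table) <- grid
print(round(sc_table, 3))
```

For the mean the sensitivity curve grows linearly with the added value, whereas for the median and the trimmed mean it levels off once the added value lies outside the bulk of the sample.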

Influence function
The influence function corresponds to the sensitivity curve for large (infinite) sample sizes.
\[ \mathrm{IF}(x; t, F) = \lim_{n \to \infty} \frac{t\left(\left(1 - \frac{1}{n}\right) F + \frac{1}{n}\, \delta_x\right) - t(F)}{\frac{1}{n}} \]
\(\delta_x\) represents a (cumulative probability) distribution yielding the value \(x\) with probability 1.
In this sense, the influence function measures what happens to the estimator under an infinitesimally small contamination for large sample sizes.
Note that the influence function might not be defined if the limit does not exist.

Gross-error sensitivity
The worst case (in terms of the outlier \(x\)) is called gross-error sensitivity:
\[ \gamma^*(t, F) = \sup_x \left| \mathrm{IF}(x; t, F) \right| \]
If \(\gamma^*(t, F)\) is finite, \(t\) is called a B-robust estimator (B stands for bias) (at \(F\)).
For the arithmetic mean, we have \(\gamma^*(\bar{x}, F) = \infty\).
For the median and the trimmed mean, the gross-error sensitivity depends on the distribution \(F\).
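To see why the mean is not B-robust, its influence function can be worked out directly from the definition of IF. The following is a short derivation sketch; the symbol \(\mu_F\) for the expected value under \(F\) is introduced here only for illustration, and it is assumed that this expected value exists.

```latex
\begin{align*}
t\!\left(\left(1 - \tfrac{1}{n}\right) F + \tfrac{1}{n}\,\delta_x\right)
  &= \left(1 - \tfrac{1}{n}\right)\mu_F + \tfrac{1}{n}\,x, \\
\mathrm{IF}(x; t, F)
  &= \lim_{n \to \infty}
     \frac{\left(1 - \tfrac{1}{n}\right)\mu_F + \tfrac{1}{n}\,x - \mu_F}{\tfrac{1}{n}}
   = x - \mu_F, \\
\gamma^*(t, F) &= \sup_x \lvert x - \mu_F \rvert = \infty .
\end{align*}
```

The influence of a single observation on the mean therefore grows linearly and without bound in \(x\).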

Breakdown point
The influence curve and the gross-error sensitivity characterise the influence of single (or even infinitesimal) outliers.
A minimum requirement for robustness is that the influence curve is bounded.
What happens when the fraction of outliers increases?

Breakdown point
The breakdown point is the smallest fraction of (extreme) outliers that need to be included in a sample in order to let the estimator break down completely, i.e. yield (almost) infinity.
Let
\[ \mathrm{hd}\big((x_1, \ldots, x_n), (y_1, \ldots, y_n)\big) = \big| \{ i \in \{1, \ldots, n\} \mid x_i \ne y_i \} \big| \]
denote the Hamming distance between two samples \((x_1, \ldots, x_n)\) and \((y_1, \ldots, y_n)\).

Breakdown point
The breakdown point of an estimator \(t\) is defined as
\[ \varepsilon^*_n(t; x_1, \ldots, x_n) = \frac{1}{n} \min \Big\{ m \;\Big|\; \sup \big\{ |t(y_1, \ldots, y_n)| \;\big|\; \mathrm{hd}\big((x_1, \ldots, x_n), (y_1, \ldots, y_n)\big) = m \big\} = \infty \Big\}. \]
Normally, \(\varepsilon^*_n\) is independent of the specific choice of the sample \((x_1, \ldots, x_n)\).

Breakdown point
If \(\varepsilon^*_n\) is independent of the sample, for large (infinite) sample sizes the breakdown point is defined as
\[ \varepsilon^* = \lim_{n \to \infty} \varepsilon^*_n. \]
Examples:
• Arithmetic mean: \(\varepsilon^* = 0\%\)
• Median: \(\varepsilon^* = 50\%\)
• \(\alpha\)-trimmed mean: \(\varepsilon^* = \alpha\)
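The finite-sample breakdown point can be explored numerically. The sketch below does not evaluate the full supremum of the definition; it only replaces the largest values by one huge number, which is enough to expose the breakdown of the mean and the median. The helper names, the stand-in value `big` and the threshold `limit` are assumptions made for illustration.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

# Replace the m largest values of the sample by one huge value.
# "big" stands in for "almost infinity"; its size is an arbitrary choice.
corrupt <- function(sample, m, big = 1e12) {
  out <- sort(sample)
  if (m > 0) out[(length(out) - m + 1):length(out)] <- big
  out
}

# Smallest m (as a fraction of n) for which the estimate explodes.
empirical_breakdown <- function(est, sample, limit = 1e6) {
  for (m in seq_along(sample)) {
    if (abs(est(corrupt(sample, m))) > limit) return(m / length(sample))
  }
  NA
}

bp_mean   <- empirical_breakdown(mean, x)     # one outlier is enough
bp_median <- empirical_breakdown(median, x)   # half of the sample is needed
print(c(mean = bp_mean, median = bp_median))
```

For this sample of size 10 the mean breaks down at 1/10, a fraction that tends to 0 as n grows, while the median withstands four corrupted values and breaks down at 5/10.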

Criteria for robust estimators
• Bounded influence function: Single extreme outliers cannot do too much harm to the estimator.
• Low gross-error sensitivity
• Positive breakdown point (the higher, the better): Even a number of outliers can be tolerated without leading to nonsense estimates.
• Fisher consistency: For very large sample sizes, the estimator will yield the correct value.
• High efficiency: The variance of the estimator should be as low as possible.

Criteria for robust estimators
There is no way to satisfy all criteria in the best way at the same time.
There is a trade-off between robustness issues like positive breakdown point and low gross-error sensitivity on the one hand and efficiency on the other hand.
As an example, compare the mean (high efficiency, breakdown point 0) and the median (lower efficiency, but very good breakdown point).

Robust measures of spread
The (empirical) variance suffers from the same problems as the mean. (The estimation of the variance usually includes an estimation of the mean.)
An example of a more robust estimator of spread is the interquartile range, the difference between the 75%- and the 25%-quantile. (The q%-quantile is the value \(x\) in the sample for which q% of the values are smaller than \(x\) and (100 - q)% are larger than \(x\).)
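A quick way to see the difference is to corrupt a single value and compare the standard deviation with the interquartile range. The corrupted value 4200 is a made-up example of a typing error; the sample is the ten-value sample used earlier in the slides.

```r
x     <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)
x_bad <- replace(x, which.max(x), 4200)   # hypothetical typing error: 4.2 -> 4200

spread <- rbind(
  clean     = c(sd = sd(x),     IQR = IQR(x)),
  corrupted = c(sd = sd(x_bad), IQR = IQR(x_bad))
)
print(spread)
```

The standard deviation grows by several orders of magnitude, while the interquartile range is unchanged because the quartiles do not depend on the largest value.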

Error measures
The expected value \(\mu\) minimizes the error function
\[ E\big((X - \mu)^2\big). \]
Correspondingly, the arithmetic mean \(\bar{x}\) minimizes the error function
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2. \]

Error measures
The median \(q_{0.5}\) minimizes the error function
\[ E\big(|X - q_{0.5}|\big). \]
Correspondingly, the (sample) median \(\tilde{x}\) minimizes the error function
\[ \sum_{i=1}^{n} |x_i - \tilde{x}|. \]

Error measures
This also explains why the median is less sensitive to outliers:
The quadratic error for the mean punishes outliers much more strongly than the absolute error. Therefore, under the quadratic error, extreme outliers have a higher influence ("pull more strongly") than other points.
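Both minimisation properties can be checked numerically with a simple grid search. This is only an illustrative sketch: the grid from 0 to 5 with step 0.01 is an arbitrary choice, and the sample is the ten-value sample used earlier in the slides.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

# Candidate values for the location estimate (grid width 0.01 is arbitrary).
candidates <- seq(0, 5, by = 0.01)

squared_error  <- sapply(candidates, function(m) sum((x - m)^2))
absolute_error <- sapply(candidates, function(m) sum(abs(x - m)))

best_squared  <- candidates[which.min(squared_error)]    # close to mean(x)
best_absolute <- candidates[which.min(absolute_error)]   # between the two middle values

print(c(best_squared = best_squared, mean = mean(x)))
print(c(best_absolute = best_absolute, median = median(x)))
```

Since the sample size is even, the absolute error is constant between the two middle values 1.7 and 2.0, so every point in that interval (including the median 1.85) is a minimiser.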

Error measures
How to measure errors?
The error for an estimate \(\hat{\theta}\), including the sign, is
\[ e_i = x_i - \hat{\theta}. \]
Minimizing \(\sum_{i=1}^{n} e_i\) does not make sense.
• Usually \(\inf_{\hat{\theta}} \sum_{i=1}^{n} e_i = -\infty\).
• Even if we require \(\sum_{i=1}^{n} e_i \ge 0\), a small value for \(\sum_{i=1}^{n} e_i\) does not mean that the errors \(e_i\) are small. There might be large positive and large negative errors that balance each other.

Error measures
Therefore, we need a modified error \(\rho(e)\). Which properties should the function \(\rho : \mathbb{R} \to \mathbb{R}\) have?
• \(\rho(e) \ge 0\),
• \(\rho(0) = 0\),
• \(\rho(e) = \rho(-e)\),
• \(\rho(e_i) \ge \rho(e_j)\) if \(|e_i| \ge |e_j|\).

Error measures
Possible choices for \(\rho\):
• \(\rho(e) = e^2\)
• \(\rho(e) = |e|\)
• ...?
Advantage of \(\rho(e) = e^2\): In order to minimize \(\sum_{i=1}^{n} \rho(e_i)\), we can take derivatives.
This does not work for \(\rho(e) = |e|\), since the function \(f(x) = |x|\) is not differentiable (at 0).

Error measures
Which other options do we have for \(\rho\)? The quadratic error is obviously not a good choice when we seek robustness.
Consider the more general setting of linear models of the form
\[ y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i = x_i^\top \beta + \varepsilon_i. \]
This also covers the special case of estimators for location:
\[ y_i = \beta_0 + \varepsilon_i \]

Linear regression
Linear model:
\[ y_i = \alpha + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i = x_i^\top \beta + \varepsilon_i \]
Computed model:
\[ y_i = a + b_1 x_{i1} + \ldots + b_k x_{ik} + e_i = x_i^\top b + e_i \]
Objective function:
\[ \sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i^\top b) \]

Least squares regression
Computing derivatives of
\[ \frac{1}{2} \sum_{i=1}^{n} e_i^2 = \frac{1}{2} \sum_{i=1}^{n} (y_i - x_i^\top b)^2 \]
(the constant factor \(\frac{1}{2}\) does not change the optimisation problem) leads to
\[ \sum_{i=1}^{n} (y_i - x_i^\top b)\, x_i^\top = 0. \]
The solution of this system of linear equations is straightforward and can be found in any textbook.
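In matrix form, the system above is the set of normal equations \(X^\top X b = X^\top y\). A minimal sketch in R, with a small data set made up for illustration, solves these equations directly and compares the result with `lm()`:

```r
# Small illustrative data set (values made up for this sketch).
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

X <- cbind(1, x)   # design matrix: a column of ones for the intercept, then x

# Normal equations X'X b = X'y, the matrix form of sum((y_i - x_i'b) x_i') = 0.
b <- solve(crossprod(X), crossprod(X, y))

# The residuals are orthogonal to every column of X.
res <- y - X %*% b
print(drop(b))
print(drop(crossprod(X, res)))   # numerically zero

# lm() solves the same problem (via a QR decomposition).
print(coef(lm(y ~ x)))
```

Solving the normal equations explicitly is fine for a small, well-conditioned example like this one; `lm()` is the more stable choice in practice.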

Statistics tool R
Open source software.
R uses a command language without type declarations.
Assignments are written in the form
> x <- y
y is assigned to x.
The object y must be defined (generated) before it can be assigned to x.
Declaration of x is not required.

R: Reading a file
> mydata <- read.table(file.choose(), header=T)
opens a file chooser. The content of the chosen file is assigned to the object named mydata.
header=T means that the chosen file contains a header: the first line of the file contains the names of the variables. The following lines contain the values (tab- or space-separated).

R: Accessing a single variable
> vn <- mydata$varname
assigns the column named varname of the data set contained in the object mydata to the object vn.
The command
> print(vn)
prints the corresponding column on the screen.

R: Printing on the screen
[Output of print(vn): a numeric vector printed row by row, each row prefixed with the index of its first element ([1], [19], [37], ...); the values are not preserved in this transcript.]

R: Empirical mean & median
The mean and median can be computed using R by the functions mean() and median(), respectively.
> mean(vn)
[1] (value not preserved in this transcript)
> median(vn)
[1] 1.3
For data objects consisting of more than one (numerical) column, current R versions yield a vector of mean/median values with, e.g., colMeans(mydata) or sapply(mydata, median).

R: Empirical variance
The function var() yields the empirical variance in R.
> var(vn)
The function sd() yields the empirical standard deviation.
> sd(vn)
(The printed values are not preserved in this transcript.)

R: min and max
The functions min() and max() compute the minimum and the maximum in a data set.
> min(vn)
[1] 0.1
> max(vn)
[1] 2.5
The function IQR() yields the interquartile range.

Least squares regression
> reg.lsq <- lm(y ~ x)
> summary(reg.lsq)
[Output of summary(reg.lsq) with the sections Call, Residuals, Coefficients, residual standard error (on 9 degrees of freedom), R-squared and F-statistic (on 1 and 9 DF); the numeric values are not preserved in this transcript.]

Least squares regression
> plot(x,y)
> abline(reg.lsq)
[Figure: scatter plot of y against x with the least squares line]

Least squares regression
> plot(y-predict.lm(reg.lsq))
[Figure: residuals y - predict.lm(reg.lsq) plotted against the index]

Least squares regression
> plot(x,y-predict.lm(reg.lsq))
[Figure: residuals y - predict.lm(reg.lsq) plotted against x]

Least squares regression
> reg.lsq <- lm(y ~ x)
> summary(reg.lsq)
[Output of summary(reg.lsq) as before, here for a larger data set with a residual standard error on 120 degrees of freedom and an F-statistic on 1 and 120 DF; the numeric values are not preserved in this transcript.]

Least squares regression
> plot(x,y)
> abline(reg.lsq)
[Figure: scatter plot of y against x with the least squares line]

Least squares regression
> plot(y-predict.lm(reg.lsq))
[Figure: residuals y - predict.lm(reg.lsq) plotted against the index]

M-estimators
Define \(\psi = \rho'\), \(w(e) = \psi(e)/e\) and \(w_i = w(e_i)\).
Computing derivatives of
\[ \sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i^\top b) \]
leads to
\[ \sum_{i=1}^{n} \psi(y_i - x_i^\top b)\, x_i^\top = \sum_{i=1}^{n} \frac{\psi(e_i)}{e_i}\, e_i\, x_i^\top = \sum_{i=1}^{n} w_i\, (y_i - x_i^\top b)\, x_i^\top = 0. \]
The solution of this system of equations is the same as for the weighted least squares problem
\[ \sum_{i=1}^{n} w_i e_i^2. \]

M-estimators
Problem:
• The weights \(w_i\) depend on the errors \(e_i\) and
• the errors \(e_i\) depend on the weights \(w_i\).
Solution strategy: Alternating optimisation:
1. Initialise with standard least squares regression.
2. Compute the weights.
3. Apply standard least squares regression with the computed weights.
4. Repeat 2. and 3. until convergence.
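The four steps above can be sketched in base R for the Huber weight function. This is an illustrative implementation, not the code used by rlm(): the function name `irls_huber`, the tuning constant k = 1.345, the rescaling of the residuals by a MAD-based scale estimate and the made-up data set are all assumptions of this sketch.

```r
# Iteratively reweighted least squares for a Huber M-estimator.
# k = 1.345 and the MAD-based scale are assumptions of this sketch.
irls_huber <- function(x, y, k = 1.345, max_iter = 50, tol = 1e-8) {
  X <- cbind(1, x)
  b <- solve(crossprod(X), crossprod(X, y))      # 1. ordinary least squares start
  w <- rep(1, length(y))
  for (iter in seq_len(max_iter)) {
    e <- drop(y - X %*% b)                       # current residuals
    s <- median(abs(e)) / 0.6745                 # robust scale (MAD), assumed here
    if (s == 0) break                            # perfect fit, nothing to reweight
    u <- abs(e) / s
    w <- ifelse(u <= k, 1, k / u)                # 2. Huber weights w = psi(u) / u
    b_new <- solve(crossprod(X, w * X), crossprod(X, w * y))   # 3. weighted LS
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break                         # 4. repeat until convergence
  }
  list(coefficients = unname(drop(b)), weights = w)
}

# Made-up data: a line with slope 2, small deterministic noise, one gross outlier.
x <- 1:10
y <- 1 + 2 * x + c(0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, -0.1)
y[10] <- y[10] + 50

fit_ls  <- unname(coef(lm(y ~ x)))
fit_rob <- irls_huber(x, y)
print(fit_ls)
print(fit_rob$coefficients)
print(round(fit_rob$weights, 3))
```

The outlier at the last position should end up with by far the smallest weight, and the robust slope should stay much closer to 2 than the least squares slope.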

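Robust regression
Method and \(\rho(e)\):
Least squares:
\[ \rho(e) = e^2 \]
Huber:
\[ \rho(e) = \begin{cases} \frac{1}{2} e^2 & \text{if } |e| \le k \\[4pt] k|e| - \frac{1}{2} k^2 & \text{if } |e| > k \end{cases} \]
Tukey:
\[ \rho(e) = \begin{cases} \frac{k^2}{6} \left( 1 - \left( 1 - \left( \frac{e}{k} \right)^2 \right)^3 \right) & \text{if } |e| \le k \\[4pt] \frac{k^2}{6} & \text{if } |e| > k \end{cases} \]

M-estimators: Least squares
[Figure: \(\rho\) and the weight function \(w\) plotted against \(e\)]

M-estimators: Huber
[Figure: \(\rho\) and the weight function \(w\) plotted against \(e\)]

M-estimators: Tukey
[Figure: \(\rho\) and the weight function \(w\) plotted against \(e\)]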
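The three \(\rho\) functions and the corresponding weight functions \(w(e) = \psi(e)/e\) can be written down in a few lines of R, for instance to redraw the plots. The function names and the default tuning constants (k = 1.345 for Huber, k = 4.685 for Tukey) are assumptions of this sketch; the weights follow from differentiating the \(\rho\) functions.

```r
# rho and weight functions w(e) = psi(e) / e = rho'(e) / e.
# The tuning constants k are commonly used defaults, assumed for this sketch.
rho_ls    <- function(e) e^2
w_ls      <- function(e) rep(2, length(e))   # psi(e) = 2e, so the weight is constant

rho_huber <- function(e, k = 1.345) ifelse(abs(e) <= k, e^2 / 2, k * abs(e) - k^2 / 2)
w_huber   <- function(e, k = 1.345) ifelse(abs(e) <= k, 1, k / abs(e))

rho_tukey <- function(e, k = 4.685) {
  ifelse(abs(e) <= k, k^2 / 6 * (1 - (1 - (e / k)^2)^3), k^2 / 6)
}
w_tukey   <- function(e, k = 4.685) ifelse(abs(e) <= k, (1 - (e / k)^2)^2, 0)

# Weights for a few residual sizes.
e <- c(0.5, 1, 3, 10)
print(round(cbind(e, ls = w_ls(e), huber = w_huber(e), tukey = w_tukey(e)), 3))

# Plotting sketch: curve(rho_huber(x), -6, 6); curve(w_huber(x), -6, 6)
```

The Huber weights decay towards zero but never reach it, whereas the Tukey weights are exactly zero beyond k, so very large residuals are ignored completely.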

Robust regression with R
At least the package MASS will be required.
Packages can be downloaded and installed directly in R from the Internet by
> install.packages("packagename")
Once a package is installed, it can be loaded by
> library(packagename)

Robust regression (Huber)
> reg.rob <- rlm(y ~ x)
> summary(reg.rob)
[Output of summary(reg.rob) with the sections Call, Residuals, Coefficients (Value, Std. Error, t value), residual standard error (on 9 degrees of freedom) and Correlation of Coefficients; the numeric values are not preserved in this transcript.]

Robust regression (Huber)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line]

Robust regression (Huber)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y - predict.lm(reg.rob) plotted against the index]

Robust regression (Huber)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index]

Robust regression (Tukey)
> reg.rob <- rlm(y ~ x, method="MM")
> summary(reg.rob)
[Output of summary(reg.rob) as before, residual standard error on 9 degrees of freedom; the numeric values are not preserved in this transcript.]

Robust regression (Tukey)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line]

Robust regression (Tukey)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y - predict.lm(reg.rob) plotted against the index]

Robust regression (Tukey)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index]

Robust regression (Huber)
> reg.rob <- rlm(y ~ x)
> summary(reg.rob)
[Output of summary(reg.rob) for the larger data set, residual standard error on 120 degrees of freedom; the numeric values are not preserved in this transcript.]

Robust regression (Huber)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line]

Robust regression (Huber)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y - predict.lm(reg.rob) plotted against the index]

Robust regression (Huber)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index]

Robust regression (Tukey)
> reg.rob <- rlm(y ~ x, method="MM")
> summary(reg.rob)
[Output of summary(reg.rob) for the larger data set, residual standard error on 120 degrees of freedom; the numeric values are not preserved in this transcript.]

Robust regression (Tukey)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line]

Robust regression (Tukey)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y - predict.lm(reg.rob) plotted against the index]

Robust regression (Tukey)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index]

Robust regression with R
After plotting the weights by
> plot(reg.rob$w)
clicking single points can be enabled by
> identify(1:length(reg.rob$w), reg.rob$w)
in order to get the indices of interesting weights.

Multivariate regression
For simple linear regression \(y \approx ax + b\), plotting the data often helps to identify problems and outliers.
This is no longer possible for multivariate regression
\[ y_i = \alpha + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i = x_i^\top \beta + \varepsilon_i. \]
Here, methods like residual plots and residual analysis are possible ways to gain more insight into outliers and other problems.
In R, simply write for instance rlm(y ~ x1 + x2 + x3).

Two-way tables
Example: People were asked which political party they voted for in order to find out whether the choice of the party and the sex of the voter are independent.
[Table: political party (rows: SPD, CDU/CSU, Grüne, FDP, PDS, Others, No answer, sum) by sex (columns: female, male, sum); the counts are not preserved in this transcript.]

Two-way tables
In such contexts, statistical tests are typically applied, for example
• the \(\chi^2\)-test (for independence, homogeneity),
• Fisher's exact test (for 2×2 tables),
• the Kruskal-Wallis test,
• MANOVA,
• ...

Two-way tables
The tests are not robust, have very restrictive assumptions (MANOVA, Fisher's exact test), and the \(\chi^2\)-test is only an asymptotic test.
Alternative: Median polish

Median polish
Underlying (additive) model:
\[ y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \]
• \(\mu\): overall typical value (general level)
• \(\alpha_i\): row effect (here: the political party)
• \(\beta_j\): column effect (here: the sex)
• \(\varepsilon_{ij}\): noise or random fluctuation

Median polish
Algorithm:
1. Subtract from each row its median.
2. For the updated table, subtract from each column its median.
3. Repeat 1. and 2. (with the corresponding updated tables) until convergence.

Median polish
Iterative estimation of the parameters:
\[ m^{(0)} = 0, \quad a_i^{(0)} = 0, \quad b_j^{(0)} = 0, \quad e_{ij}^{(0)} = y_{ij} \]
Rows:
\[ \Delta a_i^{(t)} = \mathrm{med}\{ e_{ij}^{(t-1)} \mid j \in \{1, \ldots, J\} \} \]
\[ \Delta m_b^{(t)} = \mathrm{med}\{ b_j^{(t-1)} \mid j \in \{1, \ldots, J\} \} \]
\[ d_{ij}^{(t)} = e_{ij}^{(t-1)} - \Delta a_i^{(t)} \]

Median polish
Columns:
\[ \Delta b_j^{(t)} = \mathrm{med}\{ d_{ij}^{(t)} \mid i \in \{1, \ldots, I\} \} \]
\[ \Delta m_a^{(t)} = \mathrm{med}\{ a_i^{(t-1)} + \Delta a_i^{(t)} \mid i \in \{1, \ldots, I\} \} \]
\[ e_{ij}^{(t)} = d_{ij}^{(t)} - \Delta b_j^{(t)} \]
Common value and effects:
\[ m^{(t)} = m^{(t-1)} + \Delta m_a^{(t)} + \Delta m_b^{(t)} \]
\[ a_i^{(t)} = a_i^{(t-1)} + \Delta a_i^{(t)} - \Delta m_a^{(t)} \]
\[ b_j^{(t)} = b_j^{(t-1)} + \Delta b_j^{(t)} - \Delta m_b^{(t)} \]

Median polish
After convergence, the remaining entries in the table correspond to the \(\varepsilon_{ij}\).
Median polish in R is implemented by the function medpolish().
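To make the row and column sweeps concrete, here is a small hand-written sketch of median polish for the additive model \(y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}\). The function name `median_polish`, its stopping rule and the 3 x 3 table are made up for illustration; for real work the built-in medpolish() is the natural choice.

```r
# Median polish for the additive model y_ij = m + a_i + b_j + e_ij.
# Function and argument names are illustrative; stats::medpolish() is the
# built-in implementation.
median_polish <- function(y, max_iter = 20, tol = 1e-8) {
  m <- 0
  a <- numeric(nrow(y))   # row effects
  b <- numeric(ncol(y))   # column effects
  e <- y                  # residual table
  for (t in seq_len(max_iter)) {
    # Rows: remove the row medians from the residual table.
    da   <- apply(e, 1, median)
    dm_b <- median(b)
    d    <- e - da                   # da is recycled down the columns, i.e. per row
    # Columns: remove the column medians.
    db   <- apply(d, 2, median)
    dm_a <- median(a + da)
    e    <- sweep(d, 2, db)          # subtract db[j] from column j
    # Common value and effects.
    m <- m + dm_a + dm_b
    a <- a + da - dm_a
    b <- b + db - dm_b
    if (max(abs(da), abs(db)) < tol) break   # nothing left to remove
  }
  list(overall = m, row = a, col = b, residuals = e)
}

# Hypothetical 3 x 3 table, made up for this sketch.
y <- matrix(c(10, 12, 15,
              11, 14, 30,
               9, 11, 14), nrow = 3, byrow = TRUE)

fit <- median_polish(y)
rebuilt <- fit$overall + outer(fit$row, fit$col, "+") + fit$residuals
print(fit)
```

In this made-up table the unusually large entry 30 is left almost entirely in the residuals instead of distorting the row and column effects; `rebuilt` reproduces the original table, which is a convenient sanity check for any implementation.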

Summary
• Robust statistics allows for deviations from the ideal model, i.e. from the assumption that the sample is not contaminated.
• Robust methods rely on the majority of the data.
• Few outliers can be disregarded, or their influence is reduced.

Key references
• F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel: Robust Statistics. The Approach Based on Influence Functions. Wiley, New York (1986)
• S. Heritier, E. Cantoni, S. Copt, M.-P. Victoria-Feser: Robust Methods in Biostatistics. Wiley, New York (2009)
• D.C. Hoaglin, F. Mosteller, J.W. Tukey: Understanding Robust and Exploratory Data Analysis. Wiley, New York (2000)
• P.J. Huber: Robust Statistics. Wiley, New York (2004)

Key references
• R. Maronna, D. Martin, V. Yohai: Robust Statistics: Theory and Methods. Wiley, Toronto (2006)
• P.J. Rousseeuw, A.M. Leroy: Robust Regression and Outlier Detection. Wiley, New York (1987)

Software
• R
• Library: MASS
• Library: robustbase
• Library: rrcov


More information

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION

COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION (REFEREED RESEARCH) COMPARISON OF THE ESTIMATORS OF THE LOCATION AND SCALE PARAMETERS UNDER THE MIXTURE AND OUTLIER MODELS VIA SIMULATION Hakan S. Sazak 1, *, Hülya Yılmaz 2 1 Ege University, Department

More information

COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY

COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY COMPARING ROBUST REGRESSION LINES ASSOCIATED WITH TWO DEPENDENT GROUPS WHEN THERE IS HETEROSCEDASTICITY Rand R. Wilcox Dept of Psychology University of Southern California Florence Clark Division of Occupational

More information

Breakdown points of Cauchy regression-scale estimators

Breakdown points of Cauchy regression-scale estimators Breadown points of Cauchy regression-scale estimators Ivan Mizera University of Alberta 1 and Christine H. Müller Carl von Ossietzy University of Oldenburg Abstract. The lower bounds for the explosion

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

More information

Efficient and Robust Scale Estimation

Efficient and Robust Scale Estimation Efficient and Robust Scale Estimation Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline Introduction and motivation The robust scale estimator

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

Robust model selection criteria for robust S and LT S estimators

Robust model selection criteria for robust S and LT S estimators Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often

More information

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression

Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Robust scale estimation with extensions

Robust scale estimation with extensions Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

9. Robust regression

9. Robust regression 9. Robust regression Least squares regression........................................................ 2 Problems with LS regression..................................................... 3 Robust regression............................................................

More information

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.

Leverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response. Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response

More information

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Diagnostics and Transformations Part 2

Diagnostics and Transformations Part 2 Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics

More information

8. Nonstandard standard error issues 8.1. The bias of robust standard errors

8. Nonstandard standard error issues 8.1. The bias of robust standard errors 8.1. The bias of robust standard errors Bias Robust standard errors are now easily obtained using e.g. Stata option robust Robust standard errors are preferable to normal standard errors when residuals

More information

Figure 1. Sketch of various properties of an influence function. Rejection point

Figure 1. Sketch of various properties of an influence function. Rejection point Robust Filtering of NMR Images Petr Hotmar and Jarom r Kukal Prague Institute of Chemical Technology, Faculty of Chemical Engineering, Department of Computing and Control Engineering Introduction The development

More information

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES

BIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Describing Distributions with Numbers

Describing Distributions with Numbers Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

1 Introduction 1. 2 The Multiple Regression Model 1

1 Introduction 1. 2 The Multiple Regression Model 1 Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests

More information

Units. Exploratory Data Analysis. Variables. Student Data

Units. Exploratory Data Analysis. Variables. Student Data Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as

More information

A Comparison of Robust Estimators Based on Two Types of Trimming

A Comparison of Robust Estimators Based on Two Types of Trimming Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Statistics for Engineering, 4C3/6C3 Assignment 2

Statistics for Engineering, 4C3/6C3 Assignment 2 Statistics for Engineering, 4C3/6C3 Assignment 2 Kevin Dunn, kevin.dunn@mcmaster.ca Due date: 23 January 2014 Assignment objectives: interpreting data visualizations; univariate data analysis Question

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

A Short Course in Basic Statistics

A Short Course in Basic Statistics A Short Course in Basic Statistics Ian Schindler November 5, 2017 Creative commons license share and share alike BY: C 1 Descriptive Statistics 1.1 Presenting statistical data Definition 1 A statistical

More information

Determining the Spread of a Distribution

Determining the Spread of a Distribution Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Why is the field of statistics still an active one?

Why is the field of statistics still an active one? Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with

More information

Correlated Data: Linear Mixed Models with Random Intercepts

Correlated Data: Linear Mixed Models with Random Intercepts 1 Correlated Data: Linear Mixed Models with Random Intercepts Mixed Effects Models This lecture introduces linear mixed effects models. Linear mixed models are a type of regression model, which generalise

More information

Highly Robust Variogram Estimation 1. Marc G. Genton 2

Highly Robust Variogram Estimation 1. Marc G. Genton 2 Mathematical Geology, Vol. 30, No. 2, 1998 Highly Robust Variogram Estimation 1 Marc G. Genton 2 The classical variogram estimator proposed by Matheron is not robust against outliers in the data, nor is

More information

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov

Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT Nov 20 2015 Charlotte Wickham stat511.cwick.co.nz Quiz #4 This weekend, don t forget. Usual format Assumptions Display 7.5 p. 180 The ideal normal, simple

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Introduction to robust statistics*

Introduction to robust statistics* Introduction to robust statistics* Xuming He National University of Singapore To statisticians, the model, data and methodology are essential. Their job is to propose statistical procedures and evaluate

More information

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression

More information

Least Squares Estimation-Finite-Sample Properties

Least Squares Estimation-Finite-Sample Properties Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions

More information

Week 7.1--IES 612-STA STA doc

Week 7.1--IES 612-STA STA doc Week 7.1--IES 612-STA 4-573-STA 4-576.doc IES 612/STA 4-576 Winter 2009 ANOVA MODELS model adequacy aka RESIDUAL ANALYSIS Numeric data samples from t populations obtained Assume Y ij ~ independent N(μ

More information

Statistics for Python

Statistics for Python Statistics for Python An extension module for the Python scripting language Michiel de Hoon, Columbia University 2 September 2010 Statistics for Python, an extension module for the Python scripting language.

More information

Package ForwardSearch

Package ForwardSearch Package ForwardSearch February 19, 2015 Type Package Title Forward Search using asymptotic theory Version 1.0 Date 2014-09-10 Author Bent Nielsen Maintainer Bent Nielsen

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

An Introduction to Descriptive Statistics. 2. Manually create a dot plot for small and modest sample sizes

An Introduction to Descriptive Statistics. 2. Manually create a dot plot for small and modest sample sizes Living with the Lab Winter 2013 An Introduction to Descriptive Statistics Gerald Recktenwald v: January 25, 2013 gerry@me.pdx.edu Learning Objectives By reading and studying these notes you should be able

More information

Robust estimation of scale and covariance with P n and its application to precision matrix estimation

Robust estimation of scale and covariance with P n and its application to precision matrix estimation Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY

More information

Small Sample Corrections for LTS and MCD

Small Sample Corrections for LTS and MCD myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Math 2311 Written Homework 6 (Sections )

Math 2311 Written Homework 6 (Sections ) Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Post hoc Comparisons In the prior chapter we used ANOVA

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:

More information

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table Lesson Plan Answer Questions Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 1 2. Summary Statistics Given a collection of data, one needs to find representations

More information

Beam Example: Identifying Influential Observations using the Hat Matrix

Beam Example: Identifying Influential Observations using the Hat Matrix Math 3080. Treibergs Beam Example: Identifying Influential Observations using the Hat Matrix Name: Example March 22, 204 This R c program explores influential observations and their detection using the

More information

Robustness of location estimators under t- distributions: a literature review

Robustness of location estimators under t- distributions: a literature review IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Robustness of location estimators under t- distributions: a literature review o cite this article: C Sumarni et al 07 IOP Conf.

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

Lecture 18: Simple Linear Regression

Lecture 18: Simple Linear Regression Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength

More information

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 The breakfast cereal data Study collected data on nutritional

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.

The entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials. One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine

More information

EXTENDING PARTIAL LEAST SQUARES REGRESSION

EXTENDING PARTIAL LEAST SQUARES REGRESSION EXTENDING PARTIAL LEAST SQUARES REGRESSION ATHANASSIOS KONDYLIS UNIVERSITY OF NEUCHÂTEL 1 Outline Multivariate Calibration in Chemometrics PLS regression (PLSR) and the PLS1 algorithm PLS1 from a statistical

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

Frequency Distribution Cross-Tabulation

Frequency Distribution Cross-Tabulation Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape

More information

A SHORT COURSE ON ROBUST STATISTICS. David E. Tyler Rutgers The State University of New Jersey. Web-Site dtyler/shortcourse.

A SHORT COURSE ON ROBUST STATISTICS. David E. Tyler Rutgers The State University of New Jersey. Web-Site  dtyler/shortcourse. A SHORT COURSE ON ROBUST STATISTICS David E. Tyler Rutgers The State University of New Jersey Web-Site www.rci.rutgers.edu/ dtyler/shortcourse.pdf References Huber, P.J. (1981). Robust Statistics. Wiley,

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Regression and the 2-Sample t

Regression and the 2-Sample t Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression

More information

ISQS 5349 Final Exam, Spring 2017.

ISQS 5349 Final Exam, Spring 2017. ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC

More information