Robust Statistics. Frank Klawonn
1 Robust Statistics
Frank Klawonn
Data Analysis and Pattern Recognition Lab, Department of Computer Science, University of Applied Sciences Braunschweig/Wolfenbüttel, Germany
Bioinformatics & Statistics, Helmholtz Centre for Infection Research, Braunschweig, Germany
Robust Statistics p.1/98

2 Outline
- Motivation: Mean or median
- What is robust statistics?
- M-estimators
- Robust regression
- Median polish
- Summary and references

4 Motivation: Mean or median
- Imagine a small town with 20 thousand inhabitants.
- On average, each inhabitant has a capital of 10 thousand $.
- Assume a very rich man named Bill G., owning a capital of 20 billion $, decides to move to this town.
- After Bill G. has settled there, the inhabitants own an average capital of roughly one million $.
- And all but one of the inhabitants might own less capital than average.

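The arithmetic behind this example can be checked with a few lines of R. This is a minimal sketch under the simplifying assumption that every original inhabitant owns exactly 10 thousand $; the variable names are chosen here for illustration.

```r
# Assumption: all 20,000 original inhabitants own exactly 10,000 $.
capital <- rep(10000, 20000)

# Bill G. moves in with 20 billion $.
capital_with_bill <- c(capital, 20e9)

mean_before  <- mean(capital)
mean_after   <- mean(capital_with_bill)    # roughly one million $
median_after <- median(capital_with_bill)  # unchanged by the newcomer

# Number of inhabitants owning less than the new average
below_average <- sum(capital_with_bill < mean_after)

print(c(mean_before = mean_before, mean_after = mean_after,
        median_after = median_after, below_average = below_average))
```

The mean jumps by two orders of magnitude, while the median stays at the typical capital.
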
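5 (Empirical) median
$x_{(1)}, \ldots, x_{(n)}$ denotes a sample in ascending order.
Definition. The (sample or empirical) median, denoted by $\tilde{x}$, is given by
$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd,} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even.} \end{cases}$$
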
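The definition translates directly into code. The function below is a sketch written for illustration (the name `sample_median` is not from the slides); in practice R's built-in `median()` does the same job.

```r
# Sample median following the case distinction in the definition.
sample_median <- function(x) {
  xs <- sort(x)        # x_(1), ..., x_(n) in ascending order
  n  <- length(xs)
  if (n %% 2 == 1) {
    xs[(n + 1) / 2]                   # n odd: the middle value
  } else {
    (xs[n / 2] + xs[n / 2 + 1]) / 2   # n even: average of the two middle values
  }
}

odd_case  <- sample_median(c(3, 1, 2))
even_case <- sample_median(c(4, 1, 3, 2))
print(c(odd_case, even_case))
```
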
7 Motivation: Mean or median
A less extreme example:
[Figure not preserved in the transcription.]

10 What is a good estimator?
Assume we want to estimate the expected value of a normal distribution from which a sample was generated.
For symmetric distributions like the normal distribution, the expected value and the median are equal.
The median $q_{0.5}$ of a (continuous) probability distribution, representing the random variable $X$, is the 50%-quantile, i.e.
$$P(X \le q_{0.5}) = 0.5 = P(X \ge q_{0.5}).$$

11 What is a good estimator?
Classical statistics:
(a) The estimator should be correct on average (unbiased), at least for large sample sizes (asymptotically unbiased).
(b) The estimator should have a small variance (efficiency).
(c) With increasing sample size the variance of the estimator should tend to zero.
(a) and (c) together guarantee consistency: With increasing sample size, the estimator converges in probability to the true value of the parameter to be estimated.

12 What is a good estimator?
Should we choose the mean or the median to estimate the expected value μ of our normal distribution?
Both estimators are consistent.

13-18 Mean or median
[Figures: histograms of the estimates (x-axis: estimate, y-axis: frequency).
- Slides 13-14: estimation of the mean (left) and of the median (right), for n = 20 and n = 100.
- Slides 15-16: estimation of the mean without noise (left) and with 5% noise (right), for n = 20 and n = 100.
- Slides 17-18: estimation of the median without noise (left) and with 5% noise (right), for n = 20 and n = 100.]

19 Mean or median
- Under the ideal assumption that the data were sampled from a normal distribution, the mean is a more efficient estimator than the median.
- If a small fraction of the data is for some reason erroneous or generated by another distribution, the mean can even become a biased estimator and lose consistency.
- The median is hardly affected if a small fraction of the data is corrupted.

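These three statements can be reproduced with a small simulation. The sketch below is not the author's original experiment: the slides do not say how the "5% noise" was generated, so it is assumed here that each observation is replaced, with probability 0.05, by a draw from a normal distribution with mean 10. The seed, the number of replications and all names are illustrative choices.

```r
set.seed(1)
n    <- 20      # sample size
reps <- 2000    # number of simulated samples
eps  <- 0.05    # assumed contamination probability

# Ideal model: standard normal data.
draw_clean <- function(n) rnorm(n)

# Assumed contamination model: with probability eps an observation
# comes from N(10, 1) instead of N(0, 1).
draw_contaminated <- function(n) {
  is_outlier <- runif(n) < eps
  ifelse(is_outlier, rnorm(n, mean = 10), rnorm(n))
}

means_clean   <- replicate(reps, mean(draw_clean(n)))
medians_clean <- replicate(reps, median(draw_clean(n)))
means_cont    <- replicate(reps, mean(draw_contaminated(n)))
medians_cont  <- replicate(reps, median(draw_contaminated(n)))

# Efficiency under the ideal model: the mean has the smaller variance.
print(c(var_mean = var(means_clean), var_median = var(medians_clean)))

# Bias under contamination: the true value is 0.
print(c(bias_mean = mean(means_cont), bias_median = mean(medians_cont)))
```

Calling `hist(means_cont)` and `hist(medians_cont)` gives histograms comparable in spirit to those on the preceding slides.
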
20 Robust statistics
Hampel et al. (1986): In a broad informal sense, robust statistics is a body of knowledge, partly formalized into theories of robustness, relating to deviations from idealized assumptions in statistics.

21 Robust statistics
Idealized assumption: The data are sampled from the (possibly multivariate) random variable $X$ with cumulative distribution function $F_X$.
Modified assumption: The data are sampled from a random variable with $\varepsilon$-contaminated cumulative distribution function
$$F_\varepsilon = (1 - \varepsilon) F_X + \varepsilon F_{\text{outliers}}.$$
- $F_X$: the assumed ideal model distribution
- $\varepsilon$: (small) probability for outliers
- $F_{\text{outliers}}$: unknown and unspecified distribution

22 Estimators (Statistics)
Statistics is concerned with functionals $t$ (or better $t_n$) called statistics, which are used for parameter estimation and other purposes.
The mean
$$t_n(X_1, \ldots, X_n) = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i,$$
the median or the (empirical) variance
$$t_n(X_1, \ldots, X_n) = s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
are typical examples of estimators.

23 Estimators (Statistics)
Two views of estimators:
- Applied to (finite) samples $(x_1, \ldots, x_n)$, resulting in a concrete estimation (a realization of a random experiment consisting of the drawn sample).
- As random variables (applied to random variables). This enables us to investigate the (theoretical) properties of estimators. Samples are not needed for this purpose.

24 Estimators (Statistics)
Assuming an infinite sample size, the limit in probability
$$t(F_X) = \lim_{n \to \infty} t_n(X_1, \ldots, X_n)$$
can be considered (in case it exists).
$t(F_X)$ is then again a random variable.
For typical estimators, $t(F_X)$ is a constant random variable, i.e. the limit converges (with probability 1) to a unique value.

25 Fisher consistency
An estimator $t$ is called Fisher consistent for a parameter $\theta$ of the probability distribution of $X$ if
$$t(F_X) = \theta,$$
i.e. for large (infinite) sample sizes, the estimator converges with probability 1 to the true value of the parameter to be estimated.

26 Empirical influence function
Given a sample $(x_1, \ldots, x_n)$ and an estimator $t_n(x_1, \ldots, x_n)$, what is the influence of a single observation on $t$?
Empirical influence function:
$$\mathrm{EIF}(x) = t_{n+1}(x_1, \ldots, x_n, x)$$
Vary $x$ between $-\infty$ and $\infty$.

27 Empirical influence function
Consider the (ordered) sample
0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2
$\bar{x} = 2.29$, $\mathrm{med}(x) = 1.85$, $\bar{x}_{10\%} = 2.2875$
The ($\alpha$-)trimmed mean is the mean of the sample from which the lowest and highest $100 \cdot \alpha\%$ values are removed.
(For the mean: $\alpha = 0$, for the median: $\alpha = 0.5$.)

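The three values quoted for this sample can be reproduced in R. The sketch uses the `trim` argument of the built-in `mean()`, which removes the given fraction of observations from each end of the sorted sample.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

x_mean    <- mean(x)              # arithmetic mean (alpha = 0)
x_median  <- median(x)            # median (alpha = 0.5)
x_trimmed <- mean(x, trim = 0.1)  # 10%-trimmed mean: drops 0.4 and 4.2

print(c(mean = x_mean, median = x_median, trimmed = x_trimmed))
```
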
28 Empirical influence function
[Figure: empirical influence functions of mean(x), median(x) and trimmedmean(x).]

29 Sensitivity curve
The (empirical) sensitivity curve is a normalized EIF (centred around 0 and scaled according to the sample size):
$$\mathrm{SC}(x) = \frac{t_{n+1}(x_1, \ldots, x_n, x) - t_n(x_1, \ldots, x_n)}{\frac{1}{n+1}}$$
[Figure: sensitivity curves of mean(x), median(x) and trimmedmean(x).]

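The sensitivity curve can be evaluated numerically for the sample from slide 27. The following is a sketch; the helper names `sensitivity_curve` and `trimmed_mean` and the evaluation grid are assumptions made for illustration.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

# SC(x) = (t_{n+1}(x_1, ..., x_n, x) - t_n(x_1, ..., x_n)) / (1 / (n + 1))
sensitivity_curve <- function(estimator, sample, x_new) {
  n <- length(sample)
  sapply(x_new, function(v) {
    (estimator(c(sample, v)) - estimator(sample)) / (1 / (n + 1))
  })
}

trimmed_mean <- function(z) mean(z, trim = 0.1)

grid       <- seq(-10, 10, by = 0.5)   # values of the additional observation
sc_mean    <- sensitivity_curve(mean, x, grid)
sc_median  <- sensitivity_curve(median, x, grid)
sc_trimmed <- sensitivity_curve(trimmed_mean, x, grid)

# The curve of the mean is unbounded, the other two are bounded.
print(range(sc_mean))
print(range(sc_median))
print(range(sc_trimmed))
```

For the mean, the curve is the straight line x minus the sample mean; for the median and the trimmed mean it flattens out once the added observation is extreme enough. `matplot(grid, cbind(sc_mean, sc_median, sc_trimmed), type = "l")` draws all three curves in one plot.
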
30 Influence function
The influence function corresponds to the sensitivity curve for large (infinite) sample sizes.
$$\mathrm{IF}(x; t, F) = \lim_{n \to \infty} \frac{t\left(\left(1 - \frac{1}{n}\right) F + \frac{1}{n} \delta_x\right) - t(F)}{\frac{1}{n}}$$
$\delta_x$ represents a (cumulative probability) distribution yielding the value $x$ with probability 1.
In this sense, the influence function measures what happens with the estimator for an infinitesimally small contamination for large sample sizes.
Note that the influence function might not be defined if the limit does not exist.

31 Gross-error sensitivity
The worst case (in terms of the outlier $x$) is called gross-error sensitivity.
$$\gamma^*(t, F) = \sup_x \{ |\mathrm{IF}(x; t, F)| \}$$
If $\gamma^*(t, F)$ is finite, $t$ is called a B-robust estimator (B stands for bias) (at $F$).
For the arithmetic mean, we have $\gamma^*(\bar{x}, F) = \infty$.
For the median and the trimmed mean, the gross-error sensitivity depends on the sample distribution $F$.

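As a worked example, the claim about the arithmetic mean can be derived from the definition of the influence function. The sketch assumes that $F$ has a finite expected value $\mu$, so that the mean functional is $t(F) = \mu$ and is linear in the distribution.

```latex
\begin{align*}
t\!\left(\left(1-\tfrac{1}{n}\right)F + \tfrac{1}{n}\,\delta_x\right)
  &= \left(1-\tfrac{1}{n}\right)\mu + \tfrac{1}{n}\,x, \\
\mathrm{IF}(x; t, F)
  &= \lim_{n\to\infty}
     \frac{\left(1-\tfrac{1}{n}\right)\mu + \tfrac{1}{n}\,x - \mu}{\tfrac{1}{n}}
   = x - \mu, \\
\gamma^{*}(t, F)
  &= \sup_x |x - \mu| = \infty .
\end{align*}
```

The influence of a single observation grows without bound as $x$ moves away from $\mu$, so the mean is not B-robust.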
32 Breakdown point
The influence curve and the gross-error sensitivity characterise the influence of single (or even infinitesimal) outliers.
A minimum requirement for robustness is that the influence curve is bounded.
What happens when the fraction of outliers increases?

33 Breakdown point
The breakdown point is the smallest fraction of (extreme) outliers that need to be included in a sample in order to let the estimator break down completely, i.e. yield (almost) infinity.
Let
$$\mathrm{hd}((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = |\{ i \in \{1, \ldots, n\} \mid x_i \neq y_i \}|$$
denote the Hamming distance between two samples $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$.

34 Breakdown point
The breakdown point of an estimator $t$ is defined as
$$\varepsilon_n^*(t; x_1, \ldots, x_n) = \frac{1}{n} \min \left\{ m \;\middle|\; \sup \{ |t(y_1, \ldots, y_n)| \mid \mathrm{hd}((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = m \} = \infty \right\}.$$
Normally, $\varepsilon_n^*$ is independent of the specific choice of the sample $(x_1, \ldots, x_n)$.

35 Breakdown point
If $\varepsilon_n^*$ is independent of the sample, for large (infinite) sample sizes the breakdown point is defined as
$$\varepsilon^* = \lim_{n \to \infty} \varepsilon_n^*.$$
Examples:
- Arithmetic mean: $\varepsilon^* = 0\%$
- Median: $\varepsilon^* = 50\%$
- $\alpha$-trimmed mean: $\varepsilon^* = \alpha$

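The finite-sample behaviour behind these examples can be illustrated by corrupting m values of the sample from slide 27. This is a sketch only: the definition takes a supremum over all samples at Hamming distance m, whereas the code tries one particular corruption (replacing the first m values by the arbitrary large number 1e12). The helper name `corrupt` is an assumption.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

# Replace the first m observations by a huge value (one possible corruption).
corrupt <- function(sample, m, value = 1e12) {
  y <- sample
  if (m > 0) y[seq_len(m)] <- value
  y
}

m_values <- 0:5
means    <- sapply(m_values, function(m) mean(corrupt(x, m)))
medians  <- sapply(m_values, function(m) median(corrupt(x, m)))
trimmed  <- sapply(m_values, function(m) mean(corrupt(x, m), trim = 0.1))

print(data.frame(m = m_values, mean = means, median = medians, trimmed = trimmed))
```

With n = 10, a single corrupted value already ruins the mean, the 10%-trimmed mean survives one corrupted value but not two, and the median only fails once half of the sample is corrupted.
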
36 Criteria for robust estimators
- Bounded influence function: Single extreme outliers cannot do too much harm to the estimator.
- Low gross-error sensitivity
- Positive breakdown point (the higher, the better): Even a number of outliers can be tolerated without leading to nonsense estimations.
- Fisher consistency: For very large sample sizes the estimator will yield the correct value.
- High efficiency: The variance of the estimator should be as low as possible.

37 Criteria for robust estimators
There is no way to satisfy all criteria in the best way at the same time.
There is a trade-off between robustness issues like positive breakdown point and low gross-error sensitivity on the one hand and efficiency on the other hand.
As an example, compare the mean (high efficiency, breakdown point 0) and the median (lower efficiency, but very good breakdown point).

38 Robust measures of spread
The (empirical) variance suffers from the same problems as the mean. (The estimation of the variance usually includes an estimation of the mean.)
An example of a more robust estimator for spread is the interquartile range, the difference between the 75%- and the 25%-quantile.
(The q%-quantile is the value x in the sample for which q% of the values are smaller than x and (100 − q)% are larger than x.)

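The difference in robustness shows up as soon as one value of a sample is replaced by an outlier. The sketch reuses the sample from slide 27 and an arbitrarily chosen outlier (420 instead of 4.2, as if a decimal point had been lost); `sd()` and `IQR()` are R's built-in functions.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

# Hypothetical data error: the largest value loses its decimal point.
x_out <- x
x_out[10] <- 420

spread <- rbind(
  clean   = c(sd = sd(x),     IQR = IQR(x)),
  outlier = c(sd = sd(x_out), IQR = IQR(x_out))
)
print(spread)
```

The standard deviation is inflated by the single outlier, while the interquartile range does not change because the quartiles are determined by the central part of the sample.
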
39 Error measures
The expected value μ minimizes the error function
$$E\left((X - \mu)^2\right).$$
Correspondingly, the arithmetic mean $\bar{x}$ minimizes the error function
$$\sum_{i=1}^{n} (x_i - \bar{x})^2.$$

40 Error measures
The median $q_{0.5}$ minimizes the error function
$$E\left(|X - q_{0.5}|\right).$$
Correspondingly, the (sample) median $\tilde{x}$ minimizes the error function
$$\sum_{i=1}^{n} |x_i - \tilde{x}|.$$

41 Error measures
This also explains why the median is less sensitive to outliers:
The quadratic error for the mean punishes outliers much more strongly than the absolute error.
Therefore, under the quadratic error, extreme outliers have a higher influence ("pull more strongly") than other points.

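Both minimization properties can be checked numerically. The sketch below minimizes the two error functions for the sample from slide 27 with R's one-dimensional optimizer `optimize()`; the function names are chosen for illustration. Since the sample size is even, every value between the two middle observations (1.7 and 2.0) minimizes the absolute error, and the sample median 1.85 is one of these minimizers.

```r
x <- c(0.4, 1.2, 1.4, 1.5, 1.7, 2.0, 2.9, 3.8, 3.8, 4.2)

squared_error  <- function(m) sum((x - m)^2)
absolute_error <- function(m) sum(abs(x - m))

best_sq  <- optimize(squared_error,  interval = range(x))$minimum
best_abs <- optimize(absolute_error, interval = range(x))$minimum

print(c(minimizer_squared = best_sq, mean = mean(x)))
print(c(minimizer_absolute = best_abs, median = median(x)))
```
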
42 Error measures
How to measure errors?
The error for an estimation $\hat{\theta}$ including the sign is
$$e_i = x_i - \hat{\theta}.$$
Minimizing $\sum_{i=1}^{n} e_i$ does not make sense.
- Usually $\inf_{\hat{\theta}} \sum_{i=1}^{n} e_i = -\infty$.
- Even if we require $\sum_{i=1}^{n} e_i \ge 0$, a small value for $\sum_{i=1}^{n} e_i$ does not mean that the errors $e_i$ are small. There might be large positive and large negative errors that balance each other.

43 Error measures
Therefore, we need a modified error $\rho(e)$.
Which properties should the function $\rho : \mathbb{R} \to \mathbb{R}$ have?
- $\rho(e) \ge 0$,
- $\rho(0) = 0$,
- $\rho(e) = \rho(-e)$,
- $\rho(e_i) \ge \rho(e_j)$ if $|e_i| \ge |e_j|$.

44 Error measures
Possible choices for ρ:
- $\rho(e) = e^2$
- $\rho(e) = |e|$
- ...?
Advantage of $\rho(e) = e^2$: In order to minimize $\sum_{i=1}^{n} \rho(e_i)$, we can take derivatives.
This does not work for $\rho(e) = |e|$, since the function $f(x) = |x|$ is not differentiable (at 0).

45 Error measures
Which other options do we have for ρ?
The quadratic error is obviously not a good choice when we seek robustness.
Consider the more general setting of linear models of the form
$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i = x_i^\top \beta + \varepsilon_i.$$
This also covers the special case of estimators for location:
$$y_i = \beta_0 + \varepsilon_i$$

46 Linear regression
Linear model:
$$y_i = \alpha + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i = x_i^\top \beta + \varepsilon_i$$
Computed model:
$$y_i = a + b_1 x_{i1} + \ldots + b_k x_{ik} + e_i = x_i^\top b + e_i$$
Objective function:
$$\sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i^\top b)$$

47 Least squares regression
Computing derivatives of
$$\frac{1}{2} \sum_{i=1}^{n} e_i^2 = \frac{1}{2} \sum_{i=1}^{n} (y_i - x_i^\top b)^2$$
(the constant factor $\frac{1}{2}$ does not change the optimisation problem) leads to
$$\sum_{i=1}^{n} (y_i - x_i^\top b)\, x_i^\top = 0.$$
The solution of this system of linear equations is straightforward and can be found in any textbook.

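Written in matrix form, the system above reads $X^\top (y - Xb) = 0$, i.e. $X^\top X b = X^\top y$. The following sketch solves it directly and compares the result with R's `lm()`. The small data set is made up for illustration and is not taken from the slides.

```r
# Hypothetical data, roughly following y = 2x.
x <- 1:10
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)

# Design matrix with a column of ones for the intercept.
X <- cbind(intercept = 1, x = x)

# Normal equations: (X'X) b = X'y
b <- solve(crossprod(X), crossprod(X, y))

fit <- lm(y ~ x)
print(cbind(normal_equations = as.vector(b), lm = unname(coef(fit))))

# The residuals are orthogonal to the columns of X.
print(crossprod(X, y - X %*% b))
```
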
48 Statistics tool R
Open source software:
R uses a type-free command language.
Assignments are written in the form
> x <- y
y is assigned to x.
The object y must be defined (generated) before it can be assigned to x.
Declaration of x is not required.

49 R: Reading a file
> mydata <- read.table(file.choose(), header=T)
opens a file chooser. The contents of the chosen file are assigned to the object named mydata.
header = T means that the chosen file contains a header:
The first line of the file contains the names of the variables. The following lines contain the values (tab- or space-separated).

50 R: Accessing a single variable
> vn <- mydata$varname
assigns the column named varname of the data set contained in the object mydata to the object vn.
The command
> print(vn)
prints the corresponding column on the screen.

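The two steps can be tried without an interactive file chooser. The sketch below writes a small, made-up file to a temporary location and reads it back; the second column name `other` and all values are assumptions for illustration.

```r
# Create a small space-separated file with a header line (hypothetical content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("varname other",
             "0.1 10",
             "1.3 20",
             "2.5 30"), tmp)

# Same call as on the slide, with the file path instead of file.choose().
mydata <- read.table(tmp, header = TRUE)

# Access a single column by its name.
vn <- mydata$varname
print(vn)
```
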
51 R: Printing on the screen
[Console output of print(vn); the values are not preserved in the transcription. The row labels run from [1] to [145] in steps of 18.]

52 R: Empirical mean & median
The mean and median can be computed using R by the functions mean() and median(), respectively.
> mean(vn)
[1] (value not preserved)
> median(vn)
[1] 1.3
The mean and median can also be applied to data objects consisting of more than one (numerical) column, yielding a vector of mean/median values.

53 R: Empirical variance
The function var() yields the empirical variance in R.
> var(vn)
[1] (value not preserved)
The function sd() yields the empirical standard deviation.
> sd(vn)
[1] (value not preserved)

54 R: min and max
The functions min() and max() compute the minimum and the maximum in a data set.
> min(vn)
[1] 0.1
> max(vn)
[1] 2.5
The function IQR() yields the interquartile range.

55 Least squares regression
> reg.lsq <- lm(y ~ x)
> summary(reg.lsq)
Call: lm(formula = y ~ x)
[Summary output; the numeric values are not preserved in the transcription. Recoverable structure: residual quantiles (Min, 1Q, Median, 3Q, Max); coefficient table (Estimate, Std. Error, t value, Pr(>|t|)) for (Intercept) and x; residual standard error on 9 degrees of freedom; F-statistic on 1 and 9 DF.]

56 Least squares regression
> plot(x,y)
> abline(reg.lsq)
[Figure: scatter plot of y against x with the least squares line.]

57 Least squares regression
> plot(y-predict.lm(reg.lsq))
[Figure: residuals y-predict.lm(reg.lsq) plotted against the index.]

58 Least squares regression
> plot(x,y-predict.lm(reg.lsq))
[Figure: residuals y-predict.lm(reg.lsq) plotted against x.]

59 Least squares regression
> reg.lsq <- lm(y ~ x)
> summary(reg.lsq)
Call: lm(formula = y ~ x)
[Summary output for a second data set; the numeric values are not preserved in the transcription. Recoverable structure: residual quantiles; coefficient table for (Intercept) and x; residual standard error on 120 degrees of freedom; F-statistic on 1 and 120 DF.]

60 Least squares regression
> plot(x,y)
> abline(reg.lsq)
[Figure: scatter plot of y against x with the least squares line.]

61 Least squares regression
> plot(y-predict.lm(reg.lsq))
[Figure: residuals y-predict.lm(reg.lsq) plotted against the index.]

62 M-estimators
Define $\psi = \rho'$, $w(e) = \psi(e)/e$ and $w_i = w(e_i)$.
Computing derivatives of $\sum_{i=1}^{n} \rho(e_i) = \sum_{i=1}^{n} \rho(y_i - x_i^\top b)$ leads to
$$\sum_{i=1}^{n} \psi(y_i - x_i^\top b)\, x_i^\top = \sum_{i=1}^{n} \frac{\psi(e_i)}{e_i}\, e_i\, x_i^\top = \sum_{i=1}^{n} w_i\, (y_i - x_i^\top b)\, x_i^\top = 0.$$
The solution of this system of equations is the same as for the weighted least squares problem
$$\sum_{i=1}^{n} w_i\, e_i^2.$$

63 M-estimators
Problem:
- The weights $w_i$ depend on the errors $e_i$ and
- the errors $e_i$ depend on the weights $w_i$.
Solution strategy: Alternating optimisation:
1. Initialise with standard least squares regression.
2. Compute the weights.
3. Apply standard least squares regression with the computed weights.
4. Repeat 2. and 3. until convergence.

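The four steps can be sketched in a few lines of R. This is an illustration, not the implementation used by any particular package: it assumes Huber weights with the commonly used tuning constant k = 1.345, rescales the residuals in every iteration by a MAD-type scale estimate (median absolute residual divided by 0.6745), and uses made-up data with one outlier. All function and variable names are hypothetical.

```r
# Huber weight function w(e) = psi(e) / e for standardized residuals e.
huber_weight <- function(e, k = 1.345) ifelse(abs(e) <= k, 1, k / abs(e))

irls_huber <- function(X, y, k = 1.345, max_iter = 50, tol = 1e-8) {
  # Step 1: initialise with standard least squares.
  b <- solve(crossprod(X), crossprod(X, y))
  for (iter in seq_len(max_iter)) {
    e <- as.vector(y - X %*% b)
    s <- median(abs(e)) / 0.6745          # assumed robust scale estimate
    # Step 2: compute the weights from the current residuals.
    w <- huber_weight(e / s, k)
    # Step 3: weighted least squares, (X'WX) b = X'Wy.
    b_new <- solve(crossprod(X, w * X), crossprod(X, w * y))
    # Step 4: repeat until the coefficients stop changing.
    converged <- max(abs(b_new - b)) < tol
    b <- b_new
    if (converged) break
  }
  list(coefficients = as.vector(b), weights = w)
}

# Hypothetical data: true line y = 2 + 0.5 x, small deterministic
# perturbation, and one gross outlier at observation 10.
x <- 1:20
y <- 2 + 0.5 * x + 0.3 * sin(1:20)
y[10] <- y[10] + 30
X <- cbind(1, x)

b_lsq <- as.vector(solve(crossprod(X), crossprod(X, y)))
fit   <- irls_huber(X, y)

print(rbind(least_squares = b_lsq, huber_irls = fit$coefficients))
print(which.min(fit$weights))   # observation with the smallest weight
```

The outlier ends up with a weight close to zero, so the reweighted fit stays near the line that generated the bulk of the data.
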
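64 Robust regression
Method and error function ρ(e):
- Least squares: $\rho(e) = e^2$
- Huber: $\rho(e) = \begin{cases} \frac{1}{2} e^2 & \text{if } |e| \le k \\ k|e| - \frac{1}{2} k^2 & \text{if } |e| > k \end{cases}$
- Tukey: $\rho(e) = \begin{cases} \frac{k^2}{6} \left( 1 - \left( 1 - \left( \frac{e}{k} \right)^2 \right)^3 \right) & \text{if } |e| \le k \\ \frac{k^2}{6} & \text{if } |e| > k \end{cases}$
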
65-67 M-estimators: Least squares, Huber, Tukey
[Figures: for each of the three methods, the error function ρ(e) (left) and the weight function w(e) (right), both plotted against e.]

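The error functions from slide 64 and the corresponding weight functions $w(e) = \psi(e)/e$ can be written down in R and plotted. This is a sketch; the tuning constants k = 1.345 (Huber) and k = 4.685 (Tukey) are commonly used defaults and are an assumption here, since the slides do not state the values used for the figures.

```r
# Huber: quadratic near zero, linear in the tails.
rho_huber <- function(e, k = 1.345) {
  ifelse(abs(e) <= k, 0.5 * e^2, k * abs(e) - 0.5 * k^2)
}
w_huber <- function(e, k = 1.345) ifelse(abs(e) <= k, 1, k / abs(e))

# Tukey (biweight): constant beyond k, so large errors get weight 0.
rho_tukey <- function(e, k = 4.685) {
  ifelse(abs(e) <= k, k^2 / 6 * (1 - (1 - (e / k)^2)^3), k^2 / 6)
}
w_tukey <- function(e, k = 4.685) ifelse(abs(e) <= k, (1 - (e / k)^2)^2, 0)

# Numerical check that w(e) * e equals the derivative of rho(e).
e <- c(-6, -2, -0.5, 0.5, 2, 6)
h <- 1e-6
psi_huber_num <- (rho_huber(e + h) - rho_huber(e - h)) / (2 * h)
psi_tukey_num <- (rho_tukey(e + h) - rho_tukey(e - h)) / (2 * h)
print(cbind(e, psi_huber_num, w_huber(e) * e, psi_tukey_num, w_tukey(e) * e))
```

`curve(rho_huber(x), -6, 6)` and `curve(w_huber(x), -6, 6)` (likewise for the Tukey functions) draw plots of the kind shown on the slides.
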
68 Robust regression with R
At least the package MASS will be required.
Packages can be downloaded and installed directly in R from the Internet.
Once a package is installed, it is loaded by
> library(packagename)

69 Robust regression (Huber)
> reg.rob <- rlm(y ~ x)
> summary(reg.rob)
Call: rlm(formula = y ~ x)
[Summary output; the numeric values are not preserved in the transcription. Recoverable structure: residual quantiles (Min, 1Q, Median, 3Q, Max); coefficient table (Value, Std. Error, t value) for (Intercept) and x; residual standard error on 9 degrees of freedom; correlation of coefficients.]

70 Robust regression (Huber)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line.]

71 Robust regression (Huber)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y-predict.lm(reg.rob) plotted against the index.]

72 Robust regression (Huber)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index.]

73 Robust regression (Tukey)
> reg.rob <- rlm(y ~ x, method="MM")
> summary(reg.rob)
Call: rlm(formula = y ~ x, method = "MM")
[Summary output; the numeric values are not preserved in the transcription. Recoverable structure: residual quantiles; coefficient table (Value, Std. Error, t value) for (Intercept) and x; residual standard error on 9 degrees of freedom; correlation of coefficients.]

74 Robust regression (Tukey)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line.]

75 Robust regression (Tukey)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y-predict.lm(reg.rob) plotted against the index.]

76 Robust regression (Tukey)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index.]

77 Robust regression (Huber)
> reg.rob <- rlm(y ~ x)
> summary(reg.rob)
Call: rlm(formula = y ~ x)
[Summary output for the second data set; the numeric values are not preserved in the transcription. Recoverable structure: residual quantiles; coefficient table (Value, Std. Error, t value) for (Intercept) and x; residual standard error on 120 degrees of freedom; correlation of coefficients.]

78 Robust regression (Huber)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line.]

79 Robust regression (Huber)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y-predict.lm(reg.rob) plotted against the index.]

80 Robust regression (Huber)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index.]

81 Robust regression (Tukey)
> reg.rob <- rlm(y ~ x, method="MM")
> summary(reg.rob)
Call: rlm(formula = y ~ x, method = "MM")
[Summary output for the second data set; the numeric values are not preserved in the transcription. Recoverable structure: residual quantiles; coefficient table (Value, Std. Error, t value) for (Intercept) and x; residual standard error on 120 degrees of freedom; correlation of coefficients.]

82 Robust regression (Tukey)
> plot(x,y)
> abline(reg.rob)
[Figure: scatter plot of y against x with the robust regression line.]

83 Robust regression (Tukey)
> plot(y-predict.lm(reg.rob))
[Figure: residuals y-predict.lm(reg.rob) plotted against the index.]

84 Robust regression (Tukey)
> plot(reg.rob$w)
[Figure: weights reg.rob$w plotted against the index.]

85 Robust regression with R
After plotting the weights by
> plot(reg.rob$w)
clicking single points can be enabled by
> identify(1:length(reg.rob$w), reg.rob$w)
in order to get the indices of interesting weights.

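Since the data sets behind the preceding slides are not available, the following self-contained sketch runs the same calls on made-up data with a single gross outlier. It assumes that the package MASS is installed; the data, the seed and the object names `reg.hub` and `reg.mm` are illustrative choices.

```r
library(MASS)   # provides rlm()

# Hypothetical data: true line y = 2 + 0.5 x, one outlier at observation 10.
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 0.3)
y[10] <- y[10] + 30

reg.lsq <- lm(y ~ x)                    # least squares
reg.hub <- rlm(y ~ x)                   # Huber M-estimator (default)
reg.mm  <- rlm(y ~ x, method = "MM")    # MM-estimator (Tukey biweight)

print(rbind(lsq   = coef(reg.lsq),
            huber = coef(reg.hub),
            mm    = coef(reg.mm)))

# Non-interactive alternative to identify(): index of the smallest weight.
print(which.min(reg.hub$w))
```

In an interactive session, `plot(reg.hub$w)` followed by `identify(1:length(reg.hub$w), reg.hub$w)` allows the down-weighted observation to be picked with the mouse, as described above.
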
86 Multivariate regression
For simple linear regression $y \approx ax + b$, plotting the data often helps to identify problems and outliers.
This is no longer possible for multivariate regression
$$y_i = \alpha + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + \varepsilon_i = x_i^\top \beta + \varepsilon_i.$$
Here, methods like residual plots and residual analysis are possible ways to gain more insight into outliers and other problems.
In R, simply write for instance rlm(y ~ x1+x2+x3).

87 Two-way tables
Example: people were asked which political party they voted for in order to find out whether the choice of the party and the sex of the voter are independent.
[Table: political party by sex. Rows: SPD, CDU/CSU, Grüne, FDP, PDS, Others, No answer, sum. Columns: female, male, sum. The number of respondents and the counts are not preserved in the transcription.]

88 Two-way tables
In such contexts, typically statistical tests are applied, like
- the $\chi^2$-test (for independence, homogeneity),
- Fisher's exact test (for $2 \times 2$-tables),
- Kruskal-Wallis test,
- MANOVA,
- ...

89 Two-way tables
The tests are not robust, have very restrictive assumptions (MANOVA, Fisher's exact test), and the $\chi^2$-test is only an asymptotic test.
Alternative: Median polish

90 Median polish
Underlying (additive) model:
$$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$
- $\mu$: Overall typical value (general level)
- $\alpha_i$: Row effect (here: the political party)
- $\beta_j$: Column effect (here: the sex)
- $\varepsilon_{ij}$: Noise or random fluctuation

91 Median polish
Algorithm:
1. Subtract from each row its median.
2. For the updated table, subtract from each column its median.
3. Repeat 1. and 2. (with the corresponding updated tables) until convergence.

92 Median polish
Iterative estimation of the parameters:
$$m^{(0)} = 0; \quad a_i^{(0)} = 0; \quad b_j^{(0)} = 0; \quad e_{ij}^{(0)} = y_{ij}$$
Rows:
$$\Delta a_i^{(t)} = \mathrm{med}\{ e_{ij}^{(t-1)} \mid j \in \{1, \ldots, J\} \}$$
$$m_b^{(t)} = \mathrm{med}\{ b_j^{(t-1)} \mid j \in \{1, \ldots, J\} \}$$
$$d_{ij}^{(t)} = e_{ij}^{(t-1)} - \Delta a_i^{(t)}$$

93 Median polish
Columns:
$$\Delta b_j^{(t)} = \mathrm{med}\{ d_{ij}^{(t)} \mid i \in \{1, \ldots, I\} \}$$
$$m_a^{(t)} = \mathrm{med}\{ a_i^{(t-1)} + \Delta a_i^{(t)} \mid i \in \{1, \ldots, I\} \}$$
$$e_{ij}^{(t)} = d_{ij}^{(t)} - \Delta b_j^{(t)}$$
Common value and effects:
$$m^{(t)} = m^{(t-1)} + m_a^{(t)} + m_b^{(t)}$$
$$a_i^{(t)} = a_i^{(t-1)} + \Delta a_i^{(t)} - m_a^{(t)}$$
$$b_j^{(t)} = b_j^{(t-1)} + \Delta b_j^{(t)} - m_b^{(t)}$$

94 Median polish
After convergence, the remaining entries in the table correspond to the $\varepsilon_{ij}$.
Median polish in R is implemented by the function medpolish().

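The iteration from the preceding slides can be sketched in R as follows. The function `median_polish`, its stopping rule and the small example table are assumptions made for illustration; the built-in `medpolish()` is called at the end for comparison, although its stopping rule and bookkeeping may differ in detail.

```r
median_polish <- function(y, max_iter = 10, tol = 1e-8) {
  m <- 0                    # common value
  a <- rep(0, nrow(y))      # row effects
  b <- rep(0, ncol(y))      # column effects
  e <- y                    # residual table
  for (iter in seq_len(max_iter)) {
    # Rows: medians of the current residuals, median of the column effects.
    delta_a <- apply(e, 1, median)
    m_b     <- median(b)
    d       <- e - delta_a             # subtracts delta_a[i] from row i
    # Columns: medians of the row-adjusted table, median of the row effects.
    delta_b <- apply(d, 2, median)
    m_a     <- median(a + delta_a)
    e       <- sweep(d, 2, delta_b)    # subtracts delta_b[j] from column j
    # Common value and effects.
    m <- m + m_a + m_b
    a <- a + delta_a - m_a
    b <- b + delta_b - m_b
    if (max(abs(delta_a), abs(delta_b)) < tol) break
  }
  list(overall = m, row = a, col = b, residuals = e)
}

# Hypothetical 4 x 3 table with one unusually large entry (90).
y <- matrix(c(25, 32, 18, 11,
              30, 41, 22, 15,
              28, 35, 90, 13), nrow = 4)

fit <- median_polish(y)
print(fit)

# The decomposition reproduces the table exactly.
reconstructed <- fit$overall + outer(fit$row, fit$col, "+") + fit$residuals

# Built-in implementation for comparison.
print(medpolish(y, trace.iter = FALSE))
```

The unusually large entry shows up as a large residual instead of distorting the row and column effects.
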
95 Summary
- Robust statistics allows for deviations from the ideal model in which the sample is not contaminated.
- Robust methods rely on the majority of the data.
- Few outliers can be disregarded or their influence is reduced.

96-97 Key references
- F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel: Robust Statistics. The Approach Based on Influence Functions. Wiley, New York (1986)
- S. Heritier, E. Cantoni, S. Copt, M.-P. Victoria-Feser: Robust Methods in Biostatistics. Wiley, New York (2009)
- D.C. Hoaglin, F. Mosteller, J.W. Tukey: Understanding Robust and Exploratory Data Analysis. Wiley, New York (2000)
- P.J. Huber: Robust Statistics. Wiley, New York (2004)
- R. Maronna, D. Martin, V. Yohai: Robust Statistics: Theory and Methods. Wiley, Toronto (2006)
- P.J. Rousseeuw, A.M. Leroy: Robust Regression and Outlier Detection. Wiley, New York (1987)

98 Software
- R:
- Library: MASS
- Library: robustbase
- Library: rrcov
More information1 The Classic Bivariate Least Squares Model
Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating
More informationRobust model selection criteria for robust S and LT S estimators
Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often
More informationQuantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression
Quantitative Understanding in Biology Module II: Model Parameter Estimation Lecture I: Linear Correlation and Regression Correlation Linear correlation and linear regression are often confused, mostly
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationRobust scale estimation with extensions
Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance
More informationLab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model
Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More information9. Robust regression
9. Robust regression Least squares regression........................................................ 2 Problems with LS regression..................................................... 3 Robust regression............................................................
More informationLeverage. the response is in line with the other values, or the high leverage has caused the fitted model to be pulled toward the observed response.
Leverage Some cases have high leverage, the potential to greatly affect the fit. These cases are outliers in the space of predictors. Often the residuals for these cases are not large because the response
More informationLecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:
Lecture 2 Quantitative variables There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data: Stemplot (stem-and-leaf plot) Histogram Dot plot Stemplots
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationDiagnostics and Transformations Part 2
Diagnostics and Transformations Part 2 Bivariate Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University Multilevel Regression Modeling, 2009 Diagnostics
More information8. Nonstandard standard error issues 8.1. The bias of robust standard errors
8.1. The bias of robust standard errors Bias Robust standard errors are now easily obtained using e.g. Stata option robust Robust standard errors are preferable to normal standard errors when residuals
More informationFigure 1. Sketch of various properties of an influence function. Rejection point
Robust Filtering of NMR Images Petr Hotmar and Jarom r Kukal Prague Institute of Chemical Technology, Faculty of Chemical Engineering, Department of Computing and Control Engineering Introduction The development
More informationBIOL Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES
BIOL 458 - Biometry LAB 6 - SINGLE FACTOR ANOVA and MULTIPLE COMPARISON PROCEDURES PART 1: INTRODUCTION TO ANOVA Purpose of ANOVA Analysis of Variance (ANOVA) is an extremely useful statistical method
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationDescribing Distributions with Numbers
Topic 2 We next look at quantitative data. Recall that in this case, these data can be subject to the operations of arithmetic. In particular, we can add or subtract observation values, we can sort them
More informationAP Statistics Cumulative AP Exam Study Guide
AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics
More information1 Introduction 1. 2 The Multiple Regression Model 1
Multiple Linear Regression Contents 1 Introduction 1 2 The Multiple Regression Model 1 3 Setting Up a Multiple Regression Model 2 3.1 Introduction.............................. 2 3.2 Significance Tests
More informationUnits. Exploratory Data Analysis. Variables. Student Data
Units Exploratory Data Analysis Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison Statistics 371 13th September 2005 A unit is an object that can be measured, such as
More informationA Comparison of Robust Estimators Based on Two Types of Trimming
Submitted to the Bernoulli A Comparison of Robust Estimators Based on Two Types of Trimming SUBHRA SANKAR DHAR 1, and PROBAL CHAUDHURI 1, 1 Theoretical Statistics and Mathematics Unit, Indian Statistical
More informationIntroduction to Linear Regression
Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46
More informationDescriptive Data Summarization
Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning
More informationStatistics for Engineering, 4C3/6C3 Assignment 2
Statistics for Engineering, 4C3/6C3 Assignment 2 Kevin Dunn, kevin.dunn@mcmaster.ca Due date: 23 January 2014 Assignment objectives: interpreting data visualizations; univariate data analysis Question
More informationDetermining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
More informationLinear Regression Model. Badr Missaoui
Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus
More informationA Short Course in Basic Statistics
A Short Course in Basic Statistics Ian Schindler November 5, 2017 Creative commons license share and share alike BY: C 1 Descriptive Statistics 1.1 Presenting statistical data Definition 1 A statistical
More informationDetermining the Spread of a Distribution
Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58 Outline 1 Describing Quantitative
More informationF78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives
F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested
More informationWhy is the field of statistics still an active one?
Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with
More informationCorrelated Data: Linear Mixed Models with Random Intercepts
1 Correlated Data: Linear Mixed Models with Random Intercepts Mixed Effects Models This lecture introduces linear mixed effects models. Linear mixed models are a type of regression model, which generalise
More informationHighly Robust Variogram Estimation 1. Marc G. Genton 2
Mathematical Geology, Vol. 30, No. 2, 1998 Highly Robust Variogram Estimation 1 Marc G. Genton 2 The classical variogram estimator proposed by Matheron is not robust against outliers in the data, nor is
More informationStat 411/511 ESTIMATING THE SLOPE AND INTERCEPT. Charlotte Wickham. stat511.cwick.co.nz. Nov
Stat 411/511 ESTIMATING THE SLOPE AND INTERCEPT Nov 20 2015 Charlotte Wickham stat511.cwick.co.nz Quiz #4 This weekend, don t forget. Usual format Assumptions Display 7.5 p. 180 The ideal normal, simple
More informationInferences on Linear Combinations of Coefficients
Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationIntroduction to robust statistics*
Introduction to robust statistics* Xuming He National University of Singapore To statisticians, the model, data and methodology are essential. Their job is to propose statistical procedures and evaluate
More informationSAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c
Inference About the Slope ffl As with all estimates, ^fi1 subject to sampling var ffl Because Y jx _ Normal, the estimate ^fi1 _ Normal A linear combination of indep Normals is Normal Simple Linear Regression
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationWeek 7.1--IES 612-STA STA doc
Week 7.1--IES 612-STA 4-573-STA 4-576.doc IES 612/STA 4-576 Winter 2009 ANOVA MODELS model adequacy aka RESIDUAL ANALYSIS Numeric data samples from t populations obtained Assume Y ij ~ independent N(μ
More informationStatistics for Python
Statistics for Python An extension module for the Python scripting language Michiel de Hoon, Columbia University 2 September 2010 Statistics for Python, an extension module for the Python scripting language.
More informationPackage ForwardSearch
Package ForwardSearch February 19, 2015 Type Package Title Forward Search using asymptotic theory Version 1.0 Date 2014-09-10 Author Bent Nielsen Maintainer Bent Nielsen
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationAn Introduction to Descriptive Statistics. 2. Manually create a dot plot for small and modest sample sizes
Living with the Lab Winter 2013 An Introduction to Descriptive Statistics Gerald Recktenwald v: January 25, 2013 gerry@me.pdx.edu Learning Objectives By reading and studying these notes you should be able
More informationRobust estimation of scale and covariance with P n and its application to precision matrix estimation
Robust estimation of scale and covariance with P n and its application to precision matrix estimation Garth Tarr, Samuel Müller and Neville Weber USYD 2013 School of Mathematics and Statistics THE UNIVERSITY
More informationSmall Sample Corrections for LTS and MCD
myjournal manuscript No. (will be inserted by the editor) Small Sample Corrections for LTS and MCD G. Pison, S. Van Aelst, and G. Willems Department of Mathematics and Computer Science, Universitaire Instelling
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationMath 2311 Written Homework 6 (Sections )
Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.
More informationBiostatistics for physicists fall Correlation Linear regression Analysis of variance
Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody
More information13: Additional ANOVA Topics. Post hoc Comparisons
13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Post hoc Comparisons In the prior chapter we used ANOVA
More informationElementary Statistics
Elementary Statistics Q: What is data? Q: What does the data look like? Q: What conclusions can we draw from the data? Q: Where is the middle of the data? Q: Why is the spread of the data important? Q:
More informationLesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table
Lesson Plan Answer Questions Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 1 2. Summary Statistics Given a collection of data, one needs to find representations
More informationBeam Example: Identifying Influential Observations using the Hat Matrix
Math 3080. Treibergs Beam Example: Identifying Influential Observations using the Hat Matrix Name: Example March 22, 204 This R c program explores influential observations and their detection using the
More informationRobustness of location estimators under t- distributions: a literature review
IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Robustness of location estimators under t- distributions: a literature review o cite this article: C Sumarni et al 07 IOP Conf.
More informationRobustness and Distribution Assumptions
Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationChapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.
Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution. 1 Histograms p53 The breakfast cereal data Study collected data on nutritional
More information22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)
22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are
More informationThe entire data set consists of n = 32 widgets, 8 of which were made from each of q = 4 different materials.
One-Way ANOVA Summary The One-Way ANOVA procedure is designed to construct a statistical model describing the impact of a single categorical factor X on a dependent variable Y. Tests are run to determine
More informationEXTENDING PARTIAL LEAST SQUARES REGRESSION
EXTENDING PARTIAL LEAST SQUARES REGRESSION ATHANASSIOS KONDYLIS UNIVERSITY OF NEUCHÂTEL 1 Outline Multivariate Calibration in Chemometrics PLS regression (PLSR) and the PLS1 algorithm PLS1 from a statistical
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationunadjusted model for baseline cholesterol 22:31 Monday, April 19,
unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol
More informationFrequency Distribution Cross-Tabulation
Frequency Distribution Cross-Tabulation 1) Overview 2) Frequency Distribution 3) Statistics Associated with Frequency Distribution i. Measures of Location ii. Measures of Variability iii. Measures of Shape
More informationA SHORT COURSE ON ROBUST STATISTICS. David E. Tyler Rutgers The State University of New Jersey. Web-Site dtyler/shortcourse.
A SHORT COURSE ON ROBUST STATISTICS David E. Tyler Rutgers The State University of New Jersey Web-Site www.rci.rutgers.edu/ dtyler/shortcourse.pdf References Huber, P.J. (1981). Robust Statistics. Wiley,
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationISQS 5349 Final Exam, Spring 2017.
ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC
More information