Nonparametric tests
Timothy Hanson
Department of Statistics, University of South Carolina
Stat 704: Data Analysis I
If data do not come from a normal population (and the sample is not large), we cannot use a t-test. One useful approach to constructing test statistics is through the use of rank statistics.

Sign test for one population

The sign test assumes the data come from a continuous distribution with model

$Y_i = \eta + \epsilon_i$, $i = 1, \ldots, n$,

where $\eta$ is the population median and the $\epsilon_i$ follow an unknown, continuous distribution. We want to test $H_0: \eta = \eta_0$, where $\eta_0$ is known, versus one of the three common alternatives: $H_a: \eta < \eta_0$, $H_a: \eta \neq \eta_0$, or $H_a: \eta > \eta_0$.

The test statistic is $B = \sum_{i=1}^n I\{Y_i > \eta_0\}$, the number of $Y_1, \ldots, Y_n$ larger than $\eta_0$. Under $H_0$, $B \sim \text{Bin}(n, 0.5)$. Reject $H_0$ if $B$ is unusual relative to this binomial distribution.

Example: Eye relief data.

Question: How would you form a large-sample test statistic from $B$? You do not need to do that here, but this is common with more complex test statistics that have non-standard distributions.
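
A minimal sketch of the sign test in R, using binom.test for the exact binomial p-value; the data vector and eta0 below are hypothetical, not the eye relief data:

# Sign test of H0: eta = eta0 (hypothetical data)
y    <- c(5.1, 3.9, 6.2, 4.7, 7.0, 5.6, 4.5, 6.8)
eta0 <- 5
B    <- sum(y > eta0)        # number of observations above eta0
n    <- length(y)
binom.test(B, n, p = 0.5)    # exact test against B ~ Bin(n, 0.5)
# Large-sample version: Z = (B - n/2) / sqrt(n/4), compared to N(0, 1)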

Wilcoxon signed rank test

Again, we test $H_0: \eta = \eta_0$. However, this method assumes a pdf that is symmetric around the median $\eta$. The test statistic is built from the ranks of $\{|Y_1 - \eta_0|, |Y_2 - \eta_0|, \ldots, |Y_n - \eta_0|\}$, denoted $R_1, \ldots, R_n$. The signed rank for observation $i$ is

$R_i^+ = \begin{cases} R_i & Y_i > \eta_0 \\ 0 & Y_i \le \eta_0. \end{cases}$

The signed rank statistic is $W^+ = \sum_{i=1}^n R_i^+$. If $W^+$ is large, this is evidence that $\eta > \eta_0$. If $W^+$ is small, this is evidence that $\eta < \eta_0$.
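
The same hypothetical data can be run through R's built-in signed rank test; the mu argument plays the role of eta0:

# Wilcoxon signed rank test of H0: eta = eta0 (same hypothetical data)
wilcox.test(y, mu = eta0)                           # two-sided by default
wilcox.test(y, mu = eta0, alternative = "greater")  # H_a: eta > eta0
# The reported statistic V is W+ as defined above.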

Both the sign test and the signed rank test can be used with paired data (e.g., we could test whether the median difference is zero).

When to use what?

Use the t-test when data are approximately normal, or when the sample size is large.
Use the sign test when data are highly skewed, multimodal, etc.
Use the signed rank test when data are approximately symmetric but non-normal (e.g., heavy- or light-tailed, multimodal yet symmetric, etc.).

Example: Weather station data.

Note: The sign test and signed rank test are more flexible than the t-test because they require less strict assumptions, but the t-test has more power when the data are approximately normal.

Mann-Whitney-Wilcoxon nonparametric test (pp. 795-796)

The Mann-Whitney test assumes

$Y_{11}, \ldots, Y_{1n_1} \stackrel{iid}{\sim} F_1$ independent of $Y_{21}, \ldots, Y_{2n_2} \stackrel{iid}{\sim} F_2$,

where $F_1$ is the cdf of data from the first group and $F_2$ is the cdf of data from the second group. The null hypothesis is $H_0: F_1 = F_2$, i.e. that the distributions of data in the two groups are identical. The alternative is $H_1: F_1 \neq F_2$. One-sided tests can also be performed.

Although the test statistic is built assuming $F_1 = F_2$, the alternative is often taken to be that the population medians are unequal. This is a fine way to report the results of the test. Additionally, a CI will give a plausible range for $\Delta$ in the shift model $F_2(x) = F_1(x - \Delta)$; $\Delta$ can be the difference in medians or means.

The Mann-Whitney test is intuitive. The data are $y_{11}, y_{12}, \ldots, y_{1n_1}$ and $y_{21}, y_{22}, \ldots, y_{2n_2}$.

For each observation $j$ in the first group, count the number of observations in the second group, $c_j$, that are smaller; ties result in adding 0.5 to the count. If $H_0$ is true, so that the two groups come from the same distribution, on average half the observations in group 2 would be above $Y_{1j}$ and half would be below. That is, $E(c_j) = 0.5 n_2$.

The sum of these counts is $U = \sum_{j=1}^{n_1} c_j$, which has mean $E(U) = 0.5 n_1 n_2$. The variance is a bit harder to derive, but is $\text{var}(U) = n_1 n_2 (n_1 + n_2 + 1)/12$.

Something akin to the CLT tells us that

$Z_0 = \frac{U - E(U)}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}} \sim N(0, 1)$

when $H_0$ is true. Seeing a $U$ far away from what we expect under the null gives evidence that $H_0$ is false; $U$ is standardized as usual (subtract off the mean we expect under the null and divide by an estimate of the standard deviation of $U$). A p-value can be computed as usual, as well as a CI.

Example: Dental measurements.

Note: This test essentially boils down to replacing the observed values with their ranks and carrying out a simple pooled t-test!
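
A sketch in R of the counting logic and the large-sample standardization, checked against the built-in test; the two samples are hypothetical stand-ins, not the dental data:

# Mann-Whitney U by direct counting (hypothetical two-sample data)
y1 <- c(23, 27, 31, 25, 29)
y2 <- c(20, 24, 22, 26, 21, 28)
n1 <- length(y1); n2 <- length(y2)
U  <- sum(outer(y1, y2, ">") + 0.5 * outer(y1, y2, "=="))  # ties add 0.5
EU <- 0.5 * n1 * n2
VU <- n1 * n2 * (n1 + n2 + 1) / 12
Z0 <- (U - EU) / sqrt(VU)
2 * pnorm(-abs(Z0))                   # large-sample two-sided p-value
wilcox.test(y1, y2, conf.int = TRUE)  # exact version; W equals U, CI is for the shift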

Regression models

Model: a mathematical approximation of the relationship among real quantities (an equation plus assumptions about its terms).

We have seen several models for a single outcome variable from either one or two populations. Now we'll consider models that relate an outcome to one or more continuous predictors.

Section 1.1: relationships between variables

A functional relationship between two variables is deterministic, e.g. $Y = \cos(2.1x) + 4.7$. Although often an approximation to reality (e.g. the solution to a differential equation under simplifying assumptions), the relation itself is perfect. (See page 3.)

A statistical or stochastic relationship introduces some error in observing $Y$; it is typically a functional relationship plus noise. (See Figures 1.1, 1.2, and 1.3, pp. 4-5.) A statistical relationship is not a perfect line or curve, but a general tendency plus slop.

Example:

x: 1 2 3 4 5
y: 1 1 2 2 4

Let's obtain the plot in R:

x <- c(1, 2, 3, 4, 5); y <- c(1, 1, 2, 2, 4)
plot(x, y)

We must decide on the proper functional form for this relationship, i.e. the trend: Linear? Curved? Piecewise continuous?

Section 1.3: Simple linear regression model

For a sample of pairs $\{(x_i, Y_i)\}_{i=1}^n$, let

$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, \ldots, n$,

where

$Y_1, \ldots, Y_n$ are realizations of the response variable,
$x_1, \ldots, x_n$ are the associated predictor values,
$\beta_0$ is the intercept of the regression line,
$\beta_1$ is the slope of the regression line, and
$\epsilon_1, \ldots, \epsilon_n$ are unobserved, uncorrelated random errors.

This model assumes that $x$ and $Y$ are linearly related, i.e. the mean of $Y$ changes linearly with $x$.
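
A small R sketch of what the model asserts, simulating data from an assumed true line; the parameter values below are made up for illustration:

# Simulate from Y_i = beta0 + beta1 * x_i + eps_i (assumed parameter values)
set.seed(1)
n     <- 50
x     <- runif(n, 0, 10)
beta0 <- 2; beta1 <- 0.5; sigma <- 1
eps   <- rnorm(n, mean = 0, sd = sigma)  # mean-zero, constant-variance errors
Y     <- beta0 + beta1 * x + eps
plot(x, Y)
abline(beta0, beta1)                     # the true (usually unknown) line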

Assumptions about the random errors

We assume that $E(\epsilon_i) = 0$, $\text{var}(\epsilon_i) = \sigma^2$, and $\text{corr}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$: mean zero, constant variance, uncorrelated.

$\beta_0 + \beta_1 x_i$ is the deterministic part of the model. It is fixed but unknown. $\epsilon_i$ represents the random part of the model. The goal of statistics is often to separate signal from noise; which is which here?

Note that

$E(Y_i) = E(\beta_0 + \beta_1 x_i + \epsilon_i) = \beta_0 + \beta_1 x_i + E(\epsilon_i) = \beta_0 + \beta_1 x_i$,

and similarly

$\text{var}(Y_i) = \text{var}(\beta_0 + \beta_1 x_i + \epsilon_i) = \text{var}(\epsilon_i) = \sigma^2$.

Also, $\text{corr}(Y_i, Y_j) = 0$ for $i \neq j$. (We will draw a picture...)

Example (pages 10-11, see picture): We have $E(Y) = 9.5 + 2.1x$. When $x = 45$, the expected $y$-value is $9.5 + 2.1(45) = 104$, but we will actually observe a value somewhere around 104.

What does 9.5 represent here? Is it sensible/interpretable? How is 2.1 interpreted here? In general, $\beta_1$ represents how the mean response changes when $x$ is increased one unit.

Matrix form

Note that the simple linear regression model can be written in matrix terms as

$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix},$

or equivalently $Y = X\beta + \epsilon$, where

$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}.$

This will be useful later on.
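
For concreteness, the design matrix $X$ for the five-point example can be built directly in R; a sketch:

# Design matrix X for the five-point example: a column of 1s and a column of x
x <- c(1, 2, 3, 4, 5); y <- c(1, 1, 2, 2, 4)
X <- cbind(1, x)
X
model.matrix(y ~ x)   # lm's internal construction of the same matrix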

Section 1.6: Estimation of $(\beta_0, \beta_1)$

$\beta_0$ and $\beta_1$ are unknown parameters to be estimated from the data $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$. They completely determine the unknown mean at each value of $x$: $E(Y) = \beta_0 + \beta_1 x$.

Since we expect the various $Y_i$ to be both above and below $\beta_0 + \beta_1 x_i$ by roughly the same amount ($E(\epsilon_i) = 0$), a good-fitting line $b_0 + b_1 x$ will go through the heart of the data points in a scatterplot. The method of least squares formalizes this idea by minimizing the sum of the squared deviations of the observed $Y_i$ from the line $b_0 + b_1 x_i$.

Least squares method for estimating $(\beta_0, \beta_1)$

The sum of squared deviations about the line is

$Q(\beta_0, \beta_1) = \sum_{i=1}^n [Y_i - (\beta_0 + \beta_1 x_i)]^2.$

Least squares minimizes $Q(\beta_0, \beta_1)$ over all $(\beta_0, \beta_1)$. Calculus shows that the least squares estimators are

$b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{x}.$
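
These formulas are easy to check numerically; a sketch in R using the five-point example from earlier:

# Least squares estimates from the closed-form expressions, checked with lm()
x  <- c(1, 2, 3, 4, 5); y <- c(1, 1, 2, 2, 4)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0 = b0, b1 = b1)   # -0.1 and 0.7 for these data
coef(lm(y ~ x))       # the same estimates from R's built-in fitter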

Proof: Differentiating,

$\frac{\partial Q}{\partial \beta_1} = \sum_{i=1}^n 2(Y_i - \beta_0 - \beta_1 x_i)(-x_i) = -2\left[\sum_{i=1}^n x_i Y_i - \beta_0 \sum_{i=1}^n x_i - \beta_1 \sum_{i=1}^n x_i^2\right],$

$\frac{\partial Q}{\partial \beta_0} = \sum_{i=1}^n 2(Y_i - \beta_0 - \beta_1 x_i)(-1) = -2\left[\sum_{i=1}^n Y_i - n\beta_0 - \beta_1 \sum_{i=1}^n x_i\right].$

Setting these equal to zero and dropping the indexes on the summations, we have the normal equations:

$\sum x_i Y_i = b_0 \sum x_i + b_1 \sum x_i^2,$
$\sum Y_i = n b_0 + b_1 \sum x_i.$

Multiply the first by $n$, multiply the second by $\sum x_i$, and subtract, yielding

$n \sum x_i Y_i - \sum x_i \sum Y_i = b_1 \left[n \sum x_i^2 - \left(\sum x_i\right)^2\right].$

Solving for $b_1$ we get

$b_1 = \frac{n \sum x_i Y_i - \sum x_i \sum Y_i}{n \sum x_i^2 - (\sum x_i)^2} = \frac{\sum x_i Y_i - n\bar{Y}\bar{x}}{\sum x_i^2 - n\bar{x}^2}.$

The second normal equation immediately gives $b_0 = \bar{Y} - b_1 \bar{x}$.

Our solution for $b_1$ is correct but not as aesthetically pleasing as the purported solution

$b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.$

Exercise: show that

$\sum (x_i - \bar{x})(Y_i - \bar{Y}) = \sum x_i Y_i - n\bar{Y}\bar{x}$ and $\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2.$

A numerical sanity check of both identities appears below.
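
As a quick sanity check (not a proof), both identities can be verified numerically in R on the running example:

# Numerical check of the two summation identities (not a proof)
x <- c(1, 2, 3, 4, 5); y <- c(1, 1, 2, 2, 4)
n <- length(x)
all.equal(sum((x - mean(x)) * (y - mean(y))), sum(x * y) - n * mean(y) * mean(x))
all.equal(sum((x - mean(x))^2), sum(x^2) - n * mean(x)^2)
# Both return TRUE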

The line $\hat{Y} = b_0 + b_1 x$ is called the least squares estimated regression line.

Why are the least squares estimates $(b_0, b_1)$ good?

1. They are unbiased: $E(b_0) = \beta_0$ and $E(b_1) = \beta_1$.
2. Among all linear unbiased estimators, they have the smallest variance. They are the best linear unbiased estimators (BLUEs).

We will show the first property in the next class. The second property is formally called the Gauss-Markov theorem and is proved in a linear models course. (Look it up on Wikipedia if interested.)