
One-sample t-test (test of the mean)

Assumptions: independent, random sample; approximately normal distribution. (From intro class: σ is unknown, so we need to calculate and use s, the sample standard deviation.)

Hypotheses:
H0: μ = μ0
HA: μ ≠ μ0 (or use > or < in place of ≠)
In most software, the default sign of the alternative hypothesis is ≠. In SAS (and other programs), the alternative can be changed.

t = (x̄ − μ0) / (s / √n)

df = degrees of freedom = n − 1

Rejection criteria: reject H0 if p-value ≤ α. Assume that α = 0.05 unless stated otherwise.

General form of PROC TTEST:

PROC TTEST <options>;
CLASS variable;
VAR variable(s);
PAIRED variables;
...
RUN;

Options for the PROC TTEST statement:
DATA= SAS-dataset
ALPHA= specifies the significance level
H0= specifies the null value (μ0)
SIDES= specifies a one- or two-tailed test (2, U, L)
CI= requests confidence intervals for the standard deviation or the coefficient of variation
PLOTS produces statistical graphs (histograms and QQ plots)
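These formulas (along with the pooled and Satterthwaite two-sample versions covered in the next section) are easy to check numerically. A minimal Python sketch, with made-up data and hypothetical function names; not part of the course's SAS materials:

```python
import math
from statistics import mean, stdev  # stdev uses the n-1 (sample) denominator

def one_sample_t(x, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(x)
    t = (mean(x) - mu0) / (stdev(x) / math.sqrt(n))
    return t, n - 1

def pooled_t(x1, x2):
    """Pooled 2-sample t: assumes sigma1^2 = sigma2^2; df = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2) / (n1 + n2 - 2)
    t = (mean(x1) - mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

def satterthwaite_df(x1, x2):
    """Unpooled (Satterthwaite) approximate degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = stdev(x1) ** 2 / n1, stdev(x2) ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical data, for illustration only
x = [2, 4, 4, 4, 5, 5, 7, 9]
t, df = one_sample_t(x, mu0=4)        # t ≈ 1.32, df = 7
a, b = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
tp, dfp = pooled_t(a, b)              # t ≈ -1.90, df = 8
dfu = satterthwaite_df(a, b)          # ≈ 5.88 (vs. the rough min(n1-1, n2-1) = 4)
```

In practice the software reports the p-value for you; this sketch only reproduces the test statistics and degrees of freedom.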

2-sample t-tests

Independent samples:
Pooled t-test: assumes σ1² = σ2²
Unpooled t-test (Satterthwaite): assumes σ1² ≠ σ2²

Hypotheses:
H0: μ1 = μ2
HA: μ1 ≠ μ2 (can use > or <)
Or:
H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0

Pooled t calculation:
t = [(x̄1 − x̄2) − 0] / [s_p √(1/n1 + 1/n2)]
where s_p² = [s1²(n1 − 1) + s2²(n2 − 1)] / (n1 + n2 − 2) and df = n1 + n2 − 2

Unpooled t calculation:
t = [(x̄1 − x̄2) − 0] / √(s1²/n1 + s2²/n2)
df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
But no thank you! A decent approximation is df ≈ min(n1 − 1, n2 − 1), and SAS will calculate the exact value for us.

Reject H0 if p-value ≤ α.

Dependent samples (paired t-test):
The assumptions are the same as for the independent-samples tests, but rather than independent random samples, the samples are dependent (still random) because each unit or subject is measured twice: once before treatment and once after treatment. The treatment does not have to be a literal treatment; it can be time (like GDP measured in countries over two time periods).

Hypotheses:
H0: μD = Δ0
HA: μD ≠ Δ0 (can use > or <)

This test is done on the differences between the two measurements on each subject/unit. Once the differences are calculated, the test is equivalent to a one-sample test of the mean.

d_i = x_i − y_i, where x and y are the two measurements per subject
d̄ = average difference = (Σ d_i) / n
s_d = st. dev. of differences = √( [Σ d_i² − (Σ d_i)²/n] / (n − 1) )
t = (d̄ − Δ0) / (s_d / √n) and df = n − 1

Reject H0 if p-value ≤ α.

Simple linear regression (aka least-squares regression, etc.):
The main purpose of regression is to explore the relationship between variables x and y, where x is the independent (explanatory) variable and y is the dependent (response) variable. Most often we calculate a regression equation (of the form y = mx + b) to use in interpolation (estimation of y at an x that is within the known range of x values).

Population regression model: y_i = β0 + β1 x_i + ε_i, where:
β0 is the intercept (the mean of y when x = 0)
β1 is the slope
ε_i is the random error (residual) term

Sample regression equation: ŷ = β̂0 + β̂1 x_i
The hats mean estimated values. Note that the error term drops off. This is based on the assumptions of regression:

1. E(ε_i) = 0, where ε_i = y_i − ŷ_i. The residuals have a mean of 0.
2. V(ε_i) = σ_ε². The variance of the residuals is homogeneous (constant for all values of x).
3. Cov(ε_i, ε_j) = 0 for i ≠ j. The residuals are independent (with the normality assumption below, zero covariance implies independence).
4. ε_i ~ N(0, σ_ε²). The residuals should have an approximately normal distribution with mean 0 and variance σ_ε².
(The underlying assumption is that we do, in fact, have a linear relationship between x and y.)

There is a lot more to regression than we get a chance to look at, but this is a good start.

β̂1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
β̂0 = ȳ − β̂1 x̄

Correlation:
Population notation: ρ (rho). Sample notation: r.
While exploring the relationship between x and y, we will want to know how strong the relationship is. Correlation explores the strength and direction (as in the direction of the slope) of the linear relationship between x and y.
Correlation is unitless: it has no units of measurement associated with it.
−1 ≤ r ≤ 1, where the endpoints (−1, 1) are perfect linear relationships; the closer r is to 0, the weaker the relationship, or there is a lack of relationship overall.
r makes no distinction between x and y; the units of measurement can be changed and it will not change the correlation.

(Scatterplot examples of different r values were shown here.)
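The least-squares estimates and the sample correlation can be computed directly from the sums above. A minimal Python sketch, with made-up data and hypothetical function names:

```python
import math

def slr_fit(x, y):
    """Least-squares estimates: b1 = Sxy/Sxx, b0 = ybar - b1*xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return b1, ybar - b1 * xbar

def correlation(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy); unitless, -1 <= r <= 1."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b1, b0 = slr_fit(x, y)    # b1 = 0.6, b0 = 2.2
r = correlation(x, y)     # ≈ 0.775
```

Note that rescaling x or y (say, x in centimeters instead of meters) changes b1 and b0 but leaves r unchanged, matching the point above.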

Ways to test whether the relationship between x and y is significant (SAS does 2 of the 3 in PROC REG):
1. test the slope
2. test the overall model
3. test the correlation

1. Slope test: a t-test of the following hypotheses:
H0: β1 = 0
HA: β1 ≠ 0
(can use < or >, and the value can be something other than 0 if you are testing for a change in the slope rather than just significance of the slope)

t = (β̂1 − 0) / se(β̂1) and df = n − k, where k = # of estimated parameters (in SLR, k = 2)

2. Overall F test: a test of the following hypotheses about the slopes:
H0: β1 = β2 = … = βk = 0
HA: at least one βk ≠ 0
But in SLR there is only one slope, so it gives the same result as the t-test for the slope.

F = MSregression / MSE
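In SLR the two tests agree: the overall F is exactly the square of the slope t. A minimal Python sketch illustrating this, with made-up data and a hypothetical function name:

```python
import math

def slope_and_f_tests(x, y):
    """Slope t = b1 / se(b1) with df = n - 2, and overall F = MSR/MSE.

    In SLR these are the same test: F = t^2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                # df = n - k with k = 2 estimated parameters
    t = b1 / math.sqrt(mse / sxx)      # se(b1) = sqrt(MSE / Sxx)
    msr = b1 ** 2 * sxx                # SSR = b1^2 * Sxx; regression df = 1, so MSR = SSR
    return t, msr / mse

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t, F = slope_and_f_tests(x, y)   # t ≈ 2.121, F = 4.5, and F == t^2
```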

3. Correlation test (not included in PROC REG): a test of the following hypotheses about the correlation:
H0: ρ = 0
HA: ρ ≠ 0
(can use < or >, and the value can be something other than 0 if you are testing for a change in the correlation rather than just significance of the correlation)

t = r√(n − 2) / √(1 − r²) and df = n − 2

Experimental design: Completely Randomized Design (CRD):
The limitation of the 2-sample tests is: what happens when you have more than 2 samples? You cannot just do several 2-sample tests because the Type I error rate will increase. Analysis of variance is for when you have more than two means to compare. However, its drawback is that it will not detect where the differences are, just that there are differences. The assumptions are the same as for SLR.

Hypotheses:
H0: μ1 = μ2 = … = μt
HA: at least one μi differs

Source     df    SS    MS    F
Treatment  t−1   SSTr  MSTr  MSTr/MSE
Error      N−t   SSE   MSE   n/a
Total      N−1   TSS   n/a   n/a

t = number of treatment groups (factor levels)
N = total number of observations in the experiment
SSTr = sum of squares for treatment
MSTr = mean square for treatment
SSE = sum of squares for error (residuals)
MSE = mean square error (aka the residual variance σ̂_ε²)

SSTr = Σ n_i (x̄_i. − x̄..)²
SSE = Σ s_i²(n_i − 1)
MSTr = SSTr / (t − 1)
MSE = SSE / (N − t)

The only thing ANOVA can do by itself is tell you that there is a difference in at least one of the group means, and nothing else, without a follow-up multiple comparison.
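The ANOVA table entries can be built directly from the group means and sample variances. A minimal Python sketch, with made-up groups and a hypothetical function name:

```python
from statistics import mean, variance  # variance uses the n-1 (sample) denominator

def one_way_anova(groups):
    """Return (SSTr, SSE, MSTr, MSE, F) for a one-way (CRD) ANOVA."""
    t = len(groups)                          # number of treatment groups
    N = sum(len(g) for g in groups)          # total observations
    grand = sum(sum(g) for g in groups) / N  # grand mean x-bar..
    sstr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((len(g) - 1) * variance(g) for g in groups)
    mstr = sstr / (t - 1)
    mse = sse / (N - t)
    return sstr, sse, mstr, mse, mstr / mse

# Hypothetical data: three treatment groups
groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
sstr, sse, mstr, mse, F = one_way_anova(groups)
# SSTr = 14, SSE = 6, MSTr = 7, MSE = 1, F = 7, compared against F(t-1, N-t) = F(2, 6)
```

A significant F here would still need a follow-up multiple-comparison procedure to say which group means differ.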