One-sided and two-sided t-test

Given a mean cancer rate in Montreal:
1. What is the probability of finding a deviation of > 1 stdev from the mean?
2. What is the probability of finding 1 stdev more cases?

[figure: standard normal curve, x axis from -5.0 to 5.0 standard deviations]

Regression analysis - ANOVA

Distribution of variance in regression analysis:

  var source    sum of squares    d.f.     variance
  regression    SSR               1        MSR
  deviation     SSD               n - 2    MSD
  total         SSTOT             n - 1

What are the d.f. for each contribution?
- deviation: you need the β1 and β0 coefficients to determine the predicted value of Y, which you need for SSD, so the d.f. = n - 2
- regression: only 1 degree of freedom, as the slope fixes the relation between the variables; you can only shift the curve up or down
- total: the sum of the regression d.f. and the deviation d.f.
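The SSR/SSD/SSTOT bookkeeping above can be verified numerically. A minimal sketch in Python on synthetic data (all variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 22)               # n = 22 synthetic data points
y = 3.0 * x + 5.0 + rng.normal(0, 2, x.size)

b1, b0 = np.polyfit(x, y, 1)             # least-squares slope and intercept
y_hat = b1 * x + b0

ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares, d.f. = n - 1
ss_r = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares, d.f. = 1
ss_d = np.sum((y - y_hat) ** 2)          # deviation (residual) SS, d.f. = n - 2

print(ss_tot, ss_r + ss_d)               # the two agree: SSTOT = SSR + SSD
```

The identity SSTOT = SSR + SSD holds exactly for a least-squares fit with an intercept, which is why the total d.f. (n - 1) splits into 1 + (n - 2).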

Regression analysis - ANOVA

  var source    sum of squares    d.f.     variance
  regression    SSR               1        MSR
  deviation     SSD               n - 2    MSD
  total         SSTOT             n - 1

Variance = sum of squares divided by the degrees of freedom:
  s²D = MSD = SSD / (n - 2)   and   s²R = MSR = SSR / 1

This can be used to determine whether the regression fit is significant, following our earlier ANOVA approach: MSR has to be significantly larger than MSD at alpha. F-test on the ratio of MSR and MSD (H0: MSR = MSD).

Regression analysis

What if the fit is not significant?
1. There is no correlation between the variables: plot the data in a scatter diagram and check.
2. The correlation is weak and not significant due to a lack of data: obtain more data, or accept a larger value of alpha.
3. The data are correlated, but the correlation is not linear: repeat the same exercise using a more appropriate curve:
   quadratic:       Y = b1X + b2X² + b0
   exponential:     Y = b0 EXP(b1X)
   reciprocal:      Y = 1 / (b1X + b0)
   multiple linear: Y = b1X1 + b2X2 + b3X3 + b0
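When the relation is curved (case 3 above), refitting with a more appropriate curve is straightforward. A hedged sketch comparing a linear and a quadratic fit on synthetic, deliberately quadratic data (hypothetical numbers, not the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 30)
y = 2.0 + 1.5 * x + 0.8 * x**2 + rng.normal(0, 0.5, x.size)  # curved data

def r_squared(y, y_hat):
    """R² = 1 - SSD / SSTOT, as defined in the ANOVA table above."""
    ss_d = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_d / ss_tot

lin = np.polyval(np.polyfit(x, y, 1), x)    # Y = b1X + b0
quad = np.polyval(np.polyfit(x, y, 2), x)   # Y = b1X + b2X² + b0
print(r_squared(y, lin), r_squared(y, quad))  # quadratic fits better
```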

Linear regression with the statistics package PAST

[figure: data with linear fit]

Linear regression with the statistics package PAST

[figure: data with linear fit, plus the fit statistics]

Linear regression with the statistics package PAST

Are the coefficients of the linear fit significant?
  H0: a = 0, b = 0
  HA: a ≠ 0, b ≠ 0
  t_calc = (a - 0) / stdev(a)   and   t_calc = (b - 0) / stdev(b)
  slope:     t_calc = 8.16 > t(α,df) = 2.08    -> reject H0
  intercept: t_calc = -1.59, |t_calc| < t(α,df) = 2.08   -> cannot reject H0

Linear regression with the statistics package PAST

Is the fit significant?
  mean Y = 340
  SSD = SSResidual = 345975
  SSR = 1153347
  SSTOT = SSR + SSD = 1499321
  R² = SSR / SSTOT = 0.77
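The coefficient t-tests (coefficient divided by its standard error, compared against t at α = 0.05 and d.f. = n - 2) can be reproduced with scipy. A sketch on synthetic data with n = 22, so the critical value matches the slide's 2.08; the `intercept_stderr` attribute requires scipy >= 1.7:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 22)                     # n = 22, so d.f. = 20
y = 4.0 * x + rng.normal(0, 3, x.size)         # true intercept is 0

res = stats.linregress(x, y)
t_slope = res.slope / res.stderr                    # H0: slope = 0
t_intercept = res.intercept / res.intercept_stderr  # H0: intercept = 0
t_crit = stats.t.ppf(0.975, df=x.size - 2)          # two-sided, alpha = 0.05

print(t_slope, t_intercept, t_crit)            # t_crit ~ 2.086, the slide's 2.08
```

Here the slope is clearly significant while the intercept (whose true value is zero) typically is not, mirroring the PAST output above.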

Linear regression with the statistics package PAST

  var source    sum of squares      d.f.         variance
  regression    SSR = 1153347       1            s²R = 1153347
  deviation     SSD = 345975        n - 2 = 20   s²D = 17299
  total         SSTOT = 1499321     n - 1 = 21

  s²D = SSD / (n - 2)   and   s²R = SSR / 1

For the regression model to be meaningful, s²R has to be significantly larger than s²D at your chosen confidence level: F-test on the ratio of s²R and s²D (H0: s²R = s²D).
  Fcalc = 66.67 > F(0.05, 1, 20) = 4.35  ->  the model is meaningful

Linear regression with PAST

The F-ratio is sufficiently high that we can reject the H0 hypothesis: the regression fit explains a significant part of the total variance and is therefore meaningful.

[figure: residuals]
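The numbers in this ANOVA table can be checked directly from the quoted sums of squares; only scipy is assumed:

```python
from scipy import stats

ss_r, ss_d = 1_153_347.0, 345_975.0    # sums of squares from the PAST output
df_r, df_d = 1, 20                     # n = 22 data points
ms_r = ss_r / df_r
ms_d = ss_d / df_d                     # = 17298.75, the table's 17299
f_calc = ms_r / ms_d                   # ~ 66.67
f_crit = stats.f.ppf(0.95, df_r, df_d) # ~ 4.35
print(f_calc, f_crit, f_calc > f_crit) # regression is significant
```

s²D = 345975 / 20 = 17298.75 and Fcalc = 1153347 / 17298.75 ≈ 66.67, matching the slide.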

Appropriate fit for this dataset

Even though the linear regression fit is significant, it is not necessarily the most appropriate fit for the data:
  linear fit:               R² = 0.77
  3rd-order polynomial fit: R² = 0.94

Cubic regression with PAST

[figure: cubic fit and residuals]

The F-ratio is higher than before: a more significant model for the data. The probability of obtaining this result purely by chance is 1 / 100000000000.
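Comparing a linear and a cubic fit by R², as the slide does, is a one-liner per model with numpy.polyfit. A sketch on synthetic data with a genuine cubic trend (not the lecture's dataset):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 40)
y = x**3 - 2.0 * x + rng.normal(0, 1, x.size)   # data with a cubic trend

def r_squared(deg):
    """R² of a polynomial fit of the given degree."""
    y_hat = np.polyval(np.polyfit(x, y, deg), x)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(r_squared(1), r_squared(3))   # the cubic captures structure the line misses
```

As in the slide, a higher-order fit can raise R² substantially; the F-test (with the extra parameters counted against the d.f.) is what confirms the improvement is real.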

Regression and curve fitting in PAST

[figure: PAST curve-fitting options]

Multiple linear regression with PAST

The dependent is a linear combination of many independents: the composition of a soil will be the sum of the compositions of its constituents, multiplied by their respective fractions in the soil:

          clay   quartz   plag   micas   organic
  Cu       25    0         5     120     2500
  Pb       16    0.1      50     260     1200
  Ni        8    0         3      14      890
  Co        2    0         1       4      651
  Zn       40    0.5      23      64     2200
  Zr        8    4        16       4       56
  Ti      120    8         8     140       80

  Cu(soil) = Xclay*25 + Xqtz*0 + Xplag*5 + Xmicas*120 + Xorganic*2500
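Recovering the phase fractions from this table is a linear least-squares problem: the 7 element equations are the data, the 5 phase fractions are the unknowns. A sketch using the table's numbers and a hypothetical set of fractions of my own choosing:

```python
import numpy as np

# element concentrations in each phase, from the table above
# (rows: Cu, Pb, Ni, Co, Zn, Zr, Ti; columns: clay, quartz, plag, micas, organic)
A = np.array([
    [ 25, 0.0,  5, 120, 2500],
    [ 16, 0.1, 50, 260, 1200],
    [  8, 0.0,  3,  14,  890],
    [  2, 0.0,  1,   4,  651],
    [ 40, 0.5, 23,  64, 2200],
    [  8, 4.0, 16,   4,   56],
    [120, 8.0,  8, 140,   80],
])

x_true = np.array([0.4, 0.3, 0.15, 0.1, 0.05])  # hypothetical phase fractions
soil = A @ x_true                               # the resulting soil composition

x_hat, *_ = np.linalg.lstsq(A, soil, rcond=None)  # recover the fractions
print(x_hat)                                       # matches x_true
```

With noise-free data the fractions are recovered exactly; with real analyses the same call gives the least-squares estimate that PAST reports.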

Multiple linear regression with PAST

We can derive the phase fractions by multiple linear regression.

[figure: independents and dependent data columns]

Multiple linear regression with PAST

Pearson correlation coefficient matrix.

[figure: correlation matrix]

Potential problem: the organics are a strongly dominant control on the soil composition.

Multiple linear regression with PAST

The regression model:

[figure: regression output, showing the regression coefficients (± error), the t-test on each coefficient, each coefficient's contribution to R², and the probability that each coefficient is 0]

Multiple linear regression with PAST

The regression model: very high Fcalc.

[figure: ANOVA output]

Multiple linear regression with NCSS

[figure: NCSS regression output]

Multiple linear regression with NCSS - checks
- no significant difference between the regular and PRESS R²
- no significant variance inflation (VIF < 5-10) and tolerance close to 1
- residuals are normally distributed
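The VIF check can be computed without NCSS: for each predictor, regress it on the other predictors and take VIF = 1 / (1 - R²). A sketch with synthetic predictors where one pair is nearly collinear (all names are mine):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R²_j), with R²_j from regressing column j on the rest."""
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.column_stack([np.delete(X, j, axis=1), np.ones(n)])
        y = X[:, j]
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
a = rng.normal(size=50)
b = rng.normal(size=50)
c = a + 0.1 * rng.normal(size=50)           # c is nearly collinear with a
vifs = vif(np.column_stack([a, b, c]))
print(vifs)                                  # first and third VIF far above 5-10
```

Values above the slide's 5-10 threshold flag predictors whose coefficients are unstable because of collinearity.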

Multiple linear regression with NCSS
- no trends between the residuals and the (in)dependent variables
- a very good regression fit that satisfies all the requirements for regression

Regression summary

Regression analysis allows you to define a model for your data that is predictive (both interpolative and extrapolative). However, you have to test that the model is meaningful:
1. that the regression coefficients and the intercept are meaningful (if not, the non-significant ones need to be removed from the regression model)
2. that the overall model is significant (using an ANOVA analysis; R² alone is not sufficient)
3. that the assumptions are met (residual distribution)
4. that the model is not overly dependent on a single data point or variable
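Point 4 of the summary can be screened with a leave-one-out refit: drop each point in turn and see how much the slope moves. A sketch on synthetic data with one planted influential point:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 20)
y = 2.0 * x + rng.normal(0, 1, x.size)
y[-1] += 30.0                         # plant one influential outlier at the end

full_slope = np.polyfit(x, y, 1)[0]
loo_slopes = np.array([
    np.polyfit(np.delete(x, i), np.delete(y, i), 1)[0] for i in range(x.size)
])
shift = np.abs(loo_slopes - full_slope)
print(int(np.argmax(shift)))          # the planted outlier moves the slope most
```

A point whose removal shifts the fit far more than any other is exactly the kind of over-dependence the summary warns about; formal versions of this screen are Cook's distance and related influence measures.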

Robust regression

Deviations from normality, such as outliers, can have a major impact on regression coefficients and can invalidate the results. Unfortunately, such datasets cannot always be avoided: use robust regression.

Robust regression - Sen slope

One type of robust regression, especially suited to small datasets, is the Sen slope: calculate the slope of each combination of two data points, then take the median of these slopes as the robust characteristic slope.
  slope = Δy / Δx for every pair of points
  for 5 data points: 10 pairwise slopes
  Sen slope = median(10 slopes)
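The Sen (Theil-Sen) slope is a few lines of Python, and scipy also ships it as `scipy.stats.theilslopes`. A sketch on five made-up points with one gross outlier, giving exactly the 10 pairwise slopes described above:

```python
import numpy as np
from itertools import combinations
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 8.1, 50.0])   # last point is a gross outlier

# all pairwise slopes: C(5, 2) = 10 of them, as in the slide
pair_slopes = [(y[j] - y[i]) / (x[j] - x[i])
               for i, j in combinations(range(x.size), 2)]
sen = np.median(pair_slopes)               # the Sen slope

ls = np.polyfit(x, y, 1)[0]                # ordinary least-squares slope
print(sen, ls)                             # Sen stays near 2; LS is dragged up
```

The outlier drags the least-squares slope far from the underlying trend of about 2, while the median of the pairwise slopes barely moves; `stats.theilslopes(y, x)` returns the same slope plus a confidence interval.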

Robust regression - Sen slope

  date:     5-Dec 6-Dec 7-Dec 8-Dec 9-Dec 10-Dec 11-Dec 12-Dec 13-Dec 14-Dec 15-Dec 16-Dec 17-Dec 18-Dec 19-Dec 20-Dec 21-Dec 22-Dec 23-Dec 24-Dec
  spending: 35    98    45    52    67    2      76     83     84     90     112    144    12     152    166    185    208    360    810    250

[figure: scatter of spending vs date with least-squares trendline, y = 9.0956x - 381796, R² = 0.55152]

  LS slope:  9.1
  Sen slope: 9.2