Ch. 1: Data and Distributions


Ch. 1: Data and Distributions
- Populations vs. samples
- How to display data graphically: histograms, dot plots, stem plots, etc.
  - Helps to show how samples are distributed
- Distributions of both continuous and discrete variables
  - Density functions and mass functions; three basic properties
  - Show the distribution of the entire population or process
- Some important distributions and associated probabilities
  - Continuous: Exponential, Normal, Uniform
  - Discrete: Binomial, Poisson
4/4/1 H.X. Lecture 30: Final Summary 1

Ch. 2: Numerical Summary Measures
- Measures of center for data (sample)
  - Sample mean: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum x_i$
  - Sample median (midpoint)
  - Trimmed means
- Measures of variability for data (sample)
  - Sample variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$
  - Sample standard deviation: $s = \sqrt{s^2}$
- Quartiles; five-number summary; IQR and outliers
- Graphical display: boxplots; modified version; side-by-side boxplots
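
As a quick check of the sample mean and variance formulas above, here is a minimal sketch using only the Python standard library. The data values are made up for illustration; they are not from the lecture.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

n = len(data)
mean = sum(data) / n                                      # x-bar = (1/n) * sum of x_i
variance = sum((x - mean) ** 2 for x in data) / (n - 1)   # s^2, note the divisor n - 1
std_dev = variance ** 0.5                                 # s = sqrt(s^2)
median = statistics.median(data)                          # midpoint of the sorted data

print(mean, variance, std_dev, median)
```

The hand-computed variance should agree with `statistics.variance(data)`, which also divides by n - 1.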

Ch. 2 (Cont.): Numerical Summary Measures
- Measures of center (distributions)
  - Continuous: $\mu_X = \int x\, f(x)\, dx$
  - Discrete: $\mu_X = \sum x\, p(x)$
- Measures of variability (distributions)
  - Continuous: $\sigma_X^2 = \int (x - \mu_X)^2 f(x)\, dx$
  - Discrete: $\sigma_X^2 = \sum (x - \mu_X)^2 p(x)$
- Normal quantile (QQ) plot

Ch. 3: Bivariate Data
- Scatterplots: visually display bivariate data, y vs. x
- Pearson's correlation coefficient r (between X and Y, both quantitative): $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
  - r measures the strength and direction of the linear relationship; there are other convenient formulas for Sxy, Sxx and Syy
  - Takes values between -1 and 1, inclusive
  - Sign indicates type/direction of relationship (positive, negative)
  - Value indicates strength: farther from 0 is stronger
  - If the roles of X and Y are switched → r doesn't change
  - Unit free: unaffected by linear transformations
  - Affected by outliers; not a resistant measure
- Correlation ≠ causation
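
The properties of r listed above can be checked numerically. This is a small sketch with made-up data; the helper name `pearson_r` is hypothetical and uses the standard definition r = Sxy / sqrt(Sxx * Syy).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                          # switching X and Y leaves r unchanged
r_rescaled = pearson_r([10 * v + 3 for v in x], y)   # change of units with positive slope

print(r, r_swapped, r_rescaled)
```

Note that a change of units with a negative multiplier would flip the sign of r while keeping its magnitude.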

Ch. 3: LS (Least Squares) Regression Line
- Estimated straight-line equation: $\hat{y} = a + b x$
  - a is the intercept (where the line crosses the y-axis)
  - b is the slope (rate): $b = r \frac{s_y}{s_x}$
- Predicted value of y
- Residual from the fit (or regression line)
- Breaking up the sum of squares: SSR, SSE, SST
- Coefficient of determination: $r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
  - Percent of variation explained by the linear regression between Y and X
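
The slope, intercept, and sum-of-squares breakdown can be sketched as follows. The data are made up, and the helper `least_squares_fit` is a hypothetical name. It uses b = Sxy / Sxx, which is algebraically the same as b = r * s_y / s_x, and a = y-bar - b * x-bar.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = y_bar - b * x_bar      # intercept
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_fit(x, y)
fitted = [a + b * xi for xi in x]                       # predicted values
y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained (residual) variation
ssr = sst - sse                                         # variation explained by the line
r_squared = ssr / sst                                   # equals 1 - SSE/SST

print(a, b, sst, sse, ssr, r_squared)
```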

Ch. 3 (Cont.): MSE and Residual Plot
- Mean squared error about the LS line: $MSE = \frac{SSE}{n - 2}$
- Standard deviation about the LS line: $s_e = \sqrt{MSE}$; also called root MSE in SAS output
- Residual: $e_i = y_i - \hat{y}_i$
- A residual plot plots the residuals against x
  - The residual plot should show no pattern, only a random scattering of points
  - If a pattern is observed, the linear regression model is probably not appropriate

Ch. 5: Probability and Sampling Distributions
- Chance experiments
  - Simple events: individual outcomes
  - Events: collections of simple events
  - Sample space
- Venn diagrams; tree diagrams
- Complex events: event A or B, event A and B, event A' (complement of A)
- Disjoint events (mutually exclusive)
- Independent events

Probability: Basic Rules
- Probability axioms:
  - 0 ≤ P(A) ≤ 1 for any event A
  - P(S) = 1, where S is the sample space
  - Addition rule: for any disjoint events A and B, P(A or B) = P(A) + P(B)
- Complementary events: P(A') = 1 - P(A)
- General addition rule (for any events A and B): P(A or B) = P(A) + P(B) - P(A and B)
- Independence rule (for independent events A and B): P(A and B) = P(A) P(B)
- Conditional probability: P(A | B) = P(A and B) / P(B)
- Bayes' rule for calculating conditional probabilities; tree diagrams
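
A short worked example of Bayes' rule, written as the two branches of a tree diagram. All probabilities below are invented for illustration (a hypothetical screening test); they are not from the course.

```python
# Hypothetical inputs: D = "has the condition", "+" = "tests positive".
p_d = 0.01                  # P(D)
p_pos_given_d = 0.95        # P(+ | D)
p_pos_given_not_d = 0.05    # P(+ | D')

p_not_d = 1 - p_d           # complement rule: P(D') = 1 - P(D)

# Add the two disjoint branches of the tree that end in "+".
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * p_not_d

# Conditional probability: P(D | +) = P(D and +) / P(+).
p_d_given_pos = (p_pos_given_d * p_d) / p_pos

print(p_pos, p_d_given_pos)
```

With these made-up numbers, P(D | +) is far smaller than P(+ | D), which is the usual reason for working through the tree rather than guessing.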

Random Variables and Sampling Distributions
- Random variables
  - Discrete: distribution table, probability histogram
  - Continuous: distribution curve, density function
  - Independent random variables
- Sampling distribution of a sample mean
- Sampling distribution of a sample proportion (rule of thumb for the normal approximation)
- Central Limit Theorem
- Continuity correction (for the normal approximation to the binomial)
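
To see what the continuity correction does, the sketch below compares an exact binomial probability with its normal approximation, with and without the 0.5 adjustment. The values n = 100, p = 0.5 and k = 55 are assumptions chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical setting: X ~ Binomial(n, p), and we want P(X <= k).
n, p, k = 100, 0.5, 55

# Exact binomial probability, summing the mass function.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation with matching mean and standard deviation.
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
without_correction = approx_dist.cdf(k)
with_correction = approx_dist.cdf(k + 0.5)   # continuity correction: use k + 0.5

print(exact, without_correction, with_correction)
```

In this example the corrected approximation lands much closer to the exact value than the uncorrected one.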

Ch. 7: Estimation and Statistical Inference by C.I.s
- Point estimation (unbiased, consistent)
- Large-sample C.I.s for a population mean (normality assumption): $\bar{X} \pm (z \text{ critical value}) \frac{s}{\sqrt{n}}$
  - One-sided C.I.s: upper or lower bound
  - Interpretation of the confidence level
- Necessary sample size for a desired bound B (round up): $n = \left( \frac{z_{crit}\, s}{B} \right)^2$
- Small-sample C.I.: $\bar{X} \pm (t \text{ critical value}) \frac{s}{\sqrt{n}}$
  - t-crit is associated with d.f. = n - 1
  - The normality assumption still holds
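
The large-sample interval and the sample-size formula can be sketched in a few lines. The function names and the inputs (x-bar = 50, s = 10, n = 100, B = 2) are hypothetical; the z critical value comes from the standard normal quantile function in the Python standard library rather than from a printed table.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def mean_interval(x_bar, s, n, confidence=0.95):
    """Large-sample C.I.: x_bar +/- z_crit * s / sqrt(n)."""
    margin = z_critical(confidence) * s / sqrt(n)
    return x_bar - margin, x_bar + margin

def sample_size_for_mean(s, bound, confidence=0.95):
    """Smallest n with margin <= bound: n = (z_crit * s / B)^2, rounded up."""
    return ceil((z_critical(confidence) * s / bound) ** 2)

# Hypothetical inputs, for illustration only.
low, high = mean_interval(x_bar=50.0, s=10.0, n=100)
n_needed = sample_size_for_mean(s=10.0, bound=2.0)

print(low, high, n_needed)
```

For a small-sample interval, the same margin formula applies with the t critical value (d.f. = n - 1) in place of `z_critical`.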

C.I. for a Population Proportion
- Point estimation for a population proportion
- Large-sample C.I.s for a population proportion: $\hat{p} \pm z_{crit} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
- Necessary sample size for a desired bound B (round up if not an integer): $n = p^*(1 - p^*) \left( \frac{z_{crit}}{B} \right)^2$
  - $p^* = \hat{p}$, or 0.5 if p-hat is unavailable
- Small-sample C.I. replaces z-crit by t-crit
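
A minimal sketch of the proportion interval and sample-size formula, using hypothetical function names and made-up inputs (p-hat = 0.4 from n = 600, and a desired bound of 0.03 with p* = 0.5).

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided z critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_interval(p_hat, n, confidence=0.95):
    """Large-sample C.I.: p_hat +/- z_crit * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size_for_proportion(bound, p_star=0.5, confidence=0.95):
    """n = p*(1 - p*) * (z_crit / B)^2, rounded up; p* = 0.5 if no estimate exists."""
    return ceil(p_star * (1 - p_star) * (z_critical(confidence) / bound) ** 2)

# Hypothetical inputs, for illustration only.
low, high = proportion_interval(p_hat=0.4, n=600)
n_needed = sample_size_for_proportion(bound=0.03)

print(low, high, n_needed)
```

Using p* = 0.5 gives the largest possible value of p*(1 - p*), so the resulting n is conservative.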

C.I. for the Difference of Two Population Means
- Large-sample C.I.s for the difference between two population means (normality assumption): $\bar{X}_1 - \bar{X}_2 \pm z_{crit} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- Small-sample C.I.: z-crit replaced by t-crit, with (round down if not an integer)
  $df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{(s_1^2 / n_1)^2}{n_1 - 1} + \frac{(s_2^2 / n_2)^2}{n_2 - 1}}$
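
The degrees-of-freedom formula is easy to get wrong by hand, so here is a small sketch of it. The helper name `two_sample_df` and the summary statistics are made up for illustration.

```python
from math import floor

def two_sample_df(s1, n1, s2, n2):
    """Approximate d.f. for the two-sample t interval (before rounding down)."""
    v1 = s1 ** 2 / n1          # s1^2 / n1
    v2 = s2 ** 2 / n2          # s2^2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Hypothetical summary statistics, for illustration only.
df_exact = two_sample_df(s1=3.0, n1=12, s2=5.0, n2=10)
df_used = floor(df_exact)      # round down for a non-integer result

print(df_exact, df_used)
```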

t C.I. for Paired Data

Ch. 8: Hypothesis Testing
- State the hypotheses: both null and alternative (one- or two-sided)
- Determine an appropriate α level; if not specified, use 5%
  - Type I error; significance level. Understand it.
- Calculate the appropriate test statistic
- Find the P-value: the probability, assuming H0 is true, of a result as extreme as or more extreme than the observed test statistic
- Reject H0 when the P-value is smaller than the significance level α; otherwise, fail to reject H0
- State a conclusion in layman's terms

One-sample t Test for a Population Mean
- The null hypothesis is H0: µ = µ0
- The alternative hypothesis could be:
  - Ha: µ ≠ µ0 (two-sided)
  - Ha: µ > µ0 (one-sided)
  - Ha: µ < µ0 (one-sided)
- Test statistic: $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$, where t ~ Student's t-distribution with df = n - 1
- If n is large (≥ 30), the CLT gives an approximately normal distribution and t can be replaced with z, where z follows a standard normal distribution
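
A minimal sketch of the test statistic calculation. The sample and the null value µ0 = 5.0 are invented for illustration; the P-value would then be read from a t table (or software) with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample and null value, for illustration only.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
mu_0 = 5.0

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                       # sample standard deviation (divisor n - 1)

t_stat = (x_bar - mu_0) / (s / sqrt(n)) # t = (x-bar - mu_0) / (s / sqrt(n))
df = n - 1                              # degrees of freedom for the t distribution

print(t_stat, df)
```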

P-value tied to Ha
- Two-sided (both tails): Ha: µ ≠ µ0
- One-sided (right tail): Ha: µ > µ0
- One-sided (left tail): Ha: µ < µ0

Other Tests or Remarks
- Two-sample z (or t, depending on sample sizes) test for two population means
  - When using t, mind the d.f. calculation
- One-sample t test with (matched) paired data
  - Focus on the difference of the two population means
- A two-sided significance test <-> a two-sided C.I. for the same parameter
  - If the claimed value is in the C.I. → fail to reject H0
  - If the claimed value is not in the C.I. → reject H0
  - NOTE: must have ≠ in Ha!
- Statistical significance ≠ practical significance

Cautions (for both C.I.s and tests of significance)
- Data: assume an SRS (random sampling)
- The population needs to be normally distributed
  - If n < 30, normality has to be checked (by a normal QQ-plot)
  - With n ≥ 30, the CLT can give us approximate normality in most situations

Ch. 9: One-Way ANOVA
- Hypotheses: H0: µ1 = µ2 = ... = µk vs. Ha: at least one µi is different
- F test statistic = (between-samples variation) / (within-samples variation)
- ANOVA table:

  Source            DF      SS                MS
  Model (Between)   k - 1   SSM (formula)     SSM/(k - 1)
  Error (Within)    n - k   SSE (formula)     SSE/(n - k)
  Total             n - 1   SST = SSM + SSE

- The P-value is always the upper tail of the F distribution with (k - 1, n - k) degrees of freedom
- Tables of critical values for the F distribution: Table VIII
- F statistic > F critical value <=> P-value < α => reject H0
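
The ANOVA table entries can be computed directly from the group data. The sketch below uses made-up groups and a hypothetical helper `one_way_anova`; it assumes SSM is the size-weighted sum of squared deviations of the group means from the grand mean, and SSE is the sum of squared deviations within groups.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA sums of squares, d.f., and F statistic."""
    k = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-samples variation.
    ssm = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-samples variation.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msm, mse = ssm / (k - 1), sse / (n - k)
    return {"SSM": ssm, "SSE": sse, "SST": ssm + sse,
            "df": (k - 1, n - k), "F": msm / mse}

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = one_way_anova(groups)

print(table)
```

The F value would then be compared with the critical value for (k - 1, n - k) degrees of freedom from the F table.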

Assumptions (prior to running one-way ANOVA)
1. Constant variance: the variances of the k populations are the same. Check this with the ratio of the largest to the smallest standard deviation; the ratio must be < 2.
2. Each of the k populations follows a normal distribution. Check this by looking at QQ-plots for each group.
Remark: statistical significance ≠ practical significance

Ch. 9: Multiple Comparison
- If the one-way ANOVA is not significant, we don't have to take further steps
- Otherwise, run a multiple comparison to see which specific means are different
  - Tukey's method (cldiff or lines format)
  - Dunnett's method (only if there's a control group)

9.4: Randomized Complete Block Design
- RCBD (both the treatment and the block factor must be categorical)

  Source                 DF               SS    MS
  Factor A (treatment)   a - 1            SSA   MSA
  Factor B (block)       b - 1            SSB   MSB
  Error                  (a - 1)(b - 1)   SSE   MSE
  Total                  ab - 1           SST

- In RCBD, we are only interested in the treatment factor
- The block factor might affect the response, but that's not of interest
- Two F tests
  - Blocking effect? Use the test statistic and P-value to conclude
  - Treatment effect? Use the test statistic and P-value to conclude

Necessary Assumptions for RCBD
Similar to one-way ANOVA:
1. Constant variance
2. Each of the k populations follows a normal distribution
One additional assumption:
3. There is no interaction between the treatment and blocking variables
   - Can assess just using common sense (just ask: do/should they interact?)
   - OR check with a two-way ANOVA model and an interaction plot

Ch. 10: Two-Way ANOVA
- Testing two factors' and their interaction's effect on the response variable

  Source           DF               SS     MS
  Factor A         a - 1            SSA    MSA
  Factor B         b - 1            SSB    MSB
  AB interaction   (a - 1)(b - 1)   SSAB   MSAB
  Error            ab(r - 1)        SSE    MSE
  Total            abr - 1          SST

- Test the interaction first (it is of the most interest), then Factor A and Factor B, respectively
- If the interaction is significant, still run slicing for Factors A and B
- If the interaction is not significant while a single factor is significant, run a one-way ANOVA and multiple comparison
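
When completing a two-way ANOVA table by hand, a quick consistency check is that the component degrees of freedom add up to the total. A small sketch, with a hypothetical helper name and made-up design sizes (a = 3 levels of A, b = 2 levels of B, r = 4 replicates per cell):

```python
def two_way_anova_df(a, b, r):
    """Degrees of freedom for a two-way ANOVA with a x b cells and r replicates each."""
    return {
        "A": a - 1,
        "B": b - 1,
        "AB": (a - 1) * (b - 1),
        "Error": a * b * (r - 1),
        "Total": a * b * r - 1,
    }

# Hypothetical design sizes, for illustration only.
df = two_way_anova_df(a=3, b=2, r=4)
components = df["A"] + df["B"] + df["AB"] + df["Error"]

print(df, components == df["Total"])
```

Each mean square is then the corresponding SS divided by its d.f., and each F statistic is that mean square divided by MSE.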

Ch. 10 (Cont.): Two-Way ANOVA
- Interaction plot: roughly speaking, there's no interaction effect if all lines are parallel to each other
- In summary, for Ch. 9 and 10 we should know:
  - All of one-way ANOVA (Ch. 9), by hand and/or using SAS
  - Most of randomized block design (Sec. 9.4) and two-way ANOVA (Ch. 10, Section ). For both:
    - Complete ANOVA tables; calculate DFs and the F test statistic
    - Perform F tests using the F table
    - Interpret SAS output
  - The general concept of a higher-order (multi-way) ANOVA model

Ch. 11: Inferential Methods in Regression and Slopes (Correlations)
- Normal error regression model
  - Error term (3 assumptions: independence, normality and constant variance)
- SSE, MSE, and root MSE
- Coefficient of determination, R^2
  - % of variation explained by the regression model
  - Obtained simply by squaring r
- Statistical inference about the slope in the SLR model:
  - C.I. for β (the slope): $b \pm (t \text{ crit}) \cdot s_b$
  - Hypothesis testing w.r.t. the slope, i.e. test of a linear relationship
  - Remark: t ~ Student's t-distribution with d.f. = n - 2
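
A sketch of the slope interval and test statistic for a small made-up data set. The slide does not give a formula for s_b; the code assumes the standard one, s_b = sqrt(MSE / Sxx) with MSE = SSE / (n - 2). The t critical value 3.182 (95% two-sided, 3 d.f.) is read from a t table rather than computed.

```python
from math import sqrt

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx                      # estimated slope
a = y_bar - b * x_bar                # estimated intercept

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)                  # d.f. = n - 2 in simple linear regression
s_b = sqrt(mse / s_xx)               # assumed standard-error formula for the slope

t_crit = 3.182                       # from a t table: 95% two-sided, d.f. = n - 2 = 3
ci = (b - t_crit * s_b, b + t_crit * s_b)
t_stat = b / s_b                     # test statistic for H0: beta = 0

print(b, s_b, ci, t_stat)
```

Here the interval contains 0, which matches |t_stat| < t_crit: with these made-up data, H0: β = 0 would not be rejected at the 5% level.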

Using the ANOVA Table to Test SLR

  Source               DF      SS                 MS
  Model (Regression)   1       SSM (or SSR)       SSM/1 = MSM (or MSR)
  Error                n - 2   SSE (or SSResid)   SSE/(n - 2) = MSE
  Total                n - 1   SST = SSM + SSE

Remark: d.f. of the F test statistic = (1, n - 2)
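
In simple linear regression the ANOVA F statistic equals the square of the t statistic for the slope, so both tests lead to the same conclusion. The sketch below checks this on made-up data; the standard error s_b = sqrt(MSE / Sxx) is the usual formula and is an assumption here, since the slides do not state it.

```python
from math import sqrt

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ssm = sst - sse                       # SSM (or SSR), with 1 d.f.

msm = ssm / 1
mse = sse / (n - 2)
f_stat = msm / mse                    # F with (1, n - 2) d.f.

t_stat = b / sqrt(mse / s_xx)         # slope t statistic, d.f. = n - 2

print(f_stat, t_stat ** 2)
```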

Multiple Linear Regression Model
- MLR model: $Y_i = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + e_i$
- Test the above linear relationship: H0: all βi's = 0 vs. Ha: at least one βi ≠ 0
  - A rejection of the null indicates that collectively the Xs do well at explaining Y; otherwise we don't have to run the following step
  - But it doesn't show which specific Xi's are doing the explaining
- Model selection, especially backward elimination
- The estimated line, from SAS output
  - Use it to predict Yi; get the residual as actual Yi - predicted value

After Class
- Review notes, practices, HW, labs and previous tests
- Wed: Lab #8 (optional)
- Final exam (closed book, closed notes): next Wed, 8-10am
  - Bring student ID, a calculator (SAT policy, NO QWERTY keyboard), pencils, and a two-page crib sheet (8 by 11), handwritten by yourself, two-sided
  - SEE CALCULATOR POLICY and crib sheet policy (on the syllabus) on the course website
  - No electronics except a calculator
  - Not allowed to exchange calculators or crib sheets during the exam
  - Not allowed to type/print your crib sheet