Final Review, Fall 2013. Prof. Yao Xie, H. Milton Stewart School of Industrial & Systems Engineering, Georgia Tech


Prof. Yao Xie (yao.xie@isye.gatech.edu)

Random sampling model. From a population we draw random samples $x_1, \ldots, x_n$. For example, we use a digital thermometer to measure body temperature 5 times and obtain a sequence of readings. If we repeat this experiment the next day, we get a different sequence of measurements. The result of the measurement is a sequence of random samples (also called data).


Descriptive statistics: quantitative values that provide simple summaries of the samples, and plots such as the histogram, the box plot, and the stem-and-leaf diagram.

Course overview (concept map). Descriptive statistics: numerical data and categorical data. Random variables: normal distribution $N(\mu, \sigma^2)$ and binomial distribution $X \sim \mathrm{Bin}(n, p)$. Point estimation: $\hat\mu = \bar X$, $\hat\sigma^2 = S^2$, $\hat p = X/n$. Statistical inference: confidence intervals $(L(\mu), U(\mu))$ and $(L(p), U(p))$; hypothesis tests $H_0: \mu = \mu_0$ and $H_0: p = p_0$. Inference on multiple populations: confidence intervals and hypothesis tests for $\mu_1 - \mu_2$ ($H_0: \mu_1 = \mu_2$) and $p_1 - p_2$ ($H_0: p_1 = p_2$); ANOVA. Statistical modeling: contingency tables, linear regression, logistic regression.

Data summary. Samples: $x_1, x_2, \ldots, x_n$. Sample mean: $\bar x = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$. Sample median: (1) rank the samples from smallest to largest, $y_1 \le y_2 \le \cdots \le y_n$; (2) if $n$ is odd, the median is $y_{(n+1)/2}$; if $n$ is even, the median is $(y_{n/2} + y_{n/2+1})/2$.
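A minimal Python sketch of the two summaries above. Python lists are 0-based, so the ranked value $y_k$ is `y[k - 1]`; the temperature readings are made-up numbers for illustration.

```python
def sample_mean(xs):
    """Sample mean: (x_1 + x_2 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Sample median, following the two steps on the slide."""
    y = sorted(xs)          # 1) rank the samples from smallest to largest
    n = len(y)
    if n % 2 == 1:          # 2a) odd n: the middle value y_{(n+1)/2}
        return y[(n + 1) // 2 - 1]
    # 2b) even n: average of y_{n/2} and y_{n/2 + 1}
    return (y[n // 2 - 1] + y[n // 2]) / 2


# Hypothetical body-temperature readings (5 measurements)
temps = [98.2, 98.6, 97.9, 98.4, 98.8]
print(sample_mean(temps))
print(sample_median(temps))  # 98.4
```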

Sample range = largest - smallest. Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2$. Sample quantile: the $p$th quantile $x_p$ is the value such that $p$ percent of the samples are smaller than $x_p$. The lower and upper quartiles are the 25th and 75th percentiles, and the interquartile range is IQR = upper quartile - lower quartile.
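The spread measures above can be sketched in Python as follows. This is an illustration only: quartile conventions differ between textbooks and software, and the sketch assumes the default "exclusive" method of the standard library's `statistics.quantiles`, which may not match the textbook's rule exactly. The data are hypothetical.

```python
import statistics


def sample_range(xs):
    """Largest minus smallest observation."""
    return max(xs) - min(xs)


def sample_variance(xs):
    """S^2 = 1/(n-1) * sum of squared deviations from the sample mean."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def interquartile_range(xs):
    """Upper quartile minus lower quartile.

    Uses statistics.quantiles with its default 'exclusive' method; other
    quartile conventions can give slightly different values.
    """
    lower, _median, upper = statistics.quantiles(xs, n=4)
    return upper - lower


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(sample_range(data))        # 7
print(sample_variance(data))     # 32/7, about 4.571
print(interquartile_range(data))
```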

Sampling distribution: the distribution of the statistics defined above. Sampling distributions are extremely useful for determining the forms of confidence intervals and hypothesis tests.
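A sampling distribution can be made concrete by simulation. This Python sketch assumes normal samples with made-up values $\mu = 98.6$, $\sigma = 0.5$, $n = 5$, repeats the thermometer experiment many times, and checks that the resulting sample means center at $\mu$ with variance close to $\sigma^2/n$.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Repeat the sampling experiment `reps` times: each time draw n i.i.d.
    N(mu, sigma^2) values and record their sample mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]


means = simulate_sample_means(mu=98.6, sigma=0.5, n=5, reps=20_000)
print(statistics.fmean(means))     # close to mu = 98.6
print(statistics.variance(means))  # close to sigma^2 / n = 0.05
```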

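Sampling distributions: summary.
Sample mean, $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i$: for i.i.d. normal samples, $\bar X \sim N(\mu, \sigma^2/n)$ (whether or not $\sigma^2$ is known); for $n$ large, $\bar X$ is approximately normal as above.
Sample variance, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$: for i.i.d. normal samples, $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$; for $n$ large, $S^2$ is approximately normal.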

Other common sampling distributions.
Sample proportion $\hat p = X/n$. Exact: $n\hat p = X \sim \mathrm{Bin}(n, p)$. Large sample: $\hat p$ is approximately $N(p, p(1-p)/n)$.
Standardized sample mean, known variance: $\frac{\bar X - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$.
Standardized sample mean, unknown variance: $\frac{\bar X - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$.
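To see how the exact and large-sample results for $\hat p$ compare, the Python sketch below computes $P(\hat p \le k/n)$ both ways for hypothetical values $n = 100$, $p = 0.5$, $k = 55$. It applies no continuity correction, so it shows the plain normal approximation stated above.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X = n * p_hat ~ Bin(n, p), i.e. P(p_hat <= k/n)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Large-sample approximation of P(p_hat <= k/n), p_hat ~ N(p, p(1-p)/n).
    No continuity correction is applied."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)).cdf(k / n)


n, p, k = 100, 0.5, 55  # hypothetical: 55 or fewer successes in 100 trials
print(binom_cdf(k, n, p))          # exact binomial probability
print(normal_approx_cdf(k, n, p))  # equals Phi(1.0), about 0.841
```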

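Two-sample sampling distributions.
Difference in sample means, known variances: $\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$.
Difference in sample means, unknown but identical variances: $\frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$, where the pooled variance is $S_p^2 = \frac{\sum_{i=1}^{n_1}(X_{1i} - \bar X_1)^2 + \sum_{i=1}^{n_2}(X_{2i} - \bar X_2)^2}{n_1 + n_2 - 2}$.
Ratio of sample variances: $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$.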
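The pooled variance $S_p^2$ and the variance ratio $S_1^2/S_2^2$ can be computed directly from their definitions. A minimal Python sketch with two small hypothetical samples; the helper name `_ss` is introduced here for illustration only.

```python
def _ss(xs):
    """Sum of squared deviations from the sample mean (hypothetical helper)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_variance(x1, x2):
    """S_p^2 = (SS_1 + SS_2) / (n1 + n2 - 2)."""
    return (_ss(x1) + _ss(x2)) / (len(x1) + len(x2) - 2)


def variance_ratio(x1, x2):
    """S_1^2 / S_2^2. For normal samples with sigma_1 = sigma_2, this follows
    an F distribution with (n1 - 1, n2 - 1) degrees of freedom."""
    s1_sq = _ss(x1) / (len(x1) - 1)
    s2_sq = _ss(x2) / (len(x2) - 1)
    return s1_sq / s2_sq


x1 = [1.0, 2.0, 3.0]  # hypothetical samples
x2 = [2.0, 4.0, 6.0]
print(pooled_variance(x1, x2))  # (2 + 8) / 4 = 2.5
print(variance_ratio(x1, x2))   # 1 / 4 = 0.25
```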

Statistical methods: point estimator, confidence interval, hypothesis test, two-sample test (two populations), ANOVA (more than two populations), linear regression.

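Point estimator. Mean of the estimator: $\hat\theta$ is unbiased if $E[\hat\theta] = \theta$. Variance of the estimator: $\mathrm{Var}(\hat\theta)$. Mean squared error: $\mathrm{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2] = \mathrm{bias}^2 + \mathrm{variance}$, where $\mathrm{bias} = E[\hat\theta] - \theta$. Methods of finding point estimators: the method of moments and maximum likelihood.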
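The decomposition MSE = bias² + variance can be checked numerically. The Python sketch below is a Monte Carlo illustration under an assumed normal model ($n = 10$, $\sigma^2 = 4$, both made up). It compares the divide-by-$n$ variance estimator (the maximum likelihood estimator for normal data, which is biased) with the unbiased $S^2$. For the empirical averages computed here the decomposition holds exactly, up to floating-point rounding.

```python
import random
import statistics


def mse_bias_variance(estimator, true_value, draw_sample, reps=5000, seed=1):
    """Monte Carlo estimates of the MSE, bias and variance of an estimator."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    center = statistics.fmean(estimates)
    bias = center - true_value
    variance = statistics.pvariance(estimates, mu=center)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return mse, bias, variance


def var_divide_by_n(xs):
    """Variance estimator dividing by n (the normal-model MLE); biased."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def draw_sample(rng):
    # Hypothetical setting: n = 10 draws from N(0, 2^2), so sigma^2 = 4
    return [rng.gauss(0.0, 2.0) for _ in range(10)]


for est in (var_divide_by_n, statistics.variance):
    mse, bias, var = mse_bias_variance(est, 4.0, draw_sample)
    print(est.__name__, round(mse, 3), round(bias**2 + var, 3))
```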

Confidence interval. A point estimator gives a single value for the estimated parameter. A confidence interval is an interval $[a, b]$ that contains the true parameter with probability $1 - \alpha$; $[a, b]$ is then called the $1 - \alpha$ confidence interval.

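Typical forms of $k$: $k$ = upper cutting point × standard deviation (standard error) of the point estimator. The width of the confidence interval is determined by the sample size and the confidence level. Examples:
mean, known variance: $[\bar x - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar x + z_{\alpha/2}\,\sigma/\sqrt{n}]$;
mean, unknown variance: $[\bar x - t_{\alpha/2,n-1}\,s/\sqrt{n},\ \bar x + t_{\alpha/2,n-1}\,s/\sqrt{n}]$;
proportion: $[\hat p - z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n},\ \hat p + z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}]$.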
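The three interval forms above share the pattern "estimate ± k". A minimal Python sketch: $z_{\alpha/2}$ comes from the standard library's `NormalDist`, but the standard library has no t quantile function, so `t_crit` is assumed to be read from a t table. All numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_upper(alpha):
    """Upper cutting point z_alpha of N(0, 1): P(Z > z_alpha) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)


def ci_mean_known_variance(xbar, sigma, n, alpha=0.05):
    k = z_upper(alpha / 2) * sigma / sqrt(n)
    return xbar - k, xbar + k


def ci_mean_unknown_variance(xbar, s, n, t_crit):
    """t_crit is t_{alpha/2, n-1}, looked up in a t table."""
    k = t_crit * s / sqrt(n)
    return xbar - k, xbar + k


def ci_proportion(x, n, alpha=0.05):
    p_hat = x / n
    k = z_upper(alpha / 2) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - k, p_hat + k


# Hypothetical inputs
print(ci_mean_known_variance(98.4, sigma=0.5, n=25))              # about (98.204, 98.596)
print(ci_mean_unknown_variance(10.0, s=2.0, n=10, t_crit=2.262))  # t_{0.025, 9} = 2.262
print(ci_proportion(40, 100))                                      # about (0.304, 0.496)
```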

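Tails and cutting points (figure). The standard normal CDF is $\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du$. An upper cutting point (also called a percentage point in the textbook) leaves probability $\alpha$ in the upper tail, e.g. $z_\alpha$, $t_{\alpha,\nu}$, or $f_{0.25,\nu_1,\nu_2}$ for $\alpha = 0.25$.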

Forms of confidence intervals. Two-sided interval: [point estimator - k, point estimator + k]. One-sided interval: [point estimator - k, ∞) or (-∞, point estimator + k], with $z_\alpha$ in place of $z_{\alpha/2}$. Here k specifies the width of the confidence interval.

Hypothesis test: use data to decide between two contradicting statements, $H_0$: the null hypothesis, and $H_1$: the alternative hypothesis. Two approaches: (1) fixed significance level: reject $H_0$ when the test statistic falls outside the thresholds; (2) p-value: the probability, computed under $H_0$, of observing a test statistic at least as extreme as the one observed in the data.

Procedure of a hypothesis test (Sec. 9.1.6):
1. Set the significance level (.01, .05, .1).
2. Set the null and alternative hypotheses.
3. Determine the other parameters.
4. Decide the type of test: test for the mean with known variance (z-test), test for the mean with unknown variance (t-test), or test for a proportion parameter.
5. Use the available data: perform the test to reach a decision, and report the p-value.

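Summary: tests for the mean. Null hypothesis $H_0: \mu = \mu_0$; test statistic $\bar x$; significance level $\alpha$.
$H_1: \mu \ne \mu_0$: with known variance, $H_0$ is rejected if $|\bar x - \mu_0| > z_{\alpha/2}\,\sigma/\sqrt{n}$; with unknown variance, if $|\bar x - \mu_0| > t_{\alpha/2,n-1}\,s/\sqrt{n}$.
$H_1: \mu > \mu_0$: reject if $\bar x > \mu_0 + z_\alpha\,\sigma/\sqrt{n}$ (known variance) or $\bar x > \mu_0 + t_{\alpha,n-1}\,s/\sqrt{n}$ (unknown variance).
$H_1: \mu < \mu_0$: reject if $\bar x < \mu_0 - z_\alpha\,\sigma/\sqrt{n}$ (known variance) or $\bar x < \mu_0 - t_{\alpha,n-1}\,s/\sqrt{n}$ (unknown variance).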
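The rejection rules in the table can be written out in Python, together with the p-value for the known-variance case. This is a sketch, not the course's software: the t critical value is assumed to come from a t table (the standard library has no t distribution), and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal N(0, 1)


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Known-variance test of H0: mu = mu0.
    Returns (z0, reject_H0, p_value)."""
    z0 = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "two-sided":        # H1: mu != mu0
        reject = abs(z0) > _Z.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - _Z.cdf(abs(z0)))
    elif alternative == "greater":        # H1: mu > mu0
        reject = z0 > _Z.inv_cdf(1 - alpha)
        p_value = 1 - _Z.cdf(z0)
    elif alternative == "less":           # H1: mu < mu0
        reject = z0 < -_Z.inv_cdf(1 - alpha)
        p_value = _Z.cdf(z0)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z0, reject, p_value


def t_test_mean_rejects(xbar, mu0, s, n, t_crit, alternative="two-sided"):
    """Unknown-variance test of H0: mu = mu0. t_crit must be read from a t
    table: t_{alpha/2, n-1} for two-sided, t_{alpha, n-1} for one-sided."""
    t0 = (xbar - mu0) / (s / sqrt(n))
    if alternative == "two-sided":
        return abs(t0) > t_crit
    if alternative == "greater":
        return t0 > t_crit
    if alternative == "less":
        return t0 < -t_crit
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical data: n = 25 readings with mean 98.4, testing mu0 = 98.6
print(z_test_mean(98.4, 98.6, sigma=0.5, n=25))  # z0 about -2.0, p about 0.046
print(t_test_mean_rejects(98.4, 98.6, s=0.5, n=25, t_crit=2.064))  # t_{0.025, 24}
```

With the same numbers, the z-test rejects at $\alpha = 0.05$ while the t-test does not, because $t_{0.025,24} = 2.064$ is larger than $z_{0.025} \approx 1.96$.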

Test for a proportion. Null hypothesis $H_0: p = p_0$; significance level $\alpha$; test statistic $\frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$. For the alternative hypothesis $H_1: p \ne p_0$, $H_0$ is rejected if $\frac{|\hat p - p_0|}{\sqrt{p_0(1-p_0)/n}} > z_{\alpha/2}$.
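A minimal Python sketch of the two-sided proportion test above, with a p-value added; note that the standard error uses $p_0$, not $\hat p$. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(x, n, p0, alpha=0.05):
    """Two-sided large-sample test of H0: p = p0 against H1: p != p0.
    Returns (z0, reject_H0, p_value)."""
    p_hat = x / n
    z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    return z0, abs(z0) > z_crit, p_value


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5
print(z_test_proportion(60, 100, 0.5))  # z0 about 2.0, reject at alpha = 0.05
```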

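Two-sample test for means. For the hypothesis test $H_0: \mu_1 - \mu_2 = \Delta$ versus $H_1: \mu_1 - \mu_2 \ne \Delta$ (unknown but equal variances), reject $H_0$ when $\frac{|\bar X - \bar Y - \Delta|}{S_p\sqrt{1/n_1 + 1/n_2}} > t_{\alpha/2,\,n_1+n_2-2}$.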
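The pooled two-sample t statistic can be sketched in Python as below. As elsewhere, the critical value $t_{0.025,4} = 2.776$ is assumed to be taken from a t table, and the two samples are hypothetical.

```python
from math import sqrt


def pooled_t_statistic(x, y, delta=0.0):
    """Two-sample t statistic for H0: mu1 - mu2 = delta, assuming equal
    (unknown) variances. Returns (t0, degrees_of_freedom)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((a - m1) ** 2 for a in x)
    ss2 = sum((b - m2) ** 2 for b in y)
    sp = sqrt((ss1 + ss2) / (n1 + n2 - 2))   # pooled standard deviation S_p
    t0 = (m1 - m2 - delta) / (sp * sqrt(1 / n1 + 1 / n2))
    return t0, n1 + n2 - 2


x = [1.0, 2.0, 3.0]  # hypothetical samples
y = [2.0, 4.0, 6.0]
t0, df = pooled_t_statistic(x, y)
t_crit = 2.776  # t_{0.025, 4} from a t table
print(round(t0, 3), df, abs(t0) > t_crit)  # -1.549 4 False
```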

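Two-sample test for proportions. For the two-sided test $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$, reject $H_0$ when $\frac{|\hat p_1 - \hat p_2|}{\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}} > z_{\alpha/2}$, where $\hat p = (x_1 + x_2)/(n_1 + n_2)$ is the pooled sample proportion.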
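A minimal Python sketch of the two-proportion test above, using the pooled proportion in the standard error; the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sided test of H0: p1 = p2 using the pooled proportion.
    Returns (z0, reject_H0)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se
    return z0, abs(z0) > NormalDist().inv_cdf(1 - alpha / 2)


# Hypothetical counts: 45 of 100 versus 30 of 100
print(two_proportion_z_test(45, 100, 30, 100))  # z0 about 2.19, reject at 0.05
```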

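Analysis of variance (ANOVA): compare multiple populations by analyzing the differences in their means. With $a$ populations and $n$ samples from each, we would reject $H_0$ (all means equal) if $F_0 > F_{\alpha,\,a-1,\,a(n-1)}$.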
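The statistic $F_0$ is the ratio of the treatment mean square to the error mean square. The Python sketch below computes it for one-way ANOVA with hypothetical data ($a = 3$ groups, $n = 3$ each); the critical value $F_{0.05,2,6} \approx 5.14$ would come from an F table.

```python
def one_way_anova_f(groups):
    """F statistic for H0: all group means are equal.
    `groups` is a list of lists of observations. Returns (F0, (df1, df2))."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df1, df2 = a - 1, N - a   # with n per group, N - a = a(n - 1)
    return (ss_treatment / df1) / (ss_error / df2), (df1, df2)


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical data, a = 3, n = 3
f0, (df1, df2) = one_way_anova_f(groups)
print(f0, df1, df2)  # about 13.0, with (2, 6) degrees of freedom
```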

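Linear regression. Simple linear regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, 2, \ldots, n$, where $Y_i$ is the response, $X_i$ the regressor (predictor), $\beta_0$ the intercept, $\beta_1$ the slope, and $\varepsilon_i$ the random error.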

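Fitted coefficients.
$S_{xx} = \sum_{i=1}^{n}(x_i - \bar x)^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$ (11-10)
$S_{xy} = \sum_{i=1}^{n}(y_i - \bar y)(x_i - \bar x) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$ (11-11)
$\hat\beta_1 = S_{xy}/S_{xx}$, $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
Fitted (estimated) regression model: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$.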
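In the course, R does these calculations; the Python sketch below only illustrates the formulas for $S_{xx}$, $S_{xy}$, $\hat\beta_1$ and $\hat\beta_0$ on made-up data, and also computes the residuals $e_i = y_i - \hat y_i$ that the diagnosis step plots.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares estimates (beta0_hat, beta1_hat) from S_xx and S_xy."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    s_xy = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y))
    beta1 = s_xy / s_xx
    beta0 = ybar - beta1 * xbar
    return beta0, beta1


def residuals(x, y, beta0, beta1):
    """e_i = y_i - y_hat_i, with y_hat_i = beta0 + beta1 * x_i."""
    return [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.9, 5.1, 7.0, 9.2, 10.8]
b0, b1 = fit_simple_linear_regression(x, y)
e = residuals(x, y, b0, b1)
print(round(b0, 3), round(b1, 3))
print(round(sum(e), 10))  # least-squares residuals sum to zero (up to rounding)
```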

Model diagnosis: plot the residuals. Use R and read its output; for both simple and multiple linear regression, we rely on R to do the calculations.


Finally: what is statistics about? Fit a model using data (e.g. distributions). Use the model to make inferences: estimation, hypothesis testing, and prediction (e.g. using linear regression). Why is a model useful? It lets us report findings from data systematically and quantify uncertainty.
