Final Review. Fall 2013 Prof. Yao Xie, H. Milton Stewart School of Industrial Systems & Engineering Georgia Tech

Final Review, Fall 2013. Prof. Yao Xie, yao.xie@isye.gatech.edu, H. Milton Stewart School of Industrial Systems & Engineering, Georgia Tech

Random sampling model
A population generates random samples: $x_1, \ldots, x_n$
For example, we use a digital thermometer to measure body temperature 5 times and obtain a sequence of readings. If we repeat the experiment the next day, we get a different sequence of measurements. The result of the measurement is a sequence of random samples (also called data).

Descriptive statistics
Quantitative values that provide simple summaries of the samples
Plots: histogram, box plot, stem-and-leaf diagram

Course map (diagram)
Descriptive statistics: numerical data, categorical data
Random variables: normal distribution $N(\mu, \sigma^2)$ for numerical data; binomial distribution $X \sim \mathrm{Bin}(n, p)$ for categorical data
Point estimation: $\hat{\mu} = \bar{X}$, $\hat{\sigma}^2 = S^2$; $\hat{p} = X/n$
Statistical inference: confidence interval $(L(\mu), U(\mu))$ and hypothesis test $H_0: \mu = \mu_0$; confidence interval $(L(p), U(p))$ and hypothesis test $H_0: p = p_0$
Inference on multiple populations: confidence interval and hypothesis test for $\mu_1 - \mu_2$ ($H_0: \mu_1 = \mu_2$) and for $p_1 - p_2$ ($H_0: p_1 = p_2$); ANOVA
Statistical modeling: contingency tables, linear regression, logistic regression

Data summary
Samples: $x_1, x_2, \ldots, x_n$
Sample mean: $\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$
Sample median:
1) rank the samples from smallest to largest: $y_1 \le y_2 \le \cdots \le y_n$
2) odd number of samples: median $= y_{(n+1)/2}$; even number of samples: median $= (y_{n/2} + y_{n/2+1})/2$

Sample range = largest - smallest
Sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Sample quantile: the $p$th quantile $x_p$ is the value such that $p$ percent of the samples are smaller than $x_p$
Upper quartile: $p = 75$; lower quartile: $p = 25$
Interquartile range (IQR) = upper quartile - lower quartile
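The summaries on the last two slides can be computed with a few lines of Python. This is only a sketch using the standard library; the function name `summarize` and the data are made up for illustration, and the quartile convention is whatever `statistics.quantiles` uses by default, which may differ slightly from the textbook's.

```python
import statistics

def summarize(samples):
    """Return the data summaries from the slides for a list of numbers."""
    ordered = sorted(samples)
    # statistics.quantiles defaults to the 'exclusive' method; other
    # references use slightly different quartile conventions.
    lower_q, _, upper_q = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "range": ordered[-1] - ordered[0],
        "variance": statistics.variance(ordered),  # divides by n - 1
        "iqr": upper_q - lower_q,
    }

# Hypothetical data, chosen only so the arithmetic is easy to check by hand.
summary = summarize([1, 2, 3, 4, 5, 6, 7, 8])
print(summary)
```

With an even number of samples the median is the average of the two middle values, here (4 + 5) / 2.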

Sampling distribution
The distribution of the statistics introduced above
Sampling distributions are extremely useful for determining the forms of confidence intervals and hypothesis tests

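Sampling distribution: summary
Sample mean
  Form: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
  Samples i.i.d. normal, known variance: $\bar{X} \sim N(\mu, \sigma^2/n)$
  Unknown variance: for large $n$, approximately normal as above
Sample variance
  Form: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
  Samples i.i.d. normal, known variance: $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
  Unknown variance: for large $n$, approximately normal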
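One way to get a feel for the sampling distribution of the sample mean is to simulate it. The sketch below draws many samples of size n from a normal population and checks that the variance of the sample means is close to sigma^2 / n. The parameter values, the seed, and the function name are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed values for illustration only.
mu, sigma, n = 0.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, repetitions=5000)

observed_var = statistics.variance(means)  # spread of the simulated sample means
theoretical_var = sigma ** 2 / n           # variance of the sample mean in theory
print(observed_var, theoretical_var)
```

The two printed numbers should be close but not identical, since the simulation itself is random.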

Other common sampling distributions
Sample proportion $\hat{p} = X/n$
  Exact: $n\hat{p} = X \sim \mathrm{Bin}(n, p)$
  Large sample: $\hat{p} \approx N(p, p(1-p)/n)$
Standardized sample mean, known variance (exact): $\frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$
Standardized sample mean, unknown variance (exact): $\frac{\bar{X} - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$

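Two sample
Difference in sample means, known variances: $\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \sim N(0, 1)$
Difference in sample means, unknown (but identical) variances: $\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$
  with pooled variance $S_p^2 = \frac{\sum_{i=1}^{n_1}(X_{1i} - \bar{X}_1)^2 + \sum_{i=1}^{n_2}(X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}$
Ratio of sample variances: $\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$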

Statistical methods
Point estimator
Confidence interval
Hypothesis test
Two-sample test (two populations)
ANOVA (more than two populations)
Linear regression

Point estimator
Mean of the estimator: is it unbiased?
Variance of the estimator
Mean squared error (MSE): MSE = bias$^2$ + variance
Methods of finding point estimators: method of moments, maximum likelihood
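The identity MSE = bias$^2$ + variance follows from adding and subtracting the mean of the estimator. A short derivation, writing $\theta$ for the true parameter and $\hat\theta$ for the estimator (these symbols are assumed here; the slide does not name them):

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta)^2\big] \\
  &= E\big[(\hat\theta - E[\hat\theta])^2\big]
     + 2\,(E[\hat\theta] - \theta)\,E\big[\hat\theta - E[\hat\theta]\big]
     + (E[\hat\theta] - \theta)^2 \\
  &= \mathrm{Var}(\hat\theta) + \mathrm{bias}(\hat\theta)^2
\end{aligned}
```

The cross term vanishes because $E\big[\hat\theta - E[\hat\theta]\big] = 0$, and $\mathrm{bias}(\hat\theta) = E[\hat\theta] - \theta$.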

Confidence interval
Point estimator: a single value for the estimated parameter
Confidence interval: an interval $[a, b]$ that is likely to contain the true parameter
If $[a, b]$ contains the true parameter with probability $1 - \alpha$, then $[a, b]$ is the $1 - \alpha$ confidence interval

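Typical forms of k
k = upper cutting point * standard deviation of the point estimator
The width of the confidence interval is determined by the sample size and the confidence level
Mean, known variance: $\left[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right]$
Mean, unknown variance: $\left[\bar{x} - t_{\alpha/2,n-1}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2,n-1}\,s/\sqrt{n}\right]$
Proportion: $\left[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right]$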
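As a rough illustration, the known-variance interval for the mean and the large-sample interval for a proportion can be computed with Python's standard library, which provides normal quantiles through `statistics.NormalDist`. The function names and the numbers below are made up for this sketch. The t-based interval is left out because the standard library has no t quantile function; in practice that value comes from a table or from statistical software.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided 1 - alpha interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper cutting point z_{alpha/2}
    k = z * sigma / math.sqrt(n)              # half-width of the interval
    return xbar - k, xbar + k

def proportion_interval(p_hat, n, alpha=0.05):
    """Two-sided large-sample 1 - alpha interval for a proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    k = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - k, p_hat + k

# Hypothetical inputs: sample mean 10.0, known sigma 2.0, n = 16.
print(z_interval(10.0, 2.0, 16))
# Hypothetical inputs: 50 successes out of 100 trials.
print(proportion_interval(0.5, 100))
```

Quadrupling n halves the half-width k, and raising the confidence level widens the interval through a larger cutting point.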

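Tails etc.
CDF of the standard normal: $\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$
Upper cutting point (also called percentage point in the textbook): the value with upper-tail probability $\alpha$, for example $z_\alpha$, $t_{\alpha,\nu}$, or $f_{0.25,\nu_1,\nu_2}$ when $\alpha = 0.25$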
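For the standard normal, both the CDF and the upper cutting point are available in Python's standard library. A small sketch (the helper names `phi` and `z_upper` are invented here):

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def phi(z):
    """CDF of the standard normal, P(Z <= z)."""
    return standard_normal.cdf(z)

def z_upper(alpha):
    """Upper cutting point z_alpha, defined by P(Z > z_alpha) = alpha."""
    return standard_normal.inv_cdf(1 - alpha)

print(phi(0.0))        # half of the probability lies below 0
print(z_upper(0.025))  # the familiar value of about 1.96
```

Cutting points for the t and F distributions are not in the standard library and would come from tables or statistical software.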

Forms of confidence intervals
Two-sided interval: [point estimator - k, point estimator + k]
One-sided interval: [point estimator - k, infinity) or (-infinity, point estimator + k]
k specifies the width of the confidence interval

Hypothesis test
Use data to test two contradicting statements
$H_0$: null hypothesis
$H_1$: alternative hypothesis
Two approaches:
Fixed significance level. Form: reject $H_0$ when the test statistic falls outside the thresholds
p-value: the probability, assuming $H_0$ is true, of observing something at least as extreme as the data

Procedure of hypothesis test (sec. 9.1.6)
1. Set the significance level (.01, .05, .1)
2. Set the null and alternative hypotheses
3. Determine other parameters
4. Decide the type of the test
   - test for mean with known variance (z-test)
   - test for mean with unknown variance (t-test)
   - test for sample proportion parameter
6. Use the data available:
   - perform the test to reach a decision
   - and report the p-value

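Summary: test for mean
Null hypothesis $H_0: \mu = \mu_0$; test statistic $\bar{x}$; significance level $\alpha$
Alternative $H_1: \mu \neq \mu_0$
  Known variance: $H_0$ is rejected if $|\bar{x} - \mu_0| > z_{\alpha/2}\,\sigma/\sqrt{n}$
  Unknown variance: $H_0$ is rejected if $|\bar{x} - \mu_0| > t_{\alpha/2,n-1}\,s/\sqrt{n}$
Alternative $H_1: \mu > \mu_0$
  Known variance: $H_0$ is rejected if $\bar{x} > \mu_0 + z_{\alpha}\,\sigma/\sqrt{n}$
  Unknown variance: $H_0$ is rejected if $\bar{x} > \mu_0 + t_{\alpha,n-1}\,s/\sqrt{n}$
Alternative $H_1: \mu < \mu_0$
  Known variance: $H_0$ is rejected if $\bar{x} < \mu_0 - z_{\alpha}\,\sigma/\sqrt{n}$
  Unknown variance: $H_0$ is rejected if $\bar{x} < \mu_0 - t_{\alpha,n-1}\,s/\sqrt{n}$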
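The known-variance column can be sketched in Python as a z-test that reports a p-value. Rejecting when the p-value is below alpha gives the same decision as the threshold rules in the summary. The function name, the `alternative` labels, and the example numbers are assumptions made for this sketch.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """z-test for the mean with known sigma; returns (z0, p_value, reject)."""
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))  # standardized test statistic
    normal = NormalDist()
    if alternative == "two-sided":      # H1: mu != mu0
        p_value = 2 * (1 - normal.cdf(abs(z0)))
    elif alternative == "greater":      # H1: mu > mu0
        p_value = 1 - normal.cdf(z0)
    elif alternative == "less":         # H1: mu < mu0
        p_value = normal.cdf(z0)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z0, p_value, p_value < alpha

# Hypothetical numbers: sample mean 10.5, mu0 = 10, sigma = 1, n = 25.
print(z_test(10.5, 10.0, 1.0, 25))
```

The unknown-variance case has the same shape with s in place of sigma and a t distribution in place of the normal, which needs a t table or statistical software.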

Test for sample proportion
Null hypothesis $H_0: p = p_0$; significance level $\alpha$
Test statistic: $\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
Alternative hypothesis $H_1: p \neq p_0$: $H_0$ is rejected if $\frac{|\hat{p} - p_0|}{\sqrt{p_0(1-p_0)/n}} > z_{\alpha/2}$

Two-sample test: mean
For the following hypothesis test:
$H_0: \mu_1 - \mu_2 = \Delta$
$H_1: \mu_1 - \mu_2 \neq \Delta$
Reject $H_0$ when $\frac{|\bar{X} - \bar{Y} - \Delta|}{S_p\sqrt{1/n_1 + 1/n_2}} > t_{\alpha/2,\,n_1+n_2-2}$
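A sketch of the pooled two-sample statistic in Python. It computes only the test statistic and its degrees of freedom; the cutting point of the t distribution is not available in the standard library and would be read from a table or software. The function name and data are illustrative.

```python
import math
import statistics

def pooled_t_statistic(x, y, delta=0.0):
    """Return (t0, degrees of freedom) for H0: mu1 - mu2 = delta, equal variances."""
    n1, n2 = len(x), len(y)
    # Sums of squared deviations, recovered from the sample variances.
    ss1 = (n1 - 1) * statistics.variance(x)
    ss2 = (n2 - 1) * statistics.variance(y)
    df = n1 + n2 - 2
    s_p = math.sqrt((ss1 + ss2) / df)  # pooled standard deviation
    diff = statistics.mean(x) - statistics.mean(y) - delta
    t0 = diff / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t0, df

# Hypothetical samples, small enough to check by hand.
t0, df = pooled_t_statistic([1, 2, 3], [4, 5, 6])
print(t0, df)
```

For a two-sided test, the absolute value of t0 is compared with the upper alpha/2 cutting point of the t distribution with df degrees of freedom.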

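Two-sample test: sample proportion
For the two-sided test:
$H_0: p_1 = p_2$
$H_1: p_1 \neq p_2$
Reject $H_0$ when $\frac{|\hat{p}_1 - \hat{p}_2|}{\sqrt{\hat{p}(1-\hat{p})\left(1/n_1 + 1/n_2\right)}} > z_{\alpha/2}$
where $\hat{p}$ is the pooled sample proportion, $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$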
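The two-proportion test can be sketched the same way. The pooled proportion is taken here as total successes over total trials, which is the usual choice under $H_0: p_1 = p_2$. The function name and counts are made up for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Two-sided test of H0: p1 = p2; returns (z0, reject)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled sample proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z0 = (p1_hat - p2_hat) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper cutting point z_{alpha/2}
    return z0, abs(z0) > z_crit

# Hypothetical counts: 60 of 100 successes versus 40 of 100.
print(two_proportion_z_test(60, 100, 40, 100))
```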

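Analysis of variance
Multiple populations
Analyze the difference in their means
We would reject $H_0$ if $F_0 > F_{\alpha,\,a-1,\,a(n-1)}$, where $a$ is the number of populations and $n$ is the number of samples from each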
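The slide does not spell out how $F_0$ is computed. A sketch under the usual one-way, balanced-design assumption, where $F_0$ is the ratio of the between-group mean square to the within-group mean square, is given below. The function name and data are invented, and the cutting point $F_{\alpha, a-1, a(n-1)}$ still has to come from a table or software.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F0, df_treatments, df_error) for equally sized groups."""
    a = len(groups)     # number of populations
    n = len(groups[0])  # samples per population
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes the same sample size in every group")
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    group_means = [statistics.fmean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_treatments = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    ms_treatments = ss_treatments / (a - 1)
    ms_error = ss_error / (a * (n - 1))
    return ms_treatments / ms_error, a - 1, a * (n - 1)

# Hypothetical data: three groups of three observations.
f0, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f0, df1, df2)
```

A large $F_0$ means the group means differ by more than the within-group scatter would explain.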

Linear regression
Simple linear regression: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, 2, \ldots, n$
$Y_i$: response
$X_i$: regressor or predictor
$\beta_0$: intercept
$\beta_1$: slope
$\varepsilon_i$: random error

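Fitted coefficients
$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$ (11-10)
$S_{xy} = \sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$ (11-11)
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
Fitted (estimated) regression model: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$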
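The course relies on R for regression, but the formulas for the fitted coefficients are short enough to check by hand. A Python sketch follows, with made-up data that lie exactly on a line so the answer is obvious; the function name is invented.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) from S_xy / S_xx, following (11-10) and (11-11)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data generated from y = 1 + 2x with no noise.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_simple_linear_regression(x, y)

# Residuals: observed response minus fitted value.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(intercept, slope, residuals)
```

With real data the residuals are not all zero, and plotting them against x is the basic diagnostic check.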

Model diagnosis
Plot the residuals
Use R and read the output
For simple and multiple linear regression, we are going to rely on R to do the calculations


Finally
What is statistics about?
Fit a model using data (e.g. distributions)
Use the model to make inferences: estimation, hypothesis testing, prediction (e.g. using linear regression)
Why is a model useful?
Report findings from data systematically
Quantify uncertainty