Lecture 7 - Final Review
April 26, 2016, Stat - Lecture 6 - Review

Administrative Notes
- Final Exam is Tuesday, May 10th (3-5pm)
- Covers Chapters 1-8 and 10 in textbook
- Bring ID cards to final!
- Allowed: calculators, double-sided 8.5 x 11 cheat sheet
- Exam Rooms:

  Stat Lecture   Last Name   Final Exam Room
  am pm          Everyone    MEYERSON HALL B
  3pm            Everyone    COHEN HALL G7

Administrative Notes
- Office hours will be held throughout the exam period, up until the final exam on May 10th
- A list of additional textbook study problems from the second half of the course will also be posted on the course website

Outline
- Collecting Data (Chapter 3)
- Exploring Data - One variable (Chapter 1)
- Exploring Data - Two variables (Chapter 2)
- Probability (Chapter 4)
- Sampling Distributions (Chapter 5)
- Introduction to Inference (Chapter 6)
- Inference for Means (Chapter 7)
- Inference for Proportions (Chapter 8)
- Inference for Regression (Chapter 10)
- Urban Analytics Case Study

Experiments
[Diagram: experimental units are split into a treatment group (treatment) and a control group (no treatment)]
- Try to establish the causal effect of a treatment
- Key is reducing the presence of confounding variables
- Matching: ensure treatment/control groups are very similar on observed variables, e.g. race, gender, age
- Randomization: randomly dividing units into treatment or control leads to groups that are similar on observed and unobserved confounding variables
- Double-blinding: neither subjects nor evaluators know who is in the treatment group vs. the control group

Sampling and Surveys
[Diagram: parameter -> sampling -> sample -> estimation -> inference about the parameter]
- Just like in experiments, we must be cautious of potential sources of bias in our sampling results
- Voluntary response samples, undercoverage, nonresponse, untrue response, wording of questions
- Simple Random Sampling: less biased since each individual in the population has an equal chance of being included in the sample
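Simple random sampling and randomization can both be sketched in a few lines. The example below is only an illustration: the population of ID numbers, the helper names `simple_random_sample` and `randomize_to_groups`, and the seeds are all hypothetical and not part of the course materials.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement; each individual is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def randomize_to_groups(units, seed=None):
    """Randomly split experimental units into a treatment half and a control half."""
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical population: ID numbers 1..100.
population = list(range(1, 101))

sample = simple_random_sample(population, 10, seed=0)
treatment, control = randomize_to_groups(sample, seed=1)

print("sample:   ", sample)
print("treatment:", treatment)
print("control:  ", control)
```

Because the split is driven by chance alone rather than by any characteristic of the units, it cannot systematically favor one group on observed or unobserved variables.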
Different Types of Graphs
- A distribution describes what values a variable takes and how frequently these values occur
- Boxplots are good for center, spread, and outliers but don't indicate the shape of a distribution
- Histograms are much more effective at displaying the shape of a distribution

Measures of Center and Spread
- Center: Mean $\bar{X} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
- Spread: Standard Deviation $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
- For outliers or asymmetry, median/IQR are better
  - Center: Median - middle number in the distribution
  - Spread: Inter-Quartile Range $IQR = Q_3 - Q_1$
- We use mean and SD more since most distributions are symmetric with no outliers (e.g. Normal)

Relationships between continuous var.
- Scatterplot examines the relationship between a response variable (Y) and an explanatory variable (X)
[Figure: scatterplot of Education and Mortality, r = -0.5]
- Positive vs. negative associations
- Correlation is a measure of the strength of the linear relationship between variables X and Y
  - r near 1 or -1 means strong linear relationship
  - r near 0 means weak linear relationship
- Linear Regression: come back to later

Probability
- Random process: outcome not known exactly, but we have a probability distribution of possible outcomes
- Event: outcome of a random process, with probability P(A)
- Probability calculations: combinations of rules
  - Equally likely outcomes rule
  - Complement rule
  - Additive rule for disjoint events
  - Multiplication rule for independent events
- Random variable: a numerical outcome or summary of a random process
  - Discrete r.v. has a finite number of distinct values
  - Continuous r.v. has a non-countable number of values
- Linear transformations of variables

The Normal Distribution
- The Normal distribution has center $\mu$ and spread $\sigma$
[Figure: Normal density curves with different centers and spreads]
- Have tables for any probability from the standard normal distribution ($\mu = 0$ and $\sigma = 1$)
- Standardization: converting X, which has a $N(\mu, \sigma)$ distribution, to Z, which has a $N(0, 1)$ distribution: $Z = \frac{X - \mu}{\sigma}$
- Reverse standardization: converting a standard normal Z into a non-standard normal X: $X = \sigma Z + \mu$

Inference using Samples
[Diagram: population parameters ($\mu$ or $p$) -> sampling -> sample -> estimation -> sample estimates ($\bar{X}$ or $\hat{p}$) -> inference about the parameters]
- Continuous: pop. mean estimated by sample mean
- Discrete: pop. proportion estimated by sample proportion
- Key for inference: Sampling Distributions
- Distribution of values taken by the statistic in all possible samples from the same population
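To make "all possible samples from the same population" concrete, here is a minimal sketch that lists every sample of size 2 from a tiny population and collects the sample means. The five population values are hypothetical and were chosen only so that every sample can be enumerated by hand; real populations are far too large for this.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population, small enough to list every possible sample.
population = [2, 4, 6, 8, 10]
sample_size = 2

# Every possible sample of the given size (drawn without replacement).
all_samples = list(combinations(population, sample_size))

# The sampling distribution of the sample mean: one mean per possible sample.
sample_means = [mean(s) for s in all_samples]

print("population mean:     ", mean(population))
print("number of samples:   ", len(all_samples))
print("sample means:        ", sorted(sample_means))
print("mean of sample means:", mean(sample_means))
```

Individual sample means range from 3 to 9, but their average matches the population mean of 6. The sketch only illustrates the center of the sampling distribution; with such a small population sampled without replacement, the spread is not a good stand-in for what happens with large populations.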
Sampling Distribution of Sample Mean
- The center of the sampling distribution of the sample mean is the population mean: $\text{mean}(\bar{X}) = \mu$
- Over all samples, the sample mean will, on average, be equal to the population mean (no guarantees for 1 sample!)
- The standard deviation of the sampling distribution of the sample mean is $SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
- As sample size increases, the standard deviation of the sample mean decreases!
- Central Limit Theorem: if the sample size is large enough, then the sample mean $\bar{X}$ has an approximately Normal distribution

Binomial/Normal Dist. For Proportions
- Sample count Y follows a Binomial distribution, which we can calculate from Binomial tables in small samples
- If the sample size is large ($np \geq 10$ and $n(1-p) \geq 10$), sample count Y approximately follows a Normal distribution: $\text{mean}(Y) = np$, $SD(Y) = \sqrt{np(1-p)}$
- If the sample size is large, the sample proportion also approximately follows a Normal distribution: $\text{mean}(\hat{p}) = p$, $SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$

Summary of Sampling Distribution

  Type of Data             Unknown Parameter   Statistic    Variability of Statistic                  Distribution of Statistic
  Continuous               $\mu$               $\bar{X}$    $SD(\bar{X}) = \sigma/\sqrt{n}$           Normal (if n large)
  Count ($X_i$ = 0 or 1)   $p$                 $\hat{p}$    $SD(\hat{p}) = \sqrt{p(1-p)/n}$           Binomial (if n small), Normal (if n large)

Introduction to Inference
- Use the sample estimate as the center of a confidence interval of likely values for the population parameter
- All confidence intervals have the same form: Estimate ± Margin of Error
- The margin of error is always some multiple of the standard deviation (or standard error) of the statistic
- Hypothesis test: does the data support a specific hypothesis?
  1. Formulate your Null and Alternative Hypotheses
  2. Calculate the test statistic: difference between the data and your null hypothesis
  3. Find the p-value for the test statistic: how probable is data at least as extreme as yours if the null hypothesis is true?
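The confidence interval form and the three hypothesis-test steps can be traced through with a small numerical sketch. It assumes a test about a single mean with a known population SD, so that the standard Normal distribution applies; all of the numbers (`x_bar`, `mu_0`, `sigma`, `n`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers, not taken from the course materials.
x_bar = 52.0    # observed sample mean
mu_0 = 50.0     # population mean under the null hypothesis
sigma = 10.0    # population SD, assumed known for this sketch
n = 100         # sample size

std_normal = NormalDist()        # standard Normal: mean 0, SD 1
sd_of_mean = sigma / sqrt(n)     # SD of the sample mean, sigma / sqrt(n)

# Confidence interval: estimate +/- margin of error.
z_star = std_normal.inv_cdf(0.975)   # critical value for 95% confidence
margin = z_star * sd_of_mean
ci = (x_bar - margin, x_bar + margin)

# Step 1: H0: mu = mu_0 versus Ha: mu != mu_0 (two-sided alternative).
# Step 2: test statistic = standardized difference between the data and H0.
z = (x_bar - mu_0) / sd_of_mean
# Step 3: two-sided p-value = chance of a statistic at least this extreme under H0.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print("95% CI:", ci)
print("z =", z, " p-value =", p_value)
```

With these made-up inputs the sample mean sits two standard deviations above the null value, and the two-sided p-value comes out a little under 0.05.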
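Inference: Single Mean $\mu$
- Known SD $\sigma$: confidence intervals and test statistics involve the standard deviation and normal critical values
  - Confidence interval: $\left( \bar{X} - Z^* \frac{\sigma}{\sqrt{n}},\ \bar{X} + Z^* \frac{\sigma}{\sqrt{n}} \right)$
  - Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
- Unknown SD $\sigma$: confidence intervals and test statistics involve the standard error and critical values from a t distribution with $n - 1$ degrees of freedom
  - Confidence interval: $\left( \bar{X} - t^*_{n-1} \frac{s}{\sqrt{n}},\ \bar{X} + t^*_{n-1} \frac{s}{\sqrt{n}} \right)$
  - Test statistic: $T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$
- The t distribution has wider tails (more conservative)

Inference: Comparing Means $\mu_1$ and $\mu_2$
- Known $\sigma_1$ and $\sigma_2$: two-sample Z statistic uses the normal distribution
  - $Z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
- Unknown $\sigma_1$ and $\sigma_2$: two-sample T statistic uses the t distribution with degrees of freedom $k = \min(n_1 - 1, n_2 - 1)$
  - $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
  - Confidence interval: $\bar{X}_1 - \bar{X}_2 \pm t^*_k \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
- Matched pairs: instead of the difference of two samples $X_1$ and $X_2$, do a one-sample test on the differences $d$
  - $T = \frac{\bar{X}_d - 0}{s_d / \sqrt{n}}$
  - Confidence interval: $\bar{X}_d \pm t^*_{n-1} \frac{s_d}{\sqrt{n}}$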
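The two-sample and matched-pairs T statistics above can be computed directly from raw data. The sketch below is a minimal illustration: the function names and the data are hypothetical, and it stops at the statistic and its degrees of freedom, since the critical value $t^*$ or the p-value would still have to come from a t table or statistical software.

```python
from math import sqrt
from statistics import mean, stdev, variance

def two_sample_t(x1, x2):
    """Two-sample T statistic with the conservative df = min(n1 - 1, n2 - 1)."""
    n1, n2 = len(x1), len(x2)
    se = sqrt(variance(x1) / n1 + variance(x2) / n2)   # sqrt(s1^2/n1 + s2^2/n2)
    t = (mean(x1) - mean(x2)) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

def matched_pairs_t(before, after):
    """One-sample T statistic on the paired differences, with df = n - 1."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    t = (mean(d) - 0) / (stdev(d) / sqrt(n))
    return t, n - 1

# Hypothetical data for two independent groups.
group1 = [1, 2, 3, 4, 5]
group2 = [2, 4, 6, 8, 10, 12]
t_two, df_two = two_sample_t(group1, group2)

# Hypothetical before/after measurements on the same four subjects.
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]
t_pair, df_pair = matched_pairs_t(before, after)

print("two-sample:    T =", t_two, " df =", df_two)
print("matched pairs: T =", t_pair, " df =", df_pair)
```

The matched-pairs version works on one list of differences, which is why it reduces to the single-mean formula with $\mu_0 = 0$.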
Inference: Proportion $p$
- Confidence interval for $p$ uses the Normal distribution and the sample proportion:
  $\hat{p} \pm Z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ where $\hat{p} = \frac{Y}{n}$
- Hypothesis test for $p = p_0$ also uses the Normal distribution and the sample proportion:
  $Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}$

Inference: Comparing Proportions $p_1$ and $p_2$
- Hypothesis test for $p_1 - p_2 = 0$ uses the Normal distribution and a complicated test statistic:
  $Z = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)}$ where $\hat{p}_1 = \frac{Y_1}{n_1}$ and $\hat{p}_2 = \frac{Y_2}{n_2}$
  with pooled standard error:
  $SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_p (1 - \hat{p}_p) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$ where $\hat{p}_p = \frac{Y_1 + Y_2}{n_1 + n_2}$
- Confidence interval for $p_1 - p_2$ also uses the Normal distribution and the sample proportions:
  $\hat{p}_1 - \hat{p}_2 \pm Z^* \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$

Linear Regression
- Use the best fit line to summarize the linear relationship between two continuous variables X and Y: $Y_i = \alpha + \beta X_i$
- The slope ($b = r \, s_y / s_x$): average change you get in the Y variable if you increased the X variable by one
- The intercept ($a = \bar{Y} - b \bar{X}$): average value of the Y variable when the X variable is equal to zero
- The linear equation can be used to predict the response variable Y for a value of our explanatory variable X

Significance in Linear Regression
- Does the regression line show a significant linear relationship between the two variables?
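As groundwork for the significance question, here is a minimal sketch of how the fitted line itself comes out of the slope and intercept formulas $b = r \, s_y / s_x$ and $a = \bar{Y} - b \bar{X}$. The data and the helper name `fit_line` are hypothetical, and the correlation is computed with the usual sample formula (sum of products of deviations divided by $(n-1) s_x s_y$), which is an addition here rather than something stated on the slides.

```python
from statistics import mean, stdev

def fit_line(x, y):
    """Return (intercept a, slope b, correlation r) for the best fit line."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    # Sample correlation: sum of products of deviations over (n - 1) * s_x * s_y.
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b, r

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, r = fit_line(x, y)
predicted = a + b * 6   # predicted response at X = 6

print("intercept a =", a)
print("slope b     =", b)
print("r           =", r)
print("prediction at X = 6:", predicted)
```

For this made-up data the slope is about 0.6 and the intercept about 2.2. Note that X = 6 lies just outside the observed X values, so the prediction leans on the line continuing to hold there.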
- $H_0: \beta = 0$ versus $H_a: \beta \neq 0$
- Uses the t distribution with $n - 2$ degrees of freedom and a test statistic calculated from JMP output: $T = \frac{b}{SE(b)}$
- Can also calculate confidence intervals using JMP output and the t distribution with $n - 2$ degrees of freedom:
  $b \pm t^*_{n-2} \, SE(b)$
  $a \pm t^*_{n-2} \, SE(a)$

Urban Analytics in Philadelphia
- Quantitative analysis of the economic and social functioning of local areas within large cities
- Philadelphia is an interesting case study for contemporary issues in urban revival and gentrification
- Creating empirical measures for concepts like urban vibrancy that have been difficult to quantify
- Examined associations between crime, poverty, demographics and land use

Urban Analytics in Philadelphia
- It is important to do quantitative analysis of large cities carefully and at the correct level of resolution
- What we see when we look at the city in the aggregate can be quite different from what we see in specific neighborhoods
- Both sides of the classic Jane Jacobs vs. urban renewal fight were based on empirical arguments
- Jacobs' key innovation was basing her observations at a high resolution: individual streets and blocks rather than aggregating over entire cities
Last Class!
- Thanks everyone for a great semester!
- See you on May 10th for the final exam!