Lecture Outline Biost 517 Applied Biostatistics I

Lecture Outline
Biost 517: Applied Biostatistics I
Scott S. Emerson, M.D., Ph.D.
Professor of Biostatistics, University of Washington
Lecture 5: Simple Regression
December 3, 2012
© 2002, 2003, 2005 Scott S. Emerson, M.D., Ph.D.

Lecture Outline
- General Regression Setting
- Simple Regression Models
  - Linear Regression
  - Inference About Geometric Means
  - Logistic Regression
  - Proportional Hazards Regression
- Additional Comments About Inference

General Regression Setting

Two Variable Setting
- Many statistical problems consider the association between two variables
  - Response variable (outcome, dependent variable)
  - Grouping variable (predictor, independent variable)

Addressing Scientific Question
- Compare the distribution of the response variable across groups that are defined by the grouping variable
  - Within each group, the value of the grouping variable is constant

Intro Course Classification
- Characterize statistical analyses by
  - Number of samples (groups), and
  - Whether subjects in groups are independent
- Correspondence with two variable setting, by characterization of grouping variable
  - Constant: One sample problem
  - Binary: Two sample problem
  - Categorical: k sample problem (e.g., ANOVA)
  - Continuous: Infinite sample problem (regression)

Example: SBP and Age
[Figure: "SBP by Age". Scatterplot of systolic blood pressure (mm Hg) versus age (years).]

Regression Methods
- Regression extends one and two sample statistics (e.g., the t test) to the infinite sample problem
- While we don't really ever have (or care about) an infinite number of samples, it is easiest to use models that would allow that in order to handle
  - Continuous predictors of interest
  - Adjustment for other variables

Regression vs Two Samples
- When used with a binary grouping variable, common regression models reduce to the corresponding two variable methods
- Linear regression with a binary predictor
  - Classical: t test with equal variance
  - Robust SE: t test with unequal variance (approx)
- Logistic regression with a binary predictor
  - Score test: Chi squared test for association
- Cox regression with a binary predictor
  - Score test: Logrank test

Guiding Principle
"Everything is regression." - Scott Emerson

Types of Variables
- Binary data: e.g., sex, death
- Nominal data (unordered, categorical data): e.g., race, marital status
- Ordinal categorical data: e.g., stage of disease
- Quantitative data: e.g., age, blood pressure
- Right censored data: e.g., time to death (when not everyone has died)

Summary Measures
- The measures commonly used to summarize and compare distributions vary according to the types of data
  - Means: binary; quantitative
  - Medians: ordered; quantitative; censored
  - Proportions: binary; nominal
  - Odds: binary; nominal
  - Hazards: censored (hazard = instantaneous rate of failure)

Regression Models
- According to the parameter compared across groups
  - Means: Linear regression
  - Geometric means: Linear regression on logs
  - Odds: Logistic regression
  - Rates: Poisson regression
  - Hazards: Proportional hazards regression
  - Quantiles: Parametric survival regression

General Regression
- General notation for variables and parameter
  - $Y_i$: Response measured on the i-th subject
  - $X_i$: Value of the predictor for the i-th subject
  - $\theta_i$: Parameter of the distribution of $Y_i$
- The parameter might be the mean, geometric mean, odds, rate, instantaneous risk of an event (hazard), etc.

Simple Regression
- General notation for simple regression model
  $g(\theta_i) = \beta_0 + \beta_1 X_i$
  - $g(\cdot)$: "link" function used for modeling
  - $\beta_0$: "Intercept" of "linear predictor"
  - $\beta_1$: "Slope (for predictor X)" of "linear predictor"
- The link function is usually either
  - None (also called identity) for an additive model: most common when analyzing means of continuous Y
  - Log for a multiplicative model: analyzing geometric means, odds, rates, hazards

Borrowing Information
- Use other groups to make estimates in groups with sparse data
  - Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds
  - Assuming a straight line relationship tells us how to adjust data from other (even more distant) age groups
  - If we do not know the exact functional relationship, we might want to borrow information only close to each group (Next quarter: splines)

Defining Contrasts
- Define a comparison across groups to use when answering the scientific question
  - If there is a straight line relationship in the parameter, the slope is the difference in the parameter between groups differing by 1 unit in X
  - If there is a nonlinear relationship in the parameter, the slope is the average difference in the parameter between groups differing by 1 unit in X
  - Statistical jargon: a "contrast" across the groups

Next Quarter: Multiple Regression
- General notation for multiple regression model
  $g(\theta_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi}$
  - $g(\cdot)$: "link" function used for modeling
  - $\beta_0$: "Intercept" of "linear predictor"
  - $\beta_j$: "Slope (for predictor $X_j$)" of "linear predictor"
- The link function is usually either
  - None (also called identity) for an additive model: most common when analyzing means of continuous Y
  - Log for a multiplicative model: analyzing geometric means, odds, rates, hazards

Uses of Multiple Regression
- Modeling complex scientific factors
  - Smoking: indicator of ever smoked, pack years, years since quit
  - U-shaped trends
  - Dummy variables modeling each group independently
- Adjusting for covariates
  - Confounding
  - Precision
- Modeling effect modification
  - Include interaction terms

Comparison of Methods
- The major difference between regression models is interpretation of the parameters
  - Summary: Mean, geometric mean, odds, hazards
  - Comparison of groups: Difference, ratio
- Issues related to inclusion of covariates remain the same
  - Address the scientific question: predictor of interest; effect modifiers
  - Address confounding
  - Increase precision

Usual Regression Output
- Estimates
  - Intercept: estimated g(θ) when all predictors are 0
  - Slope for each predictor: estimated difference in g(θ) for two groups that differ by one unit in the corresponding predictor, but agree in all other predictors
- Standard errors
- Confidence intervals
- P values testing for
  - Intercept of zero (who cares?)
  - Slope of zero (test for association in θ)

Interpretation: Identity Link
  $\theta_i = \beta_0 + \beta_1 X_i$
- Intercept: $\beta_0$ is the value of θ for a group with X = 0
  - Quite often not of scientific interest because out of range of data or even impossible
- Slope: $\beta_1$ is the (avg) difference in θ across groups differing in X by 1 unit
  - Usually measures association between Y and X
- Most common example: linear regression, when θ is the mean

Interpretation: Log Link
  $\log(\theta_i) = \beta_0 + \beta_1 X_i$
- Intercept: $\exp(\beta_0)$ is the value of θ for a group with X = 0
  - Quite often not of scientific interest because out of range of data or even impossible
- Slope: $\exp(\beta_1)$ is the (avg) ratio of θ across groups differing in X by 1 unit
  - Usually measures association between Y and X
- Most common examples
  - Linear regression on log(Y): $\exp(\beta_1)$ is the geometric mean ratio
  - Logistic regression: $\exp(\beta_1)$ is the odds ratio
  - Proportional hazards regression: $\exp(\beta_1)$ is the hazard ratio

Inference with Regression
- Regression analysis is commonly used to answer statistical questions of type
  3. Estimating distribution parameters in population
  4. Comparing distributions across groups
  5. Predicting future individual observations
- Assumptions needed for valid inference depend on the question
  - To detect associations (comparing distributions) we need
    - Approximate normality of estimated slopes
    - Correct modeling of dependence among observations
    - Correct modeling of variance within groups
  - To use the linear predictor to estimate θ we also need (in addition)
    - Correct linear relationship
  - To predict future values of Y we also need (in addition)
    - Correct assumptions about the distribution of Y

Motivating Example

Example: Questions
- Association between blood pressure and age
  - Scientific question: Does aging affect blood pressure?
  - Statistical question: Does the distribution of systolic blood pressure differ across age groups?
    - Acknowledges variability of response
    - Acknowledges uncertainty of cause and effect: differences could be related to calendar time of birth instead of age

Example: Definition of Variables
- Response: Systolic blood pressure (continuous)
- Predictor of interest (grouping): Age (continuous)
  - An infinite number of ages are possible
  - We probably will not sample every one of them
- (Linear regression is most often used with a continuous response variable and a continuous POI, or any POI adjusted for other variables. BUT: it makes perfect sense with a binary POI. Arguments could even be made for the case of a binary response, though this is nonstandard.)

Example: Regression Model
- Answer the question by assessing linear trends in, say, average SBP by age
  - Estimate the best fitting line to average SBP within age groups
    $E[SBP \mid Age] = \beta_0 + \beta_1 \times Age$
- An association will exist if the slope ($\beta_1$) is nonzero
  - In that case, the average SBP will be different across different age groups

Example: Smooth; LS Line
[Figure: "SBP by Age". Scatterplot of systolic blood pressure (mm Hg) versus age (years) with a smooth and the least squares line.]

Rule of Thumb
- The regression model thus produces something similar to a rule of thumb
  - E.g., "Normal SBP is 100 plus half your age"
    $E[SBP \mid Age] = 100 + 0.5 \times Age$

Example: Estimates, Inference

. regress sbp age
                                        Number of obs =    735
  Source |      SS     df       MS      F(1, 733)     =  10.63
   Model |    4056      1   4056.4      Prob > F      = 0.0012
Residual |  279740    733    381.6      R-squared     = 0.0143
   Total |  283796    734    386.6      Adj R-squared = 0.0129
                                        Root MSE      = 19.536

     sbp |  Coef.  St.Err.      t   P>|t|   [95% Conf Int]
     age |   0.43    0.132   3.26   0.001    0.172    0.69
   _cons |   98.9     9.89  10.00   0.000     79.5   118.4

  $E[SBP \mid Age] = 98.9 + 0.43 \times Age$

Use of Regression
- The regression model serves to
  - Make estimates in groups with sparse data by borrowing information from other groups
  - Define a comparison across groups to use when answering the scientific question
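As a minimal sketch of how the fitted line is used to borrow information across age groups, the snippet below evaluates the estimated mean SBP at a few ages. It assumes an intercept of 98.9 and a slope of 0.43 mmHg per year, as read from the regression output; the function name `mean_sbp` is hypothetical and chosen only for illustration.

```python
# Assumed fitted coefficients from the SBP-by-age regression (illustrative).
INTERCEPT = 98.9   # estimated mean SBP at age 0 (an extrapolation)
SLOPE = 0.43       # estimated difference in mean SBP per 1 year of age


def mean_sbp(age):
    """Estimated mean SBP (mm Hg) for the group with the given age."""
    return INTERCEPT + SLOPE * age


for age in (70, 80, 90):
    print(age, round(mean_sbp(age), 1))

# Contrast between groups 10 years apart: 10 times the slope.
diff_10 = mean_sbp(80) - mean_sbp(70)
print(round(diff_10, 2))
```

The contrast between two age groups depends only on the slope and the age difference, which is why the intercept rarely matters scientifically.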

Borrowing Information
- Use other groups to make estimates in groups with sparse data
  - Intuitively: 67 and 69 year olds would provide some relevant information about 68 year olds
  - Assuming a straight line relationship tells us how to adjust data from other (even more distant) age groups
  - If we do not know the exact functional relationship, we might want to borrow information only close to each group (Next quarter: splines)

Defining Contrasts
- Define a comparison across groups to use when answering the scientific question
  - If there is a straight line relationship in means, the slope is the difference in mean SBP between groups differing by 1 year in age
  - If there is a nonlinear relationship in means, the slope is the average difference in mean SBP between groups differing by 1 year in age
  - Statistical jargon: a "contrast" across the means

Linear Regression Inference
- The regression output provides
  - Estimates
    - Intercept: estimated mean when age = 0
    - Slope: estimated difference in average SBP for two groups differing by one year in age
  - Standard errors
  - Confidence intervals
  - P values testing for
    - Intercept of zero (who cares?)
    - Slope of zero (test for linear trend in means)

Example: Interpretation
- From linear regression analysis, we estimate that for each year difference in age, the difference in mean SBP is 0.43 mmHg.
- A 95% CI suggests that this observation is not unusual if the true difference in mean SBP per year difference in age were between 0.17 and 0.69 mmHg.
- Because the P value is P < 0.05, we reject the null hypothesis that there is no linear trend in the average SBP across age groups.

Simple Linear Regression

Ingredients: Regression Model
- Response: the mean of this variable is compared across groups
  - Typically an uncensored continuous random variable
  - But truly can sometimes be used with discrete variables
- Predictor: indicates the groups to be compared
  - Can be continuous or discrete (including binary)
- Model: we typically consider a "linear predictor function" that is linear in the modeled predictors
  - Expected value (mean) of Y for a particular value of X
    $E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$

Use of Straight Line Relationship
- Algebra: a line is of the form y = mx + b
  - With no variation in the data, each value of y would lie exactly on a straight line
  - Intercept b is the value of y when x = 0
  - Slope m is the difference in y per unit difference in x
- In the real world
  - Response within groups is variable
    - Hidden variables
    - Inherent randomness
  - The line describes the central tendency of the data in a scatterplot of the response versus the predictor

Ingredients: Interpretation
  $E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$
- Interpretation of regression parameters
  - Intercept $\beta_0$: mean Y for a group with X = 0
    - Quite often not of scientific interest
    - Often outside the range of the data, sometimes impossible
  - Slope $\beta_1$: difference in mean Y across groups differing in X by 1 unit
    - Usually measures association between Y and X

Derivation of Interpretation
- Simple linear regression of response Y on predictor X
  - Mean for an arbitrary group derived from the model
  - Interpretation of parameters by considering special cases

  Model:        $E[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$
  $X_i = 0$:    $E[Y_i \mid X_i = 0] = \beta_0$
  $X_i = x$:    $E[Y_i \mid X_i = x] = \beta_0 + \beta_1 x$
  $X_i = x+1$:  $E[Y_i \mid X_i = x+1] = \beta_0 + \beta_1 x + \beta_1$

Example: Mental Function by Age
- Cardiovascular Health Study
  - A cohort of ~5,000 elderly subjects in four communities followed with annual visits
  - A subset of 735 subjects
  - Mental function measured at baseline by the Digit Symbol Substitution Test (DSST)
- Question: How does performance on the DSST differ across age groups?

Example: Lowess, LS Line
[Figure: "Cognition by Age". Scatterplot of Digit Symbol Substitution Test score versus age (years) with lowess smooth and least squares line.]

Least Squares Estimation

. regress dsst age
                             Nbr of obs = 723
                             F(1, 721)  = 109.57
                             Prob > F   = 0.0000
                             R-squared  = 0.1319
                             Adj R-sqr  = 0.1307
                             Root MSE   = 11.847

    dsst |   Coef.   StdErr       t   P>|t|     [95% CI]
     age |  -0.863   0.0825  -10.47   0.000   -1.03  -0.701
   _cons |     105     6.16   17.10   0.000    93.3     117
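To make the least squares step concrete, here is a small sketch that computes the intercept, slope, and Root MSE by hand. The data are made up for illustration (they are not the Cardiovascular Health Study data), and the function name `least_squares` is hypothetical.

```python
def least_squares(x, y):
    """Return (intercept, slope, root_mse) for simple linear regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residuals: observed response minus fitted group mean.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Root MSE uses n - 2 degrees of freedom (two estimated coefficients).
    root_mse = (sum(e * e for e in resid) / (n - 2)) ** 0.5
    return intercept, slope, root_mse


# Made-up ages and test scores, for illustration only.
ages = [68, 70, 72, 75, 80, 85]
scores = [45, 44, 40, 41, 35, 30]
b0, b1, rmse = least_squares(ages, scores)
print(round(b0, 2), round(b1, 3), round(rmse, 2))
```

In Stata's output these three quantities correspond to the `_cons` row, the predictor's row, and the Root MSE entry.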

Useful Output

. regress dsst age
                             Nbr of obs = 723
                             Prob > F   = 0.0000
                             R-squared  = 0.1319
                             Adj R-sqr  = 0.1307
                             Root MSE   = 11.847

    dsst |   Coef.   StdErr   P>|t|     [95% CI]
     age |  -0.863   0.0825   0.000   -1.03  -0.701
   _cons |     105     6.16   0.000    93.3     117

Deciphering Stata Output: Means
- Estimates of within group means
  - Intercept is labeled _cons
    - Estimated intercept: 105
  - Slope is labeled by variable name: age
    - Estimated slope: -0.863
  - Estimated linear relationship: average DSST by age given by
    $E[DSST \mid Age] = 105 - 0.863 \times Age$

Deciphering Stata Output: SD
- Estimates of within group standard deviation
  - Within group SD is labeled Root MSE
    - Estimated within group SD: 11.85
  - This presumes constant variance in age groups
    - If not, this is based on the average within group variance

Interpretation of Intercept
  $E[DSST \mid Age] = 105 - 0.863 \times Age$
- Estimated mean DSST for newborns is 105
  - Pretty ridiculous estimate
    - We never sampled anyone less than 67
    - Maximum value for DSST is 100
    - Newborns would in fact (rather deterministically) score 0
- In this problem, the intercept is just a mathematical construct to fit a line over the range of our data

Interpretation of Slope
  $E[DSST \mid Age] = 105 - 0.863 \times Age$
- Estimated difference in mean DSST for two groups differing by one year in age is -0.863, with the older group averaging a lower score
  - For a 5 year age difference: 5 × -0.863 = -4.32
  - For a 10 year age difference: -8.63
- (If a straight line relationship is not true, we interpret the slope as an average difference in mean DSST per one year difference in age)

Comments on Interpretation
- I express this as a difference between group means rather than a change with aging
  - We did not do a longitudinal study
- To the extent that the true group means have a linear relationship, this interpretation applies exactly
- If the true relationship is nonlinear
  - The slope estimates the "first order trend" for the sampled age distribution
  - We should not regard the estimates of individual group means as accurate

Regression in Stata
- Inference based on either classical linear regression or robust standard errors
  - Classical linear regression: regress respvar predictor
    - E.g., regress dsst age
  - Robust standard error estimates: regress respvar predictor, robust
    - E.g., regress dsst age, robust
  - The two approaches differ in CI and P values, not estimates

Ex: Classical Linear Regression

. regress dsst age
                             Nbr of obs = 723
                             F(1, 721)  = 109.57
                             Prob > F   = 0.0000
                             R-squared  = 0.1319
                             Adj R-sqr  = 0.1307
                             Root MSE   = 11.847

    dsst |   Coef.   StdErr       t   P>|t|     [95% CI]
     age |  -0.863   0.0825  -10.47   0.000   -1.03  -0.701
   _cons |     105     6.16   17.10   0.000    93.3     117
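The slope arithmetic for the DSST example can be sketched as follows. The snippet assumes a slope of -0.863 with a 95% CI of (-1.03, -0.701), taken from the regression output, and scales the estimate and both confidence limits by the age difference; the helper name `scaled_contrast` is hypothetical.

```python
# Assumed values from the DSST-by-age fit (illustrative, not re-estimated here).
SLOPE = -0.863                    # difference in mean DSST per 1 year of age
CI_LOW, CI_HIGH = -1.03, -0.701   # 95% CI for the slope


def scaled_contrast(years):
    """Estimated difference in mean DSST, with CI, for groups `years` apart."""
    return SLOPE * years, CI_LOW * years, CI_HIGH * years


for years in (1, 5, 10):
    est, low, high = scaled_contrast(years)
    print(f"{years:2d} years: {est:.2f} (95% CI {low:.2f}, {high:.2f})")
```

Scaling the confidence limits this way is valid because the contrast is a fixed multiple of a single estimated coefficient.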

Classical Linear Regression
- Inference for association based on slope
  - "Strong null" based inference
    - P value < 0.0001 suggests the distribution of DSST differs across age groups
    - T statistic: -10.47 (Who cares?)
  - Under assumptions of homoscedasticity
    - Estimated trend in mean DSST by age is an average difference of -0.863 per one year difference in age (DSST lower in older)
    - CI for trend: -1.03, -0.701

Ex: Robust Standard Errors

. regress dsst age, robust
Linear regression            Number of obs = 723
                             F(1, 721)     = 130.72
                             Prob > F      = 0.0000
                             R-squared     = 0.1319
                             Root MSE      = 11.847

         |            Robust
    dsst |    Coef    StdErr       t   P>|t|   [95% Conf Int]
     age |  -0.863    0.0755  -11.43   0.000   -1.01   -0.715
   _cons |     105      5.71   18.45   0.000    94.1      117

Robust Standard Errors
- Inference for association based on slope
  - "Weak null" based inference
    - Estimated trend in mean DSST by age is an average difference of -0.863 per one year difference in age (DSST lower in older)
    - CI for trend: -1.01, -0.715
    - P value < 0.0001 suggests mean DSST differs across age groups
    - T statistic: -11.43 (Who cares?)

Choice of Inference
- Which inference is correct?
- Classical linear regression and robust standard error estimates differ in the strength of the necessary assumptions
- As a rule, if all the assumptions of classical linear regression hold, it will be more precise
  - (Hence, we will have the greatest precision to detect associations if the linear model is correct)
- The robust standard error estimates are, however, valid for detection of associations even in those instances

Choosing the Correct Model
"All models are false, some models are useful." - George Box

Choosing the Correct Model
"In statistics, as in art, never fall in love with your model." - Unknown

Alternative Representation
- Sometimes linear regression models are expressed in terms of the response instead of the mean response
  - Includes an "error" modeling the difference between the observed value and the expectation

  Model:  $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

Signal and Noise
  Model:  $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$
- The response is divided into two parts
  - The mean (systematic part or "signal")
  - The error (random part or "noise"): the difference between the observed value and the corresponding group mean
    - $\epsilon_i$ is called the error
- The error distribution describes the within-group distribution of the response

Estimates of Error Distribution
- The error distribution is estimated from the residuals
  - Residual: $\hat{e}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$
- The mean of the errors is assumed to be 0
- The sample standard deviation of the residuals is reported as the "Root Mean Squared Error"

Example
- Thus we estimate a within group SD of 11.85 in the DSST vs age example
  - Classical linear regression: SD for each age group
  - Robust standard error estimates: square root of the average variances across groups

Relationships to Previous Methods
- Linear regression on a binary predictor
  - Classical LR: exactly the t test that presumes equal variances
  - Robust SE: approximates the t test that allows unequal variances
    - Huber-White sandwich estimator
    - Stata: regress dsst age, robust
- Classical simple linear regression
  - Test for slope is exactly the test for significant correlation

Inference for the Geometric Mean
Simple Linear Regression on Log Transformed Data

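Regression on Geometric Means
- Geometric means of distributions are typically analyzed by using linear regression on log transformed data
- Common choice for inference when a positive response variable is continuous, and
  - we are interested in multiplicative models,
  - we desire to downweight outliers, and/or
  - the standard deviation of the response in a group is proportional to the mean (the error is +/- a percentage rather than +/- a fixed amount)

Interpretation of Parameters
- Linear regression on log transformed Y (I am using natural log)

  Model:        $E[\log Y_i \mid X_i] = \beta_0 + \beta_1 X_i$
  $X_i = 0$:    $E[\log Y_i \mid X_i = 0] = \beta_0$
  $X_i = x$:    $E[\log Y_i \mid X_i = x] = \beta_0 + \beta_1 x$
  $X_i = x+1$:  $E[\log Y_i \mid X_i = x+1] = \beta_0 + \beta_1 x + \beta_1$

Interpretation of Parameters
- Restated model as log link for geometric mean

  Model:        $\log GM[Y_i \mid X_i] = \beta_0 + \beta_1 X_i$
  $X_i = 0$:    $\log GM[Y_i \mid X_i = 0] = \beta_0$
  $X_i = x$:    $\log GM[Y_i \mid X_i = x] = \beta_0 + \beta_1 x$
  $X_i = x+1$:  $\log GM[Y_i \mid X_i = x+1] = \beta_0 + \beta_1 x + \beta_1$

Interpretation of Parameters
- Interpretation of regression parameters by back-transforming the model
  - Exponentiation is the inverse of log

  Model:        $GM[Y_i \mid X_i] = e^{\beta_0} e^{\beta_1 X_i}$
  $X_i = 0$:    $GM[Y_i \mid X_i = 0] = e^{\beta_0}$
  $X_i = x$:    $GM[Y_i \mid X_i = x] = e^{\beta_0} e^{\beta_1 x}$
  $X_i = x+1$:  $GM[Y_i \mid X_i = x+1] = e^{\beta_0} e^{\beta_1 x} e^{\beta_1}$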

Interpretation of Parameters
- Geometric mean when the predictor is 0
  - Found by exponentiation of the intercept from the linear regression on log transformed data: $\exp(\beta_0)$
- Ratio of geometric means between groups differing in the value of the predictor by 1 unit
  - Found by exponentiation of the slope from the linear regression on log transformed data: $\exp(\beta_1)$
- Confidence intervals for the geometric mean and ratios are found by exponentiating the CI for the regression parameters

Example
- Trends in FEV with height
- FEV data set
  - A sample of 654 healthy children
  - Lung function measured by forced expiratory volume (FEV): maximal amount of air expired in 1 second
- Question: How does FEV differ across height groups?

FEV versus Height
[Figure: Scatterplot of FEV (l/sec) versus height (inches).]

Characterization of Scatterplot
- Detection of outliers
  - None obvious
- Trends in FEV across groups
  - FEV tends to be larger for taller children
- Second order trends
  - Curvilinear increase in FEV with height
- Variation within height groups
  - "Heteroscedastic": unequal variance across groups
  - Mean-variance relationship: higher variation in groups with higher FEV

Choice of Summary Measure
- Scientific justification for the geometric mean
  - FEV is a volume
  - Height is a linear dimension
  - Each dimension of lung size is proportional to height
  - Standard deviation likely proportional to height
- Statistical preference for transformation of the response
  - May transform to equal variance across groups
  - "Homoscedasticity" allows easier inference

Model Geometric Mean
- Science dictates any of the models
  - Science:     $FEV \propto Height^3$, equivalently $\sqrt[3]{FEV} \propto Height$
  - Statistics:  $\log(FEV) \propto 3 \log(Height)$
- Statistical preference for log transformation
  - Easier interpretation: multiplicative model
  - Compare groups using ratios

log(FEV) versus log(Height)
[Figure: Scatterplot of log(FEV) (log l/sec) versus log(height) (log inches).]

log-log Plot of FEV vs Height
[Figure: FEV (l/sec) versus height (inches), both on logarithmic axes.]

Estimation of Regression Model

. regress logfev loght, robust
Regression with robust standard errors
                             Number of obs = 654
                             F(1, 652)     = 2130.18
                             Prob > F      = 0.0000
                             R-squared     = 0.7945

         |            Robust
  logfev |   Coef.   StErr       t   P>|t|     [95% CI]
   loght |    3.12   0.068   46.15   0.000    2.99    3.26
   _cons |  -11.92   0.278   -42.9   0.000  -12.47  -11.38

Log Transformed Predictors
- Interpretation of log transformed predictors with log link function
  - Log link used to model the geometric mean
    - Exponentiated slope estimates the ratio of geometric means across groups
  - Compare groups with a k-fold difference in their measured predictors
    - Estimated ratio of geometric means: $\exp(\beta_1 \times \log k) = k^{\beta_1}$

Interpretation of Stata Output
- Scientific interpretation of the slope
  $\log GM[FEV \mid \log ht] = -11.9 + 3.12 \times \log ht$
  - Estimated ratio of geometric mean FEV for two groups differing by 10% in height (1.1-fold difference in height)
    - Exponentiate 1.1 to the slope: $1.1^{3.12} = 1.35$
    - The group that is 10% taller is estimated to have a geometric mean FEV that is 1.35 times higher (35% higher)

Why Transform Predictor?
- Typically chosen according to whether the data likely follow a straight line relationship
- Linearity ("model fit") is necessary to predict the value of the parameter in individual groups
  - Linearity is not necessary to estimate existence of an association
  - Linearity is not necessary to estimate a first order trend in the parameter across groups having the sampled distribution of the predictor
  - (Inference about these two questions will tend to be conservative if linearity does not hold)
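For the FEV example, the k-fold comparison can be checked numerically. This is a sketch under stated assumptions: a slope of 3.12 on log(height) with a 95% CI of (2.99, 3.26), as read from the output; the function name `gm_ratio` is hypothetical.

```python
import math


def gm_ratio(k, slope):
    """Ratio of geometric means for groups with a k-fold difference in the predictor."""
    return math.exp(slope * math.log(k))   # algebraically the same as k ** slope


SLOPE = 3.12        # assumed slope for log(height)
CI = (2.99, 3.26)   # assumed 95% CI for the slope

ratio = gm_ratio(1.1, SLOPE)                    # groups differing by 10% in height
ratio_ci = tuple(gm_ratio(1.1, b) for b in CI)  # CI found by transforming both limits
print(round(ratio, 2), [round(r, 2) for r in ratio_ci])
```

Because the transformation is monotone in the slope, transforming the two confidence limits gives a confidence interval for the ratio.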

Choice of Transformation
- Rarely do we know which transformation of the predictor provides the best linear fit
- As always, there is a danger in using the data to estimate the best transformation to use
  - If there is no association of any kind between the response and the predictor, a linear fit (with a zero slope) is the correct one
  - Trying to detect a transformation is thus an informal test for an association
    - Multiple testing procedures inflate the type I error

Sometimes Does Not Matter
- It is best to choose the transformation of the predictor on scientific grounds
- However, it is often the case that many functions are well approximated by a straight line over a small range of the data
- Example: in the modeling of FEV as a function of height, the logarithm of height is approximately linear over the range of heights sampled

log(Height) versus Height
[Figure: log(height) (log inches) versus height (inches).]

Untransformed Predictors
- It is thus often the case that we can choose to use an untransformed predictor even when science would suggest a nonlinear association
- This can have advantages when interpreting the results of the analysis
  - E.g., it is far more natural to compare heights by differences than by ratios
    - Chances are we would characterize two children as differing by 4 inches in height rather than describe the 44 inch child as being 10% taller than the 40 inch child

Statistical Role of Variables
- Looking ahead to multiple regression: the relative importance of having the "true" transformation for a predictor depends on the statistical role
  - Predictor of interest
  - Effect modifiers
  - Confounders
  - Precision variables

Predictor of Interest
- In general, don't worry about modeling the exact relationship before you have even established that there is an association (binary search)
- Searching for the best fit can inflate the type I error
- Make the most accurate, precise inference about the presence of an association first
  - Exploratory analyses can suggest models for future analyses

Effect Modifiers
- Modeling of effect modifiers is invariably just to test for existence of the interaction
  - We rarely have a lot of precision to answer questions in subgroups of the data
  - Patterns of interaction can be so complex that it is unlikely that we will really capture the interactions across all subgroups in a single model
  - Typically we restrict future studies to analyses treating subgroups separately

Confounders
- It is important to have an appropriate model of the association between the confounder and the response
  - Failure to accurately model the confounder means that some residual confounding will exist
- However, searching for the best model may inflate the type I error for inference about the predictor of interest by overstating the precision of the study
- Luckily, we rarely care about inference for the confounder, so we are free to use inefficient means of adjustment, e.g., stratified analysis

Precision Variables
- When modeling precision variables, it is rarely worth the effort to use the "best" transformation
  - We usually capture the largest part of the added precision with crude models
  - We generally do not care about estimating associations between the response and the precision variable
    - Most often, precision variables represent known effects on the response

Simple Logistic Regression
Inference About the Odds

Logistic Regression
- Binary response variable
- Allows continuous (or multiple) grouping variables
  - But is OK with a binary grouping variable also
- Compares odds of response across groups
  - "Odds ratio"

Binary Response
- When using regression with binary response variables, we typically model the (log) odds using logistic regression
  - Conceptually, there should be no problem modeling the proportion (which is the mean of the distribution)
  - However, there are several technical reasons why we do not use linear regression very often with a binary response

Why not Linear Regression?
- Many misconceptions about the advantages and disadvantages of analyzing the odds
- Reasons that I consider valid
  - Scientific basis
    - Use of odds ratios in case-control studies
    - Plausibility of linear trends and no effect modifiers
  - Statistical basis
    - Mean variance relationship (if not using robust SE)

Science: Case-Control Studies
- Scientific interest: distribution of "effect" across groups defined by "cause"
- Common sampling schemes
  - Cohort study: sample by exposure
    - Estimate the distribution of effect in exposure groups
  - Case-control study: sample by outcomes
    - Estimate the distribution of exposure in outcome groups
    - E.g., proportion (or odds) of smokers among people with or without cancer

Science: Case-Control Studies
- Estimable odds ratios for each sampling scheme
  - Cohort study
    - Odds of cancer among smokers : odds of cancer among nonsmokers
  - Case-control study
    - Odds of smoking among cancer : odds of smoking among noncancer
- Mathematically, the two odds ratios are the same

Science: Case-Control Studies
- The odds ratio is easily interpreted when trying to investigate rare events
  - Odds = prob / (1 - prob)
  - Rare event: (1 - prob) is approximately 1
    - Odds is approximately the probability
    - Odds ratio is approximately the risk ratio
- Risk ratios are easily understood
- Case-control studies are typically used when events are rare
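Both claims about the odds ratio (it is the same under either sampling scheme, and it approximates the risk ratio for rare events) can be illustrated with a small sketch. The 2x2 counts below are made up for illustration, and the variable names are hypothetical.

```python
# Made-up 2x2 table for illustration:
#                 cancer   no cancer
# smokers           a=30      b=970
# nonsmokers        c=10      d=990
a, b, c, d = 30, 970, 10, 990

# Cohort-style comparison: odds of cancer among smokers vs nonsmokers.
or_outcome_by_exposure = (a / b) / (c / d)

# Case-control-style comparison: odds of smoking among cancer vs noncancer.
or_exposure_by_outcome = (a / c) / (b / d)

# Risk ratio, which a cohort study can estimate directly.
risk_ratio = (a / (a + b)) / (c / (c + d))

print(round(or_outcome_by_exposure, 3),
      round(or_exposure_by_outcome, 3),
      round(risk_ratio, 3))
```

Both odds ratios reduce to ad/(bc), and with event probabilities of 3% and 1% the odds ratio is close to the risk ratio of 3.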

Science: Linearity
- Proportions have to be between 0 and 1
- It is thus unlikely that a straight line relationship would exist between a proportion and any predictor
  - UNLESS the predictor itself is bounded
  - OTHERWISE there eventually must be a threshold above which the probability does not increase (or only increases a little)

Science: Effect Modification
- The restriction on ranges for probabilities also makes it likely that effect modification will often be present with proportions
- Ex: 2 year relapse rates by nadir PSA > 4 and bone scan score (BSS)
  - If bone scan score < 3: a difference of 0.60
    - 40% of men with nadir PSA < 4 relapse in 24 months
    - 100% of men with nadir PSA > 4 relapse in 24 months
  - If bone scan score > 3
    - 70% of men with nadir PSA < 4 relapse in 24 months
    - Thus it is impossible for men with nadir PSA > 4 to have an absolute difference of 0.60 higher

Why use the odds?
- The odds of an event are between 0 and infinity
  - Recall odds = prob / (1 - prob)
  - (Even better: log(odds) are between negative infinity and positive infinity)
- Thus, there is a greater chance that linear relationships might hold without effect modification

Statistics: Mean-Variance
- Classical linear regression requires equal variances in each predictor group
- With binary data, the variance within a group depends on the mean
  - For binary Y: E(Y) = p, Var(Y) = p(1 - p)
- (With robust regression techniques, this problem is not a limitation)

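Simple Logistic Regression
Modeling odds of binary response Y on predictor X
  Distribution: $\Pr(Y_i = 1) = p_i$
  Model: $\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 X_i$
  $X_i = 0$: log odds $= \beta_0$
  $X_i = x$: log odds $= \beta_0 + \beta_1 x$
  $X_i = x + 1$: log odds $= \beta_0 + \beta_1 x + \beta_1$

Interpretation as Odds
Exponentiation of regression parameters
  Distribution: $\Pr(Y_i = 1) = p_i$
  Model: $\frac{p_i}{1 - p_i} = e^{\beta_0} e^{\beta_1 X_i}$
  $X_i = 0$: odds $= e^{\beta_0}$
  $X_i = x$: odds $= e^{\beta_0} e^{\beta_1 x}$
  $X_i = x + 1$: odds $= e^{\beta_0} e^{\beta_1 x} e^{\beta_1}$

Estimating Proportions
Proportion = odds / (1 + odds)
  Distribution: $\Pr(Y_i = 1) = p_i$
  Model: $p_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$
  $X_i = 0$: $p_i = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$
  $X_i = x$: $p_i = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
  $X_i = x + 1$: $p_i = \frac{e^{\beta_0 + \beta_1 x + \beta_1}}{1 + e^{\beta_0 + \beta_1 x + \beta_1}}$

Simple Logistic Regression
Interpretation of the model
  Odds when predictor is 0
    Found by exponentiation of the intercept from the logistic regression: $\exp(\beta_0)$
  Odds ratio between groups differing in the value of the predictor by 1 unit
    Found by exponentiation of the slope from the logistic regression: $\exp(\beta_1)$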
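The three scales of the simple logistic model (log odds, odds, proportion) can be traced with a few lines of code. This is a minimal sketch with hypothetical coefficients `beta0` and `beta1`; the function names are illustrative and not part of any package.

```python
import math

def odds_at(x, beta0, beta1):
    """Odds of the event in the group with predictor value x."""
    return math.exp(beta0 + beta1 * x)

def prob_at(x, beta0, beta1):
    """Proportion with the event: odds / (1 + odds)."""
    o = odds_at(x, beta0, beta1)
    return o / (1.0 + o)

beta0, beta1 = -2.0, 0.5   # hypothetical intercept and slope on the log odds scale

for x in (0, 1, 2):
    print(f"x = {x}: log odds = {beta0 + beta1 * x:+.2f}, "
          f"odds = {odds_at(x, beta0, beta1):.3f}, "
          f"prob = {prob_at(x, beta0, beta1):.3f}")

# Odds ratio for groups one unit apart equals exp(slope), whatever x is
odds_ratio = odds_at(3, beta0, beta1) / odds_at(2, beta0, beta1)
print(f"odds ratio per unit = {odds_ratio:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

Note that the odds ratio is constant across x while the difference in proportions is not, which is the sense in which the model is linear only on the log odds scale.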

Stata
  logit respvar predvar, [robust]
    Provides regression parameter estimates and inference on the log odds scale
    Intercept, slope with SE, CI, P values
  logistic respvar predvar, [robust]
    Provides regression parameter estimates and inference on the odds ratio scale
    Only slope with SE, CI, P values

Example
Prevalence of stroke (cerebrovascular accident - CVA) by age in subset of Cardiovascular Health Study
Response variable is CVA
  Binary variable: 0 = no history of prior stroke, 1 = prior history of stroke
Predictor variable is Age
  Continuous predictor

Lowess Smooth of CVA vs Age
[Figure: scatterplot with lowess smooth. Vertical axis: Prior History of Stroke (1 = Yes, 0 = No). Horizontal axis: Age (years).]

Characterization of Plot
Clearly the scatterplot (even with superimposed smooth) is pretty useless with a binary response
(Note that we are estimating proportions not odds with this plot, so we cannot even judge linearity for logistic regression)

Example: Regression Model
Answer question by assessing linear trends in log odds of stroke by age
Estimate best fitting line to log odds of CVA within age groups
  $\text{logodds}(\text{CVA} \mid \text{Age}) = \beta_0 + \beta_1 \times \text{Age}$
An association will exist if the slope ($\beta_1$) is nonzero
  In that case, the odds (and probability) of CVA will be different across different age groups

Parameter Estimates
. logit cva age
(iteration info deleted)
                                  Number of obs =    735
                                  LR chi2(1)    =   2.45
                                  Prob > chi2   = 0.1175
Log likelihood = -24.98969        Pseudo R2     =  0.005

   cva |    Coef.   Std. Err.      z    P>|z|    [95% Conf. Int.]
   age |   0.0336     0.0211    1.59     0.11    -0.0077    0.0748
 _cons |    -4.69       1.59   -2.95    0.003      -7.81    -1.572

Interpretation of Stata Output
Regression model for CVA on age
  Intercept is labeled by _cons
    Estimated intercept: -4.69
  Slope is labeled by variable name: age
    Estimated slope: 0.0336
  Estimated linear relationship: log odds CVA by age group given by
    log odds CVA = -4.69 + 0.0336 × Age

Interpretation of Intercept
  log odds CVA = -4.69 + 0.0336 × Age
Estimated log odds CVA for newborns is -4.69
  Odds of CVA for newborns is $e^{-4.69} = 0.0092$
  Probability of CVA for newborns
    Use prob = odds / (1 + odds): 0.0092 / (1 + 0.0092) = 0.0091
Pretty ridiculous to try to estimate
  We never sampled anyone less than 67
  In this problem, the intercept is just a tool in fitting the model
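The intercept arithmetic can be reproduced directly. The sketch below assumes the estimated intercept reads -4.69, as in the interpretation slide; the variable names are illustrative.

```python
import math

intercept = -4.69                       # estimated log odds of CVA at age 0 (from the slide)

odds_newborn = math.exp(intercept)      # exponentiate the log odds to get odds
prob_newborn = odds_newborn / (1.0 + odds_newborn)   # prob = odds / (1 + odds)

print(f"odds = {odds_newborn:.4f}")     # about 0.0092
print(f"prob = {prob_newborn:.4f}")     # about 0.0091
```

Because the event is so rare at this extrapolated age, the odds and the probability nearly coincide, which echoes the rare-event approximation discussed earlier.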

Interpretation of Slope
  log odds CVA = -4.69 + 0.0336 × Age
Estimated difference in log odds CVA for two groups differing by one year in age is 0.0336, with older group tending to higher log odds
  Odds Ratio: $e^{0.0336} = 1.034$
  For 5 year age difference: $e^{5 \times 0.0336} = 1.034^5 = 1.183$
(If a straight line relationship is not true, we interpret the slope as an average difference in log odds CVA per one year difference in age)

Stata: logit versus logistic
Given that we are rarely interested in the intercept, we might as well use the logistic command
  It will provide inference for the odds ratio
  We don't have to exponentiate the slope estimate

Odds Ratios using logistic
. logistic cva age
Logistic regression               Number of obs =    735
                                  LR chi2(1)    =   2.45
                                  Prob > chi2   = 0.1175
Log likelihood = -24.98969        Pseudo R2     =  0.005

   cva | Odds Ratio   Std. Err.      z    P>|z|    [95% Conf. Int.]
   age |      1.034     0.0218    1.59     0.11      0.992    1.078

Comments on Interpretation
I express this as a difference between group odds rather than a change with aging
  We did not do a longitudinal study
To the extent that the true group log odds have a linear relationship, this interpretation applies exactly
If the true relationship is nonlinear
  The slope estimates the first order trend for the sampled age distribution
  We should not regard the estimates of individual group probabilities / odds as accurate
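The slope arithmetic works the same way: exponentiate the slope for a one-year difference, or a multiple of it for a larger difference. This sketch assumes the estimated slope reads 0.0336; the names are illustrative.

```python
import math

slope = 0.0336                          # estimated difference in log odds per year of age

or_1yr = math.exp(slope)                # odds ratio for a 1 year age difference
or_5yr = math.exp(5 * slope)            # odds ratio for a 5 year age difference

print(f"OR, 1 year:  {or_1yr:.3f}")     # about 1.034
print(f"OR, 5 years: {or_5yr:.3f}")     # about 1.183
```

Raising the rounded one-year odds ratio to the fifth power (1.034 ** 5) gives very nearly the same answer; working from the slope avoids compounding the rounding.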

Signal and Noise
Note that the Signal and Noise idea does not apply so well here
  We do not tend to quantify an error distribution with logistic regression

Simple Proportional Hazards Regression
Inference About Hazards

Right Censored Data
A special type of missing data: the exact value is not always known
  Some measurements are known exactly
  Some measurements are only known to exceed some specified value (perhaps different for each subject)
Typically represented by two variables
  An observation time: Time to event or censoring, whichever came first
  An indicator of event: Tells us which were observed events

Statistical Methods
In the presence of censored data, the usual descriptive statistics are not appropriate
  Sample mean, sample median, simple proportions, sample standard deviation should not be used
  Proper descriptives should be based on Kaplan-Meier estimates
Similarly, special inferential procedures are needed with censored data
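To make the two-variable representation of censored data concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimate. The function name `kaplan_meier` and the five observations are hypothetical; ties between an event and a censoring are handled by the usual convention that the censored subject is still at risk at that time.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times  : observation times (time to event or censoring, whichever came first)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, estimated survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose observation time equals t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical data: subjects observed at 3 and 8 are censored
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.2f}")
```

With these made-up data the estimate at time 5 is 0.30, whereas the simple proportion without an observed event is 2/5 = 0.40; the gap arises because censored subjects are counted as survivors only for as long as they were actually followed.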

Survival Regression
There are two fundamental models used to describe the way that some factor might affect time to event
  Accelerated failure time
  Proportional Hazards

Accelerated Failure Time Model
Assume that a factor causes some subjects to spend their lifetime too fast
The basic idea: For every year in a reference group's lives, the other group ages k years
  E.g.: 1 human year = 7 dog years
Ratios of quantiles of survival distributions are constant across two groups
  E.g., report median ratios
AFT models include the parametric exponential, Weibull, and lognormal models

Proportional Hazards Model
Considers the instantaneous rate of failure at each time among those subjects who have not failed
Proportional hazards assumes that the ratio of these instantaneous failure rates is constant in time between two groups
Proportional hazards (Cox) regression treats the survival distribution within a group semiparametrically
  A semi-parametric model: The hazard ratio is the parameter, there is no intercept

AFT vs PH
Survival analysis: Who does Death prefer?
Given a collection of people in a sample:
  Accelerated failure time models consider how often Death takes somebody
    If people that Death prefers are available, he/she will come more often
  Proportional hazards models just compare which people Death chooses relative to their frequency in the population
    Why is it that Death tends to choose the very old despite the fact that they make up only a small percentage of the population available?

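Proportional Hazards Model
Ignores the time that events occur
Looks at odds of choosing subjects relative to prevalence in the population
  Can be derived as estimating the odds ratio of an event at each time that an event occurs
  Proportional hazards model averages the odds ratio across all observed event times
  If the odds ratio is constant over time between two groups, such an average results in a precise estimate of the hazard ratio

Borrowing Information
Use other groups to make estimates in groups with sparse data
  Borrows information across predictor groups
    E.g., 67 and 69 year olds would provide some relevant information about 68 year olds
  Borrows information over time
    Relative risk of an event at each time is presumed to be the same under Proportional Hazards

Simple PH Regression Model
Baseline hazard function is unspecified
  Similar to an intercept
  Model: $\log \lambda(t \mid X_i) = \log \lambda_0(t) + \beta_1 X_i$
  $X_i = 0$: log hazard at $t$ $= \log \lambda_0(t)$
  $X_i = x$: log hazard at $t$ $= \log \lambda_0(t) + \beta_1 x$
  $X_i = x + 1$: log hazard at $t$ $= \log \lambda_0(t) + \beta_1 x + \beta_1$

Model on Hazard scale
Exponentiating parameters
  Model: $\lambda(t \mid X_i) = \lambda_0(t)\, e^{\beta_1 X_i}$
  $X_i = 0$: hazard at $t$ $= \lambda_0(t)$
  $X_i = x$: hazard at $t$ $= \lambda_0(t)\, e^{\beta_1 x}$
  $X_i = x + 1$: hazard at $t$ $= \lambda_0(t)\, e^{\beta_1 x} e^{\beta_1}$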

Interpretation of the Model
No intercept
  Generally do not look at baseline hazard
  But can be estimated
Slope parameter
  Hazard ratio between groups differing in the value of the predictor by 1 unit
  Found by exponentiation of the slope from the proportional hazards regression: $\exp(\beta_1)$

Relationship to Survival
Hazard function determines survival function
  Hazard: $\lambda(t)$
  Cumulative Hazard: $\Lambda(t) = \int_0^t \lambda(u)\, du$
  Survival Function: $S(t) = e^{-\Lambda(t)}$

Stata
  stset obsvar eventvar
  stcox predvar, [robust]
    Provides regression parameter estimates and inference on the hazard ratio scale
    Only slope with SE, CI, P values

Example
Prognostic value of nadir PSA relative to time in remission
PSA data set:
  50 men who received hormonal treatment for advanced prostate cancer
  Followed at least 24 months for clinical progression, but exact time of follow-up varies
  Nadir PSA: lowest level of serum prostate specific antigen achieved post treatment
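The chain from hazard to cumulative hazard to survival can be sketched numerically. The baseline hazard and the hazard ratio below are hypothetical choices made only so the integral is easy to check by hand; `cumulative_hazard` and `survival` are illustrative helper names.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Trapezoidal approximation to the integral of hazard(u) over [0, t]."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return total * h

def survival(hazard, t):
    """Survival probability S(t) = exp(-cumulative hazard at t)."""
    return math.exp(-cumulative_hazard(hazard, t))

def baseline(u):
    """Hypothetical baseline hazard that rises linearly in time."""
    return 0.02 * u

hazard_ratio = 1.5   # hypothetical constant hazard ratio

def exposed(u):
    """Hazard in the comparison group under proportional hazards."""
    return hazard_ratio * baseline(u)

s0 = survival(baseline, 10.0)   # cumulative hazard is 1.0, so S0(10) = exp(-1)
s1 = survival(exposed, 10.0)    # cumulative hazard is 1.5, so S1(10) = exp(-1.5)
print(f"S0(10) = {s0:.4f}, S1(10) = {s1:.4f}, S0(10)**HR = {s0 ** hazard_ratio:.4f}")
```

A consequence visible here is that under proportional hazards the comparison group's survival curve is the baseline curve raised to the power of the hazard ratio.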

Scatterplots
Scatterplots of censored data are not scientifically meaningful
  It is thus better not to generate them unless you do something to indicate the censored data
  We can label censored data, but we have to remember the true value may be anywhere larger than that
Instead we look at KM curves across strata
  Might need to categorize the data

Estimation of Regression Model
. stset obstime relapse
. stcox nadir
Cox regression -- Breslow method for ties
No. of subjects = 50              Number of obs =     50
No. of failures = 36
Time at risk    = 423
                                  LR chi2(1)    =  11.35
Log likelihood  = -3.3            Prob > chi2   = 0.0008

    _t | Haz. Ratio   Std. Err.      z    P>|z|    [95% Conf. Int.]
 nadir |      1.016     0.0038    4.1    0.000      1.008    1.023

Interpretation of Stata Output
Scientific interpretation of the slope
  Hazard ratio (nadir) = 1.015
Estimated hazard ratio for two groups differing by 1 in nadir PSA is found by exponentiation of the slope (Stata only reports the hazard ratio):
  Group one unit higher has instantaneous event rate 1.015 times higher (1.5% higher)
  Group 10 units higher has instantaneous event rate $1.015^{10} = 1.162$ times higher (16.2% higher)

Additional Comments Regarding Validity of Inference
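Scaling a hazard ratio to a larger difference in the predictor follows the same rule as for odds ratios: multiply the slope on the log scale, or equivalently raise the hazard ratio to a power. The sketch assumes the per-unit hazard ratio reads 1.015 and the comparison is for a 10-unit difference. With the rounded 1.015 the result is about 1.161; the slide's 1.162 presumably reflects a less rounded slope, which is an assumption because the unrounded estimate is not shown.

```python
import math

hr_per_unit = 1.015                 # hazard ratio per 1 unit of nadir PSA (rounded, from the slide)
slope = math.log(hr_per_unit)       # implied slope on the log hazard scale

units = 10
hr_units = math.exp(units * slope)  # identical to hr_per_unit ** units

print(f"HR for {units} units: {hr_units:.3f}")
print(f"percent higher:  {100 * (hr_units - 1):.1f}%")
```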

Inference with Regression
Most commonly encountered questions
  Quantifying distributions
    Describing the distribution of response Y within groups by estimating the mean $E(Y \mid X)$
  Comparing distributions across groups
    Distributions differ across groups if the regression slope parameter is nonzero
  Prediction
    Estimating a future observation of response Y
    Often we use the mean or geometric mean

Statistical Validity of Inference
Inference (CI, P vals) about associations requires three general assumptions
  Assumptions about approximate normal distribution for parameter estimates
  Assumptions about independence of observations
  Assumptions about variance of observations within groups

Normally Distributed Estimates
Assumptions about approximate normal distribution for parameter estimates
  Classically or Robust SE: Large sample sizes
    Definition of large depends on error distribution and relative sample sizes within groups
    But it is often surprising how small large can be
      With normally distributed errors, large is one observation (two to estimate a slope)
      With heavy tails (high propensity to outliers), large can be very large
      See Lumley, et al., Ann Rev Pub Hlth, 2002

Independence / Dependence
Assumptions about independence of observations for linear regression
  Classically: All observations are independent
  Robust standard error estimates: Allow correlated observations within identified clusters

Within Group Variance
Assumptions about variance of response within groups for linear regression
  Classically: Equal variances across groups
  Robust standard error estimates: Allow unequal variances across groups

Statistical Validity of Inference
Inference (CI, P values) about mean response in specific groups requires a further assumption
  Assumption about adequacy of linear model

Linearity of Model
Assumption about adequacy of linear model for prediction of group means with linear regression
  Classically OR robust standard error estimates: The mean response in groups is linear in the modeled predictor
  (We can model transformations of the measured predictor)

Statistical Validity of Inference
Inference (prediction intervals, P values) about individual observations in specific groups has still another assumption
  Assumption about distribution of errors within each group

Distribution of Errors
Assumption about distribution of errors within each group for prediction intervals with linear regression
  Classically: Errors have the same normal distribution within each group
  Possible extension: Errors have the same distribution within each group, though it need not be normal
    Not implemented in any software that I know of

Prediction and Robust SE
If you are using robust standard error estimates, prediction intervals based on linear regression models are inappropriate
  Prediction intervals based on linear regression assume common error distribution across groups

Implications for Inference
Regression based inference about associations is far more robust than estimation of group means or individual predictions
A hierarchy of null hypotheses
  Strong null: Total independence of Y and X
  Intermediate null: Mean of Y the same for all groups
  Weak null: No linear trend in mean of Y across groups

Under Strong Null
If the response and predictor of interest were totally independent:
  All aspects of the distribution of the response would be the same in each group
  A flat line would describe the mean response across groups (and a linear model is correct)
    Slope would be zero
  Within group variance is the same in each group
  Error distribution is the same in all groups
  In large sample sizes, the regression parameters are normally distributed

Under Intermediate Null
Means for each predictor group would lie on a flat line
  Slope would be zero
Within group variance could vary across groups
Error distribution could differ across groups
In large sample sizes, the regression parameters are normally distributed
  Definition of large will also depend upon how much the error distributions differ across groups relative to the number sampled in each group

Under Weak Null
Linear trend in means across predictor groups would lie on a flat line
  Slope of best fitting line would be zero
Within group variance could vary across groups
Error distribution could differ across groups
In large sample sizes, the regression parameters are normally distributed
  Definition of large will also depend upon how much the error distributions differ across groups relative to the number sampled in each group

Classical Linear Regression
Inference about slope tests strong null
  Tests make inference assuming the null
  The data can appear nonlinear or heteroscedastic
    Merely evidence strong null is not true
Limitations
  We cannot be confident that there is a difference in the means
    Valid inference about means demands homoscedasticity
  We cannot be confident of estimates of group means
    Valid estimates of group means demand linearity

Robust Standard Errors
Inference about slope tests weak null
  Data can appear nonlinear or heteroscedastic
    Robust SE allow unequal variances
    Nonlinearity decreases precision, but inference still valid about first order (linear) trends
Only if linear relationship holds can we
  Test intermediate null
  Estimate group means
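The contrast between classical and robust standard errors for a slope can be sketched in a few lines for simple linear regression. The function below computes the classical SE, which pools a single error variance, and a sandwich (Huber-White, HC0 form) SE, which lets each observation carry its own squared residual. The function name and the data are hypothetical, and software implementations typically add a small-sample scaling factor that this sketch omits.

```python
import math

def slope_with_se(x, y):
    """Least squares slope with classical and robust (HC0 sandwich) standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical: one pooled error variance, assumed equal in every group
    classical_se = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)

    # Robust: each squared residual is weighted by its own leverage on the slope
    robust_se = math.sqrt(sum(((xi - xbar) * e) ** 2 for xi, e in zip(x, resid))) / sxx
    return slope, classical_se, robust_se

# Hypothetical data whose spread grows with x (heteroscedastic)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.3, 3.6, 5.8, 4.9, 8.6, 6.2]

slope, classical_se, robust_se = slope_with_se(x, y)
print(f"slope = {slope:.3f}, classical SE = {classical_se:.3f}, robust SE = {robust_se:.3f}")
```

The slope estimate is identical under both approaches; only the standard error, and hence the inference, changes.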

Implications for Inference
Inference about associations is far more trustworthy than estimation of group means or individual predictions
  Nonzero slope suggests an association between response and predictor
  Inference about linear trends in means if we use robust SE

Interpreting Positive Results
If slope is statistically significantly different from 0 using robust SE
  Observed data is atypical of a setting with no linear trend in mean response across groups
  Data suggests evidence of a trend toward larger (smaller) means in groups having larger values of the predictor
  (To the extent the data appears linear, estimates of the group means will be reliable)

Interpreting Negative Studies
Differential diagnosis of reasons for not rejecting null hypothesis of zero slope
  There may be no association
  There may be an association but not in the parameter considered (i.e., the mean response)
  There may be an association in the parameter considered, but the best fitting line has a zero slope (a curvilinear association in the parameter)
  There may be a first order trend in the parameter, but we lacked statistical precision to be confident that it truly exists (type II error)

Model Checking
Much statistical literature has been devoted to means of checking the assumptions for regression models
I believe model checking is generally fraught with peril, as it necessarily involves multiple comparisons

Model Checking
"Blood suckers hide beneath my bed"
  "Eyepennies", Mark Linkous (Sparklehorse)

Model Checking
We cannot reliably use the sampled data to assess whether it accurately portrays the population
  We are worried about what data we might not have seen
  It is not so much the monsters that we see that scare us, but the goblins in the closet
  (But we do worry more when we see a tendency to outliers in the sample or clear departures from the model)

Choice of Inference
My general recommendation: There is relatively little to be lost and much accuracy to be gained in using the robust standard error estimates
  Avoids the need for model checking
    Too large an element of data driven analysis for my taste
  More logical scientific approach
    Minimizes the need to presume more detailed knowledge than the question we are trying to answer
    E.g., if we don't know how means might differ, why presume that we know how variances and shape of distribution might behave?

Inference on Group Means
Inference about estimation of group means or individual predictions should be interpreted extremely cautiously
  The dependence on knowing the correct model and distribution means that we cannot be as confident in the estimates and inference
  Nevertheless, such estimates are often the best approximations
  Interpolation to unobserved groups is less risky than extrapolation outside the range of predictors