Review Statistics review 14: Logistic regression Viv Bewick 1, Liz Cheek 1 and Jonathan Ball 2


Critical Care February 2005 Vol 9 No 1 Bewick et al.

1 Senior Lecturer, School of Computing, Mathematical and Information Sciences, University of Brighton, Brighton, UK
2 Senior Registrar in ICU, Liverpool Hospital, Sydney, Australia

Corresponding author: Viv Bewick, v.bewick@brighton.ac.uk

Published online: 13 January 2005
This article is online at http://ccforum.com/content/9/1/112
© 2005 BioMed Central Ltd
Critical Care 2005, 9:112-118 (DOI 10.1186/cc3045)

Abstract
This review introduces logistic regression, which is a method for modelling the dependence of a binary response variable on one or more explanatory variables. Continuous and categorical explanatory variables are considered.

Keywords
binomial distribution, Hosmer-Lemeshow test, likelihood, likelihood ratio test, logit function, maximum likelihood estimation, median effective level, odds, odds ratio, predicted probability, Wald test

Introduction
Logistic regression provides a method for modelling a binary response variable, which takes values 1 and 0. For example, we may wish to investigate how death (1) or survival (0) of patients can be predicted by the level of one or more metabolic markers. As an illustrative example, consider a sample of 2000 patients whose levels of a metabolic marker have been measured. Table 1 shows the data grouped into categories according to metabolic marker level, and the proportion of deaths in each category is given. The proportions of deaths are estimates of the probabilities of death in each category. Figure 1 shows a plot of these proportions. It suggests that the probability of death increases with the metabolic marker level. However, it can be seen that the relationship is nonlinear and that the probability of death changes very little at the high or low extremes of marker level. This pattern is typical because proportions cannot lie outside the range from 0 to 1. The relationship can be described as following an S-shaped curve.
Abbreviations: AUROC = area under the receiver operating characteristic curve; C.I. = confidence interval; ln = natural logarithm; logit = natural logarithm of the odds; MLE = maximum likelihood estimate; OR = odds ratio; ROC = receiver operating characteristic curve.

Table 1
Relationship between level of a metabolic marker and survival

| Metabolic marker level (x) | Number of patients | Number of deaths | Proportion of deaths |
|---|---|---|---|
| 0.5 to <1.0 | 182 | 7 | 0.04 |
| 1.0 to <1.5 | 233 | 27 | 0.12 |
| 1.5 to <2.0 | 224 | 44 | 0.20 |
| 2.0 to <2.5 | 236 | 91 | 0.39 |
| 2.5 to <3.0 | 225 | 130 | 0.58 |
| 3.0 to <3.5 | 215 | 168 | 0.78 |
| 3.5 to <4.0 | 221 | 194 | 0.88 |
| 4.0 to <4.5 | 200 | 191 | 0.96 |
| ≥4.5 | 264 | 260 | 0.98 |
| Totals | 2000 | 1112 | |

Figure 1. Proportion of deaths plotted against the metabolic marker group mid-points for the data presented in Table 1.

Logistic regression with a single quantitative explanatory variable
The logistic or logit function is used to transform an S-shaped curve into an approximately straight line and to change the range of the proportion from 0 to 1 into −∞ to +∞.

The logit function is defined as the natural logarithm (ln) of the odds [1] of death. That is,

\[ \operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) \]

where p is the probability of death.

Figure 2 shows the logit-transformed proportions from Fig. 1. The points now follow an approximately straight line. The relationship between probability of death and marker level x could therefore be modelled as follows:

\[ \operatorname{logit}(p) = a + bx \]

Although this model looks similar to a simple linear regression model, the underlying distribution is binomial and the parameters a and b cannot be estimated in exactly the same way as for simple linear regression. Instead, the parameters are usually estimated using the method of maximum likelihood, which is discussed below.

Figure 2. Logit(p) plotted against the metabolic marker group mid-points for the data presented in Table 1.

Binomial distribution
When the response variable is binary (e.g. death or survival), then the probability distribution of the number of deaths in a sample of a particular size, for given values of the explanatory variables, is usually assumed to be binomial. The probability that the number of deaths in a sample of size n is exactly equal to a value r is given by

\[ {}^{n}C_{r}\, p^{r} (1 - p)^{n - r} \]

where \( {}^{n}C_{r} = n!/(r!(n - r)!) \) is the number of ways r individuals can be chosen from n and p is the probability of an individual dying. (The probability of survival is 1 − p.)

For example, using the first row of the data in Table 1, the probability that seven deaths occurred out of 182 patients is given by \( {}^{182}C_{7}\, p^{7} (1 - p)^{175} \). If the probability of death is assumed to be 0.04, then the probability that seven deaths occurred is \( {}^{182}C_{7} \times 0.04^{7} \times 0.96^{175} = 0.152 \). This probability, calculated on the assumption of a binomial distribution with parameter p = 0.04, is called a likelihood.

Maximum likelihood estimation
Maximum likelihood estimation involves finding the value(s) of the parameter(s) that give rise to the maximum likelihood. For example, again we shall take the seven deaths occurring out of 182 patients and use maximum likelihood estimation to estimate the probability of death, p. Figure 3 shows the likelihood calculated for a range of values of p. From the graph it can be seen that the value of p giving the maximum likelihood is close to 0.04. This value is the maximum likelihood estimate (MLE) of p. Mathematically, it can be shown that the MLE in this case is 7/182.
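As a rough numerical check of the likelihood calculation above, the following Python sketch evaluates the binomial likelihood of seven deaths out of 182 patients at p = 0.04 and then scans a grid of candidate values of p for the largest likelihood. The function name `binomial_likelihood` and the grid spacing of 0.0001 are illustrative assumptions, not part of the original analysis.

```python
import math

def binomial_likelihood(r, n, p):
    """Probability of exactly r deaths among n patients when each dies with probability p."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)

# Likelihood of 7 deaths out of 182 patients, assuming p = 0.04
lik_at_004 = binomial_likelihood(7, 182, 0.04)

# Scan candidate values of p; the spacing of 0.0001 is an arbitrary choice
grid = [i / 10000 for i in range(1, 2000)]  # 0.0001 up to 0.1999
p_hat = max(grid, key=lambda p: binomial_likelihood(7, 182, p))

print(f"likelihood at p = 0.04: {lik_at_004:.3f}")
print(f"grid maximum at p = {p_hat:.4f}; 7/182 = {7 / 182:.4f}")
```

The likelihood at p = 0.04 should come out close to the value of about 0.15 quoted above, and the grid maximum should fall next to 7/182, in line with the analytical MLE.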
In more complicated situations, iterative techniques are required to find the maximum likelihood and the associated parameter values, and a computer package is required.

Figure 3. Likelihood for a range of values of p. MLE, maximum likelihood estimate.

Odds
The model logit(p) = a + bx is equivalent to the following:

\[ \frac{p}{1 - p} = \text{odds of death} = e^{a + bx} \]

or

\[ p = \text{probability of death} = \frac{e^{a + bx}}{1 + e^{a + bx}} \]

When the explanatory variable x increases by one unit from x to x + 1, the odds of death change from \( e^{a + bx} \) to \( e^{a + b(x + 1)} = e^{a + bx} e^{b} \). The odds ratio (OR) is therefore \( e^{a + bx} e^{b} / e^{a + bx} = e^{b} \). The odds ratio \( e^{b} \) has a simpler interpretation in the case of a categorical explanatory variable with two categories; in this case it is just the odds ratio for one category compared with the other.

Estimates of the parameters a and b are usually obtained using a statistical package, and the output for the data summarized in Table 1 is given in Table 2. From the output, b = 1.690 and the odds ratio \( e^{b} \) = 5.4. This indicates that, for example, the odds of death for a patient with a marker level of 3.0 is 5.4 times that of a patient with marker level 2.0.

Predicted probabilities
The model can be used to calculate the predicted probability of death (p) for a given value of the metabolic marker. For example, patients with metabolic marker level 2.0 and 3.0 have the following respective predicted probabilities of death:

\[ p = \frac{e^{(-4.229 + 1.690 \times 2.0)}}{1 + e^{(-4.229 + 1.690 \times 2.0)}} = 0.300 \]

and

\[ p = \frac{e^{(-4.229 + 1.690 \times 3.0)}}{1 + e^{(-4.229 + 1.690 \times 3.0)}} = 0.700 \]

The corresponding odds of death for these patients are 0.300/(1 − 0.300) = 0.428 and 0.700/(1 − 0.700) = 2.320, giving an odds ratio of 2.320/0.428 = 5.421, as above.

The metabolic marker level at which the predicted probability equals 0.5, that is, at which the two possible outcomes are equally likely, is called the median effective level (EL50). Solving the equation

\[ p = 0.5 = \frac{e^{a + bx}}{1 + e^{a + bx}} \]

gives x = EL50 = −a/b. For the example data, EL50 = 4.229/1.690 = 2.50, indicating that at this marker level death or survival are equally likely.

Assessment of the fitted model
After estimating the coefficients, there are several steps involved in assessing the appropriateness, adequacy and usefulness of the model. First, the importance of each of the explanatory variables is assessed by carrying out statistical tests of the significance of the coefficients. The overall goodness of fit of the model is then tested. Additionally, the ability of the model to discriminate between the two groups defined by the response variable is evaluated. Finally, if possible, the model is validated by checking the goodness of fit and discrimination on a different set of data from that which was used to develop the model.
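Before turning to the tests, the predicted probabilities, odds ratio and median effective level worked out earlier for the fitted model can be checked with a short Python sketch. It assumes only the rounded coefficients a = −4.229 and b = 1.690 reported for the example data; the function names are illustrative, and small differences in the last digit from the quoted values are to be expected because the package worked with unrounded estimates.

```python
import math

# Rounded coefficients reported for the example data
a, b = -4.229, 1.690

def predicted_probability(x):
    """Predicted probability of death at marker level x under logit(p) = a + b*x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

p2 = predicted_probability(2.0)
p3 = predicted_probability(3.0)
odds_ratio = odds(p3) / odds(p2)  # a one-unit increase in x multiplies the odds by exp(b)
el50 = -a / b                     # marker level at which p = 0.5

print(f"p(2.0) = {p2:.3f}, p(3.0) = {p3:.3f}")
print(f"odds ratio = {odds_ratio:.3f}, exp(b) = {math.exp(b):.3f}")
print(f"EL50 = {el50:.2f}")
```

Writing the probability as 1/(1 + e^(−(a + bx))) is algebraically the same as the form used in the text and avoids computing the exponential twice.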
Tests and confidence intervals for the parameters

The Wald statistic
Wald χ² statistics are used to test the significance of individual coefficients in the model and are calculated as follows:

\[ \left( \frac{\text{coefficient}}{\text{SE coefficient}} \right)^{2} \]

Each Wald statistic is compared with a χ² distribution with 1 degree of freedom. Wald statistics are easy to calculate but their reliability is questionable, particularly for small samples. For data that produce large estimates of the coefficient, the standard error is often inflated, resulting in a lower Wald statistic, and therefore the explanatory variable may be incorrectly assumed to be unimportant in the model. Likelihood ratio tests (see below) are generally considered to be superior.

The Wald tests for the example data are given in Table 2. The test for the coefficient of the metabolic marker indicates that the metabolic marker contributes significantly in predicting death.

Table 2
Output from a statistical package for logistic regression on the example data

| | Coefficient | SE | Wald | df | P | OR | 95% CI for OR, lower | 95% CI for OR, upper |
|---|---|---|---|---|---|---|---|---|
| Marker | 1.690 | 0.071 | 571.074 | 1 | 0.000 | 5.421 | 4.719 | 6.227 |
| Constant | −4.229 | 0.191 | 489.556 | 1 | 0.000 | | | |

CI, confidence interval; df, degrees of freedom; OR, odds ratio; SE, standard error.
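The entries in Table 2 can be approximately reproduced from the coefficient and its standard error. The sketch below is a minimal illustration: the helper `wald_summary` is hypothetical, and it assumes that the package's confidence interval is the usual Wald interval exp(coefficient ± 1.96 × SE), which the text does not state explicitly. Because the inputs are rounded to three decimals, the Wald statistic in particular will not match the table exactly.

```python
import math

def wald_summary(coef, se, z=1.96):
    """Wald chi-squared, its P value on 1 df, the odds ratio and an approximate 95% CI."""
    wald = (coef / se) ** 2
    # A chi-squared variable with 1 df is a squared standard normal,
    # so its upper tail at `wald` is erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    odds_ratio = math.exp(coef)
    ci = (math.exp(coef - z * se), math.exp(coef + z * se))
    return wald, p_value, odds_ratio, ci

wald, p_value, odds_ratio, (lower, upper) = wald_summary(1.690, 0.071)

print(f"Wald = {wald:.1f}, P = {p_value:.3g}")
print(f"OR = {odds_ratio:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

A P value printed as 0.000 by a package simply means that it is smaller than 0.0005, which this calculation makes visible.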

The constant has no simple practical interpretation but is generally retained in the model irrespective of its significance.

Likelihood ratio test
The likelihood ratio test for a particular parameter compares the likelihood of obtaining the data when the parameter is zero (L0) with the likelihood (L1) of obtaining the data evaluated at the MLE of the parameter. The test statistic is calculated as follows:

\[ -2 \ln(\text{likelihood ratio}) = -2 \ln(L_0 / L_1) = -2 (\ln L_0 - \ln L_1) \]

It is compared with a χ² distribution with 1 degree of freedom. Table 3 shows the likelihood ratio test for the example data obtained from a statistical package and again indicates that the metabolic marker contributes significantly in predicting death.

Goodness of fit of the model
The goodness of fit or calibration of a model measures how well the model describes the response variable. Assessing goodness of fit involves investigating how close values predicted by the model are to the observed values.

When there is only one explanatory variable, as for the example data, it is possible to examine the goodness of fit of the model by grouping the explanatory variable into categories and comparing the observed and expected counts in the categories. For example, for each of the 182 patients with metabolic marker level less than one the predicted probability of death was calculated using the formula

\[ \frac{e^{(-4.229 + 1.690x)}}{1 + e^{(-4.229 + 1.690x)}} \]

where x is the metabolic marker level for an individual patient. This gives 182 predicted probabilities from which the arithmetic mean was calculated, giving a value of 0.04. This was repeated for all metabolic marker level categories. Table 4 shows the predicted probabilities of death in each category and also the expected number of deaths calculated as the predicted probability multiplied by the number of patients in the category. The observed and the expected numbers of deaths can be compared using a χ² goodness of fit test, providing the expected number in any category is not less than 5. The null hypothesis for the test is that the numbers of deaths follow the logistic regression model.
The χ² test statistic is given by

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]

The test statistic is compared with a χ² distribution where the degrees of freedom are equal to the number of categories minus the number of parameters in the logistic regression model. For the example data the χ² statistic is 2.68 with 9 − 2 = 7 degrees of freedom, giving P = 0.91, suggesting that the numbers of deaths are not significantly different from those predicted by the model.

Table 3
Likelihood ratio test for inclusion of the variable marker in the model

| Variable | Likelihood ratio test statistic | df | P of the change |
|---|---|---|---|
| Marker | 1145.940 | 1 | 0.000 |

The Hosmer-Lemeshow test
The Hosmer-Lemeshow test is a commonly used test for assessing the goodness of fit of a model and allows for any number of explanatory variables, which may be continuous or categorical. The test is similar to a χ² goodness of fit test and has the advantage of partitioning the observations into groups of approximately equal size, so that groups with very low observed and expected frequencies are less likely to occur. The observations are grouped into deciles based on the predicted probabilities. The test statistic is calculated as above using the observed and expected counts for both the deaths and survivals, and has an approximate χ² distribution with 8 (= 10 − 2) degrees of freedom. Calibration results for the model from the example data are shown in Table 5. The Hosmer-Lemeshow test (P = 0.576) indicates that the numbers of deaths are not significantly different from those predicted by the model and that the overall model fit is good.

Further checks can be carried out on the fit for individual observations by inspection of various types of residuals (differences between observed and fitted values). These can identify whether any observations are outliers or have a strong influence on the fitted model. For further details see, for example, Hosmer and Lemeshow [2].

R² for logistic regression
Most statistical packages provide further statistics that may be used to measure the usefulness of the model and that are similar to the coefficient of determination (R²) in linear regression [3]. The Cox & Snell and the Nagelkerke R² are two such statistics.
The values for the example data are 0.44 and 0.59, respectively. The maximum value that the Cox & Snell R² attains is less than 1. The Nagelkerke R² is an adjusted version of the Cox & Snell R² and covers the full range from 0 to 1, and therefore it is often preferred. The R² statistics do not measure the goodness of fit of the model but indicate how useful the explanatory variables are in predicting the response variable and can be referred to as measures of effect size. The value of 0.59 indicates that the model is useful in predicting death.
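Returning to calibration, both the χ² goodness of fit statistic and the Hosmer-Lemeshow statistic described earlier can be recomputed from the observed and expected counts reported in Tables 4 and 5. The Python sketch below is an illustration only: `chi2_sf` is a hypothetical helper that evaluates the upper tail of a χ² distribution for whole-number degrees of freedom using standard closed-form expressions, standing in for whatever routine the statistical package used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms
        return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))
    # Odd df: normal tail plus a finite correction sum
    extra = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra

def pearson_chi2(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed and expected deaths in the nine marker categories (Table 4)
obs_deaths = [7, 27, 44, 91, 130, 168, 194, 191, 260]
exp_deaths = [8.2, 24.2, 50.6, 96.0, 140.6, 171.7, 199.9, 191.7, 259.2]
gof = pearson_chi2(obs_deaths, exp_deaths)
gof_p = chi2_sf(gof, 9 - 2)

# Observed and expected counts by decile of risk (Table 5)
obs_surv = [191, 182, 154, 130, 90, 64, 31, 24, 8, 1]
exp_surv = [190.731, 181.006, 157.131, 129.905, 94.206, 58.726, 33.495, 17.611, 7.985, 4.204]
obs_died = [10, 21, 45, 70, 110, 131, 168, 180, 191, 199]
exp_died = [10.269, 21.994, 41.869, 70.095, 105.794, 136.274, 165.505, 186.389, 191.015, 195.796]
hl = pearson_chi2(obs_surv + obs_died, exp_surv + exp_died)
hl_p = chi2_sf(hl, 10 - 2)

print(f"goodness of fit: chi2 = {gof:.2f}, df = 7, P = {gof_p:.2f}")
print(f"Hosmer-Lemeshow: chi2 = {hl:.3f}, df = 8, P = {hl_p:.3f}")
```

The Hosmer-Lemeshow sum runs over both outcome columns, which is why the survivor and death counts are concatenated before the statistic is formed.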

Table 4
Relationship between level of a metabolic marker and predicted probability of death

| Metabolic marker level (x) | Number of patients | Number of deaths | Proportion of deaths | Predicted probability | Expected number of deaths |
|---|---|---|---|---|---|
| 0.5 to <1.0 | 182 | 7 | 0.04 | 0.04 | 8.2 |
| 1.0 to <1.5 | 233 | 27 | 0.12 | 0.10 | 24.2 |
| 1.5 to <2.0 | 224 | 44 | 0.20 | 0.23 | 50.6 |
| 2.0 to <2.5 | 236 | 91 | 0.39 | 0.41 | 96.0 |
| 2.5 to <3.0 | 225 | 130 | 0.58 | 0.62 | 140.6 |
| 3.0 to <3.5 | 215 | 168 | 0.78 | 0.80 | 171.7 |
| 3.5 to <4.0 | 221 | 194 | 0.88 | 0.90 | 199.9 |
| 4.0 to <4.5 | 200 | 191 | 0.96 | 0.96 | 191.7 |
| ≥4.5 | 264 | 260 | 0.98 | 0.98 | 259.2 |

Table 5
Contingency table for Hosmer-Lemeshow test

| Decile | death = 0, observed | death = 0, expected | death = 1, observed | death = 1, expected | Total |
|---|---|---|---|---|---|
| 1 | 191 | 190.731 | 10 | 10.269 | 201 |
| 2 | 182 | 181.006 | 21 | 21.994 | 203 |
| 3 | 154 | 157.131 | 45 | 41.869 | 199 |
| 4 | 130 | 129.905 | 70 | 70.095 | 200 |
| 5 | 90 | 94.206 | 110 | 105.794 | 200 |
| 6 | 64 | 58.726 | 131 | 136.274 | 195 |
| 7 | 31 | 33.495 | 168 | 165.505 | 199 |
| 8 | 24 | 17.611 | 180 | 186.389 | 204 |
| 9 | 8 | 7.985 | 191 | 191.015 | 199 |
| 10 | 1 | 4.204 | 199 | 195.796 | 200 |

χ² test statistic = 6.642 (goodness of fit based on deciles of risk); degrees of freedom = 8; P = 0.576.

Discrimination
The discrimination of a model, that is, how well the model distinguishes patients who survive from those who die, can be assessed using the area under the receiver operating characteristic curve (AUROC) [4]. The value of the AUROC is the probability that a patient who died had a higher predicted probability than did a patient who survived. Using a statistical package to calculate the AUROC for the example data gave a value of 0.90 (95% C.I. 0.89 to 0.91), indicating that the model discriminates well.

Validation
When the goodness of fit and discrimination of a model are tested using the data on which the model was developed, they are likely to be over-estimated. If possible, the validity of the model should be assessed by carrying out tests of goodness of fit and discrimination on a different data set from the original one.

Logistic regression with more than one explanatory variable
We may wish to investigate how death or survival of patients can be predicted by more than one explanatory variable.
As an example, we shall use data obtained from patients attending an accident and emergency unit. Serum metabolite levels were investigated as potentially useful markers in the early identification of those patients at risk for death. Two of the metabolic markers recorded were lactate and urea. Patients were also divided into two age groups: <70 years and ≥70 years.

Like ordinary regression, logistic regression can be extended to incorporate more than one explanatory variable, which may be either quantitative or qualitative. The logistic regression model can then be written as follows:

\[ \operatorname{logit}(p) = a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i \]

where p is the probability of death and \( x_1, x_2, \dots, x_i \) are the explanatory variables.

The method of including variables in the model can be carried out in a stepwise manner going forward or backward, testing for the significance of inclusion or elimination of the variable at each stage. The tests are based on the change in likelihood resulting from including or excluding the variable [2]. Backward stepwise elimination was used in the logistic regression of death/survival on lactate, urea and age group. The first model fitted included all three variables and the tests for the removal of the variables were all significant as shown in Table 6.

Table 6
Tests for the removal of the variables for the logistic regression on the accident and emergency data

| | Change in −2 ln likelihood | df | P |
|---|---|---|---|
| Lactate | 22.100 | 1 | 0.000 |
| Urea | 9.563 | 1 | 0.002 |
| Age group | 18.147 | 1 | 0.000 |

Therefore all the variables were retained. For these data, forward stepwise inclusion of the variables resulted in the same model, though this may not always be the case because of correlations between the explanatory variables. Several models may produce equally good statistical fits for a set of data and it is therefore important when choosing a model to take account of biological or clinical considerations and not depend solely on statistical results.

The output from a statistical package is given in Table 7. The Wald tests also show that all three explanatory variables contribute significantly to the model. This is also seen in the confidence intervals for the odds ratios, none of which include 1 [5].

From Table 7 the fitted model is:

\[ \operatorname{logit}(p) = -5.716 + (0.270 \times \text{lactate}) + (0.053 \times \text{urea}) + (1.425 \times \text{age group}) \]

Because there is more than one explanatory variable in the model, the interpretation of the odds ratio for one variable depends on the values of other variables being fixed. The interpretation of the odds ratio for age group is relatively simple because there are only two age groups; the odds ratio of 4.16 indicates that, for given levels of lactate and urea, the odds of death for patients in the ≥70 years group is 4.16 times that in the <70 years group. The odds ratio for the quantitative variable lactate is 1.31. This indicates that, for a given age group and level of urea, for an increase of 1 mmol/l in lactate the odds of death are multiplied by 1.31. Similarly, for a given age group and level of lactate, for an increase of 1 mmol/l in urea the odds of death are multiplied by 1.05.

The Hosmer-Lemeshow test results (χ² = 7.325, 8 degrees of freedom, P = 0.502) indicate that the goodness of fit is satisfactory. However, the Nagelkerke R² value was 0.17, suggesting that the model is not very useful in predicting death.
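The odds ratios quoted above follow from exponentiating the coefficients of the fitted model, and the same coefficients give a predicted probability for any combination of lactate, urea and age group. The sketch below illustrates this under two assumptions that are not stated in the text: age group is taken to be coded 0 for <70 years and 1 for ≥70 years, and the two patients are invented purely for illustration.

```python
import math

# Coefficients of the fitted model, rounded as reported
constant = -5.716
coefficients = {"lactate": 0.270, "urea": 0.053, "age_group": 1.425}

# Odds ratio for a one-unit increase in each variable, the others held fixed
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

def predicted_probability(lactate, urea, age_group):
    """Predicted probability of death; age_group assumed coded 0 (<70 years) or 1 (>=70 years)."""
    logit = (constant
             + coefficients["lactate"] * lactate
             + coefficients["urea"] * urea
             + coefficients["age_group"] * age_group)
    return 1.0 / (1.0 + math.exp(-logit))

# Two hypothetical patients with identical lactate and urea (mmol/l), differing only in age group
p_under_70 = predicted_probability(lactate=4.0, urea=10.0, age_group=0)
p_70_plus = predicted_probability(lactate=4.0, urea=10.0, age_group=1)
odds_under_70 = p_under_70 / (1.0 - p_under_70)
odds_70_plus = p_70_plus / (1.0 - p_70_plus)

for name, value in odds_ratios.items():
    print(f"OR for {name}: {value:.3f}")
print(f"ratio of odds, 70+ versus under 70: {odds_70_plus / odds_under_70:.3f}")
```

Whatever lactate and urea values are chosen, the ratio of the two odds stays equal to the age group odds ratio, which is what holding the other variables fixed means in practice.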
Although the contribution of the three explanatory variables in the prediction of death is statistically significant, the effect size is small. The AUROC for these data gave a value of 0.76 (95% C.I. 0.69 to 0.82), indicating that the discrimination of the model is only fair.

Table 7
Coefficients and Wald tests for logistic regression on the accident and emergency data

| | Coefficient | SE | Wald | df | P | OR | 95% CI for OR, lower | 95% CI for OR, upper |
|---|---|---|---|---|---|---|---|---|
| Lactate | 0.270 | 0.060 | 19.910 | 1 | 0.000 | 1.310 | 1.163 | 1.474 |
| Urea | 0.053 | 0.017 | 9.179 | 1 | 0.002 | 1.054 | 1.019 | 1.091 |
| Age group | 1.425 | 0.373 | 14.587 | 1 | 0.000 | 4.158 | 2.001 | 8.640 |
| Constant | −5.716 | 0.732 | 60.936 | 1 | 0.000 | 0.003 | | |

CI, confidence interval; df, degrees of freedom; OR, odds ratio; SE, standard error.

Assumptions and limitations
The logistic transformation of the binomial probabilities is not the only transformation available, but it is the easiest to interpret, and other transformations generally give similar results.

In logistic regression no assumptions are made about the distributions of the explanatory variables. However, the explanatory variables should not be highly correlated with one another because this could cause problems with estimation.

Large sample sizes are required for logistic regression to provide sufficient numbers in both categories of the response variable. The more explanatory variables, the larger the sample size required. With small sample sizes, the Hosmer-Lemeshow test has low power and is unlikely to detect subtle deviations from the logistic model. Hosmer and Lemeshow recommend sample sizes greater than 400.

The choice of model should always depend on biological or clinical considerations in addition to statistical results.

Conclusion
Logistic regression provides a useful means for modelling the dependence of a binary response variable on one or more explanatory variables, where the latter can be either categorical or continuous. The fit of the resulting model can be assessed using a number of methods.

Competing interests
The author(s) declare that they have no competing interests.

References
1. Kirkwood BR, Sterne JAC: Essential Medical Statistics, 2nd ed. Oxford, UK: Blackwell Science Ltd; 2003.
2. Hosmer DW, Lemeshow S: Applied Logistic Regression, 2nd ed. New York, USA: John Wiley and Sons; 2000.
3. Bewick V, Cheek L, Ball J: Statistics review 7: Correlation and regression. Crit Care 2003, 7:451-459.
4. Bewick V, Cheek L, Ball J: Statistics review 13: Receiver operating characteristic (ROC) curves. Crit Care 2004, 8:508-512.
5. Bewick V, Cheek L, Ball J: Statistics review 11: Assessing risk. Crit Care 2004, 8:287-291.