Introduction to the R Statistical Computing Environment
R Programming

John Fox
McMaster University
ICPSR 2018

John Fox (McMaster University) R Programming ICPSR 2018 1 / 14

Programming Basics
Topics

- Function definition
- Control structures:
  - Conditionals: if, ifelse, switch
  - Iteration: for, while, repeat
  - Recursion
- Avoiding iteration: vectorization and functions in the apply() family
- Large data sets
In the zero-inflated Poisson (ZIP) regression model, there are two latent classes of cases:

- Those for which the response variable $y_i$ is necessarily zero
- Those for which the response conditional on the predictors, the $x$s, is Poisson distributed and thus may be zero or a positive integer

The probability $\pi_i$ that a particular case $i$ is in the first (necessarily zero) latent class may be dependent upon potentially distinct predictors, $z$s, according to a binary logistic-regression model:

$$\log_e \frac{\pi_i}{1 - \pi_i} = \gamma_0 + \gamma_1 z_{i1} + \cdots + \gamma_p z_{ip}$$

For an individual $i$ in the second latent class, $y_i$ follows a Poisson regression model with log link,

$$\log_e \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$

where $\mu_i = E(y_i)$, and conditional distribution

$$p(y_i \mid x_{i1}, \ldots, x_{ik}) = \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!} \quad \text{for } y_i = 0, 1, 2, \ldots$$
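The two-class process described above can be made concrete by simulating from it. The following is a minimal sketch, not part of the original slides: the sample size, the single predictors x and z, and the coefficient values are all arbitrary assumptions chosen for illustration.

```r
# Sketch: simulate n cases from the two-latent-class process.
# All numeric values below are illustrative assumptions.
set.seed(123)                      # arbitrary seed, for reproducibility
n <- 1000
x <- rnorm(n)                      # hypothetical predictor for the count model
z <- rnorm(n)                      # hypothetical predictor for the latent-class model

gamma <- c(-1, 1)                  # assumed logistic coefficients (gamma_0, gamma_1)
beta  <- c(0.5, 0.8)               # assumed Poisson coefficients (beta_0, beta_1)

p.zero <- plogis(gamma[1] + gamma[2] * z)  # Pr(case is in the necessarily-zero class)
mu     <- exp(beta[1] + beta[2] * x)       # Poisson mean, log link

# draw the latent class, then the response conditional on the class
zero.class <- rbinom(n, size = 1, prob = p.zero) == 1
y <- ifelse(zero.class, 0, rpois(n, lambda = mu))

# zeros arise from both classes, so the share of zeros is at least
# as large as the share of cases in the necessarily-zero class
c(zero.class = mean(zero.class), zero.counts = mean(y == 0))
```

In real data the latent class is unobserved; only y, x, and z would be available, which is why the model must be fit by maximum likelihood.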
The probability of observing a zero count for case $i$, not knowing to which latent class the case belongs, is therefore

$$p(0) = \Pr(y_i = 0) = \pi_i + (1 - \pi_i) e^{-\mu_i}$$

and the probability of observing a particular nonzero count $y_i > 0$ is

$$p(y_i) = (1 - \pi_i) \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}$$

The log-likelihood for the ZIP model combines the two components, for $y_i = 0$ and for $y_i > 0$:

$$\log_e L(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \sum_{y_i = 0} \log_e \left[ \pi_i + (1 - \pi_i) e^{-\mu_i} \right] + \sum_{y_i > 0} \log_e \left[ (1 - \pi_i) \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!} \right]$$

where

- $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_k)$ is the vector of parameters from the Poisson-regression component of the model (on which the $\mu_i$ depend)
- $\boldsymbol{\gamma} = (\gamma_0, \gamma_1, \ldots, \gamma_p)$ is the vector of parameters from the logistic-regression component of the model (on which the $\pi_i$ depend)
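The log-likelihood above translates almost term by term into R. This is a sketch under stated assumptions rather than the code used in the lecture: the function name zipLogLik, its argument names, and the tiny data set are all made up for illustration, and X and Z are assumed to be model matrices whose first column is the constant.

```r
# Sketch of the ZIP log-likelihood. All names are illustrative assumptions;
# X and Z are model matrices with a leading column of 1s.
zipLogLik <- function(beta, gamma, y, X, Z) {
  mu <- exp(as.vector(X %*% beta))       # Poisson means, log link
  p  <- plogis(as.vector(Z %*% gamma))   # Pr(necessarily-zero class), logit link
  zero <- y == 0
  # cases with y = 0: log[pi + (1 - pi) exp(-mu)]
  ll.zero <- log(p[zero] + (1 - p[zero]) * exp(-mu[zero]))
  # cases with y > 0: log(1 - pi) + log Poisson probability
  ll.pos <- log(1 - p[!zero]) +
    dpois(y[!zero], lambda = mu[!zero], log = TRUE)
  sum(ll.zero) + sum(ll.pos)
}

# tiny made-up data set, only to show the call
y <- c(0, 0, 1, 3, 0, 2)
X <- cbind(1, x = c(-1.0, 0.2, 0.5, 1.3, -0.4, 0.9))
Z <- cbind(1, z = c(0.7, -0.3, -1.1, 0.1, 1.5, -0.6))
ll <- zipLogLik(beta = c(0.2, 0.5), gamma = c(-0.5, 1), y = y, X = X, Z = Z)
ll
```

Using dpois(..., log = TRUE) rather than coding the Poisson formula directly avoids overflow in the factorial for large counts.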
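In maximizing the likelihood, it helps (but isn't essential) to have the gradient (vector of partial derivatives with respect to the parameters) of the log-likelihood. For the ZIP model the gradient is complicated. Writing $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ik})'$ and $\mathbf{z}_i = (1, z_{i1}, \ldots, z_{ip})'$, so that $\mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta})$:

$$\frac{\partial \log_e L}{\partial \boldsymbol{\beta}} = \sum_{i: y_i = 0} \frac{-\exp[-\exp(\mathbf{x}_i'\boldsymbol{\beta})] \exp(\mathbf{x}_i'\boldsymbol{\beta})}{\exp(\mathbf{z}_i'\boldsymbol{\gamma}) + \exp[-\exp(\mathbf{x}_i'\boldsymbol{\beta})]} \mathbf{x}_i + \sum_{i: y_i > 0} \left[ y_i - \exp(\mathbf{x}_i'\boldsymbol{\beta}) \right] \mathbf{x}_i$$

$$\frac{\partial \log_e L}{\partial \boldsymbol{\gamma}} = \sum_{i: y_i = 0} \frac{\exp(\mathbf{z}_i'\boldsymbol{\gamma})}{\exp(\mathbf{z}_i'\boldsymbol{\gamma}) + \exp[-\exp(\mathbf{x}_i'\boldsymbol{\beta})]} \mathbf{z}_i - \sum_{i=1}^{n} \frac{\exp(\mathbf{z}_i'\boldsymbol{\gamma})}{1 + \exp(\mathbf{z}_i'\boldsymbol{\gamma})} \mathbf{z}_i$$

And the Hessian (the matrix of second-order partial derivatives, from which the covariance matrix of the coefficients is computed) is even more complicated (thankfully we won't need it):

$$\frac{\partial^2 \log_e L}{\partial \boldsymbol{\beta} \, \partial \boldsymbol{\beta}'} = \sum_{i: y_i = 0} \left\{ \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta}) \left[ \exp(\mathbf{x}_i'\boldsymbol{\beta}) - 1 \right]}{\exp[\exp(\mathbf{x}_i'\boldsymbol{\beta}) + \mathbf{z}_i'\boldsymbol{\gamma}] + 1} - \frac{\exp(2\mathbf{x}_i'\boldsymbol{\beta})}{\left\{ \exp[\exp(\mathbf{x}_i'\boldsymbol{\beta}) + \mathbf{z}_i'\boldsymbol{\gamma}] + 1 \right\}^2} \right\} \mathbf{x}_i \mathbf{x}_i' - \sum_{i: y_i > 0} \exp(\mathbf{x}_i'\boldsymbol{\beta}) \, \mathbf{x}_i \mathbf{x}_i'$$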
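(Hessian continued):

$$\frac{\partial^2 \log_e L}{\partial \boldsymbol{\gamma} \, \partial \boldsymbol{\gamma}'} = \sum_{i: y_i = 0} \frac{\exp[\exp(\mathbf{x}_i'\boldsymbol{\beta}) + \mathbf{z}_i'\boldsymbol{\gamma}]}{\left\{ \exp[\exp(\mathbf{x}_i'\boldsymbol{\beta}) + \mathbf{z}_i'\boldsymbol{\gamma}] + 1 \right\}^2} \mathbf{z}_i \mathbf{z}_i' - \sum_{i=1}^{n} \frac{\exp(\mathbf{z}_i'\boldsymbol{\gamma})}{\left[ \exp(\mathbf{z}_i'\boldsymbol{\gamma}) + 1 \right]^2} \mathbf{z}_i \mathbf{z}_i'$$

$$\frac{\partial^2 \log_e L}{\partial \boldsymbol{\beta} \, \partial \boldsymbol{\gamma}'} = \sum_{i: y_i = 0} \frac{\exp[\mathbf{x}_i'\boldsymbol{\beta} + \exp(\mathbf{x}_i'\boldsymbol{\beta}) + \mathbf{z}_i'\boldsymbol{\gamma}]}{\left\{ \exp[\exp(\mathbf{x}_i'\boldsymbol{\beta}) + \mathbf{z}_i'\boldsymbol{\gamma}] + 1 \right\}^2} \mathbf{x}_i \mathbf{z}_i'$$

We can let a general-purpose optimizer do the work of maximizing the log-likelihood

- Optimizers work by evaluating the gradient of the objective function (the log-likelihood) at the current estimates of the parameters, either numerically or analytically
- They iteratively improve the parameter estimates using the information in the gradient
- Iteration ceases when the gradient is sufficiently close to zero.
- The covariance matrix of the coefficients is the inverse of the negative of the Hessian (the matrix of second derivatives of the log-likelihood), which measures the curvature of the log-likelihood at the maximum
- There is generally no advantage in using an analytic Hessian during optimization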
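Because an optimizer can evaluate the gradient either numerically or analytically, a useful sanity check is to compare the two. The sketch below codes the negative log-likelihood and the negative of the analytic gradient given earlier, then compares the latter with central finite differences. The function names, the stacked parameter vector theta = c(beta, gamma), the simulated data, and the evaluation point are all assumptions made for illustration.

```r
# Negative ZIP log-likelihood; theta = c(beta, gamma) (an assumed layout).
zipNegLogLik <- function(theta, y, X, Z) {
  k <- ncol(X)
  eta.x <- as.vector(X %*% theta[seq_len(k)])    # x'beta
  eta.z <- as.vector(Z %*% theta[-seq_len(k)])   # z'gamma
  mu <- exp(eta.x)
  zero <- y == 0
  # uses log[pi + (1 - pi)e^{-mu}] = log(e^{z'g} + e^{-mu}) - log(1 + e^{z'g})
  ll <- sum(log(exp(eta.z[zero]) + exp(-mu[zero]))) +
    sum(dpois(y[!zero], lambda = mu[!zero], log = TRUE)) -
    sum(log1p(exp(eta.z)))
  -ll
}

# Negative of the analytic gradient, same parameter layout.
zipNegGradient <- function(theta, y, X, Z) {
  k <- ncol(X)
  eta.x <- as.vector(X %*% theta[seq_len(k)])
  eta.z <- as.vector(Z %*% theta[-seq_len(k)])
  mu <- exp(eta.x)
  zero <- y == 0
  denom <- exp(eta.z) + exp(-mu)
  w.beta  <- ifelse(zero, -exp(-mu) * mu / denom, y - mu)
  w.gamma <- ifelse(zero, exp(eta.z) / denom, 0) - plogis(eta.z)
  -c(colSums(w.beta * X), colSums(w.gamma * Z))
}

# simulated data with assumed coefficients, only for the check
set.seed(1)
n <- 200
x <- rnorm(n); z <- rnorm(n)
X <- cbind(1, x); Z <- cbind(1, z)
y <- ifelse(rbinom(n, 1, plogis(-1 + z)) == 1, 0, rpois(n, exp(0.5 + 0.8 * x)))

theta <- c(0.3, 0.5, -0.5, 0.5)   # arbitrary evaluation point
h <- 1e-5                         # finite-difference step
numeric.grad <- sapply(seq_along(theta), function(j) {
  e <- replace(numeric(length(theta)), j, h)
  (zipNegLogLik(theta + e, y, X, Z) - zipNegLogLik(theta - e, y, X, Z)) / (2 * h)
})
analytic.grad <- unname(zipNegGradient(theta, y, X, Z))
cbind(analytic = analytic.grad, numeric = numeric.grad)
```

If the two columns disagree beyond rounding error, the analytic gradient (or its transcription) contains a mistake.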
I'll use the optim() function to fit the ZIP model. It takes several arguments, including:

- par, a vector of start values for the parameters
- fn, the objective function to be minimized (in our case the negative of the log-likelihood), the first argument of which is the parameter vector; there may be other arguments
- gr (optional), the gradient, also a function of the parameter vector (and possibly of other arguments)
- ... (optional), any other arguments to be passed to fn and gr
- method, I'll use "BFGS"
- hessian, set to TRUE to return the numerical Hessian at the solution

See ?optim for details and other optional arguments

optim() returns a list with several elements, including:

- par, the values of the parameters that minimize the objective function
- value, the value of the objective function at the minimum
- convergence, a code indicating whether the optimization has converged: 0 means that convergence occurred
- hessian, a numerical approximation to the Hessian at the solution

Again, see ?optim for details
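The optim() arguments and return values listed above fit together roughly as follows. This is a hedged sketch, not the lecture's own script: the simulated data, the "true" coefficient values, the function name zipNegLogLik, and the choice of start values (a Poisson glm() fit for the count coefficients, zeros for the logistic coefficients) are all assumptions. No gr argument is supplied, so optim() approximates the gradient numerically.

```r
# Negative ZIP log-likelihood; theta = c(beta, gamma) (an assumed layout).
zipNegLogLik <- function(theta, y, X, Z) {
  k <- ncol(X)
  eta.x <- as.vector(X %*% theta[seq_len(k)])
  eta.z <- as.vector(Z %*% theta[-seq_len(k)])
  mu <- exp(eta.x)
  zero <- y == 0
  ll <- sum(log(exp(eta.z[zero]) + exp(-mu[zero]))) +
    sum(dpois(y[!zero], lambda = mu[!zero], log = TRUE)) -
    sum(log1p(exp(eta.z)))
  -ll
}

# simulated data with assumed "true" coefficients
set.seed(2018)
n <- 2000
x <- rnorm(n); z <- rnorm(n)
X <- cbind(1, x); Z <- cbind(1, z)
beta.true  <- c(0.5, 0.8)
gamma.true <- c(-1, 1)
zero.class <- rbinom(n, 1, plogis(as.vector(Z %*% gamma.true))) == 1
y <- ifelse(zero.class, 0, rpois(n, exp(as.vector(X %*% beta.true))))

# start values: Poisson regression for beta, zeros for gamma (one choice of many)
start <- c(coef(glm(y ~ x, family = poisson)), 0, 0)

fit <- optim(par = start, fn = zipNegLogLik, y = y, X = X, Z = Z,
             method = "BFGS", hessian = TRUE)

fit$convergence                      # 0 indicates convergence
# fn is the NEGATIVE log-likelihood, so its Hessian can be inverted directly
se <- sqrt(diag(solve(fit$hessian)))
cbind(estimate = fit$par, se = se, true = c(beta.true, gamma.true))
```

Because fn is minimized, the returned hessian is that of the negative log-likelihood, which is why no sign change is needed before inverting it to get the coefficient covariance matrix.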
Beyond the Basics: Object-Oriented Programming
The S3 Object System

- Three standard object-oriented programming systems in R: S3, S4, reference classes
- How the S3 object system works
  - Method dispatch, for object of class "class": generic(object) => generic.class(object) => generic.default(object)
  - For example, summarizing an object mod of class "lm": summary(mod) => summary.lm(mod)
  - Objects can have more than one class, in which case the first applicable method is used. For example, objects produced by glm() are of class c("glm", "lm") and therefore can inherit methods from class "lm".
  - Generic functions: generic <- function(object, other-arguments, ...) UseMethod("generic")
  - For example, summary <- function(object, ...) UseMethod("summary")

Beyond the Basics: Debugging and Profiling R Code

- Tools integrated with the RStudio IDE:
  - Locating an error: traceback()
  - Setting a breakpoint and examining the local environment of an executing function: browser()
  - A simple interactive debugger: debug()
  - A post-mortem debugger: debugger()
- Measuring time and memory usage with system.time() (or often better, microbenchmark() in the microbenchmark package) and Rprof()
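Returning to the S3 dispatch rule described above, a small example may help show how a generic, a class-specific method, and a default method interact. The generic name describe and its methods are hypothetical, invented for this sketch; lm(), glm(), and the cars data set are standard R.

```r
# A hypothetical generic: dispatches on the class of its first argument
describe <- function(object, ...) UseMethod("describe")

# class-specific method and fallback default method
describe.lm      <- function(object, ...) "a linear model"
describe.default <- function(object, ...) "some other object"

mod.lm  <- lm(dist ~ speed, data = cars)
mod.glm <- glm(dist ~ speed, data = cars, family = gaussian)

class(mod.glm)      # c("glm", "lm")

describe(mod.lm)    # dispatches to describe.lm
describe(mod.glm)   # no describe.glm exists, so describe.lm is inherited
describe(1:3)       # no describe.integer, so describe.default is used
```

Adding a describe.glm method later would change the result for mod.glm without touching the generic or the existing methods, which is the main practical appeal of S3 dispatch.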