An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions

Size: px
Start display at page:

Download "An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions"

Transcription

1 bimj header will be provided by the publisher An Age-Stratified Poisson Model for Comparing Trends in Cancer Rates Across Overlapping Regions Yi Li, Ram C. Tiwari 2, and Zhaohui Zou 3 Harvard University, 44 Binney Street, Boston, MA 025, USA 2 National Cancer Institute, Bethesda, MD Information Management Services, Silver Spring, MD Received 5 November 2007, revised 28 February 2008, accepted 25 April 2008 Published online Summary The annual percent change (APC) has been used as a measure to describe the trend in the age-adjusted cancer incidence or mortality rate over relatively short time intervals. The yearly data on these age-adjusted rates are available from the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The traditional methods to estimate the APC is to fit a linear regression of logarithm of age-adjusted rates on time using the least squares method or the weighted least squares method, and use the estimate of the slope parameter to define the APC as the percent change in the rates between two consecutive years. For comparing the APC for two regions, one uses a t-test which assumes that the two datasets on the logarithm of the age-adjusted rates are independent and normally distributed with a common variance. Two modifications of this test, when there is an overlap between the two regions or between the time intervals for the two datasets have been recently developed. The first modification relaxes the assumption of the independence of the two datasets but still assumes the common variance. The second modification relaxes the assumption of the common variance also, but assumes that the variances of the age-adjusted rates are obtained using Poisson distributions for the mortality or incidence counts. In this paper, a unified approach to the problem of estimating the APC is undertaken by modeling the counts to follow an age-stratified Poisson regression model, and by deriving a corrected Z-test for testing the equality of two APCs. A simulation study is carried out to assess the performance of the test and an application of the test to compare the trends, for a selected number of cancer sites, for two overlapping regions and with varied degree of overlapping time intervals is presented. Key words: Age-adjusted incidence/mortality rates, age-stratified Poisson Regression, annual percent change (APC), surveillance, trends, hypothesis testing. Introduction The American Cancer Society (ACS) in its annual publication Cancer ACS 2007 ( reports that in 2007 about.5 million new cancer cases are expected to be diagnosed, and approximately 560,000 Americans are expected to die of cancer. Cancer is the most common cause of death in US, exceeded only by heart disease, and accounts for of every 4 deaths. The same report also reveals that, for a number of cancer sites (such as breast, stomach, colon and rectum, lung and bronchus and leukemia), the age-adjusted cancer mortality rates have been steadly decreasing in recent years. In addition, the National Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI) periodically publishes similar reports on trends of cancer incidence at see Ries et al. (2003) So much has been at stake in terms of human life and cost - for example, the government agencies such as the National Health of Institutes (NIH), and many private sectors spend billions of dollars every year on cancer research, health insurance and medical and other costs - that there is an urgent need for new methods that produce more accurate and reliable estimates of measures of cancer trends. Corresponding author: yili@jimmy.harvard.edu, Phone: , Fax:

2 2 Li, Tiwari, and Zou: Age-Stratified Poisson Model for Cancer Surveillance The annual percent change (APC) has been used as a measure of cancer trends over short time periods, and to compare the recent cancer trends by gender or by geographic regions, one compares their APC values using the two-sample pooled t-test ( Kleinbaum et al., 988) that assumes that the datasets on ageadjusted rates under the study are independent. However, a fundamental statistical difficulty arises when such comparisons, largely for policy making purposes, have to be made for regions or time intervals that overlap, e.g. comparing the most recent changes in trends of cancer rates in a local area (e.g. the mortality rate of breast cancer in California) with a more global level (i.e. the national mortality rate) over two overlapping time periods. For example, as detailed in the data analysis section, it is of substantial interest to compare the changes in California cancer mortality rates with the national cancer mortality rates in the last 5 years. Recently, Li and Tiwari (2007) and Li et al. (2007) developed Z-tests which adjust for the dependence between the two APCs, and are more efficient than the naive test which assumes independence. However, these tests are based on the logarithmic transformation of the age-adjusted rates, and fits a simple linear regression model of the transformed data on time using either the ordinary least squares (OLS) or the other weighted least squares (WLS) procedures. The proposed test procedure is based on the natural assumption that the age-specific mortality or incidence counts are results of underlying Poisson processes (Brillinger, 986), and hence are realizations of independent Poisson random variables. The age-specific instantaneous hazards are modeled by a log-link function, thus leading to an age-stratified Poisson model. The estimation of the parameters is then carried out using a likelihood-based approach. The rest of the paper is organized as follows. In Section 2, we briefly review the existing tests, and derive the new test in Section 3. To compare the performance of the proposed test with respect to the above mentioned tests, a simulation study is carried out in Section 4. In this section, we also give application to breast cancer mortality data from California (CA) and the US extracted from the SEER*STAT software of the SEER Program. Section 5 ends this paper with a short discussion. 2 A Brief Review of Existing Tests Consider two regions, and let d kji denote the number of counts (deaths or new cancer cases) from the population at risk n kji observed in Region k (k, 2) in age-group j (j,..., J) and at times T,..., T m for Region and T s+,..., T s+n for Region 2, where T T s+ < T m T s+n, with 0 s < m leading to overlapping time intervals. Note that this formulation is general and allows one region to have fewer time points than the other. In the SEER program, it is common to choose n kji (at year T i ) to be the mid-year population representing the total person-years in one year, with the assumption of drop-outs being uniform over the unit-intervals. The age-adjusted rates are defined as r ki J j w j d kji n kji, where w j > 0, j,..., J, are the known standards for the age group j so that J j w j. For the SEER analysis, there are J 9 standard age-groups consisting of 0-, -4, 5-9,..., 85+, and w j are chosen to be the year 2000 population standards (Fay et al. 2006). Let y ki log(r ki ), be the logarithmic transformations of the age-adjusted rates. Consider the linear regression models y ki β 0k + β k t ki + e ki, i,...,, () for k, 2, flagging Regions and 2, respectively. Here e ki are random errors with mean 0, and t ki corresponds to the calendar times of data collection in region k with I m and I 2 n. More specifically, (t,..., t I ) (T,..., T m ), while (t 2,..., t 2I2 ) (T s+,..., T s+n ). For the two regions, the

3 bimj header will be provided by the publisher 3 annual percent change (APC) are defined as AP C k 00(e β k ). 00β k, for a small β k, e.g. in the order of 0 2 (Kim et al., 2000; Fay et al., 2006; Tiwari et al., 2006). Then under the assumptions that e ki are independent and have common variance σ 2, the two-sample pooled t-test (Kleinbaum et al., 988) for testing the null hypothesis H 0 : AP C AP C 2 versus the alternative H a : AP C AP C 2 is given by ˆβ T t ˆβ 2 ( ˆσ 2 ( I (t i t ) 2 ) + ( ) t (I +I 2 4), (2) I 2 (t 2i t 2 ) 2 ) where t k t ki / for k, 2, and ˆσ 2 is the pooled unbiased estimate of σ 2 given by I ˆσ 2 (y i ŷ i ) 2 + I 2 (y 2i ŷ 2i ) 2, I + I 2 4 where ŷ ki ˆβ 0k + ˆβ k t ki are the predictions for k, 2. Here, ˆβ 0k and ˆβ k are obtained from the least squares estimation. That is, Ik ˆβ k (t ki t k )(y ki ȳ k ) ik, ˆβ0k ȳ k (t ki t ˆβ k t k k ) 2 where ȳ k y ki /. The above test is not appropriate, however, when there is an overlap between the two regions or the two time periods. For this case, Li and Tiwari (2007) proposed the following corrected Z-test ˆβ Z CT ˆβ 2 ( {ˆσ 2 σ 2 + σ2 2 2σ 2 σ 2 σ 2 2 )} /2, (3) (n (O) ) 2 where σk 2 (t ki t k ) 2, σ 2 m s+ (T i t )(T i t 2 ), n k m J is+ j n kji for k, 2, n (O) m J is+ j n(o) ji and and n (O) ji are the numbers of at-risk population in the overlapping region. Note that there is no suffix k in n (O) ji. The sign of σ 2 determines, whether the covariance between ˆβ and ˆβ 2 is positive or negative, and when there is no overlap in time intervals, σ 2 0. Under the lognormal model, the corrected Z CT test was shown to follow a standard normal distribution under the null hypothesis, and to be more efficient than the pooled t-test; see Li and Tiwari (2007). However, one assumption in Li and Tiwari (2007) is the equal variance in both regression models, which may not be realistic, especially for rare cancers. A further refinement has been made to derive the variance of y ki by using the Poisson assumptions on the first two moments of the counts d kji, i.e. E(d kji ) var(d kji ). Under these assumptions, the consistent estimate of the error variance of e ki is given by vki 2 J rki 2 j w2 d kji j, leading to the following weighted least squares test proposed by Li et n 2 kji al. (2007), referred to as Z W LS : β Z W LS β 2 { σ 2 + σ σ 2 σ 2 σ 2 2 with σ 2 m (T i t ) 2 /v 2 i and σ2 2 s+n is+ (T i t 2 ) 2 /v 2 2i, σ 2 m is+ (T i t )(T i t 2 ) v(o) 2i v 2 i v2 2i, } /2. (4) (n (O) ) 2

4 4 Li, Tiwari, and Zou: Age-Stratified Poisson Model for Cancer Surveillance where and d (o) ji v (o) 2i r i r 2i J j w 2 j d (o) ji (n (o) ji )2, are the counts in the overlapping region and during the overlapping period. Here, m t T i/vi 2 m, t 2 /v2 i s+n is+ T i/v2i 2 s+n, is+ /v2 2i and β, β 2 are weighted least square estimates of β, β 2. Under the null hypothesis, because of the normal approximation, Z W LS approximately follows a standard normal distribution. The Z W LS has been shown to be more conservative than Z CT in retaining the size of the test, but is more powerful for the common cancer sites; see Li et al. (2007). However, there are several disadvantages of the existing methods. First, one key step of Li et al. (2007) is the normal approximation of the age-adjusted rates. Secondly, both Li et al. (2007) and Li and Tiwari (2007) need adjustments for zero counts. 3 Age-stratified Poisson Regression Model As the existing approaches to dealing with age-adjusted cancer rates were all based on the normal approximation, we take a more natural route in the sequel by considering the Poisson nature of the underlying count data and propose an age-stratified Poisson regression to describe the change trend of incident (or death) counts on time. Based on this model, a proper test that accounts for overlapping is proposed. Specifically, since d kji, the number counts (deaths or new cancer cases) observed in Region k (k, 2) in age-group j, is a count, we assume that with d kji ind P ois(n kji λ kji ), log λ kji β 0kj + β k t ki, (5) which is referred to as the Age-stratified Poisson Regression Model as the age-specific intercept β 0kj is assumed for age-group j. The common slope β k is of particular importance as it transcribes the trends of mortality or incidence and, in particular, determines the APC value. Again let AP C and AP C 2 be the corresponding APC values for these two Poisson regressions. A natural test for the null hypothesis H 0 : AP C AP C 2 versus the alternative hypothesis H : AP C AP C 2 would be ˆβ Z P OIS ˆβ 2 V ar{ ˆβ ˆβ, 2 } where ˆβ and ˆβ 2 are the maximum likelihood estimates of β and β 2 derived in the Appendix. Because of the possible overlapping of Regions and 2, ˆβ and ˆβ 2 may be correlated. Thus the key to the derivation of the test lies in a correct evaluation of Cov( ˆβ, ˆβ 2 ).

5 bimj header will be provided by the publisher 5 3. Derivation of the Test To proceed, we let β k (β k, β 0k,..., β 0kJ ), k, 2, whose estimates ˆβ k can be obtained by solving the score equations [based on (5)] U k (β k ) 0, for k, 2, where U k (U k,, U 0k,,..., U 0k,J ) and U k, U 0k,j j J n kji t ki exp(β 0kj + β k t ki ) n kji exp(β 0kj + β k t ki ) d kji, j J d kji t ki, for j,..., J. As U k ( ˆβ k ) 0, expanding it around the true value β k, and ignoring the higher order terms yields 0 U k ( ˆβ k ) U k (β k ) + { ( )} U () k Ik Ik I (β k) Ik ˆβk β k + o p (), k where U () k U k (β k )/ β k are (J +) (J +) matrices with its st row as (U () k,, U () k,0,..., U () k,0j ) and the (j + ) th row (j,..., J) as (U () 0k,j, U () 0k,0j,..., U () 0k,0jJ ), for k, 2. Here U () k, U k, β k U () k,0j U k, β 0kj U () 0k,j U 0k,j β k U () 0k,0jj U 0k,j β 0kj j δ jj J n kji t 2 ki exp(β 0kj + β k t ki ), n kji t ki exp(β 0kj + β k t ki ), n kji t ki exp(β 0kj + β k t ki ), n kji exp(β 0kj + β k t ki ), and δ jj if j j and 0 otherwise. Denote by A k plim Ik U () k (β k)/ for k, 2, where plim denotes the limit in probability. Then for large I ( m) and I 2 ( n), standard probabilistic arguments yield { I ( ˆβ β ), { } I 2 ( ˆβ d 2 β 2 )} U (β ), A U 2 (β 2 ). I I2 A 2 Here, d denotes approximate equivalence in joint distribution functions. Hence, V ar( m( ˆβ β ), m( ˆβ β )) V ar( n( ˆβ 2 β 2 ), n( ˆβ 2 β 2 ) ) Cov( m( ˆβ β ), n( ˆβ 2 β 2 ) ). A m Cov(U (β ), U (β ))A T,. A 2 n Cov(U 2(β 2 ), U 2 (β 2 ))A T 2,. A Cov(U (β ), U 2 (β 2 ))A T mn 2.

6 6 Li, Tiwari, and Zou: Age-Stratified Poisson Model for Cancer Surveillance Let V m Cov(U (β ), U (β )), W n Cov(U 2(β 2 ), U 2 (β 2 )), Σ mn Cov(U (β ), U 2 (β 2 )), and ˆV, Ŵ, ˆΣ be the corresponding estimates, whose derivations are given in the Appendix. Then, Ĉov( ˆβ, ˆβ ) m{u () ( ˆβ )} ˆV {U () ( ˆβ )} T ; Ĉov( ˆβ 2, ˆβ 2 ) n{u () 2 ( ˆβ 2 )} Ŵ {U () 2 ( ˆβ 2 )} T ; Ĉov( ˆβ, ˆβ 2 ) mn{u () ( ˆβ )} ˆΣ{U () 2 ( ˆβ 2 )} T. These are three (J +) (J +) matrices, the (, ) entries of which are ˆσ 2 ˆV ar( ˆβ ), ˆσ 2 2 ˆV ar( ˆβ 2 ) and ˆσ 2 Ĉov( ˆβ, ˆβ 2 ), respectively. From this we compute ˆV ar( ˆβ ˆβ 2 ) ˆV ar( ˆβ )+ ˆV ar( ˆβ 2 ) 2Ĉov( ˆβ, ˆβ 2 ). Hence, the Z-test for comparing APC values is given by Z P OIS ˆβ ˆβ 2 (ˆσ 2 + ˆσ2 2 2ˆσ 2 )/2, (6) which follows the standard normal distribution under H 0 : β β 2. The computation of ˆβ, ˆβ 2 is given in the Appendix. 3.2 ARE Comparison with the WLS test It is of substantial interest to evaluate the gains in efficiency of the proposed test compared with the WLS test. First note (5) implies that E(r ki ) E( j w j d kji n kji ) ( j w je β 0kj )e β k Ti. This in turn implies that E(log r ki ). log E(r ki ) log( j w j e β 0kj ) + β k t ki, (7) when V ar(r ki ) is small, which is often the case for the cancer incidence and mortality data (Kim et al., 2000). A comparison between () and (7) reveals that models () and (5) approximately specify the same first moment of the age-adjusted cancer rates, making it possible to compare the efficiency of the tests based on these two models via the measure of the Pitman Asymptotic Relative Efficiency. Specifically, standard asymptotic analysis will yield ˆβ ˆβ 2 N(β β 2, ˆσ 2 + ˆσ 2 2 2ˆσ 2 ), while, for the WLS estimates, β β 2 N(β β 2, σ 2 + σ σ 2 σ 2 σ 2 2 (n (O) ) 2 n n 2 ). Hence, the Pitman Asymptotic Relative Efficiency (ARE) comparing tests (6) and (4), which is the ratio of the noncentralities of the above two normal distributions, is given by ARE σ 2 + σ σ 2 σ 2 ˆσ 2 + ˆσ2 2 2ˆσ 2 σ 2 2 (n (O) ) 2 n n 2 The evaluation of (8) typically involves numerical computations.. (8)

7 bimj header will be provided by the publisher 7 4 SEER mortality data analysis and Simulations Li et al. (2007) demonstrated that Z W LS performs better than Z CT, via the calculation of ARE, as Z CT relies on the common variance assumption, which may not be realistic. Hence, we focus this paper on comparing Z P OIS with Z W LS. To comprehensively evaluate these tests, we consider several scenarios of overlap in two regions and in two different time intervals. Specifically, we assume that Region consists of Georgia (GA), South Carolina (SC), and North Carolina (NC), and that Region 2 consists of NC, Virginia (VA) and Maryland (MD); with NC as the overlapping state between the two regions. The three different time intervals, with varying degree of overlap in the intervals, are taken to be : (a) [980,989] for Region, and [990,999] for Region 2 so that there is no overlap between the two time intervals and, hence, σ 2 0; (b) [980,989] for Region, and [983,992] for Region 2 so that there a considerable overlap of six years between the two intervals and σ ; (c) [980,989] for Region, and [987, 996] for Region 2 so that there is a little overlap of three years between the two intervals and σ The counts, d kji were generated based on model (5) with t ki taking values in the intervals corresponding to the two regions stated above. More specifically, the t i take values of {0,,..., 9}, while the t 2i take values of {0,..., 9}, {3,..., 2}, and {7,..., 6}, respectively for cases (a)-(c). In order to fully specify λ kji in (5), we assume that β 0kj log(d kj /n kj ) β k t k, where d kj and n kj are respectively the observed number of deaths and the number of at-risk population at the beginning of the intervals considered, and take β k log(ap C k /00 + ), based on the specified values of AP C k. The age-specific counts for the overlapping state, NC, are generated from Poisson distributions with means, n (o) ji 2 (λ ji + λ 2ji ), where n (o) ji denotes the at-risk population in the overlapping region. When λ ji λ 2ji, this reduces to the situation specified by the null hypothesis. The number of at risk population and the observed number of deaths were obtained from the SEER database for all malignant male cancers and prostate cancer. The values of APC were assumed to range from -0.3% to 3.0%. For each parameter configuration, a total of 000 simulated data were obtained. The results for the three time-overlapping cases are summarized in Tables -3. We remark that that, even though both Z W LS and Z P OIS are derived under different model assumptions, they are both valid tests for testing the equality of two APCs and hence the ARE defined in (8) is valid. The tables show that the ARE of Z P OIS with respect to Z W LS is greater than for all the three cases, meaning that Z P OIS would be more powerful than Z W LS when the alternative hypothesis is true. The tables also show that in most situations Z P OIS outperforms the Z W LS in retaining the Type I error probabilities and, hence, yields a more valid test. Also the powers of both WLS and Poisson-based tests are sensitive to the delta values (the differences of APC values). The larger the delta values are, the more powerful the tests are. The larger delta values also lead to slightly larger AREs, though the differences are not so obvious. It is of substantial interest to compare the changes in cancer mortality rates in California with the national levels starting late 980 s as a California law (Health and Safety Code, Section 03885) was passed then, which mandated the reporting of malignancies diagnosed throughout the state. For this purpose, we applied the proposed methodology to compare the annual percent change (APC) in the age-adjusted mortality rates for the United States (US) for the period from to that of California (CA) for the period from 990 to We fitted the weighted linear models as well as the age-stratified Poisson model, and applied both tests to compare the age-adjusted mortality rates of female breast cancer in CA for the 6-year period from to that of US for the 6-year period from , for which the national mortality data were available. The observed values of the log-transformed annual age-adjusted rates and fitted regression lines from the Z P OIS test procedure are plotted in Figure. The parameter estimates and the values of the test statistics are summarized in Table 4. The results indicated the mortality rates of Breast cancer for California and the US have decreased. Both tests reject the null hypothesis of equality of the two APCs, indicating that the annual percent change (APC) of California, is significantly different from the national level. However, the p-value for Z P OIS is much smaller than that of Z W LS, rendering more evidence again the null hypothesis.

8 8 Li, Tiwari, and Zou: Age-Stratified Poisson Model for Cancer Surveillance 5 Discussion In this paper, we have considered an important problem where comparisons have to be made for regions or time intervals that overlap. As opposed to the existing work, e.g Li and Tiwari (2007) and Li et al. (2007), this project advances this area in two distinct ways. First, the developed test does not rely on the normal approximation of the cancer rate, but directly model the counts to follow a Poisson regression model. The parameters are then estimated using their maximum likelihood estimates, and the Z-test is derived for testing the equality of two APCs. Secondly, the developed Poisson regression model can easily accommodate 0 count data (for the rare cancers), as opposed to the log normal model (Li and Tiwari, 2007; Li et al., 2007) which needs to involve extra zero-corrected adjustment. We have applied the developed methodology to the analysis of the major cancer sites from the SEER Program and have found that the corrected Z-test renders more power than the existing tests. A Bayesian Poisson regression would be a useful approach. However, choice of priors is always difficult and computation may not be so straightforward compared to this current work, wherein analytical solutions have been derived. Hence, we envision that the proposed method would be preferable because of simple interpretation of the model parameters, natural choice of the model and computational readiness. In our technical development, we have modelled the logarithm transformation of the age-adjusted rates as a linear regression on time in (5) and have indeed explicitly assumed parallelism across age groups. That is, the growth curves of the cancer rates for various age groups share the same slope, which carries the information for the APC. Indeed, linearity parallelism for the cancer rates could be a debatable issue in cancer surveillance, which is likely to be violated for some cancers. One alternative, along the line of generalized mixed models, is to assume a random slope (as opposed to a constant slope) across age groups. This ongoing work will be reported in a subsequent communication. Acknowledgements The authors thank the editor, an AE and two referees for their insightful suggestion, which led to a better version of this manuscript. References American Cancer Society (2007) Cancer Facts & Figures. Atlanta, Georgia. Brillinger, D.R. (986). The natural variability of vital rates and associated statistics (with discussion). Biometrics 42, Fay, M., Tiwari, R., Feuer, E. and Zou, Z. (2006). Estimating Average Annual Percent Change for Disease Rates without Assuming Constant Change. Biometrics 62, Kim, H., Fay, M., Feuer, E., Midthune, D. (2000) Permutation tests for joinpoint regression with applications to cancer rates. Statistics in Medicine 9, Kleinbaum, D., Kupper, and Muller, P. (988). Applied Regression Analysis and Other Multivariable Methods. PWS-Kent, Boston, Mass., 2nd edition. Li, Y. and Tiwari, R. (2007). Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions or Time Intervals for the NCI SEER Program. Technical Report, Li, Y., Tiwari, R., Walters, K. and Zou, Z. (2007) A Weighted-Least-Squares Estimation Approach to Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions. Technical Report, yili/apc2.pdf Ries, L., Eisner, M., Kosary, C., Hankey, B., Miller, B., Clegg, L., Mariotto, A., Feuer, E. and Edwards, B. (2003). SEER Cancer Statistics Review, , National Cancer Institute. Bethesda, MD,

9 bimj header will be provided by the publisher 9 Tiwari, R., Clegg, L. and Zou, Z. (2006). Efficient interval estimation for age-adjusted cancer rates. Statistical Methods in Medical Research 5, Write where Appendix A: Derivation of ˆV, Ŵ, ˆΣ V ( V V 2 V 2 V 22 ) V m V ar(u,) m ( ) ( ) W W ; W 2 Σ Σ W ; Σ 2 (J+) (J+) 2 W 22 Σ (J+) (J+) 2 Σ 22 (J+) (J+) m j J Ti 2 V ar(d ji ) m Hence a consistent estimate is ˆV m m J j T 2 i d ji. Similarly, ˆV 2 ( ˆV 2,,..., ˆV 2,J ) m j J Ti 2 E(d ji ). where ˆV 2,j Ĉov(U,, U 0,j ) m m T i d ji. Also, V 22 ((V 22,jj )) with ˆV 22,jj Ĉov(U 0,j, U 0,j ) 0, j j ; m m d ji, j j. Next compute the estimate of W: W n V ar(u 2,) n s+n s+n is+ j J Ti 2 V ar(d 2ji ) n s+n is+ j J Ti 2 E(d 2ji ) so that Ŵ J n is+ j T i 2d 2ji Similarly, Ŵ2 (Ŵ2,,..., Ŵ2,J ) where Ŵ2,j Ĉov(U 2,, U 02,j ) n n is+ T id 2ji. Also, Ŵ 22 ((Ŵ22,jj )) with Ŵ22,jj 0, j j ; s+n n is+ d 2ji, j j Finally, the estimate of Σ is computed as follows: Σ Cov(U,, U 2, ) mn m J Cov T i d ji, mn j m J Cov mn j s+n T i ( d (NO) ji is+ j J T i d 2ij ) + d (O) ji, s+n J is+ j T i ( d (NO) 2ji ) + d (O) ji where d kji d (NO) jki +d (O) ji and the superscripts NO and O denote the non-overlapping and overlapping regions, respectively. Thus, ˆΣ m J mn is+ j T i 2d(O) ji. Let ˆΣ 2 (ˆΣ2,,..., ˆΣ ) 2,J where ˆΣ 2,j m mn is+ T id (O) ji. Also, ˆΣ 22 ((ˆΣ 22,jj )) with ˆΣ 22,j,j 0, j j m ; mn is+ d(o) ji, j j.

10 0 Li, Tiwari, and Zou: Age-Stratified Poisson Model for Cancer Surveillance Appendix B: Computation of the MLEs Note that the MLEs of β 0jk and β k satisfy: e β 0jk i d kji ; i eβ k Ti n kji Ũ(β k ) ( ) i n kji T d jki i e β k Ti d j,i i eβ k Ti n kji T i kji j,i ( i d kji T ) ie β k Ti kj. d j i eβ k Ti n k.i T i kji i j d kj. [A kj ()A kj (0)] i d k.i T i 0; k, 2, where d kj. i d kji, d k.i j d kji, and A kj (a) i n kji T a i eβ k Ti for a 0,, 2. Since Ũ(β k ) is a monotonic function of β k, there is a unique solution to this equation. Let ˆβ k be the solution. This is the MLE of β k. We can use a Newton-Raphson method to obtain ˆβ k as follows. Let ˆβ k (l) be the estimate at the l th iteration, then the estimate at the (l + ) th iteration is given by ˆβ k (l + ) ˆβ k (l) [Ũ () ( ˆβ k (l))] Ũ( ˆβ k (l)), where Ũ () (β) is the first (partial) derivative of Ũ(b) with respect to b evaluated at b β. We stop iterating wh en ˆβ k (l + ) ˆβ k (l) < ε for some prespecified value of ε. Note that Ũ () (β k ) j,i j n jki T 2 i e β k Ti ( i d kji i eβ k Ti n kji ) ( ) ( i n kji t2 i eβ k ti i kji) d i eβ k ti n kji j,i j n jki t i e β k ti i n kji t ie β k ti ( 2 i eβ k ti n kji) ( i n kji t ie β k ti ) 2 ( i eβ k ti n kji) 2 j j ( i d n ) kji t2 i eβ k ti kj. i eβ k ti n kji j d kj. [A kj (2)(A kj (0)) ] j ( i n kji t ie β k ti ) 2 ( i eβ k ti n kji) 2 [(A kj ()) 2 (A kj (0)) ], k, 2. Substituting the MLE ˆβ k in place for β k in e β 0kj i d kji i eβ k t i gives the MLE ˆβ ( ) n 0kj log i d kji ( kji ) log i e ˆβ k t i n kji.

11 bimj header will be provided by the publisher Table Comparison of the power functions under various hypotheses between the Poisson-based test Z P OIS and the weighted-least-squares based test Z W LS for two overlapping regions over disjoint time intervals, Region ( ) vs Region 2 ( ). APC and APC2 are the annual percent changes in Regions and 2, respectively. Cancer Sites AP C AP C 2 ARE Z W LS Z P OIS All Malignant Prostate Table 2 Comparison of the power functions between the Poisson based Z P OIS and the weighted-least-squares based Z W LS for two overlapping regions over roughly the same time intervals, Region ( ) vs Region 2 ( ). APC and APC2 are the annual percent changes in Regions and 2, respectively. Cancer Sites AP C AP C 2 ARE Z W LS Z P OIS All Malignant Prostate

12 2 Li, Tiwari, and Zou: Age-Stratified Poisson Model for Cancer Surveillance Table 3 Comparison of the power functions between the Poisson based Z P OIS and the weighted-least-squares based Z W LS for two overlapping regions over slightly overlapping time intervals, Region ( ) vs Region 2 ( ). APC and APC2 are the annual percent changes in Regions and 2, respectively. Cancer Sites AP C AP C 2 ARE Z W LS Z P OIS All Malignant Prostate Table 4 Comparing APC of Breast Cancer Mortality Between CA and the US CA US APC W LS SE W LS Z W LS (p ) APC P OIS SE P OIS Z P OIS (p )

13 bimj header will be provided by the publisher 3 Fig. Observed and fitted log-transformed age-adjusted breast cancer mortality rates in CA [ ] and US [ ]

Harvard University. Harvard University Biostatistics Working Paper Series. Survival Analysis with Change Point Hazard Functions

Harvard University. Harvard University Biostatistics Working Paper Series. Survival Analysis with Change Point Hazard Functions Harvard University Harvard University Biostatistics Working Paper Series Year 2006 Paper 40 Survival Analysis with Change Point Hazard Functions Melody S. Goodman Yi Li Ram C. Tiwari Harvard University,

More information

RESEARCH ARTICLE. Detecting Multiple Change Points in Piecewise Constant Hazard Functions

RESEARCH ARTICLE. Detecting Multiple Change Points in Piecewise Constant Hazard Functions Journal of Applied Statistics Vol. 00, No. 00, Month 200x, 1 12 RESEARCH ARTICLE Detecting Multiple Change Points in Piecewise Constant Hazard Functions Melody S. Goodman a, Yi Li b and Ram C. Tiwari c

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Estimating joinpoints in continuous time scale for multiple change-point models

Estimating joinpoints in continuous time scale for multiple change-point models Computational Statistics & Data Analysis 5 (2007) 2420 2427 wwwelseviercom/locate/csda Estimating joinpoints in continuous time scale for multiple change-point models Binbing Yu a,, Michael J Barrett a,

More information

Robust Bayesian Variable Selection for Modeling Mean Medical Costs

Robust Bayesian Variable Selection for Modeling Mean Medical Costs Robust Bayesian Variable Selection for Modeling Mean Medical Costs Grace Yoon 1,, Wenxin Jiang 2, Lei Liu 3 and Ya-Chen T. Shih 4 1 Department of Statistics, Texas A&M University 2 Department of Statistics,

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution

More information

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley

Zellner s Seemingly Unrelated Regressions Model. James L. Powell Department of Economics University of California, Berkeley Zellner s Seemingly Unrelated Regressions Model James L. Powell Department of Economics University of California, Berkeley Overview The seemingly unrelated regressions (SUR) model, proposed by Zellner,

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Multivariate spatial modeling

Multivariate spatial modeling Multivariate spatial modeling Point-referenced spatial data often come as multivariate measurements at each location Chapter 7: Multivariate Spatial Modeling p. 1/21 Multivariate spatial modeling Point-referenced

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON3150/ECON4150 Introductory Econometrics Date of exam: Wednesday, May 15, 013 Grades are given: June 6, 013 Time for exam: :30 p.m. 5:30 p.m. The problem

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

Parametric Empirical Bayes Methods for Microarrays

Parametric Empirical Bayes Methods for Microarrays Parametric Empirical Bayes Methods for Microarrays Ming Yuan, Deepayan Sarkar, Michael Newton and Christina Kendziorski April 30, 2018 Contents 1 Introduction 1 2 General Model Structure: Two Conditions

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

A Practitioner s Guide to Generalized Linear Models

A Practitioner s Guide to Generalized Linear Models A Practitioners Guide to Generalized Linear Models Background The classical linear models and most of the minimum bias procedures are special cases of generalized linear models (GLMs). GLMs are more technically

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520

REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

Bayesian Hierarchical Models

Bayesian Hierarchical Models Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology

More information

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression Rebecca Barter April 20, 2015 Fisher s Exact Test Fisher s Exact Test

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Early Detection of a Change in Poisson Rate After Accounting For Population Size Effects

Early Detection of a Change in Poisson Rate After Accounting For Population Size Effects Early Detection of a Change in Poisson Rate After Accounting For Population Size Effects School of Industrial and Systems Engineering, Georgia Institute of Technology, 765 Ferst Drive NW, Atlanta, GA 30332-0205,

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May 5-7 2008 Peter Schlattmann Institut für Biometrie und Klinische Epidemiologie

More information

A spatial scan statistic for multinomial data

A spatial scan statistic for multinomial data A spatial scan statistic for multinomial data Inkyung Jung 1,, Martin Kulldorff 2 and Otukei John Richard 3 1 Department of Epidemiology and Biostatistics University of Texas Health Science Center at San

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

TMA 4275 Lifetime Analysis June 2004 Solution

TMA 4275 Lifetime Analysis June 2004 Solution TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics T H I R D E D I T I O N Global Edition James H. Stock Harvard University Mark W. Watson Princeton University Boston Columbus Indianapolis New York San Francisco Upper Saddle

More information

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation

Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation Multivariate Regression Generalized Likelihood Ratio Tests for FMRI Activation Daniel B Rowe Division of Biostatistics Medical College of Wisconsin Technical Report 40 November 00 Division of Biostatistics

More information

Sample size calculations for logistic and Poisson regression models

Sample size calculations for logistic and Poisson regression models Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National

More information

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution

A comparison of inverse transform and composition methods of data simulation from the Lindley distribution Communications for Statistical Applications and Methods 2016, Vol. 23, No. 6, 517 529 http://dx.doi.org/10.5351/csam.2016.23.6.517 Print ISSN 2287-7843 / Online ISSN 2383-4757 A comparison of inverse transform

More information

Negative Multinomial Model and Cancer. Incidence

Negative Multinomial Model and Cancer. Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence S. Lahiri & Sunil K. Dhar Department of Mathematical Sciences, CAMS New Jersey Institute of Technology, Newar,

More information

Two-Variable Regression Model: The Problem of Estimation

Two-Variable Regression Model: The Problem of Estimation Two-Variable Regression Model: The Problem of Estimation Introducing the Ordinary Least Squares Estimator Jamie Monogan University of Georgia Intermediate Political Methodology Jamie Monogan (UGA) Two-Variable

More information

Chapter 20: Logistic regression for binary response variables

Chapter 20: Logistic regression for binary response variables Chapter 20: Logistic regression for binary response variables In 1846, the Donner and Reed families left Illinois for California by covered wagon (87 people, 20 wagons). They attempted a new and untried

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

4. Nonlinear regression functions

4. Nonlinear regression functions 4. Nonlinear regression functions Up to now: Population regression function was assumed to be linear The slope(s) of the population regression function is (are) constant The effect on Y of a unit-change

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

3 Multiple Linear Regression

3 Multiple Linear Regression 3 Multiple Linear Regression 3.1 The Model Essentially, all models are wrong, but some are useful. Quote by George E.P. Box. Models are supposed to be exact descriptions of the population, but that is

More information

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data

Person-Time Data. Incidence. Cumulative Incidence: Example. Cumulative Incidence. Person-Time Data. Person-Time Data Person-Time Data CF Jeff Lin, MD., PhD. Incidence 1. Cumulative incidence (incidence proportion) 2. Incidence density (incidence rate) December 14, 2005 c Jeff Lin, MD., PhD. c Jeff Lin, MD., PhD. Person-Time

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

A Bivariate Weibull Regression Model

A Bivariate Weibull Regression Model c Heldermann Verlag Economic Quality Control ISSN 0940-5151 Vol 20 (2005), No. 1, 1 A Bivariate Weibull Regression Model David D. Hanagal Abstract: In this paper, we propose a new bivariate Weibull regression

More information

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016

Lecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016 Statistics 255 - Survival Analysis Presented March 8, 2016 Dan Gillen Department of Statistics University of California, Irvine 12.1 Examples Clustered or correlated survival times Disease onset in family

More information

Announcing a Course Sequence: March 9 th - 10 th, March 11 th, and March 12 th - 13 th 2015 Historic Charleston, South Carolina

Announcing a Course Sequence: March 9 th - 10 th, March 11 th, and March 12 th - 13 th 2015 Historic Charleston, South Carolina Announcing a Course Sequence: Introduction to Bayesian Disease Mapping (IBDM) BDM with INLA (BDMI) Advanced Bayesian Disease Mapping (ABDM) March 9 th - 10 th, March 11 th, and March 12 th - 13 th 2015

More information

Survival Analysis Math 434 Fall 2011

Survival Analysis Math 434 Fall 2011 Survival Analysis Math 434 Fall 2011 Part IV: Chap. 8,9.2,9.3,11: Semiparametric Proportional Hazards Regression Jimin Ding Math Dept. www.math.wustl.edu/ jmding/math434/fall09/index.html Basic Model Setup

More information

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section: Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 You have until 10:20am to complete this exam. Please remember to put your name,

More information

The STS Surgeon Composite Technical Appendix

The STS Surgeon Composite Technical Appendix The STS Surgeon Composite Technical Appendix Overview Surgeon-specific risk-adjusted operative operative mortality and major complication rates were estimated using a bivariate random-effects logistic

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

. a m1 a mn. a 1 a 2 a = a n

. a m1 a mn. a 1 a 2 a = a n Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by

More information

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping

Optimising Group Sequential Designs. Decision Theory, Dynamic Programming. and Optimal Stopping : Decision Theory, Dynamic Programming and Optimal Stopping Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj InSPiRe Conference on Methodology

More information

A nonparametric spatial scan statistic for continuous data

A nonparametric spatial scan statistic for continuous data DOI 10.1186/s12942-015-0024-6 METHODOLOGY Open Access A nonparametric spatial scan statistic for continuous data Inkyung Jung * and Ho Jin Cho Abstract Background: Spatial scan statistics are widely used

More information

Survival Analysis I (CHL5209H)

Survival Analysis I (CHL5209H) Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection

Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Biometrical Journal 42 (2000) 1, 59±69 Confidence Intervals of the Simple Difference between the Proportions of a Primary Infection and a Secondary Infection, Given the Primary Infection Kung-Jong Lui

More information

This paper has been submitted for consideration for publication in Biometrics

This paper has been submitted for consideration for publication in Biometrics BIOMETRICS, 1 10 Supplementary material for Control with Pseudo-Gatekeeping Based on a Possibly Data Driven er of the Hypotheses A. Farcomeni Department of Public Health and Infectious Diseases Sapienza

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

PASS Sample Size Software. Poisson Regression

PASS Sample Size Software. Poisson Regression Chapter 870 Introduction Poisson regression is used when the dependent variable is a count. Following the results of Signorini (99), this procedure calculates power and sample size for testing the hypothesis

More information

WISE International Masters

WISE International Masters WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are

More information

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series

Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series Division of Pharmacoepidemiology And Pharmacoeconomics Technical Report Series Year: 2013 #006 The Expected Value of Information in Prospective Drug Safety Monitoring Jessica M. Franklin a, Amanda R. Patrick

More information

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley

Models, Testing, and Correction of Heteroskedasticity. James L. Powell Department of Economics University of California, Berkeley Models, Testing, and Correction of Heteroskedasticity James L. Powell Department of Economics University of California, Berkeley Aitken s GLS and Weighted LS The Generalized Classical Regression Model

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing

F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing F3: Classical normal linear rgression model distribution, interval estimation and hypothesis testing Feng Li Department of Statistics, Stockholm University What we have learned last time... 1 Estimating

More information

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression. Ryan Godwin. ECON University of Manitoba Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

More information

Non-linear panel data modeling

Non-linear panel data modeling Non-linear panel data modeling Laura Magazzini University of Verona laura.magazzini@univr.it http://dse.univr.it/magazzini May 2010 Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1

More information

4 Multiple Linear Regression

4 Multiple Linear Regression 4 Multiple Linear Regression 4. The Model Definition 4.. random variable Y fits a Multiple Linear Regression Model, iff there exist β, β,..., β k R so that for all (x, x 2,..., x k ) R k where ε N (, σ

More information

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University

Lecture 25. Ingo Ruczinski. November 24, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University Lecture 25 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University November 24, 2015 1 2 3 4 5 6 7 8 9 10 11 1 Hypothesis s of homgeneity 2 Estimating risk

More information

Contextual Effects in Modeling for Small Domains

Contextual Effects in Modeling for Small Domains University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Contextual Effects in

More information

DETECTION AND ATTRIBUTION

DETECTION AND ATTRIBUTION DETECTION AND ATTRIBUTION Richard L. Smith Department of Statistics and Operations Research University of North Carolina Chapel Hill, NC 27599-3260, USA rls@email.unc.edu 41st Winter Conference in Statistics

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression In simple linear regression we are concerned about the relationship between two variables, X and Y. There are two components to such a relationship. 1. The strength of the relationship.

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Meta-analysis of epidemiological dose-response studies

Meta-analysis of epidemiological dose-response studies Meta-analysis of epidemiological dose-response studies Nicola Orsini 2nd Italian Stata Users Group meeting October 10-11, 2005 Institute Environmental Medicine, Karolinska Institutet Rino Bellocco Dept.

More information

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Li wan Chen, LENDIS Corporation, McLean, VA Forrest Council, Highway Safety Research Center,

More information

ECON 3150/4150, Spring term Lecture 7

ECON 3150/4150, Spring term Lecture 7 ECON 3150/4150, Spring term 2014. Lecture 7 The multivariate regression model (I) Ragnar Nymoen University of Oslo 4 February 2014 1 / 23 References to Lecture 7 and 8 SW Ch. 6 BN Kap 7.1-7.8 2 / 23 Omitted

More information

Weighted Least Squares I

Weighted Least Squares I Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Sample-weighted semiparametric estimates of cause-specific cumulative incidence using left-/interval censored data from electronic health records

Sample-weighted semiparametric estimates of cause-specific cumulative incidence using left-/interval censored data from electronic health records 1 / 22 Sample-weighted semiparametric estimates of cause-specific cumulative incidence using left-/interval censored data from electronic health records Noorie Hyun, Hormuzd A. Katki, Barry I. Graubard

More information

MATH5745 Multivariate Methods Lecture 07

MATH5745 Multivariate Methods Lecture 07 MATH5745 Multivariate Methods Lecture 07 Tests of hypothesis on covariance matrix March 16, 2018 MATH5745 Multivariate Methods Lecture 07 March 16, 2018 1 / 39 Test on covariance matrices: Introduction

More information

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit

2011/04 LEUKAEMIA IN WALES Welsh Cancer Intelligence and Surveillance Unit 2011/04 LEUKAEMIA IN WALES 1994-2008 Welsh Cancer Intelligence and Surveillance Unit Table of Contents 1 Definitions and Statistical Methods... 2 2 Results 7 2.1 Leukaemia....... 7 2.2 Acute Lymphoblastic

More information

Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality

Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality Extended Follow-Up and Spatial Analysis of the American Cancer Society Study Linking Particulate Air Pollution and Mortality Daniel Krewski, Michael Jerrett, Richard T Burnett, Renjun Ma, Edward Hughes,

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information