Population size estimation by means of capture-recapture with special emphasis on heterogeneity using Chao and Zelterman bounds
|
|
- Lindsey Charles
- 5 years ago
- Views:
Transcription
1 Population size estimation by means of capture-recapture with special emphasis on heterogeneity using Chao and Zelterman bounds Dankmar Böhning Quantitative Biology and Applied Statistics, School of Biological Sciences University of Reading Ege University, Izmir, June 3, 2008 prepared: June 2, 2008
2 Outline Introduction Some Applications Ways to a Realistic Solution Generalized Chao Bounds and a Monotonicity Property Zelterman Estimation Applications Outlook Zelterman as MLE Zelterman can be extended to case data Zelterman extended to higher counts Zelterman extended: truncation or censoring? Some simulation results
3 Introduction Formulation of the Problem a population has N units of which n are identified by some mechanism (trap, register, police database,...) probability of identifying an unit is (1 p 0 ) so that N = (1 p 0 )N + p 0 N = n + p 0 N and the Horvitz-Thompson estimator follows: ˆN = n 1 p 0
4 Introduction Formulation of the Problem ˆN = n 1 p 0 is fine BUT: p 0 is assumed to be known usually an estimate of p 0 is required
5 Introduction Formulation of the Problem as Frequencies of Frequencies a common setting for estimating p 0 is the Frequencies of Frequencies: the identifying mechanism provides a count Y of repeated identifications (w.r.t. to a reference period), but zero counts are not observed leading to frequencies f 1, f 2,..., f m where m is the largest observed count and f j is the frequency of units with exactly j counts
6 Introduction Formulation of the Problem as Frequencies of Frequencies we have: f 0 is not observed Recall that N = f 0 + n = f 0 + f 1 + f f m, so that ˆf 0 leads to ˆN
7 Introduction Introduction Some Applications Ways to a Realistic Solution Generalized Chao Bounds and a Monotonicity Property Zelterman Estimation Applications Outlook Zelterman as MLE Zelterman can be extended to case data Zelterman extended to higher counts Zelterman extended: truncation or censoring? Some simulation results
8 Some Applications McKendrick s Data on Cholera in India McKendrick (1926) had the following frequency of households with j cases of Cholera in an Indian village: f 0 f 1 f 2 f 3 f 4 n How many households f 0 are affected by the epidemic, but have no cases?
9 Some Applications Mathews s Data on Estimating the Dystrophin Density in the Human Muscle Cullen et al. (1990) attempted to locate dystrophin, a gene product of possible importance in muscular dystrophies, within the muscle fibres of biopsy specimens taken from normal patients Units (epitops) of Dystrophin cannot be detected by the electron microscope until they have been labelled by a suitable electron-dense substance; technique used gold-conjugated antibodies which adhere to the dystrophin
10 Some Applications Mathews s Data on Estimating the Dystrophin Density in the Human Muscle not all units are labelled and it is important to account for all labelled and unlabelled units to achieve an unbiased estimate of the dystrophin density more than one anti-body molecule may attach to a dystrophin unit; observed then is a count variable Y counting the number of antibody molecules on each dystrophin unit Y = 0 means that unit is unlabelled and not observed
11 Some Applications Mathews s Data on Estimating the Dystrophin Density in the Human Muscle the frequency distribution of the antibody count attached to a dystrophin unit: f 0 f 1 f 2 f 3 f 4 f 5 n
12 Some Applications Del Rio Vilas s Data on Estimating Hidden Scrapie in Great Britain 2005 sheep is kept in holdings in great Britain (and elsewhere) the occurrence of scrapie is monitored in the Compulsory Scrapie Flocks Scheme (CSFS) summarizing abbatoir survey, stock survey and the statutory reporting of clinical cases CSFS established since 2004 the frequency distribution of the scrapie count within each holding for the year 2005: f 0 f 1 f 2 f 3 f 4 f 5 f 6 f 7 f 8 n
13 Some Applications Hser s Data on Estimating Hidden Intravenous Drug Users in Los Angeles 1989 intravenous drug users in L.A. county were entered into the California Drug Abuse Data System (CAL-DADS) the data below refer to the frequency distribution of the episode count per drug user in 1989 the frequency distribution of the episode count per drug user for the year 1989: f 0 f 1 f 2 f 3 f 4 f 5 f 6-11,982 3,893 1,959 1, f 7 f 8 f 9 f 10 f 11 f 12 n ,198
14 Some Applications Drakos Data on Estimating Hidden Transnational Terrorist Activity data on terrorism are provided by various databases including RAND terrorism chronology, the terrorism indictment and DFI International research on terrorist organizations on 153 countries and the period terrorism is violence or the threat of violence, calculated to create an atmosphere of fear and alarm
15 Some Applications Drakos Data on Estimating Hidden Transnational Terrorist Activity the frequency distribution (Drakos 2007) of the count of transnational terrorist activity Y it in country i and year t: f 0 f 1 f 2 f 3 f 4 f 5 f f 7 f 8 f 9 f f 136 n similar to the McKendrick data there is an f 0 = 1, 357 however it is thought that there is a hidden number of terrorist activities which is of interest to be estimated estimate f 0 the frequency of periods and countries with unrecorded terrorist activites
16 Some Applications Introduction Some Applications Ways to a Realistic Solution Generalized Chao Bounds and a Monotonicity Property Zelterman Estimation Applications Outlook Zelterman as MLE Zelterman can be extended to case data Zelterman extended to higher counts Zelterman extended: truncation or censoring? Some simulation results
17 Ways to a Realistic Solution Formulation of the Problem and the Idea for its Solution Suppose we can find some model for the count probabilities p j = p j (λ) then estimate λ by some method (truncated likelihood) and then use the model for p 0 : 1 ˆN = 1 p 0 (ˆλ)
18 Ways to a Realistic Solution Formulation of the Problem and the Idea for its Solution Only to illustrate: Poisson model for the count probabilities then estimate λ and arrive at: p j = p j (λ) = exp( λ)λ j /j! ˆN = n 1 ˆp 0 = n 1 exp( ˆλ)
19 Ways to a Realistic Solution Formulation of the Problem and the Idea for its Solution However: using a simple Poisson model for the count probabilities is not appropriate, since every unit is different p j = p j (λ) = exp( λ)λ j /j! there is population heterogeneity so that more realistic p j = p j (λ) = 0 exp( t)t j /j!λ(t)dt where λ(t) stands for the heterogeneity distribution of the Poisson parameter
20 Ways to a Realistic Solution Formulation of the Problem and the Idea for its Solution instead of providing an estimate ˆλ(t) by means of nonparametric mixture models (Böhning and Schön 2005, JRSSC) interest is on two alternatives: 1. lower bound approach by Chao (1987, 1989, Biometrics) monotonicity of the ratios of consecutive mixed Poisson probabilities diagnostic device for presence of a mixed Poisson 2. robust approach of Zelterman (1988, JSPI) likelihood framework for the Zelterman estimate variance via Fisher information and covariate modelling via logistic regression (Böhning and Del Rio Vilas 2008, JABES) bias investigation
21 Ways to a Realistic Solution Introduction Some Applications Ways to a Realistic Solution Generalized Chao Bounds and a Monotonicity Property Zelterman Estimation Applications Outlook Zelterman as MLE Zelterman can be extended to case data Zelterman extended to higher counts Zelterman extended: truncation or censoring? Some simulation results
22 Generalized Chao Bounds and a Monotonicity Property Chao s Lower Bound Estimate Poisson mixture for j = 0, 1, 2,... p j = 0 exp( t)t j /j!λ(t)dt with unknown λ(t) for t > 0. Then, by the Cauchy-Schwartz inequality: E(XY ) 2 E(X 2 )E(Y 2 ) where X = exp( t) and Y = exp( t)t, and expected values are w.r.t. λ(t): ( 2 exp( t)tλ(t)dt) 0 0 exp( t)λ(t)dt 0 exp( t)t 2 λ(t)dt
23 Generalized Chao Bounds and a Monotonicity Property Chao s Lower Bound Estimate ( 2 exp( t)tλ(t)dt) 0 0 exp( t)λ(t)dt 2 p 2 1 p 0 2p 2 p2 1 2p 2 p 0 0 exp( t) t2 2 λ(t)dt which leads to Chao s lower bound estimate (truely nonparametric) ˆf 0 = f 2 1 2f 2
24 Generalized Chao Bounds and a Monotonicity Property A Monotonicity Property Cauchy-Schwartz more generally applicable use X = exp( t)t j 1 and Y = exp( t)t j+1 ): E(X 2 )E(Y 2 ) = ( E(XY ) 2 = 0 0 ) 2 exp( t)t j λ(t)dt exp( t)t j 1 λ(t)dt 0 exp( t)t j+1 λ(t)dt (j!p j ) 2 (j 1)!p j 1 (j + 1)!p j+1 p j j (j + 1) p j+1 p j 1 says: ratios of consecutive mixed Poissons are monotone p j
25 Generalized Chao Bounds and a Monotonicity Property Application of the Monotonicity Property: Ratio Plot p j plot j p j 1 against j for j = 1, 2,..., m 1 replace p j by observed frequency f j so that: f j plot j f j 1 against j for j = 1, 2,..., m 1 monotonicity indicative for a mixture (heterogeneity) conceptually related to the Poisson plot (Hoaglin 1980 American Statistician, Gart 1970, Rao 1971) difference: Poisson plot looks for a horizontal line, the ratio plot looks for a monotone increasing pattern
26 Generalized Chao Bounds and a Monotonicity Property 5 Ratio Plot for Matthews-Data 4 ratio j 3 4
27 Generalized Chao Bounds and a Monotonicity Property 14 Ratio Plot for Scrapie-Data ratio j 5 6 7
28 Generalized Chao Bounds and a Monotonicity Property Ratio Plot for Drug User Data ratio j
29 Generalized Chao Bounds and a Monotonicity Property 12 Ratio Plot for Terrorist Activity Data 10 8 ratio j
30 Generalized Chao Bounds and a Monotonicity Property Conclusions from the Ratio Plot frequently, we find in count data sets evidence for heterogeneity in form of a mixture concept applicable for both: zero-truncated and untruncated count data (normalizing constant cancels out!)
31 Generalized Chao Bounds and a Monotonicity Property Introduction Some Applications Ways to a Realistic Solution Generalized Chao Bounds and a Monotonicity Property Zelterman Estimation Applications Outlook Zelterman as MLE Zelterman can be extended to case data Zelterman extended to higher counts Zelterman extended: truncation or censoring? Some simulation results
32 Zelterman Estimation The Idea of Zelterman (1988) he noted that leading to the proposal and in particular for j = 1 λ = λj+1 λ j = (j + 1) λj+1 /(j + 1)! λ j /j! Po(j + 1; λ) λ = (j + 1) Po(j; λ) ˆλ j = (j + 1) f j+1 f j ˆλ = ˆλ 1 = 2 f 2 f 1
33 Zelterman Estimation ˆλ = 2 f 2 f1 is robust in the sense that it is not affected by any changes in counts larger than 2 count distribution need only to behave like a Poisson for counts of 1 or 2
34 Zelterman Estimation frequently: evidence for a mixture (Scrapie data) frequency Distribution Observed Frequency Fitted Poisson Mixture Fitted Simple Poisson count
35 Zelterman Estimation frequently: evidence for a 2-component mixture model Example McKendrick Matthews Scrapie Drug Use L.A. terrorist activity Non-parametric mixture model homogeneity 2-component 2-component 3-component 6-component
36 Zelterman Estimation Population Size Estimates for the Examples Example n simple MLE Chao Zelterman McKendrick Matthews Scrapie Drug Use L.A. 20,198 26,425 38,637 42,268 terrorist activity ,144 1,429
37 Zelterman Estimation Bias for Zelterman in a 2-component mixture model assume that for j = 0, 1, 2,... p j = (1 p)po(j; λ) + ppo(j; µ) bias of is determined by bias in ˆp 0 ˆN = n 1 ˆp 0
38 Zelterman Estimation Zelterman and non-parametric mixture Example n Chao Zelterman NPMLE of mixture McKendrick (1) Matthews (2) Scrapie (2) Drug Use L.A. 20,198 38,637 42,268 39,173 (2) 56,836 (3) terrorist activity 785 1,144 1,429 -
39 Zelterman Estimation Bias for Zelterman in a 2-component mixture model for Zelterman: ˆp 0 = exp( ˆλ) = exp( 2 f 2 f 1 ) replacing frequencies by expected values bias of ˆp 0 E(ˆp 0 ) exp( 2 p 2 p 1 ) E(ˆp 0 ) p 0 exp( 2 p 2 p 1 ) [(1 p)e λ + pe µ ]
40 Zelterman Estimation Bias for Zelterman in a 2-component mixture model bias of ˆp 0 ( exp 2 p ) 2 [(1 p)e λ + pe µ ] p 1 = exp ( (1 p)λ2 e λ + pµ 2 e µ ) (1 p)λe λ + pµe µ [(1 p)e λ + pe µ ] µ e λ (1 p)e λ = pe λ small amount of contamination = small bias
41 Zelterman Estimation Bias for Zelterman in a 2-component mixture model bias of ˆp 0 for large µ pe λ > 0 Zelterman overestimates (upper bound) small amount of contamination = small bias
42 Zelterman Estimation For Comparison: Bias of simple MLE in a 2-component mixture model bias of ˆp 0 = exp( Ȳ ) 0 by Jensen s inequality e [(1 p)λ+pµ] [(1 p)e λ + pe µ ] so that simple homogeneity model underestimates for all mixture models and for large µ bias of ˆp 0 = (1 p)e λ small amount of contamination = large bias
43 Zelterman Estimation next graph shows: bias of Zelterman ˆp 0 = pe λ bias of simple MLE ˆp 0 = (1 p)e λ
44 Zelterman Estimation Asymptotic Bias Variable Zelterman simple MLE p
45 Zelterman Estimation next graphs show: exact bias of Zelterman ˆp 0 exp ( (1 p)λ2 e λ + pµ 2 e µ ) (1 p)λe λ + pµe µ [(1 p)e λ + pe µ ] exact bias of simple MLE ˆp 0 e [(1 p)λ+pµ] [(1 p)e λ + pe µ ]
46 Zelterman Estimation Zelterman Bias Bias under Homogenous Poisson μ λ μ λ p = 0.05
47 Zelterman Estimation Zelterman Bias Bias under Homogenous Poisson λ μ μ λ p = 0.5
48 Zelterman Estimation Introduction Some Applications Ways to a Realistic Solution Generalized Chao Bounds and a Monotonicity Property Zelterman Estimation Applications Outlook Zelterman as MLE Zelterman can be extended to case data Zelterman extended to higher counts Zelterman extended: truncation or censoring? Some simulation results
49 Applications Zelterman larger than Chao? ˆN Z = n 1 exp( ˆλ) = n + n exp(ˆλ) 1 n + n 1 + ˆλ ˆλ 2 1 = n + n ˆλ ˆλ 2 = n + n 2f 2 f ( 2f2 f 1 ) 2 = n + ( ) f 2 n + 1 = 2f ˆN C 2 yes, if ˆλ is small (Böhning and Brittain 2008) ( ) f 2 1 n 2f 2 f 1 + f 2
50 Applications Zelterman larger than Chao? f 2 f 1 n f 1 +f 2 Example n Chao Zelterman McKendrick Matthews Scrapie Drug Use L.A. 20,198 38,637 42, terrorist activity 785 1,144 1,
51 Outlook Zelterman as MLE Zelterman Estimation offers Flexibility Zelterman estimate truncates all counts different from 1 or 2: write 1 p = p 1 = exp( λ)λ exp( λ)λ + exp( λ)λ 2 /2 = λ/2 exp( λ)λ 2 /2 p = p 2 = exp( λ)λ + exp( λ)λ 2 /2 = λ/2 1 + λ/2 and consider associated binomial log-likelihood f 1 log(p 1 ) + f 2 log(p 2 ) = f 1 log(1 p) + f 2 log(p) which is maximized for ˆp = ˆp 2 = f 1 f 1 +f 2, or ˆλ = 2ˆp 2 1 ˆp 2 = 2f 2 f 1
52 Outlook Zelterman as MLE Zelterman Estimation offers Flexibility a likelihood framework offers generalizations: (correct) variance estimate of the Zelterman estimator (Fisher information) (Böhning 2008, Statistical Methodology) extension of the estimator for case data incorporation of covariates (binomial logistic regression with log-link function to the Poisson parameter) (Böhning and van der Heijden 2008 Ann. Appl. Statist.) efficiency
53 Outlook Zelterman can be extended to case data Zelterman Estimation: Extension to Case Data Table: Illustration of Case Data with Individual Recapture Counts Unit i Count y i δ i Sex i Age i Male Male Female Male Female Female
54 Outlook Zelterman can be extended to case data Zelterman Estimation offers Flexibility Binomial likelihood for grouped data f 1 log(1 p) + f 2 log(p) becomes for case data (1 δ i ) log(1 p) + δ i log(p) i which becomes with covariate information on case i a logistic regression model p i = exp(βt x i ) 1 + exp(β T x i )
55 Outlook Zelterman can be extended to case data Zelterman Estimation offers Flexibility covariate information on case i p i = exp(βt x i ) 1 + exp(β T x i ) compare with parameterization in capture probability λ it follows that p i = λ i/2 1 + λ i /2 λ i = 2 exp(β T x i ) and the generalization of the Horvitz-Thompson estimator is n i=1 1 1 exp( 2e βt x i )
56 Outlook Zelterman extended to higher counts Generalizing the Idea of Zelterman: Improving Efficiency not only but also Po(j + 1; λ) λ = (j + 1) Po(j; λ) 1 {( }}{ j ) i=1 λ = λ λi j = i=1 λi j i=1 (i + 1) λi+1 (i+1)! j i=1 λi /i! = j i=1 (i + 1)Po(i + 1; λ) j i=1 Po(i; λ)
57 Outlook Zelterman extended to higher counts λ = leads to the proposal j i=1 (i + 1)Po(i + 1; λ) j i=1 Po(i; λ) ˆλ j = j i=1 (i + 1)f i+1 j i=1 f i and in particular for j = 1, j = 2, j = 3, j = 4 ˆλ = ˆλ 1 = 2 f 2 f 1, ˆλ 2 = 2f 2 + 3f 3 f 1 + f 2, ˆλ 3 = 2f 2 + 3f 3 + 4f 4 f 1 + f 2 + f 3 ˆλ 4 = 2f 2 + 3f 3 + 4f 4 + 5f 5 f 1 + f 2 + f 3 + f 4 n Z i = 1 exp( ˆλ i )
58 Outlook Zelterman extended: truncation or censoring? Generalizing the idea of Zelterman: truncation or censoring? disadvantage of conventional Zelterman: uses only f 1 and f 2 idea of truncation: ignore all counts different from 1 and 2 idea of censoring: use marginal likelihood for all counts of 2 and larger p 1 = P(Y = 1) = p 2+ = P(Y > 1) = exp( λ) 1 exp( λ) λ exp( λ) 1 exp( λ) [λ2 /2! + λ 3 /3! +...] = 1 exp( λ) 1 exp( λ) λ
59 Outlook Zelterman extended: truncation or censoring? Generalizing the idea of Zelterman: truncation or censoring? leads to the binomial likelihood f 1 log(p 1 ) + f 2+ log(p 2+ ) since we have leads to f 1 /n = ˆp 1 = f 1 /n exp( λ) 1 exp( λ) λ = 1 exp(λ) 1 λ 1 λ + λ 2 /2 λ ˆλ C = 2(n f 1) f 1
60 Outlook Zelterman extended: truncation or censoring? Bias reduced estimator ˆλ C = 2(n f 1) f 1 = 2f 2+2f f 1 is biased since equate E(ˆλ C ) 2λ2 /2 + 2λ 3 / λ ˆλ C = λ + λ 2 /3 λ + λ 2 /3 and solve for λ to provide ˆλ C bias = ˆλ C denote Z C bias = n 1 exp( λ C bias )
61 Outlook Some simulation results Simulation Experiment Goal: Compare Z j = n 1 exp( ˆλ j ) N C = n + f 1 2 n 2f 2 as well as Z C = corrected version and Chao s estimator 1 exp( ˆλ C ) and its bias count data: f j arise from 0.5Po(j; 0.5) + 0.5Po(j; µ) for µ = 1, 2,..., 7 and j = 0, 1, 2,... population size: N = f 0 + f = 100 f 0 is truncated N estimated using Z 1, Z 2, Z 3, Z C, Z C Bias and N C
62 Outlook Some simulation results Bias Estimator Chao Z_C Z_C_Bias Z1 Z2 Z3 Mean lambda 4 5
63 Outlook Some simulation results Standard Deviation Estimator Chao Z_C Z_C_Bias Z1 Z2 Z3 SD lambda 4 5
64 Outlook Some simulation results RMSE Estimator Chao Z_C Z_C_Bias Z1 Z2 Z3 RMSE lambda 4 5
65 Outlook Some simulation results Main Conclusions Generalized Chao Bounds lead to a diagnostic device for presence of heterogeneity Zelterman Estimation appears appropriate since it offers flexible likelihood concept Further generalizations of Zelterman estimation (covariate modelling and increasing efficiency by including higher counts) easily possible
Estimation MLE-Pandemic data MLE-Financial crisis data Evaluating estimators. Estimation. September 24, STAT 151 Class 6 Slide 1
Estimation September 24, 2018 STAT 151 Class 6 Slide 1 Pandemic data Treatment outcome, X, from n = 100 patients in a pandemic: 1 = recovered and 0 = not recovered 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 0 1 1 1
More informationCapture recapture estimation based upon the geometric distribution allowing for heterogeneity
Capture recapture estimation based upon the geometric distribution allowing for heterogeneity Sa-aat Niwitpong, Dankmar Böhning, Peter G. M. van der Heijden & Heinz Holling Metrika International Journal
More informationMixture Models for Capture- Recapture Data
Mixture Models for Capture- Recapture Data Dankmar Böhning Invited Lecture at Mixture Models between Theory and Applications Rome, September 13, 2002 How many cases n in a population? Registry identifies
More informationOne-Source Capture-Recapture: Models, applications and software
One-Source Capture-Recapture: Models, applications and software Maarten Cruyff, Guus Cruts, Peter G.M. van der Heijden, * Utrecht University Trimbos ISI 2011 1 Outline 1. One-source data 2. Models and
More informationCapture Recapture Methods for Estimating the Size of a Population: Dealing with Variable Capture Probabilities
18 Capture Recapture Methods for Estimating the Size of a Population: Dealing with Variable Capture Probabilities Louis-Paul Rivest and Sophie Baillargeon Université Laval, Québec, QC 18.1 Estimating Abundance
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationOther Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model
Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);
More informationNonparametric estimation of the number of zeros in truncated count distributions
Nonparametric estimation of the number of zeros in truncated count distributions Célestin C. KOKONENDJI University of Franche-Comté, France Laboratoire de Mathématiques de Besançon - UMR 6623 CNRS-UFC
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationFully general Chao and Zelterman estimators with application to a whale shark population
Appl. Statist. (2018) 67, Part 1, pp. 217 229 Fully general Chao and Zelterman estimators with application to a whale shark population Alessio Farcomeni Sapienza University of Rome, Italy [Received May
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationYou know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?
You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David
More informationLecture 3. Truncation, length-bias and prevalence sampling
Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in
More informationTwo-stage Adaptive Randomization for Delayed Response in Clinical Trials
Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Guosheng Yin Department of Statistics and Actuarial Science The University of Hong Kong Joint work with J. Xu PSI and RSS Journal
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationt x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.
Mathematical Statistics: Homewor problems General guideline. While woring outside the classroom, use any help you want, including people, computer algebra systems, Internet, and solution manuals, but mae
More informationLecture 5: Poisson and logistic regression
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction
More informationChapter 3: Maximum Likelihood Theory
Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood
More informationCapture Recapture to Estimate Criminal Populations. Synonyms
Capture Recapture to Estimate Criminal Populations 267 C Brower AM, Carroll L (2007) Spatial and temporal aspects of alcohol-related crime in a college town. J Am Coll Health 55:267 75 Cohen L, Felson
More informationApproximation of Survival Function by Taylor Series for General Partly Interval Censored Data
Malaysian Journal of Mathematical Sciences 11(3): 33 315 (217) MALAYSIAN JOURNAL OF MATHEMATICAL SCIENCES Journal homepage: http://einspem.upm.edu.my/journal Approximation of Survival Function by Taylor
More informationA class of latent marginal models for capture-recapture data with continuous covariates
A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationWEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION
WEIGHTED LIKELIHOOD NEGATIVE BINOMIAL REGRESSION Michael Amiguet 1, Alfio Marazzi 1, Victor Yohai 2 1 - University of Lausanne, Institute for Social and Preventive Medicine, Lausanne, Switzerland 2 - University
More informationSTATISTICS SYLLABUS UNIT I
STATISTICS SYLLABUS UNIT I (Probability Theory) Definition Classical and axiomatic approaches.laws of total and compound probability, conditional probability, Bayes Theorem. Random variable and its distribution
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationarxiv: v1 [stat.ap] 27 Jul 2011
The Annals of Applied Statistics 2011, Vol. 5, No. 2B, 1512 1533 DOI: 10.1214/10-AOAS436 c Institute of Mathematical Statistics, 2011 POPULATION SIZE ESTIMATION BASED UPON RATIOS OF RECAPTURE PROBABILITIES
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 248 Application of Time-to-Event Methods in the Assessment of Safety in Clinical Trials Kelly
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationEconomics 620, Lecture 9: Asymptotics III: Maximum Likelihood Estimation
Economics 620, Lecture 9: Asymptotics III: Maximum Likelihood Estimation Nicholas M. Kiefer Cornell University Professor N. M. Kiefer (Cornell University) Lecture 9: Asymptotics III(MLE) 1 / 20 Jensen
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationLecture 4: Generalized Linear Mixed Models
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 An example with one random effect An example with two nested random effects
More informationPropensity Score Weighting with Multilevel Data
Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative
More informationEmpirical Likelihood Methods for Sample Survey Data: An Overview
AUSTRIAN JOURNAL OF STATISTICS Volume 35 (2006), Number 2&3, 191 196 Empirical Likelihood Methods for Sample Survey Data: An Overview J. N. K. Rao Carleton University, Ottawa, Canada Abstract: The use
More informationMeasurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007
Measurement Error and Linear Regression of Astronomical Data Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Classical Regression Model Collect n data points, denote i th pair as (η
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationExponential Families
Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,
More informationLecture 5 Models and methods for recurrent event data
Lecture 5 Models and methods for recurrent event data Recurrent and multiple events are commonly encountered in longitudinal studies. In this chapter we consider ordered recurrent and multiple events.
More informationIndirect Sampling in Case of Asymmetrical Link Structures
Indirect Sampling in Case of Asymmetrical Link Structures Torsten Harms Abstract Estimation in case of indirect sampling as introduced by Huang (1984) and Ernst (1989) and developed by Lavalle (1995) Deville
More informationImprecise Prior for Imprecise Inference on Poisson Sampling Model
Imprecise Prior for Imprecise Inference on Poisson Sampling Model A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Doctor
More informationPackage SPECIES. R topics documented: April 23, Type Package. Title Statistical package for species richness estimation. Version 1.
Package SPECIES April 23, 2011 Type Package Title Statistical package for species richness estimation Version 1.0 Date 2010-01-24 Author Ji-Ping Wang, Maintainer Ji-Ping Wang
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationLecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL
Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL The Cox PH model: λ(t Z) = λ 0 (t) exp(β Z). How do we estimate the survival probability, S z (t) = S(t Z) = P (T > t Z), for an individual with covariates
More informationEstimating the number of opiate users in Rotterdam using statistical models for incomplete count data *
Estimating the number of opiate users in Rotterdam using statistical models for incomplete count data * December 1997 Filip Smit 1, Jaap Toet 2, Peter van der Heijden 3 1 Methods Section, Trimbos Institute,
More informationEstimating direct effects in cohort and case-control studies
Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research
More informationCRISP: Capture-Recapture Interactive Simulation Package
CRISP: Capture-Recapture Interactive Simulation Package George Volichenko Carnegie Mellon University Pittsburgh, PA gvoliche@andrew.cmu.edu December 17, 2012 Contents 1 Executive Summary 1 2 Introduction
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationTargeted Maximum Likelihood Estimation in Safety Analysis
Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35
More informationEstimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk
Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationA hidden semi-markov model for the occurrences of water pipe bursts
A hidden semi-markov model for the occurrences of water pipe bursts T. Economou 1, T.C. Bailey 1 and Z. Kapelan 1 1 School of Engineering, Computer Science and Mathematics, University of Exeter, Harrison
More informationConstrained estimation for binary and survival data
Constrained estimation for binary and survival data Jeremy M. G. Taylor Yong Seok Park John D. Kalbfleisch Biostatistics, University of Michigan May, 2010 () Constrained estimation May, 2010 1 / 43 Outline
More informationFrailty Models and Copulas: Similarities and Differences
Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
More informationMS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari
MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind
More informationA NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL
Discussiones Mathematicae Probability and Statistics 36 206 43 5 doi:0.75/dmps.80 A NOTE ON ROBUST ESTIMATION IN LOGISTIC REGRESSION MODEL Tadeusz Bednarski Wroclaw University e-mail: t.bednarski@prawo.uni.wroc.pl
More informationMulti-level Models: Idea
Review of 140.656 Review Introduction to multi-level models The two-stage normal-normal model Two-stage linear models with random effects Three-stage linear models Two-stage logistic regression with random
More informationPairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion
Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationVariations. ECE 6540, Lecture 10 Maximum Likelihood Estimation
Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More information2 Inference for Multinomial Distribution
Markov Chain Monte Carlo Methods Part III: Statistical Concepts By K.B.Athreya, Mohan Delampady and T.Krishnan 1 Introduction In parts I and II of this series it was shown how Markov chain Monte Carlo
More informationLogistic regression: Miscellaneous topics
Logistic regression: Miscellaneous topics April 11 Introduction We have covered two approaches to inference for GLMs: the Wald approach and the likelihood ratio approach I claimed that the likelihood ratio
More informationLecture 2: Poisson and logistic regression
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 11-12 December 2014 introduction to Poisson regression application to the BELCAP study introduction
More informationTheory of Maximum Likelihood Estimation. Konstantin Kashin
Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical
More informationTo appear in The American Statistician vol. 61 (2007) pp
How Can the Score Test Be Inconsistent? David A Freedman ABSTRACT: The score test can be inconsistent because at the MLE under the null hypothesis the observed information matrix generates negative variance
More informationGraduate Econometrics I: Maximum Likelihood I
Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationChapter 6 Expectation and Conditional Expectation. Lectures Definition 6.1. Two random variables defined on a probability space are said to be
Chapter 6 Expectation and Conditional Expectation Lectures 24-30 In this chapter, we introduce expected value or the mean of a random variable. First we define expectation for discrete random variables
More informationCensoring mechanisms
Censoring mechanisms Patrick Breheny September 3 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Fixed vs. random censoring In the previous lecture, we derived the contribution to the likelihood
More informationReview of Maximum Likelihood Estimators
Libby MacKinnon CSE 527 notes Lecture 7, October 7, 2007 MLE and EM Review of Maximum Likelihood Estimators MLE is one of many approaches to parameter estimation. The likelihood of independent observations
More informationExercises and Answers to Chapter 1
Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean
More informationStatistical Methods for Alzheimer s Disease Studies
Statistical Methods for Alzheimer s Disease Studies Rebecca A. Betensky, Ph.D. Department of Biostatistics, Harvard T.H. Chan School of Public Health July 19, 2016 1/37 OUTLINE 1 Statistical collaborations
More informationProbability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur
Probability and Statistics Prof. Dr. Somesh Kumar Department of Mathematics Indian Institute of Technology, Kharagpur Module No. #01 Lecture No. #27 Estimation-I Today, I will introduce the problem of
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationUNIVERSITY OF CALIFORNIA, SAN DIEGO
UNIVERSITY OF CALIFORNIA, SAN DIEGO Estimation of the primary hazard ratio in the presence of a secondary covariate with non-proportional hazards An undergraduate honors thesis submitted to the Department
More informationCompare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method
Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School
More informationAnalysis of competing risks data and simulation of data following predened subdistribution hazards
Analysis of competing risks data and simulation of data following predened subdistribution hazards Bernhard Haller Institut für Medizinische Statistik und Epidemiologie Technische Universität München 27.05.2013
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationContents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)
Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture
More information45 6 = ( )/6! = 20 6 = selected subsets: 6 = and 10
Homework 1. Selected Solutions, 9/12/03. Ch. 2, # 34, p. 74. The most natural sample space S is the set of unordered -tuples, i.e., the set of -element subsets of the list of 45 workers (labelled 1,...,
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationGENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University
GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist
More informationBTRY 4090: Spring 2009 Theory of Statistics
BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)
More informationApplication of Latent Class with Random Effects Models to Longitudinal Data. Ken Beath Macquarie University
Application of Latent Class with Random Effects Models to Longitudinal Data Ken Beath Macquarie University Introduction Latent trajectory is a method of classifying subjects based on longitudinal data
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview
Introduction to Empirical Processes and Semiparametric Inference Lecture 01: Introduction and Overview Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations
More informationThe Accelerated Failure Time Model Under Biased. Sampling
The Accelerated Failure Time Model Under Biased Sampling Micha Mandel and Ya akov Ritov Department of Statistics, The Hebrew University of Jerusalem, Israel July 13, 2009 Abstract Chen (2009, Biometrics)
More informationMultivariate Survival Analysis
Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in
More informationLikelihood Construction, Inference for Parametric Survival Distributions
Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make
More informationβ j = coefficient of x j in the model; β = ( β1, β2,
Regression Modeling of Survival Time Data Why regression models? Groups similar except for the treatment under study use the nonparametric methods discussed earlier. Groups differ in variables (covariates)
More informationON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION
ON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION ZHENLINYANGandRONNIET.C.LEE Department of Statistics and Applied Probability, National University of Singapore, 3 Science Drive 2, Singapore
More informationFederated analyses. technical, statistical and human challenges
Federated analyses technical, statistical and human challenges Bénédicte Delcoigne, Statistician, PhD Department of Medicine (Solna), Unit of Clinical Epidemiology, Karolinska Institutet What is it? When
More informationLecture 4 Discriminant Analysis, k-nearest Neighbors
Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se
More informationSTATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS
STATISTICAL INFERENCE FOR SURVEY DATA ANALYSIS David A Binder and Georgia R Roberts Methodology Branch, Statistics Canada, Ottawa, ON, Canada K1A 0T6 KEY WORDS: Design-based properties, Informative sampling,
More informationNon-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data Simulation study and application to perceived safety
Non-parametric bootstrap and small area estimation to mitigate bias in crowdsourced data Simulation study and application to perceived safety David Buil-Gil, Reka Solymosi Centre for Criminology and Criminal
More information