GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University
1 GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR
Raymond J. Carroll: Texas A&M University
Naisyin Wang: Texas A&M University
Xihong Lin: University of Michigan
Roberto Gutierrez: Southern Methodist University
Advertisement: Measurement Error in Nonlinear Models, R. Carroll, D. Ruppert, L. Stefanski, Chapman & Hall, 1995.
SAS and Splus programs for common GLIMs are at
2 OUTLINE
The overheads for this talk are available from the web if you have Adobe Acrobat 3.0 or higher.
The main talk is at: newmexico.losalamos.talks.directory/mixedmodels/nantuc01.pdf
The plots are called:
- nantucket.plot01.ps
- nantucket.plot02.ps
- nantucket.plot03.ps
- nantucket.framingham.plot01.ps
- nantucket.framingham.plot02.ps
- nantucket.simex.plot01.ps
- nantucket.simex.plot02.ps
- nantucket.simex.plot03.ps
- nantucket.simex.plot04.ps
3 OUTLINE
- Generalized Linear Mixed Models (GLMM)
- Generalized Linear Mixed Measurement Error Models (GLMMeM) are GLMMs with the wrong mean and variance structure
- Bias analysis:
  - Ordinary regression model
  - Globally independent covariates
  - Correlated covariates within a cluster
  - Surprising effects of cluster size on biases
- Functional vs. structural models
  - Functional: SIMEX and regression calibration
  - Structural: MLE
- Functional tests for variance components using SIMEX
- Example
4 THE DATA
In what follows, there will be $i = 1, \ldots, m$ clusters. Within each cluster, there are $n_i$ observations. The data are structured:
- $Y = (Y_1, \ldots, Y_n)$ = responses within a cluster
- $X = (X_1, \ldots, X_n)$ = error-prone predictors within a cluster
- $Z = (Z_1, \ldots, Z_n)$ = exactly measured predictors within a cluster
- $W = (W_1, \ldots, W_n)$ = measured version of $X$ within a cluster
When I talk about asymptotics, it will be as the number of clusters gets large, for a fixed number of observations within a cluster.
5 THE MODELS
We consider a generalized linear mixed model (GLMM) with the linear part within a cluster given by
$g(\mu) = \beta_0 + X\beta_x + Z\beta_z + Cb$, $b \sim \mathrm{Normal}\{0, D(\theta)\}$.
A GLMMeM is a GLMM with an unobservable fixed effect:
$W = X + U$, $\mathrm{cov}(U) = \Sigma_{uu}$, $U \sim \mathrm{Normal}(0, \Sigma_{uu})$, $U$ independent of $(Z, Y, C, b, X)$.
- The estimation and inference methods are not restricted to additive errors.
- The bias analysis is more detailed and uses the additive structure.
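To make the model concrete, here is a minimal simulation sketch of a GLMMeM, assuming a logistic link, a single random intercept per cluster, no Z, and independent normal X's. The function name `simulate_glmmem` and all parameter values are illustrative assumptions, not taken from the talk.

```python
import math
import random

def simulate_glmmem(m=200, n=4, beta0=-1.0, beta_x=1.0, theta=1.0,
                    sigma_x=1.0, sigma_u=0.5, seed=0):
    """Simulate clustered binary data with additive measurement error W = X + U.

    Hypothetical helper: returns a list of m clusters, each a list of n
    (y, x, w) tuples. In practice only (y, w) would be observed.
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(m):
        b = rng.gauss(0.0, math.sqrt(theta))   # random intercept with variance theta
        rows = []
        for _ in range(n):
            x = rng.gauss(0.0, sigma_x)        # true, unobserved covariate
            u = rng.gauss(0.0, sigma_u)        # measurement error, independent of the rest
            w = x + u                          # error-prone measurement of x
            eta = beta0 + beta_x * x + b       # linear predictor g(mu)
            p = 1.0 / (1.0 + math.exp(-eta))   # inverse logit link
            y = 1 if rng.random() < p else 0
            rows.append((y, x, w))
        clusters.append(rows)
    return clusters

data = simulate_glmmem()
```

A naive analysis would fit the logistic random-intercept model to (y, w) as if w were x; the later slides describe what that does to the estimates.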
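6 THE MODELS
Suppose that $[X \mid Z]$ follows a normal linear model. Then, by the usual calculations,
$X = \Gamma_0 + \Gamma_z Z + \Gamma_w W + e$, $e \sim \mathrm{Normal}(0, \Sigma_{x|zw})$.
The original model is
$g(\mu) = \beta_0 + X\beta_x + Z\beta_z + Cb$.
The observed data also follow a GLMM, but with a more complex mean and variance structure (the constant $\Gamma_0\beta_x$ is absorbed into the intercept):
$g(\mu) = \beta_0 + (\Gamma_w W)\beta_x + (\Gamma_z Z\beta_x + Z\beta_z) + C_* b_*$, $C_* = (C, I)$, $b_* = \begin{pmatrix} b \\ e\beta_x \end{pmatrix}$.
Ignoring measurement errors means you may misspecify the structure of both the fixed and the random effects.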
7 EXAMPLES: Ordinary GLIMs
In the case of a single observation per cluster and no variance component, we have the usual GLIM.
Loosely speaking, for estimating the slope in X, $\beta_x$, the effects of ignoring measurement error are the same in linear, logistic and Poisson regression, namely attenuation.
Effectively, with no Z, one estimates
$\frac{\mathrm{var}(X)}{\mathrm{var}(X) + \mathrm{var}(\text{measurement error})}\,\beta_x$.
In GLMM, we have four additional factors:
- the variance component, which has to be estimated
- cluster size (surprisingly important)
- covariance structure of X: are they correlated within a cluster?
- covariance structure of the errors U: are they correlated within a cluster?
We will address the first three points.
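The attenuation factor can be checked with a quick simulation. This is a sketch for ordinary linear regression only, with made-up parameter values; `ols_slope` is a hypothetical helper, not part of any package mentioned in the talk.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(1)
n_obs = 20000
beta_x = 2.0
sigma_x, sigma_u = 1.0, 1.0

x = [rng.gauss(0.0, sigma_x) for _ in range(n_obs)]             # true covariate
w = [xi + rng.gauss(0.0, sigma_u) for xi in x]                  # error-prone version
y = [1.0 + beta_x * xi + rng.gauss(0.0, 0.5) for xi in x]       # response depends on x, not w

reliability = sigma_x ** 2 / (sigma_x ** 2 + sigma_u ** 2)      # var(X) / {var(X) + var(U)}
slope_true = ols_slope(x, y)    # regression on the true covariate
slope_naive = ols_slope(w, y)   # regression ignoring measurement error
```

With these settings the reliability is 0.5, so `slope_naive` should land near half of `beta_x`, while `slope_true` should be close to `beta_x` itself.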
8 EXAMPLES
Suppose that there is no Z (i.e., no exactly measured covariates), and no cluster effects in X and W:
$X \sim \mathrm{Normal}(0, \sigma_x^2 I)$, $U \sim \mathrm{Normal}(0, \sigma_u^2 I)$.
Note that the X's are independent even within a cluster, hence fully exchangeable, etc. We call this the homogeneous case.
Within a cluster, $b \sim \mathrm{Normal}(0, \theta)$, and the model is
$g(\mu) = \beta_0 + X\beta_x + b\,j$,
where $j$ is the vector of ones.
The observed data follow
$g(\mu) = \beta_0 + W(\lambda\beta_x) + b\,j + e\beta_x$, $e \sim \mathrm{Normal}(0, \lambda\sigma_u^2 I)$,
$\lambda = \text{reliability} = \sigma_x^2/(\sigma_x^2 + \sigma_u^2)$.
Note the change in the error structure.
9 EXAMPLES
In the homogeneous case, we obtain the following results if one ignores measurement error.
- In linear regression:
  - $\beta_x$: estimates $\lambda\beta_x$, $\lambda$ = reliability.
  - $\theta$: consistently estimated.
- In logistic regression (using the probit approximation):
  - $\beta_x$: estimates $\lambda\beta_x/\tau$, $\tau = (1 + \lambda\sigma_u^2\beta_x^2/2.9)^{1/2}$.
  - $\theta$: estimates $\theta/\tau^2$.
- In Poisson regression:
  - $\beta_x$: estimates $\lambda\beta_x$.
  - $\theta$: a detailed and nontrivial analysis is required. Ignoring error with clusters of size $n$ estimates
    $\theta + \log\left\{\frac{(n-1) + \exp(\beta_x^2\sigma_x^2)}{(n-1) + \exp(\lambda\beta_x^2\sigma_x^2)}\right\}$.
Note that in Poisson regression:
- Bias depends on the cluster size.
- $\theta$ is overestimated.
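The limits above are easy to tabulate. The sketch below assumes the expressions are read as $\tau = (1 + \lambda\sigma_u^2\beta_x^2/2.9)^{1/2}$ for the logistic case and $\theta + \log[\{(n-1) + \exp(\beta_x^2\sigma_x^2)\}/\{(n-1) + \exp(\lambda\beta_x^2\sigma_x^2)\}]$ for the Poisson case; the function names are hypothetical.

```python
import math

def reliability(sigma2_x, sigma2_u):
    """lambda = sigma_x^2 / (sigma_x^2 + sigma_u^2)."""
    return sigma2_x / (sigma2_x + sigma2_u)

def naive_limits_linear(beta_x, theta, sigma2_x, sigma2_u):
    """What the naive fit estimates in linear regression: (slope, theta)."""
    lam = reliability(sigma2_x, sigma2_u)
    return lam * beta_x, theta

def naive_limits_logistic(beta_x, theta, sigma2_x, sigma2_u):
    """Naive limits in logistic regression under the probit approximation."""
    lam = reliability(sigma2_x, sigma2_u)
    # Assumed reading: tau is the square root of 1 + lambda*sigma_u^2*beta_x^2/2.9
    tau = math.sqrt(1.0 + lam * sigma2_u * beta_x ** 2 / 2.9)
    return lam * beta_x / tau, theta / tau ** 2

def naive_theta_poisson(beta_x, theta, sigma2_x, sigma2_u, n):
    """Naive limit of the variance component in Poisson regression, cluster size n."""
    lam = reliability(sigma2_x, sigma2_u)
    num = (n - 1) + math.exp(beta_x ** 2 * sigma2_x)
    den = (n - 1) + math.exp(lam * beta_x ** 2 * sigma2_x)
    return theta + math.log(num / den)

# Illustrative values only: how the Poisson limit changes with cluster size.
poisson_by_n = {n: naive_theta_poisson(1.0, 0.5, 1.0, 1.0, n) for n in (1, 2, 4, 10)}
```

Under this reading the Poisson limit always exceeds $\theta$ when $\lambda < 1$, and the excess shrinks as the cluster size grows, which matches the two closing remarks on the slide.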
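10 EXAMPLES WITH CLUSTER CORRELATIONS
Suppose that there is no Z (i.e., no exactly measured covariates), and that there are cluster effects in X and W. Thus, the clusters have a random mean with variance $\sigma_{x\mu}^2$, and within each cluster the X's have variance $\sigma_x^2$:
$X \sim \mathrm{Normal}(0, \sigma_x^2 I + \sigma_{x\mu}^2 J)$.
Now we have that, for clusters of size $n$ with within-cluster mean $\bar{W}$,
$E(X \mid W) = \lambda W + (1-\lambda)(1-\lambda_*)\,\bar{W}\,j$,
$\mathrm{cov}(X \mid W) = \lambda\sigma_u^2 I + (\sigma_u^2/n)(1-\lambda)(1-\lambda_*)\,J$,
$\lambda_* = \frac{\sigma_x^2 + \sigma_u^2}{\sigma_x^2 + \sigma_u^2 + n\sigma_{x\mu}^2}$,
where $\lambda = \sigma_x^2/(\sigma_x^2 + \sigma_u^2)$ is the reliability, $j$ is the $n$-vector of ones and $J = jj^T$.
- Difficult structure.
- Note the dependence on $n$.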
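These closed forms can be checked numerically against the usual normal-theory identities. The sketch below assumes $\mathrm{cov}(X) = \sigma_x^2 I + \sigma_{x\mu}^2 J$ and $\mathrm{cov}(W) = (\sigma_x^2 + \sigma_u^2) I + \sigma_{x\mu}^2 J$, with $J$ the matrix of ones and `lam_star` standing for $(\sigma_x^2 + \sigma_u^2)/(\sigma_x^2 + \sigma_u^2 + n\sigma_{x\mu}^2)$. It verifies that $A = \lambda I + \{(1-\lambda)(1-\lambda_*)/n\} J$ satisfies $A\,\mathrm{cov}(W) = \mathrm{cov}(X, W)$, which is what $E(X \mid W) = AW$ requires. All numeric values and helper names are illustrative.

```python
def identity_plus_ones(n, a, c):
    """Return the n x n matrix a*I + c*J, where J is the matrix of ones."""
    return [[(a if i == k else 0.0) + c for k in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_abs_diff(A, B):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

n = 4
s2x, s2u, s2mu = 1.0, 0.5, 0.8                        # illustrative variances
lam = s2x / (s2x + s2u)                                # reliability
lam_star = (s2x + s2u) / (s2x + s2u + n * s2mu)        # cluster-size-dependent factor
d = (1.0 - lam) * (1.0 - lam_star) / n                 # coefficient of J in the mean

Sigma_x = identity_plus_ones(n, s2x, s2mu)             # cov(X) = cov(X, W)
Sigma_w = identity_plus_ones(n, s2x + s2u, s2mu)       # cov(W)
A = identity_plus_ones(n, lam, d)                      # claimed regression matrix

# Mean formula: A must satisfy A * cov(W) = cov(X, W).
err_mean = max_abs_diff(matmul(A, Sigma_w), Sigma_x)

# Covariance formula: cov(X | W) = cov(X) - A * cov(W, X).
A_Sigma_x = matmul(A, Sigma_x)
cond_cov = [[Sigma_x[i][j] - A_Sigma_x[i][j] for j in range(n)] for i in range(n)]
claimed_cov = identity_plus_ones(n, lam * s2u, s2u * d)
err_cov = max_abs_diff(cond_cov, claimed_cov)
```

Both `err_mean` and `err_cov` should be zero up to rounding, and rerunning with a different `n` shows how the J coefficient moves with cluster size.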
11 EXAMPLES WITH CLUSTER CORRELATIONS
- As $n \to \infty$, one can show that one gets the same results as in the homogeneous model, if one replaces $\theta$ there by $\theta + (1-\lambda)^2\beta_x^2\sigma_{x\mu}^2$.
  - The variance component is overestimated in the linear and Poisson cases, and typically also in the logistic model.
- For fixed cluster size $n$, a detailed analysis yields exact formulae to determine the bias in the linear case.
- There are no exact formulae in the probit or logistic cases.
  - We used both numerical and Monte Carlo integration.
- Surprisingly strong effects of cluster size, both for $\beta_x$ and for $\theta$.
12 FUNCTIONAL AND STRUCTURAL INFERENCE
Functional and structural approaches differ in what they assume about the X's.
- A structural approach would typically assume that, within a cluster, the X's are independent and normal, with the cluster means themselves being normally distributed.
  - Current attempts in the GLIM literature try to weaken the normality assumption, e.g., hierarchical models, mixtures of normals, ...
- A functional approach tries to make no assumptions about the distributions of the X's.
  - Model robustness is gained at the potential cost of a loss of efficiency.
  - New versions include NPMLE (nonparametric MLE).
We have implemented one functional and one structural method.
13 FUNCTIONAL METHODS
- Most common functional method: regression calibration. The idea is to replace X within a cluster by its best linear prediction based on (W, Z).
  - Works reasonably well for estimating fixed effects.
  - Does not work for estimating random effect variances.
- A computationally intensive alternative is called SIMEX.
  - Due to Cook & Stefanski (1994).
  - Theory and standard errors are described in the book.
  - The idea is to add increasing but known amounts of error onto W via SIMulation, fit the data using a method that ignores measurement error, trace the fits out, fit a function to the trace, and then EXtrapolate back to the no-error case.
Here is the method defined via graphs.
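As a rough illustration of regression calibration, here is a sketch for the simplest setting only: ordinary linear regression, no Z, no clusters, and a known error variance. Under those assumptions the best linear predictor of X given W is $\bar{W} + \hat{\lambda}(W - \bar{W})$ with $\hat{\lambda} = \{\widehat{\mathrm{var}}(W) - \sigma_u^2\}/\widehat{\mathrm{var}}(W)$. The helper names and parameter values are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def calibrate(w, sigma2_u):
    """Replace each W by its estimated best linear prediction of X.

    Assumes the measurement error variance sigma2_u is known.
    """
    n = len(w)
    mw = sum(w) / n
    var_w = sum((wi - mw) ** 2 for wi in w) / (n - 1)
    lam_hat = (var_w - sigma2_u) / var_w          # estimated reliability
    return [mw + lam_hat * (wi - mw) for wi in w]

rng = random.Random(7)
n_obs = 5000
beta_x, sigma_u = 2.0, 1.0
x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
y = [1.0 + beta_x * xi + rng.gauss(0.0, 0.5) for xi in x]

slope_naive = ols_slope(w, y)                          # attenuated
slope_rc = ols_slope(calibrate(w, sigma_u ** 2), y)    # regression calibration
```

In this linear case the calibrated slope is just the naive slope divided by the estimated reliability. The sketch says nothing about random effect variances, which the slide notes regression calibration does not handle.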
14 SIMEX
- To cut down on simulation variability, instead of adding on error once, add it on many (100) times and use the average or median for the given amount of error variance.
- There is no single function to fit and then extrapolate.
  - The safe default is a quadratic function.
  - There is a theory of exact extrapolants, but it is wildly difficult to implement in this context.
- In general, the SIMEX estimates are not consistent, but they are approximately so.
  - Essentially exact, to order $O(\sigma_u^6)$, for small error.
- Because we have a good fast algorithm for it, we used the CPQL of Breslow & Lin as the basic estimation method (which ignores measurement error).
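The SIMEX recipe can be sketched in a few lines. The version below swaps in ordinary least squares as the naive estimator in place of CPQL, purely to keep the example small. It assumes a known error standard deviation and uses the quadratic extrapolant evaluated at $\zeta = -1$. The function names, the $\zeta$ grid and the replicate count (20 here for speed, where the slide suggests about 100) are assumptions.

```python
import math
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def fit_quadratic(zs, fs):
    """Least-squares fit of f = c0 + c1*z + c2*z^2 via the normal equations."""
    A = [[sum(z ** (i + j) for z in zs) for j in range(3)] for i in range(3)]
    r = [sum(f * z ** i for z, f in zip(zs, fs)) for i in range(3)]
    for col in range(3):                       # Gaussian elimination with pivoting
        piv = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        r[col], r[piv] = r[piv], r[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= factor * A[col][k]
            r[row] -= factor * r[col]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                        # back substitution
        c[i] = (r[i] - sum(A[i][k] * c[k] for k in range(i + 1, 3))) / A[i][i]
    return c

def simex_slope(w, y, sigma_u, zetas=(0.5, 1.0, 1.5, 2.0), n_rep=20, seed=0):
    """SIMEX estimate of the slope using a quadratic extrapolant."""
    rng = random.Random(seed)
    grid = [0.0]
    fits = [ols_slope(w, y)]                   # zeta = 0 is the naive fit
    for zeta in zetas:
        extra_sd = math.sqrt(zeta) * sigma_u   # total error variance becomes (1 + zeta) * sigma_u^2
        reps = []
        for _ in range(n_rep):
            w_sim = [wi + rng.gauss(0.0, extra_sd) for wi in w]
            reps.append(ols_slope(w_sim, y))
        grid.append(zeta)
        fits.append(sum(reps) / n_rep)         # average over simulated data sets
    c0, c1, c2 = fit_quadratic(grid, fits)
    return c0 - c1 + c2                        # quadratic evaluated at zeta = -1

rng = random.Random(42)
n_obs, beta_x, sigma_u = 3000, 1.0, 0.5
x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
w = [xi + rng.gauss(0.0, sigma_u) for xi in x]
y = [0.5 + beta_x * xi + rng.gauss(0.0, 0.5) for xi in x]

naive = ols_slope(w, y)
simex = simex_slope(w, y, sigma_u)
```

With these settings the naive slope should sit near the reliability (0.8), while the SIMEX value should move most of the way back toward 1 without reaching it exactly, in line with the remark that the estimates are approximately rather than exactly consistent.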
15 SIMEX ASYMPTOTIC THEORY
- The book derives a general asymptotic theory and computable standard errors under two general conditions.
  - Very simple estimates if the error variance is known.
  - QVF has a bootstrap: very fast implementation.
- The true parameters are required to be in the interior of the parameter space.
- The estimate being used ignoring measurement error must be an M-estimator.
  - Synonym: solution to an estimating equation.
  - This is the case for the MLE, CPQL, etc.
- We showed that the SIMEX estimates are themselves solutions to computable estimating equations (not obvious, but rather nice!).
- Thus, in the interior of the parameter space, we have computable asymptotic standard errors as the number of clusters gets large.
16 SIMEX INFERENCE
- On the boundary, special techniques are required.
  - Example: testing whether a variance component equals zero.
- There are a number of score tests for variance components, recently reviewed by Xihong Lin in a technical report.
- We focus on the global hypothesis: no variance components exist.
- The score test statistic is of the form
  $S = U^T I^{-1} U$,
  $I$ = estimated covariance of $U$ under the hypothesis,
  $U$ = average of independent r.v.'s with estimated parameters.
- For the random intercept model, $U$ is an overdispersion statistic, based on averages across clusters of squares of weighted within-cluster residuals.
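Computationally, the statistic is just a quadratic form. The sketch below evaluates $S = U^T I^{-1} U$ for a given score vector and covariance matrix by solving a linear system rather than forming the inverse. It does not show how $U$ and $I$ are obtained, which depends on the model. The function names and numeric inputs are made up for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def score_statistic(U, I):
    """Quadratic form U' I^{-1} U for score vector U and covariance matrix I."""
    return sum(u * v for u, v in zip(U, solve_linear(I, U)))

U_example = [1.0, 2.0]                      # illustrative score vector
I_example = [[2.0, 0.5], [0.5, 4.0]]        # illustrative covariance estimate
S_example = score_statistic(U_example, I_example)
```

The same function would apply unchanged to SIMEX-corrected versions of $U$ and $I$, since only the inputs differ.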
17 SIMEX SCORE TESTS
$U$ = average of independent r.v.'s with estimated parameters.
- The trick is simple. Our general theory is based on statistics which are equivalent to averages of independent r.v.'s with estimated parameters.
- But $U$ is just such a statistic!
- Thus, we can use SIMEX to estimate $U_{simex}$, what $U$ would be if there were no measurement error.
- We merely need an estimate of the variance of $U_{simex}$, which is what our asymptotic theory provides anyway; call this $I_{simex}$.
- The SIMEX score test is
  $S_{simex} = U_{simex}^T I_{simex}^{-1} U_{simex}$.
18 SIMEX SCORE TESTS
We simulated data closely related to the example discussed below, and computed the actual level of nominal 5% tests.
- The actual level ignoring error was > 10%.
- The actual level of our score test was very nearly 5%.
19 MAXIMUM LIKELIHOOD
- If there is no measurement error, there is a wide variety of possible MLE algorithms.
- With measurement error, we wanted to use the EM algorithm.
  - The E step is not available in closed form, and would in general require numerical integration.
- The missing data are the random effects and the X's.
- The E step requires analysis of
  $E(\text{log likelihood of complete data} \mid \text{observed data})$.
- This requires expectations of functions of the random effects and the X's given the observed data and the current parameter estimates.
20 MAXIMUM LIKELIHOOD
- The missing data are the random effects and the X's.
- The E step requires analysis of
  $E(\text{log likelihood of complete data} \mid \text{observed data})$.
- We repeatedly generated observations from the appropriate conditional distributions using the Metropolis-Hastings algorithm.
  - This repeatedly gives observations from the unknown X's and the random effects.
- This is a generalization of an idea due to C. McCulloch in the no-error case.
  - He observed in the no-error case (and it generalizes to our case) that the random effects and X's generated in the E step automatically lead to simple solutions to the M step.
- In the no-error case, the method reliably reproduces the MLE as judged by EGRET, but is very slow in this first implementation.
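To give a feel for the sampling step, here is a minimal random-walk Metropolis-Hastings sketch for the simpler no-error case: drawing one cluster's random intercept $b$ given its binary responses in a logistic random-intercept model with the fixed-effect part of the linear predictor treated as known. It does not sample the unknown X's and is not the authors' implementation. The proposal scale, function names and data are assumptions.

```python
import math
import random

def log_target(b, ys, etas_fixed, theta):
    """Log of p(b | y) up to a constant: logistic likelihood times N(0, theta) density."""
    lp = -0.5 * b * b / theta
    for y, eta0 in zip(ys, etas_fixed):
        eta = eta0 + b
        lp += y * eta - math.log1p(math.exp(eta))   # Bernoulli log-likelihood, logit link
    return lp

def mh_random_intercept(ys, etas_fixed, theta, n_draws=2000, burn_in=500,
                        step=1.0, seed=0):
    """Random-walk Metropolis-Hastings draws of b for one cluster."""
    rng = random.Random(seed)
    b = 0.0
    lp = log_target(b, ys, etas_fixed, theta)
    draws = []
    for it in range(n_draws + burn_in):
        prop = b + rng.gauss(0.0, step)             # symmetric proposal
        lp_prop = log_target(prop, ys, etas_fixed, theta)
        # Accept with probability min(1, target(prop) / target(b)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            b, lp = prop, lp_prop
        if it >= burn_in:
            draws.append(b)
    return draws

ys = [1, 1, 1, 0]                  # illustrative responses in one cluster
etas_fixed = [0.0, 0.0, 0.0, 0.0]  # fixed-effect part of the linear predictor
draws = mh_random_intercept(ys, etas_fixed, theta=1.0)

post_mean_b = sum(draws) / len(draws)
post_mean_b2 = sum(d * d for d in draws) / len(draws)   # Monte Carlo estimate of E(b^2 | y)
```

In a Monte Carlo EM of this kind, averages such as `post_mean_b2` across clusters would feed the M step for $\theta$; that connection is stated here as the standard result for normal random effects rather than as a description of the code used in the talk.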
21 EXAMPLE
We considered data from the Framingham Heart Study.
There were m = 75 clusters (individuals), with most having n = 4 observations, each taken 2 years apart.
The variables were:
- Y = evidence of LVH diagnosed by ECG in patients who developed coronary heart disease before or during the study period
- W = log(SBP - 50)
- Z = age, exam number, smoking status, body mass index
- X = average log(SBP - 50) reading over many applications within 6 months (say) of each exam
Since blood pressures are only taken every two years, there is no direct evidence of how W differs from X, and hence no direct way to estimate $\Sigma_{uu}$, the measurement error covariance.
22 EXAMPLE
- It is known that, besides simple variation in the measurement process, SBP varies according to time of day, day of week, stress, etc.
- The data do not allow us to get at this without assumptions.
- It is possible that the errors of W as measures of X are correlated, although with a 2-year lag in a fairly broad population one would not expect this correlation to be terribly large.
- This design issue is not restricted merely to SBP.
  - Nutrition experiments with long time lags face exactly the same problem.
- Thus, to illustrate the methods we assumed independent measurement errors, i.e., $\Sigma_{uu} = \sigma_u^2 I$.
- The GLMM is logistic regression with an individual-level random intercept.
23 EXAMPLE
- The residuals from the regression of W on Z show strong cluster (individual) effects.
- (1/3) of the observed variability in these residuals is within-individual variance.
- Thus, we varied $\sigma_u^2$ from one extreme to the other among the various possibilities:
  - $\sigma_u^2 = 0$: no measurement error, within-individual variation entirely due to changes in SBP
  - $\sigma_u^2 = (1/3) \times$ total variation: within-individual variation entirely due to measurement error
- To estimate $\sigma_u^2$ we would need additional measurements of SBP on days relatively close to, but not the same as, the major exam date.
Next we show how SIMEX performed for CPQL.
24 EXAMPLE
- As expected, the score test for the variance component is highly significant.
  - The p-value increases from … to … as the measurement error variance increases.
- Ranges of the estimates as the error variance increases:
  - CPQL $\theta$: decreases from 2.05 to 1.85
  - CPQL $\beta$: increases from 2.80 to 3.90
  - MLE $\theta$: decreases from 2.65 to 2.20
- That $\theta_{mle} > \theta_{cpql}$ is expected, as are the directions given above.
25 DISCUSSION
- We have shown some of the effects of measurement error on biases of parameter estimates.
  - The major new observation is the effect of cluster size.
- GLMMeMs are GLMMs with a different fixed effect and random effect structure.
- The SIMEX method is one simple functional method for approximately consistent estimation.
  - We used our previous results to find a score test for global variance components.
- The MLE can be computed via EM with Metropolis-Hastings.
  - While our implementation is slow, it does allow for non-normal X's.
- The example illustrated a design problem.
  - Without supplemental reliability studies, it will be impossible to estimate the measurement error structure.
  - Identifiability, etc. is still an open question.
26 BAYES ESTIMATION VIA GIBBS SAMPLING
- It is easy enough to specify various priors and write down expressions for the complete conditionals in a Gibbs sampling implementation.
  - There are lots of Metropolis-Hastings steps.
  - Generating the X's and the random effects actually uses the same code as the EM algorithm.
- We have had various problems, though:
  - Sensitivity of the answer to the prior
  - Convergence difficulties, even with proper priors
  - Work still in progress
- Natural tests for, and shrinkage of, the variance components are the aim.
Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationDescribing Change over Time: Adding Linear Trends
Describing Change over Time: Adding Linear Trends Longitudinal Data Analysis Workshop Section 7 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section
More informationQuantitative Analysis of Financial Markets. Summary of Part II. Key Concepts & Formulas. Christopher Ting. November 11, 2017
Summary of Part II Key Concepts & Formulas Christopher Ting November 11, 2017 christopherting@smu.edu.sg http://www.mysmu.edu/faculty/christophert/ Christopher Ting 1 of 16 Why Regression Analysis? Understand
More informationStatistics: A review. Why statistics?
Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval
More informationPartial factor modeling: predictor-dependent shrinkage for linear regression
modeling: predictor-dependent shrinkage for linear Richard Hahn, Carlos Carvalho and Sayan Mukherjee JASA 2013 Review by Esther Salazar Duke University December, 2013 Factor framework The factor framework
More informationLecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011
Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector
More informationWISE International Masters
WISE International Masters ECONOMETRICS Instructor: Brett Graham INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This examination paper contains 32 questions. You are
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationBayesian Model Diagnostics and Checking
Earvin Balderama Quantitative Ecology Lab Department of Forestry and Environmental Resources North Carolina State University April 12, 2013 1 / 34 Introduction MCMCMC 2 / 34 Introduction MCMCMC Steps in
More information13 Notes on Markov Chain Monte Carlo
13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful
More informationOpen Problems in Mixed Models
xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationMulticollinearity and A Ridge Parameter Estimation Approach
Journal of Modern Applied Statistical Methods Volume 15 Issue Article 5 11-1-016 Multicollinearity and A Ridge Parameter Estimation Approach Ghadban Khalaf King Khalid University, albadran50@yahoo.com
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level
More informationApplied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Motivation: Why Applied Statistics?
More informationBagging During Markov Chain Monte Carlo for Smoother Predictions
Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods
More informationHakone Seminar Recent Developments in Statistics
Hakone Seminar Recent Developments in Statistics November 12-14, 2015 Hotel Green Plaza Hakone: http://www.hgp.co.jp/language/english/sp/ Organizer: Masanobu TANIGUCHI (Research Institute for Science &
More informationOn Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data
On Measurement Error Problems with Predictors Derived from Stationary Stochastic Processes and Application to Cocaine Dependence Treatment Data Yehua Li Department of Statistics University of Georgia Yongtao
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
More informationReview of Panel Data Model Types Next Steps. Panel GLMs. Department of Political Science and Government Aarhus University.
Panel GLMs Department of Political Science and Government Aarhus University May 12, 2015 1 Review of Panel Data 2 Model Types 3 Review and Looking Forward 1 Review of Panel Data 2 Model Types 3 Review
More informationBayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London
Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview
More informationLeast Squares Estimation-Finite-Sample Properties
Least Squares Estimation-Finite-Sample Properties Ping Yu School of Economics and Finance The University of Hong Kong Ping Yu (HKU) Finite-Sample 1 / 29 Terminology and Assumptions 1 Terminology and Assumptions
More informationThe impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference
The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference An application to longitudinal modeling Brianna Heggeseth with Nicholas Jewell Department of Statistics
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationBias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates
Bias Study of the Naive Estimator in a Longitudinal Binary Mixed-effects Model with Measurement Error and Misclassification in Covariates by c Ernest Dankwa A thesis submitted to the School of Graduate
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More informationLecture 7 Time-dependent Covariates in Cox Regression
Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the
More informationModeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use
Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression
More information