General Regression Model
|
|
- Juliet West
- 5 years ago
- Views:
Transcription
1 Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical analysis models to an infinite sample setting in which groups being compared may be defined by each combination of values for multiple variables, including continuous variables. The most commonly used regression models (linear, logistic, Poisson, proportional hazards, and parametric accelerated failure time models) differ primarily in the functional (summary measure) of the distribution that is used as the basis of inference. In this document I describe the general notation shared by all of these models, and then I highlight a few of the issues unique to the individual methods. In all cases I restrict attention to the use of univariate (i.e., one response variable) regression models to investigate associations among multiple variables. 1 Notation for the Statistical Analysis Model The general regression model involves: 1. Random variable Y is the response variable. We will estimate some summary measure of the distribution of Y within groups. 2. Random variable X is the predictor of interest (POI) variable. We will divide the population into subpopulations based on this variable. Each individual in a given subpopulation will have the same value of X as all other individuals in that subpopulation. 3. Random variables W 1, W 2,..., W p are additional covariates that are used to adjust for confounding or to provide additional precision for inference. We will also use these variables to divide the population into subpopulations. Each individual in a given subpopulation will have the same value of each of the W j s as all other individuals in that subpopulation. 4. θ is some summary measure of the distribution of Y. (The technical statistical term is functional. It is something that can be computed from the probability distribution, but in the most general case it may be itself a function.) This will have to depend on the type of variable (e.g., binary, unordered categorical, ordered categorical, continuous). Common choices for continuous random variables include (a) Mean (b) Geometric mean (providing Y is always positive) (c) Median (or, less commonly, some other quantile such as 25th or 75th percentile) (d) Probability that Y > c for some specified, scientifically relevant value of c (e) Odds that Y > c for some specified, scientifically relevant value of c 1
2 (f) Hazard function (instantaneous rate of failure at a specified time, conditional on being at risk of failure at that time) 5. β is a p+2 dimensional vector of regression parameters (or borrowing the algebraic term, regression coefficients ). 6. η is a linear predictor that describes the combined effect of all regression predictors (X, W 1, W 2,..., W p ) on θ. For specified values of those predictors, the exact form of η is taken to be η = β 0 + β 1 X + β 2 W 1 + β 3 W β p+1 W p. 7. g( ) is some link function that links the linear predictor to θ. That is, we have regression model g(θ) = η = β 0 + β 1 X + β 2 W 1 + β 3 W β p+1 W p. Although more varied choices of link functions are sometimes considered, our typical use of the link function is either (a) to describe an additive model using an identity link (g(θ) = θ), (b) the describe a multiplicative model using the log link (g(θ) = log(θ)). Interpretation of the regression parameters depends on the link function. With an identity link: 1. β 0 is the value of θ within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all 2. β 1 is the value of the difference in θ between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Depending on how the variables W 1, W 2,..., W p are defined, it may not be scientifically possible to have groups differing in their value of X and not differing in the value of some W j. In that case we will have to find other interpretations of 3. β j for j > 1 has interpretations analogous to that for beta 1 : we consider differences in θ across groups that differ by one unit in the variable corresponding to β j, but agreeing in their values for all other variables. With a log link: 1. e β0 is the value of θ within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all 2. e β1 is the value of the ratio of θ between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Version: January 5,
3 Depending on how the variables W 1, W 2,..., W p are defined, it may not be scientifically possible to have groups differing in their value of X and not differing in the value of some W j. In that case we will have to find other interpretations of 3. e βj for j > 1 has interpretations analogous to that for beta 1 : we consider ratios of θ across groups that differ by one unit in the variable corresponding to β j, but agreeing in their values for all other variables. Note that in the above interpretations, we were pretending that the differences in θ (for an identity link) or the ratios of θ (for the log link) would be the same for every two subpopulations that differed by 1 unit in the corresponding variable. If this linearity assumption does not hold, we then interpret the slope parameters as some sort of average difference or average ratio across subpopulations differing by 1 unit in the corresponding variable. 1.1 Linear Regression The linear regression model uses our general regression model with 1. θ is the mean of the distribution of Y. 2. g( ) is the identity link (g(θ) = θ). 1. β 0 is the mean of Y within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all 2. β 1 is the value of the difference in the mean of Y between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Hence for some arbitrary x, w 1, w 2, β 1 = E[Y X = x + 1, W 1 = w 1,..., W p = w p ] E[Y X = x, W 1 = w 1,..., W p = w p ]. Estimates of the regression parameters uses least squares. This corresponds to maximum likelihood estimation when the distribution of Y is Gaussian within subpopulations. Optimality properties of linear regression include Version: January 5,
4 In the presence of independent observations, homoscedasticity (variances of Y within subpopulations are equal to each other) and the correct linear model for the means, the ordinary least squares estimates (OLSE) are best linear unbiased estimates (BLUE). This means that among all unbiased estimators that use linear combinations of the observed values of Y, the OLSE have the greatest precision. (In the presence of heteroscedasticity, we should use weighted least squares estimates (WLSE).) In the presence of independent observations, homoscedasticity, the correct linear model for the means, and a Gaussian distribution of Y within subpopulations, the OLSE are MLE and are the efficient estimators of the means. Because we already know the OLSE are BLUE, this added fact just tells us that it is best to use linear combinations of the observed values of Y nonlinear combinations of Y will not be better estimates. The probability distribution of OLSE can be shown to be asymptotically normal under reasonable assumptions on the distribution of the predictors. That is, as the sample sizes get larger, the probability distribution for the OLSE gets closer and closer to a Gaussian distribution. If Y is normally distributed within subpopulations, the OLSE are exactly normally distributed no matter how small the sample size (so long as their is enough data to make the estimates). 1.2 Linear Regression on log Transformed Response Variables This regression model is relevant only when Y is always positive. In this case, the linear regression model uses our general regression model with 1. θ is the geometric mean of the distribution of Y. 1. e β0 is the geometric mean of Y within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all 2. e β1 is the value of the ratio of geometric means of Y between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Hence for some arbitrary x, w 1, w 2, e β1 = GM[Y X = x + 1, W 1 = w 1,..., W p = w p ]. GM[Y X = x, W 1 = w 1,..., W p = w p ] Estimates of the regression parameters uses least squares. This corresponds to maximum likelihood estimation when the distribution of Y is lognormal (so the distribution of log(y ) is Gaussian) within subpopulations. Version: January 5,
5 Optimality properties and probability distributions for the regression parameters of linear regression on log transformed response variables derive directly from those of linear regression. 1.3 Logistic Regression This regression model is relevant only when Y is a binary 0-1 variable. In this case, the logistic regression model uses our general regression model with 1. θ is the odds that Y = e β0 is the odds of Y = 1 within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all 2. e β1 is the value of the ratio of the odds that Y = 1 between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Hence for some arbitrary x, w 1, w 2, e β1 = odds[y = 1 X = x + 1, W 1 = w 1,..., W p = w p ] odds[y = 1 X = x, W 1 = w 1,..., W p = w p ] = P r[y = 1 X = x + 1, W 1 = w 1,..., W p = w p ] P r[y = 0 X = x, W 1 = w 1,..., W p = w p ] P r[y = 0 X = x + 1, W 1 = w 1,..., W p = w p ] P r[y = 1 X = x, W 1 = w 1,..., W p = w p ]. Estimates of the regression parameters uses maximum likelihood estimation (MLE) based on the Bernoulli (or binomial) distribution for Y. As such, the variability of the regression parameter estimates is classically based on the mean-variance relationship dictated by the Bernoulli distribution: if E[Y ] = p, then V ar(y ) = p(1 p). If the observations of Y are independent, and if the log odds across subpopulations adhere to the linear model, the MLE are asymptotically efficient. 1.4 Poisson Regression This regression model is relevant only when Y is nonnegative (it is classically defined for count data that has Poisson distributions within subpopulations).. In this case, the Poisson regression model uses our general regression model with Version: January 5,
6 1. θ is the mean value of Y (if the data is truly Poisson count data, that mean is also an event rate). 1. e β0 is the mean (event rate?) of Y within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all 2. e β1 is the value of the ratio of the mean value of Y between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Hence for some arbitrary x, w 1, w 2, e β1 = E[Y X = x + 1, W 1 = w 1,..., W p = w p ]. E[Y X = x, W 1 = w 1,..., W p = w p ] Estimates of the regression parameters uses maximum likelihood estimation (MLE) based on the Poisson distribution for Y. As such, the variability of the regression parameter estimates is classically based on the mean-variance relationship dictated by the Poisson distribution: if E[Y ] = λ, then V ar(y ) = λ. When used with the robust Huber-White sandwich estimator of the standard errors, this model can be viewed as a regression model for ratios of means across groups. If the observations of Y are independent Poisson random variables, and if the log mean across subpopulations adhere to the linear model, the MLE are asymptotically efficient. 1.5 Proportional Hazards Regression This regression model is classically relevant only when Y is nonnegative (it is classically defined for timeto-event data that exhibits proportional hazards across subpopulations). In this case, the proportional hazards regression model uses our general regression model with 1. θ is the hazard function (instantaneous rate of failure) for Y. (This is a function that might vary over time.) 1. The intercept in this model is a hazard function λ Y (t) of Y within a subpopulation having X = 0, W 1 = 0, W 2 = 0,..., W p = 0 (i.e., all Standard statistical software does not automatically return this estimated function, because the slope parameters can be estimated without ever estimating this baseline hazard function. Version: January 5,
7 Most often we are not really interested in the baseline hazard function. 2. e β1 is the value of the ratio of the hazard funtion of Y between two subpopulations that differ by 1 unit in their value of X, but agree in their values of W 1, W 2,..., W p. Under the proportional hazards assumption, the same ratio would be obtained at every point in time. Hence for some arbitrary x, w 1, w 2, e β1 = λ Y (t X = x + 1, W 1 = w 1,..., W p = w p ). λ Y (t X = x, W 1 = w 1,..., W p = w p ) Estimates of the regression parameters uses maximum partial likelihood estimation (MLE) based on the proportional hazards assumption for Y. This is somewhat related to logistic regression, and as such, the variability of the regression parameter estimates is classically based on the mean-variance relationship dictated in part by the Bernoulli distribution. Version: January 5,
Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationHigh-Throughput Sequencing Course
High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More informationLecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018
Introduction Statistics 211 - Statistical Methods II Presented January 8, 2018 linear models Dan Gillen Department of Statistics University of California, Irvine 1.1 Logistics and Contact Information Lectures:
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationLogistic Regression in R. by Kerry Machemer 12/04/2015
Logistic Regression in R by Kerry Machemer 12/04/2015 Linear Regression {y i, x i1,, x ip } Linear Regression y i = dependent variable & x i = independent variable(s) y i = α + β 1 x i1 + + β p x ip +
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationAccounting for Baseline Observations in Randomized Clinical Trials
Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA October 6, 0 Abstract In clinical
More informationModelling geoadditive survival data
Modelling geoadditive survival data Thomas Kneib & Ludwig Fahrmeir Department of Statistics, Ludwig-Maximilians-University Munich 1. Leukemia survival data 2. Structured hazard regression 3. Mixed model
More informationChapter 1. Modeling Basics
Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical
More informationLecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:
Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of
More informationAccounting for Baseline Observations in Randomized Clinical Trials
Accounting for Baseline Observations in Randomized Clinical Trials Scott S Emerson, MD, PhD Department of Biostatistics, University of Washington, Seattle, WA 9895, USA August 5, 0 Abstract In clinical
More informationMarginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal
Marginal versus conditional effects: does it make a difference? Mireille Schnitzer, PhD Université de Montréal Overview In observational and experimental studies, the goal may be to estimate the effect
More informationStatistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames
Statistical Methods in HYDROLOGY CHARLES T. HAAN The Iowa State University Press / Ames Univariate BASIC Table of Contents PREFACE xiii ACKNOWLEDGEMENTS xv 1 INTRODUCTION 1 2 PROBABILITY AND PROBABILITY
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationISQS 5349 Spring 2013 Final Exam
ISQS 5349 Spring 2013 Final Exam Name: General Instructions: Closed books, notes, no electronic devices. Points (out of 200) are in parentheses. Put written answers on separate paper; multiple choices
More informationSome Observations on the Wilcoxon Rank Sum Test
UW Biostatistics Working Paper Series 8-16-011 Some Observations on the Wilcoxon Rank Sum Test Scott S. Emerson University of Washington, semerson@u.washington.edu Suggested Citation Emerson, Scott S.,
More informationWhen is MLE appropriate
When is MLE appropriate As a rule of thumb the following to assumptions need to be fulfilled to make MLE the appropriate method for estimation: The model is adequate. That is, we trust that one of the
More informationFaculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics
Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial
More informationGeneralized Linear Models 1
Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter
More informationIntroduction to Statistical Analysis
Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive
More informationLab 8. Matched Case Control Studies
Lab 8 Matched Case Control Studies Control of Confounding Technique for the control of confounding: At the design stage: Matching During the analysis of the results: Post-stratification analysis Advantage
More informationWU Weiterbildung. Linear Mixed Models
Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes
More informationp y (1 p) 1 y, y = 0, 1 p Y (y p) = 0, otherwise.
1. Suppose Y 1, Y 2,..., Y n is an iid sample from a Bernoulli(p) population distribution, where 0 < p < 1 is unknown. The population pmf is p y (1 p) 1 y, y = 0, 1 p Y (y p) = (a) Prove that Y is the
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationVarieties of Count Data
CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function
More information3 Joint Distributions 71
2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random
More informationRonald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California
Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University
More informationOther Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model
Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationPENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA
PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University
More informationFinite Population Sampling and Inference
Finite Population Sampling and Inference A Prediction Approach RICHARD VALLIANT ALAN H. DORFMAN RICHARD M. ROYALL A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationEstimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk
Ann Inst Stat Math (0) 64:359 37 DOI 0.007/s0463-00-036-3 Estimators for the binomial distribution that dominate the MLE in terms of Kullback Leibler risk Paul Vos Qiang Wu Received: 3 June 009 / Revised:
More informationLecture 2: Linear and Mixed Models
Lecture 2: Linear and Mixed Models Bruce Walsh lecture notes Introduction to Mixed Models SISG, Seattle 18 20 July 2018 1 Quick Review of the Major Points The general linear model can be written as y =
More informationModel-based boosting in R
Model-based boosting in R Families in mboost Nikolay Robinzonov nikolay.robinzonov@stat.uni-muenchen.de Institut für Statistik Ludwig-Maximilians-Universität München Statistical Computing 2011 Model-based
More information* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.
Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationTurning a research question into a statistical question.
Turning a research question into a statistical question. IGINAL QUESTION: Concept Concept Concept ABOUT ONE CONCEPT ABOUT RELATIONSHIPS BETWEEN CONCEPTS TYPE OF QUESTION: DESCRIBE what s going on? DECIDE
More informationPreface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of
Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures
More information401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.
401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationGeneralized Linear Modeling - Logistic Regression
1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation
More informationA general mixed model approach for spatio-temporal regression data
A general mixed model approach for spatio-temporal regression data Thomas Kneib, Ludwig Fahrmeir & Stefan Lang Department of Statistics, Ludwig-Maximilians-University Munich 1. Spatio-temporal regression
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationProbability Distributions Columns (a) through (d)
Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)
More information1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches
Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationStatistics 203: Introduction to Regression and Analysis of Variance Course review
Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying
More informationLecture 18: Simple Linear Regression
Lecture 18: Simple Linear Regression BIOS 553 Department of Biostatistics University of Michigan Fall 2004 The Correlation Coefficient: r The correlation coefficient (r) is a number that measures the strength
More informationComparing IRT with Other Models
Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used
More informationRandom Effects Models for Network Data
Random Effects Models for Network Data Peter D. Hoff 1 Working Paper no. 28 Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195-4320 January 14, 2003 1 Department of
More informationCOMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract
Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationImproving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates
Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationLinear Regression. In this lecture we will study a particular type of regression model: the linear regression model
1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationBayesian Linear Regression
Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective
More informationStatistics Ph.D. Qualifying Exam: Part II November 9, 2002
Statistics Ph.D. Qualifying Exam: Part II November 9, 2002 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your
More informationSurvival Analysis I (CHL5209H)
Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really
More informationIndex. Regression Models for Time Series Analysis. Benjamin Kedem, Konstantinos Fokianos Copyright John Wiley & Sons, Inc. ISBN.
Regression Models for Time Series Analysis. Benjamin Kedem, Konstantinos Fokianos Copyright 0 2002 John Wiley & Sons, Inc. ISBN. 0-471-36355-3 Index Adaptive rejection sampling, 233 Adjacent categories
More informationPoisson regression 1/15
Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More informationTest Problems for Probability Theory ,
1 Test Problems for Probability Theory 01-06-16, 010-1-14 1. Write down the following probability density functions and compute their moment generating functions. (a) Binomial distribution with mean 30
More informationGeneralized Linear Models and Exponential Families
Generalized Linear Models and Exponential Families David M. Blei COS424 Princeton University April 12, 2012 Generalized Linear Models x n y n β Linear regression and logistic regression are both linear
More informationA strategy for modelling count data which may have extra zeros
A strategy for modelling count data which may have extra zeros Alan Welsh Centre for Mathematics and its Applications Australian National University The Data Response is the number of Leadbeater s possum
More informationSTAT 526 Spring Final Exam. Thursday May 5, 2011
STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationPh.D. Qualifying Exam Friday Saturday, January 6 7, 2017
Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More information5. Parametric Regression Model
5. Parametric Regression Model The Accelerated Failure Time (AFT) Model Denote by S (t) and S 2 (t) the survival functions of two populations. The AFT model says that there is a constant c > 0 such that
More informationBinary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017
Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationSemiparametric Regression
Semiparametric Regression Patrick Breheny October 22 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/23 Introduction Over the past few weeks, we ve introduced a variety of regression models under
More informationInstructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses
ISQS 5349 Final Spring 2011 Instructions: Closed book, notes, and no electronic devices. Points (out of 200) in parentheses 1. (10) What is the definition of a regression model that we have used throughout
More informationSTAT 6350 Analysis of Lifetime Data. Probability Plotting
STAT 6350 Analysis of Lifetime Data Probability Plotting Purpose of Probability Plots Probability plots are an important tool for analyzing data and have been particular popular in the analysis of life
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationRepeated ordinal measurements: a generalised estimating equation approach
Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationDistribution Fitting (Censored Data)
Distribution Fitting (Censored Data) Summary... 1 Data Input... 2 Analysis Summary... 3 Analysis Options... 4 Goodness-of-Fit Tests... 6 Frequency Histogram... 8 Comparison of Alternative Distributions...
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More information7 Sensitivity Analysis
7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption
More information2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling
2016 SISG Module 17: Bayesian Statistics for Genetics Lecture 3: Binomial Sampling Jon Wakefield Departments of Statistics and Biostatistics University of Washington Outline Introduction and Motivating
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationContents. Part I: Fundamentals of Bayesian Inference 1
Contents Preface xiii Part I: Fundamentals of Bayesian Inference 1 1 Probability and inference 3 1.1 The three steps of Bayesian data analysis 3 1.2 General notation for statistical inference 4 1.3 Bayesian
More informationA Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,
A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type
More information