STAT 526 Spring Final Exam. Thursday May 5, 2011
- Brett Boyd
STAT 526 Spring 2011 Final Exam
Thursday May 5, 2011
Time: 2 hours

Name (please print):

Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will be deducted for false statements, even if the final answer is correct. Please circle your final answer where appropriate.

This exam is closed-book. You may consult two pages with your hand-written notes. Calculators are permitted.

Honor code: I promise not to cheat on this exam. I will neither give nor receive any unauthorized assistance. I will not share information about the exam with anyone who may be taking it at a different time. I have not been told anything about the exam by someone who has taken it earlier.

Signature: Date:
Question | Possible Points | Actual Points
1. The following data are counts of road accidents in Maine, classified according to location and gender of the patients, and whether the person was wearing a seat belt. The response categories are (1) not injured, (2) injured but not transported by emergency medical services, (3) injured and transported by emergency medical services, (4) injured, hospitalized and survived, (5) injured and not survived.

Gender | Location | Seat Belt | Response (1) (2) (3) (4) (5)
Female | Urban | No |
Female | Urban | Yes |
Female | Rural | No |
Female | Rural | Yes |
Male | Urban | No |
Male | Urban | Yes |
Male | Rural | No |
Male | Rural | Yes |

A proportional odds model was fit to the response, using the remaining categories as predictors. Parameter estimates and standard errors of the fit are given below. All parameters have the correct signs (i.e. you do not need to take the negative value of parameter estimates, as is usually done with the polr output).

Coefficients:
Value | Std. Error | t value
(Intercept):
(Intercept):
(Intercept):
(Intercept):
genderfemale
seatbeltno
locationrural
seatbeltno:locationrural

(a) (6 pts) State the model and the distributional assumptions.

$Y \mid X \sim \text{Multinomial}(\pi_1, \pi_2, \pi_3, \pi_4, \pi_5)$. If we define $P_j = \Pr(Y \le j \mid X)$, then the model is

$$\log\left(\frac{P_j}{1 - P_j}\right) = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{23} X_2 X_3, \quad j = 1, \ldots, 4$$

where $X_1 = I_{\{\text{gender} = \text{female}\}}$, $X_2 = I_{\{\text{seatbelt} = \text{no}\}}$ and $X_3 = I_{\{\text{location} = \text{rural}\}}$.
(b) (6 pts) For males in urban areas wearing seat belts, calculate the estimated probabilities of the first two response categories.

The cumulative probabilities are

$$P_j = \frac{\exp(x'\beta)}{1 + \exp(x'\beta)}$$

and the values for all the categories are

$$\left( \frac{\exp(\ )}{1+\exp(\ )},\ \frac{\exp(\ )}{1+\exp(\ )},\ \frac{\exp(\ )}{1+\exp(\ )},\ \frac{\exp(\ )}{1+\exp(\ )},\ 1 \right)$$

The response probabilities are $\pi_1 = P_1$ and $\pi_j = P_j - P_{j-1}$ for $j = 2, \ldots, 5$, therefore the values for all the categories are (0.9647, , , , ).

(c) (6 pts) Give a point estimate of the cumulative odds ratio of the gender, given seat belt use and location. Interpret this estimate.

The odds ratio is defined as

$$OR = \frac{\text{odds } P\{Y \le j \mid \text{female}\}}{\text{odds } P\{Y \le j \mid \text{male}\}}$$

The point estimate is $\widehat{OR} = \exp(\hat\beta_1) = \exp(\ ) = \ $

Given any status of seat belt use and location, the estimated odds of injury below any fixed level for a female are $\widehat{OR}$ times the estimated odds for a male.
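The step from cumulative probabilities to category probabilities in (b) can be sketched in a few lines of Python. The intercepts below are hypothetical placeholders, not the fitted values from the exam output, and the function name `category_probs` is introduced only for illustration. For males in urban areas wearing seat belts all three indicators are zero, so the linear predictor reduces to the intercept $\alpha_j$.

```python
import math

def category_probs(alphas, eta=0.0):
    """Turn cumulative-logit intercepts into category probabilities.

    alphas: increasing intercepts alpha_1..alpha_{J-1} (hypothetical values here).
    eta: the covariate part of the linear predictor; 0 for the baseline pattern.
    """
    # Cumulative probabilities P_j = exp(a_j + eta) / (1 + exp(a_j + eta)), with P_J = 1.
    cum = [1.0 / (1.0 + math.exp(-(a + eta))) for a in alphas] + [1.0]
    # pi_1 = P_1 and pi_j = P_j - P_{j-1} for j >= 2.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical intercepts, NOT the estimates from the exam output.
example_alphas = [1.0, 2.0, 3.0, 4.0]
probs = category_probs(example_alphas)
print([round(p, 4) for p in probs])
```

With the actual fitted intercepts in place of `example_alphas`, the first two entries of `probs` would be the quantities asked for in (b).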
(d) (6 pts) Find and interpret the estimated cumulative odds ratio between response and seat belt use, given that the accidents occurred in a rural location. Why is this estimate different from the estimate for the accidents in urban locations?

The odds ratio is defined as

$$OR = \frac{\text{odds } P\{Y \le j \mid \text{seatbelt} = \text{no}, \text{rural}\}}{\text{odds } P\{Y \le j \mid \text{seatbelt} = \text{yes}, \text{rural}\}}$$

The point estimate is $\widehat{OR} = \exp(\hat\beta_2 + \hat\beta_{23}) = \exp(\ ) = 0.41$ in rural locations and $\exp(\hat\beta_2) = \exp(\ ) = 0.47$ in urban locations. The interaction effect represents the difference between the log odds ratios in rural and urban locations.
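As a quick check of the last sentence, the interaction coefficient can be recovered (approximately) from the two reported odds ratios. This is a sketch that uses only the rounded values 0.41 and 0.47 given in the solution, so the results are accurate to about two decimals at best.

```python
import math

or_rural = 0.41  # exp(beta_2 + beta_23), as reported in the solution
or_urban = 0.47  # exp(beta_2), as reported in the solution

# Log odds ratio for seat belt = no in urban locations.
beta_2 = math.log(or_urban)
# The interaction is the rural log odds ratio minus the urban one.
beta_23 = math.log(or_rural) - math.log(or_urban)

print(round(beta_2, 3), round(beta_23, 3))
```

A negative `beta_23` means that not wearing a seat belt lowers the odds of a less severe outcome somewhat more in rural locations than in urban ones.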
2. Consider an $I \times J \times 2$ table with variables X, W and Y respectively. Assume that the counts in the table are consistent with the Poisson sampling.

(a) (6 pts) Specify a log-linear model that assumes an association between each pair of the variables for each level of the third variable, i.e. (XY, WY, XW). State the model, the assumptions, the constraints and the model degrees of freedom.

We assume that the count in each cell is $Y_{ijk} \overset{ind}{\sim} \text{Poisson}(\lambda_{ijk})$, where $i = 1, \ldots, I$, $j = 1, \ldots, J$ and $k = 1, 2$, and

$$\log \lambda_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}$$

with the constraints

$$\mu = \log E\{Y_{111}\}, \quad \alpha_1 = \beta_1 = \gamma_1 = (\alpha\gamma)_{1k} = (\alpha\gamma)_{i1} = (\beta\gamma)_{1k} = (\beta\gamma)_{j1} = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0$$

The model df = 1 + (I-1) + (J-1) + (2-1) + (I-1)(J-1) + (I-1)(2-1) + (J-1)(2-1) = IJ + I + J - 1. Equivalently, the model df can be obtained as 2IJ - (I-1)(J-1)(2-1).

(b) (6 pts) Specify the equivalent logistic regression model with Y as the response. State the model, the assumptions, the constraints and the model degrees of freedom.

The equivalent logistic regression model is $Y_{2|ij} \overset{ind}{\sim} \text{Binomial}(n_{ij}, \pi_{2|ij})$, where

$$\log\left(\frac{\pi_{2|ij}}{1 - \pi_{2|ij}}\right) = \mu + \alpha_i + \beta_j; \quad \mu = \log\left(\frac{\pi_{2|11}}{1 - \pi_{2|11}}\right) \text{ and } \alpha_1 = \beta_1 = 0$$

The model df = 1 + (I-1) + (J-1) = I + J - 1. The logistic model conditions on the $n_{ij}$, which accounts for the additional IJ degrees of freedom of the log-linear model. Therefore the two models are equivalent.
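The degrees-of-freedom bookkeeping in (a) and (b) can be checked mechanically. The sketch below counts parameters term by term and confirms the two closed forms and the IJ gap between the log-linear and logistic models; the function names are hypothetical and chosen only for this illustration.

```python
def loglinear_df(I, J, K=2):
    """Free parameters in (XY, WY, XW): all terms except the three-way one."""
    return (1 + (I - 1) + (J - 1) + (K - 1)
            + (I - 1) * (J - 1) + (I - 1) * (K - 1) + (J - 1) * (K - 1))

def logistic_df(I, J):
    """Free parameters in the main-effects logistic model for Y."""
    return 1 + (I - 1) + (J - 1)

checked = 0
for I in range(2, 7):
    for J in range(2, 7):
        # Closed form stated in the solution.
        assert loglinear_df(I, J) == I * J + I + J - 1
        # Saturated model minus the three-way interaction terms.
        assert loglinear_df(I, J) == 2 * I * J - (I - 1) * (J - 1) * (2 - 1)
        # The logistic model conditions on the IJ totals n_ij.
        assert loglinear_df(I, J) - logistic_df(I, J) == I * J
        checked += 1
print(checked, "table sizes checked")
```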
(c) (6 pts) Denote the fitted cell counts according to the log-linear model $\hat\lambda_{ijk}$ and the fitted probabilities according to the logistic regression $\hat\pi_{1|ij}$ and $\hat\pi_{2|ij}$. Write $\hat\pi_{2|ij}$ in terms of $\hat\lambda_{ijk}$.

$$\hat\pi_{2|ij} = \frac{\hat\lambda_{ij2}}{\hat\lambda_{ij1} + \hat\lambda_{ij2}}$$

(d) (6 pts) Show that according to this model (in either formulation (a) or (b), since these are equivalent) the odds ratios of levels of Y given different levels of X are independent of W.

E.g., using the formulation in (b):

$$\log OR = \log \frac{\text{odds}(Y = 2 \mid X = i', W = j)}{\text{odds}(Y = 2 \mid X = i, W = j)} = \log\left(\frac{\pi_{2|i'j} / (1 - \pi_{2|i'j})}{\pi_{2|ij} / (1 - \pi_{2|ij})}\right) = \alpha_{i'} - \alpha_i$$

which does not depend on $j$.
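The same conclusion can be illustrated numerically from formulation (a). The sketch below draws arbitrary (hypothetical) parameter values under the baseline constraints, builds the Poisson means $\lambda_{ijk}$ without a three-way term, converts them to $\pi_{2|ij}$ as in (c), and checks that the log odds ratio for X does not change with the level of W. Indices are zero-based in the code, so `k = 1` stands for $Y = 2$.

```python
import math
import random

random.seed(0)
I, J = 3, 4

# Hypothetical parameter values; the first level of every factor is set to 0.
mu = 2.0
alpha = [0.0] + [random.gauss(0, 1) for _ in range(I - 1)]
beta = [0.0] + [random.gauss(0, 1) for _ in range(J - 1)]
gamma = [0.0, random.gauss(0, 1)]
ab = [[0.0 if i == 0 or j == 0 else random.gauss(0, 1) for j in range(J)]
      for i in range(I)]
ag = [[0.0, 0.0 if i == 0 else random.gauss(0, 1)] for i in range(I)]
bg = [[0.0, 0.0 if j == 0 else random.gauss(0, 1)] for j in range(J)]

def lam(i, j, k):
    """Poisson mean under (XY, WY, XW): no three-way interaction."""
    return math.exp(mu + alpha[i] + beta[j] + gamma[k]
                    + ab[i][j] + ag[i][k] + bg[j][k])

def logit_y2(i, j):
    """Logit of pi_{2|ij} = lambda_ij2 / (lambda_ij1 + lambda_ij2)."""
    p = lam(i, j, 1) / (lam(i, j, 0) + lam(i, j, 1))
    return math.log(p / (1 - p))

# Log odds ratio of Y = 2 for X = i versus the first level of X, at each W = j.
log_or = [[logit_y2(i, j) - logit_y2(0, j) for j in range(J)] for i in range(I)]
for row in log_or:
    print([round(v, 6) for v in row])
```

Each printed row is constant across columns and equals the X-by-Y interaction term for that level of X, which plays the role of $\alpha_i$ in the logistic formulation.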
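3. The geometric probability distribution can be interpreted as the distribution of the number of iid Bernoulli trials observed until a first success. Its probability mass function is

$$f(y \mid \pi) = \pi^y (1 - \pi), \quad y = 0, 1, 2, \ldots$$

(a) (6 pts) Show that this is a member of the exponential family of distributions. According to the notation used in class, state $\phi$, $a(\phi)$, $\theta$, $b(\theta)$ and $c(y, \phi)$.

$$f(y \mid \pi) = \exp\left(y \log \pi + \log(1 - \pi)\right)$$

Therefore

$$\theta = \log \pi, \quad \pi = \exp(\theta), \quad \phi = 1, \quad a(\phi) = 1, \quad c(y, \phi) = 0, \quad b(\theta) = -\log(1 - \pi) = -\log(1 - \exp\theta)$$

(b) (6 pts) State $E\{Y\}$, $Var\{Y\}$ and the canonical link.

$$E\{Y\} = \frac{\partial b(\theta)}{\partial \theta} = \frac{e^\theta}{1 - e^\theta} = \frac{\pi}{1 - \pi}$$

which we denote by $\mu$, and

$$Var\{Y\} = \frac{\partial^2 b(\theta)}{\partial \theta^2}\, a(\phi) = \frac{e^\theta}{(1 - e^\theta)^2} = \frac{\pi}{(1 - \pi)^2}$$

Since $\mu = \frac{e^\theta}{1 - e^\theta}$, the canonical link is

$$\theta = \log\left(\frac{\mu}{1 + \mu}\right)$$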
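The mean, variance and canonical link in (b) can be sanity-checked by summing the probability mass function directly. This is a minimal sketch with an arbitrary value $\pi = 0.3$ and a truncated sum; the function name `geometric_moments` is introduced only for this illustration.

```python
import math

def geometric_moments(pi, terms=500):
    """Mean and variance of f(y) = pi^y (1 - pi) by truncated summation."""
    pmf = [(pi ** y) * (1 - pi) for y in range(terms)]
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

pi = 0.3  # arbitrary value chosen for the check
mean, var = geometric_moments(pi)

# Closed forms from b'(theta) and b''(theta).
mean_formula = pi / (1 - pi)
var_formula = pi / (1 - pi) ** 2

# Canonical link: theta = log(mu / (1 + mu)) should recover log(pi).
theta_from_link = math.log(mean_formula / (1 + mean_formula))

print(mean, mean_formula)
print(var, var_formula)
print(theta_from_link, math.log(pi))
```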
(c) Consider a two-arm randomized clinical trial where we study remission of a disease. The patients are randomly assigned to a treatment (arm 1) or to a control (arm 2). The protocol of the trial says that each arm stops whenever a remission is observed.

i. (6 pts) Specify a probability model that would be appropriate for this study. Use the canonical link function.

The number of patients seen until the first remission follows a geometric distribution, with probability of an event different between the arms. Using $\mu = \pi / (1 - \pi)$, the canonical link is

$$\log\left(\frac{\mu}{1 + \mu}\right) = \log\left(\frac{\pi / (1 - \pi)}{1 / (1 - \pi)}\right) = \log \pi$$

so the model is

$$\log \pi = \beta_0 + \beta_1\, \text{trt}$$

where trt takes the value 1 for treatment, and 0 otherwise.

ii. (6 pts) Suppose that the control arm stops at the fifth patient, and the treatment arm stops at the third patient. Write the log-likelihood function that will be maximized to obtain parameter estimates.

The observed sequences of events are $\bar{R}, \bar{R}, \bar{R}, \bar{R}, R$ for the control, and $\bar{R}, \bar{R}, R$ for the treatment, where $R$ denotes a remission and $\bar{R}$ no remission. Using the exponential family representation in 3(a), the log-likelihood is

$$l(\beta_0, \beta_1) = \left[4\beta_0 + \log(1 - e^{\beta_0})\right] + \left[2(\beta_0 + \beta_1) + \log(1 - e^{\beta_0 + \beta_1})\right]$$
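Because each arm has its own $\pi$, the log-likelihood in (ii) can be maximized arm by arm: setting the derivative of $y \log \pi + \log(1 - \pi)$ to zero gives $\hat\pi = y / (y + 1)$. The sketch below, which goes one step beyond what the question asks, evaluates the log-likelihood and compares its value at this closed-form maximizer with nearby points.

```python
import math

def loglik(b0, b1):
    """Log-likelihood from 3(c)(ii): 4 non-remissions then a remission in the
    control arm, 2 non-remissions then a remission in the treatment arm."""
    control = 4 * b0 + math.log(1 - math.exp(b0))
    treatment = 2 * (b0 + b1) + math.log(1 - math.exp(b0 + b1))
    return control + treatment

# Per-arm maximizers pi = y / (y + 1), mapped to the log-link coefficients.
pi_control = 4 / 5
pi_treatment = 2 / 3
b0_hat = math.log(pi_control)
b1_hat = math.log(pi_treatment) - b0_hat

best = loglik(b0_hat, b1_hat)
# Small perturbations in each direction should not increase the log-likelihood.
neighbours = [loglik(b0_hat + d0, b1_hat + d1)
              for d0 in (-0.01, 0.0, 0.01) for d1 in (-0.01, 0.0, 0.01)]
print(round(b0_hat, 4), round(b1_hat, 4), round(best, 4))
```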
4. Researchers conducted a study to determine whether female breast cancer patients can be more accurately classified into subtypes using immunohistochemical (IH) examination. The study reported survival times (in months) of patients with negative and with positive outcomes of the exam. The data are shown below, where + indicates a censored observation.

Group | Survival times $t_i$ | $n$ | $\sum_{i=1}^{n} t_i$
IH- |
IH+ |

(a) The researchers would like to estimate the probability of survival for each group, without making any parametric distributional assumptions.

i. (6 pts) State the assumptions of the Kaplan-Meier model for survival, and report $\hat{S}_{KM}(38)$.

The survival function is $S(t) = c_i$, $t_i \le t < t_{i+1}$, for each observed event time $t_i$, where $1 \ge c_i \ge c_{i+1} \ge 0$. Its Kaplan-Meier estimator is

$$\hat{S}_{KM}(38) = \prod_{i: t_i \le t} \frac{n_i - d_i}{n_i} = (1 - 1/18)(1 - 1/17)(1 - 1/16)(1 - 1/13) = \ $$

ii. (6 pts) Report the 95% confidence interval of $\hat{S}_{KM}(38)$.

$$Var\{\log \hat{S}_{KM}(38)\} = \sum_{i: t_i \le t} \frac{d_i}{n_i (n_i - d_i)} = \ $$

Then the 95% confidence interval for $\log \hat{S}_{KM}(38)$ is

$$\log \hat{S}_{KM}(38) \pm 1.96 \sqrt{Var\{\log \hat{S}_{KM}(38)\}}, \quad [L, R] = [\ , -0.002]$$

and the approximate 95% confidence interval for $\hat{S}_{KM}(38)$ is $[e^L, e^R] = [0.5934, \ ]$.
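The product and the variance sum in 4(a) are fully determined by the risk-set sizes that appear in the solution, so they can be evaluated directly. The sketch below assumes one event at each of the four event times (as the factors $1 - 1/n_i$ indicate) and a normal quantile of 1.96.

```python
import math

# (number at risk n_i, number of events d_i) at each event time up to t = 38,
# read off the factors (1 - 1/18)(1 - 1/17)(1 - 1/16)(1 - 1/13).
risk_sets = [(18, 1), (17, 1), (16, 1), (13, 1)]

# Kaplan-Meier estimate: product of (n_i - d_i) / n_i.
s_km = math.prod((n - d) / n for n, d in risk_sets)

# Variance of log S_KM: sum of d_i / (n_i (n_i - d_i)).
var_log = sum(d / (n * (n - d)) for n, d in risk_sets)
se_log = math.sqrt(var_log)

z = 1.96
log_lower = math.log(s_km) - z * se_log
log_upper = math.log(s_km) + z * se_log
lower, upper = math.exp(log_lower), math.exp(log_upper)

print(round(s_km, 4), round(var_log, 4))
print(round(log_lower, 4), round(log_upper, 4))
print(round(lower, 4), round(upper, 4))
```

The lower endpoint on the probability scale agrees with the 0.5934 that survives in the transcription.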
(b) The researchers now assume that the survival function for each group follows an exponential distribution.

i. (6 pts) State the assumptions of the model, and report $\hat{S}_{exponential}(38)$.

The survival function is $S_{IH-}(t) = e^{-\lambda_{IH-} t}$, $S_{IH+}(t) = e^{-\lambda_{IH+} t}$, where $\lambda$ is the reciprocal of the expected survival time. The MLE of $\lambda$ is

$$\hat\lambda = \frac{\sum_{i=1}^{18} \delta_i}{\sum_{i=1}^{18} t_i} = \frac{10}{1319} = \ $$

and $\hat{S}_{exponential}(38) = e^{-38 \hat\lambda} = \ $, comparable to the K-M estimate.

ii. (6 pts) Report the 95% confidence interval of $\hat{S}_{exponential}(38)$. How does this confidence interval compare to the confidence interval in (a, ii)? Explain. (Hint: use the observed information to estimate $Var\{\hat{S}_{exponential}(38)\}$.)

$$SE\{\hat\lambda\} = \sqrt{\frac{\sum_{i=1}^{18} \delta_i}{\left(\sum_{i=1}^{18} t_i\right)^2}} = \frac{\sqrt{10}}{1319} = \ $$

$$SE[\log S(t)] = \sqrt{Var\{\log S(t)\}} = \sqrt{Var\{-\lambda t\}} = \sqrt{t^2\, Var\{\lambda\}} = t\, SE(\lambda) = \ $$

Then the 95% confidence interval for $\log \hat{S}_{exponential}(38)$ is

$$\log \hat{S}_{exponential}(38) \pm 1.96\, SE[\log S(38)], \quad [L, R] = [\ , \ ]$$

and the approximate 95% confidence interval for $\hat{S}_{exponential}(38)$ is $[e^L, e^R] = [0.6336, \ ]$.

The confidence interval is narrower due to the parametric nature of the estimation.
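The point estimate in (b)(i) depends only on the two totals quoted in the solution (10 events, 1319 months of follow-up), so it can be evaluated directly. The sketch below also computes the standard errors from the formulas in (b)(ii); it does not attempt to reproduce the interval endpoints.

```python
import math

events = 10        # sum of delta_i, as quoted in the solution
total_time = 1319  # sum of t_i, as quoted in the solution
t = 38

# MLE of the exponential rate and the implied survival probability at t.
lam_hat = events / total_time
s_exp = math.exp(-lam_hat * t)

# Standard errors from the observed information: SE(lambda) = sqrt(d) / sum(t_i).
se_lam = math.sqrt(events) / total_time
se_log_s = t * se_lam

print(round(lam_hat, 5), round(s_exp, 4))
print(round(se_lam, 5), round(se_log_s, 4))
```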
(c) To determine the effectiveness of the immunohistochemical examination, the researchers fit the model given in the partial output below. In the output, the variable ih=0 for the IH negative and ih=1 for the IH positive group.

Call:
survreg(formula = Surv(time, delta) ~ ih, data = X, dist = "exponential")
Value | Std. Error | z | p
(Intercept) e-54
ih e-01

Scale fixed at 1

Exponential distribution
Loglik(model)= Loglik(intercept only)=
Chisq= 0.62 on 1 degrees of freedom, p= 0.43

i. (6 pts) State the assumptions of the model, and interpret the parameters.

This is an accelerated failure time model, which assumes that the covariates have the effect of rescaling the time to event:

$$S(t) = S_0\left(t\, e^{-(\beta_0 + \beta_1\, ih)}\right)$$

where $S_0(t)$ is the survival function of the standard exponential random variable.

ii. (6 pts) Report the estimate of $\hat{S}(38)$ according to this model. Does the estimate agree with the result in (b)? Explain.

$$\hat{S}_{AF}(38) = \exp\left(-38\, e^{-\hat\beta_0}\right) = \ $$

Up to numeric rounding, $\hat{S}_{AF}(38) = \hat{S}_{exponential}(38)$. The two models specify the same likelihood functions for each of the groups, and therefore yield the same parameter estimates.
5. Consider a proportional odds model for an ordered multinomial response Y, as a function of the predictor variable X. For each question below, circle TRUE or FALSE, and provide the rationale.

(a) (6 pts) Suppose that the cumulative odds ratio of a particular category of Y for two levels of X is 3. If we invert the order of the response categories, then the odds ratio will be equal to 1/3. TRUE FALSE

True. The model is defined as

$$\log\left(\frac{P\{Y \le j\}}{1 - P\{Y \le j\}}\right) = \alpha_j + \beta x$$

The model in the new order of the response categories can be expressed in terms of the old order of the response categories as

$$\log\left(\frac{P\{Y > j\}}{1 - P\{Y > j\}}\right) = -\alpha_j - \beta x, \quad \text{i.e.} \quad \log\left(\frac{P\{Y \le j\}}{1 - P\{Y \le j\}}\right) = \alpha_j + \beta x$$

and the odds ratio is

$$OR = \frac{\text{odds } P\{Y > j \mid X + 1\}}{\text{odds } P\{Y > j \mid X\}} = e^{-\beta} = \frac{1}{e^{\beta}}$$

(b) (6 pts) A positive value of the parameter $\beta$ associated with the predictor X can be interpreted as the positive association between the predictor and the latent continuous variable underlying the response. TRUE FALSE

False. A positive parameter indicates a negative association. The model expresses the fact that an increase in X is associated with higher odds of lower categories of Y.
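The reciprocal relationship in 5(a) can be illustrated with a small numeric sketch. The intercept and slope below are hypothetical, chosen so that the cumulative odds ratio in the original ordering equals 3; the function name `cum_odds` is introduced only for this illustration.

```python
import math

def cum_odds(alpha_j, beta, x):
    """Odds of Y <= j under logit P(Y <= j) = alpha_j + beta * x."""
    return math.exp(alpha_j + beta * x)

# Hypothetical values; exp(beta) = 3 matches the odds ratio in the question.
alpha_j, beta = -0.5, math.log(3)

# Cumulative odds ratio for a one-unit increase in x, original ordering.
or_original = cum_odds(alpha_j, beta, 1) / cum_odds(alpha_j, beta, 0)

# After reversing the categories, the cumulative event is the old "Y > j",
# whose odds are the reciprocal of the odds of "Y <= j".
or_reversed = (1 / cum_odds(alpha_j, beta, 1)) / (1 / cum_odds(alpha_j, beta, 0))

print(or_original, or_reversed)
```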
STAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationSTAT 525 Fall Final exam. Tuesday December 14, 2010
STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationSTAC51: Categorical data Analysis
STAC51: Categorical data Analysis Mahinda Samarakoon January 26, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 32 Table of contents Contingency Tables 1 Contingency Tables Mahinda Samarakoon
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More informationMAS3301 / MAS8311 Biostatistics Part II: Survival
MAS3301 / MAS8311 Biostatistics Part II: Survival M. Farrow School of Mathematics and Statistics Newcastle University Semester 2, 2009-10 1 13 The Cox proportional hazards model 13.1 Introduction In the
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationChapter 2: Describing Contingency Tables - I
: Describing Contingency Tables - I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationGeneralized Linear Modeling - Logistic Regression
1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationBinomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials
Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =
More informationClinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,
More informationSTAT 705: Analysis of Contingency Tables
STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationSurvival Analysis I (CHL5209H)
Survival Analysis Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca January 7, 2015 31-1 Literature Clayton D & Hills M (1993): Statistical Models in Epidemiology. Not really
More information2 Describing Contingency Tables
2 Describing Contingency Tables I. Probability structure of a 2-way contingency table I.1 Contingency Tables X, Y : cat. var. Y usually random (except in a case-control study), response; X can be random
More informationStatistics 3858 : Contingency Tables
Statistics 3858 : Contingency Tables 1 Introduction Before proceeding with this topic the student should review generalized likelihood ratios ΛX) for multinomial distributions, its relation to Pearson
More informationADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables
ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationYou may use a calculator. Translation: Show all of your work; use a calculator only to do final calculations and/or to check your work.
GROUND RULES: Print your name at the top of this page. This is a closed-book and closed-notes exam. You may use a calculator. Translation: Show all of your work; use a calculator only to do final calculations
More informationYou know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?
You know I m not goin diss you on the internet Cause my mama taught me better than that I m a survivor (What?) I m not goin give up (What?) I m not goin stop (What?) I m goin work harder (What?) Sir David
More informationTypical Survival Data Arising From a Clinical Trial. Censoring. The Survivor Function. Mathematical Definitions Introduction
Outline CHL 5225H Advanced Statistical Methods for Clinical Trials: Survival Analysis Prof. Kevin E. Thorpe Defining Survival Data Mathematical Definitions Non-parametric Estimates of Survival Comparing
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationDepartment of Statistical Science FIRST YEAR EXAM - SPRING 2017
Department of Statistical Science Duke University FIRST YEAR EXAM - SPRING 017 Monday May 8th 017, 9:00 AM 1:00 PM NOTES: PLEASE READ CAREFULLY BEFORE BEGINNING EXAM! 1. Do not write solutions on the exam;
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationContingency Tables. Safety equipment in use Fatal Non-fatal Total. None 1, , ,128 Seat belt , ,878
Contingency Tables I. Definition & Examples. A) Contingency tables are tables where we are looking at two (or more - but we won t cover three or more way tables, it s way too complicated) factors, each
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationPh.D. course: Regression models. Introduction. 19 April 2012
Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 19 April 2012 www.biostat.ku.dk/~pka/regrmodels12 Per Kragh Andersen 1 Regression models The distribution of one outcome variable
More informationSurvival Analysis. STAT 526 Professor Olga Vitek
Survival Analysis STAT 526 Professor Olga Vitek May 4, 2011 9 Survival Data and Survival Functions Statistical analysis of time-to-event data Lifetime of machines and/or parts (called failure time analysis
More informationORF 245 Fundamentals of Engineering Statistics. Final Exam
Princeton University Department of Operations Research and Financial Engineering ORF 245 Fundamentals of Engineering Statistics Final Exam May 22, 2008 7:30pm-10:30pm PLEASE DO NOT TURN THIS PAGE AND START
More informationLoglinear models. STAT 526 Professor Olga Vitek
Loglinear models STAT 526 Professor Olga Vitek April 19, 2011 8 Can Use Poisson Likelihood To Model Both Poisson and Multinomial Counts 8-1 Recall: Poisson Distribution Probability distribution: Y - number
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationPh.D. course: Regression models. Regression models. Explanatory variables. Example 1.1: Body mass index and vitamin D status
Ph.D. course: Regression models Introduction PKA & LTS Sect. 1.1, 1.2, 1.4 25 April 2013 www.biostat.ku.dk/~pka/regrmodels13 Per Kragh Andersen Regression models The distribution of one outcome variable
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationGeneralized linear models III Log-linear and related models
Generalized linear models III Log-linear and related models Peter McCullagh Department of Statistics University of Chicago Polokwane, South Africa November 2013 Outline Log-linear models Binomial models
More informationLikelihoods for Generalized Linear Models
1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationStatistics 135 Fall 2008 Final Exam
Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations
More informationLecture 8: Summary Measures
Lecture 8: Summary Measures Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 8:
More informationMultinomial Regression Models
Multinomial Regression Models Objectives: Multinomial distribution and likelihood Ordinal data: Cumulative link models (POM). Ordinal data: Continuation models (CRM). 84 Heagerty, Bio/Stat 571 Models for
More informationProblem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56
STAT 391 - Spring Quarter 2017 - Midterm 1 - April 27, 2017 Name: Student ID Number: Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56 Directions. Read directions carefully and show all your
More informationIntroduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University
Introduction Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 56 Course logistics Let Y be a discrete
More informationYou may use your calculator and a single page of notes.
LAST NAME (Please Print): KEY FIRST NAME (Please Print): HONOR PLEDGE (Please Sign): Statistics 111 Midterm 4 This is a closed book exam. You may use your calculator and a single page of notes. The room
More information,..., θ(2),..., θ(n)
Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Thursday, August 30, 2018 Work all problems. 60 points are needed to pass at the Masters Level and 75
More informationReview of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models
Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationGeneralized Linear Models 1
Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationQualifying Exam in Probability and Statistics.
Part 1: Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationTMA 4275 Lifetime Analysis June 2004 Solution
TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,
More informationLecture 7 Time-dependent Covariates in Cox Regression
Lecture 7 Time-dependent Covariates in Cox Regression So far, we ve been considering the following Cox PH model: λ(t Z) = λ 0 (t) exp(β Z) = λ 0 (t) exp( β j Z j ) where β j is the parameter for the the
More informationREGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520
REGRESSION ANALYSIS FOR TIME-TO-EVENT DATA THE PROPORTIONAL HAZARDS (COX) MODEL ST520 Department of Statistics North Carolina State University Presented by: Butch Tsiatis, Department of Statistics, NCSU
More informationStat 642, Lecture notes for 04/12/05 96
Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal
More informationSurvival Analysis. Stat 526. April 13, 2018
Survival Analysis Stat 526 April 13, 2018 1 Functions of Survival Time Let T be the survival time for a subject Then P [T < 0] = 0 and T is a continuous random variable The Survival function is defined
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationTable of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).
Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,
More informationGeneral Regression Model
Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical
Qualifying Exam in Probability and Statistics. https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf
Part 1: Sample Problems for the Elementary Section of Qualifying Exam in Probability and Statistics https://www.soa.org/files/edu/edu-exam-p-sample-quest.pdf Part 2: Sample Problems for the Advanced Section
Lecture 8. Poisson models for counts
Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes
LOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
Figure 36: Respiratory infection versus time for the first 49 children.
BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects
ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let $\{\pi_1, \ldots, \pi_J\}$ denote the response probabilities
Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-PH) model
Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-PH) model $\lambda(t \mid Z) = \lambda_0(t) \exp\{\beta(t)' Z\}$, where $\beta(t)$ can be estimated by: piecewise constants (recall how);
Proportional hazards regression
Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression
Anders Skrondal. Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine. Based on joint work with Sophia Rabe-Hesketh
Constructing Latent Variable Models using Composite Links Anders Skrondal Norwegian Institute of Public Health London School of Hygiene and Tropical Medicine Based on joint work with Sophia Rabe-Hesketh
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs Motivating Example: The data we will be using comes from a subset of data taken from the Los Angeles Study of the Endometrial Cancer Data
UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
Chapter 20: Logistic regression for binary response variables
Chapter 20: Logistic regression for binary response variables In 1846, the Donner and Reed families left Illinois for California by covered wagon (87 people, 20 wagons). They attempted a new and untried
Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek
Binary Response: Logistic Regression STAT 526 Professor Olga Vitek March 29, 2011 4 Model Specification and Interpretation 4-1 Probability Distribution of a Binary Outcome Y In many situations, the response
Generalized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
Lecture 1. Introduction Statistics Statistical Methods II. Presented January 8, 2018
Introduction Statistics 211 - Statistical Methods II Presented January 8, 2018 linear models Dan Gillen Department of Statistics University of California, Irvine 1.1 Logistics and Contact Information Lectures:
Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression
Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about
Sections 4.1, 4.2, 4.3
Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear
Contingency Tables. Contingency tables are used when we want to look at two (or more) factors. Each factor might have two or more levels.
Contingency Tables Definition & Examples. Contingency tables are used when we want to look at two (or more) factors. Each factor might have two or more levels. (Using more than two factors gets complicated,
Describing Contingency tables
Today's topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
Homework 10 - Solution
STAT 526 - Spring 2011 Homework 10 - Solution Olga Vitek Each part of the problems 5 points 1. Faraway Ch. 4 problem 1 (page 93) : The dataset parstum contains cross-classified data on marijuana usage
Exam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
Lecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
Solution to Tutorial 7
ST3241 Categorical Data Analysis I, Semester II, 2012-2013. Solution to Tutorial 7. 1. (a) We first fit the independence model $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$, $i = 1, 2$, $j = 1, 2$. The parameter estimates are
Lecture 3. Truncation, length-bias and prevalence sampling
Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in the last two decades. Truncation in
Discrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
Logistic Regression. Advanced Methods for Data Analysis (36-402/36-608) Spring 2014
Logistic Regression Advanced Methods for Data Analysis (36-402/36-608) Spring 2014 Classification. Introduction to classification Classification, like regression, is a predictive task, but one in which the
Chapter 4 Regression Models
23 August 2010 Chapter 4 Regression Models The target variable T denotes failure time. We let $x = (x^{(1)}, \ldots, x^{(m)})$ represent a vector of available covariates. Also called regression variables, regressors,