STAT 526 Spring 2011 Midterm 1. Wednesday, February 2, 2011
- Julie Johnson
STAT 526 Spring 2011 Midterm 1
Wednesday February 2, 2011
Time: 2 hours

Name (please print):

Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will be deducted for false statements, even if the final answer is correct. Please circle your final answer where appropriate.

This exam is closed-book. You may consult two pages with your hand-written notes. Calculators are permitted.

Honor code: I promise not to cheat on this exam. I will neither give nor receive any unauthorized assistance. I will not share information about the exam with anyone who may be taking it at a different time. I have not been told anything about the exam by someone who has taken it earlier.

Signature:                Date:
Question    Possible Points    Actual Points
1. Researchers study the performance of nurse practitioners in three specialities (pediatrics, obstetrics and diabetes). They randomly selected 3 cities, and recorded competency scores of 4 nurses randomly selected within each speciality and each city. The scores are on a continuous scale, and the values are summarized below.

              City 1    City 2    City 3    Mean
Diabetes      [values missing]
Obstetrics    [values missing]
Pediatrics    [values missing]

(a) (6 pts) State the ANOVA model that is appropriate for these data, and the assumptions.

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk},$$

where
- $y_{ijk}$ is the score from speciality $i = 1, \ldots, 3$, city $j = 1, \ldots, 3$ and replicate $k = 1, \ldots, 4$
- $\mu$ is the overall expected value
- $\alpha_i$ is the deviation of the expected score of speciality $i$ from the overall mean, $\sum_{i=1}^{3} \alpha_i = 0$
- $\beta_j$ is the deviation of the expected score of city $j$ from the overall mean, $\beta_j \overset{iid}{\sim} N(0, \sigma^2_{\beta})$
- $(\alpha\beta)_{ij}$ is the non-additive deviation of speciality $i$ and city $j$, $(\alpha\beta)_{ij} \overset{iid}{\sim} N(0, \sigma^2_{\alpha\beta})$
- $\epsilon_{ijk}$ is the random error, $\epsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$
- $\beta_j$, $(\alpha\beta)_{ij}$, $\epsilon_{ijk}$ are independent

(b) (6 pts) Provide the estimates of the fixed effects of the model in the zero-sum model parametrization.

$\hat\alpha_1 =$ [missing] $=$ [missing]
$\hat\alpha_2 =$ [missing] $=$ [missing]
$\hat\alpha_3 =$ [missing] $=$ [missing]
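The exam's table of means did not survive transcription, so the following is only a sketch of how the fixed-effect estimates are obtained in a balanced design. The three speciality means are hypothetical, and treating diabetes as level 1 is an assumption based on the row order of the table. The baseline parametrization is included for comparison.

```r
# Hypothetical speciality means (NOT the exam's data, which is missing).
spec_means <- c(diabetes = 82, obstetrics = 74, pediatrics = 87)

# In a balanced design the overall mean is the plain average of the level means.
grand_mean <- mean(spec_means)

# Zero-sum parametrization: alpha_i = mean_i - overall mean, so the effects sum to 0.
alpha_zero_sum <- spec_means - grand_mean

# Baseline parametrization: alpha_1 = 0 and alpha_i = mean_i - mean_1.
alpha_baseline <- spec_means - spec_means[["diabetes"]]

print(alpha_zero_sum)
print(alpha_baseline)
```

The two sets of estimates differ only by a constant shift (the level-1 mean minus the overall mean), so every pairwise difference between specialities is the same under both parametrizations.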
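(c) (6 pts) Provide the estimates of the fixed effects of the model in the baseline model parametrization.

$\hat\alpha_1 = 0$
$\hat\alpha_2 =$ [missing] $=$ [missing]
$\hat\alpha_3 =$ [missing] $=$ [missing]

(d) (6 pts) Based on the R output below, estimate and interpret the variance components of the model.

> aov(score ~ spec*city, data=X)
Call:
   aov(formula = score ~ spec * city, data = X)

                  spec    city    spec:city    Residuals
Sum of Squares    [values missing]

The ANOVA table is

Source    df           MS           EMS
A         [missing]    [missing]    $nb \frac{\sum_i \alpha_i^2}{a-1} + n\sigma^2_{\alpha\beta} + \sigma^2$
B         [missing]    [missing]    $na\sigma^2_{\beta} + n\sigma^2_{\alpha\beta} + \sigma^2$
AB        [missing]    [missing]    $n\sigma^2_{\alpha\beta} + \sigma^2$
Error     [missing]    [missing]    $\sigma^2$

Therefore

$\hat\sigma^2 = MSE =$ [missing]
$\hat\sigma^2_{\alpha\beta} = \frac{MS(AB) - MSE}{n} = \frac{[\text{missing}]}{4} =$ [missing]
$\hat\sigma^2_{\beta} = \frac{MS(B) - MS(AB)}{na} = \frac{[\text{missing}]}{4 \cdot 3} =$ [missing]

The estimate of $\sigma^2_{\alpha\beta}$ is negative, therefore we set it to zero. The estimate of $\sigma^2_{\beta}$ is much smaller than the MSE. Therefore the between-city variation does not contribute substantially to the overall variation.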
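As a rough sketch of the method-of-moments step above, the helper below solves the expected-mean-square equations and truncates negative estimates at zero. The function name `variance_components` and the mean squares passed to it are hypothetical; the exam's actual mean squares are missing.

```r
# Method-of-moments estimates for a mixed model with fixed factor A (a levels),
# random factor B, and n replicates per cell. Negative estimates are set to zero.
variance_components <- function(ms_b, ms_ab, mse, n, a) {
  raw <- c(sigma2           = mse,
           sigma2_alphabeta = (ms_ab - mse) / n,
           sigma2_beta      = (ms_b - ms_ab) / (n * a))
  pmax(raw, 0)
}

# Hypothetical mean squares, chosen only to show a negative raw estimate.
est <- variance_components(ms_b = 30, ms_ab = 18, mse = 22, n = 4, a = 3)
print(est)
```

With these made-up inputs the interaction component is (18 - 22) / 4 = -1 before truncation, which mirrors the situation described in the solution.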
(e) (6 pts) The researchers decided to exclude city (both main effects and interactions) from the model. Use the new model to test whether there is a difference between the specialities. Use confidence level of 95%.

The model is

$$y_{ijk} = \mu + \alpha_i + \epsilon_{ijk}$$

The ANOVA table is

Source    df                 MS
A         [missing]          [missing]
Error     [missing] = 33     ([missing])/33 = [missing]

We test $H_0: \alpha_i = 0$ for all $i$, against $H_a: \alpha_i \neq 0$ for some $i$.

$$F = \frac{MS(A)}{MS(E)} = [\text{missing}] > F(2, 33) = [\text{missing}]$$

We reject $H_0$, and conclude that there is a difference between the specialities.

(f) (6 pts) Use the new model to provide the 95% CI for the difference of the expected scores of pediatrics and diabetes.

$$([\text{missing}] - [\text{missing}]) \pm t_{1-0.05/2}(33) \sqrt{MSE \left(\tfrac{1}{12} + \tfrac{1}{12}\right)} = [\text{missing}] \pm [\text{missing}] = ([\text{missing}], [\text{missing}])$$
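The pooling of the city terms into the error can be sketched as follows. All sums of squares and the difference of means are hypothetical stand-ins for the missing exam values; the degrees of freedom follow from the 3 x 3 x 4 design, and the interval assumes 12 nurses per speciality.

```r
# Hypothetical sums of squares from the two-way fit (the exam's values are missing).
ss <- c(spec = 400, city = 30, spec_city = 50, resid = 580)
df <- c(spec = 2,   city = 2,  spec_city = 4,  resid = 27)

# Dropping city pools its main effect and interaction into the error term.
pooled <- c("city", "spec_city", "resid")
ss_error <- sum(ss[pooled])
df_error <- sum(df[pooled])          # 2 + 4 + 27 = 33
mse <- ss_error / df_error

# F test for the speciality effect at the 5% level.
f_stat <- (ss[["spec"]] / df[["spec"]]) / mse
f_crit <- qf(0.95, df1 = df[["spec"]], df2 = df_error)
reject <- f_stat > f_crit

# 95% CI for a difference of two speciality means, each based on 12 nurses.
diff_means <- 5                      # hypothetical pediatrics minus diabetes
half_width <- qt(0.975, df_error) * sqrt(mse * (1 / 12 + 1 / 12))
ci <- diff_means + c(-1, 1) * half_width
print(c(F = f_stat, critical = f_crit))
print(ci)
```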
2. (6 pts) In genetics, when a gene has two different alleles A and a, each individual in a population must have one of three possible genotypes: AA, Aa and aa. If the alleles are passed independently from the two parents, and every parent has the same probability $\theta$ of passing the first allele to each offspring, then the probability distribution of the three genotypes is

Genotype       AA                    Aa                             aa
Probability    $\pi_1 = \theta^2$    $\pi_2 = 2\theta(1-\theta)$    $\pi_3 = (1-\theta)^2$

where $0 < \theta < 1$ and $\sum_{i=1}^{3} \pi_i = 1$. A random sample of $n = 100$ individuals is taken from this population, resulting in the following counts of individuals with each genotype:

Genotype           AA            Aa            aa           Total
Observed counts    $n_1 = 70$    $n_2 = 25$    $n_3 = 5$    100

Conduct a deviance goodness-of-fit test to determine whether the hypothesized form of $\pi_1$, $\pi_2$ and $\pi_3$ is appropriate for these data. State the null and the alternative hypotheses, the test statistic, and your conclusion at the confidence level of 95%.

The likelihood and the log-likelihood are

$$L(\theta) = C_1 [\theta^2]^{70} [2\theta(1-\theta)]^{25} [(1-\theta)^2]^{5}$$
$$l(\theta) = C_2 + 140 \log(\theta) + 25 \log(\theta) + 25 \log(1-\theta) + 10 \log(1-\theta)$$

The derivative is

$$u(\theta) = \frac{\partial l(\theta)}{\partial \theta} = \frac{140}{\theta} + \frac{25}{\theta} - \frac{25}{1-\theta} - \frac{10}{1-\theta}$$

Solving $u(\theta) = 0$,

$$\frac{165 - 200\theta}{\theta(1-\theta)} = 0, \qquad \hat\theta = \frac{165}{200} = 0.825$$

Testing $H_0$: $\pi_1, \pi_2, \pi_3$ are as specified vs $H_a$: $\pi_1, \pi_2, \pi_3$ unrelated, $\sum_{i=1}^{3} \pi_i = 1$, using the deviance test:

$$G^2 = 2 \sum_{i=1}^{3} n_i \log(n_i / \mu_i) = 2 \left[ 70 \log\frac{70}{100 \cdot 0.825^2} + 25 \log\frac{25}{100 \cdot 2 \cdot 0.825 \cdot 0.175} + 5 \log\frac{5}{100 \cdot 0.175^2} \right] \approx 1.63 < \chi^2_{2-1}(1 - 0.05) = 3.84$$

Therefore we fail to reject $H_0$, and conclude that the specified model has a good fit.
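The arithmetic of this solution can be checked with a few lines of R. This is a sketch using only the counts stated in the problem; the variable names are illustrative.

```r
# Observed genotype counts from the problem statement.
n <- c(AA = 70, Aa = 25, aa = 5)
N <- sum(n)

# MLE of theta: each AA individual carries two copies of allele A, each Aa one copy.
theta_hat <- (2 * n[["AA"]] + n[["Aa"]]) / (2 * N)

# Fitted counts under the hypothesized genotype probabilities.
mu <- N * c(theta_hat^2, 2 * theta_hat * (1 - theta_hat), (1 - theta_hat)^2)

# Deviance statistic; reference distribution has (3 - 1) - 1 = 1 degree of freedom.
g2 <- 2 * sum(n * log(n / mu))
crit <- qchisq(0.95, df = 1)

print(c(theta_hat = theta_hat, G2 = g2, critical = crit))
```

The single degree of freedom is the two free cell probabilities of the unrestricted multinomial minus the one parameter $\theta$ estimated under the null.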
3. Investigators would like to establish whether a genetic fingerprint technique (called polymerase chain reaction, PCR) can be used as a tool for diagnostics of relapse status of acute lymphoblastic leukemia. PCR was performed on the bone marrow of 178 children who were currently in remission. Results of the study are tabulated as follows:

                    Relapse status
PCR status          Yes    No     Total
Traces of cancer    30     45     75
Cancer free         8      95     103
Total               38     140    178

(a) (6 pts) Test whether the probability of relapse is different among children with and without traces of cancer, by comparing proportions. State the null and the alternative hypotheses, the non-pooled test statistic, and your conclusion at the confidence level of 95%.

Denote $\pi_1 = P\{\text{Relapse} \mid \text{Traces of cancer}\}$ and $\pi_2 = P\{\text{Relapse} \mid \text{Cancer free}\}$. We test $H_0: \pi_1 - \pi_2 = 0$ vs $H_a: \pi_1 - \pi_2 \neq 0$.

$$\hat\pi_1 = 30/75 = 0.4, \qquad \hat\pi_2 = 8/103 \approx 0.078$$

The test statistic is

$$T = \frac{\hat\pi_1 - \hat\pi_2}{\sqrt{\frac{\hat\pi_1 (1 - \hat\pi_1)}{75} + \frac{\hat\pi_2 (1 - \hat\pi_2)}{103}}} \approx 5.16 > z_{1-0.05/2} = 1.96$$

We reject $H_0$ and conclude that the relapse rate is significantly different for the two outcomes of PCR.

(b) (6 pts) Estimate the odds ratio of relapse and its 95% confidence interval, and interpret the result.

The odds ratio $\theta$, and the estimated SE of $\log(\hat\theta)$ are

$$\hat\theta = \frac{n_{11} n_{22}}{n_{12} n_{21}} = \frac{30 \cdot 95}{45 \cdot 8} \approx 7.92$$
$$s(\log(\hat\theta)) = \left[ \frac{1}{30} + \frac{1}{45} + \frac{1}{8} + \frac{1}{95} \right]^{1/2} \approx 0.437$$

The 95% CI of $\log(\theta)$ is $\log(\hat\theta) \pm z_{1-0.05/2}\, s(\log(\hat\theta))$, i.e. $(1.212, 2.926)$. On the scale of the odds ratio, the CI is $(e^{1.212}, e^{2.926}) \approx (3.36, 18.65)$. The CI does not contain 1, i.e. at the confidence level of 95%, the odds of relapse are significantly higher for patients with traces of cancer.
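A short sketch of both calculations in R follows. The 2 x 2 table is rebuilt from the proportions 30/75 and 8/103 quoted in the solution; the object names are illustrative.

```r
# 2 x 2 table rebuilt from the quoted proportions 30/75 and 8/103.
tab <- matrix(c(30, 45,
                 8, 95),
              nrow = 2, byrow = TRUE,
              dimnames = list(PCR = c("traces", "free"), relapse = c("yes", "no")))

# Relapse proportion within each PCR group.
p <- tab[, "yes"] / rowSums(tab)

# Non-pooled (Wald) z statistic for H0: pi1 - pi2 = 0.
se_diff <- sqrt(sum(p * (1 - p) / rowSums(tab)))
z_stat <- (p[["traces"]] - p[["free"]]) / se_diff

# Odds ratio with a Wald interval built on the log scale.
or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
se_log_or <- sqrt(sum(1 / tab))
ci_or <- exp(log(or_hat) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(c(z = z_stat, OR = or_hat, lower = ci_or[1], upper = ci_or[2]))
```

The interval is built on the log scale because the sampling distribution of the log odds ratio is closer to normal than that of the odds ratio itself.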
(c) (6 pts) The Pearson standardized residuals are given in the table below. What are your conclusions from this table?

$r_{ij}$            Relapse status
PCR status          Yes          No
Traces of cancer    [missing]    [missing]
Cancer free         [missing]    [missing]

The residuals are larger in absolute value than $z_{1-0.05/2} = 1.96$. Therefore the cells show a greater discrepancy between the observed cell counts and the cell counts predicted under independence than would be expected by chance. This indicates that the hypothesis of independence is not appropriate.

(d) (6 pts) Estimate the sensitivity and the specificity of the PCR test.

$$\text{Sensitivity} = P\{PCR = \text{Yes} \mid \text{Relapse} = \text{Yes}\} = \frac{30}{38} \approx 0.789$$
$$\text{Specificity} = P\{PCR = \text{No} \mid \text{Relapse} = \text{No}\} = \frac{95}{140} \approx 0.679$$
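The residual table itself is missing from the transcription, but standardized Pearson residuals can be recomputed from the counts. The sketch below assumes the usual definition, (observed - expected) / sqrt(expected x (1 - row proportion) x (1 - column proportion)), and also evaluates the sensitivity and specificity.

```r
tab <- matrix(c(30, 45,
                 8, 95),
              nrow = 2, byrow = TRUE,
              dimnames = list(PCR = c("traces", "free"), relapse = c("yes", "no")))
N <- sum(tab)

# Expected counts under independence of PCR status and relapse.
expected <- outer(rowSums(tab), colSums(tab)) / N

# Standardized Pearson residuals (approximately N(0, 1) under independence).
std_resid <- (tab - expected) /
  sqrt(expected * outer(1 - rowSums(tab) / N, 1 - colSums(tab) / N))

# Sensitivity and specificity condition on the true relapse status (columns).
sensitivity <- tab["traces", "yes"] / sum(tab[, "yes"])
specificity <- tab["free", "no"] / sum(tab[, "no"])

print(round(std_resid, 2))
print(c(sensitivity = sensitivity, specificity = specificity))
```

In a 2 x 2 table all four standardized residuals share the same absolute value, so a single number summarizes the departure from independence. The `stdres` component returned by `chisq.test(tab, correct = FALSE)` should give the same matrix.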
4. Researchers conduct a retrospective case-control study of lung cancer, comparing the smoking habits (# of cigarettes/day) of individuals with and without the disease. The data are summarized as follows:

# cigarettes per day    # cases    # controls    Total
[rows missing]
Total

The R output at the end of the exam presents the results of three models fit to these data.

(a) (6 pts) Consider Model 1. State the model and the assumptions.

Denote $X$ the number of cigarettes, and $Y$ the disease status. Then

$$Y_i \sim \text{Binomial}(\pi_i), \quad \text{where} \quad \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 I_{X = 1\text{-}14} + \beta_2 I_{X = 15\text{-}24} + \beta_3 I_{X = 25\text{-}49} + \beta_4 I_{X = 50+}$$

(b) (6 pts) Based on the output of Model 1, obtain the estimated odds ratio of lung cancer of subjects who smoke more than 50 cigarettes a day, and those who smoke 1-14 cigarettes a day.

$$\log(OR) = \log \frac{P\{Y = 1 \mid X = 50+\} / P\{Y = 0 \mid X = 50+\}}{P\{Y = 1 \mid X = 1\text{-}14\} / P\{Y = 0 \mid X = 1\text{-}14\}} = (\beta_0 + \beta_4) - (\beta_0 + \beta_1) = \beta_4 - \beta_1$$

$$\log(\widehat{OR}) = \hat\beta_4 - \hat\beta_1 = [\text{missing}] - [\text{missing}] = 1.4033$$

Therefore $\widehat{OR} = \exp(\hat\beta_4 - \hat\beta_1) = \exp(1.4033) \approx 4.07$.
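The contrast $\hat\beta_4 - \hat\beta_1$ can be read off a fitted model as sketched below. The counts are hypothetical (the exam's data table is missing), as is the assumption that the baseline category is "0" cigarettes; only the final line uses a number from the exam.

```r
# Hypothetical case-control counts; NOT the exam's data.
lc <- data.frame(
  cigarettes = factor(c("0", "1-14", "15-24", "25-49", "50+"),
                      levels = c("0", "1-14", "15-24", "25-49", "50+")),
  cases    = c(10, 40, 120, 100, 30),
  controls = c(60, 90, 130, 80, 10)
)

# Saturated logistic model with one indicator per non-baseline category.
fit1 <- glm(cbind(cases, controls) ~ cigarettes, data = lc, family = binomial())
b <- coef(fit1)

# The intercept cancels in the difference of the two log odds.
log_or <- b[["cigarettes50+"]] - b[["cigarettes1-14"]]
or_hat <- exp(log_or)

# In the saturated model this equals the empirical odds ratio of the two rows.
or_empirical <- (30 / 10) / (40 / 90)
print(c(model = or_hat, empirical = or_empirical))

# The exam's own estimate, from the quoted log odds ratio.
print(exp(1.4033))
```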
(c) (6 pts) Based on the output of Model 1, obtain a 95% CI for the odds ratio above.

On the $\log(OR)$ scale:

$$Var\{\log(\widehat{OR})\} = Var\{\hat\beta_4 - \hat\beta_1\} = Var\{\hat\beta_4\} + Var\{\hat\beta_1\} - 2\, Cov\{\hat\beta_4, \hat\beta_1\} = [\text{missing}]$$

The CI for $\log(OR)$ is

$$1.4033 \pm z_{1-0.05/2} \sqrt{[\text{missing}]} = 1.4033 \pm [\text{missing}] = ([\text{missing}], [\text{missing}])$$

The CI for the OR is $(\exp([\text{missing}]), \exp([\text{missing}])) = ([\text{missing}], [\text{missing}])$.

(d) (6 pts) Consider Model 2. State the model and the assumption. State whether you prefer Model 1 or Model 2, and why.

Denote $X$ the score indicating the number of cigarettes per day. Then

$$Y_i \sim \text{Binomial}(\pi_i), \quad \text{where} \quad \log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i$$

Based on AIC, Model 1 works best. However, Model 1 is a saturated model which is likely to overfit the data, and it does not account for the ordinal nature of the predictor. Therefore we prefer Model 2. The residual deviance is [missing] $\gg 3$ (its degrees of freedom), indicating that there is either an insufficient quality of fit for the expected value, or presence of overdispersion.

(e) (6 pts) Consider Model 3. State the model and the assumption. State whether you prefer Model 2 or Model 3, and why.

$$Y_i \sim \text{Quasibinomial}(\pi_i), \quad \text{where} \quad E\{Y_i\} = \pi_i, \quad Var\{Y_i\} = \sigma^2 \pi_i (1 - \pi_i)$$

and

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i$$

The model accounts for the insufficient quality of fit by overdispersion. If we want to specify a linear relationship between $\text{logit}(\pi)$ and $X$, then Model 3 is preferred.
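The two computations in parts (c) to (e) can be sketched on made-up data. The counts and the scores 0 to 4 are hypothetical, since the exam's values are missing; the contrast vector assumes the coefficient order intercept, 1-14, 15-24, 25-49, 50+.

```r
# Hypothetical data; NOT the exam's table.
lc <- data.frame(
  cigarettes = factor(c("0", "1-14", "15-24", "25-49", "50+"),
                      levels = c("0", "1-14", "15-24", "25-49", "50+")),
  score    = 0:4,
  cases    = c(10, 40, 120, 100, 30),
  controls = c(60, 90, 130, 80, 10)
)

# Part (c): Wald CI for beta_4 - beta_1 through a contrast vector.
fit1 <- glm(cbind(cases, controls) ~ cigarettes, data = lc, family = binomial())
contrast <- c(0, -1, 0, 0, 1)
log_or <- sum(contrast * coef(fit1))
var_log_or <- drop(t(contrast) %*% vcov(fit1) %*% contrast)
ci_or <- exp(log_or + c(-1, 1) * qnorm(0.975) * sqrt(var_log_or))

# Parts (d)-(e): the same linear-in-score model, with and without a dispersion parameter.
fit2 <- glm(cbind(cases, controls) ~ score, data = lc, family = binomial())
fit3 <- glm(cbind(cases, controls) ~ score, data = lc, family = quasibinomial())

# The quasibinomial dispersion estimate is the Pearson chi-square over residual df.
dispersion <- summary(fit3)$dispersion
pearson_disp <- sum(residuals(fit2, type = "pearson")^2) / df.residual(fit2)

print(ci_or)
print(c(dispersion = dispersion, pearson = pearson_disp))
```

Both fits return the same coefficients; the quasibinomial fit only inflates the standard errors by the square root of the dispersion estimate, which is why Model 3 leaves the point estimates of Model 2 unchanged.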
5. Consider a generalized linear model for the expected value of a binomial response $Y$, as a function of the predictor variable $X$. For each question below, circle TRUE or FALSE, and provide the rationale.

(a) (6 pts) Suppose that we would like to use the identity link function. Then the least squares estimates of model parameters will be identical to the maximum likelihood estimates.

TRUE    FALSE

False. The least squares parameter estimates are identical to the maximum likelihood estimates for the Normal distribution of $Y$. However, in this case the distribution of $Y$ is Binomial, and the likelihood and the resulting parameter estimates will differ.

(b) (6 pts) Suppose that we would like to model the data from a retrospective case-control type study design. The logistic link is the only link function that yields the same estimate and the same interpretation of the parameter associated with $X$ as in the prospective study.

TRUE    FALSE

True. The logistic link function is the only function that allows us to interpret the parameters as log(odds ratio), and the odds ratio is the same for both prospective and retrospective designs.
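The invariance claimed in part (b) can be illustrated with plain arithmetic on a 2 x 2 table. The population counts and the 1% sampling fraction for non-cases are hypothetical, and the helper names `odds_ratio` and `risk_ratio` are illustrative.

```r
# Hypothetical population counts: rows = exposure, columns = disease status.
pop <- matrix(c(200,  9800,
                100, 19900),
              nrow = 2, byrow = TRUE,
              dimnames = list(exposed = c("yes", "no"), disease = c("yes", "no")))

odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
risk_ratio <- function(tab) (tab[1, 1] / sum(tab[1, ])) / (tab[2, 1] / sum(tab[2, ]))

# Case-control sampling: keep every case but only 1% of the non-cases.
cc <- pop
cc[, "no"] <- pop[, "no"] * 0.01

print(c(OR_population = odds_ratio(pop), OR_case_control = odds_ratio(cc)))
print(c(RR_population = risk_ratio(pop), RR_case_control = risk_ratio(cc)))
```

The sampling fraction multiplies one whole column, so it cancels in the cross-product ratio but not in the ratio of row proportions. That is why the odds ratio, and with it the logistic slope, keeps its interpretation under retrospective sampling while the risk ratio does not.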
(c) (6 pts) Suppose that we have ungrouped data, and would like to evaluate the quality of model fit using the deviance test. Under the null hypothesis that the model of interest holds, the deviance test statistic approaches a $\chi^2$ distribution as the sample size increases.

TRUE    FALSE

False. The number of parameters under the alternative (saturated) model grows with the sample size, and therefore the asymptotic theory does not hold.

(d) (6 pts) The value reported for the deviance of the model depends on whether the data are entered as grouped (i.e. reporting the number of successes and the number of failures for each value of $X$), or as individual Bernoulli observations. However, the difference between the deviances of two unsaturated models does not depend on the form of the data entry.

TRUE    FALSE

True. The log-likelihood of unsaturated models does not depend on the form of the data, up to an additive constant. The log-likelihood of the saturated model depends on the form of the data; however, it cancels out when we compare two unsaturated models.
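The statement in part (d) can be checked numerically. The sketch below fits the same pair of nested models to a small hypothetical data set entered both ways; all counts are made up for illustration.

```r
# Hypothetical grouped data: successes and failures at each value of x.
grouped <- data.frame(x = c(0, 1, 2, 3),
                      successes = c(2, 4, 7, 9),
                      failures  = c(8, 6, 3, 1))

# The same data expanded to one Bernoulli row per observation.
ungrouped <- data.frame(
  x = rep(rep(grouped$x, 2), c(grouped$successes, grouped$failures)),
  y = rep(c(1, 0), c(sum(grouped$successes), sum(grouped$failures)))
)

# Null and linear-logit models under both forms of data entry.
g0 <- glm(cbind(successes, failures) ~ 1, data = grouped, family = binomial())
g1 <- glm(cbind(successes, failures) ~ x, data = grouped, family = binomial())
u0 <- glm(y ~ 1, data = ungrouped, family = binomial())
u1 <- glm(y ~ x, data = ungrouped, family = binomial())

# The reported deviances differ between the two forms of entry ...
print(c(grouped = deviance(g1), ungrouped = deviance(u1)))

# ... but the drop in deviance between the nested models does not.
drop_grouped <- deviance(g0) - deviance(g1)
drop_ungrouped <- deviance(u0) - deviance(u1)
print(c(grouped = drop_grouped, ungrouped = drop_ungrouped))
```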
Problem 4

Model 1

> lc
  cigarettes score cases controls
  [data rows missing]
> lc$cigarettesf <- factor(lc$cigarettes, levels=levels(lc$cigarettes))
> lc.fit1 <- glm(cbind(cases,controls) ~ cigarettesf, data=lc, family=binomial())
> summary(lc.fit1)

Call:
glm(formula = cbind(cases, controls) ~ cigarettesf, family = binomial(),
    data = lc)

Deviance Residuals:
[1] [values missing]

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)        [values missing]    e-08 ***
cigarettesf1-14    [values missing]    e-06 ***
cigarettesf15-24   [values missing]    e-08 ***
cigarettesf25-49   [values missing]    e-12 ***
cigarettesf50+     [values missing]    e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: [missing]e+02  on 4  degrees of freedom
Residual deviance: [missing]e-15  on 0  degrees of freedom
AIC: [missing]

> summary(lc.fit1)$cov.unscaled
                   (Intercept) cigarettesf1-14 cigarettesf15-24
(Intercept)        [values missing]
cigarettesf1-14    [values missing]
cigarettesf15-24   [values missing]
cigarettesf25-49   [values missing]
cigarettesf50+     [values missing]
                   cigarettesf25-49 cigarettesf50+
(Intercept)        [values missing]
cigarettesf1-14    [values missing]
cigarettesf15-24   [values missing]
cigarettesf25-49   [values missing]
cigarettesf50+     [values missing]

> predict(lc.fit1, type="link")
[values missing]
> predict(lc.fit1, type="response")
[values missing]
Model 2

> fit2 <- glm(cbind(cases,controls) ~ score, data=lc, family=binomial())
> summary(fit2)

Call:
glm(formula = cbind(cases, controls) ~ score, family = binomial(),
    data = lc)

Deviance Residuals:
[values missing]

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   [values missing]    <2e-16 ***
score         [values missing]    <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: [missing]  on 4  degrees of freedom
Residual deviance: [missing]  on 3  degrees of freedom
AIC: [missing]

Model 3

> fit3 <- glm(cbind(cases,controls) ~ score, data=lc, family=quasibinomial())
> summary(fit3)

Call:
glm(formula = cbind(cases, controls) ~ score, family = quasibinomial(),
    data = lc)

Deviance Residuals:
[values missing]

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   [values missing]    *
score         [values missing]    *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasibinomial family taken to be [missing])

    Null deviance: [missing]  on 4  degrees of freedom
Residual deviance: [missing]  on 3  degrees of freedom
AIC: NA
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationThe material for categorical data follows Agresti closely.
Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical
More informationFinal Exam. Name: Solution:
Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.
More information36-463/663: Multilevel & Hierarchical Models
36-463/663: Multilevel & Hierarchical Models (P)review: in-class midterm Brian Junker 132E Baker Hall brian@stat.cmu.edu 1 In-class midterm Closed book, closed notes, closed electronics (otherwise I have
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationMath 3330: Solution to midterm Exam
Math 3330: Solution to midterm Exam Question 1: (14 marks) Suppose the regression model is y i = β 0 + β 1 x i + ε i, i = 1,, n, where ε i are iid Normal distribution N(0, σ 2 ). a. (2 marks) Compute the
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More information22s:152 Applied Linear Regression. Take random samples from each of m populations.
22s:152 Applied Linear Regression Chapter 8: ANOVA NOTE: We will meet in the lab on Monday October 10. One-way ANOVA Focuses on testing for differences among group means. Take random samples from each
More informationBIOSTATS Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS
BIOSTATS 640 - Intermediate Biostatistics Spring 2017 Exam 2 (Units 3, 4 & 5) Practice Problems SOLUTIONS Practice Question 1 Both the Binomial and Poisson distributions have been used to model the quantal
More informationMSH3 Generalized linear model Ch. 6 Count data models
Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationLecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015
Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 1 Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits.
More informationSTAT 510 Final Exam Spring 2015
STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and
More informationRandom and Mixed Effects Models - Part II
Random and Mixed Effects Models - Part II Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Two-Factor Random Effects Model Example: Miles per Gallon (Neter, Kutner, Nachtsheim, & Wasserman, problem
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More informationST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples
ST3241 Categorical Data Analysis I Logistic Regression An Introduction and Some Examples 1 Business Applications Example Applications The probability that a subject pays a bill on time may use predictors
More informationSections 4.1, 4.2, 4.3
Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear
More informationPoisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Poisson Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Poisson Regression 1 / 49 Poisson Regression 1 Introduction
More informationLecture 5: ANOVA and Correlation
Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions
More informationStatistics 512: Applied Linear Models. Topic 9
Topic Overview Statistics 51: Applied Linear Models Topic 9 This topic will cover Random vs. Fixed Effects Using E(MS) to obtain appropriate tests in a Random or Mixed Effects Model. Chapter 5: One-way
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationRegression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.
Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression
More informationLecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2
Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2 Fall, 2013 Page 1 Random Variable and Probability Distribution Discrete random variable Y : Finite possible values {y
More informationMath 152. Rumbos Fall Solutions to Exam #2
Math 152. Rumbos Fall 2009 1 Solutions to Exam #2 1. Define the following terms: (a) Significance level of a hypothesis test. Answer: The significance level, α, of a hypothesis test is the largest probability
More informationTwo Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationLecture 01: Introduction
Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction
More information