Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
|
|
- Russell Hill
- 5 years ago
- Views:
Transcription
1 Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability measures how probable it is that D will occur rather than some other data value. After the experiment, data D are known, and P (D H), with D fixed and H varying, can be regarded as a measure of how likely it is that H is the true hypothesis, rather than another, after observing data D. When P (D H) is regarded as a function of H rather than D we call P (D H) the likelihood of H (if considering a single H) or the likelihood function (if considering all possible H). We regard H 1 as more compatible with the data than H 2 if P (D H 1 ) > P (D H 2 ). Usually H specifies the value of a parameter, or set of parameters, and the likelihood is the probability (or probability density) for the data values, considered as a function of the parameters of the distribution. It is more convenient to work with the (natural) log of the likelihood function than the likelihood itself. When comparing hypotheses or parameter values, only relative values of likelihood are important (ratios of likelihoods, or differences between log-likelihoods). Any multiplicative factor in the likelihood (or additive term in the log-likelihood) that does not include the parameter can be ignored. See examples below. The laws of probability apply when we regard P (D H) as a function of D. They do not apply when P (D H) is regarded as a function of H. Binomial distribution Imagine we perform a simple experiment (e.g. tossing a coin) n times, and each trial results in success (with probability p), or failure (with probability 1 p). Each success contributes a factor p to the likelihood, and each failure contributes a factor (1 p). The corresponding (additive) contributions to the log-likelihood are log p and log(1 p). If there are Y successes and n Y failures, the log likelihood is Y log p + (n Y ) log(1 p). The maximum value of the (log) likelihood occurs when p = Y/n, which we call the maximum likelihood or ML estimate of p. Poisson distribution If Y is Poisson with mean m, the log-likelihood for m is Y log m m. The ML estimate of m is Y. If we have a sample of size n from this distn, the log-likelihood is the sum of contributions from each sample value, ( Y i ) log m nm, and the ML estimate of m is Ȳ = ( Y i )/n. Multinomial distribution This is a generalization of the binomial distribution, in which each trial has k possible outcomes, with probabilities p 1 p k. In n independent trials the first outcome occurs Y 1 times, the second Y 2 times, etc. The binomial distn is the case k = 2. By an argument analogous to that used above for the binomial distribution, the log-likelihood is Y 1 log p 1 + Y 2 log p Y k log p k. With no restrictions on the probabilities, apart from satisfying p p k = 1, the ML estimate of p i is Y i /n, for i = 1 to k. Multinomial applications of ML are more interesting when a null hypothesis imposes constraints on the probabilities. See below. 7
2 Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, 2 log L = n log υ + (1/υ)[S yy + n(ȳ m)2 ] The final term is minimized when m = Ȳ, so the ML estimate of m is the sample mean. Substituting this value for m leads to the profile likelihood for υ, n log υ + S yy /υ which is minimized when υ = S yy /n. The ML estimator in this case is biased, and this is generally true of ML estimators of variance components. The following argument is often used when estimating variance components. The value of S yy gives us information about υ, and if m is known, so does (Ȳ m)2. But if, as is usually the case, m is unknown, the information in (Ȳ m)2 is not available. Maximum likelihood estimation of υ should therefore be based on the probability distn of S yy alone (rather than the joint distn of S yy and Ȳ ). This argument leads to a modified (restricted, or residual) likelihood with 2 log L = (n 1) log υ + S yy /υ (based on the fact that the distn of S yy /υ is chi-squared with n 1 d.f.) The estimate obtained by maximizing the modified likelihood is called the REML estimate. Here the REML estimate is the sample variance. In previous examples the estimators could have been guessed without invoking ML. In the following examples the estimator is less obvious. Genetic linkage The progeny of two maize plants are categorized as starchy or sugary, with leaves which are either green or white. The combinations, starchy-green, starchy-white, sugary-green and sugary-white occur with probabilities (2 + θ), (1 θ), (1 θ), θ (Starchiness and leaf colour are determined at two genetic loci. The parameter θ is a measure of linkage between the two loci.) With frequencies Y 1 Y in the categories, the log-likelihood is const + Y 1 log(2 + θ) + (Y 2 + Y 3 ) log(1 θ) + Y log θ. The log-likelihood shown in slide... is based on the following data for 3839 progeny: Starchy Sugary Green White Green White Total The ML estimate of θ is ± See Q3 on the problem sheet. The standard error is calculated from the curvature of logl at the maximum. Usually there is no explicit expression for the ML estimate, and it has to be found by numerical optimization. This is an iterative process, based on successive refinements of an initial guess. 8
3 ABO blood group system In the ABO system, three alleles (A, B and O) at a single locus give rise to six genotypes and four phenotypes. Unknown frequencies for A, B, and O alleles are p, q and r (p + q + r = 1). Phenotype Genotype(s) Probability Observed frequency A AA, AO p 2 + 2pr n A B BB, BO q 2 + 2qr n B AB AB 2pq n AB O OO r 2 n O After simplification, the log-likelihood is (n A + n AB ) log p + (n B + n AB ) log q + n A log(p + 2r) + n B log(q + 2r) + 2n O log r This can be expressed as a function of just two parameters (p and q for example) by setting r = 1 p q. See question on the problem sheet. Likelihood ratio test The likelihood give us a general method for finding estimators of unknown parameters. It also provides a statistic for hypothesis testing. Suppose a model specifies that parameter vector ψ belongs to (lives in) a d dimensional space, so that there are d free parameters. For example, if we have multinomial data with k probabilities which sum to 1, d = k 1. The null hypothesis H 0 specifies that ψ is restricted to a subspace of dimension s. In other words, H 0 expresses ψ in terms of s < d free parameters (s = 0 when the vector of probabilities is completely specified by H 0 ). Denote by l c and l u the maximized log-likelihood with and without constraints. The likelihood ratio test statistic is 2(l u l c ). In large samples, the null distn of the LRT statistic is approximately chi-squared with d s d.f. For example, suppose we have sample counts of genotypes A 1 A 1, A 1 A 2, and A 2 A 2. In the population, genotype probabilities (or proportions) are p 1, p 2, and p 3. The vector parameter is ψ = (p 1, p 2, p 3 ). Because p 1 + p 2 + p 3 = 1, the number of free parameters is d = 3 1 = 2. The hypothesis of Hardy-Weinberg equilibrium states that p 1 = θ 2, p 2 = 2θ(1 θ), p 3 = (1 θ) 2, for some unspecified value of θ. Here d = 2, s = 1, d s = 1. A trivial example Take the unconstrained model to be that for the genetic linkage example above, and the constrained model to be H 0 : θ = θ 0, where θ 0 is a single specified value. Here d = 1, s = 0. In the figure on slide..., l u is the maximum value of the log-likelihood, and l c is the value at θ 0. The LRT statistic is twice the difference between these values. The null distn is chi-squared with d s = 1 d.f. H 0 is rejected at the 5% level if 2(l u l c ) > 3.8 The confidence interval shown in slide... is obtained as the set of all θ 0 values not rejected by this test. These satisfy the condition l c > l u In the figure, the arbitrary constant in logl has been chosen so that the maximum value of logl is zero. 9
4 Binomial regression For i = 1... k, Y i is binomial with index n i and parameter p i. Associated with each observation Y i is the value of an explanatory variable X i : Generalized regression model for p i : ( ) pi log = b 0 + b 1 X i ( logit transform ) 1 p i which resembles a standard regression equation. The log-likelihood for parameters b 0 and b 1 (details omitted) is [Yi log p i + (n i Y i ) log(1 p i )] where p i is given by the regression equation above, p i = exp(b 0 + b 1 X i ) 1 + exp(b 0 + b 1 X i ) ( logistic function) Example: Samples of 50 people were taken at five different ages. Numbers in each group affected by a disease were counted. Incidence of the disease obviously increases with age. Age Number The slope estimate is ˆb 1 = ± The fitted logistic curve is shown in slide... The value of the LRT statistic for testing b 1 = 0 is with 1 d.f. Log-linear models Frequencies Y 1 Y k are assumed independently Poisson distributed with means m 1 m k. In a log-linear model, we assume that log m i = b 0 + b 1 X i +, or, equivalently, that m i = exp(b 0 + b 1 X i + ) but this assumption is not used in deriving the LRT. The log-likelihood is Y i log m i m i With no restrictions on means, the ML estimate of m i is Y i, and the unconstrained maximum log-likelihood is Yi log Y i Y i (unconstrained) Now consider null hypothesis H 0 which specifies a model with s < k parameters. Denote ML estimates under H 0 by ˆm 1 ˆm k. Maximum log-likelihood under H 0 is Yi log ˆm i ˆm i (constrained) 50
5 LRT statistic is twice the difference between unconstrained and constrained maxima: 2 Y i log(y i / ˆm i ) 2 (Y i ˆm i ) (Second term is zero if model includes an intercept, which it nearly always does.) In large samples, null distn of LRT is approximately chi-squared with k s d.f. When used with the multinomial, binomial or Poisson distn, the LRT statistic is called the residual deviance and denoted G 2. It is an alternative to the chi-squared goodness of fit statistic. X 2 = (O E) 2, G 2 = 2 O log O E E The two statistics have the same d.f., and are usually similar in magnitude. Generalized linear models The binomial regression and log-linear models are examples of a generalized linear model (GLM). In the linear (multiple regression) model, the response variable Y is assumed to be normally distributed with constant variance. The mean value E(Y ) is assumed to be related to predictor variables X 1, X 2,... E(Y ) = b 0 + b 1 X 1 + b 2 X 2 + The expression on the right-hand side is the linear predictor. With a GLM, a) other distns are allowed for Y (binomial, Poisson, exponential). b) var(y ) is allowed to depend on E(Y ). c) the linear predictor is related to a function of E(Y ) (the link function). Estimation of parameters is by ML. The linear predictor can include any mix of covariates and factors, just as for the multiple regression model. E.g. with Poisson data categorized by two factors we can analyse an association table. This provides an alternative to the chi-squared association test. Example: test for association The test for association is a test of proportionality of the frequencies in the two-way table, and this is equivalent to additivity on the log scale. An alternative to the chi-squared test assumes the table counts are Poisson distributed and fits the log-linear model: log m = (Intercept) + Row effect + Column effect For a table with r rows and c columns, the unrestricted number of parameters is rc, and the null hypothesis specifies an additive model with 1 + (r 1) + (c 1) parameters. The LRT test statistic therefore has (r 1)(c 1) d.f. Using R In R, simple ML calculations are performed with mle( ). Create a function to evaluate minus the log-likelihood, then pass this function together with starting values to mle( ). For the genetic linkage example, 51
6 minuslogl <- function(x = 0.1) { if (xˆ2 < x) * log(2 + x) * log(1 - x) - 32 * log(x) else Inf } library(stats) fit <- mle(minuslogl) summary(fit) For the ABO example, the function has two arguments: na <- 212; nb <- 103; nab <- 39; no <- 19 minuslogl <- function(p, q) { r <- 1 - p - q -(na + nab) * log(p) -... } The mle( ) function is part of the stats package. Use library( ) just once to load the package. The mle( ) function is then available for the rest of your R session. # Binomial regression X <- c(20, 35, 5, 55, 70) R <- c(6, 17, 26, 37, ) N <- rep(50, 5) fit <- glm(r/n X, weight = N, family = binomial(link = "logit")) summary(fit) The output from summary(fit) gives two deviances: null deviance = 82.1 with d.f., residual deviance = 0.32 with 3 d.f. The difference between these, with 1 d.f., is the LRT for the hypothesis b 1 = 0 (very significant, off the scale). An example of a 2 2 association table (tonsils data from week 3): freqs <- c(19, 53, 97, 829) carrier <- gl(2, 2, labels = c("yes", "no")) enlarged <- gl(2, 1,, labels = c("no", "yes")) fit <- glm(freqs carrier + enlarged, family = poisson(link = "log")) summary(fit) The residual deviance is the LRT equivalent to the chi-squared statistic. When using family = poisson, the chi-squared statistic can be obtained with sum(resid(fit, type = "pearson")ˆ2) and fitted values (same as for chi-squared test) are given by fitted(fit). 52
Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00
Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section
More informationGeneralized Linear Models 1
Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter
More informationGoodness of Fit Goodness of fit - 2 classes
Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence
More informationChapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments
Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random
More informationMultilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More informationUNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator
UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages
More informationStatistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018
Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Sampling A trait is measured on each member of a population. f(y) = propn of individuals in the popn with measurement
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationReview: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:
Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationEM algorithm and applications Lecture #9
EM algorithm and applications Lecture #9 Bacground Readings: Chapters 11.2, 11.6 in the text boo, Biological Sequence Analysis, Durbin et al., 2001.. The EM algorithm This lecture plan: 1. Presentation
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More information2.3 Analysis of Categorical Data
90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationf X (y, z; θ, σ 2 ) = 1 2 (2πσ2 ) 1 2 exp( (y θz) 2 /2σ 2 ) l c,n (θ, σ 2 ) = i log f(y i, Z i ; θ, σ 2 ) (Y i θz i ) 2 /2σ 2
Chapter 7: EM algorithm in exponential families: JAW 4.30-32 7.1 (i) The EM Algorithm finds MLE s in problems with latent variables (sometimes called missing data ): things you wish you could observe,
More informationGeneralized linear models
Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data
More informationStatistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014
Overview - 1 Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Elizabeth Thompson University of Washington Seattle, WA, USA MWF 8:30-9:20; THO 211 Web page: www.stat.washington.edu/ thompson/stat550/
More informationSTAT 536: Genetic Statistics
STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationGeneralized Linear Models
Generalized Linear Models David Rosenberg New York University April 12, 2015 David Rosenberg (New York University) DS-GA 1003 April 12, 2015 1 / 20 Conditional Gaussian Regression Gaussian Regression Input
More informationHigh-Throughput Sequencing Course
High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More informationSTAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).
STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationBinomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials
Lecture : Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 27 Binomial Model n independent trials (e.g., coin tosses) p = probability of success on each trial (e.g., p =! =
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture16: Population structure and logistic regression I Jason Mezey jgm45@cornell.edu April 11, 2017 (T) 8:40-9:55 Announcements I April
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationML Testing (Likelihood Ratio Testing) for non-gaussian models
ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationIntroduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II)
1/45 Introduction to Analysis of Genomic Data Using R Lecture 6: Review Statistics (Part II) Dr. Yen-Yi Ho (hoyen@stat.sc.edu) Feb 9, 2018 2/45 Objectives of Lecture 6 Association between Variables Goodness
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 1/15/008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationLinear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model
Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables
More informationOne-Way Tables and Goodness of Fit
Stat 504, Lecture 5 1 One-Way Tables and Goodness of Fit Key concepts: One-way Frequency Table Pearson goodness-of-fit statistic Deviance statistic Pearson residuals Objectives: Learn how to compute the
More informationThe E-M Algorithm in Genetics. Biostatistics 666 Lecture 8
The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as
More information2/26/2017. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 When and why do we use logistic regression? Binary Multinomial Theory behind logistic regression Assessing the model Assessing predictors
More informationAnalysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not
More informationStatistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017
Statistical Analysis for QBIC Genetics Adapted by Ellen G. Dow 2017 I. χ 2 or chi-square test Objectives: Compare how close an experimentally derived value agrees with an expected value. One method to
More information9 Generalized Linear Models
9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models
More informationCorrelation and Regression
Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class
More informationCase-Control Association Testing. Case-Control Association Testing
Introduction Association mapping is now routinely being used to identify loci that are involved with complex traits. Technological advances have made it feasible to perform case-control association studies
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture 20: Epistasis and Alternative Tests in GWAS Jason Mezey jgm45@cornell.edu April 16, 2016 (Th) 8:40-9:55 None Announcements Summary
More informationStatistics 3858 : Maximum Likelihood Estimators
Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationLecture 3: Basic Statistical Tools. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013
Lecture 3: Basic Statistical Tools Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013 1 Basic probability Events are possible outcomes from some random process e.g., a genotype is AA, a phenotype
More informationStat 642, Lecture notes for 04/12/05 96
Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More information12 Generalized linear models
12 Generalized linear models In this chapter, we combine regression models with other parametric probability models like the binomial and Poisson distributions. Binary responses In many situations, we
More informationGROUPED DATA E.G. FOR SAMPLE OF RAW DATA (E.G. 4, 12, 7, 5, MEAN G x / n STANDARD DEVIATION MEDIAN AND QUARTILES STANDARD DEVIATION
FOR SAMPLE OF RAW DATA (E.G. 4, 1, 7, 5, 11, 6, 9, 7, 11, 5, 4, 7) BE ABLE TO COMPUTE MEAN G / STANDARD DEVIATION MEDIAN AND QUARTILES Σ ( Σ) / 1 GROUPED DATA E.G. AGE FREQ. 0-9 53 10-19 4...... 80-89
More informationMcGill University. Faculty of Science MATH 204 PRINCIPLES OF STATISTICS II. Final Examination
McGill University Faculty of Science MATH 204 PRINCIPLES OF STATISTICS II Final Examination Date: 20th April 2009 Time: 9am-2pm Examiner: Dr David A Stephens Associate Examiner: Dr Russell Steele Please
More informationParametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationST3241 Categorical Data Analysis I Two-way Contingency Tables. 2 2 Tables, Relative Risks and Odds Ratios
ST3241 Categorical Data Analysis I Two-way Contingency Tables 2 2 Tables, Relative Risks and Odds Ratios 1 What Is A Contingency Table (p.16) Suppose X and Y are two categorical variables X has I categories
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationOct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope
Oct 2017 1 / 28 Minimum MSE Y is the response variable, X the predictor variable, E(X) = E(Y) = 0. BLUP of Y minimizes average discrepancy var (Y ux) = C YY 2u C XY + u 2 C XX This is minimized when u
More informationTUTORIAL 8 SOLUTIONS #
TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level
More informationWORKSHOP 3 Measuring Association
WORKSHOP 3 Measuring Association Concepts Analysing Categorical Data o Testing of Proportions o Contingency Tables & Tests o Odds Ratios Linear Association Measures o Correlation o Simple Linear Regression
More informationJust Enough Likelihood
Just Enough Likelihood Alan R. Rogers September 2, 2013 1. Introduction Statisticians have developed several methods for comparing hypotheses and for estimating parameters from data. Of these, the method
More informationMATH 644: Regression Analysis Methods
MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationMATH4427 Notebook 2 Fall Semester 2017/2018
MATH4427 Notebook 2 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements
More informationGlossary. The ISI glossary of statistical terms provides definitions in a number of different languages:
Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the
More informationLECTURE 6. Introduction to Econometrics. Hypothesis testing & Goodness of fit
LECTURE 6 Introduction to Econometrics Hypothesis testing & Goodness of fit October 25, 2016 1 / 23 ON TODAY S LECTURE We will explain how multiple hypotheses are tested in a regression model We will define
More informationThe number of distributions used in this book is small, basically the binomial and Poisson distributions, and some variations on them.
Chapter 2 Statistics In the present chapter, I will briefly review some statistical distributions that are used often in this book. I will also discuss some statistical techniques that are important in
More informationBIOS 2083 Linear Models c Abdus S. Wahed
Chapter 5 206 Chapter 6 General Linear Model: Statistical Inference 6.1 Introduction So far we have discussed formulation of linear models (Chapter 1), estimability of parameters in a linear model (Chapter
More informationModeling Overdispersion
James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in
More informationCorrelation and regression
1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,
More informationExam details. Final Review Session. Things to Review
Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit
More informationExperimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.
Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationSection VII. Chi-square test for comparing proportions and frequencies. F test for means
Section VII Chi-square test for comparing proportions and frequencies F test for means 0 proportions: chi-square test Z test for comparing proportions between two independent groups Z = P 1 P 2 SE d SE
More informationThe Multinomial Model
The Multinomial Model STA 312: Fall 2012 Contents 1 Multinomial Coefficients 1 2 Multinomial Distribution 2 3 Estimation 4 4 Hypothesis tests 8 5 Power 17 1 Multinomial Coefficients Multinomial coefficient
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More information