Likelihood

Let P(D | H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability measures how probable it is that D will occur rather than some other data value. After the experiment, data D are known, and P(D | H), with D fixed and H varying, can be regarded as a measure of how likely it is that H is the true hypothesis, rather than another, after observing data D.

When P(D | H) is regarded as a function of H rather than D we call P(D | H) the likelihood of H (if considering a single H) or the likelihood function (if considering all possible H). We regard H_1 as more compatible with the data than H_2 if P(D | H_1) > P(D | H_2). Usually H specifies the value of a parameter, or set of parameters, and the likelihood is the probability (or probability density) for the data values, considered as a function of the parameters of the distribution.

It is more convenient to work with the (natural) log of the likelihood function than the likelihood itself. When comparing hypotheses or parameter values, only relative values of likelihood are important (ratios of likelihoods, or differences between log-likelihoods). Any multiplicative factor in the likelihood (or additive term in the log-likelihood) that does not include the parameter can be ignored. See examples below.

The laws of probability apply when we regard P(D | H) as a function of D. They do not apply when P(D | H) is regarded as a function of H.

Binomial distribution

Imagine we perform a simple experiment (e.g. tossing a coin) n times, and each trial results in success (with probability p) or failure (with probability 1 − p). Each success contributes a factor p to the likelihood, and each failure contributes a factor (1 − p). The corresponding (additive) contributions to the log-likelihood are log p and log(1 − p). If there are Y successes and n − Y failures, the log-likelihood is

    Y log p + (n − Y) log(1 − p)

The maximum value of the (log) likelihood occurs when p = Y/n, which we call the maximum likelihood or ML estimate of p.

Poisson distribution

If Y is Poisson with mean m, the log-likelihood for m is Y log m − m. The ML estimate of m is Y. If we have a sample of size n from this distn, the log-likelihood is the sum of contributions from each sample value, (Σ Y_i) log m − nm, and the ML estimate of m is Ȳ = (Σ Y_i)/n.

Multinomial distribution

This is a generalization of the binomial distribution, in which each trial has k possible outcomes, with probabilities p_1, ..., p_k. In n independent trials the first outcome occurs Y_1 times, the second Y_2 times, etc. The binomial distn is the case k = 2. By an argument analogous to that used above for the binomial distribution, the log-likelihood is

    Y_1 log p_1 + Y_2 log p_2 + ... + Y_k log p_k

With no restrictions on the probabilities, apart from satisfying p_1 + ... + p_k = 1, the ML estimate of p_i is Y_i/n, for i = 1 to k. Multinomial applications of ML are more interesting when a null hypothesis imposes constraints on the probabilities. See below.
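As a quick numerical check of the binomial result, the log-likelihood can also be maximized with R's optimize(). This is a minimal sketch (the data Y = 7 successes in n = 10 trials are invented for illustration, not taken from the notes); the numerical maximum agrees with Y/n.

    # Binomial log-likelihood for Y successes in n trials, maximized numerically
    Y <- 7; n <- 10
    loglik <- function(p) Y * log(p) + (n - Y) * log(1 - p)
    optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum   # approx 0.7 = Y/n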

Normal distribution

We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S_yy. After some simplification,

    −2 log L = n log υ + (1/υ)[S_yy + n(Ȳ − m)²]

The final term is minimized when m = Ȳ, so the ML estimate of m is the sample mean. Substituting this value for m leads to the profile likelihood for υ,

    n log υ + S_yy/υ

which is minimized when υ = S_yy/n. The ML estimator in this case is biased, and this is generally true of ML estimators of variance components.

The following argument is often used when estimating variance components. The value of S_yy gives us information about υ, and if m is known, so does (Ȳ − m)². But if, as is usually the case, m is unknown, the information in (Ȳ − m)² is not available. Maximum likelihood estimation of υ should therefore be based on the probability distn of S_yy alone (rather than the joint distn of S_yy and Ȳ). This argument leads to a modified (restricted, or residual) likelihood with

    −2 log L = (n − 1) log υ + S_yy/υ

(based on the fact that the distn of S_yy/υ is chi-squared with n − 1 d.f.). The estimate obtained by maximizing the modified likelihood is called the REML estimate. Here the REML estimate is the sample variance.

In previous examples the estimators could have been guessed without invoking ML. In the following examples the estimator is less obvious.

Genetic linkage

The progeny of two maize plants are categorized as starchy or sugary, with leaves which are either green or white. The combinations starchy-green, starchy-white, sugary-green and sugary-white occur with probabilities (2 + θ)/4, (1 − θ)/4, (1 − θ)/4, θ/4. (Starchiness and leaf colour are determined at two genetic loci. The parameter θ is a measure of linkage between the two loci.) With frequencies Y_1, ..., Y_4 in the four categories, the log-likelihood is

    const + Y_1 log(2 + θ) + (Y_2 + Y_3) log(1 − θ) + Y_4 log θ

The log-likelihood shown in slide ... is based on the following data for 3839 progeny:

    Starchy            Sugary
    Green    White     Green    White     Total
    1997     906       904      32        3839

The ML estimate of θ is 0.0358 ± 0.0060. See Q3 on the problem sheet. The standard error is calculated from the curvature of log L at the maximum.

Usually there is no explicit expression for the ML estimate, and it has to be found by numerical optimization. This is an iterative process, based on successive refinements of an initial guess.
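For a one-parameter problem like this, a minimal sketch in R is to hand the log-likelihood (with the counts from the table above) to optimize(); the maximum is close to the ML estimate of θ quoted above. The mle() route shown in the Using R section below gives the same answer together with a standard error.

    # Numerical maximization of the genetic linkage log-likelihood
    Y <- c(1997, 906, 904, 32)
    loglik <- function(theta) Y[1] * log(2 + theta) + (Y[2] + Y[3]) * log(1 - theta) + Y[4] * log(theta)
    optimize(loglik, interval = c(1e-6, 1 - 1e-6), maximum = TRUE)$maximum   # approx 0.036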

ABO blood group system

In the ABO system, three alleles (A, B and O) at a single locus give rise to six genotypes and four phenotypes. The unknown frequencies of the A, B and O alleles are p, q and r (p + q + r = 1).

    Phenotype   Genotype(s)   Probability   Observed frequency
    A           AA, AO        p² + 2pr      n_A
    B           BB, BO        q² + 2qr      n_B
    AB          AB            2pq           n_AB
    O           OO            r²            n_O

After simplification, the log-likelihood is

    (n_A + n_AB) log p + (n_B + n_AB) log q + n_A log(p + 2r) + n_B log(q + 2r) + 2 n_O log r

This can be expressed as a function of just two parameters (p and q, for example) by setting r = 1 − p − q. See the question on the problem sheet.

Likelihood ratio test

The likelihood gives us a general method for finding estimators of unknown parameters. It also provides a statistic for hypothesis testing. Suppose a model specifies that the parameter vector ψ belongs to (lives in) a d-dimensional space, so that there are d free parameters. For example, if we have multinomial data with k probabilities which sum to 1, d = k − 1. The null hypothesis H_0 specifies that ψ is restricted to a subspace of dimension s. In other words, H_0 expresses ψ in terms of s < d free parameters (s = 0 when the vector of probabilities is completely specified by H_0). Denote by l_c and l_u the maximized log-likelihoods with and without the constraints. The likelihood ratio test statistic is 2(l_u − l_c). In large samples, the null distn of the LRT statistic is approximately chi-squared with d − s d.f.

For example, suppose we have sample counts of genotypes A_1A_1, A_1A_2, and A_2A_2. In the population, the genotype probabilities (or proportions) are p_1, p_2, and p_3. The vector parameter is ψ = (p_1, p_2, p_3). Because p_1 + p_2 + p_3 = 1, the number of free parameters is d = 3 − 1 = 2. The hypothesis of Hardy-Weinberg equilibrium states that p_1 = θ², p_2 = 2θ(1 − θ), p_3 = (1 − θ)², for some unspecified value of θ. Here d = 2, s = 1, d − s = 1.

A trivial example

Take the unconstrained model to be that for the genetic linkage example above, and the constrained model to be H_0: θ = θ_0, where θ_0 is a single specified value. Here d = 1, s = 0. In the figure on slide ..., l_u is the maximum value of the log-likelihood, and l_c is the value at θ_0. The LRT statistic is twice the difference between these values. The null distn is chi-squared with d − s = 1 d.f. H_0 is rejected at the 5% level if 2(l_u − l_c) > 3.84. The confidence interval shown in slide ... is obtained as the set of all θ_0 values not rejected by this test. These satisfy the condition l_c > l_u − 1.92. In the figure, the arbitrary constant in log L has been chosen so that the maximum value of log L is zero.
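A numerical version of the Hardy-Weinberg example above can make the d − s = 1 bookkeeping concrete. This is a minimal sketch with invented genotype counts: the unconstrained fit uses p_i = Y_i/n, the constrained fit estimates θ by the sample allele frequency, and the LRT statistic is referred to chi-squared with 1 d.f. (The multinomial constant is the same in both maxima, so it cancels in the difference.)

    # LRT for Hardy-Weinberg equilibrium from genotype counts (A1A1, A1A2, A2A2)
    Y <- c(30, 50, 20); n <- sum(Y)
    lu <- sum(Y * log(Y / n))                                # unconstrained maximum
    theta <- (2 * Y[1] + Y[2]) / (2 * n)                     # ML estimate of theta under H0
    p0 <- c(theta^2, 2 * theta * (1 - theta), (1 - theta)^2)
    lc <- sum(Y * log(p0))                                   # constrained maximum
    2 * (lu - lc)                                            # compare with chi-squared, 1 d.f.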

Binomial regression

For i = 1, ..., k, Y_i is binomial with index n_i and parameter p_i. Associated with each observation Y_i is the value of an explanatory variable X_i. Generalized regression model for p_i:

    log( p_i / (1 − p_i) ) = b_0 + b_1 X_i        (the logit transform)

which resembles a standard regression equation. The log-likelihood for parameters b_0 and b_1 (details omitted) is

    Σ [ Y_i log p_i + (n_i − Y_i) log(1 − p_i) ]

where p_i is given by the regression equation above,

    p_i = exp(b_0 + b_1 X_i) / (1 + exp(b_0 + b_1 X_i))        (the logistic function)

Example: samples of 50 people were taken at five different ages. Numbers in each group affected by a disease were counted. Incidence of the disease obviously increases with age.

    Age      20   35    5   55   70
    Number    6   17   26   37

The slope estimate is b̂_1 = 0.081 ± 0.0108. The fitted logistic curve is shown in slide ... The value of the LRT statistic for testing b_1 = 0 is 81.83 with 1 d.f.

Log-linear models

Frequencies Y_1, ..., Y_k are assumed independently Poisson distributed with means m_1, ..., m_k. In a log-linear model, we assume that log m_i = b_0 + b_1 X_i + ..., or, equivalently, that m_i = exp(b_0 + b_1 X_i + ...), but this assumption is not used in deriving the LRT. The log-likelihood is

    Σ ( Y_i log m_i − m_i )

With no restrictions on the means, the ML estimate of m_i is Y_i, and the unconstrained maximum log-likelihood is

    Σ ( Y_i log Y_i − Y_i )        (unconstrained)

Now consider a null hypothesis H_0 which specifies a model with s < k parameters. Denote the ML estimates under H_0 by m̂_1, ..., m̂_k. The maximum log-likelihood under H_0 is

    Σ ( Y_i log m̂_i − m̂_i )        (constrained)
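A small numerical illustration of these two quantities (counts invented for the example): the constrained model here has a single common mean, so its ML estimate is the sample mean, and twice the difference between the two maxima is the LRT statistic discussed next.

    # Unconstrained and constrained maximum log-likelihoods for Poisson counts
    Y <- c(12, 7, 9, 15, 11)
    lu <- sum(Y * log(Y) - Y)                 # unconstrained: mhat_i = Y_i
    mhat <- rep(mean(Y), length(Y))           # constrained: common mean (s = 1)
    lc <- sum(Y * log(mhat) - mhat)
    2 * (lu - lc)                             # LRT statistic, k - s = 4 d.f.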

The LRT statistic is twice the difference between the unconstrained and constrained maxima:

    2 Σ Y_i log(Y_i / m̂_i) − 2 Σ (Y_i − m̂_i)

(The second term is zero if the model includes an intercept, which it nearly always does.) In large samples, the null distn of the LRT is approximately chi-squared with k − s d.f.

When used with the multinomial, binomial or Poisson distn, the LRT statistic is called the residual deviance and denoted G². It is an alternative to the chi-squared goodness-of-fit statistic:

    X² = Σ (O − E)²/E,        G² = 2 Σ O log(O/E)

The two statistics have the same d.f., and are usually similar in magnitude.

Generalized linear models

The binomial regression and log-linear models are examples of a generalized linear model (GLM). In the linear (multiple regression) model, the response variable Y is assumed to be normally distributed with constant variance. The mean value E(Y) is assumed to be related to predictor variables X_1, X_2, ...

    E(Y) = b_0 + b_1 X_1 + b_2 X_2 + ...

The expression on the right-hand side is the linear predictor. With a GLM,

a) other distns are allowed for Y (binomial, Poisson, exponential);
b) var(Y) is allowed to depend on E(Y);
c) the linear predictor is related to a function of E(Y) (the link function).

Estimation of parameters is by ML. The linear predictor can include any mix of covariates and factors, just as for the multiple regression model. E.g. with Poisson data categorized by two factors we can analyse an association table. This provides an alternative to the chi-squared association test.

Example: test for association

The test for association is a test of proportionality of the frequencies in the two-way table, and this is equivalent to additivity on the log scale. An alternative to the chi-squared test assumes the table counts are Poisson distributed and fits the log-linear model:

    log m = (Intercept) + Row effect + Column effect

For a table with r rows and c columns, the unrestricted number of parameters is rc, and the null hypothesis specifies an additive model with 1 + (r − 1) + (c − 1) parameters. The LRT statistic therefore has (r − 1)(c − 1) d.f.
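The relationship between X² and G² can be seen directly for a 2 × 2 table. The sketch below uses the tonsils counts from the R example in the next section, with the usual row-by-column expected values; the two statistics have the same d.f. and are of similar magnitude.

    # Pearson X^2 and deviance G^2 for a 2 x 2 table
    O <- matrix(c(19, 53, 97, 829), nrow = 2, byrow = TRUE)
    E <- outer(rowSums(O), colSums(O)) / sum(O)     # expected counts under independence
    c(X2 = sum((O - E)^2 / E), G2 = 2 * sum(O * log(O / E)))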

Using R

In R, simple ML calculations are performed with mle(). Create a function to evaluate minus the log-likelihood, then pass this function together with starting values to mle(). For the genetic linkage example,

    minuslogl <- function(x = 0.1) {
        if (x^2 < x) -1997 * log(2 + x) - 1810 * log(1 - x) - 32 * log(x)
        else Inf
    }
    library(stats4)
    fit <- mle(minuslogl)
    summary(fit)

For the ABO example, the function has two arguments:

    na <- 212; nb <- 103; nab <- 39; no <- 19
    minuslogl <- function(p, q) {
        r <- 1 - p - q
        -(na + nab) * log(p) - ...
    }

The mle() function is part of the stats4 package. Use library() just once to load the package. The mle() function is then available for the rest of your R session.

    # Binomial regression
    X <- c(20, 35, 5, 55, 70)
    R <- c(6, 17, 26, 37, )
    N <- rep(50, 5)
    fit <- glm(R/N ~ X, weights = N, family = binomial(link = "logit"))
    summary(fit)

The output from summary(fit) gives two deviances: null deviance = 82.1 with 4 d.f., residual deviance = 0.32 with 3 d.f. The difference between these, 81.82 with 1 d.f., is the LRT for the hypothesis b_1 = 0 (very significant, off the scale).

An example of a 2 × 2 association table (tonsils data from week 3):

    freqs <- c(19, 53, 97, 829)
    carrier <- gl(2, 2, labels = c("yes", "no"))
    enlarged <- gl(2, 1, 4, labels = c("no", "yes"))
    fit <- glm(freqs ~ carrier + enlarged, family = poisson(link = "log"))
    summary(fit)

The residual deviance is the LRT equivalent of the chi-squared statistic. When using family = poisson, the chi-squared statistic can be obtained with

    sum(resid(fit, type = "pearson")^2)

and fitted values (the same as for the chi-squared test) are given by fitted(fit).
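A possible follow-up, applied immediately after the genetic linkage fit above (before fit is reused for the glm examples): stats4 also provides profile-likelihood confidence intervals for mle objects via confint(), which correspond to the l_c > l_u − 1.92 criterion described earlier.

    # Profile-likelihood 95% confidence interval for the linkage parameter theta,
    # computed from the mle() fit of the linkage minus-log-likelihood
    confint(fit)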