Lecture Notes. Introduction


R. Rekaya, June 1-10, 2016

Introduction

Variance components play a major role in animal breeding and genetics (e.g., estimation of breeding values, BVs). Their estimation has been an active area of research since the early 1950s. The main approaches are:

- Henderson's three methods for VC estimation (not covered in these notes)
- MIVQUE and MINQUE

- Maximum Likelihood (ML)
- Restricted Maximum Likelihood (REML)
- Bayesian methods

Quadratic Forms

Let $y \sim N(\mu, V)$. For a symmetric matrix $Q$, the quadratic form $y'Qy$ has moments

$$E(y'Qy) = \mathrm{tr}(QV) + \mu'Q\mu$$
$$\mathrm{var}(y'Qy) = 2\,\mathrm{tr}(QVQV) + 4\,\mu'QVQ\mu$$

where the trace of a $p \times p$ matrix $V$ is $\mathrm{tr}(V) = \sum_{i=1}^{p} v_{ii}$.
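These two moment formulas are easy to check by simulation. Below is a minimal Python sketch (not part of the original notes; the matrices $Q$ and $V$ and the mean vector are arbitrary made-up values) comparing Monte Carlo estimates with the theory:

```python
# Minimal Monte Carlo check of the quadratic-form moments above.
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.normal(size=(p, p))
V = A @ A.T + p * np.eye(p)          # a positive-definite covariance (made up)
Q = np.diag(np.arange(1.0, p + 1))   # any symmetric Q works; a diagonal one here
mu = rng.normal(size=p)

L = np.linalg.cholesky(V)
y = mu + (L @ rng.normal(size=(p, 200_000))).T   # draws from N(mu, V)
q = np.einsum('ij,jk,ik->i', y, Q, y)            # y'Qy for each draw

print("E(y'Qy):   MC =", q.mean(),
      " theory =", np.trace(Q @ V) + mu @ Q @ mu)
print("var(y'Qy): MC =", q.var(),
      " theory =", 2*np.trace(Q @ V @ Q @ V) + 4*mu @ Q @ V @ Q @ mu)
```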

Derivation of the expectation:

$$y'Qy = \sum_{i=1}^{n}\sum_{j=1}^{n} q_{ij}\,y_i y_j$$

$$E(y'Qy) = \sum_{i=1}^{n}\sum_{j=1}^{n} q_{ij}\,E(y_i y_j)
= \sum_{i=1}^{n}\sum_{j=1}^{n} q_{ij}\,[\mathrm{cov}(y_i, y_j) + E(y_i)E(y_j)]
= \sum_{i=1}^{n}\sum_{j=1}^{n} q_{ij}\,[v_{ij} + \mu_i\mu_j]$$

$$= \sum_{i=1}^{n}\sum_{j=1}^{n} q_{ij} v_{ij} + \sum_{i=1}^{n}\sum_{j=1}^{n} q_{ij}\mu_i\mu_j
= \mathrm{tr}(QV) + \mu'Q\mu$$

An alternative route uses the centered quadratic form:

$$(y-\mu)'Q(y-\mu) = y'Qy - \mu'Qy - y'Q\mu + \mu'Q\mu$$

$$E[(y-\mu)'Q(y-\mu)] = E(y'Qy) - \mu'QE(y) - E(y')Q\mu + \mu'Q\mu = E(y'Qy) - \mu'Q\mu$$

and therefore

$$E(y'Qy) = E[(y-\mu)'Q(y-\mu)] + \mu'Q\mu = \mathrm{tr}(QV) + \mu'Q\mu$$

because

$$E[(y-\mu)'Q(y-\mu)] = E\,\mathrm{tr}[(y-\mu)'Q(y-\mu)]
= E\,\mathrm{tr}[Q(y-\mu)(y-\mu)']
= \mathrm{tr}\{Q\,E[(y-\mu)(y-\mu)']\} = \mathrm{tr}(QV)$$

Examples

- $y \sim N(0, I)$: $E(y'y) = n$, $\mathrm{var}(y'y) = 2n$
- $y \sim N(0, I\sigma^2)$: $E(y'y) = n\sigma^2$, $\mathrm{var}(y'y) = 2n\sigma^4$
- $y \sim N(\mu, I\sigma^2)$: $E(y'y) = n\sigma^2 + n\mu^2$, $\mathrm{var}(y'y) = 2n\sigma^4 + 4n\mu^2\sigma^2$

Example. Let $Q_1 = I$; $Q_2 = \frac{1}{n}\mathbf{1}\mathbf{1}'$; and $Q_3 = Z(Z'Z)^{-1}Z'$. Then

$$y'Q_1y = y'y, \qquad y'Q_2y = \frac{1}{n}\,y'\mathbf{1}\mathbf{1}'y, \qquad y'Q_3y = y'Z(Z'Z)^{-1}Z'y$$

With $\mathrm{var}(y) = V = ZZ'\sigma_u^2 + I\sigma_e^2$, and each row of $Z$ containing a single 1 (so $\mathrm{tr}(ZZ') = n$):

$$E(y'y) = \mathrm{tr}(V) + n\mu^2 = n\sigma_u^2 + n\sigma_e^2 + n\mu^2$$
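As a concrete illustration, the following Python sketch (toy data; the incidence matrix $Z$ and the observations $y$ are made up) evaluates the three quadratic forms:

```python
# Evaluating y'Q1y, y'Q2y, y'Q3y for the three Q matrices above (toy data).
import numpy as np

n = 6
y = np.array([3.1, 2.7, 4.0, 3.5, 2.9, 3.8])
ones = np.ones((n, 1))
Z = np.kron(np.eye(3), np.ones((2, 1)))   # 3 random-effect levels, 2 obs each

Q1 = np.eye(n)
Q2 = ones @ ones.T / n
Q3 = Z @ np.linalg.solve(Z.T @ Z, Z.T)

for name, Q in [("y'Q1y", Q1), ("y'Q2y", Q2), ("y'Q3y", Q3)]:
    print(name, "=", y @ Q @ y)
```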

Translation Invariance

Let $y \sim N(Xb + Zu, I\sigma^2)$. If $QX = 0$, the quadratic form $y'Qy$ is translation invariant: shifting the fixed-effect part of the mean leaves it unchanged, and

$$E(y'Qy) = \mathrm{tr}(QV), \qquad \mathrm{var}(y'Qy) = 2\,\mathrm{tr}(QVQV)$$

Variance components are estimated using translation-invariant quadratic forms (a numerical check follows the ML overview below).

Maximum Likelihood (ML)

- ML approach (Hartley and Rao, 1967)
- Conceptually simple
- Efficient in its use of information
- Assumes the fixed effects are known; in practice, estimates of the fixed effects are used instead
- This yields biased estimates of the variance components
- The bias depends on the number of fixed effects and the amount of information in each class
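A quick check of translation invariance, as a sketch under the assumption (not specified in the notes) that $Q$ is chosen as the residual projector $I - X(X'X)^{-1}X'$, which satisfies $QX = 0$:

```python
# Translation invariance: y'Qy is unchanged when the mean shifts by Xt.
import numpy as np

rng = np.random.default_rng(1)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # toy fixed-effects design
Q = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)       # assumed choice with QX = 0

print("max |QX| =", np.abs(Q @ X).max())                # ~0: Q annihilates X
y = rng.normal(size=n)
t = rng.normal(size=2)                                  # arbitrary shift of fixed effects
print(y @ Q @ y, (y + X @ t) @ Q @ (y + X @ t))         # identical quadratic forms
```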

ML: Simple Model

Simple model: $y = \mu + e$ with $e \sim N(0, I\sigma_e^2)$.

Conditional distribution of the data:

$$p(y \mid \mu, \sigma_e^2) = \prod_{i=1}^{n} p(y_i \mid \mu, \sigma_e^2)
= (2\pi\sigma_e^2)^{-n/2}\exp\left[-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\sigma_e^2}\right]$$

This is a PDF in which $y$ is the random variable.

Likelihood function:

$$l(\mu, \sigma_e^2; y) = (2\pi\sigma_e^2)^{-n/2}\exp\left[-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\sigma_e^2}\right]$$

In the likelihood function, the arguments are the unknown parameters and the data are held fixed. Thus, the likelihood function is not a PDF when viewed as a function of the unknown parameters.
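As a sketch (not from the notes; the data vector is made up), the log-likelihood can be evaluated on a grid of $(\mu, \sigma_e^2)$ values; its argmax approaches the closed-form ML estimates derived next:

```python
# Grid evaluation of the simple-model log-likelihood (toy data).
import numpy as np

y = np.array([4.2, 3.9, 5.1, 4.4, 4.8])
n = len(y)

def loglik(mu, s2):
    return -0.5 * n * np.log(2 * np.pi * s2) - ((y - mu)**2).sum() / (2 * s2)

mus = np.linspace(3.5, 5.5, 201)
s2s = np.linspace(0.05, 1.0, 200)
ll = np.array([[loglik(m, s) for s in s2s] for m in mus])
i, j = np.unravel_index(ll.argmax(), ll.shape)
print("grid argmax: ", mus[i], s2s[j])
print("closed form: ", y.mean(), ((y - y.mean())**2).mean())  # MLEs derived below
```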

ML

To find the ML estimates, we maximize the likelihood function or its log:

$$L(\mu, \sigma_e^2; y) = -0.5\left[n\ln(2\pi) + n\ln\sigma_e^2 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{\sigma_e^2}\right]
= -0.5\,n\left[\ln(2\pi) + \ln\sigma_e^2 + \frac{1}{n\sigma_e^2}\sum_{i=1}^{n}(y_i - \mu)^2\right]$$

One way to find the ML estimates is to take derivatives:

$$\frac{\partial L(\mu, \sigma_e^2; y)}{\partial\mu} = \frac{\sum_{i=1}^{n} y_i - n\mu}{\sigma_e^2}$$

$$\frac{\partial L(\mu, \sigma_e^2; y)}{\partial\sigma_e^2} = -\frac{n}{2\sigma_e^2}\left(1 - \frac{SSE}{n\sigma_e^2}\right)$$

where $SSE = \sum_{i=1}^{n}(y_i - \mu)^2$ is the sum of squared errors.

ML

Setting the derivatives equal to zero leads to:

$$\sum_{i=1}^{n} y_i - n\hat\mu = 0 \;\Rightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$n\hat\sigma_e^2 = SSE \;\Rightarrow\; \hat\sigma_e^2 = \frac{SSE}{n}$$

As $\mu$ is unknown, SSE is calculated using the estimate of the mean: $SSE = \sum_{i=1}^{n}(y_i - \hat\mu)^2$. The estimate of the mean does not depend on the variance; the variance estimate, however, depends on the estimated mean. As a result, the ML estimate of the variance is biased: $E(\hat\sigma_e^2) = \frac{n-1}{n}\sigma_e^2$.

ML: Mixed Model

General mixed model: $y = X\beta + Zu + e$, with $u \sim N(0, G)$ and $e \sim N(0, R)$.

$$E(y \mid \beta) = X\beta$$
$$\mathrm{var}(y \mid \beta) = ZGZ' + R = ZAZ'\sigma_u^2 + I\sigma_e^2 = V$$

Likelihood function:

$$l(\beta, G, R; y) = (2\pi)^{-n/2}\,|V|^{-1/2}\exp\left[-\tfrac{1}{2}(y - X\beta)'V^{-1}(y - X\beta)\right]$$
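A minimal sketch of evaluating this mixed-model log-likelihood in Python (toy $X$, $Z$, $A$, and $y$; all values are assumptions for illustration):

```python
# Evaluating the mixed-model log-likelihood for given variance components.
import numpy as np

def loglik(y, X, Z, A, s2u, s2e, beta):
    """log l(beta, G, R; y) with V = ZAZ' s2u + I s2e."""
    n = len(y)
    V = Z @ A @ Z.T * s2u + np.eye(n) * s2e
    r = y - X @ beta
    logdet = np.linalg.slogdet(V)[1]
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))

rng = np.random.default_rng(2)
n, q = 10, 4
X = np.ones((n, 1))
Z = rng.integers(0, 2, size=(n, q)).astype(float)   # made-up incidence matrix
A = np.eye(q)                                       # unrelated random effects
y = rng.normal(size=n)
print(loglik(y, X, Z, A, s2u=0.5, s2e=1.0, beta=np.array([y.mean()])))
```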

ML: Mixed Model

Maximization. The log-likelihood is

$$L(\beta, G, R; y) = -0.5\,n\ln(2\pi) - 0.5\ln|V| - 0.5\,(y - X\beta)'V^{-1}(y - X\beta)$$

Partial derivatives:

$$\frac{\partial L(\beta, V; y)}{\partial\beta} = X'V^{-1}(y - X\beta)$$

$$\frac{\partial L(\beta, V; y)}{\partial\sigma_u^2}
= -0.5\,\mathrm{tr}(V^{-1}ZAZ') + 0.5\,(y - X\beta)'V^{-1}ZAZ'V^{-1}(y - X\beta)
= -0.5\,\mathrm{tr}(V^{-1}V_u) + 0.5\,(y - X\beta)'V^{-1}V_uV^{-1}(y - X\beta)$$

Recall the matrix results used above:

$$\frac{\partial\ln|V|}{\partial x} = \mathrm{tr}\left(V^{-1}\frac{\partial V}{\partial x}\right), \qquad
\frac{\partial V^{-1}}{\partial x} = -V^{-1}\frac{\partial V}{\partial x}V^{-1}, \qquad
V_u \equiv \frac{\partial V}{\partial\sigma_u^2} = ZAZ'$$

ML: Mixed Model

Further, in terms of $\hat\beta$:

$$\frac{\partial L(\beta, V; y)}{\partial\sigma_u^2}
= -0.5\,\mathrm{tr}(V^{-1}V_u)
+ 0.5\,(y - X\hat\beta)'V^{-1}V_uV^{-1}(y - X\hat\beta)
+ 0.5\,(\hat\beta - \beta)'X'V^{-1}V_uV^{-1}X(\hat\beta - \beta)$$

Similarly, because $\partial V/\partial\sigma_e^2 = I$,

$$\frac{\partial L(\beta, V; y)}{\partial\sigma_e^2}
= -0.5\,\mathrm{tr}(V^{-1})
+ 0.5\,(y - X\hat\beta)'V^{-1}V^{-1}(y - X\hat\beta)
+ 0.5\,(\hat\beta - \beta)'X'V^{-1}V^{-1}X(\hat\beta - \beta)$$

ML of the fixed effects:

$$\hat\beta = (X'V^{-1}X)^{-1}X'V^{-1}y$$

Setting $\beta = \hat\beta$ and the partial derivatives with respect to the variance components equal to zero leads to:

$$\mathrm{tr}(V^{-1}V_u) = (y - X\hat\beta)'V^{-1}V_uV^{-1}(y - X\hat\beta)$$
$$\mathrm{tr}(V^{-1}) = (y - X\hat\beta)'V^{-1}V^{-1}(y - X\hat\beta)$$

ML: Mixed Model

Let

$$P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}$$

so that

$$Py = V^{-1}y - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}y = V^{-1}(y - X\hat\beta)$$

Thus, the ML equations can be written as:

$$\mathrm{tr}(V^{-1}V_u) = y'PV_uPy, \quad\text{i.e.,}\quad \mathrm{tr}(V^{-1}ZAZ') = y'P(ZAZ')Py$$
$$\mathrm{tr}(V^{-1}) = y'PPy$$

Note that $P$ is a function of $V$, with $V = ZAZ'\sigma_u^2 + I\sigma_e^2$.

Special case, one observation per animal ($Z = I$): $V = A\sigma_u^2 + I\sigma_e^2$.

Several random effects:

$$V = \sum_{i=1}^{k} Z_iG_iZ_i'\sigma_i^2 + I\sigma_e^2$$

where $k$ is the number of random effects and $u_i \mid G_i, \sigma_i^2 \sim N(0, G_i\sigma_i^2)$ for $i = 1, \dots, k$.
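The identities $Py = V^{-1}(y - X\hat\beta)$ and $PX = 0$ are easy to verify numerically; a sketch with made-up matrices:

```python
# Numeric check of the P-matrix identities (toy, arbitrary V and X).
import numpy as np

rng = np.random.default_rng(3)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B = rng.normal(size=(n, n))
V = B @ B.T + n * np.eye(n)          # arbitrary positive-definite V
y = rng.normal(size=n)

Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)
beta_hat = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)   # GLS estimate of beta

print("max |Py - V^{-1}(y - X beta_hat)| =",
      np.abs(P @ y - Vi @ (y - X @ beta_hat)).max())     # ~0
print("max |PX| =", np.abs(P @ X).max())                  # ~0
```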

ML: Mixed Model

Problems:

- Estimates of the fixed effects are a function of the variance components.
- The matrix $V$ is a function of the unknown variance components.
- Solving for the variance components involves non-linear equations because of the inverse of $V$.
- Estimates of the variance components are biased (remember that we replaced $\beta$ by $\hat\beta$).
- Iterative approaches are needed to solve the system of equations.

Restricted Maximum Likelihood (REML)

- ML estimates of variance components are biased unless the fixed effects are known without error.
- Fixed effects are seldom known, and estimating them does not solve the problem.
- Can we estimate variance components after removing the fixed effects from the model? REML does just that!
- We will revisit this from a Bayesian perspective.

REML

Basic idea: remove the fixed effects from the model! This can be accomplished by a linear transformation of the observed data.

Mixed model: $y = X\beta + Zu + e$, with $u \sim N(0, G)$ and $e \sim N(0, R)$.

Imagine there is a transformation matrix $K$ such that $KX = 0$.

REML

Multiplying both sides of the mixed model by $K$:

$$y^* = Ky = K(X\beta + Zu + e) = KX\beta + KZu + Ke = KZu + Ke$$

The likelihood function using the transformed data:

$$l(G, R; y^*) = (2\pi)^{-(n - r(X))/2}\,|KVK'|^{-1/2}\exp\left[-\tfrac{1}{2}(Ky)'(KVK')^{-1}(Ky)\right]$$

Keep in mind the correspondence:

  ML    REML
  y     Ky
  X     KX = 0
  Z     KZ
  V     KVK'
  n     n - r(X)

where $r(X)$ = rank of $X$.

Log-likelihood function:

$$L(G, R; y^*) = -0.5\,(n - r(X))\ln(2\pi) - 0.5\ln|KVK'| - 0.5\,(Ky)'(KVK')^{-1}(Ky)$$

Using the following results (Searle, 1979):

$$\ln|KVK'| = \ln|V| + \ln|X'V^{-1}X|$$
$$P = K'(KVK')^{-1}K$$
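The notes only require $KX = 0$; one concrete (assumed) construction takes the rows of $K$ as an orthonormal basis of the null space of $X'$, e.g.:

```python
# Building a K with KX = 0 from the null space of X' (one possible choice).
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
n = 7
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # rank r(X) = 2

K = null_space(X.T).T          # shape (n - r(X), n)
print("K shape:", K.shape)     # (5, 7)
print("max |KX| =", np.abs(K @ X).max())   # ~0
```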

REML

$$y'K'(KVK')^{-1}Ky = y'Py = (y - X\hat\beta)'V^{-1}(y - X\hat\beta)$$

The log-likelihood can therefore be written (up to a constant) as:

$$L(G, R; y^*) = -0.5\ln|V| - 0.5\ln|X'V^{-1}X| - 0.5\,(y - X\hat\beta)'V^{-1}(y - X\hat\beta)$$

Derivative of the log-likelihood, for $i = e, u$:

$$\frac{\partial L(G, R; y^*)}{\partial\sigma_i^2}
= -0.5\,\mathrm{tr}\left(V^{-1}\frac{\partial V}{\partial\sigma_i^2}\right)
- 0.5\,\mathrm{tr}\left[(X'V^{-1}X)^{-1}\left(-X'V^{-1}\frac{\partial V}{\partial\sigma_i^2}V^{-1}X\right)\right]
+ 0.5\,(y - X\hat\beta)'V^{-1}\frac{\partial V}{\partial\sigma_i^2}V^{-1}(y - X\hat\beta)$$

$$= -0.5\,\mathrm{tr}\left[\left(V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\right)\frac{\partial V}{\partial\sigma_i^2}\right]
+ 0.5\,y'P\frac{\partial V}{\partial\sigma_i^2}Py$$

Remember: $Py = V^{-1}(y - X\hat\beta)$.

REML

Remember: $\partial V/\partial\sigma_u^2 = ZAZ'$ and $\partial V/\partial\sigma_e^2 = I$. Then:

$$\frac{\partial L(G, R; y^*)}{\partial\sigma_e^2}
= -0.5\,\mathrm{tr}\left[V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\right] + 0.5\,y'PPy
= -0.5\,\mathrm{tr}(P) + 0.5\,y'PPy$$

$$\frac{\partial L(G, R; y^*)}{\partial\sigma_u^2}
= -0.5\,\mathrm{tr}\left[\left(V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\right)ZAZ'\right] + 0.5\,y'P(ZAZ')Py
= -0.5\,\mathrm{tr}(PZAZ') + 0.5\,y'P(ZAZ')Py$$

Setting the derivatives to zero leads to:

$$\mathrm{tr}(P) = y'PPy \quad \text{for } \sigma_e^2$$
$$\mathrm{tr}(PZAZ') = y'P(ZAZ')Py \quad \text{for } \sigma_u^2$$

Note that $P$ is a function of $V = ZAZ'\sigma_u^2 + I\sigma_e^2$, so these equations must be solved iteratively.

Algorithms: Derivative-Based Methods

Newton-Raphson. Let $\theta = (\sigma_e^2, \sigma_u^2)'$. NR is an iterative procedure that requires the second-order derivatives:

$$\theta^{(k+1)} = \theta^{(k)} - \left[H^{(k)}\right]^{-1}\left.\frac{\partial L(\theta; y)}{\partial\theta}\right|_{\theta^{(k)}}$$

where $k$ is the iteration number and $H$ is the Hessian matrix:

$$H = \frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}$$

Observed information matrix (Fisher information):

$$I = -H = -\frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}$$

If $\theta$ is a scalar, $I = -\partial^2 L(\theta; y)/\partial\theta^2$.

Expected Fisher information matrix:

$$E(I) = -E\left[\frac{\partial^2 L(\theta; y)}{\partial\theta\,\partial\theta'}\right]$$

the expectation being taken over possible values of $y$ for a fixed value of $\theta$.

Algorithms: Derivative-Based Methods

Thus, in our case:

$$\left.\frac{\partial L(\theta; y)}{\partial\sigma_e^2}\right|_{\theta^{(k)}} = -0.5\,\mathrm{tr}(P^{(k)}) + 0.5\,y'P^{(k)}P^{(k)}y$$

$$\left.\frac{\partial L(\theta; y)}{\partial\sigma_u^2}\right|_{\theta^{(k)}} = -0.5\,\mathrm{tr}(P^{(k)}ZAZ') + 0.5\,y'P^{(k)}ZAZ'P^{(k)}y$$

$$H^{(k)}_{ij} = \frac{\partial^2 L(\theta; y)}{\partial\sigma_i^2\,\partial\sigma_j^2}
= 0.5\,\mathrm{tr}(P^{(k)}V_iP^{(k)}V_j) - y'P^{(k)}V_iP^{(k)}V_jP^{(k)}y$$

with $V_u = ZAZ'$ and $V_e = I$.

Fisher's Scoring. Same as NR, except that the Hessian matrix is replaced by its expectation. The expectation is taken over the data; thus, the resulting matrix is free of the data:

$$\theta^{(k+1)} = \theta^{(k)} + \left[F^{(k)}\right]^{-1}\left.\frac{\partial L(G, R; y)}{\partial\theta}\right|_{\theta^{(k)}}$$

with

$$F^{(k)}_{ij} = 0.5\,\mathrm{tr}(P^{(k)}V_iP^{(k)}V_j)$$

There have been several variations of these two algorithms (e.g., AI-REML).
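Putting the score and the expected information together gives a Fisher-scoring REML iteration. The sketch below is illustrative only (toy balanced design with $A = I$; starting values and simulated data are made up), not a production implementation:

```python
# Fisher-scoring REML for V = ZAZ' s2u + I s2e (toy example).
import numpy as np

rng = np.random.default_rng(5)
n, q = 30, 6
X = np.ones((n, 1))
Z = np.kron(np.eye(q), np.ones((n // q, 1)))    # q groups of equal size
A = np.eye(q)                                   # identity relationship matrix
y = 1.0 + Z @ rng.normal(scale=np.sqrt(2.0), size=q) + rng.normal(size=n)

dV = [Z @ A @ Z.T, np.eye(n)]                   # dV/ds2u, dV/ds2e
theta = np.array([1.0, 1.0])                    # starting values (s2u, s2e)
for it in range(50):
    V = dV[0] * theta[0] + dV[1] * theta[1]
    Vi = np.linalg.inv(V)
    P = Vi - Vi @ X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)
    # REML score: -0.5 tr(P dV_i) + 0.5 y' P dV_i P y
    score = np.array([-0.5 * np.trace(P @ dV[i]) + 0.5 * y @ P @ dV[i] @ P @ y
                      for i in range(2)])
    # Expected information: F_ij = 0.5 tr(P dV_i P dV_j)
    F = np.array([[0.5 * np.trace(P @ dV[i] @ P @ dV[j]) for j in range(2)]
                  for i in range(2)])
    step = np.linalg.solve(F, score)
    theta = np.maximum(theta + step, 1e-6)       # keep variances positive
    if np.abs(step).max() < 1e-8:
        break
print("REML estimates (s2u, s2e):", theta)
```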

Algorithms: Derivative-Free Methods

Recall: $V = ZGZ' + R$. Then

$$|V| = |R(I + R^{-1}ZGZ')| = |R|\,|I + R^{-1}ZGZ'| \quad\text{because } |AB| = |A|\,|B|$$
$$= |R|\,|I + Z'R^{-1}ZG| = |R|\,|G^{-1} + Z'R^{-1}Z|\,|G| = |R|\,|G|\,|G^{-1} + Z'R^{-1}Z|$$

(the second line uses $|I + AB| = |I + BA|$).

Let

$$C = \begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{bmatrix}$$

Then

$$|C| = |X'R^{-1}X|\;\left|G^{-1} + Z'\left(R^{-1} - R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}\right)Z\right|$$
$$= |Z'R^{-1}Z + G^{-1}|\;\left|X'\left(R^{-1} - R^{-1}Z(Z'R^{-1}Z + G^{-1})^{-1}Z'R^{-1}\right)X\right|$$

Proof: for a partitioned matrix,

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,|D - CA^{-1}B| = |D|\,|A - BD^{-1}C|$$

Algorithms: Derivative-Free Methods

Using the previous results:

$$\ln|V| = \ln|R| + \ln|G| + \ln|G^{-1} + Z'R^{-1}Z|$$
$$\ln|X'V^{-1}X| = \ln|C| - \ln|G^{-1} + Z'R^{-1}Z|$$

Then:

$$L(G, R; y^*) = -0.5\ln|R| - 0.5\ln|G| - 0.5\ln|C| - 0.5\,(y - X\hat\beta)'V^{-1}(y - X\hat\beta)$$
$$= -0.5\ln|R| - 0.5\ln|G| - 0.5\ln|C| - 0.5\,y'Py$$

Recall:

$$\ln|R| = \ln|I\sigma_e^2| = n\ln\sigma_e^2$$
$$\ln|G| = q\ln\sigma_u^2 \quad (\text{plus the constant } \ln|A|, \text{ which does not involve the parameters})$$
$$\ln|C| = \ln|X'R^{-1}X| + \ln|G^{-1} + Z'WZ|$$

where $W = R^{-1} - R^{-1}X(X'R^{-1}X)^{-1}X'R^{-1}$. Further,

$$\ln|X'R^{-1}X| = \ln|X'X\sigma_e^{-2}| = -r(X)\ln\sigma_e^2 + \ln|X'X|$$
$$G^{-1} + Z'WZ = \sigma_e^{-2}\left(G^{-1}\sigma_e^2 + Z'MZ\right)$$

where $M = \sigma_e^2 W = I - X(X'X)^{-1}X'$.
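The determinant identity used above can be verified numerically; a short sketch with made-up $Z$, $G$, and $R$:

```python
# Check: ln|V| = ln|R| + ln|G| + ln|G^{-1} + Z'R^{-1}Z| (toy matrices).
import numpy as np

rng = np.random.default_rng(6)
n, q = 9, 3
Z = rng.integers(0, 2, size=(n, q)).astype(float)
G = 2.0 * np.eye(q)                 # G = A s2u with A = I, s2u = 2
R = 1.5 * np.eye(n)                 # R = I s2e with s2e = 1.5
V = Z @ G @ Z.T + R

lhs = np.linalg.slogdet(V)[1]
rhs = (np.linalg.slogdet(R)[1] + np.linalg.slogdet(G)[1]
       + np.linalg.slogdet(np.linalg.inv(G) + Z.T @ np.linalg.inv(R) @ Z)[1])
print(lhs, rhs)                     # should agree to rounding error
```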

Algorithms: Derivative-Free Methods

Finally,

$$L(G, R; y^*) = -0.5\,(n - r(X) - q)\ln\sigma_e^2 - 0.5\,q\ln\sigma_u^2 - 0.5\ln|C| - 0.5\,y'Py$$

where

$$C = \begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + G^{-1}\sigma_e^2 \end{bmatrix}$$

is the coefficient matrix of the mixed model equations. A derivative-free search only needs to evaluate this expression at candidate values of the variance components.

Expectation-Maximization (EM) Algorithm

Basic idea:

- The observed data are incomplete.
- If the missing data were observed, implementation would be greatly facilitated.
- What are the complete data? What are the missing data?
- If the random effects were observed, estimating the variance components would be easy.

EM Algorithm

Thus, if $u$ and $e$ were observed:

$$\hat\sigma_u^2 = \frac{E(u'A^{-1}u)}{q}, \qquad \hat\sigma_e^2 = \frac{E(e'e)}{n}$$

Unfortunately, $u$ and $e$ are not observed! However,

$$E(u \mid y) = \sigma_u^2\,AZ'V^{-1}(y - X\hat\beta) = \sigma_u^2\,AZ'Py$$
$$\mathrm{var}(u \mid y) = A\sigma_u^2 - \sigma_u^4\,AZ'V^{-1}ZA$$

and

$$E(u'A^{-1}u \mid y) = q\sigma_u^2 + \sigma_u^4\left[y'P(ZAZ')Py - \mathrm{tr}(PZAZ')\right]$$
$$E(e'e \mid y) = n\sigma_e^2 + \sigma_e^4\left[y'PPy - \mathrm{tr}(P)\right]$$

Both expectations are functions of the variances we are trying to estimate!

EM Algorithm

To overcome this problem, Dempster et al. (1977) suggested the following iterative approach:

1. Expectation (E) step: using starting values for the variances, calculate the expectations of the quadratic forms.
2. Maximization (M) step: the expectations from step 1 are used to obtain new variance component estimates.
3. Repeat steps 1 and 2 until convergence.

Set $k = 0$ and choose starting values $(\sigma_e^2)^{(k)}$ and $(\sigma_u^2)^{(k)}$.

E-step (with $P$ evaluated at the current variances):

$$E(u'A^{-1}u \mid y)^{(k)} = q(\sigma_u^2)^{(k)} + \left[(\sigma_u^2)^{(k)}\right]^2\left[y'P(ZAZ')Py - \mathrm{tr}(PZAZ')\right]$$
$$E(e'e \mid y)^{(k)} = n(\sigma_e^2)^{(k)} + \left[(\sigma_e^2)^{(k)}\right]^2\left[y'PPy - \mathrm{tr}(P)\right]$$

M-step:

$$(\sigma_u^2)^{(k+1)} = \frac{E(u'A^{-1}u \mid y)^{(k)}}{q}, \qquad (\sigma_e^2)^{(k+1)} = \frac{E(e'e \mid y)^{(k)}}{n}$$
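A minimal sketch of this EM-REML loop (toy balanced design with $A = I$; the simulated data, starting values, and convergence tolerance are assumptions for illustration):

```python
# EM-REML iteration for y = X beta + Z u + e, V = ZAZ' s2u + I s2e (toy data).
import numpy as np

rng = np.random.default_rng(7)
n, q = 30, 6
X = np.ones((n, 1))
Z = np.kron(np.eye(q), np.ones((n // q, 1)))
A = np.eye(q)                                    # identity relationship matrix
y = 1.0 + Z @ rng.normal(scale=np.sqrt(2.0), size=q) + rng.normal(size=n)

ZAZ = Z @ A @ Z.T
s2u, s2e = 1.0, 1.0                              # starting values
for k in range(1000):
    V = ZAZ * s2u + np.eye(n) * s2e
    Vi = np.linalg.inv(V)
    P = Vi - Vi @ X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)
    # E-step: conditional expectations of the quadratic forms
    Eu = q * s2u + s2u**2 * (y @ P @ ZAZ @ P @ y - np.trace(P @ ZAZ))
    Ee = n * s2e + s2e**2 * (y @ P @ P @ y - np.trace(P))
    # M-step
    s2u_new, s2e_new = Eu / q, Ee / n
    if max(abs(s2u_new - s2u), abs(s2e_new - s2e)) < 1e-10:
        s2u, s2e = s2u_new, s2e_new
        break
    s2u, s2e = s2u_new, s2e_new
print("EM-REML estimates (s2u, s2e):", s2u, s2e)
```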
