f X (y, z; θ, σ 2 ) = 1 2 (2πσ2 ) 1 2 exp( (y θz) 2 /2σ 2 ) l c,n (θ, σ 2 ) = i log f(y i, Z i ; θ, σ 2 ) (Y i θz i ) 2 /2σ 2

Size: px
Start display at page:

Download "f X (y, z; θ, σ 2 ) = 1 2 (2πσ2 ) 1 2 exp( (y θz) 2 /2σ 2 ) l c,n (θ, σ 2 ) = i log f(y i, Z i ; θ, σ 2 ) (Y i θz i ) 2 /2σ 2"

Transcription

1 Chapter 7: EM algorithm in exponential families: JAW (i) The EM Algorithm finds MLE s in problems with latent variables (sometimes called missing data ): things you wish you could observe, but cannot. (ii) Recall homework example (with added parameter σ 2 ): Y i i.i.d. from mixture 1 2 N( θ, σ2 ) N(θ, σ2 ) f Y (y; θ, σ 2 ) = 1 2 (2πσ2 ) 1 2 exp( y 2 /(2σ 2 )) exp( θ 2 /(2σ 2 )) (exp( yθ/σ 2 ) + exp(yθ/σ 2 )) l n (θ, σ 2 ) = const (n/2) log(σ 2 ) ( Yi 2 )/(2σ 2 ) nθ 2 /(2σ 2 ) i log((exp( Y i θ/σ 2 ) + exp(y i θ/σ 2 )) + i (iv) What is the sufficient statistic? How would we estimate (θ, σ 2 )? (v) Now suppose each Y i caries a flag Z i = 1 or 1 as obsn i comes from N( θ, σ 2 ) or N(θ, σ 2 ). Let X i = (Y i, Z i ). f X (y, z; θ, σ 2 ) = 1 2 (2πσ2 ) 1 2 exp( (y θz) 2 /2σ 2 ) l c,n (θ, σ 2 ) = i log f(y i, Z i ; θ, σ 2 ) = const (n/2) log(σ 2 ) i (Y i θz i ) 2 /2σ 2 = const (n/2) log(σ 2 ) (2σ 2 ) 1 ( Y 2 i 2θ Y i Z i + θ 2 n) (vi) What now is sufficient statistic, if X i were observed? How now could you estimate (θ, σ 2 )? The Z i are latent variables ; X i are complete data (vii) E(l c,n Y ) requires only E(Z i Y i ) = φ((y i θ)/σ) φ((y i + θ)/σ) φ((y i θ)/σ) + φ((y i + θ)/σ) 1

2 7.2 Defining the EM algorithm (i) Data random variables Y, with probability measure Q θ, with θ Θ R k. Suppose Y has density f θ w.r.t. some σ-finite measure µ (usually Lebegue measure (pdf) or counting measure (pmf)). (ii) l(θ) = log L(θ) = log f θ (y), for observed data Y = y. Suppose maximization of l(θ) is messy/impossible; k > 1 but not huge. (iii) Suppose we augment the random variables to complete data X: Y = Y (X). Suppose X has density g θ (x). Then L(θ) = f θ (y) = and l c (θ) = log g θ (x) x:y(x)=y g θ(x)dµ(x) is known as the complete-data likelihood. (iv) Let Q(θ; θ ) = E θ (l c (θ) Y (X) = y). Then g θ (X) = h θ (X Y = y)f θ (y) l c (θ; X) = log h θ (X Y = y) + l(θ; y) Q(θ; θ ) = H(θ; θ ) + l(θ) y where H(θ; θ ) = E θ (log h θ (X Y = y) Y (X) = y) (v) EM algorithm is: E-step: at current estimate θ compute Q(θ; θ ). M-step: Maximize Q(θ; θ ) w.r.t. θ to obtain new estimate θ. Set θ = θ and repeat ad nauseam. 2

3 7.3 Why does EM work? (i) Recall (see Kullback-Leibler info) that for any densities p and q of r.v. Z, E q (log p(z)) = log(p(z))q(z)dµ(z) is maximized w.r.t p by p = q. (K(q; p) = E q log(q/p) 0.) (ii) Hence H(θ; θ ) H(θ, θ ) for all θ, θ. (iii) Now with new/old estimates θ, θ l( θ) l(θ ) = Q( θ; θ ) Q(θ ; θ ) (H( θ; θ ) H(θ ; θ )) by 7.2(iv) H(θ ; θ ) H( θ; θ ) by M-step 0 by (ii). Also l( θ) l(θ ) > 0, unless h θ(x Y ) = h θ (X Y ) (iv) Thus each step of EM cannot decrease l(θ) and usually increases l(θ). (v) If the MLE θ is the unique stationary point of l(θ) in the interior of the space, then θ θ (vi) In practice, EM is very robust, but can be very slow, especially in final stages: cgce is first-order. (vii) Caution: we do NOT use expectations to impute the missing data We compute the expected complete-data log-likelihood. This normally involves using conditional expectations to impute the complete-data sufficient statistics. This is NOT the same thing see hwk. And it could be more complicated than this although not if we have chosen sensible complete-data. 3

4 7.4 A multinomial example (i) Bernstein (1928) used population data to validate the hypothesis that human ABO blood tyoes are determined by 3 alleles, A, B and O at a single genetic locus, rather than being 2 independent factors A/not-A, B/not-B. (ii) Suppose that the population frequencies of the A, B and O are p, q and r (p + q + r = 1); we want to estimate (p, q, r). (iii) We assume that the types of the two alleles carried by an individual are independent (Hardy-Weinberg Equil: 1908), and that individuals are independent ( unrelated ). (iv) ABO blood types are determined as follows: blood type genotype freq. type geno. freq. A AA p 2 A AO 2pr B BB q 2 B BO 2qr AB AB 2pq O OO r 2 (v) Y M 4 (n, (p 2 + 2pr, q 2 + 2qr, 2pq, r 2 )) X M 6 (n, (p 2, 2pr, q 2, 2qr, 2pq, r 2 )) (vi) l(p, q, r) easy to evaluate but hard to max. l(p, q, r) = const + y A log(p 2 + 2pr) + y B log(q 2 + 2qr) +y AB log(2pq) + y O log(r 2 ) (v) l c (p, q, r) is easy to maximize: l c (p, q, r) = const + x AA log(p 2 ) + x AO log(2pr) + x BB log(q 2 ) +x BO log(2qr) + x AB log(2pq) + x OO log(r 2 ) = const + (2x AA + x AO + x AB ) log p + (2x BB + x BO + x AB ) log q + (2x OO + x AO + x BO ) log r 4

5 (vi) E-step: x AA = E p,q,r (X AA Y = y) = p2 p 2 +2pr y A = (vii) M-step: p = (2n) 1 (2x AA + x AO + y AB ), q = (2n) 1 (2x BB + x BO + y AB ), r = 1 p q. p p+2r y A etc. (viii) This method know to geneticists in 1950s: genecounting. (EM algorithm dates to 1977: Dempster,Laird & Rubin) 7.5 A mixture example (see prev hwk) (i) Y i i.i.d, with pdf f(y; θ, ψ) = θf 1 (y; ψ) + (1 θ)f 2 (y; ψ) (ii) l(θ, ψ) = i log(θf 1 (y i ; ψ) + (1 θ)f 2 (y i ; ψ)) (iii) Let Z i = I(Y i f 1 ). P (Z i = 1) = θ, l c (θ, ψ) = i(z i log θ + (1 Z i ) log(1 θ) + Z i log f 1 (y i ; ψ) + (1 Z i ) log f 2 (y i ; ψ)) (iv) l c is linear in Z i, so E-step requires only δ i = E(Z i Y ) = θf 1 (y i ; ψ) θf 1 (y i ; ψ) + (1 θ)f 2 (y i ; ψ) (v) M-step: θ = i δ i /n and ψ maximizes i δ i log f 1 (y i ; ψ) + (1 δ i ) log f 2 (y i ; ψ)) (vi) Example: f j (y i ; ψ) = ψj 1 exp( y i /ψ j ) i δ i log f 1 (y i ; ψ) = log ψ 1 i δ i i δ i y i ψ 1 = i δ i y i /( i δ i ), ψ2 = i(1 δ i )y i / i(1 δ i ). (vii) Be careful about identifiability exchanging the probs and labels on components gives same mixture: e.g. fix ψ 1 < ψ 2. 5

6 7.6 Other types of example (i) Missing data actual Caution: we do NOT use expectations to impute the missing data. (ii) Variance component models (see hwk 9) (a) Y = AZ + e, e N n (0, τ 2 ), Z N r (0, σ 2 G), A is n r matrix. Y N n (0, σ 2 AGA + τ 2 I) (b) X = (Y, Z): l c (σ 2, τ 2 ) = (n/2) log(τ 2 ) (r/2) log(σ 2 ) (2τ 2 ) 1 (y Az) (y Az) (2σ 2 ) 1 z G 1 z (c) σ 2 = r 1 E(z G 1 z y), τ 2 = n 1 E((Y AZ) (Y AZ) Y ), but E(z G 1 z y) E(z y) G 1 E(z y). (d) Note X N n+r (0, V ), so we have usual formulae E(Z Y ) = V zy Vyy 1 Y, var(z Y ) = V zz V zy Vyy 1 V yz. (e) Also, if E(W ) = 0, E(W BW ) = E( i,j W i W j B ij ) = i,j var(w ) ij B ij = tr(var(w )B). (iii) Censored data, age-of-onset-data, competing risks models, etc. (iv) Hidden states, latent variables: Models in Genetics, Biology, Climate modelling, Environmental modelling. (v) General auxiliary variables: the latent variables do not have to mean anything they are simply a tool, s.t. that the complete-data log-likelihood is easy. 6

7 7.7 Likelihood in Exponential families (i) See Chapter 2, for definitions, the natural sufficient statistics T j, the natural parameter space and parametrization π j. (iii) Moment formulae. see 2.2 (vii) log c(π) E(t j (X)) = Cov(t j (X), t j (X)) = 2 log c(π) π l (iv) Likelihood equation for exponential family see 4.6 l = n(n 1 The natural sufficient statistics T j expectations nτ j. n 1 t j (X i ) E(t j (X))) (v) The information results (see 4.6) are set equal to their I(π) = J(π) = var(t 1,..., T k ) I(τ) = (var(t 1,..., T k )) 1 where τ j = E(t j (X)) (T 1,..., T k ) achieves (multiparameter) CRLB for (τ 1,..., τ k ). (v) Suppose complete-data X has exp.fam. form: for n- sample T j (X) = n i=1 t j (X i ) log g θ (X) = log c(θ) + k j=1 Q(θ; θ ) = log c(θ) + k j=1 π j (θ)t j (X) + log w(x) π j (θ)e θ (T j (X) Y ) + E θ (log w(x) Y ). 7

8 7.8 EM for exponential families (i) In natural parametrization π j : Q(π; π ) = log c(π) + k Q j=1 π j E π (T j (X) Y ) = E π (T j (X) Y ) + log c(π) = E π (T j (X) Y ) E π (T j (X)) Thus EM iteratively fits unconditioned to conditioned expectations of T j. At MLE E π (T j (X) Y ) = E π (T j (X)). (ii) Recall l(π) = log g π (X) log h π (X Y ) w(x) exp( j π j t j (X)) but h π (X Y ) = y(x)=y w(x) exp( j π j t j (X))dX = c (π; Y )w(x) exp( π j t j (X)) j so l(π) = log c(π) log c (π; Y ) (iii) Hence, differentiating this: l = E π (T j ) + E π (T j Y ) At MLE : E π (T j ) = E π (T j Y ) (iv) Differentiating again: 2 l = Cov(T j, T l ) Cov((T j, T l ) Y ) π l If Y determines X, var(t (X) Y ) = 0, and then observed information is var(t ) as for any exp fam. If Y tells nothing about X, var(t (X) Y ) = var(t (X)), and observed information is 0. Information lost due to observing Y not X is var(t (X) Y ). 8

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Overview - 1 Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014 Elizabeth Thompson University of Washington Seattle, WA, USA MWF 8:30-9:20; THO 211 Web page: www.stat.washington.edu/ thompson/stat550/

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares

More information

Chapter 3 : Likelihood function and inference

Chapter 3 : Likelihood function and inference Chapter 3 : Likelihood function and inference 4 Likelihood function and inference The likelihood Information and curvature Sufficiency and ancilarity Maximum likelihood estimation Non-regular models EM

More information

Computing the MLE and the EM Algorithm

Computing the MLE and the EM Algorithm ECE 830 Fall 0 Statistical Signal Processing instructor: R. Nowak Computing the MLE and the EM Algorithm If X p(x θ), θ Θ, then the MLE is the solution to the equations logp(x θ) θ 0. Sometimes these equations

More information

Allele Frequency Estimation

Allele Frequency Estimation Allele Frequency Estimation Examle: ABO blood tyes ABO genetic locus exhibits three alleles: A, B, and O Four henotyes: A, B, AB, and O Genotye A/A A/O A/B B/B B/O O/O Phenotye A A AB B B O Data: Observed

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments We consider two kinds of random variables: discrete and continuous random variables. For discrete random

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate

More information

Classical Estimation Topics

Classical Estimation Topics Classical Estimation Topics Namrata Vaswani, Iowa State University February 25, 2014 This note fills in the gaps in the notes already provided (l0.pdf, l1.pdf, l2.pdf, l3.pdf, LeastSquares.pdf). 1 Min

More information

The Expectation Maximization or EM algorithm

The Expectation Maximization or EM algorithm The Expectation Maximization or EM algorithm Carl Edward Rasmussen November 15th, 2017 Carl Edward Rasmussen The EM algorithm November 15th, 2017 1 / 11 Contents notation, objective the lower bound functional,

More information

Optimization Methods II. EM algorithms.

Optimization Methods II. EM algorithms. Aula 7. Optimization Methods II. 0 Optimization Methods II. EM algorithms. Anatoli Iambartsev IME-USP Aula 7. Optimization Methods II. 1 [RC] Missing-data models. Demarginalization. The term EM algorithms

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

EM Algorithm. Expectation-maximization (EM) algorithm.

EM Algorithm. Expectation-maximization (EM) algorithm. EM Algorithm Outline: Expectation-maximization (EM) algorithm. Examples. Reading: A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.,

More information

EM algorithm and applications Lecture #9

EM algorithm and applications Lecture #9 EM algorithm and applications Lecture #9 Bacground Readings: Chapters 11.2, 11.6 in the text boo, Biological Sequence Analysis, Durbin et al., 2001.. The EM algorithm This lecture plan: 1. Presentation

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Finite Singular Multivariate Gaussian Mixture

Finite Singular Multivariate Gaussian Mixture 21/06/2016 Plan 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Plan Singular Multivariate Normal Distribution 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Multivariate

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

EM for ML Estimation

EM for ML Estimation Overview EM for ML Estimation An algorithm for Maximum Likelihood (ML) Estimation from incomplete data (Dempster, Laird, and Rubin, 1977) 1. Formulate complete data so that complete-data ML estimation

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Stat 5102 Lecture Slides Deck 3. Charles J. Geyer School of Statistics University of Minnesota

Stat 5102 Lecture Slides Deck 3. Charles J. Geyer School of Statistics University of Minnesota Stat 5102 Lecture Slides Deck 3 Charles J. Geyer School of Statistics University of Minnesota 1 Likelihood Inference We have learned one very general method of estimation: method of moments. the Now we

More information

Review of Maximum Likelihood Estimators

Review of Maximum Likelihood Estimators Libby MacKinnon CSE 527 notes Lecture 7, October 7, 2007 MLE and EM Review of Maximum Likelihood Estimators MLE is one of many approaches to parameter estimation. The likelihood of independent observations

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

A Note on the Expectation-Maximization (EM) Algorithm

A Note on the Expectation-Maximization (EM) Algorithm A Note on the Expectation-Maximization (EM) Algorithm ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign March 11, 2007 1 Introduction The Expectation-Maximization

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Lagrange Multipliers

Lagrange Multipliers Calculus 3 Lia Vas Lagrange Multipliers Constrained Optimization for functions of two variables. To find the maximum and minimum values of z = f(x, y), objective function, subject to a constraint g(x,

More information

Week 3: The EM algorithm

Week 3: The EM algorithm Week 3: The EM algorithm Maneesh Sahani maneesh@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit University College London Term 1, Autumn 2005 Mixtures of Gaussians Data: Y = {y 1... y N } Latent

More information

Evaluating the Performance of Estimators (Section 7.3)

Evaluating the Performance of Estimators (Section 7.3) Evaluating the Performance of Estimators (Section 7.3) Example: Suppose we observe X 1,..., X n iid N(θ, σ 2 0 ), with σ2 0 known, and wish to estimate θ. Two possible estimators are: ˆθ = X sample mean

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8 The E-M Algorithm in Genetics Biostatistics 666 Lecture 8 Maximum Likelihood Estimation of Allele Frequencies Find parameter estimates which make observed data most likely General approach, as long as

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence

More information

Methods of Estimation

Methods of Estimation Methods of Estimation MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Methods of Estimation I 1 Methods of Estimation I 2 X X, X P P = {P θ, θ Θ}. Problem: Finding a function θˆ(x ) which is close to θ.

More information

MIT Spring 2016

MIT Spring 2016 Exponential Families II MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Exponential Families II 1 Exponential Families II 2 : Expectation and Variance U (k 1) and V (l 1) are random vectors If A (m k),

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models David Rosenberg New York University April 12, 2015 David Rosenberg (New York University) DS-GA 1003 April 12, 2015 1 / 20 Conditional Gaussian Regression Gaussian Regression Input

More information

Chapter 4: Asymptotic Properties of the MLE (Part 2)

Chapter 4: Asymptotic Properties of the MLE (Part 2) Chapter 4: Asymptotic Properties of the MLE (Part 2) Daniel O. Scharfstein 09/24/13 1 / 1 Example Let {(R i, X i ) : i = 1,..., n} be an i.i.d. sample of n random vectors (R, X ). Here R is a response

More information

Gibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644:

Gibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644: Gibbs Sampling Héctor Corrada Bravo University of Maryland, College Park, USA CMSC 644: 2019 03 27 Latent semantic analysis Documents as mixtures of topics (Hoffman 1999) 1 / 60 Latent semantic analysis

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

Expectation Maximization (EM) Algorithm. Each has it s own probability of seeing H on any one flip. Let. p 1 = P ( H on Coin 1 )

Expectation Maximization (EM) Algorithm. Each has it s own probability of seeing H on any one flip. Let. p 1 = P ( H on Coin 1 ) Expectation Maximization (EM Algorithm Motivating Example: Have two coins: Coin 1 and Coin 2 Each has it s own probability of seeing H on any one flip. Let p 1 = P ( H on Coin 1 p 2 = P ( H on Coin 2 Select

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

STA 260: Statistics and Probability II

STA 260: Statistics and Probability II Al Nosedal. University of Toronto. Winter 2017 1 Properties of Point Estimators and Methods of Estimation 2 3 If you can t explain it simply, you don t understand it well enough Albert Einstein. Definition

More information

STAT 536: Genetic Statistics

STAT 536: Genetic Statistics STAT 536: Genetic Statistics Tests for Hardy Weinberg Equilibrium Karin S. Dorman Department of Statistics Iowa State University September 7, 2006 Statistical Hypothesis Testing Identify a hypothesis,

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /

More information

Inference in non-linear time series

Inference in non-linear time series Intro LS MLE Other Erik Lindström Centre for Mathematical Sciences Lund University LU/LTH & DTU Intro LS MLE Other General Properties Popular estimatiors Overview Introduction General Properties Estimators

More information

Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

More information

Fitting Narrow Emission Lines in X-ray Spectra

Fitting Narrow Emission Lines in X-ray Spectra Outline Fitting Narrow Emission Lines in X-ray Spectra Taeyoung Park Department of Statistics, University of Pittsburgh October 11, 2007 Outline of Presentation Outline This talk has three components:

More information

1.5 MM and EM Algorithms

1.5 MM and EM Algorithms 72 CHAPTER 1. SEQUENCE MODELS 1.5 MM and EM Algorithms The MM algorithm [1] is an iterative algorithm that can be used to minimize or maximize a function. We focus on using it to maximize the log likelihood

More information

Estimating the parameters of hidden binomial trials by the EM algorithm

Estimating the parameters of hidden binomial trials by the EM algorithm Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger

Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm by Korbinian Schwinger Overview Exponential Family Maximum Likelihood The EM Algorithm Gaussian Mixture Models Exponential

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students

More information

Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning)

Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning) Exercises Introduction to Machine Learning SS 2018 Series 6, May 14th, 2018 (EM Algorithm and Semi-Supervised Learning) LAS Group, Institute for Machine Learning Dept of Computer Science, ETH Zürich Prof

More information

Likelihood, Estimation and Testing

Likelihood, Estimation and Testing Chapter 2 Likelihood, Estimation and Testing 2.1 Likelihood and log-likelihood. In this and the following section, we review briefly the basic ideas and results of likelihood inference: details may be

More information

MIT Spring 2016

MIT Spring 2016 MIT 18.655 Dr. Kempthorne Spring 2016 1 MIT 18.655 Outline 1 2 MIT 18.655 Decision Problem: Basic Components P = {P θ : θ Θ} : parametric model. Θ = {θ}: Parameter space. A{a} : Action space. L(θ, a) :

More information

Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI

Generative and Discriminative Approaches to Graphical Models CMSC Topics in AI Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1,

Economics 520. Lecture Note 19: Hypothesis Testing via the Neyman-Pearson Lemma CB 8.1, Economics 520 Lecture Note 9: Hypothesis Testing via the Neyman-Pearson Lemma CB 8., 8.3.-8.3.3 Uniformly Most Powerful Tests and the Neyman-Pearson Lemma Let s return to the hypothesis testing problem

More information

Chapter 6 Likelihood Inference

Chapter 6 Likelihood Inference Chapter 6 Likelihood Inference CHAPTER OUTLINE Section 1 The Likelihood Function Section 2 Maximum Likelihood Estimation Section 3 Inferences Based on the MLE Section 4 Distribution-Free Methods Section

More information

Theory and Use of the EM Algorithm. Contents

Theory and Use of the EM Algorithm. Contents Foundations and Trends R in Signal Processing Vol. 4, No. 3 (2010) 223 296 c 2011 M. R. Gupta and Y. Chen DOI: 10.1561/2000000034 Theory and Use of the EM Algorithm By Maya R. Gupta and Yihua Chen Contents

More information

Review and continuation from last week Properties of MLEs

Review and continuation from last week Properties of MLEs Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Jonathan Marchini Department of Statistics University of Oxford MT 2013 Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 27 Course arrangements Lectures M.2

More information

1 Complete Statistics

1 Complete Statistics Complete Statistics February 4, 2016 Debdeep Pati 1 Complete Statistics Suppose X P θ, θ Θ. Let (X (1),..., X (n) ) denote the order statistics. Definition 1. A statistic T = T (X) is complete if E θ g(t

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

A note on shaved dice inference

A note on shaved dice inference A note on shaved dice inference Rolf Sundberg Department of Mathematics, Stockholm University November 23, 2016 Abstract Two dice are rolled repeatedly, only their sum is registered. Have the two dice

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

Lecture 14 More on structural estimation

Lecture 14 More on structural estimation Lecture 14 More on structural estimation Economics 8379 George Washington University Instructor: Prof. Ben Williams traditional MLE and GMM MLE requires a full specification of a model for the distribution

More information

MATH4427 Notebook 2 Fall Semester 2017/2018

MATH4427 Notebook 2 Fall Semester 2017/2018 MATH4427 Notebook 2 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information

Introduction An approximated EM algorithm Simulation studies Discussion

Introduction An approximated EM algorithm Simulation studies Discussion 1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse

More information

Fall 2003: Maximum Likelihood II

Fall 2003: Maximum Likelihood II 36-711 Fall 2003: Maximum Likelihood II Brian Junker November 18, 2003 Slide 1 Newton s Method and Scoring for MLE s Aside on WLS/GLS Application to Exponential Families Application to Generalized Linear

More information

26. LECTURE 26. Objectives

26. LECTURE 26. Objectives 6. LECTURE 6 Objectives I understand the idea behind the Method of Lagrange Multipliers. I can use the method of Lagrange Multipliers to maximize a multivariate function subject to a constraint. Suppose

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

Lecture 17: Likelihood ratio and asymptotic tests

Lecture 17: Likelihood ratio and asymptotic tests Lecture 17: Likelihood ratio and asymptotic tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f

More information

Nonparametric Tests for Multi-parameter M-estimators

Nonparametric Tests for Multi-parameter M-estimators Nonparametric Tests for Multi-parameter M-estimators John Robinson School of Mathematics and Statistics University of Sydney The talk is based on joint work with John Kolassa. It follows from work over

More information

Stat 451 Lecture Notes Monte Carlo Integration

Stat 451 Lecture Notes Monte Carlo Integration Stat 451 Lecture Notes 06 12 Monte Carlo Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 23 in Lange, and Chapters 3 4 in Robert & Casella 2 Updated:

More information

Theory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk

Theory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk Instructor: Victor F. Araman December 4, 2003 Theory and Applications of Stochastic Systems Lecture 0 B60.432.0 Exponential Martingale for Random Walk Let (S n : n 0) be a random walk with i.i.d. increments

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

EM Algorithm II. September 11, 2018

EM Algorithm II. September 11, 2018 EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.

1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM. Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Linear Dynamical Systems

Linear Dynamical Systems Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations

More information

1 Expectation Maximization

1 Expectation Maximization Introduction Expectation-Maximization Bibliographical notes 1 Expectation Maximization Daniel Khashabi 1 khashab2@illinois.edu 1.1 Introduction Consider the problem of parameter learning by maximizing

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Gaussian Mixtures and the EM algorithm

Gaussian Mixtures and the EM algorithm Gaussian Mixtures and the EM algorithm 1 sigma=1.0 sigma=1.0 Responsibilities 0.0 0.2 0.4 0.6 0.8 1.0 sigma=0.2 sigma=0.2 Responsibilities 0.0 0.2 0.4 0.6 0.8 1.0 2 Details of figure Left panels: two Gaussian

More information