Chapter 7: EM algorithm in exponential families: JAW

(i) The EM algorithm finds MLEs in problems with latent variables (sometimes called "missing data"): things you wish you could observe, but cannot.

(ii) Recall the homework example (with added parameter σ²): Y_i i.i.d. from the mixture (1/2) N(-θ, σ²) + (1/2) N(θ, σ²).

f_Y(y; θ, σ²) = (1/2)(2πσ²)^{-1/2} exp(-y²/(2σ²)) exp(-θ²/(2σ²)) [exp(-yθ/σ²) + exp(yθ/σ²)]

(iii) l_n(θ, σ²) = const - (n/2) log(σ²) - (Σ_i Y_i²)/(2σ²) - nθ²/(2σ²) + Σ_i log(exp(-Y_i θ/σ²) + exp(Y_i θ/σ²))

(iv) What is the sufficient statistic? How would we estimate (θ, σ²)?

(v) Now suppose each Y_i carries a flag Z_i = -1 or 1 according as observation i comes from N(-θ, σ²) or N(θ, σ²). Let X_i = (Y_i, Z_i).

f_X(y, z; θ, σ²) = (1/2)(2πσ²)^{-1/2} exp(-(y - θz)²/(2σ²))

l_{c,n}(θ, σ²) = Σ_i log f(Y_i, Z_i; θ, σ²)
   = const - (n/2) log(σ²) - Σ_i (Y_i - θZ_i)²/(2σ²)
   = const - (n/2) log(σ²) - (2σ²)^{-1} (Σ_i Y_i² - 2θ Σ_i Y_i Z_i + θ²n)

(vi) What now is the sufficient statistic, if the X_i were observed? How now could you estimate (θ, σ²)? The Z_i are "latent variables"; the X_i are the "complete data".

(vii) E(l_{c,n} | Y) requires only

E(Z_i | Y_i) = [φ((Y_i - θ)/σ) - φ((Y_i + θ)/σ)] / [φ((Y_i - θ)/σ) + φ((Y_i + θ)/σ)]
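As a concrete illustration, here is a minimal numerical sketch (not part of the notes) of the EM iteration that (v)-(vii) set up. It uses the fact that the ratio of normal densities in (vii) simplifies to tanh(Y_i θ/σ²); the function name and simulated data are hypothetical.

```python
import numpy as np

def em_symmetric_mixture(y, theta=1.0, sigma2=1.0, iters=50):
    """EM for the mixture (1/2)N(-theta, sigma2) + (1/2)N(theta, sigma2)."""
    n = len(y)
    for _ in range(iters):
        # E-step: delta_i = E(Z_i | Y_i); the density ratio in (vii)
        # simplifies to tanh(y_i * theta / sigma2).
        delta = np.tanh(y * theta / sigma2)
        # M-step: maximize the expected complete-data log-likelihood
        # const - (n/2)log(sigma2) - (2 sigma2)^{-1}(sum y^2 - 2 theta sum y*delta + n theta^2)
        theta = np.sum(y * delta) / n
        sigma2 = (np.sum(y**2) - 2.0 * theta * np.sum(y * delta) + n * theta**2) / n
    return theta, sigma2

rng = np.random.default_rng(0)
z = rng.choice([-1, 1], size=2000)
y = 2.0 * z + rng.normal(0.0, 1.0, size=2000)   # true theta = 2, sigma2 = 1
theta_hat, s2_hat = em_symmetric_mixture(y)
```

Note the M-step imputes the conditional expectations of the sufficient statistics (here Σ Y_i Z_i), not the Z_i themselves; this is the point made again in 7.3(vii).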
7.2 Defining the EM algorithm

(i) Data random variables Y, with probability measure Q_θ, θ ∈ Θ ⊆ R^k. Suppose Y has density f_θ w.r.t. some σ-finite measure μ (usually Lebesgue measure (pdf) or counting measure (pmf)).

(ii) l(θ) = log L(θ) = log f_θ(y), for observed data Y = y. Suppose maximization of l(θ) is messy/impossible; k > 1 but not huge.

(iii) Suppose we augment the random variables to complete data X, with Y = Y(X). Suppose X has density g_θ(x). Then

L(θ) = f_θ(y) = ∫_{x: y(x)=y} g_θ(x) dμ(x),

and l_c(θ) = log g_θ(x) is known as the complete-data log-likelihood.

(iv) Let Q(θ; θ′) = E_{θ′}(l_c(θ) | Y(X) = y). Then

g_θ(X) = h_θ(X | Y = y) f_θ(y)
l_c(θ; X) = log h_θ(X | Y = y) + l(θ; y)
Q(θ; θ′) = H(θ; θ′) + l(θ),

where H(θ; θ′) = E_{θ′}(log h_θ(X | Y = y) | Y(X) = y).

(v) The EM algorithm is:
E-step: at the current estimate θ′, compute Q(θ; θ′).
M-step: maximize Q(θ; θ′) w.r.t. θ to obtain the new estimate θ̃.
Set θ′ = θ̃ and repeat ad nauseam.
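The scheme in (v) is just a fixed-point loop. A generic sketch (names and the toy instance are hypothetical, not from the notes), applied to right-censored exponential lifetimes with mean m: the complete-data sufficient statistic is Σ T_i, and for an observation censored at c, E(T_i | T_i > c) = c + m by lack of memory.

```python
import numpy as np

def em(e_step, m_step, theta0, tol=1e-10, max_iter=1000):
    """Generic EM loop: e_step(theta') returns the conditional expectations
    the M-step needs; m_step(...) returns the maximizer of Q(.; theta')."""
    theta = theta0
    for _ in range(max_iter):
        theta_new = m_step(e_step(theta))
        if abs(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

rng = np.random.default_rng(2)
t = rng.exponential(3.0, 500)          # true mean 3
c = 4.0
obs = np.minimum(t, c)                 # observed (possibly censored) times
cens = t > c                           # censoring indicators
# E-step: impute E(sum T_i | data) = sum of uncensored times + k*(c + m)
e_step = lambda m: np.sum(obs[~cens]) + cens.sum() * (c + m)
# M-step: complete-data MLE of the mean is sum(T_i)/n
m_step = lambda total: total / len(obs)
m_hat = em(e_step, m_step, theta0=1.0)
```

The fixed point is the classical censored-data MLE, total observed time divided by the number of uncensored observations.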
7.3 Why does EM work?

(i) Recall (see Kullback-Leibler information) that for any densities p and q of a r.v. Z, E_q(log p(Z)) = ∫ log(p(z)) q(z) dμ(z) is maximized w.r.t. p by p = q. (K(q; p) = E_q log(q/p) ≥ 0.)

(ii) Hence H(θ; θ′) ≤ H(θ′; θ′) for all θ, θ′.

(iii) Now with new/old estimates θ̃, θ′:

l(θ̃) - l(θ′) = Q(θ̃; θ′) - Q(θ′; θ′) - (H(θ̃; θ′) - H(θ′; θ′))   by 7.2(iv)
            ≥ H(θ′; θ′) - H(θ̃; θ′)   by the M-step
            ≥ 0   by (ii).

Also l(θ̃) - l(θ′) > 0, unless h_θ̃(X | Y) = h_θ′(X | Y).

(iv) Thus each step of EM cannot decrease l(θ), and usually increases l(θ).

(v) If the MLE θ̂ is the unique stationary point of l(θ) in the interior of the parameter space, then θ′ → θ̂.

(vi) In practice, EM is very robust, but can be very slow, especially in the final stages: convergence is first-order.

(vii) Caution: we do NOT use expectations to impute the missing data. We compute the expected complete-data log-likelihood. This normally involves using conditional expectations to impute the complete-data sufficient statistics. This is NOT the same thing (see homework). And it could be more complicated than this, although not if we have chosen sensible complete data.
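The monotonicity in (iv) is easy to see numerically. A small sketch (a hypothetical instance, not from the notes): estimating only the mixing weight t of t·N(-2,1) + (1-t)·N(2,1), and tracking the observed-data log-likelihood across EM steps.

```python
import numpy as np

def phi(x, mu):
    # N(mu, 1) density
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(1)
lab = rng.random(1000) < 0.3
y = np.where(lab, rng.normal(-2.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000))

def loglik(t):
    # observed-data log-likelihood l(t)
    return np.sum(np.log(t * phi(y, -2.0) + (1.0 - t) * phi(y, 2.0)))

t, lls = 0.5, []
for _ in range(30):
    # E-step: posterior probability that y_i came from component 1
    delta = t * phi(y, -2.0) / (t * phi(y, -2.0) + (1.0 - t) * phi(y, 2.0))
    # M-step: new mixing weight
    t = delta.mean()
    lls.append(loglik(t))
```

Plotting `lls` shows the characteristic EM trajectory: monotone increase, with progressively smaller gains (the first-order convergence of (vi)).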
7.4 A multinomial example

(i) Bernstein (1928) used population data to validate the hypothesis that human ABO blood types are determined by 3 alleles, A, B and O, at a single genetic locus, rather than being 2 independent factors A/not-A, B/not-B.

(ii) Suppose that the population frequencies of A, B and O are p, q and r (p + q + r = 1); we want to estimate (p, q, r).

(iii) We assume that the types of the two alleles carried by an individual are independent (Hardy-Weinberg equilibrium: 1908), and that individuals are independent ("unrelated").

(iv) ABO blood types are determined as follows:

blood type   genotype   freq.      blood type   genotype   freq.
A            AA         p²         A            AO         2pr
B            BB         q²         B            BO         2qr
AB           AB         2pq        O            OO         r²

(v) Y ∼ M_4(n, (p² + 2pr, q² + 2qr, 2pq, r²)); X ∼ M_6(n, (p², 2pr, q², 2qr, 2pq, r²))

(vi) l(p, q, r) is easy to evaluate but hard to maximize:

l(p, q, r) = const + y_A log(p² + 2pr) + y_B log(q² + 2qr) + y_AB log(2pq) + y_O log(r²)

(vii) l_c(p, q, r) is easy to maximize:

l_c(p, q, r) = const + x_AA log(p²) + x_AO log(2pr) + x_BB log(q²) + x_BO log(2qr) + x_AB log(2pq) + x_OO log(r²)
            = const + (2x_AA + x_AO + x_AB) log p + (2x_BB + x_BO + x_AB) log q + (2x_OO + x_AO + x_BO) log r
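Maximizing l_c is just allele counting, and replacing the unobserved genotype counts by their conditional expectations gives the gene-counting EM. A minimal sketch (the phenotype counts here are illustrative, not Bernstein's data, and the function name is hypothetical):

```python
def abo_em(yA, yB, yAB, yO, iters=200):
    """Gene-counting EM for ABO allele frequencies (p, q, r)."""
    n = yA + yB + yAB + yO
    p, q = 1.0 / 3.0, 1.0 / 3.0
    for _ in range(iters):
        r = 1.0 - p - q
        # E-step: split the observed A and B phenotype counts into
        # expected genotype counts, e.g. x_AA = y_A * p^2 / (p^2 + 2pr).
        xAA = yA * p * p / (p * p + 2.0 * p * r)
        xAO = yA - xAA
        xBB = yB * q * q / (q * q + 2.0 * q * r)
        xBO = yB - xBB
        # M-step: count alleles among the 2n genes in the sample.
        p = (2.0 * xAA + xAO + yAB) / (2.0 * n)
        q = (2.0 * xBB + xBO + yAB) / (2.0 * n)
    return p, q, 1.0 - p - q

p_hat, q_hat, r_hat = abo_em(yA=186, yB=38, yAB=13, yO=284)
```

Each M-step is a closed-form allele count, which is why geneticists could run this by hand long before the general theory existed.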
(viii) E-step: x̂_AA = E_{p,q,r}(X_AA | Y = y) = [p²/(p² + 2pr)] y_A = [p/(p + 2r)] y_A, etc.

(ix) M-step: p̃ = (2n)^{-1}(2x̂_AA + x̂_AO + y_AB), q̃ = (2n)^{-1}(2x̂_BB + x̂_BO + y_AB), r̃ = 1 - p̃ - q̃.

(x) This method was known to geneticists in the 1950s: "gene counting". (The EM algorithm dates to 1977: Dempster, Laird & Rubin.)

7.5 A mixture example (see previous homework)

(i) Y_i i.i.d., with pdf f(y; θ, ψ) = θ f_1(y; ψ) + (1 - θ) f_2(y; ψ)

(ii) l(θ, ψ) = Σ_i log(θ f_1(y_i; ψ) + (1 - θ) f_2(y_i; ψ))

(iii) Let Z_i = I(Y_i from f_1), with P(Z_i = 1) = θ. Then

l_c(θ, ψ) = Σ_i (Z_i log θ + (1 - Z_i) log(1 - θ) + Z_i log f_1(y_i; ψ) + (1 - Z_i) log f_2(y_i; ψ))

(iv) l_c is linear in the Z_i, so the E-step requires only

δ_i = E(Z_i | Y) = θ f_1(y_i; ψ) / [θ f_1(y_i; ψ) + (1 - θ) f_2(y_i; ψ)]

(v) M-step: θ̃ = Σ_i δ_i / n, and ψ̃ maximizes Σ_i (δ_i log f_1(y_i; ψ) + (1 - δ_i) log f_2(y_i; ψ))

(vi) Example: f_j(y_i; ψ) = ψ_j^{-1} exp(-y_i/ψ_j). Then

Σ_i δ_i log f_1(y_i; ψ) = -(log ψ_1) Σ_i δ_i - Σ_i δ_i y_i / ψ_1,

so ψ̃_1 = Σ_i δ_i y_i / Σ_i δ_i and ψ̃_2 = Σ_i (1 - δ_i) y_i / Σ_i (1 - δ_i).

(vii) Be careful about identifiability: exchanging the probabilities and labels on the components gives the same mixture. E.g. fix ψ_1 < ψ_2.
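The exponential-mixture case of (vi)-(vii) can be sketched directly (a hypothetical instance with simulated data; the ordering swap enforces the ψ_1 < ψ_2 convention of (vii)):

```python
import numpy as np

def em_exp_mixture(y, theta=0.5, psi1=1.0, psi2=5.0, iters=300):
    """EM for theta*Exp(psi1) + (1-theta)*Exp(psi2); psi_j are component means."""
    for _ in range(iters):
        f1 = np.exp(-y / psi1) / psi1
        f2 = np.exp(-y / psi2) / psi2
        # E-step: delta_i = E(Z_i | Y), the posterior weight on component 1
        delta = theta * f1 / (theta * f1 + (1.0 - theta) * f2)
        # M-step: weighted means, as in (vi)
        theta = delta.mean()
        psi1 = np.sum(delta * y) / np.sum(delta)
        psi2 = np.sum((1.0 - delta) * y) / np.sum(1.0 - delta)
        if psi1 > psi2:   # identifiability: fix psi1 < psi2, see (vii)
            psi1, psi2, theta = psi2, psi1, 1.0 - theta
    return theta, psi1, psi2

rng = np.random.default_rng(5)
y = np.concatenate([rng.exponential(1.0, 600),    # 60% from mean-1 component
                    rng.exponential(8.0, 400)])   # 40% from mean-8 component
theta_hat, p1, p2 = em_exp_mixture(y)
```

Without the label-fixing convention, runs from different starting points could converge to either of the two equivalent labelings of the same mixture.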
7.6 Other types of example

(i) Missing data (actual). Caution: we do NOT use expectations to impute the missing data.

(ii) Variance component models (see hwk 9)

(a) Y = AZ + e, e ∼ N_n(0, τ²I), Z ∼ N_r(0, σ²G), A an n × r matrix. Then Y ∼ N_n(0, σ²AGA′ + τ²I).

(b) X = (Y, Z):

l_c(σ², τ²) = -(n/2) log(τ²) - (r/2) log(σ²) - (2τ²)^{-1}(y - Az)′(y - Az) - (2σ²)^{-1} z′G^{-1}z

(c) σ̃² = r^{-1} E(Z′G^{-1}Z | y), τ̃² = n^{-1} E((Y - AZ)′(Y - AZ) | Y); but note E(Z′G^{-1}Z | y) ≠ E(Z | y)′ G^{-1} E(Z | y).

(d) Note X ∼ N_{n+r}(0, V), so we have the usual formulae E(Z | Y) = V_zy V_yy^{-1} Y, var(Z | Y) = V_zz - V_zy V_yy^{-1} V_yz.

(e) Also, if E(W) = 0, then E(W′BW) = E(Σ_{i,j} W_i W_j B_ij) = Σ_{i,j} var(W)_{ij} B_{ij} = tr(var(W)B).

(iii) Censored data, age-of-onset data, competing-risks models, etc.

(iv) Hidden states, latent variables: models in genetics, biology, climate modelling, environmental modelling.

(v) General auxiliary variables: the latent variables do not have to "mean" anything; they are simply a tool chosen so that the complete-data log-likelihood is easy.
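The conditional-expectation bookkeeping in (ii)(c)-(e) can be sketched directly; a minimal one-step update (the function name, dimensions, and simulated data are hypothetical, and G is taken as the identity for simplicity):

```python
import numpy as np

def vc_em_step(y, A, G, sigma2, tau2):
    """One EM update for Y = AZ + e, e ~ N_n(0, tau2 I), Z ~ N_r(0, sigma2 G)."""
    n, r = A.shape
    Vzz = sigma2 * G
    Vzy = sigma2 * G @ A.T                            # cov(Z, Y)
    Vyy = sigma2 * A @ G @ A.T + tau2 * np.eye(n)     # var(Y)
    Ez = Vzy @ np.linalg.solve(Vyy, y)                # E(Z | y), formula (d)
    Vz = Vzz - Vzy @ np.linalg.solve(Vyy, Vzy.T)      # var(Z | y), formula (d)
    Ginv = np.linalg.inv(G)
    # By (e): E(Z'G^{-1}Z | y) = tr(var(Z|y) G^{-1}) + E(Z|y)'G^{-1}E(Z|y),
    # NOT E(Z|y)'G^{-1}E(Z|y) alone -- the caution in (c).
    sigma2_new = (np.trace(Vz @ Ginv) + Ez @ Ginv @ Ez) / r
    resid = y - A @ Ez
    # Similarly E((Y-AZ)'(Y-AZ) | y) = ||y - A E(Z|y)||^2 + tr(A var(Z|y) A').
    tau2_new = (resid @ resid + np.trace(A @ Vz @ A.T)) / n
    return sigma2_new, tau2_new

rng = np.random.default_rng(3)
n, r = 200, 5
A = rng.normal(size=(n, r))
G = np.eye(r)
z = rng.normal(0.0, np.sqrt(2.0), r)       # true sigma2 = 2
y = A @ z + rng.normal(0.0, 1.0, n)        # true tau2 = 1
s2, t2 = 1.0, 1.0
for _ in range(200):
    s2, t2 = vc_em_step(y, A, G, s2, t2)
```

Dropping the trace terms (i.e. imputing Z itself rather than the sufficient statistics) would systematically underestimate both variance components.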
7.7 Likelihood in exponential families

(i) See Chapter 2 for definitions, the natural sufficient statistics T_j, the natural parameter space, and the parametrization π_j.

(ii) Moment formulae (see 2.2(vii)):

E(t_j(X)) = -∂ log c(π)/∂π_j,   Cov(t_j(X), t_l(X)) = -∂² log c(π)/(∂π_j ∂π_l)

(iii) Likelihood equation for an exponential family (see 4.6):

∂l/∂π_j = n(n^{-1} Σ_i t_j(X_i) - E(t_j(X)))

The natural sufficient statistics T_j are set equal to their expectations nτ_j.

(iv) The information results (see 4.6):

I(π) = J(π) = var(T_1, ..., T_k),   I(τ) = (var(T_1, ..., T_k))^{-1}, where τ_j = E(t_j(X)).

(T_1, ..., T_k) achieves the (multiparameter) CRLB for (τ_1, ..., τ_k).

(v) Suppose the complete data X have exponential family form: for an n-sample, T_j(X) = Σ_{i=1}^n t_j(X_i),

log g_θ(X) = log c(θ) + Σ_{j=1}^k π_j(θ) T_j(X) + log w(X)

Q(θ; θ′) = log c(θ) + Σ_{j=1}^k π_j(θ) E_{θ′}(T_j(X) | Y) + E_{θ′}(log w(X) | Y).
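The moment formulae in (ii) can be checked numerically for a concrete family, e.g. the Poisson in its natural parametrization η = log λ, where c(η) = exp(-e^η) (a quick sketch, not from the notes):

```python
import numpy as np

# Poisson pmf in exponential-family form:
# f(x; eta) = c(eta) w(x) exp(eta x), with t(x) = x, w(x) = 1/x!,
# and c(eta) = exp(-exp(eta)) so that log c(eta) = -exp(eta).
log_c = lambda eta: -np.exp(eta)

eta, h = 0.7, 1e-4
lam = np.exp(eta)   # E(X) = Var(X) = lambda for the Poisson

# E(t(X)) = -d log c(pi)/d pi_j, checked by a central finite difference:
mean = -(log_c(eta + h) - log_c(eta - h)) / (2.0 * h)
# Cov(t_j, t_l) = -d^2 log c/(d pi_j d pi_l); here the variance of X:
var = -(log_c(eta + h) - 2.0 * log_c(eta) + log_c(eta - h)) / h**2
```

Both finite differences should recover λ = e^η, confirming that the cumulant function log c generates the moments.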
7.8 EM for exponential families

(i) In the natural parametrization π_j:

Q(π; π′) = log c(π) + Σ_{j=1}^k π_j E_{π′}(T_j(X) | Y)

∂Q/∂π_j = E_{π′}(T_j(X) | Y) + ∂ log c(π)/∂π_j = E_{π′}(T_j(X) | Y) - E_π(T_j(X))

Thus EM iteratively fits unconditioned to conditioned expectations of the T_j. At the MLE, E_π̂(T_j(X) | Y) = E_π̂(T_j(X)).

(ii) Recall l(π) = log g_π(X) - log h_π(X | Y), but

h_π(X | Y) = w(X) exp(Σ_j π_j t_j(X)) / ∫_{y(x)=y} w(x) exp(Σ_j π_j t_j(x)) dx = c*(π; Y) w(X) exp(Σ_j π_j t_j(X)),

so l(π) = log c(π) - log c*(π; Y).

(iii) Hence, differentiating this:

∂l/∂π_j = -E_π(T_j) + E_π(T_j | Y).   At the MLE: E_π̂(T_j) = E_π̂(T_j | Y).

(iv) Differentiating again:

-∂²l/(∂π_j ∂π_l) = Cov(T_j, T_l) - Cov((T_j, T_l) | Y)

If Y determines X, var(T(X) | Y) = 0, and then the observed information is var(T), as for any exponential family. If Y tells us nothing about X, var(T(X) | Y) = var(T(X)), and the observed information is 0. The information lost due to observing Y rather than X is var(T(X) | Y).
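For the two-sign mixture of 7.1 with σ² held fixed at 1, the complete data form an exponential family with natural sufficient statistic T = Σ Y_i Z_i, whose unconditional expectation is nθ. The fixed-point property in (iii) can then be checked numerically: at convergence, Σ y_i E(Z_i | y_i) = Σ y_i tanh(y_i θ) matches nθ. A sketch (simulated data; not from the notes):

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.choice([-1, 1], 1000)
y = 1.5 * z + rng.normal(size=1000)      # theta = 1.5, sigma^2 = 1 held fixed

theta, n = 1.0, len(y)
for _ in range(200):
    # EM with sigma^2 known: theta <- n^{-1} sum_i y_i E(Z_i | y_i),
    # where E(Z_i | y_i) = tanh(y_i * theta) when sigma^2 = 1 (see 7.1(vii)).
    theta = np.mean(y * np.tanh(y * theta))

cond = np.sum(y * np.tanh(y * theta))    # E_theta(sum Y_i Z_i | Y) at the fit
uncond = n * theta                       # E_theta(sum Y_i Z_i) = n*theta
```

At the fixed point the conditioned and unconditioned expectations of T agree to machine precision, which is exactly the statement that the EM iteration has found a stationary point of l(π).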
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More information1 EM algorithm: updating the mixing proportions {π k } ik are the posterior probabilities at the qth iteration of EM.
Université du Sud Toulon - Var Master Informatique Probabilistic Learning and Data Analysis TD: Model-based clustering by Faicel CHAMROUKHI Solution The aim of this practical wor is to show how the Classification
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationLinear Dynamical Systems
Linear Dynamical Systems Sargur N. srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/cse574/index.html Two Models Described by Same Graph Latent variables Observations
More information1 Expectation Maximization
Introduction Expectation-Maximization Bibliographical notes 1 Expectation Maximization Daniel Khashabi 1 khashab2@illinois.edu 1.1 Introduction Consider the problem of parameter learning by maximizing
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More informationGaussian Mixtures and the EM algorithm
Gaussian Mixtures and the EM algorithm 1 sigma=1.0 sigma=1.0 Responsibilities 0.0 0.2 0.4 0.6 0.8 1.0 sigma=0.2 sigma=0.2 Responsibilities 0.0 0.2 0.4 0.6 0.8 1.0 2 Details of figure Left panels: two Gaussian
More information