1.5 MM and EM Algorithms


The MM algorithm [1] is an iterative algorithm that can be used to minimize or maximize a function. We focus on using it to maximize the log likelihood $\ell(Y; \theta)$ of observed data $Y$ over model parameters $\theta$. Given a current guess $\theta^{(m)}$ for the parameters, the MM algorithm prescribes a minorizing function $h(\theta \mid \theta^{(m)})$ such that

$$h(\theta \mid \theta^{(m)}) \le \ell(Y; \theta) \quad \text{for all } \theta, \qquad h(\theta^{(m)} \mid \theta^{(m)}) = \ell(Y; \theta^{(m)}).$$

Fig. 1.3 gives an example of a log likelihood with an accompanying minorizing function. The minorizing function must be chosen so that the log likelihood dominates it everywhere except at $\theta^{(m)}$, where the two are equal. If $h(\theta \mid \theta^{(m)})$ has derivatives everywhere, then the tangents are also equal at the point of contact, $h'(\theta^{(m)} \mid \theta^{(m)}) = \ell'(Y; \theta^{(m)})$, and it is easy to see that choosing $\theta^{(m+1)}$ to maximize $h(\theta \mid \theta^{(m)})$ must also increase $\ell(Y; \theta)$. The challenge is to find a minorizing function $h(\theta \mid \theta^{(m)})$ that is easy to maximize.

The EM algorithm is a special case of the MM algorithm that capitalizes on the concept of hidden or missing data to construct a minorizing function. The EM applies when one can imagine unobserved data $Z$ that, when added to the observed data $Y$ to produce complete data $X = (Y, Z)$, yields a far simpler log likelihood function $\ell_C(X; \theta)$, called the complete log likelihood. A simpler function is one that is easier to maximize. The unobserved data $Z$ can be truly missing, in the sense that we forgot or were unable to measure them, or hypothetically missing, in the sense that we can only measure them in our imagination (examples of the latter appear below). In fact, your imagination plays an important role in deciding when and how to apply the EM to problems. The EM algorithm consists of two steps, iterated repeatedly until convergence.

E step. Fill in the missing data $Z$ by computing the expected missing data $E[Z \mid Y]$ given the current parameter estimates.

M step. Given the expected complete data $X = (Y, E[Z \mid Y])$, estimate all parameters by maximizing the (easier) expected complete log likelihood $E[\ell_C(X; \theta)]$.

The MM and EM algorithms have the very desirable property that they are guaranteed to improve, or at least not decrease, the log likelihood with each iteration. However, they may converge to a local maximum when the log likelihood has multiple modes. Consistent results from many random starting points give us some confidence that we have found the global maximum.
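The two steps translate directly into a generic iteration. The following Python sketch is not from the text; `e_step`, `m_step`, and `log_likelihood` are hypothetical callables that a particular model must supply, and the loop merely illustrates the control flow and the ascent property.

```python
import numpy as np

def em(theta0, e_step, m_step, log_likelihood, tol=1e-8, max_iter=1000):
    """Generic EM/MM loop: alternate E and M steps until the observed
    log likelihood stops increasing appreciably.

    e_step(theta)         -> expected missing data given the current theta
    m_step(expected)      -> new theta maximizing the expected complete
                             log likelihood
    log_likelihood(theta) -> observed-data log likelihood l(Y; theta)
    """
    theta = theta0
    ll_old = log_likelihood(theta)
    for _ in range(max_iter):
        expected = e_step(theta)          # E step: fill in E[Z | Y]
        theta = m_step(expected)          # M step: maximize E[l_C(X; theta)]
        ll_new = log_likelihood(theta)
        assert ll_new >= ll_old - 1e-10   # ascent property (up to rounding)
        if ll_new - ll_old < tol:
            break
        ll_old = ll_new
    return theta
```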

Figure 1.3: MM algorithm minorizing function $h(\theta \mid \theta^{(m)})$ of the observed log likelihood $\ell(Y; \theta)$ for genotype data (see Problem 1). [The figure plots log density against $\theta$ and marks $\theta^{(m)}$, $\theta^{(m+1)}$, and $\hat\theta$.]

We now illustrate the EM algorithm in a particular example.

1.5.1 Motif Finding

A common problem in sequence analysis is to find a motif that is common to a set of sequences. A typical example is the task of identifying a specific protein binding site in a set of unaligned DNA fragments (e.g., data derived from ChIP-chip or ChIP-seq experiments). We shall specify a general probabilistic model for the sequences and then show how best to parameterize the model.

Let there be $N$ sequences $S_1, S_2, \ldots, S_N$ of lengths $L_1, L_2, \ldots, L_N$, consisting of letters drawn from an alphabet $\mathcal{A}$ ($\{A, C, G, T\}$ for DNA). We assume the motif is of length exactly $w$ and may or may not occur in any one of the sequences. In each position $i = 0, 1, \ldots, w-1$ of the motif, let letter $j$ be observed with probability $p_{ij}$.

If the motif occurs, then we allow for the possibility that the sequence upstream (to the left) of it and downstream (to the right) of it have different average composition. Precisely, we assume that in the left context, letters are drawn independently with probability distribution $p_{Lj}$, $j \in \mathcal{A}$. Similarly, there is a right-context probability distribution $p_{Rj}$, $j \in \mathcal{A}$. To completely specify the model, let the probability that sequence $S_n$ contains the motif be $p_s$. Let $\theta = \{p_{Lj}, p_{ij}, p_{Rj}, p_s\}$ denote a particular choice of parameters. Given these parameters, the likelihood of sequence $S_n$ is $g(S_n; \theta)$, and if the sequences are independent, then the likelihood of the entire set of sequences is simply $g(S_1, \ldots, S_N; \theta) = \prod_n g(S_n; \theta)$. Our goal is to find a parameter estimate $\hat\theta$ that maximizes the data likelihood $g(S_1, \ldots, S_N; \theta)$; in other words, we are looking for the best possible model fit to the observed sequences.

For simplicity of notation, we consider a single sequence $S_n$. The general case of $N$ sequences follows immediately and only requires replacing sums over $k$ by double sums over $n$ and $k$.

The missing data that would make this maximization problem easy are the positions of the motif in the sequences. With this information, the likelihood of the single sequence is easy to compute. Letting $K_n = k$ denote the start of the motif in sequence $S_n$, the conditional likelihood is

$$g(S_n \mid K_n = k; \theta) = \Bigl(\prod_j p_{Lj}^{\,n_{Lnkj}}\Bigr) \Bigl(\prod_{i=0}^{w-1} \prod_j p_{ij}^{\,I(S_{n,k+i} = j)}\Bigr) \Bigl(\prod_j p_{Rj}^{\,n_{Rnkj}}\Bigr), \qquad (1.13)$$

where $n_{Lnkj}$ is the number of occurrences of letter $j$ in positions $1$ through $k - 1$, $n_{Rnkj}$ is the number of occurrences of letter $j$ in positions $k + w$ through $L_n$, and $I(S_{ni} = j)$ is an indicator function that equals 1 if position $i$ of sequence $S_n$ is occupied by letter $j$ and 0 otherwise. It is trivial to maximize this likelihood: indeed,

$$\hat p_{Lj} = \frac{n_{Lnkj}}{\sum_l n_{Lnkl}}, \qquad \hat p_{ij} = I(S_{n,k+i} = j), \qquad \hat p_{Rj} = \frac{n_{Rnkj}}{\sum_l n_{Rnkl}}.$$

(Clearly, the MLE $\hat p_{ij}$ only becomes interesting after we have observed several sequences $S_n$.)

Unfortunately, life is not so easy: we do not know $K_n$. When the location of the motif is unknown, we must sum over all possibilities,

$$g(S_n; \theta) = \sum_k g(S_n, k; \theta) = \sum_k g(S_n \mid k; \theta)\, g_n(k; p_s),$$

where $g_n(k; p_s)$ is the model (a priori) probability that the motif occurs at position $k$ in $S_n$. If the sequences vary in length $L_n$, then $g_n(k; p_s)$ depends on $L_n$, hence the subscript $n$.
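To make Eq. (1.13) concrete, here is a minimal sketch of the conditional log likelihood for one sequence. It is not from the text: it assumes the sequence is stored as a 0-based integer array (letters encoded $0, \ldots, |\mathcal{A}|-1$), with `pL`, `P`, and `pR` as hypothetical array names for $p_{Lj}$, $p_{ij}$, and $p_{Rj}$, and `k` given as a 0-based start position.

```python
import numpy as np

def log_g_given_k(seq, k, pL, P, pR):
    """Log of Eq. (1.13): likelihood of one sequence given that the motif
    starts at 0-based position k.

    seq    : 1-D integer array of letters encoded 0..|A|-1
    pL, pR : left- and right-context letter probabilities p_Lj, p_Rj
    P      : (w, |A|) matrix of motif letter probabilities p_ij
    """
    w = P.shape[0]
    left, motif, right = seq[:k], seq[k:k + w], seq[k + w:]
    return (np.sum(np.log(pL[left]))                  # left-context term
            + np.sum(np.log(P[np.arange(w), motif]))  # position-specific motif term
            + np.sum(np.log(pR[right])))              # right-context term
```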

For unaligned sequences and no prior information about where the motif might be, we may specify the uniform probabilities

$$g_n(k; p_s) = \begin{cases} \dfrac{p_s}{L_n - w + 1} & k = 1, 2, \ldots, L_n - w + 1, \\[6pt] \dfrac{1 - p_s}{2} & k = 0 \text{ or } k = L_n - w + 2. \end{cases} \qquad (1.14)$$

Here, $k = 0$ represents the case that the sequence is all right context, and $k = L_n - w + 2$ represents the case that the sequence is all left context.

You can think about it or trust us: $\ln g(S_n; \theta)$ is not easy to maximize. The EM algorithm tells us to instead maximize

$$E[\ln g(S_n, K_n; \theta)] = \sum_{k=0}^{L_n - w + 2} g(k \mid S_n; \theta^{(m)}) \ln g(S_n, k; \theta),$$

where the expectation is taken with respect to $g(K_n \mid S_n; \theta^{(m)})$, the probability mass function of the hidden data $K_n$ given the observed data $S_n$ and the current estimate of the parameters $\theta^{(m)}$. This density is available through Bayes' rule as

$$g(K_n \mid S_n; \theta^{(m)}) = \frac{g(S_n \mid K_n; \theta^{(m)})\, g_n(K_n; p_s^{(m)})}{\sum_k g(S_n \mid k; \theta^{(m)})\, g_n(k; p_s^{(m)})},$$

where $g(S_n \mid K_n; \theta^{(m)})$ is given by Eq. (1.13) and $g_n(K_n; p_s^{(m)})$ is given by Eq. (1.14).
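In code, the E step simply evaluates this posterior for every admissible $k$. The sketch below is hypothetical, not from the text: it uses the same integer encoding as the previous sketch, lays out the returned vector with entry 0 for the all-right-context case ($k = 0$), entries $1, \ldots, L_n - w + 1$ for motif starts, and the last entry for the all-left-context case ($k = L_n - w + 2$), and it normalizes in log space for numerical stability.

```python
import numpy as np

def posterior_over_k(seq, pL, P, pR, ps):
    """E step: posterior g(k | S_n; theta) over the motif start positions."""
    L, w = len(seq), P.shape[0]
    n_starts = L - w + 1
    logp = np.empty(n_starts + 2)
    # No-motif cases: the whole sequence is right (k = 0) or left context.
    logp[0] = np.log((1 - ps) / 2) + np.sum(np.log(pR[seq]))
    logp[-1] = np.log((1 - ps) / 2) + np.sum(np.log(pL[seq]))
    # Motif at 1-based start k, i.e. 0-based offset k - 1.
    for k in range(1, n_starts + 1):
        left, motif, right = seq[:k - 1], seq[k - 1:k - 1 + w], seq[k - 1 + w:]
        logp[k] = (np.log(ps / n_starts)
                   + np.sum(np.log(pL[left]))
                   + np.sum(np.log(P[np.arange(w), motif]))
                   + np.sum(np.log(pR[right])))
    logp -= logp.max()            # stabilize before exponentiating
    post = np.exp(logp)
    return post / post.sum()      # Bayes' rule normalization
```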

To show that maximizing this expected complete log likelihood works, we now prove the ascent property of the EM algorithm in this case, where the hidden data are discrete. We start by converting the expected complete log likelihood into a minorizing function by adding and subtracting exactly what we need so that the minorizing function equals the log likelihood at $\theta = \theta^{(m)}$:

$$h(\theta \mid \theta^{(m)}) = E[\ln g(S_n, K_n; \theta)] + \ln g(S_n; \theta^{(m)}) - E[\ln g(S_n, K_n; \theta^{(m)})]. \qquad (1.15)$$

We need to show that this minorizing function actually minorizes the log likelihood for all $\theta$:

$$
\begin{aligned}
h(\theta \mid \theta^{(m)}) &= \ln g(S_n; \theta^{(m)}) + E\left[\ln\left(\frac{g(S_n, K_n; \theta)}{g(S_n, K_n; \theta^{(m)})}\right)\right] \\
&= \ln g(S_n; \theta^{(m)}) + E\left[\ln\left(\frac{g(S_n; \theta)}{g(S_n; \theta^{(m)})}\,\frac{g(K_n \mid S_n; \theta)}{g(K_n \mid S_n; \theta^{(m)})}\right)\right] \\
&= \ln g(S_n; \theta) + E\left[\ln\left(\frac{g(K_n \mid S_n; \theta)}{g(K_n \mid S_n; \theta^{(m)})}\right)\right].
\end{aligned}
$$

Above, we used $g(S_n, K_n; \theta) = g(S_n; \theta)\, g(K_n \mid S_n; \theta)$ and recognized that $g(S_n; \theta)$ is constant with respect to the expectation. Finally, we obtain

$$h(\theta \mid \theta^{(m)}) - \ln g(S_n; \theta) = E\left[\ln\left(\frac{g(K_n \mid S_n; \theta)}{g(K_n \mid S_n; \theta^{(m)})}\right)\right] \le 0,$$

because $\theta^{(m)}$ maximizes $E[\ln g(K_n \mid S_n; \theta)]$, viewed as a function of $\theta$. To see the last claim, consider maximizing $\sum_i w_i \ln x_i$ over the $x_i$ given the constraint that $\sum_i x_i = C$ for some constant $C$. Using Lagrange multipliers, one can see that the maximum occurs at $x_k = C\, w_k / \sum_i w_i$. In our case, $w_k = g(k \mid S_n; \theta^{(m)})$ and $x_k = g(k \mid S_n; \theta)$ both sum to 1, so the maximum occurs at $x_k = g(k \mid S_n; \theta^{(m)})$, that is, at $\theta = \theta^{(m)}$.

We are now ready to maximize the minorizing function of Eq. (1.15) to generate the new estimate $\theta^{(m+1)}$, which we now know will at least not decrease the observed log likelihood. As already claimed, maximizing $h(\theta \mid \theta^{(m)})$ is equivalent to maximizing the expected complete log likelihood, since the other terms are constant in $\theta$. The expected complete log likelihood is

$$
\begin{aligned}
&\sum_k g(k \mid S_n; \theta^{(m)}) \Bigl[\sum_j n_{Lnkj} \log p_{Lj} + \sum_{i=0}^{w-1} \sum_j I(S_{n,k+i} = j) \log p_{ij} + \sum_j n_{Rnkj} \log p_{Rj} + \log g_n(k; p_s)\Bigr] \\
&\quad= \sum_j \Bigl[\sum_k n_{Lnkj}\, g(k \mid S_n; \theta^{(m)})\Bigr] \log p_{Lj}
 + \sum_{i=0}^{w-1} \sum_j \Bigl[\sum_k I(S_{n,k+i} = j)\, g(k \mid S_n; \theta^{(m)})\Bigr] \log p_{ij} \\
&\qquad+ \sum_j \Bigl[\sum_k n_{Rnkj}\, g(k \mid S_n; \theta^{(m)})\Bigr] \log p_{Rj}
 + \sum_k g(k \mid S_n; \theta^{(m)}) \log g_n(k; p_s),
\end{aligned}
$$

which consists of a collection of sums of the form $\sum_i w_i \ln x_i$ with $\sum_i x_i = 1$. As before, maximization is achieved by

$$p_{Lj}^{(m+1)} = \frac{\sum_k g(k \mid S_n; \theta^{(m)})\, n_{Lnkj}}{\sum_l \sum_k g(k \mid S_n; \theta^{(m)})\, n_{Lnkl}}, \qquad
p_{ij}^{(m+1)} = \frac{\sum_k g(k \mid S_n; \theta^{(m)})\, I(S_{n,k+i} = j)}{\sum_l \sum_k g(k \mid S_n; \theta^{(m)})\, I(S_{n,k+i} = l)},$$

$$p_{Rj}^{(m+1)} = \frac{\sum_k g(k \mid S_n; \theta^{(m)})\, n_{Rnkj}}{\sum_l \sum_k g(k \mid S_n; \theta^{(m)})\, n_{Rnkl}}, \quad\text{and}\quad
p_s^{(m+1)} = \frac{\sum_{k=1}^{L_n - w + 1} g(k \mid S_n; \theta^{(m)})}{\sum_{k=1}^{L_n - w + 1} g(k \mid S_n; \theta^{(m)}) + g(0 \mid S_n; \theta^{(m)}) + g(L_n - w + 2 \mid S_n; \theta^{(m)})}.$$
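In code, the M step turns these posterior-weighted counts into normalized frequencies. The sketch below is again hypothetical and handles a single sequence, consuming the posterior vector laid out as in the previous sketch; tiny pseudocounts are added only to avoid dividing by zero when a context happens to be empty.

```python
import numpy as np

def m_step(seq, post, w, n_letters=4):
    """M step for one sequence: posterior-weighted counts -> new parameters.

    post : posterior over k from the E step; post[0] and post[-1] are the
           all-right-context and all-left-context cases, post[1:-1] the
           L - w + 1 motif start positions.
    """
    n_starts = len(seq) - w + 1
    wL = np.zeros(n_letters)
    wM = np.zeros((w, n_letters))
    wR = np.zeros(n_letters)
    # No-motif cases contribute the whole sequence to a single context.
    for b in seq:
        wR[b] += post[0]
        wL[b] += post[-1]
    # Motif cases: left context, motif window, right context, each weighted
    # by the posterior probability of that start position.
    for k in range(1, n_starts + 1):
        g, start = post[k], k - 1
        for b in seq[:start]:
            wL[b] += g
        for i in range(w):
            wM[i, seq[start + i]] += g
        for b in seq[start + w:]:
            wR[b] += g
    eps = 1e-12                              # guards empty contexts
    pL = (wL + eps) / (wL + eps).sum()
    pR = (wR + eps) / (wR + eps).sum()
    P = (wM + eps) / (wM + eps).sum(axis=1, keepdims=True)
    ps = post[1:-1].sum()                    # posterior mass on "motif present"
    return pL, P, pR, ps
```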

Problems

1. One of the simplest illustrations of the EM algorithm comes from genetics. Suppose an observable phenotype is controlled by a single locus with only two possible alleles, $A$ and $a$, with $A$ dominant to $a$. Because of the dominance of $A$, genotypes $AA$ and $Aa$ share the same phenotype 1, which is distinct from the phenotype 2 of genotype $aa$. Suppose we can only observe the phenotype, and we observe $n_1 = 39$ individuals of phenotype 1 and $n_2 = 11$ of phenotype 2. At the genotype level, there are $n_{AA}$ individuals of type $AA$ and $n_{Aa}$ of type $Aa$ such that $n_{AA} + n_{Aa} = n_1$. The split of phenotype 1 into these two genotypes is the hidden information. We directly observe $n_{aa} = n_2$ individuals with the last genotype. Under Hardy-Weinberg equilibrium, the probabilities of the genotypes in terms of the allele frequency $p_A$ are

$$p_{AA} = p_A^2, \qquad p_{Aa} = 2 p_A (1 - p_A), \qquad p_{aa} = (1 - p_A)^2.$$

Use the given data to estimate the maximum likelihood value of $p_A$. Notice that it is possible to maximize the observed log likelihood

$$\log g(n_1, n_2; p_A) = n_1 \log\bigl[p_A^2 + 2 p_A (1 - p_A)\bigr] + n_2 \log\bigl[(1 - p_A)^2\bigr]$$

directly in this simple case; you can use this fact to check your answer. Fig. 1.3 visually demonstrates one iteration of the EM algorithm for this data set starting from the current estimate $\theta^{(m)}$ marked in the figure.
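One possible EM iteration for this problem, sketched below and not the text's solution, fills in the expected genotype counts $n_{AA}$ and $n_{Aa}$ in the E step and re-estimates $p_A$ by gene counting in the M step; the observed log likelihood above is used only as a check.

```python
import numpy as np

def em_allele_freq(n1=39, n2=11, p=0.5, n_iter=50):
    """EM sketch for the dominant/recessive allele-frequency problem.

    Hidden data: the split of the n1 phenotype-1 individuals into
    n_AA and n_Aa genotypes.
    """
    for _ in range(n_iter):
        # E step: expected genotype counts given the current p_A.
        pAA, pAa = p ** 2, 2 * p * (1 - p)
        nAA = n1 * pAA / (pAA + pAa)
        nAa = n1 - nAA
        # M step: gene counting; each individual carries two alleles.
        p = (2 * nAA + nAa) / (2 * (n1 + n2))
    return p

p_hat = em_allele_freq()
# Check against the observed log likelihood on a grid.
grid = np.linspace(0.001, 0.999, 9999)
ll = 39 * np.log(grid**2 + 2 * grid * (1 - grid)) + 11 * np.log((1 - grid)**2)
print(p_hat, grid[np.argmax(ll)])
```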

2. The TAL effectors are proteins found in Xanthomonas pathogens that infect plants. Each TAL effector consists of an N-terminal domain, a variable number of 34-amino-acid repeats, and a C-terminal domain. Residues 12 and 13 of each 34-amino-acid repeat, which together we will call the diresidue, are thought to interact directly with a nucleotide in the binding site. So, if a TAL effector has 17 repeats, then it will bind a site of 17 contiguous nucleotides. Suppose the $i$th TAL effector has $L_i$ repeats, with diresidue sequence $w_i = (w_{i1}, \ldots, w_{iL_i})$, where $w_{ij}$ represents one diresidue. Let $x_i = (x_{i1}, \ldots, x_{iN_i})$ be the $N_i \approx 1{,}000$ nucleotides immediately upstream of the translation start site of a gene known to be targeted by the $i$th TAL effector. We will assume that a targeted gene has a TAL effector binding site in this sequence. Let $Y_i \in \{L_i, \ldots, N_i\}$ indicate the position of the binding site relative to the translation start site at position 0. This is hidden information that we do not know. Without prior information, we assume $s_{im} = \frac{1}{N_i - L_i + 1}$ is the probability that TAL effector $i$ binds position $m$. Assume the orientation of binding is known, so that the first diresidue binds distal to the translation start site. Let $p(a, b)$ be the probability that diresidue $a$ binds nucleotide $b$, and let $q_b$ be the probability that an unbound upstream site is nucleotide $b$. Develop and fit an EM algorithm for this model using the data in Listings 1.1 and 1.2 (one possible formulation is sketched after the listing descriptions).

HG NI NS NG N HD NN IG

Listing 1.1: These are the diresidue sequences for 10 TAL effectors. The 8 distinct diresidues (listed on the last line) map to the numbers $1, 2, \ldots, 8$ in the order given. The first line indicates the number of TAL effectors and the number of distinct diresidues. The second line lists the number of repeats in each TAL effector. Lines 3 through 13 give the TAL diresidue sequences.


Listing 1.2: The nucleotides upstream of each known target gene. The first line gives the number of genes. Each gene is known to be bound by the corresponding TAL effector of Listing 1.1. The second line gives the length of each upstream region; most are around 1,000 nucleotides. The next 10 lines give the nucleotide sequences, encoded as A=0, C=1, G=2, T=3.
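For Problem 2, the hidden variable is the binding position, so the EM has the same shape as the motif finder above: an E step that computes a posterior over candidate binding windows and an M step that turns posterior-weighted counts into $p(a, b)$ and $q_b$. The sketch below is one possible formulation, not the text's solution; the argument names `tal` and `ups` are assumptions about how the data of Listings 1.1 and 1.2 might be held in memory (lists of integer arrays), and each binding site is treated as a contiguous window with the first diresidue at the distal end.

```python
import numpy as np

def tal_em(tal, ups, n_dires=8, n_nt=4, n_iter=100, seed=0):
    """EM sketch for the TAL effector binding model.

    tal : list of integer arrays, diresidue sequences (values 0..n_dires-1)
    ups : list of integer arrays, upstream nucleotide sequences (values 0..3),
          written 5' to 3', ending at the translation start site
    Returns p[a, b] = P(diresidue a binds nucleotide b) and background q[b].
    """
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(n_nt), size=n_dires)   # random start breaks symmetry
    q = np.full(n_nt, 1.0 / n_nt)
    for _ in range(n_iter):
        wp = np.zeros((n_dires, n_nt))               # posterior-weighted bound counts
        wq = np.zeros(n_nt)                          # posterior-weighted unbound counts
        for w, x in zip(tal, ups):
            L, N = len(w), len(x)
            # E step: posterior over window starts under the uniform prior s_im.
            logpost = np.array([
                np.log(p[w, x[m:m + L]]).sum()
                + np.log(q[np.r_[x[:m], x[m + L:]]]).sum()
                for m in range(N - L + 1)
            ])
            post = np.exp(logpost - logpost.max())
            post /= post.sum()
            # Accumulate expected counts for the M step.
            for m, g in enumerate(post):
                np.add.at(wp, (w, x[m:m + L]), g)
                np.add.at(wq, np.r_[x[:m], x[m + L:]], g)
        # M step: normalize expected counts (small pseudocount for stability).
        p = (wp + 1e-6) / (wp + 1e-6).sum(axis=1, keepdims=True)
        q = (wq + 1e-6) / (wq + 1e-6).sum()
    return p, q
```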
