1.5 MM and EM Algorithms
The MM algorithm [1] is an iterative algorithm that can be used to minimize or maximize a function. We focus on using it to maximize the log likelihood $l(y;\theta)$ of observed data $Y$ over model parameters $\theta$. Given a current guess $\theta^{(m)}$ for the parameters, the MM algorithm prescribes a minorizing function $h(\theta \mid \theta^{(m)})$ such that

\[ h(\theta \mid \theta^{(m)}) \le l(y;\theta), \qquad h(\theta^{(m)} \mid \theta^{(m)}) = l(y;\theta^{(m)}). \]

Fig. 1.3 gives an example of a log likelihood with an accompanying minorizing function. The minorizing function must be chosen such that the log likelihood dominates it everywhere except at $\theta^{(m)}$, where they are equal. If $h(\theta \mid \theta^{(m)})$ has derivatives everywhere, then the tangents $h'(\theta \mid \theta^{(m)})$ and $l'(y;\theta)$ are also equal at $\theta = \theta^{(m)}$, and it is easy to see that choosing $\theta^{(m+1)}$ to maximize $h(\theta \mid \theta^{(m)})$ must also increase $l(y;\theta)$. The challenge is to find a minorizing function $h(\theta \mid \theta^{(m)})$ that is easy to maximize.

The EM algorithm is a special case of the MM algorithm that capitalizes on the concept of hidden or missing data to construct a minorizing function. The EM applies when one can imagine unobserved data $Z$ that, when added to the observed data $Y$ to produce complete data $X = (Y, Z)$, yields a far simpler log likelihood function $l_C(X;\theta)$, called the complete log likelihood. A simpler function is one that is easier to maximize. The unobserved data $Z$ can be truly missing, in the sense that we forgot or were unable to measure them, or hypothetically missing, in the sense that we can measure them only in our imagination (examples of the latter appear below). In fact, your imagination plays an important role in deciding when and how to apply the EM to problems. The EM algorithm consists of two steps, iterated repeatedly until convergence.

E step. Fill in the missing data $Z$ by computing the expected missing data $E[Z \mid Y]$ given the current parameter estimates.

M step. Given the expected complete data $X = (Y, E[Z \mid Y])$, estimate all parameters by maximizing the (easier) expected complete log likelihood $E[l_C(X;\theta)]$.

The MM and EM algorithms have the very desirable property that they are guaranteed to improve with each iteration. However, they may converge to a local maximum when the log likelihood has multiple modes. Consistent results from many random starting points give us some confidence that we have found the global maximum.

[Figure 1.3: MM algorithm minorizing function $h(\theta \mid \theta^{(m)})$ of the observed log likelihood $l(y;\theta)$ for genotype data (see Problem 1). The vertical axis is log density; the horizontal axis is $\theta$, marking $\theta^{(m)}$, $\theta^{(m+1)}$, and $\hat\theta$.]

We now illustrate the EM algorithm in a particular example.

Motif Finding

A common problem in sequence analysis is to find a motif that is common to a set of sequences. A typical example is the task of identifying a specific protein binding site in a set of unaligned DNA fragments (e.g., data derived from ChIP-chip or ChIP-seq experiments). We shall specify a general probabilistic model for the sequences and then show how to best parameterize the model.

Let there be $N$ sequences $S_1, S_2, \ldots, S_N$ of lengths $L_1, L_2, \ldots, L_N$, consisting of letters drawn from an alphabet $\mathcal{A}$ ($\{A, C, G, T\}$ for DNA). We assume the motif is of length exactly $w$ and may or may not occur in any one of the sequences. In each position $i$ of the motif, let letter $j$ be observed with probability $p_{ij}$. If the motif
occurs, then we allow for the possibility that the sequences upstream (to the left) and downstream (to the right) are of different average composition. Precisely, we assume that in the left context, letters are drawn independently with probability distribution $p_{Lj}$, $j \in \mathcal{A}$. Similarly, there is a right probability distribution $p_{Rj}$, $j \in \mathcal{A}$. To completely specify the model, let the probability that sequence $S_n$ contains the motif be $p_s$. Let $\theta = \{p_{Lj}, p_{ij}, p_{Rj}, p_s\}$ denote a particular choice of parameters. Given these parameters, the likelihood of sequence $S_n$ is $g(S_n;\theta)$, and if the sequences are independent, then the likelihood of the entire set of sequences is simply $g(S_1,\ldots,S_N;\theta) = \prod_n g(S_n;\theta)$. It is our goal to find a parameter estimate $\hat\theta$ that maximizes the data likelihood $g(S_1,\ldots,S_N;\theta)$. In other words, we are looking for the best possible model fit to the observed sequences. For simplicity of notation, we consider a single sequence $S_n$. The general case of $N$ sequences follows immediately and only requires replacing sums over $k$ by double sums over $n$ and $k$.

The missing data that would make this maximization problem easy is the position of the motif in each sequence. With this information, the likelihood of the single sequence is easy to compute. Let $K_n = k$ be the start of the motif in sequence $S_n$; the conditional likelihood is

\[ g(S_n \mid K_n = k; \theta) = \Big( \prod_j p_{Lj}^{n_{Lnkj}} \Big) \Big( \prod_{i=0}^{w} \prod_j p_{ij}^{I(S_{n,k+i}=j)} \Big) \Big( \prod_j p_{Rj}^{n_{Rnkj}} \Big), \tag{1.13} \]

where $n_{Lnkj}$ is the number of occurrences of letter $j$ in positions 1 through $k-1$, $n_{Rnkj}$ is the number of occurrences of letter $j$ in positions $k+w+1$ to $L_n$, and $I(S_{ni}=j)$ is an indicator function that evaluates to 1 if the $i$th position of sequence $S_n$ is occupied by letter $j$. It is trivial to maximize this likelihood. Indeed,

\[ \hat p_{Lj} = \frac{n_{Lnkj}}{\sum_{j'} n_{Lnkj'}}, \qquad \hat p_{ij} = I(S_{n,k+i} = j), \qquad \hat p_{Rj} = \frac{n_{Rnkj}}{\sum_{j'} n_{Rnkj'}}. \]

(Clearly, the MLE $\hat p_{ij}$ only becomes interesting after we have observed several sequences $S_n$.)

Unfortunately, life is not so easy. We do not know $K_n$. When the location of the motif is unknown, we must sum over all possibilities,

\[ g(S_n;\theta) = \sum_k g(S_n, k; \theta) = \sum_k g(S_n \mid k; \theta)\, g_n(k; p_s), \]

where $g_n(k;p_s)$ is the model (a priori) probability that the motif occurs at position $k$ in $S_n$. If sequences vary in length $L_n$, then $g_n(k;p_s)$ depends on $L_n$, hence the subscript $n$. For unaligned sequences and no prior information about where the motif
might be, we may specify uniform probabilities

\[ g_n(k; p_s) = \begin{cases} \dfrac{p_s}{L_n - w + 1} & k = 1, 2, \ldots, L_n - w + 1, \\[1.5ex] \dfrac{1 - p_s}{2} & k = 0,\ L_n - w + 2. \end{cases} \tag{1.14} \]

Here, $k = 0$ represents the case that the sequence is all right context, and $k = L_n - w + 2$ represents the case that the sequence is all left context. You can think about it or trust us; $\ln g(S_n;\theta)$ is not easy to maximize. The EM algorithm tells us to instead maximize

\[ E[\ln g(S_n, K_n; \theta)] = \sum_{k=0}^{L_n - w + 2} g(k \mid S_n; \theta^{(m)}) \ln g(S_n, k; \theta), \]

where the expectation is taken against the density $g(k \mid S_n; \theta^{(m)})$, that is, the probability mass function of the hidden data $K_n$ given the observed data $S_n$ and the current parameter estimate $\theta^{(m)}$. This density is available through Bayes rule as

\[ g(K_n \mid S_n; \theta^{(m)}) = \frac{g(S_n \mid K_n; \theta^{(m)})\, g_n(K_n; p_s^{(m)})}{\sum_k g(S_n \mid k; \theta^{(m)})\, g_n(k; p_s^{(m)})}, \]

where $g(S_n \mid K_n; \theta^{(m)})$ is given by Eq. (1.13) and $g_n(K_n; p_s^{(m)})$ is given by Eq. (1.14).

We now show that maximizing this expected complete log likelihood works, i.e., we prove the ascent property of the EM algorithm in this case where the hidden data is a discrete variable. We start by converting the expected complete log likelihood to a minorizing function by adding and subtracting exactly what we need so that the minorizing function equals the log likelihood at $\theta = \theta^{(m)}$:

\[ h(\theta \mid \theta^{(m)}) = E[\ln g(S_n, K_n; \theta)] + \ln g(S_n; \theta^{(m)}) - E[\ln g(S_n, K_n; \theta^{(m)})]. \tag{1.15} \]

We need to show this minorizing function actually minorizes the log density for all $\theta$:

\begin{align*}
h(\theta \mid \theta^{(m)}) &= \ln g(S_n; \theta^{(m)}) + E\left[\ln\left(\frac{g(S_n, K_n; \theta)}{g(S_n, K_n; \theta^{(m)})}\right)\right] \\
&= \ln g(S_n; \theta^{(m)}) + E\left[\ln\left(\frac{g(S_n; \theta)}{g(S_n; \theta^{(m)})} \cdot \frac{g(K_n \mid S_n; \theta)}{g(K_n \mid S_n; \theta^{(m)})}\right)\right] \\
&= \ln g(S_n; \theta) + E\left[\ln\left(\frac{g(K_n \mid S_n; \theta)}{g(K_n \mid S_n; \theta^{(m)})}\right)\right].
\end{align*}
Above, we used $g(S_n, K_n; \theta) = g(S_n; \theta)\, g(K_n \mid S_n; \theta)$ and recognized that $g(S_n;\theta)$ is constant with respect to the expectation. Finally, we obtain

\[ h(\theta \mid \theta^{(m)}) - \ln g(S_n; \theta) = E\left[\ln\left(\frac{g(K_n \mid S_n; \theta)}{g(K_n \mid S_n; \theta^{(m)})}\right)\right] \le 0, \]

because $\theta^{(m)}$ maximizes $E[\ln g(K_n \mid S_n; \theta)]$, viewed as a function of $\theta$. To see the last claim, consider maximizing $\sum_i w_i \ln x_i$ over the $x_i$ given the constraint that $\sum_i x_i = C$ for some constant $C$. Using Lagrange multipliers, one can see that $x_k = \frac{w_k C}{\sum_i w_i}$. In our case, $w_k = g(k \mid S_n; \theta^{(m)})$ and $x_k = g(k \mid S_n; \theta)$ both sum to 1, so the maximum is attained at $x_k = g(k \mid S_n; \theta^{(m)})$.

We are now ready to maximize the minorizing function of Eq. (1.15) to generate the new estimate $\theta^{(m+1)}$, which we now know will at least not decrease the observed log likelihood. As already claimed, maximizing $h(\theta \mid \theta^{(m)})$ is equivalent to maximizing the expected complete log likelihood, since the other terms are constant in $\theta$. The expected complete log likelihood is

\begin{align*}
&\sum_k g(k \mid S_n; \theta^{(m)}) \Big[ \sum_j n_{Lnkj} \log p_{Lj} + \sum_{i=0}^{w} \sum_j I(S_{n,i+k} = j) \log p_{ij} + \sum_j n_{Rnkj} \log p_{Rj} + \log g_n(k; p_s) \Big] \\
&\quad = \sum_j \Big[ \sum_k n_{Lnkj}\, g(k \mid S_n; \theta^{(m)}) \Big] \log p_{Lj} + \sum_{i=0}^{w} \sum_j \Big[ \sum_k I(S_{n,i+k} = j)\, g(k \mid S_n; \theta^{(m)}) \Big] \log p_{ij} \\
&\qquad + \sum_j \Big[ \sum_k n_{Rnkj}\, g(k \mid S_n; \theta^{(m)}) \Big] \log p_{Rj} + \sum_k g(k \mid S_n; \theta^{(m)}) \log g_n(k; p_s),
\end{align*}

which consists of a collection of sums of the form $\sum_i w_i \ln x_i$ with $\sum_i x_i = 1$. As before, maximization is achieved by

\begin{align*}
p_{Lj}^{(m+1)} &= \frac{\sum_k g(k \mid S_n; \theta^{(m)})\, n_{Lnkj}}{\sum_l \sum_k g(k \mid S_n; \theta^{(m)})\, n_{Lnkl}}, \\
p_{ij}^{(m+1)} &= \frac{\sum_k g(k \mid S_n; \theta^{(m)})\, I(S_{n,k+i} = j)}{\sum_l \sum_k g(k \mid S_n; \theta^{(m)})\, I(S_{n,k+i} = l)}, \\
p_{Rj}^{(m+1)} &= \frac{\sum_k g(k \mid S_n; \theta^{(m)})\, n_{Rnkj}}{\sum_l \sum_k g(k \mid S_n; \theta^{(m)})\, n_{Rnkl}}, \quad \text{and} \\
p_s^{(m+1)} &= \sum_{k=1}^{L_n - w + 1} g(k \mid S_n; \theta^{(m)}) = 1 - g(0 \mid S_n; \theta^{(m)}) - g(L_n - w + 2 \mid S_n; \theta^{(m)}).
\end{align*}

Problems
1. One of the simplest illustrations of the EM algorithm comes from genetics. Suppose an observable phenotype is controlled by a single locus with only two possible alleles, A and a, with A dominant to a. Because of the dominance of A, genotypes AA and Aa have the same phenotype 1, which is distinct from the phenotype 2 of genotype aa. Suppose we can only observe the phenotype, and we observe $n_1 = 39$ individuals of phenotype 1 and $n_2 = 11$ of phenotype 2. At the genotype level, there are $n_{AA}$ individuals of type AA and $n_{Aa}$ of type Aa such that $n_{AA} + n_{Aa} = n_1$. The split of this phenotype into the two genotypes is the hidden information. We directly observe $n_{aa} = n_2$ individuals with the last genotype. Under Hardy-Weinberg equilibrium, the probabilities of the genotypes in terms of the allele frequency $p_A$ are

\[ p_{AA} = p_A^2, \qquad p_{Aa} = 2 p_A (1 - p_A), \qquad p_{aa} = (1 - p_A)^2. \]

Use the given data to estimate the maximum likelihood $p_A$. Notice that it is possible to maximize the observed log likelihood

\[ \log g(n_1, n_2; p_A) = n_1 \log\left[ p_A^2 + 2 p_A (1 - p_A) \right] + n_2 \log\left[ (1 - p_A)^2 \right] \]

directly in this simple case. You can use this fact to check your answer. Fig. 1.3 visually demonstrates one iteration of the EM algorithm for this data set starting from the current estimate $\theta^{(m)} =$

2. The TAL effectors are proteins found in Xanthomonas pathogens that infect plants. Each TAL effector consists of an N-terminal domain, a variable number of 34 amino acid repeats, and a C-terminal domain. Residues 12 and 13 of each 34 amino acid repeat, which we will call diresidues, are thought to directly interact with a nucleotide in the binding site. So, if the TAL effector has 17 repeats, then it will bind a site of 17 contiguous nucleotides. Suppose the $i$th TAL effector has $L_i$ repeats, with diresidue sequence $w_i = (w_{i1}, \ldots, w_{iL_i})$, where $w_{ij}$ represents one diresidue.
Let $x_i = (x_{i1}, \ldots, x_{iN_i})$ be the $N_i \approx 1{,}000$ nucleotides immediately upstream of the translation start site of a gene known to be targeted by the $i$th TAL effector. We will assume that a targeted gene has a TAL effector binding site in this sequence. Let $Y_i \in \{L_i, \ldots, N_i\}$ indicate the position of the binding site relative to the translation start site at position 0. This is hidden information that we do not know. Without prior information, we assume $s_{im} = \frac{1}{N_i - L_i + 1}$ is the probability that TAL effector $i$ binds position $m$. Assume the orientation of binding is known, so that the first diresidue binds distal to the translation start site. Let $p(a, b)$ be the probability that diresidue $a$ binds nucleotide $b$, and let $q_b$ be the probability that an unbound upstream site is nucleotide $b$. Develop and fit an EM algorithm for this model using the data in Listings 1.1 and 1.2.

HG NI NS NG N HD NN IG

Listing 1.1: These are the diresidue sequences for 10 TAL effectors. The 8 distinct diresidues (listed on the last line) map to the numbers 1, 2, ..., 8, in the order given. The first line indicates the number of TAL effectors and the number of distinct diresidues. The second line lists the number of repeats in each TAL effector. Lines 3 through 13 give the TAL diresidue sequences.
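The E step for this problem has the same structure as the motif posterior derived above. The sketch below is our own illustration, not the requested solution: the function name `posterior_positions` and the variables `p`, `q`, `x`, `w` are assumed encodings (integer-coded nucleotides and diresidues), and we assume the site occupies `len(w)` consecutive positions of `x` with every other position drawn from the background.

```python
import math

# Hypothetical encodings (not taken from the listings):
#   p[a][b] = probability that diresidue a binds nucleotide b
#   q[b]    = background probability of nucleotide b
#   x       = one upstream sequence, integer-coded nucleotides
#   w       = one TAL effector's integer-coded diresidue sequence
def posterior_positions(x, w, p, q):
    """Posterior over binding start positions for one target gene.

    Assumes a uniform prior over positions (it cancels in the
    normalization) and that positions outside the bound window
    are background.
    """
    L = len(w)
    log_bg = sum(math.log(q[b]) for b in x)  # all-background log likelihood
    logs = []
    for m in range(len(x) - L + 1):
        window = x[m:m + L]
        # swap background terms for binding terms inside the window
        logs.append(log_bg
                    + sum(math.log(p[a][b]) for a, b in zip(w, window))
                    - sum(math.log(q[b]) for b in window))
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]  # stabilized exponentiation
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

The M step would then reweight nucleotide counts by these posteriors, in the same way the $p_{ij}^{(m+1)}$ updates above reweight indicator counts by $g(k \mid S_n; \theta^{(m)})$.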
Listing 1.2: The nucleotides upstream of each known target gene. The first line gives the number of genes. Each gene is known to bind the corresponding TAL effector of Listing 1.1. The second line gives the length of each upstream region. Most are around 1,000 nucleotides. The next 10 lines give the nucleotide sequences, encoded as A=0, C=1, G=2, T=3.
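Returning to Problem 1, the EM iteration can be checked numerically against the closed-form maximizer of the observed log likelihood. The sketch below is our own illustration under the problem's Hardy-Weinberg model; the starting value 0.2 and the 100-iteration budget are arbitrary choices.

```python
import math

# Observed data: n1 = 39 of phenotype 1 (genotype AA or Aa),
# n2 = 11 of phenotype 2 (genotype aa). Hidden data: the split
# of n1 into n_AA and n_Aa.
n1, n2 = 39, 11
n = n1 + n2

p = 0.2  # initial guess for the allele frequency p_A
for _ in range(100):
    # E step: expected genotype counts given the current p_A
    pAA, pAa = p**2, 2 * p * (1 - p)
    nAA = n1 * pAA / (pAA + pAa)
    nAa = n1 - nAA
    # M step: fraction of A alleles among all 2n alleles
    # (each AA contributes 2 copies of A, each Aa contributes 1)
    p = (2 * nAA + nAa) / (2 * n)

# Check: maximizing the observed log likelihood directly,
# l(p) = n1*log(1 - (1-p)^2) + n2*log((1-p)^2), gives the
# closed form p_hat = 1 - sqrt(n2 / n).
p_hat = 1 - math.sqrt(n2 / n)
print(round(p, 4), round(p_hat, 4))  # prints: 0.531 0.531
```

The two estimates agree, as the ascent property guarantees they should once the EM iterates converge for this unimodal likelihood.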
More informationD. Incorrect! That is what a phylogenetic tree intends to depict.
Genetics - Problem Drill 24: Evolutionary Genetics No. 1 of 10 1. A phylogenetic tree gives all of the following information except for. (A) DNA sequence homology among species. (B) Protein sequence similarity
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationCINQA Workshop Probability Math 105 Silvia Heubach Department of Mathematics, CSULA Thursday, September 6, 2012
CINQA Workshop Probability Math 105 Silvia Heubach Department of Mathematics, CSULA Thursday, September 6, 2012 Silvia Heubach/CINQA 2012 Workshop Objectives To familiarize biology faculty with one of
More informationChapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations
Chapter 1: Systems of linear equations and matrices Section 1.1: Introduction to systems of linear equations Definition: A linear equation in n variables can be expressed in the form a 1 x 1 + a 2 x 2
More informationStatistical learning. Chapter 20, Sections 1 3 1
Statistical learning Chapter 20, Sections 1 3 Chapter 20, Sections 1 3 1 Outline Bayesian learning Maximum a posteriori and maximum likelihood learning Bayes net learning ML parameter learning with complete
More informationIntroduction to Hidden Markov Models for Gene Prediction ECE-S690
Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence
More informationGeneralized linear models
Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................
More informationKey Concepts: Economic Computation, Part III
Key Concepts: Economic Computation, Part III Brent Hickman Summer, 8 1 Using Newton s Method to Find Roots of Real- Valued Functions The intuition behind Newton s method is that finding zeros of non-linear
More informationThe genome encodes biology as patterns or motifs. We search the genome for biologically important patterns.
Curriculum, fourth lecture: Niels Richard Hansen November 30, 2011 NRH: Handout pages 1-8 (NRH: Sections 2.1-2.5) Keywords: binomial distribution, dice games, discrete probability distributions, geometric
More informationHidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)
Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P
More informationMarkov Models & DNA Sequence Evolution
7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under
More informationAlgorithmische Bioinformatik WS 11/12:, by R. Krause/ K. Reinert, 14. November 2011, 12: Motif finding
Algorithmische Bioinformatik WS 11/12:, by R. Krause/ K. Reinert, 14. November 2011, 12:00 4001 Motif finding This exposition was developed by Knut Reinert and Clemens Gröpl. It is based on the following
More informationWhole-genome analysis of GCN4 binding in S.cerevisiae
Whole-genome analysis of GCN4 binding in S.cerevisiae Lillian Dai Alex Mallet Gcn4/DNA diagram (CREB symmetric site and AP-1 asymmetric site: Song Tan, 1999) removed for copyright reasons. What is GCN4?
More informationProcesses of Evolution
15 Processes of Evolution Forces of Evolution Concept 15.4 Selection Can Be Stabilizing, Directional, or Disruptive Natural selection can act on quantitative traits in three ways: Stabilizing selection
More informationIntroduction to Linkage Disequilibrium
Introduction to September 10, 2014 Suppose we have two genes on a single chromosome gene A and gene B such that each gene has only two alleles Aalleles : A 1 and A 2 Balleles : B 1 and B 2 Suppose we have
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationIntroduction to population genetics & evolution
Introduction to population genetics & evolution Course Organization Exam dates: Feb 19 March 1st Has everybody registered? Did you get the email with the exam schedule Summer seminar: Hot topics in Bioinformatics
More informationComputational Systems Biology: Biology X
Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#7:(Mar-23-2010) Genome Wide Association Studies 1 The law of causality... is a relic of a bygone age, surviving, like the monarchy,
More informationCS281 Section 4: Factor Analysis and PCA
CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we
More informationMarkov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University
Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models Modeling the statistical properties of biological sequences and distinguishing regions
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationMotifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.
Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More information