A note on stochastic context-free grammars, termination and the EM-algorithm


Niels Richard Hansen

Department of Applied Mathematics and Statistics, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen Ø, Denmark

Abstract

Termination of a stochastic context-free grammar, i.e. almost sure finiteness of the random trees it produces, is shown to be equivalent to extinction of an embedded multitype branching process. We show that the maximum likelihood estimator in a saturated model based on complete or partial observation of a finite tree always gives terminating grammars. With partial observation we show that this in fact holds for the whole sequence of parameters obtained by the EM-algorithm. Finally, aspects of the size of the tree related to the embedded branching process are discussed.

Key words: EM-algorithm, maximum likelihood estimator, multitype branching process, stochastic context-free grammar

1 Introduction

A stochastic context-free grammar can be defined as a probability measure on a set of rooted trees. We give a formal definition in Section 2. This measure is specified by a set of rules for evolving symbols known as non-terminals into sequences of non-terminals and terminals (another set of symbols) and by probabilities assigned to these evolution rules. The purpose is to end up with a probability measure on the set of finite sequences of terminals. A modern application of stochastic context-free grammars can be found in the area of biological sequence analysis and especially the modeling of RNA molecules (Eddy & Durbin, 1994), (Yasubumi et al., 1994), (Durbin et al., 1998).

Email address: richard@math.ku.dk (Niels Richard Hansen).

We first show that one faces a termination problem in the sense that for stochastic context-free grammars the evolution can continue forever, i.e. the tree can be infinite with positive probability, and a resulting finite sequence of terminals is thus never obtained. We call the stochastic context-free grammar terminating if the tree is finite almost surely. This issue was first mentioned by Sankoff (1971). He remarks that a related branching process should be subcritical for the stochastic context-free grammar to be terminating, but this seems to have been neglected in the more recent literature.

The main purpose of this paper is to show that if we consider a class of saturated models and use maximum likelihood estimation to infer the unknown parameters, the resulting stochastic context-free grammar will always be terminating. This holds whether we observe the entire tree or only the resulting finite sequence of terminals. In the latter case, where we have only partial observations, one will typically apply the EM-algorithm to find the maximum likelihood estimate, and we actually show that the whole sequence of parameters obtained by running the EM-algorithm gives terminating stochastic context-free grammars. In the final section, Section 4, we discuss issues related to the distribution of the size of the tree, especially for parameters on the boundary between terminating and non-terminating stochastic context-free grammars.

2 Stochastic context-free grammars

Let E and A be two finite sets. We call elements in E non-terminals and elements in A terminals. We let T denote the set of (possibly infinite) rooted trees with internal nodes from E and leaves from A. Let S denote the set of finite sequences from the disjoint union E ∪ A and let P be an E × S stochastic matrix.
If ν is a probability measure on E, then P_ν is the probability measure on T defined recursively as follows: First the root, a non-terminal x, is drawn from ν, and then conditionally on x the children of x, a sequence of terminals and non-terminals, are drawn from P(x, ·). This is the first generation. If x_1, ..., x_m denotes the set of non-terminal children obtained in the n'th generation, m new sequences, the (n+1)'th generation, are drawn independently from P(x_1, ·), ..., P(x_m, ·), the children of x_i being drawn from P(x_i, ·).

Definition 1 The probability measure P_ν on T defined above is called a stochastic context-free grammar. If T_0 ⊆ T denotes the subset of finite trees, the stochastic context-free grammar is said to be terminating if P_ν(T_0) = 1.

Stochastic context-free grammars are used as a means to construct probability measures on the set A* of finite sequences of terminals from A by a leaf traversal

map, f_LT : T_0 → A*, defined as follows: If the tree T = {a}, for a ∈ A, only consists of a root terminal, then f_LT(T) = a. If the root of T is a non-terminal, it has m ≥ 1 children, each being the root of a tree. Denoting these trees T_1, ..., T_m, f_LT is defined recursively by f_LT(T) = f_LT(T_1), ..., f_LT(T_m). This definition produces a sensible, finite sequence from A* only if the tree T is finite; hence for f_LT(P_ν) to be a well defined probability measure on A* it is crucial that we restrict our attention to terminating stochastic context-free grammars.

Let Λ denote the E × E matrix with Λ_{x,y} being the expected number of non-terminals y produced by the probability measure P(x, ·). Let ρ(Λ) denote the spectral radius of the matrix. Up to singularity (to be defined below), this spectral radius determines whether the stochastic context-free grammar is terminating.

Definition 2 A stochastic context-free grammar is said to be singular if there exists a set E_0 ⊆ E such that for all x ∈ E_0 and all γ ∈ S with P(x, γ) > 0 the sequence γ contains exactly one state from E_0. The grammar is called non-singular if it is not singular.

Theorem 3 A non-singular stochastic context-free grammar is terminating if and only if ρ(Λ) ≤ 1.

PROOF. Let X_{n,x} denote the number of non-terminals x in the n'th generation, and let X_n = (X_{n,x})_{x ∈ E} ∈ N_0^E. Then (X_n)_{n ≥ 0} is a multitype branching process with the types being the set of non-terminals E and Λ the matrix of offspring expectations. Moreover, if the stochastic context-free grammar is non-singular, no subset of E will reproduce exactly one non-terminal from that subset with probability one, and according to (Harris, 1963), Theorem II.10.1, this branching process becomes extinct with probability one if and only if ρ(Λ) ≤ 1; see also Chapter 2 in (Mode, 1971) and Chapter 4 in (Jagers, 1975). Since termination of the stochastic context-free grammar is the same as extinction of the branching process, the result follows.
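Theorem 3 reduces the termination check to a spectral-radius computation, which is easy to carry out numerically. The sketch below (the rule encoding and helper names are our own, not the paper's) builds Λ from a rule table and approximates ρ(Λ) by power iteration, using the grammar of Example 8 below with p_1 = 3/4 and p_2 = 1/3, which is exactly critical:

```python
# Termination check via Theorem 3: build the expectation matrix Lambda
# from a rule table and approximate its spectral radius by power iteration.

def expectation_matrix(rules):
    """Lambda[x][y] = expected number of non-terminals y produced from x.
    rules maps each non-terminal to a list of (probability, sequence) pairs."""
    nts = sorted(rules)
    idx = {x: i for i, x in enumerate(nts)}
    lam = [[0.0] * len(nts) for _ in nts]
    for x, alternatives in rules.items():
        for prob, seq in alternatives:
            for sym in seq:
                if sym in idx:          # terminal symbols contribute nothing
                    lam[idx[x]][idx[sym]] += prob
    return lam

def spectral_radius(mat, iterations=500):
    """Perron root of a non-negative matrix, by power iteration."""
    n = len(mat)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in mat]
        rho = max(w)
        if rho == 0.0:
            return 0.0
        v = [wi / rho for wi in w]
    return rho

# The grammar of Example 8 with p1 = 3/4, p2 = 1/3: rho(Lambda) = 1,
# i.e. exactly on the boundary of termination.
rules = {1: [(0.75, (1, 2)), (0.25, ("b",))],
         2: [(1 / 3, (1,)), (2 / 3, ("a",))]}
lam = expectation_matrix(rules)
rho = spectral_radius(lam)
```

For a reducible Λ the power iteration may converge slowly; for the small irreducible examples considered here it is adequate.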
3 Maximum likelihood inference

We consider a parametric model for stochastic context-free grammars, where we a priori for each non-terminal x fix a finite set of sequences γ ∈ S for which

we allow P(x, γ) > 0. The saturated model consists of all P's concordant with this requirement. We observe a single tree T (complete observation) or some transformation f(T) (partial observation) of the tree. The results generalize trivially to observing a number of i.i.d. trees.

3.1 With complete observation

For convenience assume from here on that E = {1, ..., n} and let Γ ⊆ S be a finite set of sequences γ_ij for i = 1, ..., n and j = 1, ..., m_i, m_i ≥ 1. We think of γ_{i1}, ..., γ_{im_i} as the set of allowed sequences that can be drawn from non-terminal i. Denote by n_ijk the number of times non-terminal k occurs in γ_ij, and let

\[ N_i = \begin{pmatrix} n_{i11} & \cdots & n_{i1n} \\ \vdots & & \vdots \\ n_{im_i 1} & \cdots & n_{im_i n} \end{pmatrix} \quad \text{and} \quad N = \begin{pmatrix} N_1 \\ \vdots \\ N_n \end{pmatrix}. \tag{1} \]

Thus N_i is an m_i × n matrix and N is a (Σ_i m_i) × n matrix. Let p = (p_ij) with p_ij = P(i, γ_ij) for i = 1, ..., n, j = 1, ..., m_i. Define

\[ p_i = (p_{i1}, \ldots, p_{im_i}) \quad \text{and} \quad P = \operatorname{diag}(p_1, \ldots, p_n). \tag{2} \]

Since Λ_ik = Σ_j p_ij n_ijk we see that Λ = PN. Consider Γ and i_0 as fixed and p = (p_ij) as a parameterization of the stochastic context-free grammars with root i_0, i.e. a parameterization of the probability measures P_{i_0} = P_{i_0,p} on T conditionally on the root non-terminal being i_0. The parameter space is thus

\[ \Theta = \Big\{ p = (p_{ij}) \;\Big|\; \sum_{j=1}^{m_i} p_{ij} = 1,\ i = 1, \ldots, n,\ p_{ij} \geq 0 \Big\}. \tag{3} \]

Having observed a finite tree T ∈ T_0 with root i_0, let c_ij = c_ij(T) denote the number of times that γ_ij occurs in T. Then the log-likelihood is

\[ l_T(p) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} c_{ij} \log(p_{ij}), \tag{4} \]

which of course gives the usual maximum likelihood estimator p̂_ij = s_i^{-1} c_ij, where s_i = Σ_j c_ij for i = 1, ..., n, provided s_i > 0. If s_i = 0 we can simply throw away non-terminal i before continuing.
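The estimator p̂_ij = c_ij/s_i and the representation Λ = PN are straightforward to compute. A minimal sketch (the dict encoding and helper names are ours), with counts taken from the grammar of Example 8 below for a tree whose leaf sequence has three b's and two a's:

```python
# MLE in the saturated model. Counts c_ij and compositions n_ijk for the
# grammar of Example 8 (gamma_11 = (1,2), gamma_12 = b, gamma_21 = (1),
# gamma_22 = a), from a tree with n = 3 b's and m = 2 a's:
c = {1: [4, 3],   # c_11 = m + n - 1 = 4, c_12 = n = 3
     2: [2, 2]}   # c_21 = n - 1 = 2,     c_22 = m = 2
n_comp = {1: [[1, 1], [0, 0]],   # gamma_11 contains one 1 and one 2
          2: [[1, 0], [0, 0]]}   # gamma_21 contains one 1

def mle(counts):
    """p_hat_ij = c_ij / s_i, with s_i = sum_j c_ij."""
    return {i: [cij / sum(ci) for cij in ci] for i, ci in counts.items()}

def lambda_matrix(p, comp, nts):
    """Lambda_ik = sum_j p_ij n_ijk, i.e. Lambda = P N in matrix form."""
    return [[sum(p[i][j] * comp[i][j][k] for j in range(len(p[i])))
             for k in range(len(nts))] for i in nts]

p_hat = mle(c)                               # {1: [4/7, 3/7], 2: [0.5, 0.5]}
lam_hat = lambda_matrix(p_hat, n_comp, [1, 2])
```

The resulting lam_hat is the estimated expectation matrix, whose spectral radius the paper shows below (Theorem 4) to be at most 1.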

Defining

\[ c_i = (c_{i1}, \ldots, c_{im_i}), \quad C = \operatorname{diag}(c_1, \ldots, c_n), \quad \text{and} \quad S = \operatorname{diag}(s_1, \ldots, s_n), \tag{5} \]

we find that P̂ = S^{-1}C, and therefore the estimated expectation matrix can be given the representation

\[ \hat{\Lambda} = S^{-1} C N. \tag{6} \]

We show below that Λ̂ has spectral radius at most 1. Besides some matrix manipulations, the crucial observation, resembling Kirchhoff's law for electric current, is that the total number of times that non-terminal k occurs as a child in T equals the total number of times that non-terminal k occurs as a parent in T, when disregarding the root non-terminal i_0, which is the child of nobody. This is because the leaves in T are all terminals. Thus for k = 1, ..., n

\[ \delta_{i_0,k} + \sum_{i=1}^{n} \sum_{j=1}^{m_i} c_{ij} n_{ijk} = \sum_{j=1}^{m_k} c_{kj} = s_k. \tag{7} \]

If we define the vectors c and s by

\[ c = (c_1, \ldots, c_n) \quad \text{and} \quad s = (s_1, \ldots, s_n), \tag{8} \]

the equations given by (7) can be written in matrix form as

\[ \delta_{i_0} + cN = s, \tag{9} \]

with δ_{i_0} = (δ_{i_0,1}, ..., δ_{i_0,n}).

Theorem 4 If Λ̂ is given by (6) with C and S defined in (5), and if c and s, as given by (8), fulfill equation (9), then ρ(Λ̂) ≤ 1.

PROOF. First, for two square matrices A and B it holds that ρ(AB) = ρ(BA). The proof is elementary; see for instance Exercise I.3.7 in (Bhatia, 1997). Thus ρ(Λ̂) = ρ(S^{-1}CN) = ρ(CNS^{-1}).

Regarding the spectral radius of any matrix A with nonnegative entries it holds that

\[ \rho(A) \leq \max_j \sum_i A_{ij}, \tag{10} \]

cf. the theorem and Corollary 1.1 in (Seneta, 1981) covering the case when A is irreducible. By decomposing a reducible matrix into irreducible blocks, it follows that (10) holds also if A is reducible. For the matrix CNS^{-1} one easily shows that the (row) vector of column sums equals

\[ cNS^{-1} = (s - \delta_{i_0}) S^{-1} = \mathbf{1} - \delta_{i_0} S^{-1}, \tag{11} \]

where the first equality follows from (9). Here 1 denotes the vector of all ones. From (10) we get that ρ(Λ̂) ≤ 1.

Remark 5 If Λ̂ is irreducible then ρ(Λ̂) < 1. This is due to the fact that for irreducible matrices the bound in (10) is strict unless all column sums are equal, cf. Corollary 1.1 in (Seneta, 1981).

3.2 With partial observation

If we only observe the tree T partially we can rely on the EM-algorithm for estimation of the parameters. Let f : T → F for some set F, with t = f(T) denoting the partial observation. If f, like the leaf traversal map f_LT, is defined a priori on T_0 only, we can extend it to take values in F ∪ {∞} by letting it take the value ∞ on T \ T_0. For convenience we assume that the root non-terminal i_0 is known and fixed. As in the previous section, Γ is the fixed finite set of finite sequences γ_ij, i = 1, ..., n and j = 1, ..., m_i, and the parameter space is Θ as given by (3). We define

Θ_t = {p ∈ Θ | P_{i_0,p}(f(T) = t) > 0}.

For p ∈ Θ_t let

c_ij(p, t) := E_{i_0,p}(c_ij(T) | t)

denote the conditional expectation of the variables c_ij given the partial observation t ∈ F under the measure P_{i_0,p} given by the parameter p. Choosing some initial parameter value p_0 ∈ Θ_t, the EM-algorithm updates the parameters recursively as follows: given p_n,

(1) compute c_ij(p_n, t);
(2) compute p_{n+1} by maximizing the log-likelihood

\[ l(p) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} c_{ij}(p_n, t) \log(p_{ij}), \]

that is, p_{ij,n+1} = s_i(p_n, t)^{-1} c_{ij}(p_n, t), where s_i(p_n, t) = Σ_j c_ij(p_n, t) for i = 1, ..., n.

It is a well known property of the EM-algorithm that the sequence (p_n)_{n≥0} of parameters gives an increasing (marginal) likelihood (Dempster et al., 1977), (Lari & Young, 1990), and we see from above that the maximization part of the EM-algorithm can be carried out explicitly in each iteration of the algorithm. The only problem is the computation of c_ij(p_n, t). Different algorithms exist depending on the transformation f and the stochastic context-free grammar. For details the reader is referred to the literature, e.g. (Durbin et al., 1998), (Lari & Young, 1990), and (Baker, 1979). It should be mentioned that these algorithms can be quite computationally demanding.

With Λ(p) the matrix of expectations under P_{i_0,p}, we have the following theorem.

Theorem 6 Suppose that t ∈ F is such that f^{-1}(t) ⊆ T_0 and p_0 ∈ Θ_t. Then the EM-algorithm produces a sequence (p_n)_{n≥0} satisfying p_n ∈ Θ_t and ρ(Λ(p_n)) ≤ 1 for n ≥ 1, and if p_n → p for n → ∞ then ρ(Λ(p)) ≤ 1.

PROOF. First, since the EM-algorithm increases the marginal likelihood, p_n ∈ Θ_t for all n ≥ 1 if p_0 ∈ Θ_t. Since f^{-1}(t) ⊆ T_0, the equality obtained in (9), i.e. δ_{i_0} + c(T)N = s(T), holds for all T with f(T) = t. The linearity of conditional expectations then gives that for all p ∈ Θ_t

\[ \delta_{i_0} + c(p, t) N = s(p, t), \tag{12} \]

where c(p, t) and s(p, t) denote the collections of the c_ij(p, t) and s_i(p, t) into vectors exactly as in the previous section.

It follows from Theorem 4 that ρ(Λ(p_n)) ≤ 1 for all n ≥ 1 since (12) holds. Continuity of Λ as a function of p, as well as continuity of the spectral radius map, shows that for an eventual limit p also ρ(Λ(p)) ≤ 1 holds.

Remark 7 Note that if p_0 is the maximum likelihood estimate, provided it exists and is unique, the sequence of parameters obtained by the EM-algorithm will be constantly equal to p_0, and consequently ρ(Λ(p_0)) ≤ 1.

Example 8 Let E = {1, 2}, A = {a, b}, γ_11 = (1, 2), γ_12 = b, γ_21 = 1, γ_22 = a, and let P be defined by

P(1, (1, 2)) = 1 − P(1, b) = p_1,   P(2, 1) = 1 − P(2, a) = p_2

with 0 < p_1, p_2 < 1. Let the root non-terminal be 1. We consider f = f_LT, so that

f_LT(T) = b^{n_1} a^{m_1} ⋯ b^{n_k} a^{m_k}

where n_1 > 0. Observe that any sequence starting with b can occur. Observe also that the counts c_ij satisfy the following equations:

c_22 = c_11 − c_21,   c_12 = 1 + c_21,   c_12 = n := Σ_{i=1}^{k} n_i,   and   c_22 = m := Σ_{i=1}^{k} m_i.

Hence c_21 = n − 1 and c_11 = m + n − 1. In this example we see that f_LT is in fact sufficient and that the maximum likelihood estimates for p_1 and p_2 are given explicitly as

p̂_1 = (m + n − 1)/(m + 2n − 1)   and   p̂_2 = (n − 1)/(m + n − 1).

The expectation matrix is

\[ \Lambda(p) = \begin{pmatrix} p_1 & p_1 \\ p_2 & 0 \end{pmatrix} \]

and ρ(Λ(p)) ≤ 1 if and only if p_2 ≤ q_1/p_1, where q_1 = 1 − p_1. Theorem 4 gives that ρ(Λ(p̂)) ≤ 1, but this can also be verified directly as

q̂_1/p̂_1 = n/(m + n − 1) > (n − 1)/(m + n − 1) = p̂_2.
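Since f_LT is sufficient in Example 8, the expected counts in the E-step do not depend on the current parameter, and the EM-algorithm reaches the maximum likelihood estimate in a single update. A sketch (helper names and the string encoding are ours) that also checks the termination criterion p̂_2 ≤ q̂_1/p̂_1:

```python
# One EM pass for Example 8. f_LT is sufficient here: the expected counts
# are determined by the observed leaf string, so a single M-step already
# yields the MLE.

def counts_from_string(t):
    """(c11, c12, c21, c22) from a leaf string t over {a, b}, using the
    identities of Example 8: c12 = #b, c22 = #a, c21 = c12 - 1,
    c11 = c22 + c21."""
    n, m = t.count("b"), t.count("a")
    return m + n - 1, n, n - 1, m

def em_update(t):
    """The explicit M-step: normalise the expected counts per non-terminal."""
    c11, c12, c21, c22 = counts_from_string(t)
    return c11 / (c11 + c12), c21 / (c21 + c22)

p1_hat, p2_hat = em_update("bbaba")   # n = 3, m = 2 gives (4/7, 1/2)
# rho(Lambda(p)) <= 1 iff p2 <= q1/p1, which the estimate satisfies:
assert p2_hat <= (1 - p1_hat) / p1_hat
```

Running em_update again on the same observation returns the same values, illustrating Remark 7: starting the EM-algorithm in the MLE leaves it fixed.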

4 On the size of the tree and the number of leaves

It would clearly be interesting to understand the distribution of the size of the tree produced by a terminating stochastic context-free grammar in more detail. For instance, the length of the sequence produced by the leaf traversal map f_LT equals the number of leaves in the tree. One should expect a fundamental difference between the two cases where the embedded branching process is critical (ρ = 1) or sub-critical (ρ < 1).

Considering only non-terminals, let H_i denote the offspring distribution for the branching process from non-terminal i, i.e.

\[ H_i(m_1, \ldots, m_n) = \sum_{j\,:\, n_{ij1} = m_1, \ldots, n_{ijn} = m_n} p_{ij}, \]

and let h_i denote the corresponding generating function, i.e.

\[ h_i(z_1, \ldots, z_n) = \sum_{m_1, \ldots, m_n} z_1^{m_1} \cdots z_n^{m_n} H_i(m_1, \ldots, m_n). \]

One can easily prove that the distribution R_i of the total number of non-terminals produced, given that we start with one non-terminal i, has generating function r_i fulfilling the equation

\[ r_i(z) = z_i h_i(r_1(z), \ldots, r_n(z)) \]

with z = (z_1, ..., z_n).

Otter (1949) used this equation for n = 1 to show that for m → ∞

\[ R_1(m) = c \alpha^m m^{-3/2} + O(\alpha^m m^{-5/2}), \quad m \equiv 1 \pmod{q}, \tag{13} \]

for some constants c, α ≤ 1 and q the period of H_1. The critical case is equivalent to α = 1, so the tail of R_1 has a power-law decay with exponent 3/2 in the critical case, whereas it has an exponentially light tail in the sub-critical case.

A similar result for, e.g., the total number of individuals, or the total number of individuals of a given type, produced by a critical or sub-critical multitype branching process doesn't seem to exist. There is, however, a result due to Good (1960) on how in principle to compute R_i. In Section 3, Examples and applications, Good (1960) shows that R_i(m_1, ..., m_n) is the coefficient of z_1^{m_1} ⋯ z_i^{m_i − 1} ⋯ z_n^{m_n} in

\[ h_1(z)^{m_1} \cdots h_n(z)^{m_n} \,\Big\| \delta_{ij} - \frac{z_j}{h_i(z)} \frac{\partial h_i}{\partial z_j}(z) \Big\|_{i,j} \tag{14} \]

with ‖·‖ denoting determinant.
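As an illustration (our own computation, not part of the paper), the functional equation for r_i can be solved numerically by fixed-point iteration on truncated power series, and the result cross-checked against the closed form for R_1 given in Example 9 below; the marginal tail can then be inspected directly:

```python
from math import comb, log

# Two ways to compute R_1 for the grammar of Examples 8/9 in the critical
# case p1 = 3/4, p2 = 1/3: (i) fixed-point iteration of
# r_i(z) = z_i h_i(r(z)) on truncated bivariate power series, and
# (ii) the closed form of Example 9. Representation choices are ours.

p1, p2 = 0.75, 1 / 3
q1, q2 = 1 - p1, 1 - p2
D = 8  # keep series terms of total degree <= D

def mul(f, g):
    """Product of truncated series stored as {(d1, d2): coefficient}."""
    h = {}
    for (a, b), x in f.items():
        for (c, d), y in g.items():
            if a + b + c + d <= D:
                h[a + c, b + d] = h.get((a + c, b + d), 0.0) + x * y
    return h

def shift(f, d1, d2):
    """Multiplication by z1^d1 * z2^d2, truncated at total degree D."""
    return {(a + d1, b + d2): x for (a, b), x in f.items()
            if a + b + d1 + d2 <= D}

# r1 = z1*(q1 + p1*r1*r2), r2 = z2*(q2 + p2*r1); coefficients of total
# degree <= D are exact after D + 1 iterations.
r1, r2 = {}, {}
for _ in range(D + 1):
    new1 = {(1, 0): q1}
    for k, v in shift(mul(r1, r2), 1, 0).items():
        new1[k] = new1.get(k, 0.0) + p1 * v
    new2 = {(0, 1): q2}
    for k, v in shift(r1, 0, 1).items():
        new2[k] = new2.get(k, 0.0) + p2 * v
    r1, r2 = new1, new2

def R1_closed(m1, m2):
    """Closed form of Example 9, supported on 1 + m2 <= m1 <= 2*m2 + 1."""
    if not (m2 + 1 <= m1 <= 2 * m2 + 1):
        return 0.0
    return (comb(m1, m2) * comb(m2, m1 - m2 - 1) / m1
            * p1**m2 * p2**(m1 - m2 - 1)
            * q1**(m1 - m2) * q2**(2 * m2 - m1 + 1))

# The series coefficients agree with the closed form.
for (m1, m2), coef in r1.items():
    assert abs(coef - R1_closed(m1, m2)) < 1e-12

# Marginal tail R(m1, .): on a log-log scale the decay is close to -3/2,
# matching Fig. 1.
def marginal(m1):
    return sum(R1_closed(m1, m2) for m2 in range(m1))

slope = (log(marginal(400)) - log(marginal(200))) / (log(400) - log(200))
```

The two computations agreeing on all coefficients up to total degree D is a useful sanity check on the closed form.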

10 log(r(m1,.)) log(r(.,m2)) log(m1) log(m2) Fig. 1. The log-marginals plotted against log(m). In both cases we see a tail decay that is asymptotically linear on this log-log-plot, thus the decay is like a power function. The straight line has slope 3/2. Example 9 (Example 8 continued) The generating functions for R 1 and R 2 from Example 8 are r 1 (z 1, z 2 ) = q 1 + p 1 z 1 z 2 and r 2 (z 1, z 2 ) = q 2 + p 2 z 1 with q i = 1 p i. Using (14), this gives R 1 (m 1, m 2 ) = ( )( m1 m 2 m 2 2m 2 m ) ( 1 m 1 1 m 1 ) p m 2 1 pm 1 m q m 1 m 2 1 q 2m 2 m for 1 + m 2 m 1 2m It still seems complicated to compute the tail behavior of the two marginals analytically, but we can investigate the distribution numerically. If we consider the case p 1 = 3/4 and p 2 = 1/3 (which is a critical example), Figure 1 shows a log-log plot of the marginal point probabilities of R 1 compared to a straight line with slope 3/2. On both graphs we see that asymptotically, the decay of the marginals of R 1 is like m 3/2. Moreover, for this example the length of the sequence f LT (T) equals c , cf. Example 8, which in turn equals the total number of occurrences of non-terminal 2 in the tree plus 1. Thus in this critical example, the tail of the distribution of the length of f LT (T) decays as a power-law with exponent 3/2. We may suggest that this is a general phenomenon, that for critical, terminating stochastic context-free grammars, the distribution of the length of f LT (T) has a power law decay with exponent 3/2. To obtain such a result we will have to deal with the tail behavior of the distributions R i for multitype branching processes. The method used by Otter (1949) does not seem to generalize, 10

as it relies heavily on the theory of functions of a single complex variable. Dealing rigorously with this aspect of multitype branching processes seems to be an open problem.

References

Baker, James K. (1979), Trainable grammars for speech recognition, in: Klatt, D. H. and Wolf, J. J., eds., Speech Communication Papers for the 97th Meeting of the Acoustical Society of America.
Bhatia, Rajendra (1997), Matrix Analysis (Springer Verlag).
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977), Maximum Likelihood from Incomplete Data via the EM algorithm, J. Roy. Statist. Soc. 39.
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998), Biological Sequence Analysis: Probabilistic models of proteins and nucleic acids (Cambridge University Press).
Eddy, Sean R. and Durbin, Richard (1994), RNA sequence analysis using covariance models, Nucleic Acids Research 22, 11.
Good, I. J. (1960), Generalizations to several variables of Lagrange's expansion, with applications to stochastic processes, Proc. Cambridge Philos. Soc. 56.
Harris, Theodore E. (1963), The Theory of Branching Processes (Springer-Verlag).
Jagers, Peter (1975), Branching Processes with Biological Applications (John Wiley and Sons).
Lari, K. and Young, S. J. (1990), The estimation of stochastic context-free grammars using the Inside-Outside algorithm, Computer Speech and Language 4.
Mode, Charles J. (1971), Multitype Branching Processes (Elsevier).
Otter, Richard (1949), The Multiplicative Process, Ann. Math. Statist. 20.
Sankoff, David (1971), Branching processes with terminal types: Application to context-free grammars, J. Appl. Prob. 8.
Seneta, E. (1981), Non-negative Matrices and Markov Chains, Second edition (Springer Verlag).
Yasubumi, Sakakibara et al. (1994), Stochastic context-free grammars for tRNA modeling, Nucleic Acids Research 22, 23.


New Negative Latin Square Type Partial Difference Sets in Nonelementary Abelian 2-groups and 3-groups New Negative Latin Square Type Partial Difference Sets in Nonelementary Abelian 2-groups and 3-groups John Polhill Department of Mathematics, Computer Science, and Statistics Bloomsburg University Bloomsburg,

More information

1 Adeles over Q. 1.1 Absolute values

1 Adeles over Q. 1.1 Absolute values 1 Adeles over Q 1.1 Absolute values Definition 1.1.1 (Absolute value) An absolute value on a field F is a nonnegative real valued function on F which satisfies the conditions: (i) x = 0 if and only if

More information

p(d θ ) l(θ ) 1.2 x x x

p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

arxiv: v1 [math.pr] 21 Mar 2014

arxiv: v1 [math.pr] 21 Mar 2014 Asymptotic distribution of two-protected nodes in ternary search trees Cecilia Holmgren Svante Janson March 2, 24 arxiv:4.557v [math.pr] 2 Mar 24 Abstract We study protected nodes in m-ary search trees,

More information

Supplemental for Spectral Algorithm For Latent Tree Graphical Models

Supplemental for Spectral Algorithm For Latent Tree Graphical Models Supplemental for Spectral Algorithm For Latent Tree Graphical Models Ankur P. Parikh, Le Song, Eric P. Xing The supplemental contains 3 main things. 1. The first is network plots of the latent variable

More information

Asymptotic distribution of two-protected nodes in ternary search trees

Asymptotic distribution of two-protected nodes in ternary search trees Asymptotic distribution of two-protected nodes in ternary search trees Cecilia Holmgren Svante Janson March 2, 204; revised October 5, 204 Abstract We study protected nodes in m-ary search trees, by putting

More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 11 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,, a n, b are given real

More information

Notes on Paths, Trees and Lagrange Inversion

Notes on Paths, Trees and Lagrange Inversion Notes on Paths, Trees and Lagrange Inversion Today we are going to start with a problem that may seem somewhat unmotivated, and solve it in two ways. From there, we will proceed to discuss applications

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2

MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 MATH 56A: STOCHASTIC PROCESSES CHAPTER 2 2. Countable Markov Chains I started Chapter 2 which talks about Markov chains with a countably infinite number of states. I did my favorite example which is on

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

order is number of previous outputs

order is number of previous outputs Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y

More information

Faithful couplings of Markov chains: now equals forever

Faithful couplings of Markov chains: now equals forever Faithful couplings of Markov chains: now equals forever by Jeffrey S. Rosenthal* Department of Statistics, University of Toronto, Toronto, Ontario, Canada M5S 1A1 Phone: (416) 978-4594; Internet: jeff@utstat.toronto.edu

More information

CASE STUDY: EXTINCTION OF FAMILY NAMES

CASE STUDY: EXTINCTION OF FAMILY NAMES CASE STUDY: EXTINCTION OF FAMILY NAMES The idea that families die out originated in antiquity, particilarly since the establishment of patrilineality (a common kinship system in which an individual s family

More information

Lecture 9 Classification of States

Lecture 9 Classification of States Lecture 9: Classification of States of 27 Course: M32K Intro to Stochastic Processes Term: Fall 204 Instructor: Gordan Zitkovic Lecture 9 Classification of States There will be a lot of definitions and

More information

Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS

Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

1.1.1 Algebraic Operations

1.1.1 Algebraic Operations 1.1.1 Algebraic Operations We need to learn how our basic algebraic operations interact. When confronted with many operations, we follow the order of operations: Parentheses Exponentials Multiplication

More information

A PERIODIC APPROACH TO PLANE PARTITION CONGRUENCES

A PERIODIC APPROACH TO PLANE PARTITION CONGRUENCES A PERIODIC APPROACH TO PLANE PARTITION CONGRUENCES MATTHEW S. MIZUHARA, JAMES A. SELLERS, AND HOLLY SWISHER Abstract. Ramanujan s celebrated congruences of the partition function p(n have inspired a vast

More information

Markov Decision Processes

Markov Decision Processes Markov Decision Processes Lecture notes for the course Games on Graphs B. Srivathsan Chennai Mathematical Institute, India 1 Markov Chains We will define Markov chains in a manner that will be useful to

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

Solution. Daozheng Chen. Challenge 1

Solution. Daozheng Chen. Challenge 1 Solution Daozheng Chen 1 For all the scatter plots and 2D histogram plots within this solution, the x axis is for the saturation component, and the y axis is the value component. Through out the solution,

More information

Notes on Linear Algebra and Matrix Theory

Notes on Linear Algebra and Matrix Theory Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Sequence modelling. Marco Saerens (UCL) Slides references

Sequence modelling. Marco Saerens (UCL) Slides references Sequence modelling Marco Saerens (UCL) Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004), Introduction to machine learning.

More information

Streaming Algorithms for Optimal Generation of Random Bits

Streaming Algorithms for Optimal Generation of Random Bits Streaming Algorithms for Optimal Generation of Random Bits ongchao Zhou Electrical Engineering Department California Institute of echnology Pasadena, CA 925 Email: hzhou@caltech.edu Jehoshua Bruck Electrical

More information

A matrix over a field F is a rectangular array of elements from F. The symbol

A matrix over a field F is a rectangular array of elements from F. The symbol Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted

More information

Modern Discrete Probability Branching processes

Modern Discrete Probability Branching processes Modern Discrete Probability IV - Branching processes Review Sébastien Roch UW Madison Mathematics November 15, 2014 1 Basic definitions 2 3 4 Galton-Watson branching processes I Definition A Galton-Watson

More information

CS 188 Introduction to AI Fall 2005 Stuart Russell Final

CS 188 Introduction to AI Fall 2005 Stuart Russell Final NAME: SID#: Section: 1 CS 188 Introduction to AI all 2005 Stuart Russell inal You have 2 hours and 50 minutes. he exam is open-book, open-notes. 100 points total. Panic not. Mark your answers ON HE EXAM

More information

Machine Learning for natural language processing

Machine Learning for natural language processing Machine Learning for natural language processing Hidden Markov Models Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 33 Introduction So far, we have classified texts/observations

More information

Hidden Markov models

Hidden Markov models Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

A fast algorithm to generate necklaces with xed content

A fast algorithm to generate necklaces with xed content Theoretical Computer Science 301 (003) 477 489 www.elsevier.com/locate/tcs Note A fast algorithm to generate necklaces with xed content Joe Sawada 1 Department of Computer Science, University of Toronto,

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

SMT 2013 Power Round Solutions February 2, 2013

SMT 2013 Power Round Solutions February 2, 2013 Introduction This Power Round is an exploration of numerical semigroups, mathematical structures which appear very naturally out of answers to simple questions. For example, suppose McDonald s sells Chicken

More information

MARKING A BINARY TREE PROBABILISTIC ANALYSIS OF A RANDOMIZED ALGORITHM

MARKING A BINARY TREE PROBABILISTIC ANALYSIS OF A RANDOMIZED ALGORITHM MARKING A BINARY TREE PROBABILISTIC ANALYSIS OF A RANDOMIZED ALGORITHM XIANG LI Abstract. This paper centers on the analysis of a specific randomized algorithm, a basic random process that involves marking

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Eigenvectors Via Graph Theory

Eigenvectors Via Graph Theory Eigenvectors Via Graph Theory Jennifer Harris Advisor: Dr. David Garth October 3, 2009 Introduction There is no problem in all mathematics that cannot be solved by direct counting. -Ernst Mach The goal

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted

More information

Lecture 3: Markov chains.

Lecture 3: Markov chains. 1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.

More information

A NOTE ON TENSOR CATEGORIES OF LIE TYPE E 9

A NOTE ON TENSOR CATEGORIES OF LIE TYPE E 9 A NOTE ON TENSOR CATEGORIES OF LIE TYPE E 9 ERIC C. ROWELL Abstract. We consider the problem of decomposing tensor powers of the fundamental level 1 highest weight representation V of the affine Kac-Moody

More information

Markov chains and the number of occurrences of a word in a sequence ( , 11.1,2,4,6)

Markov chains and the number of occurrences of a word in a sequence ( , 11.1,2,4,6) Markov chains and the number of occurrences of a word in a sequence (4.5 4.9,.,2,4,6) Prof. Tesler Math 283 Fall 208 Prof. Tesler Markov Chains Math 283 / Fall 208 / 44 Locating overlapping occurrences

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

Improved TBL algorithm for learning context-free grammar

Improved TBL algorithm for learning context-free grammar Proceedings of the International Multiconference on ISSN 1896-7094 Computer Science and Information Technology, pp. 267 274 2007 PIPS Improved TBL algorithm for learning context-free grammar Marcin Jaworski

More information

(Inv) Computing Invariant Factors Math 683L (Summer 2003)

(Inv) Computing Invariant Factors Math 683L (Summer 2003) (Inv) Computing Invariant Factors Math 683L (Summer 23) We have two big results (stated in (Can2) and (Can3)) concerning the behaviour of a single linear transformation T of a vector space V In particular,

More information

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1 MATH 56A: STOCHASTIC PROCESSES CHAPTER. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter

More information

PACKAGE LMest FOR LATENT MARKOV ANALYSIS

PACKAGE LMest FOR LATENT MARKOV ANALYSIS PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,

More information

Directed Probabilistic Graphical Models CMSC 678 UMBC

Directed Probabilistic Graphical Models CMSC 678 UMBC Directed Probabilistic Graphical Models CMSC 678 UMBC Announcement 1: Assignment 3 Due Wednesday April 11 th, 11:59 AM Any questions? Announcement 2: Progress Report on Project Due Monday April 16 th,

More information

Mean-field dual of cooperative reproduction

Mean-field dual of cooperative reproduction The mean-field dual of systems with cooperative reproduction joint with Tibor Mach (Prague) A. Sturm (Göttingen) Friday, July 6th, 2018 Poisson construction of Markov processes Let (X t ) t 0 be a continuous-time

More information

Learning Binary Classifiers for Multi-Class Problem

Learning Binary Classifiers for Multi-Class Problem Research Memorandum No. 1010 September 28, 2006 Learning Binary Classifiers for Multi-Class Problem Shiro Ikeda The Institute of Statistical Mathematics 4-6-7 Minami-Azabu, Minato-ku, Tokyo, 106-8569,

More information

Stochastic modelling of epidemic spread

Stochastic modelling of epidemic spread Stochastic modelling of epidemic spread Julien Arino Department of Mathematics University of Manitoba Winnipeg Julien Arino@umanitoba.ca 19 May 2012 1 Introduction 2 Stochastic processes 3 The SIS model

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

Resistance Growth of Branching Random Networks

Resistance Growth of Branching Random Networks Peking University Oct.25, 2018, Chengdu Joint work with Yueyun Hu (U. Paris 13) and Shen Lin (U. Paris 6), supported by NSFC Grant No. 11528101 (2016-2017) for Research Cooperation with Oversea Investigators

More information

MULTI-ORDERED POSETS. Lisa Bishop Department of Mathematics, Occidental College, Los Angeles, CA 90041, United States.

MULTI-ORDERED POSETS. Lisa Bishop Department of Mathematics, Occidental College, Los Angeles, CA 90041, United States. INTEGERS: ELECTRONIC JOURNAL OF COMBINATORIAL NUMBER THEORY 7 (2007), #A06 MULTI-ORDERED POSETS Lisa Bishop Department of Mathematics, Occidental College, Los Angeles, CA 90041, United States lbishop@oxy.edu

More information