Models of Language Evolution: Part II
|
|
- Alexander Wilkerson
- 6 years ago
- Views:
Transcription
1 Models of Language Evolution: Part II Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015
2 Main Reference Partha Niyogi, The computational nature of language learning and evolution, MIT Press, 2006.
3 Multiple Language Models H = {G 1,..., G n } a space with n grammars corresponding languages L 1,..., L n A on alphabet A probability measure P on A according to which sentences are drawn in data set D speakers of language L i draw sentences according to probability P i with support on L i A (positive examples) distribution P is a weighted combination P = i α i P i where α i = α i,t are the fractions of population (time/generation t) that speak L i, with i α i = 1 learning algorithm A : D H computable function
4 probabilistic convergence: grammar G i is learnable if lim P(A(τ m) = G i ) = 1 m when τ m examples in L i drawn according to P i Population Dynamics State Space: S = space of all possible linguistic compositions of the population identify states s S with possible probability distributions P = (P G ) on H...identify with previous α = (α i ) Example: in 3-parameter model (with 8 possible grammars) have S = {P = (P i ) i=1,...,8 P i 0, i P i = 1} = 7 then distribution on A is P(x) = i P i P i (x)
5 finite sample: probability of formulating a certain hypothesis after a sample of size m p m (G i ) := P(A(τ m ) = G i ) limiting sample: limiting behavior for sample size m p(g i ) := lim m P(A(τ m) = G i ) given at generation/time t a distribution P t S of speakers of the different languages get recursion relation P t+1 = F (P t ) take P t+1,i = p m (G i ) in finite sample case, or P t+1,i = p(g i ) in limiting sample case (P t here determines P)
6 Markov Chain Model specify grammars G i through their syntactic parameters (n = 2 N number of possible settings of parameters) trigger learning algorithm as Markov Chain with 2 N nodes T = transition matrix of the Markov Chain then probabilities p m (G i ) = P(A(τ m ) = G i ) p m (G i ) = (2 N 1 2 N T m ) i i-th component of vector obtained by applying (on the right) matrix T m to normalized row vector 2 N 1 2 N (all components 2 N ) assuming starting with uniform distribution 2 N 1 2 N limiting distribution p(g i ) with T S
7 Interpreting limiting behavior seen in case of Markov Chain Model, if non-unique closed class in Markov Chain decomposition, limiting T matrix can have initial states with different probabilities of reaching different targets Example of matrix T in 3-parameter model seen before 2/5 3/5 1 2/5 3/5 T = starting at s 1 or s 3 will reach s 5 with probability 3/5 and s 2 with probability 2/5 interpret these probabilities as limiting composition of speakers population
8 3-parameter model with homogeneous initial population assume initial population consists only of speakers of one of the languages L i (that is, initial P has P i = 1 and all other P j = 0) after a number of generations observe drift towards other languages: population no longer homogeneous (or remains homogeneous, depending on initial state) at next generation p m (G i ) = (PT m ) i or p(g i ) = (PT ) i with P the initial distribution then this gives new P = (P i ) and recompute p m and p with this P for next generation, etc.
9 Shortcomings of the model simulations in case of 3-parameter model show: +V2-languages remain stable; -V2-languages all drift towards +V2-languages contrary to what known from historical linguistics: languages tend to lose the +V2-parameter (switching to -V2): both English and French historically switched from +V2 to -V2 model relies on many ad hoc assumptions about the state space, the learning algorithm, the probability distributions... some of these are clearly not realistic/adequate to actual language evolution Problem: identify which assumptions are inadequate and modify them and improve the model
10 Can model the S-shape? An observation of Historical Linguistics: S-shape of Language Change a certain change in a language begins very gradually, then proceeds at a much faster rate, then tails off slowly again in evolutionary biology, a logistic shaped curve governs replacement of organisms and of genetic alleles that differ in Darwinian fitness in the case of linguistic change, what mechanism would produce this kind of shape? in the model of language evolution have dependence on maturation time K and probability distribution P (a, b parameters in the 2-languages case; distribution P in multilingual case)
11 simulations in 2-languages model finds: initial rate of language change is highest when K small curves not S-shaped: if start with homogeneous L 1 population, percentage of L 2 speakers with generations first grows to above limiting value, then decreases slopes and location of the peak and of asymptotic value depend on K not clear is these are spurious phenomena due to assumptions in the model, or they say something more about the S-shape empirical observation of historical linguists
12
13 Effect of the dependence on P i probability distributions P i with which sentences in L i are generated these are the underlying parameters of the dynamical system seen already in 2-languages model, as dependence on a, b parameters if think of these parameters as changing in time, can affect a language change by (gradual) modification of the parameters can use for reverse engineering : if know a certain change occurs, find what diachronic modification of the parameters is needed in order to affect that change, then use to model other changes
14 3-parameter model with non-homogeneous initial population P = (P i ) i=1,...,8 probability distribution on the 8 possible grammars of the 3-parameter model P 7 point in 7-dimensional simplex in R 8 + T = transition matrix of the Markov Chain of 3-parameter model given P t (with P 0 = P) this determines P = P t on A which determines entries of T = T t non-homogrneous Markov Chain with T = T t for finite-sample size m use matrix T m t recursion step: next generation distribution P t+1 = P t T m t
15 Stability Analysis given distributions P i used in computing transition matrix T T ij = P(s i s j ) = x A : A(s i,x)=s j P(x) A(s, x) determines algorithm s next hypothesis, given current state of Markov Chain s and input x let T i be transition matrix when P = P i (target language is L i ) data D all coming from L i drawn according to P i (instead of mixture of languages with combination P) at generation/time t new distribution will be P = P t = i P t,i P i, where P t distribution of languages at generation t
16 get T t transition matrix by (T t ) ab = P t,i P i (x) = i i x A : A(s a,x)=s b P t,i (T i ) ab fixed points are solutions of P t+1 = P t finite-sample case size m: P = (P i ) satisfying P = ( i P i T i ) m limiting case: can show it becomes solution P = (P i ) of P = 1 8 (I 8 8 i P i T i ) 1 where I N N identity and 1 N N matrix with all entries 1 assuming initial distribution 1/8 1 8
17 Multilingual Learners realistically, leaners in a multilingual population do not zoom in on a unique grammar, but become multilingual themselves (even if in previous generation each individual speaks only one language) a more realistic model should take this into account instead of learning algorithm A : D H consider as a map A : D M(H) with M(H) = set of probability measures on H if H has n elements, identify M(H) = n 1 simplex
18 Bilingualism learning algorithm A : D [0, 1] a bilingual speaker (languages L 1 and L 2 ) produces sentences x A according to probability with P i supported on L i A P λ (x) = λp 1 (x) + (1 λ)p 2 (x) for a single speaker a particular value λ is fixed over the entire population it varies according to a distribution P(λ) over [0, 1]
19 probability of a sentence x A being produced, when input comes from entire population P(x) = P 1 (x) P(x) = 1 with expectation value P λ (x) P(λ) dλ λ P(λ) dλ + P 2 (x) 1 0 (1 λ) P(λ) dλ P(x) = E P (λ) P 1 (x) + (1 E P (λ)) P 2 (x) E P (λ) = 1 0 λ P(λ) dλ so input from population with probabilities P λ distributed according to P(λ) same as input from single speaker with probability P EP (λ)
20 learner (next generation) will also acquire bilingualism with a factor λ which is deduced (via the learning algorithm) from the incoming data again subdivide m input data sentences into groups 1 n 1 sentences in L 1 L 2 2 n 2 sentences in L 1 L 2 3 n 3 sentences in L 2 L 1 with n 1 + n 2 + n 3 = m three possible learning algorithms 1 A 1 sets λ = n 1 n 1 +n 3 (ignoring ambiguous sentences) 2 n A 1 sets λ = 1 n 1 +n 2 +n 3 (ambiguous sentences read as in L 2 ) 3 A 3 sets λ = n 1+n 2 n 1 +n 2 +n 3 (ambiguous sentences read as in L 1 )
21 at time/generation t set u t := E P (λ) average over population in that generation recursive relation u t+1 from u t in terms of parameters a = P 1 (L 1 L 2 ) b = P 2 (L 1 L 2 ) different weight put on the ambiguous sentences by the probability distributions P i of the two languages Result: recursion for the three algorithms A 1 : u t+1 = A 2 : u t+1 = u t (1 a) A 3 : u t+1 = u t (1 b) + b u t (1 a) u t (1 a) + (1 u t )(1 b)
22 Explanation case of A 2 : n 1 u t+1 = E( ) = E(n 1) n 1 + n 2 + n 3 m = P 1(L 1 L 2 )E P (λ) = (1 a)u t case of A 3 : n 1 + n 2 u t+1 = E( ) = E(n 1) n 1 + n 2 + n 3 m + E(n 2) m = u t (1 a) + u t a + (1 u t )b
23 case of A 1 : n 1 u t+1 = E( ) = ( ) m α n 1 β n 2 γ n n 3 1 n 1 + n 3 n 1 n 2 n 3 n 1 + n 3 with α = (1 a)u t, β = au t + b(1 u t ), γ = (1 b)(1 u t ) u t+1 = = m k=0 m k=0 n 1 =0 = ( m k k ( k n 1 )( ) m α n 1 β m k γ k n n 1 1 k k ) β m k (1 β) k α 1 β = α 1 β u t (1 a) u t (1 a) + (1 u t )(1 b)
Models of Language Evolution
Models of Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 Main Reference Partha Niyogi, The computational nature of language learning and evolution, MIT Press, 2006. From
More informationLanguage Acquisition and Parameters: Part II
Language Acquisition and Parameters: Part II Matilde Marcolli CS0: Mathematical and Computational Linguistics Winter 205 Transition Matrices in the Markov Chain Model absorbing states correspond to local
More informationModels of Language Acquisition: Part II
Models of Language Acquisition: Part II Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 Probably Approximately Correct Model of Language Learning General setting of Statistical
More informationLanguage Learning Problems in the Principles and Parameters Framework
Language Learning Problems in the Principles and Parameters Framework Partha Niyogi Presented by Chunchuan Lyu March 22, 2016 Partha Niyogi (presented by c.c. lyu) Language Learning March 22, 2016 1 /
More informationProbabilistic Linguistics
Matilde Marcolli MAT1509HS: Mathematical and Computational Linguistics University of Toronto, Winter 2019, T 4-6 and W 4, BA6180 Bernoulli measures finite set A alphabet, strings of arbitrary (finite)
More informationNote Set 5: Hidden Markov Models
Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional
More informationCS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions
CS145: Probability & Computing Lecture 18: Discrete Markov Chains, Equilibrium Distributions Instructor: Erik Sudderth Brown University Computer Science April 14, 215 Review: Discrete Markov Chains Some
More informationStructural Induction
Structural Induction In this lecture we ll extend the applicability of induction to many universes, ones where we can define certain kinds of objects by induction, in addition to proving their properties
More information= P{X 0. = i} (1) If the MC has stationary transition probabilities then, = i} = P{X n+1
Properties of Markov Chains and Evaluation of Steady State Transition Matrix P ss V. Krishnan - 3/9/2 Property 1 Let X be a Markov Chain (MC) where X {X n : n, 1, }. The state space is E {i, j, k, }. The
More informationGeometry of Phylogenetic Inference
Geometry of Phylogenetic Inference Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 References N. Eriksson, K. Ranestad, B. Sturmfels, S. Sullivant, Phylogenetic algebraic
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationCS 188 Introduction to AI Fall 2005 Stuart Russell Final
NAME: SID#: Section: 1 CS 188 Introduction to AI all 2005 Stuart Russell inal You have 2 hours and 50 minutes. he exam is open-book, open-notes. 100 points total. Panic not. Mark your answers ON HE EXAM
More informationRandom Times and Their Properties
Chapter 6 Random Times and Their Properties Section 6.1 recalls the definition of a filtration (a growing collection of σ-fields) and of stopping times (basically, measurable random times). Section 6.2
More informationBe sure this exam has 9 pages including the cover. The University of British Columbia
Be sure this exam has 9 pages including the cover The University of British Columbia Sessional Exams 2011 Term 2 Mathematics 303 Introduction to Stochastic Processes Dr. D. Brydges Last Name: First Name:
More informationHidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models.
, I. Toy Markov, I. February 17, 2017 1 / 39 Outline, I. Toy Markov 1 Toy 2 3 Markov 2 / 39 , I. Toy Markov A good stack of examples, as large as possible, is indispensable for a thorough understanding
More informationPart A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )
Part A 1. A Markov chain is a discrete-time stochastic process, defined by a set of states, a set of transition probabilities (between states), and a set of initial state probabilities; the process proceeds
More informationNatural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Natural Language Processing Prof. Pawan Goyal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 18 Maximum Entropy Models I Welcome back for the 3rd module
More information8: Hidden Markov Models
8: Hidden Markov Models Machine Learning and Real-world Data Simone Teufel and Ann Copestake Computer Laboratory University of Cambridge Lent 2017 Last session: catchup 1 Research ideas from sentiment
More informationLinear Dynamical Systems (Kalman filter)
Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More informationMachine Learning, Midterm Exam: Spring 2009 SOLUTION
10-601 Machine Learning, Midterm Exam: Spring 2009 SOLUTION March 4, 2009 Please put your name at the top of the table below. If you need more room to work out your answer to a question, use the back of
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW2 due now! Project proposal due on tomorrow Midterm next lecture! HW3 posted Last time Linear Regression Parametric vs Nonparametric
More informationCO350 Linear Programming Chapter 6: The Simplex Method
CO50 Linear Programming Chapter 6: The Simplex Method rd June 2005 Chapter 6: The Simplex Method 1 Recap Suppose A is an m-by-n matrix with rank m. max. c T x (P ) s.t. Ax = b x 0 On Wednesday, we learned
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationMath Spring 2011 Final Exam
Math 471 - Spring 211 Final Exam Instructions The following exam consists of three problems, each with multiple parts. There are 15 points available on the exam. The highest possible score is 125. Your
More information2007 Winton. Empirical Distributions
1 Empirical Distributions 2 Distributions In the discrete case, a probability distribution is just a set of values, each with some probability of occurrence Probabilities don t change as values occur Example,
More informationFast Inference with Min-Sum Matrix Product
Fast Inference with Min-Sum Matrix Product (or how I finished a homework assignment 15 years later) Pedro Felzenszwalb University of Chicago Julian McAuley Australia National University / NICTA 15 years
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationLecture 21: Spectral Learning for Graphical Models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation
More informationSequences and Information
Sequences and Information Rahul Siddharthan The Institute of Mathematical Sciences, Chennai, India http://www.imsc.res.in/ rsidd/ Facets 16, 04/07/2016 This box says something By looking at the symbols
More informationMachine Learning, Midterm Exam: Spring 2008 SOLUTIONS. Q Topic Max. Score Score. 1 Short answer questions 20.
10-601 Machine Learning, Midterm Exam: Spring 2008 Please put your name on this cover sheet If you need more room to work out your answer to a question, use the back of the page and clearly mark on the
More informationMarkov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationSTOCHASTIC PROCESSES Basic notions
J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving
More informationCMSC 723: Computational Linguistics I Session #5 Hidden Markov Models. The ischool University of Maryland. Wednesday, September 30, 2009
CMSC 723: Computational Linguistics I Session #5 Hidden Markov Models Jimmy Lin The ischool University of Maryland Wednesday, September 30, 2009 Today s Agenda The great leap forward in NLP Hidden Markov
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationComputer Vision Group Prof. Daniel Cremers. 14. Sampling Methods
Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric
More information6.207/14.15: Networks Lecture 12: Generalized Random Graphs
6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model
More informationMatrix Factorization Reading: Lay 2.5
Matrix Factorization Reading: Lay 2.5 October, 20 You have seen that if we know the inverse A of a matrix A, we can easily solve the equation Ax = b. Solving a large number of equations Ax = b, Ax 2 =
More informationSampling Algorithms for Probabilistic Graphical models
Sampling Algorithms for Probabilistic Graphical models Vibhav Gogate University of Washington References: Chapter 12 of Probabilistic Graphical models: Principles and Techniques by Daphne Koller and Nir
More informationMarkov Chains. Arnoldo Frigessi Bernd Heidergott November 4, 2015
Markov Chains Arnoldo Frigessi Bernd Heidergott November 4, 2015 1 Introduction Markov chains are stochastic models which play an important role in many applications in areas as diverse as biology, finance,
More information10/17/04. Today s Main Points
Part-of-speech Tagging & Hidden Markov Model Intro Lecture #10 Introduction to Natural Language Processing CMPSCI 585, Fall 2004 University of Massachusetts Amherst Andrew McCallum Today s Main Points
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationLearning Objectives for Math 165
Learning Objectives for Math 165 Chapter 2 Limits Section 2.1: Average Rate of Change. State the definition of average rate of change Describe what the rate of change does and does not tell us in a given
More informationFormal Languages 2: Group Theory
2: Group Theory Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 Group G, with presentation G = X R (finitely presented) X (finite) set of generators x 1,..., x N R (finite)
More informationDesign and Implementation of Speech Recognition Systems
Design and Implementation of Speech Recognition Systems Spring 2013 Class 7: Templates to HMMs 13 Feb 2013 1 Recap Thus far, we have looked at dynamic programming for string matching, And derived DTW from
More informationHomework 6: Image Completion using Mixture of Bernoullis
Homework 6: Image Completion using Mixture of Bernoullis Deadline: Wednesday, Nov. 21, at 11:59pm Submission: You must submit two files through MarkUs 1 : 1. a PDF file containing your writeup, titled
More informationProseminar on Semantic Theory Fall 2010 Ling 720. Remko Scha (1981/1984): Distributive, Collective and Cumulative Quantification
1. Introduction Remko Scha (1981/1984): Distributive, Collective and Cumulative Quantification (1) The Importance of Scha (1981/1984) The first modern work on plurals (Landman 2000) There are many ideas
More informationMATH 564/STAT 555 Applied Stochastic Processes Homework 2, September 18, 2015 Due September 30, 2015
ID NAME SCORE MATH 56/STAT 555 Applied Stochastic Processes Homework 2, September 8, 205 Due September 30, 205 The generating function of a sequence a n n 0 is defined as As : a ns n for all s 0 for which
More informationModels of Molecular Evolution
Models of Molecular Evolution Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison September 15, 2007 Genetics 875 (Fall 2009) Molecular Evolution September 15, 2009 1 /
More informationThis kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.
Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one
More informationAnalysis of Spectral Kernel Design based Semi-supervised Learning
Analysis of Spectral Kernel Design based Semi-supervised Learning Tong Zhang IBM T. J. Watson Research Center Yorktown Heights, NY 10598 Rie Kubota Ando IBM T. J. Watson Research Center Yorktown Heights,
More informationMachine Learning for Data Science (CS4786) Lecture 24
Machine Learning for Data Science (CS4786) Lecture 24 Graphical Models: Approximate Inference Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016sp/ BELIEF PROPAGATION OR MESSAGE PASSING Each
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationCourse 495: Advanced Statistical Machine Learning/Pattern Recognition
Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs
More information26 : Spectral GMs. Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G.
10-708: Probabilistic Graphical Models, Spring 2015 26 : Spectral GMs Lecturer: Eric P. Xing Scribes: Guillermo A Cidre, Abelino Jimenez G. 1 Introduction A common task in machine learning is to work with
More informationCOMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma
COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationMarkov Chains and MCMC
Markov Chains and MCMC Markov chains Let S = {1, 2,..., N} be a finite set consisting of N states. A Markov chain Y 0, Y 1, Y 2,... is a sequence of random variables, with Y t S for all points in time
More informationCOMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017
COMS 4721: Machine Learning for Data Science Lecture 20, 4/11/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SEQUENTIAL DATA So far, when thinking
More information10-701/ Machine Learning, Fall
0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationLecture 13 October 6, Covering Numbers and Maurey s Empirical Method
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and
More informationGeneralized Quantifiers Logical and Linguistic Aspects
Generalized Quantifiers Logical and Linguistic Aspects Lecture 1: Formal Semantics and Generalized Quantifiers Dag Westerståhl University of Gothenburg SELLC 2010 Institute for Logic and Cognition, Sun
More informationParsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)
Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements
More informationSelection and Adversary Arguments. COMP 215 Lecture 19
Selection and Adversary Arguments COMP 215 Lecture 19 Selection Problems We want to find the k'th largest entry in an unsorted array. Could be the largest, smallest, median, etc. Ideas for an n lg n algorithm?
More informationVariational Inference (11/04/13)
STA561: Probabilistic machine learning Variational Inference (11/04/13) Lecturer: Barbara Engelhardt Scribes: Matt Dickenson, Alireza Samany, Tracy Schifeling 1 Introduction In this lecture we will further
More informationEvolutionary dynamics on graphs
Evolutionary dynamics on graphs Laura Hindersin May 4th 2015 Max-Planck-Institut für Evolutionsbiologie, Plön Evolutionary dynamics Main ingredients: Fitness: The ability to survive and reproduce. Selection
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationBirth-death chain. X n. X k:n k,n 2 N k apple n. X k: L 2 N. x 1:n := x 1,...,x n. E n+1 ( x 1:n )=E n+1 ( x n ), 8x 1:n 2 X n.
Birth-death chains Birth-death chain special type of Markov chain Finite state space X := {0,...,L}, with L 2 N X n X k:n k,n 2 N k apple n Random variable and a sequence of variables, with and A sequence
More informationA&S 320: Mathematical Modeling in Biology
A&S 320: Mathematical Modeling in Biology David Murrugarra Department of Mathematics, University of Kentucky http://www.ms.uky.edu/~dmu228/as320/ Spring 2016 David Murrugarra (University of Kentucky) A&S
More informationIntroduction to Semantics. The Formalization of Meaning 1
The Formalization of Meaning 1 1. Obtaining a System That Derives Truth Conditions (1) The Goal of Our Enterprise To develop a system that, for every sentence S of English, derives the truth-conditions
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Dynamic Programming II Date: 10/12/17 12.1 Introduction Today we re going to do a couple more examples of dynamic programming. While
More informationImproved TBL algorithm for learning context-free grammar
Proceedings of the International Multiconference on ISSN 1896-7094 Computer Science and Information Technology, pp. 267 274 2007 PIPS Improved TBL algorithm for learning context-free grammar Marcin Jaworski
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationLecture 3: Markov chains.
1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.
More informationProgramming Assignment 4: Image Completion using Mixture of Bernoullis
Programming Assignment 4: Image Completion using Mixture of Bernoullis Deadline: Tuesday, April 4, at 11:59pm TA: Renie Liao (csc321ta@cs.toronto.edu) Submission: You must submit two files through MarkUs
More informationPageRank. Ryan Tibshirani /36-662: Data Mining. January Optional reading: ESL 14.10
PageRank Ryan Tibshirani 36-462/36-662: Data Mining January 24 2012 Optional reading: ESL 14.10 1 Information retrieval with the web Last time we learned about information retrieval. We learned how to
More informationCS 277: Data Mining. Mining Web Link Structure. CS 277: Data Mining Lectures Analyzing Web Link Structure Padhraic Smyth, UC Irvine
CS 277: Data Mining Mining Web Link Structure Class Presentations In-class, Tuesday and Thursday next week 2-person teams: 6 minutes, up to 6 slides, 3 minutes/slides each person 1-person teams 4 minutes,
More informationMultimedia Communications. Mathematical Preliminaries for Lossless Compression
Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 20: HMMs / Speech / ML 11/8/2011 Dan Klein UC Berkeley Today HMMs Demo bonanza! Most likely explanation queries Speech recognition A massive HMM! Details
More informationNatural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay
Natural Language Processing Prof. Pushpak Bhattacharyya Department of Computer Science & Engineering, Indian Institute of Technology, Bombay Lecture - 21 HMM, Forward and Backward Algorithms, Baum Welch
More informationChapter 2. Some basic tools. 2.1 Time series: Theory Stochastic processes
Chapter 2 Some basic tools 2.1 Time series: Theory 2.1.1 Stochastic processes A stochastic process is a sequence of random variables..., x 0, x 1, x 2,.... In this class, the subscript always means time.
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationToday s Agenda. Need to cover lots of background material. Now on to the Map Reduce stuff. Rough conceptual sketch of unsupervised training using EM
Today s Agenda Need to cover lots of background material l Introduction to Statistical Models l Hidden Markov Models l Part of Speech Tagging l Applying HMMs to POS tagging l Expectation-Maximization (EM)
More informationHidden Markov Models. Terminology, Representation and Basic Problems
Hidden Markov Models Terminology, Representation and Basic Problems Data analysis? Machine learning? In bioinformatics, we analyze a lot of (sequential) data (biological sequences) to learn unknown parameters
More information10-704: Information Processing and Learning Fall Lecture 10: Oct 3
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More informationAn introduction to PRISM and its applications
An introduction to PRISM and its applications Yoshitaka Kameya Tokyo Institute of Technology 2007/9/17 FJ-2007 1 Contents What is PRISM? Two examples: from population genetics from statistical natural
More information5. Density evolution. Density evolution 5-1
5. Density evolution Density evolution 5-1 Probabilistic analysis of message passing algorithms variable nodes factor nodes x1 a x i x2 a(x i ; x j ; x k ) x3 b x4 consider factor graph model G = (V ;
More information2 DISCRETE-TIME MARKOV CHAINS
1 2 DISCRETE-TIME MARKOV CHAINS 21 FUNDAMENTAL DEFINITIONS AND PROPERTIES From now on we will consider processes with a countable or finite state space S {0, 1, 2, } Definition 1 A discrete-time discrete-state
More informationAn Introduction to Statistical Theory of Learning. Nakul Verma Janelia, HHMI
An Introduction to Statistical Theory of Learning Nakul Verma Janelia, HHMI Towards formalizing learning What does it mean to learn a concept? Gain knowledge or experience of the concept. The basic process
More informationLecture 15. Probabilistic Models on Graph
Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how
More informationStochastic Processes
Stochastic Processes 8.445 MIT, fall 20 Mid Term Exam Solutions October 27, 20 Your Name: Alberto De Sole Exercise Max Grade Grade 5 5 2 5 5 3 5 5 4 5 5 5 5 5 6 5 5 Total 30 30 Problem :. True / False
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More information3F1 Information Theory, Lecture 3
3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output
More information