Latent Variable Models Probabilistic Models in the Study of Language Day 4


1 Latent Variable Models Probabilistic Models in the Study of Language Day 4 Roger Levy UC San Diego Department of Linguistics

2 Preamble: plate notation for graphical models
Here is the kind of hierarchical model we've seen so far:
[Graphical model: θ at the top; observation nodes y_11 ... y_1n1, y_21 ... y_2n2, ..., y_m1 ... y_mnm; cluster-level nodes b_1, b_2, ..., b_m governed by Σ_b]

3-7 Plate notation for graphical models
Here is a more succinct representation of the same model:
[Plate diagram: nodes i, θ, and y inside a plate of size N; node b inside a plate of size m; Σ_b outside the plates]
The rectangles labeled N and m are plates; the semantics of a plate labeled n is "replicate this node n times".
N = Σ_{i=1}^m n_i (see previous slide).
The i node is a cluster-identity node.
In our previous application of hierarchical models to regression, cluster identities are known.

8-12 The plan for today's lecture
We are going to study the simplest type of latent-variable models.
[Graphical model: nodes θ, φ, i, y (plate of size N), b (plate of size m), Σ_b]
Technically speaking, "latent variable" means any variable whose value is unknown.
But it's conventionally used to refer to hidden structural relations among observations.
In today's clustering applications, we simply treat i as unknown.
Inferring values of i induces a clustering among observations; to do so we need to put a probability distribution over i.

13-15 The plan for today's lecture
We will cover two types of simple latent-variable models:
The mixture of Gaussians, for continuous multivariate data;
Latent Dirichlet Allocation (LDA; also called topic models), for categorical data (words) in collections of documents.

16-19 Mixture of Gaussians
Motivating example: how are phonological categories learned?
Evidence that learning involves a combination of both innate bias and experience:
Infants can distinguish some contrasts that adults whose languages lack them cannot: alveolar [d] versus retroflex [ɖ] for English speakers, [r] versus [l] for Japanese speakers (Werker and Tees, 1984; Kuhl et al., 2006, inter alia).
Other contrasts are not reliably distinguished until 1 year of age even by native speakers (e.g., syllable-initial [n] versus [ŋ] in Filipino language environments; Narayan et al., 2010).

20 Learning vowel categories
To appreciate the potential difficulties of vowel category learning, consider inter-speaker variation (data courtesy of Vallabha et al., 2007):
[Scatter plot matrices of F1, F2, and Duration for the vowels e, E, i, I, shown for individual speakers]

21 Framing the category learning problem
Here's 19 speakers' data mixed together:
[Scatter plot matrix of F1, F2, and Duration for the vowels e, E, i, I, pooled across speakers]

22-28 Framing the category learning problem
Learning from such data can be thought of in two ways:
Grouping the observations into categories;
Determining the underlying category representations (positions, shapes, and sizes).
Formally: every possible grouping of observations y into categories represents a partition Π of the observations y.
If θ are parameters describing category representations, our problem is to infer P(Π, θ | y),
from which we could recover the two marginal probability distributions of interest:
P(Π | y) (distribution over partitions given the data)
P(θ | y) (distribution over category properties given the data)

29-31 The mixture of Gaussians
Simple generative model of the data: we have k multivariate Gaussians with frequencies φ = (φ_1, ..., φ_k), each with its own mean µ_i and covariance matrix Σ_i (here we punt on how to induce the correct number of categories).
N observations are generated i.i.d. by:
i ∼ Multinom(φ)
y ∼ N(µ_i, Σ_i)
Here is the corresponding graphical model:
[Graphical model: φ → i → y inside a plate of size n; µ and Σ inside a plate of size m]
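
To make the generative story concrete, here is a minimal sketch (not from the original slides) that simulates data from this model; the values of k, phi, the means, and the covariances are hypothetical, but the two-step sampling scheme is exactly the generative process above: draw a category index from the multinomial, then draw the observation from that category's Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture parameters (illustrative only):
# k = 3 two-dimensional Gaussian categories with frequencies phi.
phi = np.array([0.5, 0.3, 0.2])                          # category frequencies, sum to 1
mus = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])     # one mean per category
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 0.25])])

N = 500
# i ~ Multinom(phi): draw a category index for each observation
i = rng.choice(len(phi), size=N, p=phi)
# y ~ N(mu_i, Sigma_i): draw each observation from its category's Gaussian
y = np.array([rng.multivariate_normal(mus[c], Sigmas[c]) for c in i])
```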

32 Can we use maximum likelihood?
For observations y all known to come from the same k-dimensional Gaussian, the MLE for the Gaussian's parameters is
µ = (ȳ_1, ȳ_2, ..., ȳ_k)
Σ = [ Var(y_1)       Cov(y_1, y_2)   ...   Cov(y_1, y_k)
      Cov(y_1, y_2)  Var(y_2)        ...   Cov(y_2, y_k)
      ...            ...             ...   ...
      Cov(y_1, y_k)  Cov(y_2, y_k)   ...   Var(y_k) ]
where Var and Cov are the sample variance and covariance.
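
As a quick illustration (my own sketch, not from the slides), the same MLE can be computed directly from an (N, k) data matrix with numpy; note that the maximum-likelihood covariance divides by N rather than N - 1.

```python
import numpy as np

def gaussian_mle(y):
    """MLE of a single multivariate Gaussian from an (N, k) data matrix y."""
    mu_hat = y.mean(axis=0)                      # vector of sample means
    centered = y - mu_hat
    Sigma_hat = centered.T @ centered / len(y)   # ML covariance (divides by N)
    return mu_hat, Sigma_hat

# e.g. using the simulated mixture data from the earlier sketch:
# mu_hat, Sigma_hat = gaussian_mle(y)
```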

33 Can we use maximum likelihood?
So you might ask: why not use the method of maximum likelihood, searching through all the possible partitions of the data and choosing the partition that gives the highest data likelihood?

34 Can we use maximum likelihood?
The set of all partitions of our example data into two groups of 3 observations each:
[Figure: the candidate partitions]

35-40 Can we use maximum likelihood?
This looks like a daunting search task, but there is an even bigger problem.
Suppose I try a partition into 5,1...
[Figure: the ML fit for this partition; the singleton cluster's Gaussian collapses onto its single point, so the likelihood grows without bound]
More generally, for a V-dimensional problem you need at least V + 1 points in each partition.
But this constraint would prevent you from finding intuitive solutions to your problem!

41-44 Bayesian Mixture of Gaussians
[Graphical model: α → φ → i → y inside a plate of size n; µ and Σ inside a plate of size m]
i ∼ Multinom(φ)
y ∼ N(µ_i, Σ_i)
The Bayesian framework allows us to build in explicit assumptions about what constitutes a sensible category size.
Returning to our graphical model, we put in a prior on category size/shape.

45-46 Bayesian Mixture of Gaussians
[Graphical model: α → φ → i → y inside a plate of size n; µ and Σ inside a plate of size m]
i ∼ Multinom(φ)
y ∼ N(µ_i, Σ_i)
For now we will just leave category prior probabilities uniform: φ_1 = φ_2 = φ_3 = φ_4 = 1/4.
Here is a conjugate prior distribution for multivariate Gaussians:
Σ_i ∼ IW(Σ_0, ν)
µ_i | Σ_i ∼ N(µ_0, Σ_i / A)
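
A minimal sketch (my own, with hypothetical hyperparameter values Sigma0, nu, mu0, and A) of drawing one category's parameters from this conjugate Normal-Inverse-Wishart prior and then generating data from that category; scipy's invwishart is assumed here as the implementation of the IW(Σ_0, ν) draw.

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

rng = np.random.default_rng(1)

# Hypothetical hyperparameters (illustrative only)
d = 2                      # dimensionality of the observations
Sigma0 = np.eye(d)         # scale matrix of the Inverse Wishart prior
nu = d + 2                 # degrees of freedom (must exceed d - 1)
mu0 = np.zeros(d)          # prior mean for the category means
A = 1.0                    # scaling factor: mu_i | Sigma_i ~ N(mu0, Sigma_i / A)

# Sigma_i ~ IW(Sigma0, nu)
Sigma_i = invwishart(df=nu, scale=Sigma0).rvs(random_state=rng)
# mu_i | Sigma_i ~ N(mu0, Sigma_i / A)
mu_i = multivariate_normal(mean=mu0, cov=Sigma_i / A).rvs(random_state=rng)

# Data from this category: y ~ N(mu_i, Sigma_i)
y_cat = multivariate_normal(mean=mu_i, cov=Sigma_i).rvs(size=100, random_state=rng)
```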

47-50 The Inverse Wishart distribution
Perhaps the best way to understand the Inverse Wishart distribution is to look at samples from it.
Below I give samples for a fixed scale matrix Σ; here, k = 2 (top row) or k = 5 (bottom row).
[Figure: samples from the Inverse Wishart distribution]

51-59 Inference for Mixture of Gaussians using Gibbs Sampling
We still have not given a solution to the search problem.
One broadly applicable solution is Gibbs sampling. Simply put:
1. Randomly initialize cluster assignments.
2. On each iteration through the data, for each point:
   2.1 Forget the cluster assignment of the current point x_i.
   2.2 Compute the probability distribution over x_i's cluster assignment conditional on the rest of the partition:
       P(C_i | x_i, π_{-i}) = ∫_θ P(x_i | C_i, θ) P(C_i | θ) P(θ) dθ / Σ_j ∫_θ P(x_i | C_j, θ) P(C_j | θ) P(θ) dθ
   2.3 Randomly sample a cluster assignment for x_i from P(C_i | x_i, π_{-i}) and continue.
3. Do this for many iterations (e.g., until the unnormalized marginal data likelihood is high).
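
The sketch below (mine, not the slides' implementation) shows the shape of one Gibbs pass for a K-component Gaussian mixture. It is deliberately simplified: instead of integrating over θ as in step 2.2, it plugs in each cluster's current Gaussian fit (with a small ridge term to keep covariances well conditioned) and holds the category probabilities uniform, so the multinomial prior term is constant. The names gibbs_sweep, y, z, and K are all hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gibbs_sweep(y, z, K, rng, ridge=1e-3):
    """One simplified Gibbs pass over the data for a K-component Gaussian mixture.

    y: (N, d) data matrix; z: length-N integer array of current cluster assignments.
    """
    N, d = y.shape
    for n in range(N):
        z[n] = -1                                   # "forget" x_n's assignment
        log_p = np.full(K, -np.inf)
        for k in range(K):
            members = y[z == k]
            if len(members) == 0:                   # empty cluster: fall back to a global fit
                mu, Sigma = y.mean(axis=0), np.cov(y.T) + ridge * np.eye(d)
            else:
                mu = members.mean(axis=0)
                Sigma = np.cov(members.T, bias=True).reshape(d, d) + ridge * np.eye(d)
            # uniform phi: the prior term is the same for every cluster, so only the likelihood matters
            log_p[k] = multivariate_normal(mu, Sigma).logpdf(y[n])
        p = np.exp(log_p - log_p.max())
        p /= p.sum()
        z[n] = rng.choice(K, p=p)                   # resample x_n's cluster assignment
    return z

# Usage: random initialization, then many sweeps
# rng = np.random.default_rng(2)
# z = rng.integers(0, K, size=len(y))
# for it in range(200):
#     z = gibbs_sweep(y, z, K, rng)
```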

60 Inference for Mixture of Gaussians using Gibbs Sampling
Starting point for our problem:
[Figure: the example data with randomly initialized cluster assignments]

61 One pass of Gibbs sampling through the data

62 Results of Gibbs sampling with known category probabilities
Posterior modes of category structures:
[Figure: three panels: F1 versus F2, F1 versus Duration, F2 versus Duration]

63 Results of Gibbs sampling with known category probabilities
Confusion table of assignments of observations to categories:
[Tables: true vowel (e, E, i, I) by inferred cluster, unsupervised versus supervised]
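
If true labels are available alongside the sampled cluster assignments, a confusion table like this one can be cross-tabulated in a few lines. A small sketch (mine, not from the slides), assuming arrays true_labels and z of equal length:

```python
import numpy as np

def confusion_table(true_labels, z):
    """Cross-tabulate true category labels against inferred cluster indices."""
    categories = sorted(set(true_labels))
    clusters = sorted(set(z))
    table = np.zeros((len(categories), len(clusters)), dtype=int)
    for label, cluster in zip(true_labels, z):
        table[categories.index(label), clusters.index(cluster)] += 1
    return categories, clusters, table

# e.g. categories, clusters, table = confusion_table(vowel_labels, z)
```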

64 Extending the model to learning category probabilities
The multinomial extension of the beta distribution is the Dirichlet distribution, characterized by parameters α_1, ..., α_k, with density
D(π_1, ..., π_k) = (1/Z) π_1^(α_1 - 1) π_2^(α_2 - 1) ... π_k^(α_k - 1)
where the normalizing constant Z is
Z = Γ(α_1) Γ(α_2) ... Γ(α_k) / Γ(α_1 + α_2 + ... + α_k)
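
For intuition, Dirichlet draws and densities are available directly in numpy/scipy. A small sketch (my own, with an illustrative α) showing that a draw is a probability vector and that the density formula above matches scipy's:

```python
import numpy as np
from scipy.special import gamma
from scipy.stats import dirichlet

alpha = np.array([1.0, 2.0, 5.0])      # illustrative concentration parameters
rng = np.random.default_rng(3)
pi = rng.dirichlet(alpha)              # one draw: a probability vector summing to 1

# Density computed directly from the formula on the slide
Z = gamma(alpha).prod() / gamma(alpha.sum())
density = (pi ** (alpha - 1)).prod() / Z

print(pi, density, dirichlet(alpha).pdf(pi))   # the two density values agree
```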

65-68 Extending the model to learning category probabilities
So we set:
φ ∼ D(Σ_φ)
Combine this with the rest of the model:
Σ_i ∼ IW(Σ_0, ν)
µ_i | Σ_i ∼ N(µ_0, Σ_i / A)
i ∼ Multinom(φ)
y ∼ N(µ_i, Σ_i)
[Graphical model: nodes Σ_θ, θ, Σ_φ, φ, i, y (plate of size n), b (plate of size m), Σ_Σb, Σ_b]
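
With a Dirichlet prior on φ, a Gibbs sampler can also resample φ itself: by Dirichlet-multinomial conjugacy, φ given the current cluster assignments is again Dirichlet, with each concentration parameter incremented by that cluster's count. A minimal sketch (mine), where the symmetric concentration vector alpha_phi stands in for the slide's Σ_φ:

```python
import numpy as np

def resample_phi(z, K, alpha_phi, rng):
    """Gibbs update for category probabilities phi given assignments z.

    By conjugacy: phi | z ~ Dirichlet(alpha_phi + counts),
    where counts[k] is the number of observations currently in cluster k.
    """
    counts = np.bincount(z, minlength=K)
    return rng.dirichlet(alpha_phi + counts)

# e.g. with a symmetric prior standing in for Sigma_phi:
# rng = np.random.default_rng(4)
# phi = resample_phi(z, K=4, alpha_phi=np.ones(4), rng=rng)
```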

69 Having to learn category probabilities too makes the problem harder
[Figure: inferred category structures in three panels: F1 and F2, F1 and Duration, F2 and Duration]

70 Having to learn category probabilities too makes the problem harder
We can make the problem even more challenging by skewing the category probabilities:
Category  Probability
e         0.04
E         0.05
i         0.29
I         0.62

71 Having to learn category probabilities too makes the problem harder
[Figure: inferred category structures in three panels: F1 and F2, F1 and Duration, F2 and Duration]

72 Having to learn category probabilities too makes the problem harder
Confusion tables for these cases:
[Tables: true vowel (e, E, i, I) by inferred cluster, with versus without learning of category frequencies]

73-78 Summary
We can use the exact same models for unsupervised (latent-variable) learning as for hierarchical/mixed-effects regression!
However, category induction (category learning) presents additional difficulties:
Non-convexity of the objective function, which makes search difficult;
Degeneracy of maximum likelihood.
In general you need far more data, and/or additional information sources, to converge on good solutions.
Relevant references: tons! Read about MOGs for automated speech recognition in Jurafsky and Martin (2008, Chapter 9). See Vallabha et al. (2007) and Feldman et al. (2009) for earlier applications of MOGs to phonetic category learning.

79 References I
Feldman, N. H., Griffiths, T. L., and Morgan, J. L. (2009). Learning phonetic categories by learning a lexicon. In Proceedings of the 31st Annual Conference of the Cognitive Science Society. Cognitive Science Society, Austin, TX.
Jurafsky, D. and Martin, J. H. (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall, second edition.
Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., and Iverson, P. (2006). Infants show a facilitation effect for native language phonetic perception between 6 and 12 months. Developmental Science, 9(2):F13-F21.
Narayan, C. R., Werker, J. F., and Beddor, P. S. (2010). The interaction between acoustic salience and language experience in developmental speech perception: evidence from nasal place discrimination. Developmental Science, 13(3).

80 References II
Vallabha, G. K., McClelland, J. L., Pons, F., Werker, J. F., and Amano, S. (2007). Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences, 104(33).
Werker, J. F. and Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. Infant Behavior and Development, 7:49-63.
