BAYESIAN MACHINE LEARNING: THEORETICAL FOUNDATIONS. Frederic Pennerath


1 BAYESIAN MACHINE LEARNING: THEORETICAL FOUNDATIONS

2 Overview
1. Define a model family as a type of joint distribution P(X) = P(X_1, …, X_m). Digression on Bayesian Networks.
2. Predict / estimate outputs from a learnt model.
3. Learn / estimate a model distribution from data (x_1^(i), …, x_m^(i)), 1 ≤ i ≤ n: likelihood and Bayesian inference, conjugate priors, Bayes estimator, MAP, MLE, MLE of discrete distributions.

3 DEFINING A MODEL

4 Discriminative Models
Produce output samples from inputs: model parameters θ, input X, model P(Y = y | X = x, θ), output Y.
Example: Y = c_0 + c_1 X_1 + c_2 X_2 + ε with ε ~ N(0, σ_ε²), so that
P(Y | X_1 = x_1, X_2 = x_2)(y) = (1 / (σ_ε √(2π))) exp( −(y − c_0 − c_1 x_1 − c_2 x_2)² / (2 σ_ε²) ),
with X = (X_1, X_2)^T and θ = (c_0, c_1, c_2, σ_ε)^T.

5 Generative Models
Produce full samples: model parameters θ, model P(Z = z | θ), output Z.
Example: P(X_1, X_2, Y)(x_1, x_2, y) = (1 / ((2π)^(3/2) σ_1 σ_2 σ_ε)) exp(−(x_1 − m_1)²/(2σ_1²)) exp(−(x_2 − m_2)²/(2σ_2²)) exp(−(y − c_0 − c_1 x_1 − c_2 x_2)²/(2σ_ε²)),
with Z = (X_1, X_2, Y)^T and θ = (m_1, σ_1, m_2, σ_2, c_0, c_1, c_2, σ_ε)^T.
Generative models subsume discriminative models: P(Y | X, θ) = P(Y, X | θ) / P(X | θ), with P(X | θ) = ∫ P(X, Y = y | θ) dy.
Generative models are the most common in Bayesian Machine Learning and are described by Bayesian Networks.
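
To make the distinction concrete, here is a minimal Python sketch of the two model types; all parameter values (m1, s1, c0, ...) are made-up placeholders, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values (illustration only).
m1, s1 = 170.0, 10.0        # mean / std of X1
m2, s2 = 70.0, 15.0         # mean / std of X2
c0, c1, c2, s_eps = 1.0, 0.5, 0.2, 2.0

def sample_generative(n):
    """Generative model: sample the full vector Z = (X1, X2, Y)."""
    x1 = rng.normal(m1, s1, n)
    x2 = rng.normal(m2, s2, n)
    y = c0 + c1 * x1 + c2 * x2 + rng.normal(0.0, s_eps, n)
    return np.column_stack([x1, x2, y])

def sample_discriminative(x1, x2):
    """Discriminative model: only P(Y | X1, X2) is modelled; inputs are given."""
    return c0 + c1 * x1 + c2 * x2 + rng.normal(0.0, s_eps, size=np.shape(x1))

z = sample_generative(5)
y = sample_discriminative(z[:, 0], z[:, 1])
print(z, y)
```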

6 The curse of dimensionality: back to the university wrestling club example
Variable / Values / Meaning:
M: yes, no (member of the wrestling club)
H: 140, 145, …, 235, 240 (height, cm)
W: 40, 45, …, 135, 140 (weight, kg)
G: female, male (gender)
D: never, sometimes, often (sporty appearance: dress code, etc.)
Number of parameters? Number of students? Defining P(M | H, W, G, D) requires |H| × |W| × |G| × |D| parameters, i.e. several thousand, and at least an order of magnitude more students to estimate them.

7 Graphical Models / Bayesian Networks
A tool to specify a joint distribution P(X) = P(A, B, …, G). A two-level specification:
1. Directed acyclic graph (DAG) over vertices A, B, …, G: specifies P(X) as a product of factors and encodes independence relations.
2. DAG + Conditional Probability Tables (CPTs): specifies the numerical values of the factors and thus fully specifies P(X).

8 First level of a Bayesian Network: a factorization model of the joint distribution
P(A, …, G)(a, b, …, g) = P_A(a) · P_{B|A=a}(b) · P_C(c) · P_{D|B=b,C=c}(d) · P_{E|C=c}(e) · P_{F|B=b,D=d,E=e}(f) · P_{G|F=f}(g)
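
As a hedged illustration of how this first level is used computationally, the sketch below evaluates the joint probability of a full assignment as a product of CPT lookups; the CPT numbers are placeholders, not values from the course.

```python
# CPTs stored as dictionaries; all probability values are illustrative placeholders.
P_A = {0: 0.7, 1: 0.3}
P_C = {0: 0.5, 1: 0.5}
P_B = {(b, a): p for a in (0, 1) for b, p in zip((0, 1), (0.8, 0.2))}   # P(b | a)
P_E = {(e, c): p for c in (0, 1) for e, p in zip((0, 1), (0.6, 0.4))}   # P(e | c)
P_G = {(g, f): p for f in (0, 1) for g, p in zip((0, 1), (0.9, 0.1))}   # P(g | f)
P_D = {(d, b, c): 0.5 for b in (0, 1) for c in (0, 1) for d in (0, 1)}  # P(d | b, c)
P_F = {(f, b, d, e): 0.5 for b in (0, 1) for d in (0, 1)
       for e in (0, 1) for f in (0, 1)}                                 # P(f | b, d, e)

def joint(a, b, c, d, e, f, g):
    """P(a, ..., g) as the product of the seven factors of the DAG."""
    return (P_A[a] * P_B[(b, a)] * P_C[c] * P_D[(d, b, c)]
            * P_E[(e, c)] * P_F[(f, b, d, e)] * P_G[(g, f)])

print(joint(0, 1, 0, 1, 0, 1, 0))
```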

9 Second level of a Bayesian Network: conditional probability tables (CPTs)
(The slide attaches example CPTs such as P(A), P(B | A) and P(D | B, C) to the nodes of the DAG.)
P(A, …, G)(a, b, …, g) = P_A(a) · P_{B|A=a}(b) · P_C(c) · P_{D|B=b,C=c}(d) · P_{E|C=c}(e) · P_{F|B=b,D=d,E=e}(f) · P_{G|F=f}(g)

10 Bayesian Networks and Causality
Causality: event A is a causal variable for B if A and B are (strongly) dependent and A occurs before B. Real-world phenomena have causal models, and Bayesian Networks naturally formalize causal problems: the parents of a variable V are the immediate causes of V, and causality spreads from orphan (parentless) variables to childless ones. However, Bayesian Networks by themselves say nothing about causality, only about dependence.
(The slide illustrates this with a small DAG over Weather (clouds), Rain, Sun, Pest, Corn and Corn (last year).)

11 Rules for independence: an intuitive interpretation
(Illustrated on a DAG relating Weather (clouds), Rain, Sun, Corn (last year), Pest, Predators and Corn.)
Chain A → C → B. Rules: A and B are dependent, but independent given C.
Corn and Corn last year are dependent: P(corn | corn last year) < P(corn | no corn last year).
Corn and Corn last year are independent given Pest: P(corn | corn last year, no pest) = P(corn | no corn last year, no pest).
Common cause A ← C → B. Rules: A and B are dependent, independent given C, but still dependent given some other variable D off the path.
Corn and Predators are dependent: P(no corn | predators) > P(no corn | no predators).
Corn and Predators are independent given Pest: P(no corn | predators, pest) = P(no corn | pest).
Corn and Predators remain dependent given Corn last year: P(no corn | predators, corn last year) > P(no corn | no predators, corn last year).
Collider A → C ← B. Rules: A and B are independent, but dependent given C, and also dependent given a descendant D of C.
Sun and Pest are independent: P(dark | pest) = P(dark | no pest).
Sun and Pest are dependent given Corn: P(dark | no pest, no corn) > P(dark | pest, no corn).
Sun and Pest are dependent given Pest next year: P(dark | no pest, no pest next year) > P(dark | pest, no pest next year).

12 Rules for independence

13 Rules for independence

14 d-separation: a general theorem for independence
Definition of d-separation: X and Y are d-separated given Z if every path P from X to Y contains at least one of the blocking configurations:
a chain X … A → B → C … Y (or X … A ← B ← C … Y) such that B ∈ Z;
a fork X … A ← B → C … Y such that B ∈ Z;
a collider X … A → B ← C … Y such that neither B nor any descendant of B is in Z.
Theorem: for sets of variables 𝒳, 𝒴, 𝒵, the independence of 𝒳 and 𝒴 given 𝒵 holds in every distribution that factorizes over the DAG if and only if every X ∈ 𝒳 and every Y ∈ 𝒴 are d-separated given 𝒵.

15 d-separation : example

16 Example of modelling: back to the university wrestling club
Variable / Values / Meaning:
M: yes, no (member of the wrestling club)
H: 140, 145, …, 235, 240 (height, cm)
W: 40, 45, …, 135, 140 (weight, kg)
G: female, male (gender)
D: never, sometimes, often (sporty appearance: dress code, etc.)
S: yes, no (practices some sport), a hidden or latent variable

17 Example of modelling: Back to the university wrestling club

18 Bayesian Networks with continuous random variables
Given a continuous variable X whose parents Y_1, …, Y_k are all discrete: assume some parameterized distribution for X | Y_1, …, Y_k, e.g. H | G = g ~ N(μ_g, σ_g²), i.e. 4 parameters (μ_m, σ_m, μ_f, σ_f).
For continuous parents, introduce a parameterized dependency on them, e.g. W | G = g, H = h, M = m ~ N(μ0_{g,m} + μ1_{g,m}·h, σ0_{g,m} + σ1_{g,m}·h), i.e. 16 parameters μ0_{g,m}, μ1_{g,m}, σ0_{g,m}, σ1_{g,m} for g ∈ {m, f} and m ∈ {y, n}.
This reduces the number of parameters and the risk of overfitting: 4 + 16 parameters instead of thousands!
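
A minimal sketch of this conditional linear-Gaussian idea, keeping the same structure as the slide; every numeric parameter below is a made-up placeholder, not a value from the course.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical CPD parameters, one set per (gender, member) combination.
# H | G=g           ~ N(mu[g], sigma[g]^2)                      -> 4 parameters
# W | G=g, H=h, M=m ~ N(mu0[g,m] + mu1[g,m]*h,
#                        (s0[g,m] + s1[g,m]*h)^2)               -> 16 parameters
mu    = {"f": 165.0, "m": 178.0}
sigma = {"f": 7.0,   "m": 8.0}
mu0 = {("f", "y"): -40.0, ("f", "n"): -45.0, ("m", "y"): -35.0, ("m", "n"): -42.0}
mu1 = {("f", "y"): 0.60,  ("f", "n"): 0.55,  ("m", "y"): 0.68,  ("m", "n"): 0.62}
s0  = {k: 3.0 for k in mu0}
s1  = {k: 0.02 for k in mu0}

def sample_height_weight(g, m):
    """Sample (H, W) given gender g and membership m under the CLG model."""
    h = rng.normal(mu[g], sigma[g])
    w = rng.normal(mu0[g, m] + mu1[g, m] * h, s0[g, m] + s1[g, m] * h)
    return h, w

print(sample_height_weight("m", "y"))
```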

19 USING A MODEL or how to estimate the output of a model

20 Model output prediction
Inputs X → Model m (parameters θ) → Outputs Y.
Prediction: deduce the output distribution of Y from X and θ.
Weak Bayesian model: θ is known, e.g. the mean of Y is E[Y | X, θ] = ∫ y P(Y = y | X, θ) dy.
Strong Bayesian model: θ is uncertain and modelled by Θ ~ P_Θ(θ | κ), e.g. the mean of Y is E[Y | X, κ] = ∫∫ y P(Y = y | X, θ) P_Θ(θ | κ) dy dθ.
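
The strong Bayesian expectation integrates over θ; when that integral has no closed form it can be approximated by Monte Carlo sampling from P_Θ(θ | κ). A minimal sketch, with a toy prior chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def predictive_mean(sample_theta, mean_y_given_theta, n_samples=10_000):
    """E[Y | X, kappa] ~= (1/S) * sum_s E[Y | X, theta_s], theta_s ~ P(theta | kappa)."""
    thetas = (sample_theta() for _ in range(n_samples))
    return np.mean([mean_y_given_theta(t) for t in thetas])

# Toy instantiation: theta = mu with prior N(50, 5^2) and E[Y | theta] = theta.
est = predictive_mean(lambda: rng.normal(50.0, 5.0), lambda mu: mu)
print(est)   # close to 50
```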

21 A very simple example: defining the model
Requests → Server → Responses; processing time of the server T ~ N(μ, σ²).
Specs say μ ≈ μ0 ± σ0 and σ ≈ σT ± 0, with μ0 = 50 ms, σ0 = 5 ms, σT = 10 ms.
Generative model: no input; output: processing time T; parameters θ = (μ, σ). (A normally distributed processing time is not realistic. Why?)
Weak Bayesian model: θ = (μ0, σT).
Strong Bayesian model: P_Θ(κ) ∝ exp(−(μ − μ0)²/(2σ0²)) · δ(σ, σT), with hyperparameters κ = (μ0, σ0, σT).

22 A very simple example: applying the model
Weak Bayesian model: P(T | κ)(t) ∝ exp(−(t − μ0)²/(2σT²)).
Strong Bayesian model: P(T | κ)(t) ∝ exp(−(t − μ0)²/(2(σT² + σ0²))), i.e. the parameter uncertainty on μ widens the predictive distribution.
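
A quick numeric sketch of these two predictive densities using the hyperparameters of the previous slide (μ0 = 50 ms, σ0 = 5 ms, σT = 10 ms); the evaluation points are arbitrary.

```python
import numpy as np

mu0, sigma0, sigmaT = 50.0, 5.0, 10.0        # hyperparameters from the example (ms)

def normal_pdf(t, mu, sigma):
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

t = np.array([40.0, 50.0, 65.0])             # candidate processing times (ms)
weak   = normal_pdf(t, mu0, sigmaT)                           # theta fixed at (mu0, sigmaT)
strong = normal_pdf(t, mu0, np.sqrt(sigmaT**2 + sigma0**2))   # mu integrated out

print(weak)    # all uncertainty attributed to process noise
print(strong)  # parameter uncertainty on mu widens the predictive distribution
```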

23 LEARNING A MODEL or how to estimate the parameters of a model from the data

24 Model output prediction
Inputs X → Model m (parameters θ) → Outputs Y.
Bayesian prediction: deduce the output Y from X and P_Θ(θ | κ): P(Y | X, κ) = ∫ P(Y | X, θ) P_Θ(θ | κ) dθ.

25 Model estimation: the learning step
Inputs X → Model m (parameters θ) → Outputs Y.
Estimation: induce the parameters θ from data / observations (1) O = (o_i)_{1 ≤ i ≤ n}.
Replace P_Θ(θ | κ) by the distribution P_Θ(θ | κ, O), which updates the prediction: P(Y | X, κ, O) = ∫ P(Y | X, θ) P_Θ(θ | κ, O) dθ.
Bayesian inference: infer P(θ | κ, O) from P(θ | κ) and O. But how?
(1) o_i = (x_i, y_i) or o_i = y_i.

26 Bayesian estimation: the Bayes rule
Bayes rule (or theorem), after Thomas Bayes (1702-1761): given events A and B, P(A | B) P(B) = P(B | A) P(A), obvious by definition since both sides equal P(A ∩ B).
The heart of Bayesian inference: given a new observation O = o,
P(θ | o) = P(o | θ) P(θ) / P(O = o),
where P(o | θ) is the likelihood L_o(θ) of θ (not a distribution over θ; why?), P(θ) is the prior of θ (a distribution), P(O = o) is a normalization factor, and P(θ | o) is the posterior of θ (a distribution).

27 Bayesian estimation: fundamentally an online approach
Hypothesis of i.i.d. observations (valid if the studied system is stationary): P(O | θ)(o_1, …, o_k) = ∏_i P(O | θ)(o_i).
Given i.i.d. observations O = (o_1, o_2, …, o_k):
P(θ | O, κ) = P(O | θ) P(θ | κ) / P(O | κ) ∝ [∏_i P(O | θ)(o_i)] P(θ | κ) = [∏_i L_{o_i}(θ)] P(θ | κ).
The processing order of observations does not matter:
P(θ | O_1 ∪ O_2, κ) ∝ L_{O_1}(θ) L_{O_2}(θ) P(θ | κ) ∝ L_{O_2}(θ) P(θ | O_1, κ),
so observations can be processed in batch or online.

28 Example of Bayesian estimation: the processing-time server
Requests → Server → Responses. No input; output: processing time T; model θ = (μ, σ) with T | θ ~ N(μ, σ²).
1. Define a prior on Θ: P_Θ(θ | κ) = (1/(σ0 √(2π))) exp(−(μ − μ0)²/(2σ0²)) · δ(σ, σT).
2. Observe T = t and apply the Bayes rule:
P_Θ(θ | κ, T = t) = P(T = t | θ) P_Θ(θ | κ) / P(T = t | κ) = [1/(σT √(2π))] exp(−(t − μ)²/(2σT²)) · [1/(σ0 √(2π))] exp(−(μ − μ0)²/(2σ0²)) · δ(σ, σT) / P(T = t | κ).

29 Example of Bayesian estimation (continued)
3. Compute the posterior:
P(μ, σ | T = t) ∝ exp(−(t − μ)²/(2σT²)) exp(−(μ − μ0)²/(2σ0²)) δ(σ, σT) ∝ exp(−(μ − μ1)²/(2σ1²)) δ(σ, σT),
i.e. μ | T = t ~ N(μ1, σ1²) with
μ1 = (t/σT² + μ0/σ0²) / (1/σT² + 1/σ0²) and 1/σ1² = 1/σT² + 1/σ0².
E.g. if t = 56 ms then μ1 = 51.2 ms and σ1 = 4.47 ms. Why is μ1 so close to μ0?
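
A short sanity check of this update in Python, using the numbers from the slide:

```python
mu0, sigma0, sigmaT = 50.0, 5.0, 10.0
t = 56.0

prec = 1 / sigmaT**2 + 1 / sigma0**2              # posterior precision 1 / sigma1^2
mu1 = (t / sigmaT**2 + mu0 / sigma0**2) / prec    # posterior mean
sigma1 = prec ** -0.5                             # posterior standard deviation

print(mu1, sigma1)   # 51.2  4.47...
```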

30 Example of Bayesian estimation (continued)
4. Repeat the observation process (assuming the observations are i.i.d.):
μ | T = (t_1, …, t_n) ~ N( (Σ_i t_i/σT² + μ0/σ0²) / (n/σT² + 1/σ0²), 1/(n/σT² + 1/σ0²) ).
Comments: the posterior mean is a weighted mean of the observation average and the prior mean; the posterior standard deviation decreases slowly, as 1/√n; the initial prior matters: a σ0 that is too small slows down convergence, while a σ0 that is too high makes the initial guess μ0 useless.
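
A minimal sketch checking that the batch formula above and the one-observation-at-a-time update of the previous slides give the same posterior; the observation values are made up.

```python
import numpy as np

mu0, sigma0, sigmaT = 50.0, 5.0, 10.0
obs = np.array([56.0, 48.0, 53.0, 61.0])   # made-up processing times (ms)

# Batch formula from the slide.
prec_batch = len(obs) / sigmaT**2 + 1 / sigma0**2
mu_batch = (obs.sum() / sigmaT**2 + mu0 / sigma0**2) / prec_batch

# Online processing: update (mu, var) one observation at a time.
mu, var = mu0, sigma0**2
for t in obs:
    prec = 1 / sigmaT**2 + 1 / var
    mu = (t / sigmaT**2 + mu / var) / prec
    var = 1 / prec

print(mu_batch, mu)            # identical up to floating point
print(1 / prec_batch, var)     # posterior variances also match
```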

31 Choosing the prior P(θ)
Philosophically, the prior should reflect a priori knowledge: the less is known, the more spread out the distribution should be (uniform prior in the limit). In practice, choosing a good prior can speed up convergence. The prior on parameters introduces new parameters (like μ0, σ0) called hyperparameters. Choosing a prior is a compromise between two contradictory objectives: a representative but intractable prior, or a tractable but unrealistic prior (conjugate prior).

32 Tractability issues
Problem 1: enumerating all values of θ is impossible, so introduce a parameterized representation of P(θ | κ).
Problem 2: how to parameterize the posterior P(θ | κ, O) from P(θ | κ)? Either use approximate representations (sampling), or use closed forms for prior/posterior (conjugate priors).

33 Exact inference and the notion of conjugate prior
In general there is no simple expression for the posterior P(θ | O, κ) ∝ P(O | θ) P(θ | κ).
Special case of a conjugate prior: the prior and the posterior have analogous closed forms. Advantages: computation is fast, easy and exact. Limitation: it might not fit reality.
Example: likelihood Y ~ N(μ, σ²) with σ² known, parameter θ = μ; prior μ ~ N(μ0, σ0²) with hyperparameters κ = (μ0, σ0); posterior μ | O ~ N( (Σ_i y_i/σ² + μ0/σ0²) / (n/σ² + 1/σ0²), 1/(n/σ² + 1/σ0²) ).
(More likelihood/prior pairs exist; see standard tables of conjugate priors.)

34 Making Bayesian decisions
Prior P(θ | κ) + observations O → Bayesian inference → posterior P(θ | O, κ) → decision (using a loss L(θ̂ | θ)) → parameter values θ̂.
The output of Bayesian estimation is a posterior distribution P(θ | O, κ). Some problems require choosing a value θ̂ for real-time prediction, e.g. navigation systems. Choosing θ̂ instead of the real θ induces some cost or loss L(θ̂ | θ). NB: strictly speaking, this is no longer Bayesian.

35 Bayes Estimators
Prior P(θ) + observations O → Bayesian inference → posterior P(θ | O) → decision (using a loss L(θ̂, θ)) → parameter values θ̂.
Bayes estimator: select the θ̂ that minimizes the average posterior risk, i.e. θ̂ = argmin_{θ̂} E_{Θ | O, κ}[ L(θ̂, θ) ].
Maximum A Posteriori (MAP): the Bayes estimator under a uniform (0-1) loss; equivalently, select θ̂ as the most probable θ: θ̂ = argmax_θ P_{Θ | O, κ}(θ).
Maximum Likelihood (MLE): MAP with a uniform prior; equivalently, select θ̂ = argmax_θ P(O | θ, κ).

36 Maximum A Posteriori estimator (MAP)
Assume a uniform (0-1) loss L(θ̂, θ) = 1 − δ(θ̂ − θ), i.e. 0 if θ̂ = θ and a constant (1) otherwise. Then:
θ_MAP = argmin_{θ̂} E_{Θ|O,κ}[ L(θ̂, θ) ] = argmin_{θ̂} E_{Θ|O,κ}[ 1 − δ(θ̂ − θ) ] = argmax_{θ̂} E_{Θ|O,κ}[ δ(θ̂ − θ) ] = argmax_{θ̂} ∫ δ(θ̂ − θ) P_{Θ|O,κ}(θ) dθ = argmax_θ P_{Θ|O,κ}(θ) = argmax_θ P(O | θ) P(θ | κ) / P(O | κ),
hence θ_MAP = argmax_θ P(O | θ) P(θ | κ).

37 Maximum Likelihood Estimation (MLE)
Equivalent to MAP with a uniform prior: θ_MLE = argmax_θ P(O | θ) P(θ | κ) = argmax_θ P(O | θ).
It is often easier to work with the log-likelihood: θ_MLE = argmax_θ log P(O | θ). Advantages: products are transformed into sums, which also solves numerical precision problems (products of small probabilities).

38 i.i.d. observations
Common hypothesis: the observations Z = (z_1, …, z_n) are i.i.d.: identically distributed (stationary process/model) and independent (random sampling on the input space X, generally not true in real data).
Consequence for MAP/MLE: θ_MAP = argmax_θ P(θ) ∏_{i=1}^n L_{z_i}(θ) = argmax_θ [ log P(θ) + Σ_{i=1}^n log L_{z_i}(θ) ].

39 EXAMPLE OF WEAK AND STRONG BAYESIAN ESTIMATION FOR THE CATEGORICAL DISTRIBUTION MLE of categorical distribution Conjugate prior of categorical distribution

40 Computing the MLE of categorical CPTs
Problem: estimate the MLE of c = P(M = yes | G = g, H = h), i.e. solve θ_MLE = argmax_θ L_Z(θ) via ∂ log L_Z(θ) / ∂c = Σ_{i=1}^n ∂ log L_{z_i}(θ) / ∂c = 0.
Given data z_i = (g_i, h_i, m_i, w_i, d_i) and the factorization P(G, H, M, W, D) = P(G) P(H | G) P(M | G, H) P(W | G, H, M) P(D | G, M, W):
log L_{z_i}(θ) = log P(g_i) + log P(h_i | g_i) + log P(m_i | g_i, h_i) + log P(w_i | g_i, h_i, m_i) + log P(d_i | g_i, m_i, w_i),
so ∂ log L_{z_i}(θ) / ∂c = ∂ log P(m_i | g_i, h_i) / ∂c.

41 Computing the MLE of categorical CPTs (continued)
Three cases to distinguish:
(m_i, g_i, h_i) = (yes, g, h): ∂ log P(m_i | g_i, h_i)/∂c = ∂ log c/∂c = 1/c;
(m_i, g_i, h_i) = (no, g, h): ∂ log P(m_i | g_i, h_i)/∂c = ∂ log(1 − P(yes | g, h))/∂c = ∂ log(1 − c)/∂c = −1/(1 − c);
(g_i, h_i) ≠ (g, h): ∂ log P(m_i | g_i, h_i)/∂c = 0.
Hence ∂ log L_Z(θ)/∂c = (1/c) N(m_i = yes, g_i = g, h_i = h) − (1/(1 − c)) N(m_i = no, g_i = g, h_i = h) = 0.

42 Computing the MLE of categorical CPTs (continued)
Solving gives c = N(m_i = yes, g_i = g, h_i = h) / (N(m_i = yes, g_i = g, h_i = h) + N(m_i = no, g_i = g, h_i = h)), i.e.
c_MLE = N(m_i = yes, g_i = g, h_i = h) / N(g_i = g, h_i = h).
General result: the MLE of a categorical distribution amounts to computing frequencies of occurrences in the dataset. This generalizes to non-Boolean categorical variables (Lagrangian optimization).
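
A minimal sketch of this counting result; the toy dataset and the height binning below are invented for illustration.

```python
from collections import Counter

# Made-up observations (g_i, h_i, m_i); heights binned to the nearest 5 cm.
data = [("m", 180, "yes"), ("m", 180, "no"), ("m", 180, "yes"),
        ("f", 165, "no"),  ("m", 175, "no"), ("m", 180, "yes")]

joint = Counter((g, h, m) for g, h, m in data)   # N(m, g, h)
marg  = Counter((g, h) for g, h, _ in data)      # N(g, h)

def mle(g, h):
    """MLE of P(M = yes | G = g, H = h); division by zero if (g, h) was never observed."""
    return joint[(g, h, "yes")] / marg[(g, h)]

print(mle("m", 180))   # 3 / 4 = 0.75
```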

43 EXAMPLE OF WEAK AND STRONG BAYESIAN ESTIMATION FOR THE CATEGORICAL DISTRIBUTION MLE of categorical distribution Conjugate prior of categorical distribution

44 Dirichlet distribution: definition
Definition: the Dirichlet distribution Dir(α) of parameters α = (α_1, …, α_C), with α_i > 0 for all i, has density
P_α(p_1, …, p_C) = (1/B(α)) ∏_{i=1}^C p_i^{α_i − 1} if Σ_{i=1}^C p_i = 1, and 0 otherwise,
where B(α) = ∏_i Γ(α_i) / Γ(Σ_i α_i) and Γ(t) = ∫_0^{+∞} x^{t−1} e^{−x} dx.
Properties: samples of P_α are categorical distributions; the mode is (p_1, …, p_C) with p_i = (α_i − 1) / (Σ_{j=1}^C α_j − C); Dir(α) is the conjugate prior of the categorical / multinomial distribution.
(The slide shows density plots for several settings of (α_1, α_2, α_3).)

45 Dirichlet distribution: a conjugate prior of categorical distributions
Given: a variable Y with categorical distribution θ = (p_1, …, p_C), Σ_{i=1}^C p_i = 1; a Dirichlet prior P_Θ(θ | κ) = Dir(α) on θ; and n i.i.d. observations O = (y_1, …, y_n) of Y:
P_Θ(θ | O, κ) ∝ P(O | θ) P_Θ(θ | κ) ∝ [∏_j p_{y_j}] ∏_i p_i^{α_i − 1} = ∏_i p_i^{N_i} ∏_i p_i^{α_i − 1} = ∏_i p_i^{N_i + α_i − 1},
where N_i is the number of observations in category i. The posterior P_Θ(θ | O, κ) is therefore Dirichlet: Dir(N_1 + α_1, …, N_C + α_C).
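
A minimal sketch of this conjugate update, with made-up prior pseudo-counts and observations: the posterior parameters are simply the prior parameters plus the observed counts.

```python
import numpy as np

alpha = np.array([2.0, 2.0, 2.0])        # Dirichlet prior Dir(alpha) over 3 categories
obs = np.array([0, 2, 2, 1, 2, 0, 2])    # observed category indices y_1, ..., y_n

counts = np.bincount(obs, minlength=len(alpha))   # N_1, ..., N_C
alpha_post = alpha + counts                       # posterior is Dir(alpha + counts)

post_mean = alpha_post / alpha_post.sum()                            # E[p_i | O]
post_mode = (alpha_post - 1) / (alpha_post.sum() - len(alpha_post))  # MAP (all alpha_i > 1)

print(alpha_post, post_mean, post_mode)
```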

46 Posterior of a categorical CPT with a Dirichlet prior
Theorem: if the priors of the CPTs P(p_{X_j} | α_{X_j}) are independent, then the posteriors of the CPTs P(p_{X_j} | O, α_{X_j}) are independent:
P_Θ(θ | O, κ) ∝ L_Z(θ) P_Θ(θ | κ) = [∏_i L_{z_i}(θ)] P_Θ(θ | κ) = ∏_i ∏_j P(x_j^i | par(X_j)^i) · ∏_j P(p_{X_j} | α_{X_j}) ∝ ∏_j P(p_{X_j} | O, α_{X_j}).
Consequence: if the CPT P(X | Y_1 = v_1, …, Y_k = v_k) has prior Dir(α_1, …, α_C), its posterior is Dir(N(x = 1, y_1 = v_1, …, y_k = v_k) + α_1, …, N(x = C, y_1 = v_1, …, y_k = v_k) + α_C).

47 Computing the MAP of categorical CPTs
E.g. compute the MAP of c = P(M = yes | G = g, H = h) for g = man and h = 1.80 m:
c_MAP = argmax_c P(P_M(· | g, h) | O, α_M), the mode of Dir( N(m_i = yes, g_i = g, h_i = h) + α_{yes,g,h}, N(m_i = no, g_i = g, h_i = h) + α_{no,g,h} ),
i.e. c_MAP = ( N(m_i = yes, g_i = g, h_i = h) + α_{yes,g,h} − 1 ) / ( N(g_i = g, h_i = h) + α_{yes,g,h} + α_{no,g,h} − 2 ).
MAP thus amounts to introducing fake (pseudo-)examples (Laplace smoothing), and it solves the problem of the MLE being undefined (0/0) when no matching example exists.
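
A minimal sketch of this MAP formula; with α_yes = α_no = 2 it reduces to classic add-one (Laplace) smoothing, and the estimate stays defined even when no matching observation exists.

```python
def map_estimate(n_yes, n_no, a_yes=2.0, a_no=2.0):
    """MAP of P(M = yes | g, h) under a Dir(a_yes, a_no) prior on the CPT entry."""
    return (n_yes + a_yes - 1) / (n_yes + n_no + a_yes + a_no - 2)

print(map_estimate(3, 1))    # (3 + 1) / (4 + 2) = 0.666...
print(map_estimate(0, 0))    # 0.5: well defined even with no matching observations
```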

48 Bayesian estimation: a summary
Bayesian inference: produce a posterior from a prior and observations using the Bayes rule. In strong Bayesian models, the model parameters are themselves described by distributions.
The fully Bayesian approach often has to be relaxed, because of tractability/scalability issues or because the application requires choosing parameter values. The weak Bayesian approach then uses Bayesian estimation:
general case: the Bayes estimator;
with a uniform (0-1) loss: the Maximum A Posteriori estimator (MAP);
with, in addition, a uniform prior: the Maximum Likelihood Estimator (MLE).
