Day 1: Probability and speech perception


1 Day 1: Probability and speech perception 1

2 Day 2: Human sentence parsing 2

3 Day 3: Noisy-channel sentence processing?

4 Day 4: Language production & acquisition. [Diagram: child-directed utterances ("what's that", "the doggie", "yeah", "where's the doggie") related to a grammar/lexicon (abstract internal representation).]

5 Computational Psycholinguistics, Day 1. Klinton Bicknell and Roger Levy (Northwestern & UCSD), July 7.

7 Computational Psycholinguistics. Psycholinguistics deals with the problem of how humans (1) comprehend, (2) produce, and (3) acquire language. In this class, we will study these problems from a computational, and especially probabilistic/Bayesian, perspective.

8 Class goals:
- Introduce you to the technical foundations of modeling work in the field
- Overview the literature and major areas in which computational psycholinguistic research is carried out
- Acquaint you with some of the key models and their empirical support
- Give you experience in understanding the details of a model from the papers
- Give you practice in critical analysis of models

9 What is computational modeling? Why do we do it? Any phenomenon involving human behavior is so complex that we cannot hope to formulate a comprehensive theory Instead, we devise a model that simplifies the phenomenon to capture some key aspect of it 4/38

15 What might we use a model for? Models can serve any of the following (related) functions:
- Prediction: estimating the behavior/properties of a new state/datum on the basis of an existing dataset
- Hypothesis testing: a framework for determining whether a given factor has an appreciable influence on some other variable
- Data simulation: creating artificial data more cheaply and quickly than through empirical data collection
- Summarization: if phenomenon X is complex but relevant to phenomenon Y, it can be most effective to use a simple model of X when constructing a model of Y
- Insight: most generally, a good model can be explored in ways that give insight into the phenomenon under consideration

16 Feedback from you. Please take a moment to fill out a sheet of paper with this info: Name (optional); School & Program/Department; Year/stage in program; Computational Linguistics background; Psycholinguistics background; Probability/Statistics/Machine Learning background; Do you know about (weighted) finite-state automata?; Do you know about (probabilistic) context-free grammars?; Other courses you're taking at ESSLLI; (other side) What do you hope to learn in this class?

17 Today's content: foundations of probability theory; joint, marginal, and conditional probability; Bayes' Rule; Bayes nets (a.k.a. directed acyclic graphical models, DAGs); the Gaussian distribution; a probabilistic model of human phoneme categorization; a probabilistic model of the perceptual magnet effect.

20 Probability spaces. Traditionally, probability spaces are defined in terms of sets. An event E is a subset of a sample space Ω: E ⊆ Ω. A probability space P on a sample space Ω is a function from events E in Ω to real numbers such that the following three axioms hold:
1. P(E) ≥ 0 for all E ⊆ Ω (non-negativity).
2. If E₁ and E₂ are disjoint, then P(E₁ ∪ E₂) = P(E₁) + P(E₂) (disjoint union).
3. P(Ω) = 1 (properness).
We can also think of these things as involving logical rather than set relations:
Subset: A ⊆ B corresponds to A → B
Disjointness: E₁ ∩ E₂ = ∅ corresponds to ¬(E₁ ∧ E₂)
Union: E₁ ∪ E₂ corresponds to E₁ ∨ E₂

22 A simple example. In historical English, object NPs could appear both preverbally and postverbally: [VP Object Verb] or [VP Verb Object]. There is a broad cross-linguistic tendency for pronominal objects to occur earlier on average than non-pronominal objects. So, hypothetical probabilities from historical English, arranged as a joint table over word order X (Object Preverbal vs. Object Postverbal) and object pronominality Y (Pronoun vs. Not Pronoun); the numeric cell values were not preserved in this transcription. We will sometimes call this the joint distribution P(X, Y) over two random variables: here, verb-object word order X and object pronominality Y.

26 Checking the axioms of probability.
1. P(E) ≥ 0 for all E ⊆ Ω (non-negativity).
2. If E₁ and E₂ are disjoint, then P(E₁ ∪ E₂) = P(E₁) + P(E₂) (disjoint union).
3. P(Ω) = 1 (properness).
For the word-order example, we can consider the sample space to be Ω = {Preverbal+Pronoun, Preverbal+Not Pronoun, Postverbal+Pronoun, Postverbal+Not Pronoun}. Disjoint union tells us the probabilities of non-atomic events: if we define E₁ = {Preverbal+Pronoun, Postverbal+Not Pronoun}, then P(E₁) = P(Preverbal+Pronoun) + P(Postverbal+Not Pronoun). Check for properness: P(Ω), the sum of all four cell probabilities, equals 1.

28 Marginal probability. Sometimes we have a joint distribution P(X, Y) over random variables X and Y, but we're interested in the distribution implied over one of them (here, without loss of generality, X). The marginal probability distribution P(X) is
P(X = x) = Σ_y P(X = x, Y = y)

30 Marginal probability: an example. Using the joint distribution table above, finding the marginal distribution on X:
P(X = Preverbal) = P(X = Preverbal, Y = Pronoun) + P(X = Preverbal, Y = Not Pronoun)
P(X = Postverbal) = P(X = Postverbal, Y = Pronoun) + P(X = Postverbal, Y = Not Pronoun)
So the marginal distribution on X is P(X) over {Preverbal, Postverbal}, and likewise the marginal distribution on Y is P(Y) over {Pronoun, Not Pronoun}.
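To make the marginalization concrete, here is a minimal Python sketch that sums a joint table over one variable. The joint probabilities are invented placeholders (the slide's actual numbers were not preserved); only the mechanics matter.

```python
# Minimal sketch of marginalization from a joint distribution P(X, Y).
# The probabilities below are made-up placeholders, chosen only so the
# table sums to 1; they are not the values from the original slides.
joint = {
    ("Preverbal", "Pronoun"): 0.20,
    ("Preverbal", "NotPronoun"): 0.35,
    ("Postverbal", "Pronoun"): 0.10,
    ("Postverbal", "NotPronoun"): 0.35,
}

def marginal(joint, axis):
    """Sum the joint P(X, Y) over the other variable (axis 0 = X, axis 1 = Y)."""
    out = {}
    for outcome, p in joint.items():
        key = outcome[axis]
        out[key] = out.get(key, 0.0) + p
    return out

print(marginal(joint, 0))  # P(X): Preverbal ~0.55, Postverbal ~0.45
print(marginal(joint, 1))  # P(Y): Pronoun ~0.30, NotPronoun ~0.70
```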

31 Conditional probability. The conditional probability of event B given that A has occurred/is known is defined as follows:
P(B | A) ≜ P(A, B) / P(A)

35 Conditional probability: an example. Given the joint distribution P(X, Y) and the marginals P(X) and P(Y) above, how do we calculate P(Y = Pronoun | X = Postverbal)? By the definition:
P(Y = Pronoun | X = Postverbal) = P(X = Postverbal, Y = Pronoun) / P(X = Postverbal)

43 The chain rule. A joint probability can be rewritten as the product of marginal and conditional probabilities:
P(E₁, E₂) = P(E₂ | E₁) P(E₁)
And this generalizes to more than two variables:
P(E₁, E₂, E₃) = P(E₃ | E₁, E₂) P(E₂ | E₁) P(E₁)
...
P(E₁, E₂, ..., Eₙ) = P(Eₙ | E₁, E₂, ..., Eₙ₋₁) ... P(E₂ | E₁) P(E₁)
Breaking a joint probability down into the product of a marginal probability and several conditional probabilities this way is called chain rule decomposition.
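As a quick sanity check, the sketch below verifies the two-variable chain rule P(X, Y) = P(Y | X) P(X) on the same made-up joint table used in the earlier sketch.

```python
# Verify P(X=x, Y=y) == P(Y=y | X=x) * P(X=x) for every cell of the
# (made-up) joint table from the previous sketch.
joint = {
    ("Preverbal", "Pronoun"): 0.20,
    ("Preverbal", "NotPronoun"): 0.35,
    ("Postverbal", "Pronoun"): 0.10,
    ("Postverbal", "NotPronoun"): 0.35,
}

p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p          # marginal P(X)

for (x, y), p_xy in joint.items():
    p_y_given_x = p_xy / p_x[x]           # conditional P(Y=y | X=x)
    assert abs(p_y_given_x * p_x[x] - p_xy) < 1e-12
print("chain rule decomposition holds for every cell")
```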

50 Bayes' Rule (Bayes' Theorem):
P(A | B) = P(B | A) P(A) / P(B)
With extra background random variables I:
P(A | B, I) = P(B | A, I) P(A | I) / P(B | I)
This theorem follows directly from the definition of conditional probability:
P(A, B) = P(B | A) P(A)
P(A, B) = P(A | B) P(B)
So P(A | B) P(B) = P(B | A) P(A), and dividing both sides by P(B) gives
P(A | B) = P(B | A) P(A) / P(B)

51 Bayes' Rule, more closely inspected:
P(A | B) = P(B | A) P(A) / P(B)
where P(A | B) is the posterior, P(B | A) is the likelihood, P(A) is the prior, and P(B) is the normalizing constant.

53 Bayes' Rule in action. Let me give you the same information you had before: P(Y = Pronoun), P(X = Preverbal | Y = Pronoun), and P(X = Preverbal | Y = Not Pronoun). Imagine you're an incremental sentence processor. You have encountered a transitive verb but haven't encountered the object yet. Inference under uncertainty: how likely is it that the object is a pronoun?

60 Bayes' Rule in action. Given P(Y = Pronoun), P(X = Preverbal | Y = Pronoun), and P(X = Preverbal | Y = Not Pronoun):
P(Y = Pron | X = PostV) = P(X = PostV | Y = Pron) P(Y = Pron) / P(X = PostV)
                        = P(X = PostV | Y = Pron) P(Y = Pron) / Σ_y P(X = PostV, Y = y)
                        = P(X = PostV | Y = Pron) P(Y = Pron) / Σ_y P(X = PostV | Y = y) P(Y = y)
                        = P(X = PostV | Y = Pron) P(Y = Pron) / [P(PostV | Pron) P(Pron) + P(PostV | NotPron) P(NotPron)]
Plugging in the numbers from the table gives the posterior probability that the object is a pronoun.
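The same calculation in code, as a minimal sketch. The three input numbers are placeholders consistent with the made-up joint table above, not the values from the original slides.

```python
# Bayes' Rule with likelihoods and a prior, for P(Y = Pronoun | X = Postverbal).
p_pron = 0.30                     # prior P(Y = Pronoun), assumed placeholder
p_prev_given_pron = 2.0 / 3.0     # P(X = Preverbal | Y = Pronoun), assumed
p_prev_given_notpron = 0.50       # P(X = Preverbal | Y = Not Pronoun), assumed

# Likelihood of the observed datum (a postverbal object) under each hypothesis.
p_postv_given_pron = 1.0 - p_prev_given_pron
p_postv_given_notpron = 1.0 - p_prev_given_notpron

# Normalizing constant: marginal probability of a postverbal object.
p_postv = (p_postv_given_pron * p_pron
           + p_postv_given_notpron * (1.0 - p_pron))

posterior = p_postv_given_pron * p_pron / p_postv
print(f"P(Y = Pronoun | X = Postverbal) = {posterior:.3f}")  # ~0.222
```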

64 Other ways of writing Bayes' Rule. In P(A | B) = P(B | A) P(A) / P(B), the hardest part of using Bayes' Rule was calculating the normalizing constant P(B) (a.k.a. the partition function). Hence there are two other ways we often write Bayes' Rule:
1. Emphasizing explicit marginalization: P(A | B) = P(B | A) P(A) / Σ_a P(A = a, B)
2. Ignoring the partition function: P(A | B) ∝ P(B | A) P(A)
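The proportional form is how Bayes' Rule is usually used in practice: compute an unnormalized score (likelihood times prior) for every hypothesis, then divide by their sum. A sketch with the same placeholder numbers as above:

```python
# Unnormalized posterior scores, then normalization by the partition function.
# Priors and likelihoods are the same assumed placeholders as above.
hypotheses = {
    "Pronoun":    {"prior": 0.30, "lik_postverbal": 1.0 - 2.0 / 3.0},
    "NotPronoun": {"prior": 0.70, "lik_postverbal": 0.50},
}

unnormalized = {h: v["lik_postverbal"] * v["prior"] for h, v in hypotheses.items()}
z = sum(unnormalized.values())                 # the partition function
posterior = {h: score / z for h, score in unnormalized.items()}
print(posterior)   # {'Pronoun': ~0.222, 'NotPronoun': ~0.778}
```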

65 (Conditional) independence. Events A and B are said to be conditionally independent given information C if P(A, B | C) = P(A | C) P(B | C). Conditional independence of A and B given C is often written A ⊥ B | C.

69 Directed graphical models. A lot of the interesting joint probability distributions in the study of language involve conditional independencies among the variables. So next we'll introduce you to a general framework for specifying conditional independencies among collections of random variables. It won't allow us to express all possible independencies that may hold, but it goes a long way. And I hope that you'll agree that the framework is intuitive too!

77 A non-linguistic example. Imagine a factory that produces three types of coins in equal volumes: fair coins; 2-headed coins; 2-tailed coins. Generative process: the factory produces a coin of type X and sends it to you; you receive the coin and flip it twice, with H(eads)/T(ails) outcomes Y₁ and Y₂. Receiving a coin from the factory and flipping it twice is sampling (or taking a sample) from the joint distribution P(X, Y₁, Y₂).

82 This generative process as a Bayes net. The directed acyclic graphical model (DAG), or Bayes net: X → Y₁ and X → Y₂. Semantics of a Bayes net: the joint distribution can be expressed as the product of the conditional distributions of each variable given only its parents. In this DAG, P(X, Y₁, Y₂) = P(X) P(Y₁ | X) P(Y₂ | X), with:
P(X): Fair 1/3, 2-H 1/3, 2-T 1/3
P(Y₁ = H | X): Fair 1/2, 2-H 1, 2-T 0 (and P(Y₁ = T | X) is the complement)
P(Y₂ = H | X): Fair 1/2, 2-H 1, 2-T 0 (likewise)

87 Conditional independence in Bayes nets. Given the tables above: conditioned on not having any further information, are the two coin flips Y₁ and Y₂ in this generative process independent? That is, if C = {}, is it the case that Y₁ ⊥ Y₂ | C? No!
P(Y₂ = H) = 1/2 (you can see this by symmetry), but
P(Y₂ = H | Y₁ = H) = P(X = Fair | Y₁ = H)·(1/2) + P(X = 2-H | Y₁ = H)·1 = (1/3)·(1/2) + (2/3)·1 = 5/6
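The same point can be checked by brute-force enumeration of the joint P(X) P(Y₁|X) P(Y₂|X). This is a minimal sketch of the coin-factory model as specified above:

```python
from itertools import product

# Brute-force enumeration of the coin-factory Bayes net
# P(X, Y1, Y2) = P(X) P(Y1|X) P(Y2|X), to check the (in)dependence claims.
p_x = {"fair": 1 / 3, "2H": 1 / 3, "2T": 1 / 3}
p_heads_given_x = {"fair": 0.5, "2H": 1.0, "2T": 0.0}

def p_flip(x, y):
    """P(flip outcome y | coin type x)."""
    ph = p_heads_given_x[x]
    return ph if y == "H" else 1.0 - ph

joint = {(x, y1, y2): p_x[x] * p_flip(x, y1) * p_flip(x, y2)
         for x, y1, y2 in product(p_x, "HT", "HT")}

p_y2_heads = sum(p for (x, y1, y2), p in joint.items() if y2 == "H")
p_y1_heads = sum(p for (x, y1, y2), p in joint.items() if y1 == "H")
p_both = sum(p for (x, y1, y2), p in joint.items() if y1 == "H" and y2 == "H")

print(p_y2_heads)            # 0.5
print(p_both / p_y1_heads)   # 0.8333... = 5/6: the flips are *not* independent
```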

92 Formally assessing conditional independence in Bayes nets. The comprehensive criterion for assessing conditional independence is known as d-separation.
- A path between two disjoint node sets A and B is a sequence of edges connecting some node in A with some node in B.
- A node on a given path has converging arrows if two edges on the path connect to it and point to it.
- A node on the path has non-converging arrows if two edges on the path connect to it, but at least one does not point to it.
A third disjoint node set C d-separates A and B if for every path between A and B, either:
1. there is some node on the path with converging arrows which is not in C; or
2. there is some node on the path whose arrows do not converge and which is in C.

93 Major types of d-separation. C d-separates A and B if for every path between A and B, either (1) there is some node on the path with converging arrows which is not in C, or (2) there is some node on the path whose arrows do not converge and which is in C. [Figure: four example configurations: common-cause d-separation (A ← C → B, C observed); intervening d-separation (A → C → B, C observed); explaining away, i.e. no d-separation (A → C ← B, C observed); and d-separation in the absence of knowledge of C (A → C ← B, C unobserved).]

96 Back to our example: X → Y₁, X → Y₂. Without looking at the coin before flipping it, the outcome Y₁ of the first flip gives me information about the type of coin, and affects my beliefs about the outcome of Y₂. But if I look at the coin before flipping it, Y₁ and Y₂ are rendered independent.

102 An example of explaining away: "I saw an exhibition about the, uh..." There are several causes of disfluency, including: an upcoming word is difficult to produce (e.g., low frequency: astrolabe); the speaker's attention was distracted by something in the non-linguistic environment. A reasonable graphical model: W (hard word?) → D (disfluency?) ← A (attention distracted?).

105 An example of explaining away, continued. W: hard word? A: attention distracted? D: disfluency? Without knowledge of D, there's no reason to expect that W and A are correlated. But hearing a disfluency demands a cause. Knowing that there was a distraction explains away the disfluency, reducing the probability that the speaker was planning to utter a hard word.

110 An example of the disfluency model. W: hard word? A: attention distracted? D: disfluency? Let's suppose that both hard words and distractions are unusual, the latter more so: P(W = hard) = 0.25, P(A = distracted) = 0.15. Hard words and distractions both induce disfluencies; having both makes a disfluency really likely. The conditional probability table P(D | W, A) gives the disfluency probability for each of the four combinations (easy/hard × undistracted/distracted); its numeric entries were not preserved in this transcription. Suppose that we observe the speaker uttering a disfluency. What is P(W = hard | D = disfluent)? Now suppose we also learn that her attention is distracted. What does that do to our beliefs about W? That is, what is P(W = hard | D = disfluent, A = distracted)?

115 An example of the disfluency model. Fortunately, there is automated machinery to turn the Bayesian crank:
P(W = hard) = 0.25
P(W = hard | D = disfluent) = 0.57
P(W = hard | D = disfluent, A = distracted) = 0.40
Knowing that the speaker was distracted (A) decreased the probability that the speaker was about to utter a hard word (W): A explained D away. A caveat: the type of relationship among A, W, and D will depend on the values one finds in the probability tables P(W), P(A), and P(D | W, A)!
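One way to "turn the Bayesian crank" is exact inference by enumeration over the small Bayes net. The sketch below uses the slide's priors, but the P(D | W, A) entries are assumed placeholders (the original table was not preserved), so the first posterior comes out near, not exactly at, 0.57; the qualitative explaining-away pattern is the point.

```python
from itertools import product

# Exact inference by enumeration in the disfluency Bayes net
# P(W, A, D) = P(W) P(A) P(D | W, A).
p_w = {"hard": 0.25, "easy": 0.75}
p_a = {"distracted": 0.15, "undistracted": 0.85}
p_disfl = {  # assumed placeholder values for P(D = disfluent | W, A)
    ("easy", "undistracted"): 0.05,
    ("easy", "distracted"): 0.40,
    ("hard", "undistracted"): 0.40,
    ("hard", "distracted"): 0.80,
}

def posterior_hard(evidence):
    """P(W = hard | evidence), where evidence fixes D and optionally A."""
    num = den = 0.0
    for w, a in product(p_w, p_a):
        if "A" in evidence and a != evidence["A"]:
            continue
        d_prob = p_disfl[(w, a)] if evidence["D"] == "disfluent" else 1 - p_disfl[(w, a)]
        p = p_w[w] * p_a[a] * d_prob
        den += p
        if w == "hard":
            num += p
    return num / den

print(posterior_hard({"D": "disfluent"}))                     # ~0.60 (prior was 0.25)
print(posterior_hard({"D": "disfluent", "A": "distracted"}))  # 0.40: A explains D away
```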

116 Summary thus far. Key points: Bayes' Rule is a compelling framework for modeling inference under uncertainty; DAGs/Bayes nets are a broad class of models for specifying joint probability distributions with conditional independencies. Classic Bayes net references: Pearl (1988, 2000); Jordan (1998); Russell and Norvig (2003, Chapter 14); Bishop (2006, Chapter 8).

117 References I
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Jordan, M. I., editor (1998). Learning in Graphical Models. Cambridge, MA: MIT Press.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 2nd edition.
Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall, 2nd edition.

118 An example of the disfluency model: computing P(W = hard | D = disfluent, A = distracted). Abbreviating W = hard as hard, W = easy as easy, D = disfluent as disfl, A = distracted as distr, A = undistracted as undistr:
P(hard | disfl, distr) = P(disfl | hard, distr) P(hard | distr) / P(disfl | distr)   (Bayes' Rule)
                       = P(disfl | hard, distr) P(hard) / P(disfl | distr)           (independence of W and A, from the DAG)
P(disfl | distr) = Σ_w P(disfl | W = w, distr) P(W = w)                               (marginalization)
                 = P(disfl | hard, distr) P(hard) + P(disfl | easy, distr) P(easy)
Plugging in the table values gives P(hard | disfl, distr) = 0.40.

119 An example of the disfluency model: computing P(W = hard | D = disfluent).
P(hard | disfl) = P(disfl | hard) P(hard) / P(disfl)   (Bayes' Rule)
P(disfl | hard) = Σ_a P(disfl | A = a, hard) P(A = a | hard)
                = P(disfl | distr, hard) P(distr) + P(disfl | undistr, hard) P(undistr)
P(disfl | easy) = Σ_a P(disfl | A = a, easy) P(A = a | easy)
                = P(disfl | distr, easy) P(distr) + P(disfl | undistr, easy) P(undistr)
P(disfl) = Σ_w P(disfl | W = w) P(W = w) = P(disfl | hard) P(hard) + P(disfl | easy) P(easy)
Plugging in the table values gives P(hard | disfl) = 0.57.

120 Sound categorization: our first computational psycholinguistic problem. We hear an acoustic signal and must recover a sound category. Our example: distinguishing two similar sound categories, a voicing contrast between a pair of stops: /b/ vs. /p/ or /d/ vs. /t/.

121 Sound categorization. Voice onset time (VOT) is the primary cue distinguishing voiced and voiceless stops (Chen, 1980).

122 Sound categorization. [Figure: identification curve for /d/ vs. /t/ as a function of voice onset time, across stimulus and sentence contexts (Connine et al., 1991).] How do people do this?

123 Bayesian sound categorization. Generative model: c ~ discrete choice, e.g., p(p) = p(b) = 0.5; S | c ~ [some distribution]. Here c is the category and S the sound value. Bayesian inference:
p(c | S) = p(S | c) p(c) / p(S) = p(S | c) p(c) / Σ_{c'} p(S | c') p(c')
The prior p(c) is the probability of each category overall (the first step of the generative model); the likelihood p(S | c) is [some distribution].

124 Plan: some high-level considerations in building cognitive models; probability in continuous spaces and the Gaussian distribution; deriving and testing a probabilistic model of sound categorization; a closely related model of the perceptual magnet effect.

125 Marr's levels of analysis. Three levels of computational models (Marr, 1982): the computational level (what is the structure of the information-processing problem? what are the inputs and outputs? what information is relevant to solving the problem?); the algorithmic level (what representations and algorithms are used?); the implementational level (how are the representations and algorithms implemented neurally?). The levels are mutually constraining, and each is necessary to fully understand a system.

126 Rational analysis. How to perform rational analysis (Anderson, 1990). Background: organism behavior is optimized for common problems both by evolution and by learning. Step 1: specify a formal model of the problem to be solved and the agent's goals, making as few assumptions about computational limitations as possible. Step 2: derive optimal behavior given the problem and goals. Step 3: compare optimal behavior to agent behavior. Step 4: if predictions are off, revisit assumptions about limitations and iterate.

127 Bayesian sound categorization. Generative model: c ~ discrete choice, e.g., p(p) = p(b) = 0.5; S | c ~ [some distribution]. Here c is the category and S the sound (VOT). Bayesian inference: p(c | S) = p(S | c) p(c) / Σ_{c'} p(S | c') p(c'), with prior p(c) the probability of each category overall and likelihood p(S | c) [some distribution].

128 Continuous probability. We can't just assign every VOT outcome a probability: there are uncountably many possible outcomes (e.g., 60.1, 60.01, ...). Instead, we use a probability density function that assigns each outcome a non-negative density; actual probability is now an integral of the density function (area under the curve). Properness requires that ∫ p(x) dx = 1.

134 Continuous probability: a common continuous distribution is the Gaussian, a.k.a. the normal. [Figures: example Gaussian probability density functions plotted over F2 (Hz) and over VOT (ms).]

135 Gaussian parameters. Normal(μ, σ²) = N(μ, σ²) has two parameters. Most probability distributions are properly families of distributions, indexed by parameters: e.g., N(μ = 10, σ² = 10) vs. N(μ = 20, σ² = 5). Formal definition of the Gaussian probability density function:
p(x) = (1 / √(2πσ²)) exp[ −(x − μ)² / (2σ²) ]
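A direct transcription of that density into code, as a minimal sketch (a library routine such as scipy.stats.norm.pdf would do the same job):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x, straight from the formula above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(gaussian_pdf(60, 0, 400))    # ~0.0002, used in the /b/-/p/ example below
print(gaussian_pdf(60, 100, 400))  # ~0.0027
```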

136 Gaussian parameters: mean = expected value = expectation = μ. Formal definition: E(X) = ∫_{−∞}^{+∞} x p(x) dx. Intuitively, the center of mass (here: 0 and 50 for the two densities plotted).

137 Gaussian parameters: variance = Var = σ². Formal definition: Var(X) = E[(X − E(X))²]; an equivalent alternative definition is Var(X) = E[X²] − E[X]². Intuitively, how broadly outcomes are dispersed (here: 25 and 100 for the two densities plotted).
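A quick numerical check that the two variance definitions agree, on simulated draws from an arbitrarily chosen Gaussian (the parameters below are just for illustration):

```python
import random

random.seed(0)
xs = [random.gauss(50, 10) for _ in range(100_000)]   # draws from N(50, 100)

mean = sum(xs) / len(xs)
var_centered = sum((x - mean) ** 2 for x in xs) / len(xs)      # E[(X - E[X])^2]
var_shortcut = sum(x * x for x in xs) / len(xs) - mean ** 2    # E[X^2] - E[X]^2

print(mean, var_centered, var_shortcut)   # both variance estimates ~100
```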

138 Gaussian parameters: putting both parameters together. [Figure: Gaussian densities p(x) with various means and variances, e.g., μ = 0 with σ² = 1, 2, and 0.5, and μ = 2.]

139 Bayesian sound categorization: modeling ideal speech sound categorization. Which Gaussian category did the sound come from? [Figure: probability densities for /b/ and /p/ along the VOT dimension.]

140 Bayesian sound categorization. Generative model: c ~ discrete choice, e.g., p(p) = p(b) = 0.5; S | c ~ Gaussian(μ_c, σ²_c). Bayesian inference: p(c | S) = p(S | c) p(c) / p(S) = p(S | c) p(c) / Σ_{c'} p(S | c') p(c'), with prior p(c) the probability of each category overall and likelihood p(S | c) = Gaussian(μ_c, σ²_c).

141 Bayesian sound categorization: concrete parameters. c ~ discrete choice, p(p) = p(b) = 0.5; S | c ~ normal, with μ_b = 0, μ_p = 100, σ_b = σ_p = 20. Concrete example, for an observed VOT of 60 ms:
p(b | 60) = p(60 | b) p(b) / [p(60 | b) p(b) + p(60 | p) p(p)]
          = .0002(.5) / [.0002(.5) + .0027(.5)] ≈ .08
where each likelihood is the Gaussian density p(x) = (1/√(2πσ²)) exp[−(x − μ)²/(2σ²)] evaluated at x = 60.
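The same posterior computation in code, a minimal sketch reusing the Gaussian density from above with the parameter values stated on the slide:

```python
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def p_b_given_sound(s, prior_b=0.5, mu_b=0.0, mu_p=100.0, sigma=20.0):
    """Posterior p(/b/ | S = s) for two Gaussian categories with equal priors."""
    score_b = gaussian_pdf(s, mu_b, sigma) * prior_b
    score_p = gaussian_pdf(s, mu_p, sigma) * (1.0 - prior_b)
    return score_b / (score_b + score_p)

print(p_b_given_sound(60))   # ~0.08, i.e. a 60 ms VOT is very likely a /p/
```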

142 Bayesian sound categorization: the categorization function. Which (Gaussian) category did the sound come from? [Figures: the two category densities for /b/ and /p/ over VOT, and the resulting posterior probability of /b/ as a function of VOT.]

143 Bayesian sound categorization: the categorization function. The slope of the ideal categorization function changes with category variance. [Figures: [b] and [p] category densities and the posterior probability of /b/ over VOT for lower- vs. higher-variance categories.]

144 Bayesian sound categorization. Clayards et al. (2008) tested exactly this prediction: they trained participants with Gaussian categories of two variances, then tested categorization. [Figures: predicted posterior probability of /b/ and observed proportion of /b/ responses as a function of VOT.]

145 Bayesian sound categorization: wrapping up categorization. We assumed knowledge of the categories (which were Gaussian distributions), found the exact posterior probability that a sound belongs to each of two categories with a simple application of Bayes' rule, and confirmed the Bayesian model's prediction that the categorization function becomes less steep as category variance gets larger. Let's move on to a more complex situation.

146 Bayesian sound categorization. A more complex situation: the perceptual magnet effect. Empirical work by Kuhl and colleagues [Kuhl et al., 1992; Iverson & Kuhl, 1995]; the modeling work we discuss is from Feldman and colleagues [Feldman & Griffiths, 2007; Feldman et al., 2009].


148 Perceptual magnet effect. [Figure: /i/ and /ε/ stimuli in vowel space (Iverson & Kuhl, 1995).]

149 Perceptual magnet effect. [Figure: actual stimuli vs. perceived stimuli, with perceived stimuli drawn toward the category centers (Iverson & Kuhl, 1995).] To account for this, we need a new generative model for speech perception.

155 Speech perception. The speaker chooses a phonetic category c; the speaker articulates a target production T; noise in the speech signal intervenes; the listener hears a speech sound S. Inferring an acoustic value means computing p(T | S).

158 Statistical model. Choose a category c with probability p(c). Articulate a target production T with probability p(T | c) = N(μ_c, σ²_c). The listener hears speech sound S with probability p(S | T) = N(T, σ²_S).

162 Statistical model. [Diagram: phonetic category c → target production T, with T | c ~ N(μ_c, σ²_c); target production T → speech sound S via speech signal noise, with S | T ~ N(T, σ²_S).] The hypotheses h are the target productions T, with prior p(h) given by the phonetic category structure; the data d is the speech sound S, with likelihood p(d | h) given by the speech signal noise.

163 Bayes for speech perception. Listeners must infer the target production based on the speech sound they hear and their prior knowledge of phonetic categories. Data (d): speech sound S. Hypotheses (h): target productions T. Prior (p(h)): phonetic category structure p(T | c). Likelihood (p(d | h)): speech signal noise p(S | T). Then p(h | d) ∝ p(d | h) p(h).

166 Bayes for speech perception. [Figure: the prior (category distribution), the likelihood centered on the heard speech sound S, and the resulting posterior, which lies between them.] For a single category c, the posterior expectation of the target production is
E[T | S, c] = (σ²_c S + σ²_S μ_c) / (σ²_c + σ²_S)

167 Perceptual Warping

169 Multiple categories. We want to compute p(T | S); marginalize over categories:
p(T | S) = Σ_c p(T | S, c) p(c | S)
i.e., the single-category solution weighted by the probability of category membership.

174 Multiple categories. [Figure: two overlapping category distributions and a heard speech sound S.] Combining the single-category posterior means, weighted by category membership:
E[T | S] = Σ_c [(σ²_c S + σ²_S μ_c) / (σ²_c + σ²_S)] p(c | S)
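A compact sketch of this posterior-mean computation for two Gaussian categories. All parameter values below (category means, category variance, noise variance, equal priors) are illustrative assumptions, not the values from Feldman and colleagues.

```python
import math

# Perceptual-magnet posterior mean E[T | S] for a mixture of Gaussian categories.
categories = [
    {"prior": 0.5, "mu": 224.0, "var": 100.0},   # hypothetical /i/-like category
    {"prior": 0.5, "mu": 424.0, "var": 100.0},   # hypothetical /e/-like category
]
noise_var = 400.0                                # assumed speech-signal noise sigma_S^2

def expected_target(s):
    # p(S | c) = N(S; mu_c, var_c + noise_var), used to get p(c | S)
    weights = [c["prior"] * math.exp(-(s - c["mu"]) ** 2 / (2 * (c["var"] + noise_var)))
               / math.sqrt(2 * math.pi * (c["var"] + noise_var)) for c in categories]
    z = sum(weights)
    post_c = [w / z for w in weights]
    # E[T | S, c] = (var_c * S + noise_var * mu_c) / (var_c + noise_var)
    means = [(c["var"] * s + noise_var * c["mu"]) / (c["var"] + noise_var) for c in categories]
    return sum(p * m for p, m in zip(post_c, means))

for s in (250.0, 320.0, 400.0):
    print(s, expected_target(s))   # percepts are pulled toward the nearer category mean
```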

175 Perceptual Warping

176 Perceptual warping. To compare the model to humans, we use a 13-step continuum and estimate the perceptual distance between each adjacent pair of stimuli in humans and in the model.

177 Modeling the /i/-/e/ data. [Figure: relative distances between neighboring stimuli, MDS (human) vs. model; perceptual distance plotted against stimulus number.]

178 Bayesian sound categorization: conclusions. Continuous probability theory lets us build ideal models of speech perception. Part 1: we can build a principled model of categorization which fits human data well (e.g., categorization is less steep for high-variance categories). Part 2: we can predict how linguistic category structure warps perceptual space: speech sounds are perceived as being closer to the center of their likely category.


Part I. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Part I C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Probabilistic Graphical Models Graphical representation of a probabilistic model Each variable corresponds to a

More information

Lecture 15. Probabilistic Models on Graph

Lecture 15. Probabilistic Models on Graph Lecture 15. Probabilistic Models on Graph Prof. Alan Yuille Spring 2014 1 Introduction We discuss how to define probabilistic models that use richly structured probability distributions and describe how

More information

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018

Bayesian networks. Soleymani. CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Bayesian networks CE417: Introduction to Artificial Intelligence Sharif University of Technology Spring 2018 Soleymani Slides have been adopted from Klein and Abdeel, CS188, UC Berkeley. Outline Probability

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Probability Review. September 25, 2015

Probability Review. September 25, 2015 Probability Review September 25, 2015 We need a tool to 1) Formulate a model of some phenomenon. 2) Learn an instance of the model from data. 3) Use it to infer outputs from new inputs. Why Probability?

More information

MATH MW Elementary Probability Course Notes Part I: Models and Counting

MATH MW Elementary Probability Course Notes Part I: Models and Counting MATH 2030 3.00MW Elementary Probability Course Notes Part I: Models and Counting Tom Salisbury salt@yorku.ca York University Winter 2010 Introduction [Jan 5] Probability: the mathematics used for Statistics

More information

CS6220: DATA MINING TECHNIQUES

CS6220: DATA MINING TECHNIQUES CS6220: DATA MINING TECHNIQUES Matrix Data: Classification: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network

More information

Graphical Models - Part I

Graphical Models - Part I Graphical Models - Part I Oliver Schulte - CMPT 726 Bishop PRML Ch. 8, some slides from Russell and Norvig AIMA2e Outline Probabilistic Models Bayesian Networks Markov Random Fields Inference Outline Probabilistic

More information

A Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004

A Brief Introduction to Graphical Models. Presenter: Yijuan Lu November 12,2004 A Brief Introduction to Graphical Models Presenter: Yijuan Lu November 12,2004 References Introduction to Graphical Models, Kevin Murphy, Technical Report, May 2001 Learning in Graphical Models, Michael

More information

COMP9414: Artificial Intelligence Reasoning Under Uncertainty

COMP9414: Artificial Intelligence Reasoning Under Uncertainty COMP9414, Monday 16 April, 2012 Reasoning Under Uncertainty 2 COMP9414: Artificial Intelligence Reasoning Under Uncertainty Overview Problems with Logical Approach What Do the Numbers Mean? Wayne Wobcke

More information

Econ 325: Introduction to Empirical Economics

Econ 325: Introduction to Empirical Economics Econ 325: Introduction to Empirical Economics Lecture 2 Probability Copyright 2010 Pearson Education, Inc. Publishing as Prentice Hall Ch. 3-1 3.1 Definition Random Experiment a process leading to an uncertain

More information

Some Probability and Statistics

Some Probability and Statistics Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my

More information

Probabilistic Models in the Study of Language

Probabilistic Models in the Study of Language Probabilistic Models in the Study of Language Roger Levy November 6, 2012 Roger Levy Probabilistic Models in the Study of Language draft, November 6, 2012 ii Contents About the exercises ix 2 Univariate

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Introduction. Basic Probability and Bayes Volkan Cevher, Matthias Seeger Ecole Polytechnique Fédérale de Lausanne 26/9/2011 (EPFL) Graphical Models 26/9/2011 1 / 28 Outline

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Recall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network.

Recall from last time. Lecture 3: Conditional independence and graph structure. Example: A Bayesian (belief) network. ecall from last time Lecture 3: onditional independence and graph structure onditional independencies implied by a belief network Independence maps (I-maps) Factorization theorem The Bayes ball algorithm

More information

CMPSCI 240: Reasoning about Uncertainty

CMPSCI 240: Reasoning about Uncertainty CMPSCI 240: Reasoning about Uncertainty Lecture 17: Representing Joint PMFs and Bayesian Networks Andrew McGregor University of Massachusetts Last Compiled: April 7, 2017 Warm Up: Joint distributions Recall

More information

Lecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability

Lecture Notes 1 Basic Probability. Elements of Probability. Conditional probability. Sequential Calculation of Probability Lecture Notes 1 Basic Probability Set Theory Elements of Probability Conditional probability Sequential Calculation of Probability Total Probability and Bayes Rule Independence Counting EE 178/278A: Basic

More information

Machine Learning

Machine Learning Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,

More information

Review of Probabilities and Basic Statistics

Review of Probabilities and Basic Statistics Alex Smola Barnabas Poczos TA: Ina Fiterau 4 th year PhD student MLD Review of Probabilities and Basic Statistics 10-701 Recitations 1/25/2013 Recitation 1: Statistics Intro 1 Overview Introduction to

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 14: Bayes Nets 10/14/2008 Dan Klein UC Berkeley 1 1 Announcements Midterm 10/21! One page note sheet Review sessions Friday and Sunday (similar) OHs on

More information

Origins of Probability Theory

Origins of Probability Theory 1 16.584: INTRODUCTION Theory and Tools of Probability required to analyze and design systems subject to uncertain outcomes/unpredictability/randomness. Such systems more generally referred to as Experiments.

More information

Unit 1: Sequence Models

Unit 1: Sequence Models CS 562: Empirical Methods in Natural Language Processing Unit 1: Sequence Models Lecture 5: Probabilities and Estimations Lecture 6: Weighted Finite-State Machines Week 3 -- Sep 8 & 10, 2009 Liang Huang

More information

Introduction to Bayesian Statistics

Introduction to Bayesian Statistics School of Computing & Communication, UTS January, 207 Random variables Pre-university: A number is just a fixed value. When we talk about probabilities: When X is a continuous random variable, it has a

More information

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework

More information

CS4705. Probability Review and Naïve Bayes. Slides from Dragomir Radev

CS4705. Probability Review and Naïve Bayes. Slides from Dragomir Radev CS4705 Probability Review and Naïve Bayes Slides from Dragomir Radev Classification using a Generative Approach Previously on NLP discriminative models P C D here is a line with all the social media posts

More information

Our Status. We re done with Part I Search and Planning!

Our Status. We re done with Part I Search and Planning! Probability [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.] Our Status We re done with Part

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation

Announcements. CS 188: Artificial Intelligence Spring Probability recap. Outline. Bayes Nets: Big Picture. Graphical Model Notation CS 188: Artificial Intelligence Spring 2010 Lecture 15: Bayes Nets II Independence 3/9/2010 Pieter Abbeel UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell, Andrew Moore Current

More information

Generative Techniques: Bayes Rule and the Axioms of Probability

Generative Techniques: Bayes Rule and the Axioms of Probability Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2016/2017 Lesson 8 3 March 2017 Generative Techniques: Bayes Rule and the Axioms of Probability Generative

More information

Sample Space: Specify all possible outcomes from an experiment. Event: Specify a particular outcome or combination of outcomes.

Sample Space: Specify all possible outcomes from an experiment. Event: Specify a particular outcome or combination of outcomes. Chapter 2 Introduction to Probability 2.1 Probability Model Probability concerns about the chance of observing certain outcome resulting from an experiment. However, since chance is an abstraction of something

More information

Belief Update in CLG Bayesian Networks With Lazy Propagation

Belief Update in CLG Bayesian Networks With Lazy Propagation Belief Update in CLG Bayesian Networks With Lazy Propagation Anders L Madsen HUGIN Expert A/S Gasværksvej 5 9000 Aalborg, Denmark Anders.L.Madsen@hugin.com Abstract In recent years Bayesian networks (BNs)

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 16: Bayes Nets IV Inference 3/28/2011 Pieter Abbeel UC Berkeley Many slides over this course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements

More information

Introduction to Bayesian Learning

Introduction to Bayesian Learning Course Information Introduction Introduction to Bayesian Learning Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Apprendimento Automatico: Fondamenti - A.A. 2016/2017 Outline

More information

Human-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015

Human-Oriented Robotics. Probability Refresher. Kai Arras Social Robotics Lab, University of Freiburg Winter term 2014/2015 Probability Refresher Kai Arras, University of Freiburg Winter term 2014/2015 Probability Refresher Introduction to Probability Random variables Joint distribution Marginalization Conditional probability

More information

Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks.

Axioms of Probability? Notation. Bayesian Networks. Bayesian Networks. Today we ll introduce Bayesian Networks. Bayesian Networks Today we ll introduce Bayesian Networks. This material is covered in chapters 13 and 14. Chapter 13 gives basic background on probability and Chapter 14 talks about Bayesian Networks.

More information

Classification & Information Theory Lecture #8

Classification & Information Theory Lecture #8 Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing

More information

Lecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan

Lecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan Lecture 9: Naive Bayes, SVM, Kernels Instructor: Outline 1 Probability basics 2 Probabilistic Interpretation of Classification 3 Bayesian Classifiers, Naive Bayes 4 Support Vector Machines Probability

More information

Probability Review. Chao Lan

Probability Review. Chao Lan Probability Review Chao Lan Let s start with a single random variable Random Experiment A random experiment has three elements 1. sample space Ω: set of all possible outcomes e.g.,ω={1,2,3,4,5,6} 2. event

More information

Probability, Entropy, and Inference / More About Inference

Probability, Entropy, and Inference / More About Inference Probability, Entropy, and Inference / More About Inference Mário S. Alvim (msalvim@dcc.ufmg.br) Information Theory DCC-UFMG (2018/02) Mário S. Alvim (msalvim@dcc.ufmg.br) Probability, Entropy, and Inference

More information

NPFL108 Bayesian inference. Introduction. Filip Jurčíček. Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic

NPFL108 Bayesian inference. Introduction. Filip Jurčíček. Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic NPFL108 Bayesian inference Introduction Filip Jurčíček Institute of Formal and Applied Linguistics Charles University in Prague Czech Republic Home page: http://ufal.mff.cuni.cz/~jurcicek Version: 21/02/2014

More information

Machine Recognition of Sounds in Mixtures

Machine Recognition of Sounds in Mixtures Machine Recognition of Sounds in Mixtures Outline 1 2 3 4 Computational Auditory Scene Analysis Speech Recognition as Source Formation Sound Fragment Decoding Results & Conclusions Dan Ellis

More information

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27 Probability Review Yutian Li Stanford University January 18, 2018 Yutian Li (Stanford University) Probability Review January 18, 2018 1 / 27 Outline 1 Elements of probability 2 Random variables 3 Multiple

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Learning in Bayesian Networks

Learning in Bayesian Networks Learning in Bayesian Networks Florian Markowetz Max-Planck-Institute for Molecular Genetics Computational Molecular Biology Berlin Berlin: 20.06.2002 1 Overview 1. Bayesian Networks Stochastic Networks

More information

CS 188: Artificial Intelligence Spring Announcements

CS 188: Artificial Intelligence Spring Announcements CS 188: Artificial Intelligence Spring 2011 Lecture 14: Bayes Nets II Independence 3/9/2011 Pieter Abbeel UC Berkeley Many slides over the course adapted from Dan Klein, Stuart Russell, Andrew Moore Announcements

More information

Probabilistic Reasoning. (Mostly using Bayesian Networks)

Probabilistic Reasoning. (Mostly using Bayesian Networks) Probabilistic Reasoning (Mostly using Bayesian Networks) Introduction: Why probabilistic reasoning? The world is not deterministic. (Usually because information is limited.) Ways of coping with uncertainty

More information

Probabilistic Graphical Models (I)

Probabilistic Graphical Models (I) Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random

More information

CSE 473: Artificial Intelligence Autumn 2011

CSE 473: Artificial Intelligence Autumn 2011 CSE 473: Artificial Intelligence Autumn 2011 Bayesian Networks Luke Zettlemoyer Many slides over the course adapted from either Dan Klein, Stuart Russell or Andrew Moore 1 Outline Probabilistic models

More information

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty

Lecture 10: Introduction to reasoning under uncertainty. Uncertainty Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,

More information

Course Introduction. Probabilistic Modelling and Reasoning. Relationships between courses. Dealing with Uncertainty. Chris Williams.

Course Introduction. Probabilistic Modelling and Reasoning. Relationships between courses. Dealing with Uncertainty. Chris Williams. Course Introduction Probabilistic Modelling and Reasoning Chris Williams School of Informatics, University of Edinburgh September 2008 Welcome Administration Handout Books Assignments Tutorials Course

More information

Probability theory. References:

Probability theory. References: Reasoning Under Uncertainty References: Probability theory Mathematical methods in artificial intelligence, Bender, Chapter 7. Expert systems: Principles and programming, g, Giarratano and Riley, pag.

More information

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Probability. CS 3793/5233 Artificial Intelligence Probability 1

Probability. CS 3793/5233 Artificial Intelligence Probability 1 CS 3793/5233 Artificial Intelligence 1 Motivation Motivation Random Variables Semantics Dice Example Joint Dist. Ex. Axioms Agents don t have complete knowledge about the world. Agents need to make decisions

More information

Overview of Probability. Mark Schmidt September 12, 2017

Overview of Probability. Mark Schmidt September 12, 2017 Overview of Probability Mark Schmidt September 12, 2017 Dungeons & Dragons scenario: You roll dice 1: Practical Application Roll or you sneak past monster. Otherwise, you are eaten. If you survive, you

More information