Lecture 8: Inference. Kai-Wei Chang University of Virginia

1 Lecture 8: Inference Kai-Wei Chang University of Virginia kw@kwchang.net Some slides are adapted from Vivek Srikumar's course on Structured Prediction Advanced ML: Inference 1

2 So far what we learned v Thinking about structures v A graph, a collection of parts that are labeled jointly, a collection of decisions [figure: two example graphs over nodes A-G] v Next: Prediction v Sets structured prediction apart from binary/multiclass Advanced ML: Inference 2

3 The bigger picture v The goal of structured prediction: Predicting a graph v Modeling: Defining probability distributions over the random variables v Involves making independence assumptions v Inference: The computational step that actually constructs the output v Also called decoding v Learning creates functions that score predictions (e.g., learning model parameters) Advanced ML: Inference 3

4 Computational issues Data annotation difficulty Model definition What are the parts of the output? What are the inter-dependencies? How to train the model? Background knowledge about domain How to do inference? Semisupervised/indirectly supervised? Inference: deriving the probability of one or more random variables based on the model Advanced ML: Inference 4

5 Inference v What is inference? v An overview of what we have seen before v Combinatorial optimization v Different views of inference v Integer programming v Graph algorithms v Sum-product, max-sum v Heuristics for inference v LP relaxation v Sampling Advanced ML: Inference 5

6 Remember sequence prediction v Goal: Find the most probable/highest scoring state sequence v argmax y score(y) = argmax y w^T φ(x, y) v Computationally: discrete optimization v The naïve algorithm v Enumerate all sequences, score each one and pick the max v Terrible idea! v We can do better v Scores decomposed over edges Advanced ML: Inference 6

7 The Viterbi algorithm: Recurrence Goal: Find argmax y w^T φ(x, y), y = (y 1, y 2,..., y n ) y 1 y 2 y 3... y n Advanced ML: Inference 7

9 The Viterbi algorithm: Recurrence Goal: Find argmax y w^T φ(x, y), y = (y 1, y 2,..., y n ) y 1 y 2 y 3... y n Score n (y n ) = max over y n-1 of [Score n-1 (y n-1 ) + score_local n (y n-1, y n )] Idea 1. If I know the score of all sequences y 1 to y n-1, then I could decide y n easily 2. Recurse to get score up to y n-1 Advanced ML: Inference 9
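A minimal Python sketch of this recurrence, under the assumption that the decomposed edge scores are exposed through a function local_score(i, y_prev, y_curr); the function name and dense-array bookkeeping are illustrative, not from the slides.

```python
import numpy as np

def viterbi(local_score, n_positions, n_labels):
    """Find argmax over label sequences of the sum of decomposed edge scores.

    local_score(i, y_prev, y_curr): score of assigning label y_curr at position i
    given label y_prev at position i-1 (y_prev is None at the first position).
    """
    score = np.full((n_positions, n_labels), -np.inf)
    back = np.zeros((n_positions, n_labels), dtype=int)

    for y in range(n_labels):                      # base case: first position
        score[0, y] = local_score(0, None, y)

    for i in range(1, n_positions):                # the recurrence from the slide
        for y in range(n_labels):
            cand = [score[i - 1, yp] + local_score(i, yp, y) for yp in range(n_labels)]
            back[i, y] = int(np.argmax(cand))
            score[i, y] = cand[back[i, y]]

    y_hat = [int(np.argmax(score[-1]))]            # follow back pointers
    for i in range(n_positions - 1, 0, -1):
        y_hat.append(int(back[i, y_hat[-1]]))
    return list(reversed(y_hat)), float(score[-1].max())
```

The double loop over positions and labels is what replaces the exponential enumeration: each position only needs the best scores of its predecessor.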

10 Inference questions v This class: v Mostly we use inference to mean What is the highest scoring assignment to the output random variables for a given input? v Maximum A Posteriori (MAP) inference (if the score is probabilistic) v Other inference questions v What is the highest scoring assignment to some of the output variables given the input? v Sample from the posterior distribution over the Y v Loss-augmented inference: Which structure most violates the margin for a given scoring function? v Computing marginal probabilities over Y Advanced ML: Inference 10

11 MAP inference v A combinatorial optimization problem v Computational complexity depends on v The size of the input v The factorization of the scores v More complex factors generally lead to expensive inference v A generally bad strategy in all but the simplest cases: Enumerate all possible structures and pick the highest scoring one Advanced ML: Inference 11

13 MAP inference is discrete optimization v A combinatorial optimization problem: argmax y w^T φ(x, y) v Computational complexity depends on v The size of the input v The factorization of the scores v More complex factors generally lead to expensive inference v A generally bad strategy in all but the simplest cases: Enumerate all possible structures and pick the highest scoring one Advanced ML: Inference 13

15 MAP inference is search v We want the graph with the highest scoring structure: argmax y w^T φ(x, y) v Without assumptions, no algorithm can find the max without considering every possible structure v How can we solve this computational problem? v Exploit the structure of the search space and the cost function v That is, exploit decomposition of the scoring function v Usually stronger assumptions lead to easier inference v E.g., consider 10 independent random variables Advanced ML: Inference 15

16 Approaches for inference v Exact vs. approximate inference v Should the maximization be performed exactly? v Or is a close-to-highest-scoring structure good enough? v Exact: Search, dynamic programming, integer linear programming,... v Heuristic (called approximate inference): Gibbs sampling, belief propagation, beam search, linear programming relaxations,... v Randomized vs. deterministic v Relevant for approximate inference: If I run the inference program twice, will I get the same answer? Advanced ML: Inference 16

17 Coming up v Formulating general inference as integer linear programs v And variants of this idea v Graph algorithms, dynamic programming, greedy search v We have seen the Viterbi algorithm, which uses a cleverly defined ordering to decompose the output into a sequence of decisions; we will talk about a general algorithm -- max-product, sum-product v Heuristics for inference v Sampling, Gibbs Sampling v Approximate graph search, beam search v LP-relaxation Advanced ML: Inference 17

18 Inference: Integer Linear Programs Advanced ML: Inference 18

19 The big picture v MAP Inference is combinatorial optimization v Combinatorial optimization problems can be written as integer linear programs (ILP) v The conversion is not always trivial v Allows injection of knowledge in the form of constraints v Different ways of solving ILPs v Commercial solvers: CPLEX, Gurobi, etc. v Specialized solvers if you know something about your problem v Lagrangian relaxation, amortized inference, etc. v Can relax to a linear program and hope for the best v Integer linear programs are NP hard in general v No free lunch Advanced ML: Inference 19

20 Detour: Linear programming v Minimizing a linear objective function subject to a finite number of linear constraints (equality or inequality) v Very widely applicable v Operations research, micro-economics, management v Historical note v Developed during World War II to reduce army costs v Programming here is not the same as computer programming Advanced ML: Inference 20

21 Example: The diet problem A student wants to spend as little money on food as possible while getting a sufficient amount of vitamin Z and nutrient X. Her options are: [table: cost per 100g, vitamin Z content, and nutrient X content for carrots, sunflower seeds, and a double cheeseburger] How should she spend her money to get at least 5 units of vitamin Z and 3 units of nutrient X? Advanced ML: Inference 21

22 Example: The diet problem A student wants to spend as little money on food as possible while getting a sufficient amount of vitamin Z and nutrient X. Her options are: [table: cost per 100g, vitamin Z content, and nutrient X content for carrots, sunflower seeds, and a double cheeseburger] How should she spend her money to get at least 5 units of vitamin Z and 3 units of nutrient X? Let c, s and d denote how much of each item is purchased. Minimize the total cost, subject to: at least 5 units of vitamin Z, at least 3 units of nutrient X, and the number of units purchased is not negative. Advanced ML: Inference 22
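As a concrete sketch, the diet LP can be handed to an off-the-shelf solver such as scipy.optimize.linprog. The cost and nutrient numbers below are made up purely for illustration, since the slide's table values are not reproduced in this transcription.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical numbers: cost per 100g, vitamin Z per 100g, nutrient X per 100g
# for carrots, sunflower seeds, and a double cheeseburger (in that order).
cost       = np.array([0.5, 1.0, 2.0])   # objective: minimize cost . [c, s, d]
vitamin_z  = np.array([2.0, 1.0, 0.5])
nutrient_x = np.array([0.5, 2.0, 1.0])

# linprog expects A_ub @ x <= b_ub, so ">= 5" becomes "-vitamin_z . x <= -5".
A_ub = -np.vstack([vitamin_z, nutrient_x])
b_ub = -np.array([5.0, 3.0])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None)] * 3,      # amounts purchased are non-negative
              method="highs")
print(res.x, res.fun)                      # optimal amounts and the minimum cost
```

The same three ingredients appear in any LP: a linear objective, linear inequality constraints, and variable bounds.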

27 Linear programming v In general: minimize c^T x subject to Ax ≤ b (a linear objective, linear constraints) Advanced ML: Inference 27

28 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v For example: [figure: the coordinate axes x 1, x 2, x 3 ] Advanced ML: Inference 28

29 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v For example: [figure: the plane a 1 x 1 + a 2 x 2 + a 3 x 3 = b intersecting the positive orthant of (x 1, x 2, x 3 )] Advanced ML: Inference 29

30 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v For example: Suppose we had to maximize any c^T x on this region [figure: the triangular region cut out by a 1 x 1 + a 2 x 2 + a 3 x 3 = b, x ≥ 0] Advanced ML: Inference 30

31 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v For example: Suppose we had to maximize any c^T x on this region [figure: the same region with the cost direction c drawn as an arrow] Advanced ML: Inference 31

35 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v For example: Suppose we had to maximize any c^T x on this region These three vertices are the only possible solutions! [figure: the triangular region a 1 x 1 + a 2 x 2 + a 3 x 3 = b with its three vertices highlighted] Advanced ML: Inference 35

36 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v The constraint matrix defines a polytope v Only the vertices or faces of the polytope can be solutions Advanced ML: Inference 36

37 Geometry of linear programming The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed) Advanced ML: Inference 37

38 Geometry of linear programming The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed) The objective defines a cost for every point in the space Darker is higher Advanced ML: Inference 38

39 Geometry of linear programming The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed) The objective defines a cost for every point in the space Advanced ML: Inference 39

40 Geometry of linear programming The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed) The objective defines a cost for every point in the space Even though all points in the region are allowed, the vertices maximize/minimize the cost Advanced ML: Inference 40

41 Linear programming v In general v This is a continuous optimization problem v And yet, there is only a finite set of possible solutions v The constraint matrix defines a polytope v Only the vertices or faces of the polytope can be solutions v Linear programs can be solved in polynomial time Questions? Advanced ML: Inference 41

42 Integer linear programming v In general: maximize c^T x subject to Ax ≤ b, with x restricted to integer values Advanced ML: Inference 42

43 Geometry of integer linear programming The constraint matrix defines a polytope that contains the allowed solutions (possibly not closed) The objective defines a cost for every point in the space Only integer points are allowed Advanced ML: Inference 43

44 Integer linear programming v In general v Solving integer linear programs in general can be NP-hard! v LP-relaxation: Drop the integer constraints and hope for the best Advanced ML: Inference 44

45 0-1 integer linear programming v In general: maximize c^T x subject to Ax ≤ b, with each x i ∈ {0, 1} v An instance of integer linear programs v Still NP-hard v Geometry: We are only considering points that are vertices of the Boolean hypercube Advanced ML: Inference 45

46 0-1 integer linear programming v In general v An instance of integer linear programs v Still NP-hard v Geometry: We are only considering points that are vertices of the Boolean hypercube v Constraints prohibit certain vertices: only hypercube vertices within the region defined by Ax ≤ b are allowed (such a feasible point can be an interior point of that constraint set) Questions? Advanced ML: Inference 46

47 Back to structured prediction v Recall that we are solving argmax y w^T φ(x, y) v The goal is to produce a graph v The set of possible values that y can take is finite, but large v General idea: Frame the argmax problem as a 0-1 integer linear program v Allows addition of arbitrary constraints Advanced ML: Inference 47

48 Thinking in ILPs Let's start with multi-class classification: argmax y ∈ {A,B,C} w^T φ(x, y) = argmax y ∈ {A,B,C} score(y) Introduce decision variables for each label v z A = 1 if output = A, 0 otherwise v z B = 1 if output = B, 0 otherwise v z C = 1 if output = C, 0 otherwise Advanced ML: Inference 48

49 Thinking in ILPs Let's start with multi-class classification: argmax y ∈ {A,B,C} w^T φ(x, y) = argmax y ∈ {A,B,C} score(y) Introduce decision variables for each label v z A = 1 if output = A, 0 otherwise v z B = 1 if output = B, 0 otherwise v z C = 1 if output = C, 0 otherwise Pick exactly one label: z A + z B + z C = 1 Maximize the score: max score(A) z A + score(B) z B + score(C) z C Advanced ML: Inference 49

50 Thinking in ILPs Let's start with multi-class classification: argmax y ∈ {A,B,C} w^T φ(x, y) = argmax y ∈ {A,B,C} score(y) Introduce decision variables for each label v z A = 1 if output = A, 0 otherwise v z B = 1 if output = B, 0 otherwise v z C = 1 if output = C, 0 otherwise Maximize the score: max score(A) z A + score(B) z B + score(C) z C Advanced ML: Inference 50

51 Thinking in ILPs Let's start with multi-class classification: argmax y ∈ {A,B,C} w^T φ(x, y) = argmax y ∈ {A,B,C} score(y) Introduce decision variables for each label v z A = 1 if output = A, 0 otherwise v z B = 1 if output = B, 0 otherwise v z C = 1 if output = C, 0 otherwise Maximize the score We have taken a trivial problem (finding the highest scoring element of a list) and converted it into a representation that is NP-hard in the worst case! Lesson: Don't solve multiclass classification with an ILP solver Advanced ML: Inference 51
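For completeness, here is what the (deliberately overkill) encoding looks like with an off-the-shelf ILP solver, sketched with the PuLP library and made-up scores; this assumes PuLP and its bundled CBC solver are installed.

```python
import pulp

scores = {"A": 1.3, "B": 0.2, "C": -0.5}          # hypothetical label scores

prob = pulp.LpProblem("multiclass_as_ilp", pulp.LpMaximize)
z = {y: pulp.LpVariable(f"z_{y}", cat="Binary") for y in scores}  # indicator variables

prob += pulp.lpSum(scores[y] * z[y] for y in scores)   # maximize the score
prob += pulp.lpSum(z.values()) == 1                    # pick exactly one label

prob.solve(pulp.PULP_CBC_CMD(msg=False))
prediction = [y for y in scores if z[y].value() == 1][0]
print(prediction)   # "A" -- exactly what max(scores, key=scores.get) returns in one line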

52 ILP for general conditional models y 1 y 2 y 3 Suppose each y i can be A, B or C x 1 x 2 x 3 Introduce one decision variable for each part being assigned labels Our goal: max y w^T φ(x 1, y 1 ) + w^T φ(y 1, y 2, y 3 ) + w^T φ(x 3, y 2, y 3 ) + w^T φ(x 1, x 2, y 2 ) Advanced ML: Inference 52

53 ILP for general conditional models Suppose each y i can be A, B or C z 1A, z 1B, z 1C y 1 y 2 y 3 Introduce one decision variable for each part being assigned labels x 1 x 2 x 3 Our goal: max y w^T φ(x 1, y 1 ) + w^T φ(y 1, y 2, y 3 ) + w^T φ(x 3, y 2, y 3 ) + w^T φ(x 1, x 2, y 2 ) Advanced ML: Inference 53

54 ILP for general conditional models z 13AA, z 13AB, z 13AC, z 13BA, z 13BB, z 13BC, z 13CA, z 13CB, z 13CC Suppose each y i can be A, B or C z 1A, z 1B, z 1C y 1 y 2 y 3 Introduce one decision variable for each part being assigned labels x 1 x 2 x 3 Each of these decision variables is associated with a score z 2A, z 2B, z 2C z 23AA, z 23AB, z 23AC, z 23BA, z 23BB, z 23BC, z 23CA, z 23CB, z 23CC Our goal: max y w^T φ(x 1, y 1 ) + w^T φ(y 1, y 2, y 3 ) + w^T φ(x 3, y 2, y 3 ) + w^T φ(x 1, x 2, y 2 ) Questions? Advanced ML: Inference 54

55 ILP for general conditional models y 1 y 2 y 3 Suppose each y i can be A, B or C Introduce one decision variable for each part being assigned labels x 1 x 2 x 3 Each of these decision variables is associated with a score Not all decisions can exist together. E.g.: z 13AB implies z 1A and z 3B Our goal: max y w^T φ(x 1, y 1 ) + w^T φ(y 1, y 2, y 3 ) + w^T φ(x 3, y 2, y 3 ) + w^T φ(x 1, x 2, y 2 ) Advanced ML: Inference 55

56 Writing constraints as linear inequalities v Exactly one of z 1A, z 1B, z 1C can be true: z 1A + z 1B + z 1C = 1 v At least m of z 1A, z 1B, z 1C should be true: z 1A + z 1B + z 1C ≥ m v At most m of z 1A, z 1B, z 1C should be true: z 1A + z 1B + z 1C ≤ m v Implication: z i ⇒ z j v Convert to disjunction: ¬z i ∨ z j (at least one of not z i or z j ): (1 − z i ) + z j ≥ 1 Advanced ML: Inference 56
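A quick way to convince yourself of the implication encoding is to enumerate the four Boolean assignments and check that the inequality accepts exactly those satisfying z i ⇒ z j; this tiny self-contained check is illustrative, not from the slides.

```python
from itertools import product

# (1 - z_i) + z_j >= 1 should hold exactly when the implication z_i => z_j holds.
for z_i, z_j in product([0, 1], repeat=2):
    implication = (not z_i) or z_j
    inequality = (1 - z_i) + z_j >= 1
    assert implication == inequality
    print(z_i, z_j, implication)
```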

58 Integer linear programming for inference v Easy to add additional knowledge v Specify them as Boolean formulas v Examples v If y 1 is an A, then y 2 or y 3 should be a B or C v No more than two A s allowed in the output v Many inference problems have standard mappings to ILPs v Sequences, parsing, dependency parsing v Encoding of the problem makes a difference in solving time v The mechanical encoding may not be efficient to solve v Generally: more complex constraints make solving harder Advanced ML: Inference 58

59 Exercise: Sequence labeling Goal: Find argmax y w^T φ(x, y), y = (y 1, y 2,..., y n ) y 1 y 2 y 3... y n How can this be written as an ILP? Advanced ML: Inference 59

60 ILP for inference: Remarks v Many combinatorial optimization problems can be written as an ILP v Even the easy/polynomial ones v Given an ILP, checking whether it represents a polynomial problem is intractable in general v ILPs are a general language for thinking about combinatorial optimization v The representation allows us to make general statements about inference v Off-the-shelf solvers for ILPs are quite good v Gurobi, CPLEX v Use an off-the-shelf solver only if you can't solve your inference problem otherwise Advanced ML: Inference 60

61 Coming up v Formulating general inference as integer linear programs v And variants of this idea v Graph algorithms, dynamic programming, greedy search v We have seen the Viterbi algorithm, which uses a cleverly defined ordering to decompose the output into a sequence of decisions; we will talk about a general algorithm -- max-product, sum-product v Heuristics for inference v Sampling, Gibbs Sampling v Approximate graph search, beam search v LP-relaxation Advanced ML: Inference 61

62 Inference: Graph algorithms Belief Propagation Advanced ML: Inference 62

63 Variable elimination (motivation) v Remember: We have a collection of inference variables that need to be assigned y = (y 1, y 2,...) v General algorithm v First fix an ordering of the variables, say (y 1, y 2,...) v Iteratively: v Find the best value for y i given the values of the previous neighbors v Use back pointers to find the final answer Advanced ML: Inference 63

64 Variable elimination (motivation) v Remember: We have a collection of inference variables that need to be assigned y = (y 1, y 2,...) v General algorithm v First fix an ordering of the variables, say (y 1, y 2,...) v Iteratively: v Find the best value for y i given the values of the previous neighbors v Use back pointers to find the final answer v Viterbi is an instance of max-product variable elimination Advanced ML: Inference 64

66 Variable elimination example (max-sum) [figure: a chain y 1 y 2 y 3... y n over labels A, B, C, D with local scores] First eliminate y 1 Score 1 (s) = score_local 1 (s, START) Score i (s) = max over y i-1 of [Score i-1 (y i-1 ) + score_local i (y i-1, s)] Advanced ML: Inference 66

67 Variable elimination example [figure: y 1 eliminated; the remaining chain y 2 y 3... y n over labels A, B, C, D] Next eliminate y 2 Score 1 (s) = score_local 1 (s, START) Score i (s) = max over y i-1 of [Score i-1 (y i-1 ) + score_local i (y i-1, s)] Advanced ML: Inference 67

68 Variable elimination example [figure: y 1 and y 2 eliminated; the remaining chain y 3... y n over labels A, B, C, D] Next eliminate y 3 Score 1 (s) = score_local 1 (s, START) Score i (s) = max over y i-1 of [Score i-1 (y i-1 ) + score_local i (y i-1, s)] Advanced ML: Inference 68

69 Variable elimination example [figure: only y n remains, over labels A, B, C, D] We have all the information to make a decision for y n Score 1 (s) = score_local 1 (s, START) Score i (s) = max over y i-1 of [Score i-1 (y i-1 ) + score_local i (y i-1, s)] Advanced ML: Inference Questions? 69

70 Two types of inference problems v Marginals: find P(y i x), the probability of an assignment to individual output variables v Maximizer: find argmax y P(y x), the highest-probability full assignment Advanced ML: Inference 70

71 Belief Propagation v BP provides the exact solution when there are no loops in the graph v Viterbi is a special case v Otherwise, loopy BP provides an approximate solution We use sum-product BP as the running example, where we want to compute the partition function Z of the model's distribution Advanced ML: Inference 71

72 Intuition v An iterative process in which neighboring variables pass messages to each other: I (variable x 3 ) think that you (variable x 2 ) belong in these states with various likelihoods v After enough iterations, this conversation is likely to converge to a consensus that determines the marginal probabilities of all the variables. Advanced ML: Inference 72

73 Message v Message from node i to node j: m ij (x j ) v A message is not a probability v May not sum to 1 v A high value of m ij (x j ) means that node i believes the marginal value P(x j ) to be high. Advanced ML: Inference 73

74 Beliefs v Estimated marginal probabilities are called beliefs. v Algorithm: v Update messages until convergence v Then calculate beliefs Advanced ML: Inference 74

75 Message update v To update message from i to j, consider all messages flowing into i Advanced ML: Inference 75

76 Message update m ij (x j ) = Σ over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m ki (x i ) ] (sum over node i's states of the pairwise factor, node i's local factor, and all messages into i except the one from j) Advanced ML: Inference 76

78 Sum-product vs. max-product v The standard BP we just described is sum-product, used to estimate marginals v A variant called max-product (or max-sum in log space) is used to estimate the MAP Advanced ML: Inference 78

79 Max-product v Message update same as before, except that sum is replaced by max: v Belief: estimate most likely states Advanced ML: Inference 79
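A small sketch of a single message update covering both variants just described; the factor names (pairwise, unary) and the toy chain x1 - x2 - x3 are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def pass_message(pairwise, unary_i, incoming, mode="sum"):
    """One message m_{i->j}(x_j).

    pairwise[x_i, x_j]: factor between nodes i and j
    unary_i[x_i]:       local factor at node i
    incoming:           messages m_{k->i} from i's other neighbors
    """
    # combine everything flowing into node i (except from j)
    combined = unary_i * np.prod(incoming, axis=0) if incoming else unary_i
    scores = pairwise * combined[:, None]          # shape (x_i, x_j)
    if mode == "sum":                              # sum-product: for marginals
        return scores.sum(axis=0)
    return scores.max(axis=0)                      # max-product: for MAP

# Toy chain with two states per variable (made-up factors).
psi = np.array([[1.0, 2.0], [3.0, 4.0]])           # pairwise factor
phi = np.array([1.0, 1.0])                         # uniform unary factor

m12 = pass_message(psi, phi, [], mode="sum")       # leaf x1 -> x2
m23 = pass_message(psi, phi, [m12], mode="sum")    # x2 -> x3
belief_x3 = phi * m23                              # unnormalized marginal of x3
print(belief_x3 / belief_x3.sum())
```

Switching mode="max" gives the max-product messages used for MAP inference; on a tree both variants are exact.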

80 Recap Advanced ML: Inference 80

81 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 For the sake of simplicity, let's assume all the transition and emission scores are the same How many possible assignments? Advanced ML: Inference 81

82 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B-> B 4 A B Advanced ML: Inference 82

83 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B-> B 4 A 5 B 8 Advanced ML: Inference 83

84 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B-> B 4 A 5 B 8 A 5 B 8 Advanced ML: Inference 84

85 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B-> B 4 (5 5) (8 8) A 5 B 8 A 5 B 8 Advanced ML: Inference 85

86 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B-> B 4 A 50 B 64 A 5 B 8 A 5 B 8 Advanced ML: Inference 86

87 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 A 50 B 64 A 5 B 8 A 5 B 8 Let's verify: (A,(A,A)) = 2·(1·2)·(1·2) = 8 (A,(A,B)) = 2·(1·2)·(3·1) = 12 (A,(B,A)) = 2·(3·1)·(1·2) = 12 (A,(B,B)) = 2·(3·1)·(3·1) = 18 Advanced ML: Inference 87

88 Example (Sum-Product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 A 50 B 64 (50 = 2·5·5 = 8+12+12+18) A 5 B 8 A 5 B 8 Let's verify: (A,(A,A)) = 2·(1·2)·(1·2) = 8 (A,(A,B)) = 2·(1·2)·(3·1) = 12 (A,(B,A)) = 2·(3·1)·(1·2) = 12 (A,(B,B)) = 2·(3·1)·(3·1) = 18 Advanced ML: Inference 88

89 Example (Sum-Product) A -> A 1 A -> B 2 A = 242 B -> A 3 B = 356 B-> B 4 A 50 B 64 A 5 B 8 A 5 B 8 A 5 B 8 A 5 B 8 Advanced ML: Inference 89

90 Example (Sum-Product) A -> A 1 A -> B 2 42 B -> A 3 B 356 B-> B 4 A 50 B = 12, = 22,784 A 5 A 5 B 8 B 8 A 5 B 8 A 5 B 8 Advanced ML: Inference 90

91 Example (Sum-Product) A -> A 1 A -> B 2 42 B -> A 3 B 356 B-> B 4 A 50 B 64 A 12,100 B 22,784 A 5 B 8 A 5 B 8 34,884 A 5 B 8 A 5 B 8 Advanced ML: Inference 91

92 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 For the sake of simplicity, let's assume all the transition and emission scores are the same How many possible assignments? Advanced ML: Inference 92

93 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A: max(1·2, 3·1) = 3 B: max(2·2, 4·1) = 4 Advanced ML: Inference 93

94 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 3 B 4 Advanced ML: Inference 94

95 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 3 B 4 A 3 B 4 Advanced ML: Inference 95

96 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] (3·3) (4·4) A 3 B 4 A 3 B 4 Advanced ML: Inference 96

97 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 18 B 16 A 3 B 4 A 3 B 4 Advanced ML: Inference 97

98 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 18 B 16 A 3 B 4 A 3 B 4 Let's verify: (A,(A,A)) = 2·(1·2)·(1·2) = 8 (A,(A,B)) = 2·(1·2)·(3·1) = 12 (A,(B,A)) = 2·(3·1)·(1·2) = 12 (A,(B,B)) = 2·(3·1)·(3·1) = 18 Advanced ML: Inference 98

99 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 18 B 16 (18 = 2·3·3, 16 = 1·4·4) A 3 B 4 A 3 B 4 Let's verify: (A,(A,A)) = 2·(1·2)·(1·2) = 8 (A,(A,B)) = 2·(1·2)·(3·1) = 12 (A,(B,A)) = 2·(3·1)·(1·2) = 12 (A,(B,B)) = 2·(3·1)·(3·1) = 18 Advanced ML: Inference 99

100 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A: max(1·18, 3·16) = 48 B: max(2·18, 4·16) = 64 A 18 B 16 A 3 B 4 A 3 B 4 A 3 B 4 A 3 B 4 Advanced ML: Inference 100

101 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 18 B 16 A 48 B 64 A = 18·48 = 864 B = 16·64 = 1,024 A 3 B 4 A 3 B 4 A 3 B 4 A 3 B 4 Advanced ML: Inference 101

102 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 18 B 16 A 864 B 1,024 A 48 B 64 A 3 B 4 A 3 B 4 A 3 B 4 A 3 B 4 Advanced ML: Inference 102

103 Example (max-product) A -> A 1 A -> B 2 B -> A 3 B -> B 4 m new ij (x j ) = max over x i of [ f ij (x i, x j ) g i (x i ) Π k ∈ N(i)\j m old ki (x i ) ] A 18 B 16 A 864 B 1,024 A 48 (B) B 64 (B) A 3 (B) B 4 (A) A 3 (B) B 4 (A) A 3 (B) B 4 (A) A 3 (B) B 4 (A) Advanced ML: Inference 103

105 Inference: Graph algorithms General Search Advanced ML: Inference 105

106 Inference as search: General setting v Predicting a graph as a sequence of decisions v General data structures: v State: Encodes partial structure v Transitions: Move from one partial structure to another v Start state v End state: We have a full structure v There may be more than one end state v Each transition is scored with the learned model v Goal: Find an end state that has the highest total score Questions? Advanced ML: Inference 106

107 Example y 1 y 3 Suppose each y can be one of A, B or C y 2 x 1 x 2 x 3 State: Triples (y 1, y 2, y 3 ) all possibly unknown (A, -, -), (-, A, A), (-, -, -), Transition: Fill in one of the unknowns Start state: (-,-,-) End state: All three y s are assigned Advanced ML: Inference 107

108 Example y 1 y 3 Suppose each y can be one of A, B or C y 2 x 1 x 2 x 3 State: Triples (y 1, y 2, y 3 ) all possibly unknown (A, -, -), (-, A, A), (-, -, -), Transition: Fill in one of the unknowns Start state: (-,-,-) (-,-,-) (A,-,-) (B,-,-) (C,-,-) (A,A,-) (C,C,-).. End state: All three y s are assigned (A,A,A) (C,C,C) Advanced ML: Inference 108

109 Example y 1 y 3 Suppose each y can be one of A, B or C y 2 Note: Here we have assumed an ordering (y 1, y 2, y 3 ) x 1 x 2 x 3 State: Triples (y 1, y 2, y 3 ) all possibly unknown (A, -, -), (-, A, A), (-, -, -), Transition: Fill in one of the unknowns Start state: (-,-,-) (-,-,-) (A,-,-) (B,-,-) (C,-,-) (A,A,-) (C,C,-).. End state: All three y s are assigned (A,A,A) (C,C,C) Advanced ML: Inference 109

110 Example y 1 y 3 Suppose each y can be one of A, B or C y 2 Note: Here we have assumed an ordering (y 1, y 2, y 3 ) How do the transitions get scored? x 1 x 2 x 3 State: Triples (y 1, y 2, y 3 ) all possibly unknown (A, -, -), (-, A, A), (-, -, -), Transition: Fill in one of the unknowns Start state: (-,-,-) (-,-,-) (A,-,-) (B,-,-) (C,-,-) (A,A,-) (C,C,-).. End state: All three y s are assigned Questions? Advanced ML: Inference (A,A,A) (C,C,C) 110

111 Graph search algorithms v Breadth/depth first search v Keep a stack/queue/priority queue of open states v The good: Guaranteed to be correct v Explores every option (-,-,-) v The bad? v Explores every option: Could be slow for any non-trivial graph (A,-,-) (B,-,-) (C,-,-) (A,A,-) (C,C,-).. (A,A,A) (C,C,C) Advanced ML: Inference 111

112 Greedy search v At each state, choose the highest scoring next transition v Keep only one state in memory: The current state v What is the problem? v Local decisions may override global optimum v Does not explore full search space v Greedy algorithms can give the true optimum for special class of problems v Eg: Maximum-spanning tree algorithms are greedy Questions? Advanced ML: Inference 112

113 Example y 1 y 3 Suppose each y can be one of A, B or C y 2 Note: Here we have assumed an ordering (y 1, y 2, y 3 ) How do the transitions get scored? x 1 x 2 x 3 State: Triples (y 1, y 2, y 3 ) all possibly unknown (A, -, -), (-, A, A), (-, -, -), Transition: Fill in one of the unknowns Start state: (-,-,-) (-,-,-) (A,-,-) (B,-,-) (C,-,-) (A,A,-) (A,B,-) (A,C,-) End state: All three y s are assigned Questions? Advanced ML: Inference (A,A,-) (A,B,C) (A,C,-) 113

114 Example (greedy) A -> A 1 A -> B 2 B -> A 3 B-> B 4 Advanced ML: Inference 114

115 Example (greedy) = = 1,024 A -> A 1 A -> B 2 B -> A 3 B-> B = 8 B = 16 Advanced ML: Inference 115

116 Beam search: A compromise v Keep size-limited priority queue of states v Called the beam, sorted by total score for the state v At each step: v Explore all transitions from the current state v Add all to beam and trim the size v The good: Explores more than greedy search v The bad: A good state might fall out of the beam v In general, easy to implement, very popular v No guarantees Questions? Advanced ML: Inference 116
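A generic sketch of the procedure just described; expand and is_goal stand in for a problem-specific transition function and completeness test (for sequence labeling, for instance, a state would be a label prefix and expand would append one more label with its model score).

```python
import heapq

def beam_search(initial_state, expand, is_goal, beam_size=3):
    """Keep a size-limited set of partial structures, sorted by total score.

    expand(state) yields (next_state, transition_score) pairs;
    is_goal(state) says whether the structure is complete.
    Returns the best-scoring complete state found (no optimality guarantee).
    """
    beam = [(0.0, initial_state)]
    best = None
    while beam:
        candidates = []
        for score, state in beam:
            if is_goal(state):
                if best is None or score > best[0]:
                    best = (score, state)
                continue
            for nxt, delta in expand(state):
                candidates.append((score + delta, nxt))
        # explore all transitions, then trim back to the beam size
        beam = heapq.nlargest(beam_size, candidates, key=lambda t: t[0])
    return best
```

Setting beam_size=1 recovers greedy search; letting it grow without bound recovers exhaustive search, which is exactly the trade-off the slide describes.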

117 Example beam = 3 Credit: Graham Neubig Advanced ML: Inference 117

118 v Calculate score, but ignore removed hypotheses Advanced ML: Inference 118

119 v Keep only best three Advanced ML: Inference 119

120 Structured prediction approaches based on search v Learning to search approaches v Assume the complex decision is incrementally constructed by a sequence of decisions v E.g., DAgger, SEARN, transition-based methods v Learn how to make decisions at each branch. Advanced ML: Inference 120

121 Example: Dependency Parsing v Identifying relations between words I ate a cake with a fork I ate a cake with a fork Advanced ML: Inference 121

122 Learning to search (L2S) approaches 1. Define a search space and features v Example: dependency parsing [Nivre03,NIPS16] v Maintain a buffer and a stack v Make predictions from left to right v Three (four) types of actions: Shift, Reduce-Left, Reduce-Right Advanced ML: Inference Credit: Google research blog 122

123 Learning to search approaches Shift-Reduce parser v Maintain a buffer and a stack v Make predictions from left to right v Three (four) types of actions: Shift, Reduce-Left, Reduce-Right I ate a cake Shift I ate a cake I ate a cake Reduce-Left ate a cake I Advanced ML: Inference 123
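The transition system can be simulated in a few lines. This is a rough sketch of the Shift / Reduce-Left / Reduce-Right actions described above (not Nivre's exact arc-standard definition), with a scripted action sequence standing in for the learned policy.

```python
def run_actions(words, actions):
    """Replay a sequence of shift-reduce actions over a buffer and a stack."""
    stack, buffer, arcs = [], list(words), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "REDUCE_LEFT":       # second-from-top becomes a dependent of top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "REDUCE_RIGHT":      # top becomes a dependent of second-from-top
            dep = stack.pop(-1)
            arcs.append((stack[-1], dep))
    return stack, buffer, arcs

# Mimic the slide: "I ate a cake" -> Shift, Shift, Reduce-Left attaches "I" under "ate".
print(run_actions(["I", "ate", "a", "cake"], ["SHIFT", "SHIFT", "REDUCE_LEFT"]))
# (['ate'], ['a', 'cake'], [('ate', 'I')])
```

In learning to search, a classifier (the policy) replaces the scripted list and chooses the next action from features of the current stack and buffer.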

124 Learning to search (L2S) approaches 1. Define a search space and features 2. Construct a reference policy (Ref) based on the gold label 3. Learning a policy that imitates Ref sentence Advanced ML: Inference 124

125 Policies v A policy maps observations to actions: π(o) = a, where the observation o can include the input x, the timestep t, the partial trajectory τ, and anything else Advanced ML: Inference 125

126 Imitation learning for joint prediction Challenges: v There is a combinatorial number of search states v How does a sub-decision affect the final decision? Advanced ML: Inference 126

127 Credit Assignment Problem v When something goes wrong, which decision should be blamed? Advanced ML: Inference 127

128 Imitation learning for joint prediction Searn: [Langford, Daumé & Marcu] DAgger: [Ross, Gordon & Bagnell] AggreVaTe: [Ross & Bagnell] LOLS: [Chang, Krishnamurthy, Agarwal, Daumé, Langford] Advanced ML: Inference 128

129 Learning a Policy [ICML 15, Ross+15] v At a given state, we construct a cost-sensitive multi-class example, e.g. (?, [0,.2,.8]) v Roll in to reach the state, try one-step deviations (one per action), and roll out to the end; the resulting losses (here 0,.2,.8) become the example's costs Advanced ML: Inference 129

130 Example: Sequence Labeling Receive input: x = the monster ate the sandwich y = Dt Nn Vb Dt Nn Make a sequence of predictions: x = the monster ate the sandwich ŷ = Dt Dt Dt Dt Dt Pick a timestep and try all perturbations there: x = the monster ate the sandwich ŷ Dt = Dt Dt Vb Dt Nn l=1 ŷ Nn = Dt Nn Vb Dt Nn l=0 ŷ Vb = Dt Vb Vb Dt Nn l=1 Compute losses and construct example: ( { w=monster, p=dt, }, [1,0,1] ) Advanced ML: Inference 130
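A sketch of how one such cost-sensitive example could be constructed by roll-in, one-step deviations, and roll-out; policy, loss, and the feature tuple are placeholders for illustration, not the actual SEARN/DAgger/LOLS implementations.

```python
def make_cost_sensitive_example(x, gold, policy, t, labels, loss):
    """Build one cost-sensitive example at timestep t.

    policy(x, prefix) predicts the next label given the labels so far;
    loss(pred, gold) is the final sequence loss, e.g. Hamming loss.
    """
    # Roll in: follow the current policy up to timestep t.
    prefix = []
    for _ in range(t):
        prefix.append(policy(x, prefix))

    # One-step deviations: try every action at t, then roll out to the end.
    costs = {}
    for a in labels:
        traj = prefix + [a]
        while len(traj) < len(x):
            traj.append(policy(x, traj))
        costs[a] = loss(traj, gold)

    features = (x, t, tuple(prefix))      # whatever feature function is used
    return features, costs

# Usage sketch: hamming = lambda pred, gold: sum(p != g for p, g in zip(pred, gold))
```

The example in the slide corresponds to one call of this routine at the chosen timestep, yielding the cost vector [1, 0, 1].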

131 Learning to search approaches: Credit Assignment Compiler [NIPS16] v Write the decoder, providing some side information for training v The library translates this piece of program with data into the update rules of the model v Applied to dependency parsing, named entity recognition, relation extraction, POS tagging v Implementation: Vowpal Wabbit Advanced ML: Inference 131

132 Approximate Inference Inference by sampling Advanced ML: Inference 132

133 Inference by sampling v Monte Carlo methods: A large class of algorithms v Origins in physics v Basic idea: v Repeatedly sample from a distribution v Compute aggregate statistics from samples v E.g.: The marginal distribution v Useful when we have many, many interacting variables 133

134 Why sampling works v Suppose we have some probability distribution P(z) v Might be a cumbersome function v We want to answer questions about this distribution v What is the mean? v Approximate with samples from the distribution {z 1, z 2,..., z n }, e.g. the expectation E[f(z)] ≈ (1/n) Σ i f(z i ) v Theory tells us that this is a good estimator v Chernoff-Hoeffding style bounds 134
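A tiny illustration of the estimator, with a made-up three-outcome distribution standing in for P(z).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete distribution P(z) over three outcomes (hypothetical values).
values = np.array([0.0, 1.0, 2.0])
probs = np.array([0.2, 0.5, 0.3])

samples = rng.choice(values, size=10_000, p=probs)   # z_1, ..., z_n drawn from P
print(samples.mean())          # Monte Carlo estimate of E[z]
print((values * probs).sum())  # exact expectation, 1.1, for comparison
```

With 10,000 samples the estimate is typically within a few hundredths of the true value, which is the Chernoff-Hoeffding behavior the slide alludes to.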

135 Key idea rejection sampling Advanced ML: Inference 135

136 Key idea rejection sampling Works well when the number of variables is small Advanced ML: Inference 136

137 The Markov Chain Monte Carlo revolution v Goal: To generate samples from a distribution P(y x) v The target distribution could be intractable to sample from v Idea: Construct a Markov chain of structures whose stationary distribution converges to P v An iterative process that constructs examples v Initially samples might not be from the target distribution v But after a long enough time, the samples are from a distribution that is close to P 137

138 The Markov Chain Monte Carlo revolution v Goal: To generate samples from a distribution P(y x) v The target distribution could be intractable to sample from v Idea: draw examples in a way that, in the long run, the distribution of the draws is close to P(y x) v Formally: Construct a Markov chain of structures whose stationary distribution converges to P v An iterative process that constructs examples v Initially samples might not be from the target distribution v But after a long enough time, the samples are from a distribution that is increasingly close to P 138

139 A detour Recall: Markov Chains A collection of random variables y 0, y 1, y 2,..., y t form a Markov chain if the i th state depends only on the previous one [figure: a six-state Markov chain over {A, B, C, D, E, F} with its transition probabilities, and an example state sequence] 139

141 Temporal dynamics of a Markov chain What is the probability that a chain is in a state z at time t+1? P(y t+1 = z) = Σ over z' of P(y t+1 = z y t = z') P(y t = z') 141

142 Temporal dynamics of a Markov chain What is the probability that a chain is in a state z at time t+1? [figure: the six-state Markov chain with its transition probabilities] Exercise: Suppose a Markov chain with these transition probabilities starts at A. What is the distribution over states after two steps? 142
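The two-step distribution in the exercise is just the start distribution multiplied by the transition matrix twice. The 3-state matrix below is hypothetical (the slide's 6-state chain and its probabilities are not reproduced in this transcription), so this only shows the mechanics.

```python
import numpy as np

# Hypothetical transition matrix; rows sum to 1 and row i gives P(next | current = i).
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2],
              [0.5, 0.0, 0.5]])

pi0 = np.array([1.0, 0.0, 0.0])            # start in state A with probability 1
pi2 = pi0 @ np.linalg.matrix_power(P, 2)   # distribution over states after two steps
print(pi2)

# Iterating long enough approaches the stationary distribution, which satisfies pi = pi P.
pi = pi0
for _ in range(1000):
    pi = pi @ P
print(pi)
```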

143 Stationary distributions v Informally, v If the set of states is {A, B, C, D, E, F} v A stationary distribution π is a distribution over the states such that after a transition, the distribution over the states is still π v How do we get to a stationary distribution? v A regular Markov chain: There is a non-zero probability of getting from any state to any other in a finite number of steps v If the transition matrix is regular, just run it for a long time v Steady-state behavior is the stationary distribution 143

145 Back to inference Markov Chain Monte Carlo for inference v Design a Markov chain such that v Every state is a structure v The stationary distribution of the chain is the probability distribution we care about P(y x) v How to do inference? v Run the Markov chain for a long time till we think it gets to steady state v Let the chain wander around the space and collect samples v We have samples from P(y x) 145

151 MCMC for inference [figure: the chain of sampled structures, after many steps] With sufficient samples, we can answer inference questions like calculating the partition function (just sum over the samples) 151

154 MCMC algorithms v Metropolis-Hastings algorithm v Gibbs sampling v An instance of the Metropolis Hastings algorithm v Many variants exist v Remember: We are sampling from an exponential state space v All possible assignments to the random variables 154

155 Metropolis-Hastings v Proposal distribution q(y' y) v Proposes changes to the state [Metropolis, Rosenbluth, Rosenbluth, Teller & Teller 1953] [Hastings 1970] v Could propose large changes to the state v Acceptance probability α v Should the proposal be accepted or not v If yes, move to the proposed state, else remain in the previous state 155

156 Metropolis-Hastings Algorithm v Start with an initial guess y 0 v Loop for t = 1, 2,..., N v Propose next state y' v Calculate acceptance probability α v With probability α, accept the proposal v If accepted: y t+1 = y', else y t+1 = y t v Return {y 0, y 1,..., y N } The distribution we care about is P(y x) 156

158 Metropolis-Hastings Algorithm v Start with an initial guess y 0 v Loop for t = 1, 2,..., N v Propose next state y' v Calculate acceptance probability α v With probability α, accept the proposal v If accepted: y t+1 = y', else y t+1 = y t v Return {y 0, y 1,..., y N } The distribution we care about is P(y x) Sample y' from q(y' y t ) 158

159 Metropolis-Hastings Algorithm v Start with an initial guess y 0 v Loop for t = 1, 2,..., N v Propose next state y' v Calculate acceptance probability α v With probability α, accept the proposal v If accepted: y t+1 = y', else y t+1 = y t v Return {y 0, y 1,..., y N } The distribution we care about is P(y x) Sample y' from q(y' y t ) Note that we don't need to compute the partition function. Why? 159

160 Metropolis-Hastings Algorithm v Start with an initial guess y 0 v Loop for t = 1, 2,..., N v Propose next state y' v Calculate acceptance probability α v With probability α, accept the proposal v If accepted: y t+1 = y', else y t+1 = y t v Return {y 0, y 1,..., y N } The distribution we care about is P(y x) Sample y' from q(y' y t ) Idea: the target distribution is invariant under this update, so running enough iterations yields samples from it 160
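A compact sketch of the algorithm on a toy space of label sequences. The scoring function and the single-position resampling proposal are made up, and the target is taken to be proportional to exp(score), so only score differences (never the partition function) are needed.

```python
import math
import random

random.seed(0)

LABELS = ["A", "B", "C"]

def score(y):
    """Unnormalized log-score: reward adjacent repeated labels (made-up model)."""
    return sum(1.0 for a, b in zip(y, y[1:]) if a == b)

def propose(y):
    """Symmetric proposal: resample one position uniformly."""
    i = random.randrange(len(y))
    y2 = list(y)
    y2[i] = random.choice(LABELS)
    return y2

y = ["A", "B", "C"]
samples = []
for t in range(5000):
    y_new = propose(y)
    alpha = min(1.0, math.exp(score(y_new) - score(y)))   # acceptance probability
    if random.random() < alpha:
        y = y_new
    samples.append(tuple(y))

# Sequences with repeated labels (higher score) should appear most often.
print(max(set(samples), key=samples.count))
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of unnormalized target scores, which is exactly why Z never has to be computed.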

161 Proposal functions for Metropolis v A design choice v Different possibilities v Only make local changes to the factor graph v But the chain might not explore widely v Make big jumps in the state space v But the chain might move very slowly v Doesn't have to depend on the size of the graph 161

162 Gibbs Sampling v Start with an initial guess y = (y 1, y 2,..., y n ) v Loop several times v For i = 1 to n: v Sample y i from P(y i y 1, y 2,..., y i-1, y i+1,..., y n, x) v We now have a complete sample The ordering is arbitrary A specific instance of the Metropolis-Hastings algorithm; no proposal needs to be designed 162
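A minimal Gibbs sampler for a toy chain model; the pairwise score matrix W, the number of labels, and the chain length are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                                   # number of labels
n = 5                                   # chain length
W = rng.normal(size=(K, K))             # W[a, b] = log-score of adjacent label pair (a, b)

def conditional(y, i):
    """P(y_i | all other variables) for the chain model, normalized over K labels."""
    logits = np.zeros(K)
    if i > 0:
        logits += W[y[i - 1], :]        # interaction with the left neighbor
    if i < n - 1:
        logits += W[:, y[i + 1]]        # interaction with the right neighbor
    p = np.exp(logits - logits.max())
    return p / p.sum()

y = rng.integers(0, K, size=n)          # initial guess
samples = []
for sweep in range(2000):
    for i in range(n):                  # resample each variable given the others
        y[i] = rng.choice(K, p=conditional(y, i))
    samples.append(y.copy())

marginal_y0 = np.bincount([s[0] for s in samples], minlength=K) / len(samples)
print(marginal_y0)                      # estimated marginal over the first position
```

Each inner step only touches a variable's neighbors, which is why Gibbs sampling has low memory cost and makes only local changes.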

163 MAP inference with MCMC v So far we have only seen how to collect samples v Marginal inference with samples is easy v Compute the marginal probabilities from the samples v MAP inference: v Find the sample with highest probability v To help convergence to the maximum, the acceptance condition is modified with a temperature parameter T that increases with every step v Similar to simulated annealing 163

164 Summary of MCMC methods v A different approach for inference v No guarantee of exactness v General idea v Set up a Markov chain whose stationary distribution is the probability distribution that we care about v Run the chain, collect samples, aggregate v Metropolis-Hastings, Gibbs sampling v Many, many, many variants abound! v Useful when exact inference is intractable v Typically low memory costs, local changes only for Gibbs sampling Questions? 164

165 Inference v What is inference? The prediction step v More broadly, an aggregation operation over the space of outputs for an example: max, expectation, sample, sum v Different flavors: MAP, marginal, loss-augmented,... v Many algorithms, solution strategies v One size doesn't fit all v Next steps: v How can we take advantage of domain knowledge in inference? v How can we deal with making predictions about latent variables for which we don't have data? Questions? 165


More information

Probabilistic Context-free Grammars

Probabilistic Context-free Grammars Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John

More information

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm

Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Markov Chain Monte Carlo The Metropolis-Hastings Algorithm Anthony Trubiano April 11th, 2018 1 Introduction Markov Chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability

More information

Convergence Rate of Markov Chains

Convergence Rate of Markov Chains Convergence Rate of Markov Chains Will Perkins April 16, 2013 Convergence Last class we saw that if X n is an irreducible, aperiodic, positive recurrent Markov chain, then there exists a stationary distribution

More information

Machine Learning for Structured Prediction

Machine Learning for Structured Prediction Machine Learning for Structured Prediction Grzegorz Chrupa la National Centre for Language Technology School of Computing Dublin City University NCLT Seminar Grzegorz Chrupa la (DCU) Machine Learning for

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Lecture 12: EM Algorithm

Lecture 12: EM Algorithm Lecture 12: EM Algorithm Kai-Wei hang S @ University of Virginia kw@kwchang.net ouse webpage: http://kwchang.net/teaching/nlp16 S6501 Natural Language Processing 1 Three basic problems for MMs v Likelihood

More information

Inference in Graphical Models Variable Elimination and Message Passing Algorithm

Inference in Graphical Models Variable Elimination and Message Passing Algorithm Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption

More information

Midterm sample questions

Midterm sample questions Midterm sample questions CS 585, Brendan O Connor and David Belanger October 12, 2014 1 Topics on the midterm Language concepts Translation issues: word order, multiword translations Human evaluation Parts

More information

CSCE 478/878 Lecture 6: Bayesian Learning

CSCE 478/878 Lecture 6: Bayesian Learning Bayesian Methods Not all hypotheses are created equal (even if they are all consistent with the training data) Outline CSCE 478/878 Lecture 6: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchell

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Information Extraction, Hidden Markov Models Sameer Maskey Week 5, Oct 3, 2012 *many slides provided by Bhuvana Ramabhadran, Stanley Chen, Michael Picheny Speech Recognition

More information

Results: MCMC Dancers, q=10, n=500

Results: MCMC Dancers, q=10, n=500 Motivation Sampling Methods for Bayesian Inference How to track many INTERACTING targets? A Tutorial Frank Dellaert Results: MCMC Dancers, q=10, n=500 1 Probabilistic Topological Maps Results Real-Time

More information

Machine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017

Machine Learning & Data Mining Caltech CS/CNS/EE 155 Hidden Markov Models Last Updated: Feb 7th, 2017 1 Introduction Let x = (x 1,..., x M ) denote a sequence (e.g. a sequence of words), and let y = (y 1,..., y M ) denote a corresponding hidden sequence that we believe explains or influences x somehow

More information

Notes on Machine Learning for and

Notes on Machine Learning for and Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori

More information

Markov Networks.

Markov Networks. Markov Networks www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts Markov network syntax Markov network semantics Potential functions Partition function

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

Bayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018

Bayesian Networks Introduction to Machine Learning. Matt Gormley Lecture 24 April 9, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks Matt Gormley Lecture 24 April 9, 2018 1 Homework 7: HMMs Reminders

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc.

Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari

Learning MN Parameters with Alternative Objective Functions. Sargur Srihari Learning MN Parameters with Alternative Objective Functions Sargur srihari@cedar.buffalo.edu 1 Topics Max Likelihood & Contrastive Objectives Contrastive Objective Learning Methods Pseudo-likelihood Gradient

More information

Probability and Structure in Natural Language Processing

Probability and Structure in Natural Language Processing Probability and Structure in Natural Language Processing Noah Smith, Carnegie Mellon University 2012 Interna@onal Summer School in Language and Speech Technologies Quick Recap Yesterday: Bayesian networks

More information

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018

From Binary to Multiclass Classification. CS 6961: Structured Prediction Spring 2018 From Binary to Multiclass Classification CS 6961: Structured Prediction Spring 2018 1 So far: Binary Classification We have seen linear models Learning algorithms Perceptron SVM Logistic Regression Prediction

More information

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic CS 188: Artificial Intelligence Fall 2008 Lecture 16: Bayes Nets III 10/23/2008 Announcements Midterms graded, up on glookup, back Tuesday W4 also graded, back in sections / box Past homeworks in return

More information

4 : Exact Inference: Variable Elimination

4 : Exact Inference: Variable Elimination 10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

More information

6.045: Automata, Computability, and Complexity (GITCS) Class 17 Nancy Lynch

6.045: Automata, Computability, and Complexity (GITCS) Class 17 Nancy Lynch 6.045: Automata, Computability, and Complexity (GITCS) Class 17 Nancy Lynch Today Probabilistic Turing Machines and Probabilistic Time Complexity Classes Now add a new capability to standard TMs: random

More information

Lecture 1 : Data Compression and Entropy

Lecture 1 : Data Compression and Entropy CPS290: Algorithmic Foundations of Data Science January 8, 207 Lecture : Data Compression and Entropy Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will study a simple model for

More information

Neural Networks in Structured Prediction. November 17, 2015

Neural Networks in Structured Prediction. November 17, 2015 Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving

More information

1 Closest Pair of Points on the Plane

1 Closest Pair of Points on the Plane CS 31: Algorithms (Spring 2019): Lecture 5 Date: 4th April, 2019 Topic: Divide and Conquer 3: Closest Pair of Points on a Plane Disclaimer: These notes have not gone through scrutiny and in all probability

More information

Introduction To Graphical Models

Introduction To Graphical Models Peter Gehler Introduction to Graphical Models Introduction To Graphical Models Peter V. Gehler Max Planck Institute for Intelligent Systems, Tübingen, Germany ENS/INRIA Summer School, Paris, July 2013

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).

More information

Lecture 8: PGM Inference

Lecture 8: PGM Inference 15 September 2014 Intro. to Stats. Machine Learning COMP SCI 4401/7401 Table of Contents I 1 Variable elimination Max-product Sum-product 2 LP Relaxations QP Relaxations 3 Marginal and MAP X1 X2 X3 X4

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

Markov chain Monte Carlo Lecture 9

Markov chain Monte Carlo Lecture 9 Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Math 456: Mathematical Modeling. Tuesday, April 9th, 2018

Math 456: Mathematical Modeling. Tuesday, April 9th, 2018 Math 456: Mathematical Modeling Tuesday, April 9th, 2018 The Ergodic theorem Tuesday, April 9th, 2018 Today 1. Asymptotic frequency (or: How to use the stationary distribution to estimate the average amount

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Bayes Nets III: Inference

Bayes Nets III: Inference 1 Hal Daumé III (me@hal3.name) Bayes Nets III: Inference Hal Daumé III Computer Science University of Maryland me@hal3.name CS 421: Introduction to Artificial Intelligence 10 Apr 2012 Many slides courtesy

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 8: Sequence Labeling Jimmy Lin University of Maryland Thursday, March 14, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels? Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity

More information

CS540 ANSWER SHEET

CS540 ANSWER SHEET CS540 ANSWER SHEET Name Email 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 Final Examination CS540-1: Introduction to Artificial Intelligence Fall 2016 20 questions, 5 points

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Essential facts about NP-completeness:

Essential facts about NP-completeness: CMPSCI611: NP Completeness Lecture 17 Essential facts about NP-completeness: Any NP-complete problem can be solved by a simple, but exponentially slow algorithm. We don t have polynomial-time solutions

More information

Statistical Sequence Recognition and Training: An Introduction to HMMs

Statistical Sequence Recognition and Training: An Introduction to HMMs Statistical Sequence Recognition and Training: An Introduction to HMMs EECS 225D Nikki Mirghafori nikki@icsi.berkeley.edu March 7, 2005 Credit: many of the HMM slides have been borrowed and adapted, with

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 21: Introduction to NP-Completeness 12/9/2015 Daniel Bauer Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

Bayesian Learning in Undirected Graphical Models

Bayesian Learning in Undirected Graphical Models Bayesian Learning in Undirected Graphical Models Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK http://www.gatsby.ucl.ac.uk/ Work with: Iain Murray and Hyun-Chul

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

10-701/ Machine Learning: Assignment 1

10-701/ Machine Learning: Assignment 1 10-701/15-781 Machine Learning: Assignment 1 The assignment is due September 27, 2005 at the beginning of class. Write your name in the top right-hand corner of each page submitted. No paperclips, folders,

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information