Hidden Markov Models (HMM) and Support Vector Machine (SVM)


1 Hidden Markov Models (HMM) and Support Vector Machine (SVM) Professor Joongheon Kim School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea

2 Hidden Markov Models (HMM) and Support Vector Machine (SVM) Part 1: Hidden Markov Models Professor Joongheon Kim School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea

3 Outline
Hidden Markov Models
- Markov: Markov Chain; Markov Models and Markov Processes
- Hidden Markov Model (HMM)
- HMM Applications: Probability Evaluation

4 Markov (Markov Chain)
[Definition (P_{ij})] The fixed probability (one-step transition probability) that the process will next be in state j whenever it is in state i. That is,
P_{ij} = P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0)
for all states i_0, i_1, ..., i_{n-1}, i, j and all n ≥ 0.
[Note (Markov Property)] For all states i_0, i_1, ..., i_{n-1}, i, j and all n ≥ 0,
P_{ij} = P(X_{n+1} = j | X_n = i, X_{n-1} = i_{n-1}, ..., X_1 = i_1, X_0 = i_0) = P(X_{n+1} = j | X_n = i)

5 Markov (Markov Chain)
[Note] P_{ij} ≥ 0 for all i ≥ 0, j ≥ 0, and Σ_{j=0}^∞ P_{ij} = 1 for all i = 0, 1, ...
[Markov Chain] (diagram: state i with outgoing transition probabilities P_{i1}, P_{i2}, ..., P_{ii}, ..., P_{in} to states 1, 2, ..., i, ..., n)

6 Markov (Markov Chain)
[Note (P)] Let P denote the matrix of one-step transition probabilities, i.e.,
P = [ P_{ii} P_{ij} P_{ik} ; P_{ji} P_{jj} P_{jk} ; P_{ki} P_{kj} P_{kk} ]
(diagram: three states i, j, k with the corresponding transition probabilities on the edges)

7 Markov (Markov Chain)
[Example] There are two milk companies in South Korea, A and B. Based on last year's statistics, 88% of A's customers are still with A, and the other 12% are now with B. Likewise, 85% of B's customers are still with B, and the other 15% are now with A.
[Transition Matrix] P = [ P_{AA} P_{AB} ; P_{BA} P_{BB} ] = [ 0.88 0.12 ; 0.15 0.85 ]
[Markov Chain] P_{AA} = 0.88, P_{AB} = 0.12, P_{BA} = 0.15, P_{BB} = 0.85
[One-Step Transition] If the initial market share is A = 0.25 and B = 0.75, i.e., s_0 = [0.25 0.75], the next market share is:
s_1 = s_0 P = [0.25 0.75] [ 0.88 0.12 ; 0.15 0.85 ] = [0.3325 0.6675]

8 Markov (Markov Chain)
[Example (Multi-Step Transition)] From the P in the previous slide, suppose that we are in state i at time t and we want the probability of being in state i at time t + 2 (denoted P_{ii}^{(2)}).
(diagram: trellis over times t, t+1, t+2 with intermediate states i, j, k)
P_{ii}^{(2)} = P(X_{n+2} = i | X_n = i) = P_{ii} P_{ii} + P_{ij} P_{ji} + P_{ik} P_{ki},
which is the (i, i) entry of
[ P_{ii} P_{ij} P_{ik} ; P_{ji} P_{jj} P_{jk} ; P_{ki} P_{kj} P_{kk} ]^2 = P^2
[X_n = i: the state is i at time n]
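To make this concrete, here is a minimal sketch (assuming NumPy is available) that reproduces the one-step market-share update from the milk-company example and the two-step transition matrix P^2:

```python
# One-step market-share update s1 = s0 P and the two-step matrix P^2.
import numpy as np

P = np.array([[0.88, 0.12],   # row A: P_AA, P_AB
              [0.15, 0.85]])  # row B: P_BA, P_BB
s0 = np.array([0.25, 0.75])   # initial market share (A, B)

s1 = s0 @ P                        # next market share
P2 = np.linalg.matrix_power(P, 2)  # two-step transition probabilities

print(s1)        # [0.3325 0.6675]
print(P2[0, 0])  # P_AA^(2) = 0.88*0.88 + 0.12*0.15 = 0.7924
```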

9 Markov (Markov Models and Markov Processes)
Example for Markov Model (Weather Forecasting)
Weather states: Sunny (S), Rainy (R), Foggy (F)
Today's weather q_n depends on the previous weather conditions q_{n-1}, q_{n-2}, ..., q_1: P(q_n | q_{n-1}, q_{n-2}, ..., q_1)
Example: if the previous three weather conditions are q_{n-1} = S, q_{n-2} = R, and q_{n-3} = F, then the probability that today's weather q_n is R is:
P(q_n = R | q_{n-1} = S, q_{n-2} = R, q_{n-3} = F)

10 Markov (Markov Models and Markov Processes)
Observation from the previous [Example]
A larger n means we have to gather more information: if n = 6, we need to gather 3^{6-1} = 243 weather statistics. Therefore, we need an assumption (the Markov Assumption) that reduces the amount of data to gather.
[First-Order Markov Assumption] P(q_n = S_j | q_{n-1} = S_i, q_{n-2} = S_k, ...) = P(q_n = S_j | q_{n-1} = S_i)
[Joint Probability under the First-Order Assumption] P(q_1, q_2, ..., q_n) = Π_{i=1}^n P(q_i | q_{i-1})

11 Markov (Markov Models and Markov Processes)
Observation from the previous [Example] (continued)
With the Markov Assumption, the probability of observing a sequence q_1, q_2, ..., q_n can be written as the joint probability:
P(q_1, q_2, ..., q_n) = P(q_1) P(q_2 | q_1) P(q_3 | q_2, q_1) ... P(q_{n-1} | q_{n-2}, ..., q_1) P(q_n | q_{n-1}, ..., q_1)
= P(q_1) P(q_2 | q_1) P(q_3 | q_2) ... P(q_{n-1} | q_{n-2}) P(q_n | q_{n-1})
= Π_{i=1}^n P(q_i | q_{i-1}), where we take P(q_1 | q_0) = P(q_1) (i.e., we assume P(q_0) = 1).

12 Markov (Markov Models and Markov Processes)
Example (Weather Forecasting)
[Weather State Table] (rows: yesterday's weather q_{n-1}; columns: today's weather q_n)
      S     R     F
S    0.8   0.05  0.15
R    0.2   0.6   0.2
F    0.2   0.3   0.5
[Transition Matrix] P = [ 0.8 0.05 0.15 ; 0.2 0.6 0.2 ; 0.2 0.3 0.5 ]
[Transition Diagram] (diagram: states S, R, F with the above probabilities on the edges)

13 Markov (Markov Models and Markov Processes)
Example (Weather Forecasting)
Case Study: Suppose that yesterday's weather (q_1) was Sunny (S). Find the probability that today's weather (q_2) is Sunny (S) and tomorrow's weather (q_3) is Rainy (R).
(Solution)
P(q_2 = S, q_3 = R | q_1 = S) = P(q_3 = R | q_2 = S, q_1 = S) P(q_2 = S | q_1 = S)
= P(q_3 = R | q_2 = S) P(q_2 = S | q_1 = S) [Markov Assumption]
= 0.05 × 0.8 = 0.04
Equivalently, with P(q_1 = S) = 1:
P(q_1 = S, q_2 = S, q_3 = R) = P(q_1 = S) P(q_2 = S | q_1 = S) P(q_3 = R | q_2 = S, q_1 = S)
= P(q_1 = S) P(q_2 = S | q_1 = S) P(q_3 = R | q_2 = S) [Markov Assumption]
= 1 × 0.8 × 0.05 = 0.04
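The same case study as a short sketch (assuming NumPy and the transition matrix above; the helper sequence_prob is illustrative, not from the slides):

```python
# Probability of the weather sequence S, S, R under the first-order
# Markov assumption, given that the first state is observed.
import numpy as np

states = {"S": 0, "R": 1, "F": 2}
P = np.array([[0.8, 0.05, 0.15],   # rows: q_{n-1}; columns: q_n
              [0.2, 0.6,  0.2 ],
              [0.2, 0.3,  0.5 ]])

def sequence_prob(seq):
    """P(q_2, ..., q_n | q_1): product of one-step transition probabilities."""
    p = 1.0
    for prev, cur in zip(seq, seq[1:]):
        p *= P[states[prev], states[cur]]
    return p

print(sequence_prob(["S", "S", "R"]))  # 0.8 * 0.05 = 0.04
```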

14 Outline
Hidden Markov Models
- Markov
- Hidden Markov Model (HMM): Example: Weather; Example: Balls in Jars
- HMM Applications: Probability Evaluation

15 HMM (Example: Weather)
[Example (Weather)] You are in a house with no windows, and a friend visits you once a day. You can estimate the weather by checking whether your friend carries an umbrella. Your friend carries an umbrella with probability 0.1, 0.8, and 0.3 when the weather is S, R, and F, respectively.
Observation: with umbrella (o_i = UO) or without umbrella (o_i = UX).
Now the weather can be estimated by observing o_i, i ≥ 1. Therefore, according to Bayes' theorem:
P(q_i | o_i) = P(o_i | q_i) P(q_i) / P(o_i)

16 HMM (Example: Weather)
[Example (Weather)] (continued) When the sequences of weather states and umbrella observations are given, i.e., q_1, ..., q_n and o_1, ..., o_n, the conditional probability is:
P(q_1, ..., q_n | o_1, ..., o_n) = P(o_1, ..., o_n | q_1, ..., q_n) P(q_1, ..., q_n) / P(o_1, ..., o_n)

17 HMM (Example: Balls in Jars)
[Example (Balls in Jars)] A room is divided by a curtain, and behind it there are three jars containing balls of four colors (red, blue, green, and purple). A person behind the curtain selects one jar, picks one ball from it, shows the ball, puts it back into the jar, and repeats.
(Notation)
b_j(k): the probability of picking a ball of color k from jar j, where k = 1, 2, 3, 4 for red, blue, green, and purple, respectively.
N: the number of states (i.e., the number of jars): S = {S_1, ..., S_N}
M: the number of observation symbols (i.e., the number of colors): O = {O_1, ..., O_M}
State Transition Matrix A = {a_ij}, where a_ij = P(q_{t+1} = S_j | q_t = S_i): the probability of a transition from state i to state j.
Observation Probabilities B = {b_j(k)}, where b_j(k) = P(O_t = o_k | q_t = S_j): the probability of observing k in state j.
Initial State Distribution π = {π_i}, where π_i = P(q_1 = S_i).

18 Outline
Hidden Markov Models
- Markov
- Hidden Markov Model (HMM)
- HMM Applications: Probability Evaluation

19 HMM Applications: Probability Evaluation
[Problem Definition (Probability Evaluation)] Given an observation sequence O = o_1, o_2, o_3, ... and an HMM model λ = (A, B, π), find the probability that the observation sequence is generated by the model, i.e., compute P(O | λ). (Among several candidate models, the sequence is attributed to the one with the highest probability.)
[Example] We toss a coin with HMM model λ = (A, B, π), and we want the probability of observing O = (T, H, T).

20 HMM Applications: Probability Evaluation
[Problem Definition (Probability Evaluation)] Given O and an HMM model λ = (A, B, π), compute P(O | λ).
[Example] We toss a coin with HMM model λ = (A, B, π), and we want the probability of observing the sequence O = (T, H, T). The given model λ = (A, B, π) is:
A = [ 1/3 1/3 1/3 ; 0 1/2 1/2 ; 0 0 1 ]
B = [ 1 0 ; 1/2 1/2 ; 1/3 2/3 ] (rows: states 1-3; columns: H, T)
π = [ 1/3 1/3 1/3 ]

21 HMM Applications: Probability Evaluation
[Example] (continued) The given model λ = (A, B, π) as a transition diagram:
[Transition Diagram] State 1 (P[H] = 1, P[T] = 0): self-loop 1/3, edges 1/3 to state 2 and 1/3 to state 3. State 2 (P[H] = 1/2, P[T] = 1/2): self-loop 1/2, edge 1/2 to state 3. State 3 (P[H] = 1/3, P[T] = 2/3): self-loop 1.
A = [ 1/3 1/3 1/3 ; 0 1/2 1/2 ; 0 0 1 ], B = [ 1 0 ; 1/2 1/2 ; 1/3 2/3 ], π = [ 1/3 1/3 1/3 ]

22 HMM Applications: Probability Evaluation
[Example] (continued)
[Trellis] (diagram: State 1 (P[H] = 1, P[T] = 0), State 2 (P[H] = 1/2, P[T] = 1/2), and State 3 (P[H] = 1/3, P[T] = 2/3), unrolled over t = 0, 1, 2)

23 HMM Applications: Probability Evaluation
[Probability Evaluation] Enumerate the state paths with non-zero probability (paths through state 1 contribute 0 at the T observations because P[T] = 0 in state 1):
[Case 1] State 2 → State 2 → State 2: P_1(T, H, T) = π_2 b_2(o_1 = T) a_22 b_2(o_2 = H) a_22 b_2(o_3 = T) = (1/3)(1/2)(1/2)(1/2)(1/2)(1/2) = 1/96 ≈ 0.0104
[Case 2] State 2 → State 2 → State 3: P_2(T, H, T) = π_2 b_2(o_1 = T) a_22 b_2(o_2 = H) a_23 b_3(o_3 = T) = (1/3)(1/2)(1/2)(1/2)(1/2)(2/3) = 1/72 ≈ 0.0139
[Case 3] State 2 → State 3 → State 3: P_3(T, H, T) = π_2 b_2(o_1 = T) a_23 b_3(o_2 = H) a_33 b_3(o_3 = T) = (1/3)(1/2)(1/2)(1/3)(1)(2/3) = 1/54 ≈ 0.0185
[Case 4] State 3 → State 3 → State 3: P_4(T, H, T) = π_3 b_3(o_1 = T) a_33 b_3(o_2 = H) a_33 b_3(o_3 = T) = (1/3)(2/3)(1)(1/3)(1)(2/3) = 4/81 ≈ 0.0494
P(O) = Σ_{i=1}^4 P_i(T, H, T) = 239/2592 ≈ 0.0922
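The four cases can be cross-checked with a brute-force sketch (assuming NumPy and the model λ given above); paths through state 1 drop out because b_1(T) = 0:

```python
# Enumerate every state path of length 3 and sum P(path) * P(O | path).
from itertools import product
import numpy as np

A  = np.array([[1/3, 1/3, 1/3],
               [0,   1/2, 1/2],
               [0,   0,   1  ]])
B  = np.array([[1,   0  ],      # columns: H, T
               [1/2, 1/2],
               [1/3, 2/3]])
pi = np.array([1/3, 1/3, 1/3])
O  = [1, 0, 1]                  # observations T, H, T

total = 0.0
for path in product(range(3), repeat=3):
    p = pi[path[0]] * B[path[0], O[0]]
    for t in range(1, 3):
        p *= A[path[t-1], path[t]] * B[path[t], O[t]]
    total += p

print(total)  # 239/2592 ≈ 0.0922 (only the 4 cases above are non-zero)
```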

24 HMM Applications: Probability Evaluation
Forward Algorithm for Probability Evaluation
Step 1) Initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ 3
t = 0:
α_1(1) = π_1 b_1(o_1 = T) = (1/3)(0) = 0
α_1(2) = π_2 b_2(o_1 = T) = (1/3)(1/2) = 1/6
α_1(3) = π_3 b_3(o_1 = T) = (1/3)(2/3) = 2/9

25 HMM Applications: Probability Evaluation
Forward Algorithm for Probability Evaluation
Step 2) Derivation: α_{t+1}(j) = [Σ_{i=1}^3 α_t(i) a_ij] b_j(o_{t+1}), 1 ≤ t ≤ 2, 1 ≤ j ≤ 3
t = 1:
α_2(1) = [Σ_{i=1}^3 α_1(i) a_i1] b_1(o_2 = H) = (0)(1) = 0
α_2(2) = [Σ_{i=1}^3 α_1(i) a_i2] b_2(o_2 = H) = (1/6 × 1/2)(1/2) = 1/24
α_2(3) = [Σ_{i=1}^3 α_1(i) a_i3] b_3(o_2 = H) = (1/6 × 1/2 + 2/9 × 1)(1/3) = 11/108

26 HMM Applications: Probability Evaluation
Forward Algorithm for Probability Evaluation
Step 2) Derivation (continued): α_{t+1}(j) = [Σ_{i=1}^3 α_t(i) a_ij] b_j(o_{t+1})
t = 2:
α_3(1) = [Σ_{i=1}^3 α_2(i) a_i1] b_1(o_3 = T) = (0)(0) = 0
α_3(2) = [Σ_{i=1}^3 α_2(i) a_i2] b_2(o_3 = T) = (1/24 × 1/2)(1/2) = 1/96
α_3(3) = [Σ_{i=1}^3 α_2(i) a_i3] b_3(o_3 = T) = (1/24 × 1/2 + 11/108 × 1)(2/3) = 53/648

27 HMM Applications: Probability Evaluation
Forward Algorithm for Probability Evaluation
Step 3) Termination: P(O | λ) = Σ_{i=1}^3 α_3(i)
P(O | λ) = Σ_{i=1}^3 α_3(i) = 0 + 1/96 + 53/648 = 239/2592 ≈ 0.0922, matching the case-by-case enumeration.
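The three steps map directly onto a few lines of code. A sketch of the forward algorithm (assuming NumPy), using the coin-toss model above:

```python
# Forward algorithm: reproduces alpha_1 = (0, 1/6, 2/9) and P(O|lambda).
import numpy as np

A  = np.array([[1/3, 1/3, 1/3],
               [0,   1/2, 1/2],
               [0,   0,   1  ]])
B  = np.array([[1,   0  ],      # columns: H, T
               [1/2, 1/2],
               [1/3, 2/3]])
pi = np.array([1/3, 1/3, 1/3])
O  = [1, 0, 1]                  # observations T, H, T

alpha = pi * B[:, O[0]]               # Step 1: initialization
for t in range(1, len(O)):
    alpha = (alpha @ A) * B[:, O[t]]  # Step 2: recursion over time
print(alpha.sum())                    # Step 3: termination, ≈ 0.0922
```

The recursion touches each pair of states once per time step, so the cost is O(N²T) rather than enumerating all N^T paths as in the brute-force evaluation.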

28 Hidden Markov Models (HMM) and Support Vector Machine (SVM) Part 2: Markov Decision Process Professor Joongheon Kim School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea

29 Outline
Markov Decision Process (MDP)
- Basics
- Markov Property
- Policy and Return
- Value Functions (V, Q)
Solving MDP
- Planning
- Reinforcement Learning (Value-based)
- Reinforcement Learning (Policy-based): advanced topic (out of scope)

30 MDP (Basics)
Markov Decision Process (MDP) Components: <S, A, R, T, γ>
S: set of states; A: set of actions; R: reward function; T: transition function; γ: discount factor
How can we use an MDP to model an agent in a maze?

31 MDP (Basics)
MDP Components: <S, A, R, T, γ>
S: location (x, y) if the maze is a 2D grid
s_0: starting state; s: current state; s': next state; s_t: state at time t

32 MDP (Basics)
MDP Components: <S, A, R, T, γ>
S: location (x, y) if the maze is a 2D grid
A: move up, down, left, or right (taking an action moves the agent from s to s')

33 MDP (Basics)
MDP Components: <S, A, R, T, γ>
S: location (x, y); A: move up, down, left, or right
R: how good was the chosen action? r = R(s, a, s'), e.g., -1 for moving (battery used), +1 for a jewel, +100 for the exit

34 MDP (Basics)
MDP Components: <S, A, R, T, γ>
S: location (x, y); A: move up, down, left, or right; R: how good was the chosen action?
T: where is the robot's new location? T(s' | s, a): a stochastic transition

35 MDP (Basics)
MDP Components: <S, A, R, T, γ>
S: location (x, y); A: move up, down, left, or right; R: how good was the chosen action? T: where is the robot's new location?
γ: how much is future reward worth? 0 ≤ γ ≤ 1 (as γ → 0, future reward is worth near 0, so immediate reward is preferred)

36 MDP (Markov Property)
Does s_{t+1} depend on the full history s_0, s_1, ..., s_{t-1}? No. Memoryless!
- The future depends only on the present.
- The current state is a sufficient statistic of the agent's history; there is no need to remember the history.
- s_{t+1} depends only on s_t and a_t.
- r_t depends only on s_t and a_t.

37 MDP (Policy and Return)
Policy π: S → A
- Maps states to actions; gives an action for every state.
Return
- Discounted sum of rewards: R_t = Σ_{k=0}^∞ γ^k r_{t+k}
- Could be undiscounted over a finite horizon.
Our goal: find the π that maximizes the expected return!
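As a quick illustration, a tiny sketch of the discounted return; the reward sequence is made up, echoing the maze rewards (-1 per move, +100 at the exit):

```python
# Discounted return R_t = sum_k gamma^k r_{t+k} for a finite reward list.
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        g = r + gamma * g
    return g

print(discounted_return([-1, -1, +100]))  # -1 + 0.9*(-1) + 0.81*100 = 79.1
```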

38 MDP (Value Functions (V, Q))
State Value Function (V)
V^π(s) = E_π[R_t | s_t = s] = E_π[Σ_{k=0}^∞ γ^k r_{t+k} | s_t = s]
- Expected return when starting at state s and following policy π: how much return do I expect starting from state s?
Action Value Function (Q)
Q^π(s, a) = E_π[R_t | s_t = s, a_t = a] = E_π[Σ_{k=0}^∞ γ^k r_{t+k} | s_t = s, a_t = a]
- Expected return when starting at state s, taking action a, and then following policy π: how much return do I expect starting from state s and taking action a?

39 MDP (Solving MDP: Planning)
Again, our goal is to find the optimal policy π*(s) = argmax_π V^π(s).
If T(s' | s, a) and R(s, a, s') are known, this is a planning problem. We can use dynamic programming to find the optimal policy.
Keywords: Bellman equation, value iteration, policy iteration

40 MDP (Solving MDP: Planning)
Bellman Equation
∀s ∈ S: V(s) = max_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V(s')]
Value Iteration
∀s ∈ S: V_{i+1}(s) ← max_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V_i(s')]
Policy Iteration
- Policy Evaluation: ∀s ∈ S: V_{i+1}^{π_k}(s) ← Σ_{s'} T(s, π_k(s), s') [R(s, π_k(s), s') + γ V_i^{π_k}(s')]
- Policy Improvement: π_{k+1}(s) = argmax_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V^{π_k}(s')]
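A minimal value-iteration sketch on a hypothetical two-state, two-action MDP (T, R, and γ here are made-up numbers, not from the slides), assuming NumPy:

```python
# Value iteration: V_{i+1}(s) = max_a sum_s' T(s,a,s')[R(s,a,s') + gamma V_i(s')]
import numpy as np

nS, nA, gamma = 2, 2, 0.9
T = np.zeros((nS, nA, nS))      # T[s, a, s'] = P(s' | s, a)
T[0, 0] = [0.9, 0.1]; T[0, 1] = [0.2, 0.8]
T[1, 0] = [0.0, 1.0]; T[1, 1] = [0.7, 0.3]
R = np.zeros((nS, nA, nS))      # R[s, a, s'] = reward
R[:, :, 1] = 1.0                # reward 1 for landing in state 1

V = np.zeros(nS)
for _ in range(1000):
    Q = (T * (R + gamma * V)).sum(axis=2)  # Q[s, a], expectation over s'
    V_new = Q.max(axis=1)                  # Bellman backup
    if np.max(np.abs(V_new - V)) < 1e-10:  # stop when values converge
        break
    V = V_new

policy = Q.argmax(axis=1)                  # greedy policy w.r.t. V
print(V, policy)
```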

41 MDP (Solving MDP: Reinforcement Learning (Value-based))
If T(s' | s, a) and R(s, a, s') are unknown, this is a reinforcement learning problem. The agent needs to interact with the world and gather experience. At each time step, from state s, it takes action a (a = π(s) if the policy is deterministic), receives reward r, and ends in state s'.
Value-based: learn an optimal value function from these data.

42 MDP (Solving MDP: Reinforcement Learning (Value-based))
One way to learn Q(s, a): use the empirical mean return instead of the expected return, i.e., average the sampled returns:
Q(s, a) = (R_1(s, a) + R_2(s, a) + ... + R_n(s, a)) / n
The policy chooses the action that maximizes Q(s, a): π(s) = argmax_a Q(s, a).
Using V(s) instead would require the model: π(s) = argmax_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V(s')]
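A sketch of the empirical-mean idea (the environment interaction is left out; the recorded returns and state/action names are hypothetical):

```python
# Average sampled returns per (s, a); act greedily w.r.t. the estimate.
from collections import defaultdict

returns = defaultdict(list)   # (s, a) -> list of sampled returns R_i(s, a)

def record(s, a, sampled_return):
    returns[(s, a)].append(sampled_return)

def Q(s, a):
    rs = returns[(s, a)]
    return sum(rs) / len(rs) if rs else 0.0   # empirical mean return

def greedy_action(s, actions):
    return max(actions, key=lambda a: Q(s, a))  # pi(s) = argmax_a Q(s, a)

record("s0", "left", 1.0); record("s0", "left", 0.0); record("s0", "right", 2.0)
print(Q("s0", "left"), greedy_action("s0", ["left", "right"]))  # 0.5 right
```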

43 Hidden Markov Models (HMM) and Support Vector Machine (SVM) Part 3: Support Vector Machine Professor Joongheon Kim School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea

44 Outline
- Main Idea
- Hyperplane in n-dimensional Space
- Brief Introduction to Optimization for Support Vector Machine (SVM)
- SVM for Classification

45 Main Idea
How can we classify the given data? Any of the candidate separating hyperplanes would be fine. But which is the best?

46 Main Idea
Find a linear decision surface (hyperplane) that can separate the patient classes and has the largest distance (i.e., the largest gap, or margin) between border-line patients (i.e., support vectors).
(figure: Normal Patients vs. Cancer Patients plotted on Gene X and Gene Y axes, separated by a gap)

47 Main Idea
Kernel: if a linear decision surface does not exist, the data is mapped into a higher-dimensional space (feature space) where a separating decision surface is found. The feature space is constructed via a mathematical projection (the kernel trick).

48 Outline
- Main Idea
- Hyperplane in n-dimensional Space
- Brief Introduction to Optimization for Support Vector Machine (SVM)
- SVM for Classification

49 Hyperplane in n-dimensional Space
[Definition (Hyperplane)] A subspace of one dimension less than its ambient space, i.e., a hyperplane in n-dimensional space is an (n-1)-dimensional subspace.

50 Hyperplane in n-dimensional Space
Equations of a Hyperplane
An equation of a hyperplane is defined by a point P_0 and a vector w perpendicular to the plane at that point.
Define the vectors x_0 = OP_0 and x = OP, where P is an arbitrary point on the hyperplane. A condition for P to be on the plane is that the vector x - x_0 is perpendicular to w:
w · (x - x_0) = 0 ⇔ w · x - w · x_0 = 0; defining b = -w · x_0 gives w · x + b = 0.
The above equations hold for R^n when n > 3.

51 Hyperplane in n-dimensional Space
Equations of a Hyperplane: distance between the two parallel hyperplanes w · x + b_1 = 0 and w · x + b_2 = 0.
Write x_2 = x_1 + t w, so D = ‖t w‖ = |t| ‖w‖. Then:
w · x_2 + b_2 = 0
⇒ w · (x_1 + t w) + b_2 = 0
⇒ w · x_1 + t ‖w‖² + b_2 = 0
⇒ (w · x_1 + b_1) - b_1 + t ‖w‖² + b_2 = 0
⇒ -b_1 + t ‖w‖² + b_2 = 0
⇒ t = (b_1 - b_2) / ‖w‖²
Therefore, D = |t| ‖w‖ = |b_1 - b_2| / ‖w‖.
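A numeric sketch of this formula (assuming NumPy; the values of w, b_1, b_2 are made up):

```python
# D = |b1 - b2| / ||w|| for parallel hyperplanes w.x + b1 = 0, w.x + b2 = 0.
import numpy as np

w, b1, b2 = np.array([3.0, 4.0]), 1.0, -9.0
D = abs(b1 - b2) / np.linalg.norm(w)
print(D)  # 10 / 5 = 2.0

# Cross-check: move a point on the first plane along w onto the second.
x1 = np.array([0.0, -b1 / w[1]])        # satisfies w.x1 + b1 = 0
t = (b1 - b2) / np.linalg.norm(w)**2    # from the derivation above
x2 = x1 + t * w                         # lies on w.x + b2 = 0
print(np.dot(w, x2) + b2, np.linalg.norm(x2 - x1))  # ~0.0, 2.0
```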

52 Outline
- Main Idea
- Hyperplane in n-dimensional Space
- Brief Introduction to Optimization for Support Vector Machine (SVM)
- SVM for Classification

53 Brief Introduction to Optimization for Support Vector Machine
Now we understand:
- how to represent data (vectors);
- how to define a linear decision surface (hyperplane).
We still need to understand:
- how to efficiently compute the hyperplane that separates two classes with the largest gap.
This requires the basics of the relevant optimization theory.

54 Brief Introduction to Optimization for Support Vector Machine
Convex Functions: a function is called convex if, for any two points in the interval, the function lies below the straight line segment connecting those two points.
Property: any local minimum is a global minimum.

55 Brief Introduction to Optimization for Support Vector Machine
Quadratic Programming (QP): a special optimization problem in which the function to optimize (the objective) is quadratic, subject to linear constraints.
Convex QP problems have convex objective functions. These problems can be solved easily and efficiently by greedy (descent) algorithms, because every local minimum is a global minimum.

56 Brief Introduction to Optimization for Support Vector Machine
Quadratic Programming (QP)
[Example] Consider x = (x_1, x_2):
Minimize (1/2) ‖x‖₂² subject to x_1 + x_2 ≥ 1
(quadratic objective, linear constraint)
Written out component-wise: minimize (1/2) x_1² + (1/2) x_2² subject to x_1 + x_2 ≥ 1.
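This toy QP can be handed to a generic solver. A sketch assuming SciPy is available; the minimizer is (1/2, 1/2), where the constraint is active:

```python
# Minimize (1/2)||x||^2 subject to x1 + x2 >= 1.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: 0.5 * np.dot(x, x)
constraint = {"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}  # >= 0

res = minimize(objective, x0=np.zeros(2), constraints=[constraint])
print(res.x)  # ≈ [0.5, 0.5]
```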

57 Outline
- Main Idea
- Hyperplane in n-dimensional Space
- Brief Introduction to Optimization for Support Vector Machine (SVM)
- SVM for Classification

58 SVM for Classification
SVM for Classification
(Case 1) Linearly Separable Data; Hard-Margin Linear SVM
(Case 2) Not Linearly Separable Data; Soft-Margin Linear SVM
(Case 3) Not Linearly Separable Data; Kernel Trick

59 SVM for Classification
(Case 1) Linearly Separable Data; Hard-Margin Linear SVM
We want to find a classifier (hyperplane) that separates the negative instances from the positive ones. An infinite number of such hyperplanes exist. SVM finds the hyperplane that maximizes the gap between the data points on the boundaries (the so-called support vectors). If the points on the boundaries are not informative (e.g., due to noise), SVM will not do well.

60 SVM for Classification
(Case 1) Linearly Separable Data; Hard-Margin Linear SVM
The gap is the distance between the two parallel hyperplanes w · x + b = -1 and w · x + b = +1. From the previous result, D = |b_1 - b_2| / ‖w‖, so here D = 2 / ‖w‖. Since we want to maximize the gap, we have to minimize ‖w‖, or equivalently minimize (1/2) ‖w‖².

61 SVM for Classification
(Case 1) Linearly Separable Data; Hard-Margin Linear SVM
In addition, we need to impose the constraint that all instances are correctly classified. In our case:
w · x_i + b ≤ -1 if y_i = -1
w · x_i + b ≥ +1 if y_i = +1
or equivalently, y_i (w · x_i + b) ≥ 1.
In summary: minimize (1/2) ‖w‖² subject to y_i (w · x_i + b) ≥ 1, for i = 1, ..., N.
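The primal problem above can be fed directly to a generic constrained optimizer. A sketch (assuming SciPy; the four 2-D training points are made up for illustration):

```python
# Hard margin: minimize (1/2)||w||^2 subject to y_i (w.x_i + b) >= 1.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([+1, +1, -1, -1])

def objective(v):               # v = (w1, w2, b)
    return 0.5 * np.dot(v[:2], v[:2])

cons = [{"type": "ineq", "fun": lambda v, i=i: y[i] * (X[i] @ v[:2] + v[2]) - 1}
        for i in range(len(y))]

res = minimize(objective, x0=np.zeros(3), constraints=cons)
w, b = res.x[:2], res.x[2]
print(w, b, y * (X @ w + b))    # all margins >= 1; support vectors hit 1
```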

62 SVM for Classification
(Case 2) Not Linearly Separable Data; Soft-Margin Linear SVM
What if the data is not linearly separable, e.g., there are outliers or noisy measurements, or the data is slightly non-linear?
Approach: assign a slack variable ξ_i ≥ 0 to each instance, which can be thought of as the distance from the separating hyperplane if an instance is misclassified and 0 otherwise.
Minimize (1/2) ‖w‖² + C Σ_{i=1}^N ξ_i subject to y_i (w · x_i + b) ≥ 1 - ξ_i, for i = 1, ..., N.
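A soft-margin sketch using scikit-learn (assumed available; the overlapping two-class data is synthetic). The penalty parameter C trades the margin term against the total slack Σξ_i:

```python
# Soft-margin linear SVM: small C tolerates more slack (wider margin).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(2.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)   # overlapping classes -> slack needed

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.n_support_)  # small C: many support vectors inside margin
```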

63 SVM for Classification
(Case 3) Not Linearly Separable Data; Kernel Trick
The data is not linearly separable in the input space, but it is linearly separable in the feature space obtained by a kernel.
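A kernel-trick sketch using scikit-learn (assumed available), on a synthetic concentric-circles dataset that no line can separate in the input space:

```python
# RBF kernel separates data that is not linearly separable in input space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf    = SVC(kernel="rbf").fit(X, y)
print(linear.score(X, y))  # near 0.5: a line cannot separate the circles
print(rbf.score(X, y))     # near 1.0: separable in the kernel feature space
```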

64 Questions?
