Planning and Optimal Control

Size: px

Start display at page:

Download "Planning and Optimal Control"

Jemimah Evans
5 years ago
Views:

1 Planning and Optimal Control 1. Markov Models Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 / 22

2 Syllabus Tue (1) 1. Markov Models Tue Luther Day Tue (2) 1b. Markov Models (ctd.) Tue (3) 2. State Space Models Tue (4) 3. Markov Random Fields Tue (5) 4. Markov Decision Processes Tue (6) Tue (7) 5. Partially Observable Markov Decision Processes Tue (8) Tue Christmas Break Tue (9) 6. Reinforcement Learning Tue (10) Tue (11) Tue (12) Tue (13) 1 / 22

3 Outline 1. ML Problems for Sequence Data 2. Markov Models 3. Organizational Stuff 1 / 22

4 1. ML Problems for Sequence Data Outline 1. ML Problems for Sequence Data 2. Markov Models 3. Organizational Stuff 1 / 22

5 1. ML Problems for Sequence Data Sequence Data Examples: DNA sequence sentences and texts physical sensor data from machines in production: intelligent production, industry 4.0 physiological data from humans: ML for medicine from cars: intelligent transport, automatic driving speech, audio, video information systems e-commerce and the web: page view sequences, market basket sequences social media: short message streams technology enhanced learning: learning management / student interactions 1 / 22

6 1. ML Problems for Sequence Data Sequence Data other names: time series: usually measured quantity is numeric usually index is time data stream: usually index is time usually data is large (big data) 2 / 22

7 1. ML Problems for Sequence Data 1. Classification/Regression/Prediction of a Sequence predict a target variable for instances being sequences input is a sequence output usually is a scalar examples: classify EEGs of patients as depressed or healthy (classification) predict the rating of a text review (regression, for a numeric rating scale) most evolved area: time series classification 3 / 22

8 1. ML Problems for Sequence Data 2. Forecasting of a Sequence predict the value of a sequence in the future input is a sequence output is a scalar (of same type as the input) examples: predict sales of a company for next quarter (based on past sales) very rich economic literature on time series forecasting (econometrics) often for a single very long time series closely related to 2b. sequence imputation estimate values of a sequence at some positions where the value is missing 4 / 22

9 1. ML Problems for Sequence Data 3. Sequence Prediction for instances, predict a sequence valued target input is an attribute vector or a sequence output is a sequence examples: predict sequence of exercises a student should work on to learn most predict sequence of ad expenses for a company to sell most predict sequence of steering wheel movements to keep a car on a lane planning is a special case likely the most important one from ML perspective, sequence prediction is a special case of structured prediction forecasting for several time points is another special case 5 / 22

10 1. ML Problems for Sequence Data 4. Sequence Labeling predict a target for each index of a sequence input is a sequence output is a sequence of same length examples: predict sequence of part-of-speech classes for every word of a sentence 6 / 22

11 1. ML Problems for Sequence Data Density estimation Given a dataset D train X sampled from an unknown distribution p, find a density model ˆp : X [0, 1] from a model space M s.t. E x p ˆp(x) E x p ˆq(x), ˆq M Operational: s.t. for data D test X sampled from the same distribution, ˆp(x) ˆq(x), ˆq M x D test x D test 7 / 22

12 1. ML Problems for Sequence Data What are Density Models Good for? outlier analysis: the smaller ˆp(x), the more unlikely/uncommon x is this is an unsupervised / ill-defined problem missing value imputation: given incomplete instances x (with values of some attributes not observed), find the values of the non-observed attributes = find the most likely complete instance x that has the same values as x for the observed attributes classification/regression/prediction: build a class-specific density p(x Y ) for instances of each class and use Bayes rule: p(y X ) p(x Y ) p(y ) as Linear Discriminant Analysis and Naive Bayes classifiers 8 / 22

13 1. ML Problems for Sequence Data Why are Naive Bayes Densities not Useful for Sequences? Density models in Naive Bayes: ˆp(X ) := M ˆp(X m ) m=1 p(x m ) := freq(x m, proj m D train ) + 1 D train + K m, for discrete x m with K m levels p(x m ) := N (x m ; x m, σ 2 m), Applied to sequence data: for continuous x m with average x m and variance σ density value does not depend on the order of the values Note: proj m : M m=1 Xm Xm, x xm projection and proj m D := {proj m (x) x D} for D X := M m=1 Xm. 9 / 22

14 2. Markov Models Outline 1. ML Problems for Sequence Data 2. Markov Models 3. Organizational Stuff 10 / 22

15 2. Markov Models Markov Model p(x) := p(x 1 )p(x 2 x 1 )p(x 3 x 2 ) p(x T x T 1 ) T = p(x 1 ) p(x t x t 1 ), x X t=2 Markov model, Markov chain homogeneous, stationary, time-invariant: p(x t+1 x t ) does not depend on t, i.e., p(x t+1 x t ) = p(x t +1 x t ) t, t parameter tying: same parameters shared for multiple variables models arbitrary number of variables using a fixed number of parameters: stochastic process discrete-state, finite-state: X := {1,..., I } 10 / 22

16 2. Markov Models Transition Matrix for discrete-state Markov models: A := (p(x t+1 = j x t = i)) i,j=1,...,i π := (p(x 1 = i)) i=1,...,i I I transition matrix I -dim. start vector stochastic matrix: j A i,j = 1 discrete-state, stationary Markov models: equivalent to a stochastic automaton 11 / 22

17 2. Markov Models Transition Matrix / State Transition Diagram discrete-state, stationary Markov models: visualized as state transition diagram: directed graph with states as nodes and edges for non-zero elements of A examples: 590 Chapter 17. Markov and hidden Markov mo 1 α α 1 β 1 2 β (a) A 11 A 22 A 33 A 12 A (b) [source: Murphy 2012, p.590] Figure 17.1 State transition diagrams for some simple Markov chains. Left: a 2-state chain. Righ 3-state left-to-right ( chain. ) 1 α α 0 1 α α a) A :=, b) A := 0 1 β β, β 1 β A stationary, finite-state Markov chain is equivalent to a stochastic 0 automaton. 0 1 It is comm to visualize such automata by drawing a directed graph, where nodes represent states and arr represent legal transitions, i.e., non-zero elements of A. This is known as a state 12 transit / 22

18 2. Markov Models n-step Transition Matrix A(n) get from i to j in exactly n steps A(n) := (p(x t+n = j x t = i)) i,j=1,...,i can be computed simply by proof: A(n) = A n A(1) = A I A(n + m) i,j = A(m) i,k A(n) k,j = A(m) i,. A(n).,j k=1 A(n + m) = A(m)A(n) A(n) = AA n 1 = AAA (n 2) =... = A n 13 / 22

19 2. Markov Models n-grams / subsequences n-grams: (=subsequences of length n, windows) example: gram n : X (X n ) x (x t:t+n 1 ) t=1,..., x n+1 gram 2 ((2, 3, 5, 7)) = ((2, 3), (3, 5), (5, 7)) 14 / 22

20 2. Markov Models Frequencies 592 of 1- and 2-grams Chapter 17. Markov and hidden Markov models _ a b c d e f g h i j k l m n o p q r s t u v w x y z Unigrams Bigrams _ a b c d e f g h i j k l m n o p q r s t u v w x y z _ a b c d e f g h i j k l m n o p q r s t u v w x y z letter grams in Darwin s On the Origin of Species. Figure 17.2 [source: Murphy 2012, p.592] Unigram and bigram counts from Darwin s On The Origin Of Species. The 2D picture on the 15 / 22

21 2. Markov Models Maximum Likelihood Estimator l(a; D) := log x 1 π x1 x D t=1 I I = Ni 1 log π i + i=1 N 1 i := freq(i, proj 1 D) = A xt,x t+1 i=1 j=1 I N i,j log A i,j N I(x n,1 = i) n=1 N i,j := freq((i, j), gram 2 D) = N n=1 x n 1 t=1 I(x n,t = i, x n,t+1 = j) 16 / 22

22 2. Markov Models Maximum Likelihood Estimator I I I l(a; D) = Ni 1 log π i + N i,j log A i,j i=1 i=1 j=1 under constraints i π i = 1 and j A i,j = 1 maximal for ˆπ i := Â i,j := N 1 i I i =1 N1 i, N i,j I j =1 N i,j, i = 1,..., I i, j = 1,..., I or to avoid zeros in A, esp. for large I, sparse data: Â i,j := N i,j + 1 ( I j =1 N i,j ) + I, i, j = 1,..., I 17 / 22

23 2. Markov Models Long-Range Dependencies: Markov Models of Higher Order Markov models have no memory future sequence depends on the past only through the last state easy to model dependencies on the last h 1 states: replace each data sequence x by the sequence gram h (x) I h I h transition matrix from sequences X h to X h but with structural zeros for all i, j with i 2:h j 1:h 1 yields a I h I transition matrix from sequences X h to X I h dim. start vector Markov model mechanism works out-of-the-box, e.g., MLE estimates number of parameters exponential in h data sizes usually allow only small h 18 / 22

24 3. Organizational Stuff Outline 1. ML Problems for Sequence Data 2. Markov Models 3. Organizational Stuff 19 / 22

25 3. Organizational Stuff Character of the Lecture This is an advanced lecture: I will assume good knowledge of Machine Learning I and II. Slides will contain major keywords, not the full story. For the full story, you need to read the referenced chapters in one of the books. 19 / 22

26 3. Organizational Stuff Exercises and Tutorials There will be a weekly sheet with 2 exercises handed out each Tuesday in the lecture. 1st sheet will be handed out a little bit late this week, Thur Solutions to the exercises can be submitted until next Tuesday noon, 12pm 1st sheet is due a little bit late Wed morning, 8am Exercises will be corrected. Tutorials each Thursday 8am-10am or Friday 12pm-2pm, 1st tutorial next week, Thur Successful participation in the tutorial gives up to 10% bonus points for the exam. group submissions are OK (but yield no bonus points) Plagiarism is illegal and usually leads to expulsion from the program. 20 / 22

27 3. Organizational Stuff Exam and Credit Points There will be a written exam at end of term (2h, 4 problems). The course gives 6 ECTS (2+2 SWS). The course can be used in International Master in Data Analytics (mandatory) IMIT MSc. / Informatik / Gebiet KI & ML Wirtschaftsinformatik MSc / Informatik / Gebiet KI & ML & Wirtschaftsinformatik MSc / Wirtschaftsinformatik / Gebiet BI as well as in all IT BSc programs. 21 / 22

28 3. Organizational Stuff Some Books Kevin P. Murphy (2012): Machine Learning, A Probabilistic Approach, MIT Press. H. Geffner, B. Bonet (2013): A Concise Introduction to Models and Methods for Automated Planning. D. Nau, M. Ghallab, P. Traverso (2004): Automated Planning: Theory and Practice. Steve LaValle (2006): Planning Algorithms. Dimitri P. Bertsekas (2007): Dynamic Programming and Optimal Control, 3rd ed. Vols. I and II. Richard S. Sutton and Andrew G. Barto. (1998): Reinforcement Learning: An Introduction. 22 / 22

29 Further Readings Markov Models: Murphy 2012, chapter / 22

30 References Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, / 22

Modern Optimization Techniques

Modern Optimization Techniques 0. Overview Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 1 / 44 Syllabus Mon.