Learning from Temporal Data Using Dynamical Feature Space


1 Learning from Temporal Data Using Dynamical Feature Space Peter Tiňo School of Computer Science University of Birmingham, UK Learning from Temporal Data Using Dynamical Feature Space p.1/64

2 Special thanks to... Barbara Hammer Huanhuan Chen Alessio Micheli Michal Čerňanský Luba Beňúšková Jort van Mourik Igor Farkaš Ali Rodan Jochen Steil Xin Yao Nick Gianniotis Dan Cornford Manfred Opper Fengzhen Tang,... Learning from Temporal Data Using Dynamical Feature Space p.2/64

3 Learning in the data or model space? Complex data, e.g. - long (possibly multivariate) time series (video, sensor data, etc.) of variable length - fMRI data measured on normalized cognitive tasks - ... We kind of know what to do for fixed-dimensional vectorial data, but how do we represent such complex data in order to usefully "learn" from it? The critical notion of "smoothness" when learning from historic data: "close" things should lead to similar responses. "Closeness" - distance, dot product, ... How should we represent complex data items in the light of the "smoothness" criterion? Learning from Temporal Data Using Dynamical Feature Space p.3/64

4 Learning in the Model Space Framework Each data item is represented by a model that "explains" it. Learning is formulated in the space of models (a function space). Parametrization-free notions of "input similarity": similarity is quantified between the models, not their (arbitrary) parametrizations! Key consideration - the base model class: - flexible enough to represent a variety of data items - sufficiently constrained to avoid overfitting Learning from Temporal Data Using Dynamical Feature Space p.4/64

5 NARMA task - illustrative example NARMA sequences of orders 10, 20 and 30, represented as state space models. [Figure: 2D visualization of the sequence models; legend: order 10, order 20, order 30] Learning from Temporal Data Using Dynamical Feature Space p.5/64

6 Representing Sequential Data Each sequence will be represented by a dedicated model that captures everything important about the sequence in a compact manner. If we'd like to change focus to different aspects of the sequences, we need to change the base model accordingly. What should be our base model? Many possible choices, but here we'll consider state space models - a very general and powerful model class capturing temporal structure through the notion of information processing state (IPS). The IPS at time t codes for the entire history of time series items we have seen up to time t. Learning from Temporal Data Using Dynamical Feature Space p.6/64

7 Discrete Time State Space Model [Diagram: parametrized dynamical system with delay loop, parameters W, and an output readout] The states of the DS form the IPS. These states code the entire history of sequence elements we have seen so far. Learning from Temporal Data Using Dynamical Feature Space p.7/64

8 A State Space Model N-dimensional continuous state space [-1,1]^N: x(t) = f(x(t-1), u(t)) = σ(R x(t-1) + V u(t) + b), y(t) = h(x(t)) = κ(W x(t) + a), where x(t) ∈ [-1,1]^N is the state vector at time t, u(t) is the input time series element at time t, f(·) is the state-transition function, y(t) is the output of the readout from the state space, and h(·) is the readout (output) function. Learning from Temporal Data Using Dynamical Feature Space p.8/64
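A minimal NumPy sketch of one step of such a model (assuming tanh for σ and, as in the later slides, the identity for κ; the function names and the vector-valued input are illustrative):

```python
import numpy as np

def state_step(x_prev, u_t, R, V, b):
    # x(t) = sigma(R x(t-1) + V u(t) + b), with sigma = tanh applied elementwise
    # x_prev: state vector x(t-1), u_t: input vector u(t)
    return np.tanh(R @ x_prev + V @ u_t + b)

def readout(x_t, W, a):
    # y(t) = kappa(W x(t) + a); kappa taken here as the identity (linear readout)
    return W @ x_t + a
```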

9 Another State Space Model - Transducer Translates input streams over {a,b} into sequences over {x,y}. [Diagram: two-state finite-state transducer; transitions labelled with input/output pairs a/x, b/y (state 1) and a/x, b/x (state 2)] Learning from Temporal Data Using Dynamical Feature Space p.9/64

10 Picking the Right Base Model An appealing feature of the previous example: the state space structure and dynamics are perfectly tailored to the particular task. Of course, in our general scenario we may not know (and typically this is the case!) the right state space structure that extracts from the data (time series) what is needed for the particular task: - What are the appropriate IPS? - What is their transition structure? Since we are ML people, the answer seems obvious: learn everything!... easier said than done... Learning from Temporal Data Using Dynamical Feature Space p.10/64

11 Picking the Right Base Model Things are actually much worse! If we adopted a general parameterized model x(t) = f(x(t-1), u(t)) = σ(R x(t-1) + V u(t) + b), y(t) = h(x(t)) = κ(W x(t) + a), we may have problems fitting it to individual sequences (the information latching problem). Worse still - even if we managed to do perfect model fitting so that each sequence is nicely represented - how do we define a distance between state space models, i.e. driven dynamical systems? Highly non-trivial - as we know from the work in the machine vision and video processing communities (low-dimensional, mostly linear dynamical systems). Learning from Temporal Data Using Dynamical Feature Space p.11/64

12 Our Proposal Interpret the state space model as a "dynamical kernel machine": x(t) = f(x(t-1), u(t)) = σ(R x(t-1) + V u(t) + b), y(t) = h(x(t)) = W x(t) + a. At time t the input item u(t) is mapped to the feature space - the state space of the dynamical system - and its feature space image is the state x(t). The feature mapping is dynamic - besides providing an image x(t) of u(t), it also reflects the history of inputs up to time t. Once in the feature space, the representations are rich enough to allow for a convenient use of linear tools! Learning from Temporal Data Using Dynamical Feature Space p.12/64

13 Dynamic Feature Space The "dynamic kernel trick": - fixed, "general purpose" state space organization - "redundant" high-dimensional state (feature) space - task-dependent linear readout mapping - all the "hard" work is done in the feature space creation. How can the idea of a "general purpose" dynamic feature space be realized? Let's get inspiration from information theory ("general purpose" information sources). Learning from Temporal Data Using Dynamical Feature Space p.13/64

14 Simplest Case - Observable Markovian IPS The simplest construction of IPS is based on concentrating on the very recent (finite) past, e.g. definite memory machines, finite memory machines, Markov models. Example: only the last 5 input symbols matter when reasoning about what symbol comes next. [Figure: three example sequences sharing the same 5-symbol suffix] All three sequences belong to the same IPS. Learning from Temporal Data Using Dynamical Feature Space p.14/64

15 Probabilistic framework - Markov model (MM) Finite context-conditional next-symbol distributions: P(s | 11111), P(s | 11112), P(s | 11113), ..., P(s | 11121), ..., P(s | 43211), ..., P(s | 44444), for s ∈ {1,2,3,4}. Learning from Temporal Data Using Dynamical Feature Space p.15/64

16 Markov model MM are intuitive and simple, but... The number of IPS (prediction contexts) grows exponentially fast with memory length. Large alphabets and long memories are infeasible: computationally demanding, and very long sequences are needed to obtain sound statistical estimates of the free parameters. Learning from Temporal Data Using Dynamical Feature Space p.16/64

17 Variable memory length MM (VLMM) Sophisticated implementation of potentially high-order MM Takes advantage of subsequence structure in data Saves resources by making memory depth context dependent Use deep memory only when it is needed Natural representation of IPS in form of Prediction Suffix Trees Closely linked to the idea of universal simulation of information sources. Learning from Temporal Data Using Dynamical Feature Space p.17/64

18 Prediction suffix tree (PST) IPS are organized on the nodes of a PST. Traverse the tree from the root towards the leaves, reading the sequence in reversed order. Complex and potentially time-consuming training. Each node i has an associated next-symbol probability distribution P(· | i). [Diagram: PST over time, with nodes labelled by suffixes such as 12, 122, 222 carrying the distributions P(· | 12), P(· | 122), P(· | 222)] Learning from Temporal Data Using Dynamical Feature Space p.18/64

19 Contractive DS lead to Markovian IPS [Figure: iterated application of per-symbol contractions of the state space; sequences sharing recently observed symbols are mapped into nested, progressively smaller regions] Learning from Temporal Data Using Dynamical Feature Space p.19/64

20 Contractive DS lead to Markovian IPS Mapping symbolic sequences into Euclidean space. The coding strategy is Markovian: sequences with shared histories of recently observed symbols have close images. The longer the common suffix of two sequences, the closer they are mapped to each other. Add a simple output-generating readout on top of the fixed state dynamics! Learning from Temporal Data Using Dynamical Feature Space p.20/64

21 More Formally... Given an input symbol s ∈ A at time t, and activations of the recurrent units from the previous step, x(t-1), we have x(t) = f_s(x(t-1)). This notation can be extended to sequences over A*: the recurrent activations after t time steps are x(t) = f_{s1...st}(x(0)). If the maps f_s are contractions with Lipschitz constant 0 < C < 1, then ||f_{vq}(x) - f_{wq}(x)|| ≤ C^{|q|} ||f_v(x) - f_w(x)|| ≤ C^{|q|} diam(X). Learning from Temporal Data Using Dynamical Feature Space p.21/64

22 Theoretical grounding Theorem (Hammer & Tiňo): Recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with contractive transition function. Hence initialization with small weights induces an architectural bias into learning with recurrent neural networks. Learning from Temporal Data Using Dynamical Feature Space p.22/64

23 Learnability (PAC framework) The architectural bias emphasizes one possible region of the weight space where generalization ability can be formally proved. Standard RNNs are not distribution-independent learnable in the PAC sense if arbitrary precision and inputs are considered. Theorem (Hammer & Tiňo): Recurrent networks with contractive transition function with a fixed contraction parameter fulfill the distribution-independent UCED property, and so, unlike general recurrent networks, are distribution-independent PAC-learnable. Learning from Temporal Data Using Dynamical Feature Space p.23/64

24 State space complexity vs. sequence complexity The complexity of the driving input stream (topological entropy) is directly reflected in the complexity of the state space activations (fractal dimension). Theorem (Tiňo & Hammer): Recurrent activations inside a contractive RNN form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters. Learning from Temporal Data Using Dynamical Feature Space p.24/64

25 Fix the DS to a contractive system! [Diagram: non-autonomous DS with delay loop; a "smooth" readout map from the IPS to the output] Fix the DS part to a randomly constructed contraction. Only train the simple linear readout output map. Can be done efficiently. Reservoir computation models. Learning from Temporal Data Using Dynamical Feature Space p.25/64
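A minimal echo-state-style sketch of this recipe: a random reservoir scaled to spectral radius below 1, with only the linear readout trained (here by ridge regression on next-step prediction). The scaling factor, input range and ridge parameter are illustrative choices, not the settings used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(u, W, V, b):
    """Drive a fixed reservoir with a univariate input sequence u; return all states."""
    x = np.zeros(W.shape[0])
    states = []
    for u_t in u:
        x = np.tanh(W @ x + V * u_t + b)   # fixed, contractive state transition
        states.append(x.copy())
    return np.array(states)

# Fixed, randomly constructed contraction: rescale W to spectral radius < 1
N = 100
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
V = rng.uniform(-0.5, 0.5, size=N)
b = np.zeros(N)

# Train only the linear readout (ridge regression) on next-step prediction
u = rng.uniform(0.0, 0.5, size=1000)
X = run_reservoir(u[:-1], W, V, b)    # states x(1), ..., x(T-1)
y = u[1:]                             # targets u(2), ..., u(T)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ y)
```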

26 Simple, non-randomized reservoirs [Diagrams: three reservoir architectures, each with a dynamical reservoir of N internal units x(t), input unit s(t), input weights V, reservoir weights W, readout weights U and output unit y(t)] Learning from Temporal Data Using Dynamical Feature Space p.26/64

27 Cycle reservoir with regular jumps Still a simple deterministic construction... [Diagrams (A) and (B): cycle reservoirs with N=18 units, cycle weights r_c and jump weights r_j, with jumps of two different lengths] Learning from Temporal Data Using Dynamical Feature Space p.27/64
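An illustrative construction of such a cycle-reservoir-with-jumps weight matrix (the jump length and the weight values below are placeholders; see the Rodan & Tiňo paper in the further reading for the exact scheme):

```python
import numpy as np

def crj_reservoir(N, r_c, r_j, jump):
    """Cycle Reservoir with Jumps: a unidirectional cycle of weight r_c plus
    bidirectional 'jump' connections of weight r_j every `jump` units.
    Illustrative sketch of the deterministic construction."""
    W = np.zeros((N, N))
    for i in range(N):
        W[(i + 1) % N, i] = r_c          # cycle: unit i feeds unit i+1
    for i in range(0, N - N % jump, jump):
        j = (i + jump) % N
        W[j, i] = r_j                     # jumps in both directions
        W[i, j] = r_j
    return W

W = crj_reservoir(N=18, r_c=0.7, r_j=0.4, jump=3)
```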

28 Reservoir characterizations Memory capacity - [Jaeger] A measure correlating past events in an i.i.d. input stream with the network output. Assume the network is driven by a univariate stationary input signal u(t). For a given delay k > 0, construct the network with optimal parameters for the task of outputting u(t-k) after seeing the input stream ...u(t-1)u(t) up to time t. Goodness of fit: squared correlation coefficient between the desired output u(t-k) and the observed network output y(t). Learning from Temporal Data Using Dynamical Feature Space p.28/64

29 Memory capacity MC_k = Cov²(u(t-k), y(t)) / (Var(u(t)) Var(y(t))), where Cov denotes the covariance and Var the variance operators. The short term memory (STM) capacity is then given by MC = Σ_{k=1}^{∞} MC_k. [Jaeger] For any recurrent neural network with N recurrent neurons, under the assumption of an i.i.d. input stream, the STM capacity cannot exceed N. Learning from Temporal Data Using Dynamical Feature Space p.29/64
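A rough empirical sketch of estimating MC_k from a run of reservoir states: train a least-squares readout per delay and report the squared correlation. This is a numerical approximation, not the analytical results quoted on the following slides.

```python
import numpy as np

def memory_capacity(states, u, k_max):
    """Estimate MC_k = Cov^2(u(t-k), y(t)) / (Var(u) Var(y)) for k = 1..k_max,
    where y(t) is a linear readout trained to reproduce u(t-k) from the state x(t).
    states: array of shape (T, N); u: input sequence of length T."""
    mc = []
    for k in range(1, k_max + 1):
        X, target = states[k:], u[:-k]            # predict u(t-k) from x(t)
        w, *_ = np.linalg.lstsq(X, target, rcond=None)
        y = X @ w
        c = np.corrcoef(target, y)[0, 1]
        mc.append(c ** 2)                         # squared correlation coefficient
    return np.array(mc)                           # MC is approximated by sum(mc)
```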

30 Memory capacity of SCR [Tiňo & Rodan] Under the assumption of a zero-mean i.i.d. input stream, the memory capacity of the linear SCR architecture with N reservoir units can be made arbitrarily close to N. Theorem (Tiňo & Rodan): Consider a linear SCR network with reservoir weight 0 < r < 1 and an input weight vector such that the reservoir activations are full rank. Then, the SCR network memory capacity is equal to MC = N - (1 - r^{2N}). Learning from Temporal Data Using Dynamical Feature Space p.30/64

31 Reservoir characterizations Eigen-decomposition of the reservoir matrix. [Figure: eigenvalue spectra of the reservoir weight matrix for the ESN, SCR, CRJ and CRHJ architectures] Learning from Temporal Data Using Dynamical Feature Space p.31/64

32 Reservoir characterizations Memory capacity. [Figure, panels (A)-(D): memory capacity MC_k as a function of the delay k for SCR, CRJ and ESN reservoirs] Learning from Temporal Data Using Dynamical Feature Space p.32/64

33 Reservoir characterizations Pseudo-Lyapunov exponents. [Figure: pseudo-Lyapunov exponent as a function of the scaling parameter for ESN, SCR and CRJ on the NARMA task and the laser task] Learning from Temporal Data Using Dynamical Feature Space p.33/64

34 Time Series Kernel via Reservoir Models [Diagram: a fixed cycle-reservoir-with-jumps state transition part (weights r_c, r_j) feeding a trained readout mapping; the readouts live in a readout parameter space] 1. For all sequences - the same fixed state transition (reservoir) part. 2. For each time series, train the linear readout on the prediction task. 3. Gaussian kernel in the readout function space. Learning from Temporal Data Using Dynamical Feature Space p.34/64

35 Distance in the Readout Function Space Two readouts h_1 and h_2 obtained on two sequences: L_2(h_1, h_2) = ( ∫_C ||h_1(x) - h_2(x)||² dµ(x) )^{1/2}, where µ(x) is the probability density (measure) on the input domain and C is the integration domain. C = [-1,+1]^N, as we use the recurrent neural network reservoir formulation with tanh transfer function. Learning from Temporal Data Using Dynamical Feature Space p.35/64

36 Model Distance with Uniform Measure Consider two readouts from the same reservoir, h_1(x) = W_1 x + a_1 and h_2(x) = W_2 x + a_2. Since the sigmoid activation function is employed, the domain of the readout is C = [-1,1]^N. Then L_2(h_1, h_2) = ( ∫_C ||h_1(x) - h_2(x)||² dµ(x) )^{1/2} = ( (2^N/3) Σ_{j=1}^{N} Σ_{i=1}^{O} w_{i,j}² + 2^N ||a||² )^{1/2}, where w_{i,j} is the (i,j)-th element of W_1 - W_2 and a = a_1 - a_2. Learning from Temporal Data Using Dynamical Feature Space p.36/64
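A direct transcription of this closed-form distance into code (note the common factor 2^N, which only rescales all distances and could be dropped in practice; this is a sketch, not the authors' implementation):

```python
import numpy as np

def readout_distance_uniform(W1, a1, W2, a2):
    """L2 distance between two linear readouts h_i(x) = W_i x + a_i
    under the uniform measure on C = [-1, 1]^N (closed form from the slide)."""
    W = W1 - W2            # O x N matrix of weight differences
    a = a1 - a2            # offset difference
    N = W.shape[1]
    sq = (2 ** N / 3) * np.sum(W ** 2) + (2 ** N) * np.sum(a ** 2)
    return np.sqrt(sq)
```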

37 Model Distance vs. Parameter Distance Scaling the squared model distance (by the constant factor 2^N) gives (1/3) Σ_{j=1}^{N} Σ_{i=1}^{O} w_{i,j}² + ||a||². Compared with the squared Euclidean distance of the readout parameters, Σ_{j=1}^{N} Σ_{i=1}^{O} w_{i,j}² + ||a||², more importance is given to the offset than to the orientation of the readout mapping. Learning from Temporal Data Using Dynamical Feature Space p.37/64

38 Non-uniform State Distribution For a K-component Gaussian mixture model µ(x) = Σ_{k=1}^{K} α_k µ_k(x | η_k, Σ_k), the model distance can be obtained as L_2²(h_1, h_2) = Σ_{k=1}^{K} α_k [ trace(Wᵀ W Σ_k) + aᵀ a + η_kᵀ Wᵀ W η_k + 2 aᵀ W η_k ]. Sampling approximation: L_2²(h_1, h_2) ≈ (1/m) Σ_{i=1}^{m} ||h_1(x(i)) - h_2(x(i))||². Learning from Temporal Data Using Dynamical Feature Space p.38/64
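A sketch of the sampling approximation, where sample_states would be reservoir states collected while driving the reservoir (an empirical stand-in for µ); the callables h1, h2 are the fitted readouts:

```python
import numpy as np

def readout_distance_sampled(h1, h2, sample_states):
    """Monte Carlo approximation of L2^2(h1, h2) = E_mu ||h1(x) - h2(x)||^2,
    using states x(i) sampled from (an estimate of) the state distribution mu."""
    diffs = np.array([h1(x) - h2(x) for x in sample_states])
    return np.mean(np.sum(diffs ** 2, axis=1))
```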

39 Three Time Series Kernels Uniform state distribution: Reservoir Kernel (RV). Non-uniform state distribution: reservoir kernel by sampling (SamplingRV) and reservoir kernel by Gaussian mixture models (GMMRV). K(s_i, s_j) = exp{ -γ L_2²(h_i, h_j) }. Learning from Temporal Data Using Dynamical Feature Space p.39/64
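Putting the pieces together, a sketch of building the kernel matrix over a set of fitted readouts; dist2 could be the squared uniform-measure distance or the sampled estimate above, and gamma is a kernel width to be tuned (both names are placeholders of this illustration):

```python
import numpy as np

def rv_kernel_matrix(readouts, dist2, gamma):
    """Gaussian kernel in readout space: K(s_i, s_j) = exp(-gamma * L2^2(h_i, h_j)).
    `readouts` is a list of fitted readouts, `dist2` a squared-distance function."""
    n = len(readouts)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = np.exp(-gamma * dist2(readouts[i], readouts[j]))
    return K
```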

40 Fisher Kernel A single model for all data items. Based on information geometry. Fisher vector (score function): U(s) = ∇_λ log p(s | λ). Metric tensor - the Fisher information matrix F: K(s_i, s_j) = U(s_i)ᵀ F⁻¹ U(s_j). Learning from Temporal Data Using Dynamical Feature Space p.40/64

41 Fisher Kernel Based on Reservoir Model Endowing the readout with a noise model yields a generative time series model of the form: x(t) = f(u(t), x(t-1)), u(t+1) = W x(t) + a + ε(t). Assume the noise follows a Gaussian distribution, ε(t) ~ N(0, σ²I). Learning from Temporal Data Using Dynamical Feature Space p.41/64

42 Fisher Kernel by Reservoir Models Partial derivatives of the log likelihood log p(u(1..l)): U = ∂ log p(u(1..l)) / ∂W = ∂/∂W Σ_{t=1}^{l} log P(u(t) | u(1..t-1)) = Σ_{t=1}^{l} [ (u(t) - a) x(t-1)ᵀ - W x(t-1) x(t-1)ᵀ ] / σ². Fisher kernel (simplified in the usual way - identity matrix instead of F) for two time series s_i and s_j with scores U_i and U_j: K(s_i, s_j) = Σ_{o=1}^{O} Σ_{n=1}^{N} (U_i)_{o,n} (U_j)_{o,n}. Learning from Temporal Data Using Dynamical Feature Space p.42/64
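A sketch of computing the score matrix U and the simplified Fisher kernel for the reservoir model; the array shapes and the noise variance are assumptions of this illustration:

```python
import numpy as np

def fisher_score(u, states, W, a, sigma2=1.0):
    """Score U = d log p(u(1..l)) / dW for the reservoir model with Gaussian
    readout noise, following the derivative above.
    u: inputs of shape (l, O); states: reservoir states of shape (l, N)."""
    U = np.zeros_like(W)
    for t in range(1, len(u)):
        x_prev = states[t - 1]            # x(t-1), used to predict u(t)
        U += (np.outer(u[t] - a, x_prev) - W @ np.outer(x_prev, x_prev)) / sigma2
    return U

def fisher_rv_kernel(U_i, U_j):
    # Simplified Fisher kernel: identity in place of the Fisher information matrix
    return np.sum(U_i * U_j)
```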

43 Other time series kernels Kernel based on Dynamic time warping (DTW) DTW tries to warp the time axis of one (or both) sequences to achieve a better alignment DTW can generate un-intuitive alignments by mapping a single point on one time series onto a large subsection of another time series, leading to inferior results efficient versions of DTW based kernels available Learning from Temporal Data Using Dynamical Feature Space p.43/64

44 Other time series kernels Autoregressive kernel: a VAR model class of a given order is used to generate an infinite family of features from the time series. For a given time series s, the likelihood profile p_θ(s) across all possible parameter settings θ (under a matrix normal-inverse Wishart prior ω(θ)) forms a representation of s. K(s_i, s_j) = ∫ p_θ(s_i) p_θ(s_j) ω(dθ). Learning from Temporal Data Using Dynamical Feature Space p.44/64

45 Experimental Comparisons Compared kernels: Autoregressive Kernel: AR; Fisher Kernel with Hidden Markov Model: Fisher; Dynamic time warping: DTW. Our kernels: Reservoir Kernel: RV; Reservoir kernel by sampling: SamplingRV; Reservoir kernel by Gaussian mixture models: GMMRV; Fisher kernel with reservoir models: FisherRV. Learning from Temporal Data Using Dynamical Feature Space p.45/64

46 Synthetic Data: NARMA order 10: s(t+1) = 0.3 s(t) + 0.05 s(t) Σ_{i=0}^{9} s(t-i) + 1.5 u(t-9) u(t) + 0.1; order 20: s(t+1) = tanh( 0.3 s(t) + 0.05 s(t) Σ_{i=0}^{19} s(t-i) + 1.5 u(t-19) u(t) + 0.01 ) + 0.2; order 30: s(t+1) = 0.2 s(t) + 0.004 s(t) Σ_{i=0}^{29} s(t-i) + 1.5 u(t-29) u(t) + 0.001. Learning from Temporal Data Using Dynamical Feature Space p.46/64
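A sketch of generating the order-10 NARMA benchmark as defined above, driven by i.i.d. uniform input on [0, 0.5] (an assumption consistent with the usual benchmark setup; the order-20 and order-30 variants follow the other two formulas):

```python
import numpy as np

def narma10(T, seed=0):
    """Generate a NARMA sequence of order 10 and its driving input."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)
    s = np.zeros(T)
    for t in range(9, T - 1):
        s[t + 1] = (0.3 * s[t]
                    + 0.05 * s[t] * np.sum(s[t - 9:t + 1])   # sum over s(t-i), i=0..9
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, s
```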

47 Multi-Dimensional Scaling Representation [Figure: MDS visualization of the model distances; legend: order 10, order 20, order 30] Illustration of MDS on the model distances among the trained readouts. Learning from Temporal Data Using Dynamical Feature Space p.47/64

48 Robustness to Noise [Figure: generalization accuracy as a function of the Gaussian noise level for RV, AR, Fisher, DTW and NoKernel] Illustration of the performance of the compared kernels under different noise levels. Learning from Temporal Data Using Dynamical Feature Space p.48/64

49 Benchmark Data [Table: benchmark datasets Symbols, OSULeaf, Oliveoil, Lighting, Beef, Car, Fish, Coffee and Adiac, with their sequence length, number of classes, and train/test set sizes] Learning from Temporal Data Using Dynamical Feature Space p.49/64

50 Results: Classification Accuracy [Table: classification accuracy of DTW, AR, Fisher, RV, FisherRV, GMMRV and SamplingRV on each benchmark dataset] Learning from Temporal Data Using Dynamical Feature Space p.50/64

51 Results: CPU Time (s) [Table: CPU time in seconds of DTW, AR, Fisher, RV, FisherRV, GMMRV and SamplingRV on each benchmark dataset] Learning from Temporal Data Using Dynamical Feature Space p.51/64

52 Performance on increasing length of time series [Figure: generalization accuracy (left) and CPU time (right) as functions of the time series length for RV, AR, DTW and Fisher] Learning from Temporal Data Using Dynamical Feature Space p.52/64

53 Multivariate Time Series Our kernels are applicable to multivariate time series of variable length. [Table: multivariate datasets Libras, handwritten and AUSLAN, with their dimensionality, length, number of classes, and train/test set sizes] Learning from Temporal Data Using Dynamical Feature Space p.53/64

54 Multivariate Time Series: Results [Figure: generalization accuracy (left) and CPU time (right) of AR, Fisher, DTW, RV and SamplingRV on Libras, handwritten and AUSLAN] Learning from Temporal Data Using Dynamical Feature Space p.54/64

55 Online Kernel Construction Reservoir readouts can be trained in an on-line fashion, e.g. using Recursive Least Squares. This enables us to construct and refine reservoir kernels: w_opt = w_0 + k (b - aᵀ w_0), k = (A_0ᵀ A_0)⁻¹ a / ( λ + aᵀ (A_0ᵀ A_0)⁻¹ a ), (A_0ᵀ A_0)⁻¹ ← [ λ⁻¹ I - k aᵀ ] (A_0ᵀ A_0)⁻¹, where λ is the forgetting factor. Learning from Temporal Data Using Dynamical Feature Space p.55/64
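A standard RLS step in this spirit (a sketch that is equivalent up to notation to the update above; P plays the role of the inverse covariance (A_0ᵀ A_0)⁻¹ and λ is the forgetting factor):

```python
import numpy as np

def rls_update(w, P, a, b, lam=0.99):
    """One Recursive Least Squares step for the linear readout.
    w - current readout weights, P - current inverse covariance,
    a - new state (regressor) vector, b - new target, lam - forgetting factor."""
    Pa = P @ a
    k = Pa / (lam + a @ Pa)                 # gain vector
    w_new = w + k * (b - a @ w)             # correct the prediction error
    P_new = (P - np.outer(k, Pa)) / lam     # update the inverse covariance
    return w_new, P_new
```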

56 Long Time series PEMS-SF is 15 months (Jan. 1st 2008 to Mar. 30th 2009) of daily data from the California Department of Transportation PEMS. The data describes the occupancy rate of different car lanes of San Francisco bay area freeways. PEMS-SF with 440 time series of length 138,672. Learning from Temporal Data Using Dynamical Feature Space p.56/64

57 Results The computational complexity of our kernels is O(l), where l is the length of the time series: they scale linearly with the length of the time series! The RV kernel achieves 86.13% accuracy. [Figure: accuracy as a function of the length of the time series] The best accuracies reported previously are 82%-83%, for the AR, global alignment, spline smoothing and bag-of-vectors kernels. Learning from Temporal Data Using Dynamical Feature Space p.57/64

58 Incremental One Class Learning 1. Normal data - apply the reservoir model in a sliding window (size m) over the first t steps. 2. Kernel matrix K_σ(s_i, s_j) = exp{ -σ L_2(h_i, h_j) } - train a one-class SVM. 3. If needed, incrementally create new clusters of models (signal "regimes"). Learning from Temporal Data Using Dynamical Feature Space p.58/64

59 NARMA task NARMA sequences - orders 10, 20 and 30. [Figure: 2D visualization of the sequence models; legend: order 10, order 20, order 30] Learning from Temporal Data Using Dynamical Feature Space p.59/64

60 Experiments NARMA (3 classes) [Table: number of discovered classes, precision, recall and specificity for DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling) and DRC-OCS] Learning from Temporal Data Using Dynamical Feature Space p.60/64

61 Experiments Van der Pol (4 classes) [Table: number of discovered classes, precision, recall and specificity for DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling) and DRC-OCS] Learning from Temporal Data Using Dynamical Feature Space p.61/64

62 Experiments Three Tank (4 classes) [Table: number of discovered classes, precision, recall and specificity for DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling) and DRC-OCS] Learning from Temporal Data Using Dynamical Feature Space p.62/64

63 Experiments Barcelona Water (32 classes) [Table: number of discovered classes, precision, recall and specificity for DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling) and DRC-OCS] Learning from Temporal Data Using Dynamical Feature Space p.63/64

64 Further reading A. Rodan, P. Tino: Minimum Complexity Echo State Network. IEEE Transactions on Neural Networks, 22(1), 2011. A. Rodan, P. Tino: Simple Deterministically Constructed Cycle Reservoirs with Regular Jumps. Neural Computation, 24(7), 2012. P. Tino, A. Rodan: Short Term Memory in Input-Driven Linear Dynamical Systems. Neurocomputing, 112, 2013. H. Chen, P. Tino, X. Yao, A. Rodan: Learning in the Model Space for Fault Diagnosis. IEEE TNNLS, in print. H. Chen, F. Tang, P. Tino, X. Yao: Model-based Kernel for Efficient Time Series Analysis. KDD 2013. Learning from Temporal Data Using Dynamical Feature Space p.64/64
