Learning from Temporal Data Using Dynamical Feature Space
1 Learning from Temporal Data Using Dynamical Feature Space Peter Tiňo School of Computer Science University of Birmingham, UK Learning from Temporal Data Using Dynamical Feature Space p.1/64
2 Special thanks to...
Barbara Hammer, Huanhuan Chen, Alessio Micheli, Michal Čerňanský, Luba Beňúšková, Jort van Mourik, Igor Farkaš, Ali Rodan, Jochen Steil, Xin Yao, Nick Gianniotis, Dan Conford, Manfred Opper, Fengzhen Tang, ...
3 Learning in the data or model space?
Complex data, e.g.
- long, possibly multivariate time series (video, sensor data, etc.) of variable length
- fMRI data measured on normalized cognitive tasks
- ...
We more or less know what to do for fixed-dimensional vectorial data, but how should such complex data be represented in order to usefully "learn" from it?
The critical notion of "smoothness" when learning from historic data:
- "close" things should lead to similar responses
- "closeness": distance, dot product, ...
How to represent complex data items in the light of the "smoothness criterion"?
4 Learning in the Model Space Framework
- Each data item is represented by a model that "explains" it.
- Learning is formulated in the space of models (function space).
- Parametrization-free notions of "input similarities": similarity is quantified between the models, not between their (arbitrary) parametrizations!
Key consideration: the base model class must be
- flexible enough to represent a variety of data items
- sufficiently constrained to avoid overfitting
5 NARMA task - illustrative example
NARMA sequences of orders 10, 20 and 30, represented as state space models.
[Figure: sequences labelled order10, order20, order30]
6 Representing Sequential Data
Each sequence will be represented by a dedicated model that captures everything important about the sequence in a compact manner. If we'd like to shift the focus to different aspects of the sequences, we need to change the base model accordingly.
What should be our base model? There are many possible choices, but here we'll consider state space models: a very general and powerful model class capturing temporal structure through the notion of an information processing state (IPS).
The IPS at time t codes for the entire history of time series items we have seen up to time t.
7 Discrete Time State Space Model
[Diagram: parametrized DS (parameters W) with a delay loop feeding the state back, and an output]
- States of the DS form the IPS.
- These states code the entire history of sequence elements we have seen so far.
8 A State Space Model
N-dimensional continuous state space $[-1,1]^N$:
$$x(t) = f(x(t-1), u(t)) = \sigma(R\,x(t-1) + V\,u(t) + b),$$
$$y(t) = h(x(t)) = \kappa(W\,x(t) + a),$$
where
- $x(t) \in [-1,1]^N$: state vector at time $t$
- $u(t)$: input time series element at time $t$
- $f(\cdot)$: state-transition function
- $y(t)$: output of the readout from the state space
- $h(\cdot)$: readout (output) function
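The state update and readout on this slide can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the slides: $\sigma$ is taken to be tanh, the readout nonlinearity $\kappa$ is taken to be the identity, and all weights are random placeholders.

```python
import numpy as np

def run_state_space(u, R, V, b, W, a, x0=None):
    """Drive x(t) = tanh(R x(t-1) + V u(t) + b) and read out y(t) = W x(t) + a."""
    x = np.zeros(R.shape[0]) if x0 is None else x0
    states, outputs = [], []
    for u_t in u:
        # State transition: the new state depends on the old state and the input.
        x = np.tanh(R @ x + V @ np.atleast_1d(u_t) + b)
        states.append(x)
        # Linear readout from the current state (kappa assumed to be identity).
        outputs.append(W @ x + a)
    return np.array(states), np.array(outputs)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
N, T = 5, 20
R = 0.1 * rng.standard_normal((N, N))
V = rng.standard_normal((N, 1))
b = np.zeros(N)
W = rng.standard_normal((1, N))
a = np.zeros(1)
u = rng.uniform(0.0, 0.5, size=T)

states, outputs = run_state_space(u, R, V, b, W, a)
```

With tanh as the transition nonlinearity, every state stays inside $[-1,1]^N$, which is the state space named on the slide.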
9 Another State Space Model - Transducer
Translates input streams over {a,b} into sequences over {x,y}.
[Diagram: two-state transducer (states 1 and 2) with input/output transition labels a/x, b/y, a/x, b/x]
10 Picking the Right Base Model
An appealing feature of the previous example: the state space structure and dynamics are perfectly tailored to the particular task.
Of course, in our general scenario we typically do not know the right state space structure that extracts from the data (time series) what is needed for the particular task:
- What are the appropriate IPS?
- What is their transition structure?
Since we are ML people, the answer seems to be obvious: learn everything!
... easier said than done ...
11 Picking the Right Base Model
Things are actually much worse! If we adopted a general parameterized model
$$x(t) = f(x(t-1), u(t)) = \sigma(R\,x(t-1) + V\,u(t) + b),$$
$$y(t) = h(x(t)) = \kappa(W\,x(t) + a),$$
we may have problems fitting it to individual sequences (the information latching problem).
Worse still, even if we managed to do perfect model fitting so that each sequence is nicely represented, how do we define a distance between state space models, i.e. between driven dynamical systems?
This is highly non-trivial, as we know from work in the machine vision and video processing communities (mostly on low-dimensional linear dynamical systems).
12 Our Proposal
Interpret the state space model as a "dynamical kernel machine":
$$x(t) = f(x(t-1), u(t)) = \sigma(R\,x(t-1) + V\,u(t) + b),$$
$$y(t) = h(x(t)) = W\,x(t) + a.$$
- At time $t$ the input item $u(t)$ is mapped to the feature space (the state space of the dynamical system); its feature space image is the state $x(t)$.
- The feature mapping is dynamic: besides providing an image $x(t)$ of $u(t)$, it also reflects the history of inputs up to time $t$.
- Once in the feature space, the representations are rich enough to allow for a convenient use of linear tools!
13 Dynamic Feature Space
"Dynamic kernel trick":
- fixed, "general purpose" state space organization
- "redundant" high-dimensional state (feature) space
- task-dependent linear readout mapping
- all the "hard" work is done in the creation of the feature space
How can the idea of a "general purpose" dynamic feature space be realized? Let's take inspiration from information theory ("general purpose" information sources).
14 Simplest Case - Observable Markovian IPS
The simplest construction of IPS concentrates on the very recent (finite) past, e.g. definite memory machines, finite memory machines, Markov models.
Example: only the last 5 input symbols matter when reasoning about what symbol comes next.
[Figure: three example sequences; all three belong to the same IPS]
15 Probabilistic framework - Markov model (MM)
Finite context-conditional next-symbol distributions over the alphabet $s \in \{1,2,3,4\}$:
$P(s \mid 11111)$, $P(s \mid 11112)$, $P(s \mid 11113)$, ..., $P(s \mid 11121)$, ..., $P(s \mid 43211)$, ..., $P(s \mid 44444)$
16 Markov model
MMs are intuitive and simple, but...
- The number of IPS (prediction contexts) grows exponentially fast with memory length.
- Large alphabets and long memories are infeasible: they are computationally demanding, and very long sequences are needed to obtain sound statistical estimates of the free parameters.
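To make the exponential growth concrete, here is a small sketch of a fixed-order Markov model estimated by counting. The function names and the toy sequence are illustrative assumptions, not part of the slides.

```python
from collections import Counter, defaultdict

def fit_markov_model(seq, order):
    """Estimate P(next symbol | last `order` symbols) by relative frequencies."""
    counts = defaultdict(Counter)
    for t in range(order, len(seq)):
        context = tuple(seq[t - order:t])   # the IPS is the recent context
        counts[context][seq[t]] += 1
    return {
        ctx: {s: c / sum(ctr.values()) for s, c in ctr.items()}
        for ctx, ctr in counts.items()
    }

def num_contexts(alphabet_size, order):
    """Number of possible prediction contexts: alphabet_size ** order."""
    return alphabet_size ** order

# Toy sequence over the alphabet {1, 2, 3, 4}.
seq = [1, 2, 1, 2, 1, 2, 1, 3]
model = fit_markov_model(seq, order=2)

# With 4 symbols and memory length 5 there are already 4**5 = 1024 contexts.
n_ctx = num_contexts(4, 5)
```

Each context needs its own next-symbol distribution, so the amount of data required for reliable estimates grows with the number of contexts.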
17 Variable memory length MM (VLMM)
- Sophisticated implementation of potentially high-order MMs
- Takes advantage of subsequence structure in the data
- Saves resources by making memory depth context dependent: deep memory is used only when it is needed
- Natural representation of IPS in the form of prediction suffix trees
- Closely linked to the idea of universal simulation of information sources
18 Prediction suffix tree (PST)
- IPS are organized on the nodes of the PST.
- Traverse the tree from the root towards the leaves, reading the context in reversed order.
- Each node $i$ has an associated next-symbol probability distribution $P(\cdot \mid i)$.
- Training is complex and potentially time-consuming.
[Diagram: suffix tree with node distributions such as $P(\cdot \mid 12)$, $P(\cdot \mid 122)$ and $P(\cdot \mid 222)$]
19 Contractive DS lead to Markovian IPS
[Diagram: each input symbol (1, 2, 3, 4) applies its own contraction of the state space; regions are labelled by sequence suffixes such as *1 and *14]
20 Contractive DS lead to Markovian IPS
Mapping symbolic sequences into Euclidean space:
- The coding strategy is Markovian.
- Sequences with shared histories of recently observed symbols have close images.
- The longer the common suffix of two sequences, the closer they are mapped to each other.
Add a simple output-generating readout on top of the fixed state dynamics!
21 More Formally...
Given an input symbol $s \in A$ at time $t$, and activations of recurrent units from the previous step, $x(t-1)$,
$$x(t) = f_s(x(t-1)).$$
This notation can be extended to sequences over $A$: the recurrent activations after $t$ time steps are
$$x(t) = f_{s_1 \dots s_t}(x(0)).$$
If the maps $f_s$ are contractions with Lipschitz constant $0 < C < 1$, then for sequences $v$, $w$ and a common suffix $q$,
$$\|f_{vq}(x) - f_{wq}(x)\| \le C^{|q|}\,\|f_v(x) - f_w(x)\| \le C^{|q|}\,\mathrm{diam}(X).$$
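The suffix bound can be checked numerically with a toy set of contractions. The sketch below assumes one affine map per symbol on the unit square, $f_s(x) = C x + (1 - C) c_s$, with hypothetical corner points $c_s$; these maps are chosen for illustration and are not the recurrent network maps of the slides.

```python
import numpy as np

def make_maps(C, corners):
    """One affine contraction per symbol: f_s(x) = C * x + (1 - C) * corner_s."""
    return {s: (lambda x, c=np.asarray(c, float): C * x + (1 - C) * c)
            for s, c in corners.items()}

def encode(seq, maps, x0):
    """Apply the symbol maps in sequence order: x(t) = f_{s_t}(x(t-1))."""
    x = np.asarray(x0, float)
    for s in seq:
        x = maps[s](x)
    return x

C = 0.5
corners = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}   # assumed layout
maps = make_maps(C, corners)
x0 = (0.5, 0.5)

# Two sequences with different histories but the same suffix q.
suffix = [1, 4, 2]
img_v = encode([3, 3, 3] + suffix, maps, x0)
img_w = encode([2, 4, 1] + suffix, maps, x0)

dist = np.linalg.norm(img_v - img_w)
# The unit square has diameter sqrt(2), so the bound is C^{|q|} * sqrt(2).
bound = C ** len(suffix) * np.sqrt(2)
```

Lengthening the shared suffix shrinks the bound geometrically, which is the Markovian coding property described on the previous slide.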
22 Theoretical grounding
Theorem (Hammer & Tiňo): Recurrent networks with a contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite memory machine. Conversely, every definite memory machine can be simulated by a recurrent network with a contractive transition function.
Hence initialization with small weights induces an architectural bias into learning with recurrent neural networks.
23 Learnability (PAC framework)
The architectural bias emphasizes one possible region of the weight space where generalization ability can be formally proved.
Standard RNNs are not distribution-independent learnable in the PAC sense if arbitrary precision and inputs are considered.
Theorem (Hammer & Tiňo): Recurrent networks with a contractive transition function with a fixed contraction parameter fulfill the distribution-independent UCED property and so, unlike general recurrent networks, are distribution-independent PAC-learnable.
24 State space complexity and sequence complexity
The complexity of the driving input stream (topological entropy) is directly reflected in the complexity of the state space activations (fractal dimension).
Theorem (Tiňo & Hammer): Recurrent activations inside a contractive RNN form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
25 Fix the DS to a contractive system!
[Diagram: non-autonomous DS with a delay loop; its states form the IPS and feed a "smooth" output map]
- Fix the DS part to a randomly constructed contraction.
- Only train the simple linear readout (output map). This can be done efficiently.
- Reservoir computation models.
26 Simple, non-randomized reservoirs
[Diagram, repeated for three reservoir topologies: input unit s(t), input weights V, dynamical reservoir of N internal units x(t) with weights W, readout weights U, output unit y(t)]
27 Cycle reservoir with regular jumps
... still a simple deterministic construction ...
[Figure: two example reservoirs (A) and (B) with N = 18 units arranged in a cycle; cycle connections carry weight $r_c$, jump connections carry weight $r_j$]
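A possible way to build such deterministic reservoir matrices is sketched below. The slide only names the cycle weight $r_c$ and the jump weight $r_j$; the exact jump placement used here (bidirectional jumps of a fixed length, starting from the first unit) is an assumption made for illustration.

```python
import numpy as np

def cycle_reservoir(N, r_c):
    """Simple cycle: unit i feeds unit i+1 (mod N) with the same weight r_c."""
    W = np.zeros((N, N))
    for i in range(N):
        W[(i + 1) % N, i] = r_c
    return W

def cycle_reservoir_with_jumps(N, r_c, r_j, jump):
    """Cycle plus regularly spaced two-way jumps of length `jump` (assumed layout)."""
    assert 1 < jump < N - 1, "jump must not coincide with a cycle connection"
    W = cycle_reservoir(N, r_c)
    for i in range(0, N - jump + 1, jump):
        j = (i + jump) % N
        W[i, j] = r_j   # jump from unit j to unit i
        W[j, i] = r_j   # and back
    return W

W_scr = cycle_reservoir(18, 0.7)
W_crj = cycle_reservoir_with_jumps(18, 0.7, 0.4, jump=3)

# For the plain cycle all eigenvalues have modulus r_c.
spectral_radius_scr = np.max(np.abs(np.linalg.eigvals(W_scr)))
```

The whole reservoir is described by two or three scalars, so there is nothing random to tune or to re-sample.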
28 Reservoir characterizations
Memory capacity [Jaeger]: a measure correlating past events in an i.i.d. input stream with the network output.
Assume the network is driven by a univariate stationary input signal $u(t)$. For a given delay $k > 0$, construct the network with optimal parameters for the task of outputting $u(t-k)$ after seeing the input stream $\dots u(t-1)\,u(t)$ up to time $t$.
Goodness of fit: the squared correlation coefficient between the desired output $u(t-k)$ and the observed network output $y(t)$.
29 Memory capacity
$$MC_k = \frac{\mathrm{Cov}^2(u(t-k), y(t))}{\mathrm{Var}(u(t))\,\mathrm{Var}(y(t))},$$
where Cov denotes the covariance and Var the variance operator. The short term memory (STM) capacity is then given by
$$MC = \sum_{k=1}^{\infty} MC_k.$$
[Jaeger] For any recurrent neural network with $N$ recurrent neurons, under the assumption of an i.i.d. input stream, the STM capacity cannot exceed $N$.
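The definition can be turned into a rough empirical estimate. The sketch below makes several assumptions that are not in the slides: a linear cycle reservoir, a random sign input vector, uniform i.i.d. input, least-squares readouts, and in-sample evaluation (which is slightly optimistic). The infinite sum is truncated at `max_delay`.

```python
import numpy as np

def simple_cycle_reservoir(N, r):
    W = np.zeros((N, N))
    for i in range(N):
        W[(i + 1) % N, i] = r
    return W

def empirical_memory_capacity(W, V, max_delay, T=4000, washout=200, seed=0):
    """Estimate MC_k for k = 1..max_delay with one linear readout per delay."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-0.5, 0.5, size=T)        # zero-mean i.i.d. input
    x = np.zeros(W.shape[0])
    X = np.empty((T, W.shape[0]))
    for t in range(T):
        x = W @ x + V * u[t]                  # linear reservoir update
        X[t] = x
    A = np.hstack([X[washout:], np.ones((T - washout, 1))])
    mc = np.empty(max_delay)
    for k in range(1, max_delay + 1):
        target = u[washout - k:T - k]         # u(t - k) aligned with x(t)
        w, *_ = np.linalg.lstsq(A, target, rcond=None)
        y = A @ w
        c = np.cov(target, y)
        mc[k - 1] = c[0, 1] ** 2 / (c[0, 0] * c[1, 1])
    return mc

N, r = 10, 0.9
rng = np.random.default_rng(1)
V = rng.choice([-1.0, 1.0], size=N)           # hypothetical input weights
mc = empirical_memory_capacity(simple_cycle_reservoir(N, r), V, max_delay=30)
total_mc = mc.sum()
```

Each $MC_k$ is a squared correlation, so it lies between 0 and 1, and the truncated sum should stay below the number of reservoir units.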
30 Memory capacity of SCR
[Tiňo & Rodan] Under the assumption of a zero-mean i.i.d. input stream, the memory capacity of the linear SCR (simple cycle reservoir) architecture with $N$ reservoir units can be made arbitrarily close to $N$.
Theorem (Tiňo & Rodan): Consider a linear SCR network with reservoir weight $0 < r < 1$ and an input weight vector such that the reservoir activations are full rank. Then the SCR network memory capacity is equal to
$$MC = N - (1 - r^{2N}).$$
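Reading the theorem's expression as $MC = N - (1 - r^{2N})$ (the minus signs are a reconstruction of the transcript), a short limit calculation shows why the capacity can be pushed arbitrarily close to $N$ but never reaches it:

```latex
\begin{align*}
MC(r) &= N - (1 - r^{2N}), \qquad 0 < r < 1,\\
\lim_{r \to 1^-} MC(r) &= N - (1 - 1) = N,\\
\lim_{r \to 0^+} MC(r) &= N - (1 - 0) = N - 1,\\
\text{hence}\quad N - 1 &< MC(r) < N .
\end{align*}
```

Since $r^{2N}$ increases with $r$, the capacity grows monotonically as the cycle weight approaches 1.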
31 Reservoir characterizations
Eigen-decomposition of the reservoir matrix
[Figure: panels for ESN, SCR, CRJ and CRHJ]
32 Reservoir characterizations
Memory capacity
[Figure: four panels (A) to (D) plotting $MC_k$ against delay $k$ for SCR, CRJ and ESN]
33 Reservoir characterizations
Pseudo-Lyapunov exponents
[Figure: two panels, NARMA task and laser task, plotting the Lyapunov exponent against the scaling parameter for ESN, SCR and CRJ]
34 Time Series Kernel via Reservoir Models
1. For all sequences, use the same fixed state transition (reservoir) part.
2. For each time series, train the linear readout on a prediction task.
3. Apply a Gaussian kernel in the readout function space.
[Diagram: a deterministic, fixed reservoir (cycle weights $r_c$, jump weights $r_j$) feeds a trained readout mapping; each time series becomes a point in the space of readout mappings]
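Steps 1 and 2 can be sketched as follows. The details are assumptions made for illustration: a tanh cycle reservoir, a random sign input vector, one-step-ahead prediction as the task, and ridge regression for the readout.

```python
import numpy as np

def reservoir_states(u, W_res, V):
    """Run the shared, fixed reservoir over a univariate series u."""
    x = np.zeros(W_res.shape[0])
    X = np.empty((len(u), len(x)))
    for t, u_t in enumerate(u):
        x = np.tanh(W_res @ x + V * u_t)
        X[t] = x
    return X

def fit_readout(u, W_res, V, ridge=1e-6, washout=20):
    """Ridge-regress u(t+1) on [x(t), 1]; returns readout weights w and offset a."""
    X = reservoir_states(u, W_res, V)[washout:-1]   # states x(t)
    target = u[washout + 1:]                        # next items u(t+1)
    A = np.hstack([X, np.ones((len(X), 1))])
    theta = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ target)
    return theta[:-1], theta[-1]

# Step 1: one fixed reservoir shared by all series (hypothetical settings).
N = 20
W_res = np.zeros((N, N))
for i in range(N):
    W_res[(i + 1) % N, i] = 0.8
V = 0.5 * np.random.default_rng(0).choice([-1.0, 1.0], size=N)

# Step 2: one readout per time series; the readout is the series' representation.
t = np.arange(300)
w1, a1 = fit_readout(np.sin(0.2 * t), W_res, V)
w2, a2 = fit_readout(np.sin(0.5 * t), W_res, V)
```

Because the reservoir is identical for every series, differences between series show up only in the fitted readouts, which is what the kernel in step 3 compares.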
35 Distance in the Readout Function Space
Two readouts $h_1$ and $h_2$ obtained on two sequences:
$$L_2(h_1,h_2) = \left(\int_C \|h_1(x) - h_2(x)\|^2\, d\mu(x)\right)^{1/2},$$
where $\mu(x)$ is the probability density measure on the input domain and $C$ is the integration range. Here $C = [-1,+1]^N$, as we use a recurrent neural network reservoir formulation with the tanh transfer function.
36 Model Distance with Uniform Measure
Consider two readouts from the same reservoir,
$$h_1(x) = W_1 x + a_1, \qquad h_2(x) = W_2 x + a_2.$$
Since the sigmoid activation function is employed in the domain of the readout, $C = [-1,1]^N$.
$$L_2(h_1,h_2) = \left(\int_C \|h_1(x) - h_2(x)\|^2\, d\mu(x)\right)^{1/2} = \left(\frac{2^N}{3}\sum_{j=1}^{N}\sum_{i=1}^{O} w_{i,j}^2 + 2^N \|a\|^2\right)^{1/2},$$
where $w_{i,j}$ is the $(i,j)$-th element of $W_1 - W_2$ and $a = a_1 - a_2$.
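The closed form can be sanity-checked against a Monte Carlo estimate of the integral. The sketch assumes that $\mu$ is the unnormalized uniform (Lebesgue) measure on $[-1,1]^N$, which is what produces the $2^N$ factor; the readouts are random placeholders.

```python
import numpy as np

def model_distance_uniform(W1, a1, W2, a2):
    """Closed form: sqrt(2^N / 3 * sum(w_ij^2) + 2^N * ||a||^2)."""
    W, a = W1 - W2, a1 - a2
    N = W.shape[1]
    return np.sqrt(2.0 ** N * (np.sum(W ** 2) / 3.0 + a @ a))

def model_distance_monte_carlo(W1, a1, W2, a2, m=200_000, seed=0):
    """Estimate the same integral by uniform sampling of the cube [-1, 1]^N."""
    rng = np.random.default_rng(seed)
    W, a = W1 - W2, a1 - a2
    N = W.shape[1]
    x = rng.uniform(-1.0, 1.0, size=(m, N))
    sq = np.sum((x @ W.T + a) ** 2, axis=1)
    return np.sqrt(2.0 ** N * sq.mean())    # cube volume times the average

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((2, 3)), rng.standard_normal((2, 3))
a1, a2 = rng.standard_normal(2), rng.standard_normal(2)

exact = model_distance_uniform(W1, a1, W2, a2)
approx = model_distance_monte_carlo(W1, a1, W2, a2)
```

The factor $1/3$ comes from $\int_{-1}^{1} x_j^2\,dx_j = 2/3$, while the cross terms between $W x$ and $a$ vanish by symmetry of the cube.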
37 Model Distance vs. Parameter Distance
Squared model distance, scaled by $2^{-N}$:
$$\frac{1}{3}\sum_{j=1}^{N}\sum_{i=1}^{O} w_{i,j}^2 + \|a\|^2$$
Squared Euclidean distance of the readout parameters:
$$\sum_{j=1}^{N}\sum_{i=1}^{O} w_{i,j}^2 + \|a\|^2$$
Compared with the parameter distance, the model distance gives more importance to the offset than to the orientation of the readout mapping.
38 Non-uniform State Distribution
For a K-component Gaussian mixture model
$$\mu(x) = \sum_{i=1}^{K} \alpha_i\, \mu_i(x \mid \eta_i, \Sigma_i)$$
the model distance can be obtained as follows, with $W = W_1 - W_2$ and $a = a_1 - a_2$:
$$L_2^2(h_1,h_2) = \sum_{i=1}^{K} \alpha_i \left[\mathrm{trace}(W^T W \Sigma_i) + a^T a + \eta_i^T W^T W \eta_i + 2\,a^T W \eta_i\right]$$
Sampling approximation:
$$L_2^2(h_1,h_2) \approx \frac{1}{m}\sum_{i=1}^{m} \|h_1(x(i)) - h_2(x(i))\|^2.$$
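Both routes to the squared model distance can be compared on synthetic data. In this sketch the mixture parameters and the readout difference $(W, a)$ are made-up placeholders; in practice the mixture would be fitted to reservoir states and the samples would be the states themselves.

```python
import numpy as np

def model_distance_sq_gmm(W, a, alphas, etas, sigmas):
    """Closed form of E||W x + a||^2 under a Gaussian mixture over states x."""
    total = 0.0
    for alpha, eta, sigma in zip(alphas, etas, sigmas):
        total += alpha * (np.trace(W.T @ W @ sigma) + a @ a
                          + eta @ W.T @ W @ eta + 2 * a @ W @ eta)
    return total

def model_distance_sq_sampling(W, a, samples):
    """Sampling approximation: average squared readout difference over states."""
    return np.mean(np.sum((samples @ W.T + a) ** 2, axis=1))

rng = np.random.default_rng(1)
N, O = 3, 2
W = rng.standard_normal((O, N))     # stands for W_1 - W_2
a = rng.standard_normal(O)          # stands for a_1 - a_2

alphas = [0.3, 0.7]
etas = [np.array([0.5, 0.0, -0.5]), np.array([-0.2, 0.3, 0.1])]
sigmas = [0.05 * np.eye(N), 0.1 * np.eye(N)]

# Draw states from the mixture: pick component sizes, then sample each Gaussian.
counts = rng.multinomial(100_000, alphas)
samples = np.vstack([rng.multivariate_normal(e, s, size=n)
                     for e, s, n in zip(etas, sigmas, counts)])

exact = model_distance_sq_gmm(W, a, alphas, etas, sigmas)
approx = model_distance_sq_sampling(W, a, samples)
```

The closed form avoids a pass over the states once the mixture is fitted, whereas the sampling version needs no density model at all.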
39 Three Time Series Kernels
Uniform state distribution:
- Reservoir kernel: RV
Non-uniform state distribution:
- Reservoir kernel by sampling: SamplingRV
- Reservoir kernel by Gaussian mixture models: GMMRV
$$K(s_i,s_j) = \exp\{-\gamma\, L_2^2(h_i,h_j)\}$$
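Given one readout per time series, the kernel matrix follows directly from the squared model distances. The sketch below uses the uniform-measure closed form (the RV variant); the readouts are random placeholders and the value of $\gamma$ is an arbitrary assumption.

```python
import numpy as np

def rv_kernel_matrix(readouts, gamma):
    """K[i, j] = exp(-gamma * L2^2(h_i, h_j)) for readouts h = (W, a)."""
    n = len(readouts)
    K = np.empty((n, n))
    for i, (Wi, ai) in enumerate(readouts):
        for j, (Wj, aj) in enumerate(readouts):
            W, a = Wi - Wj, ai - aj
            N = W.shape[1]
            # Squared model distance under the uniform measure on [-1, 1]^N.
            d2 = 2.0 ** N * (np.sum(W ** 2) / 3.0 + a @ a)
            K[i, j] = np.exp(-gamma * d2)
    return K

rng = np.random.default_rng(0)
readouts = [(rng.standard_normal((1, 4)), rng.standard_normal(1)) for _ in range(3)]
K = rv_kernel_matrix(readouts, gamma=0.01)
```

For large reservoirs the $2^N$ factor becomes huge, so in practice it would be absorbed into $\gamma$ or dropped; that rescaling is a design choice of this sketch, not something the slide specifies.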
40 Fisher Kernel
- Single model for all data items.
- Based on information geometry.
- Fisher vector (score function): $U(s) = \nabla_\lambda \log p(s \mid \lambda)$
- Metric tensor: Fisher information matrix $F$
$$K(s_i,s_j) = U(s_i)^T\, F^{-1}\, U(s_j).$$
41 Fisher Kernel Based on Reservoir Model
Endowing the readout with a noise model yields a generative time series model of the form
$$x(t) = f(u(t), x(t-1)),$$
$$u(t+1) = W\,x(t) + a + \varepsilon(t).$$
Assume the noise follows a Gaussian distribution, $\varepsilon(t) \sim N(0, \sigma^2 I)$.
42 Fisher Kernel by Reservoir Models
Partial derivatives of the log likelihood $\log p(u(1..l))$:
$$U = \frac{\partial \log p(u(1..l))}{\partial W} = \frac{\partial}{\partial W} \log \prod_{t=1}^{l} P(u(t) \mid u(1..t-1)) = \sum_{t=1}^{l} \frac{(u(t) - a)\,x(t-1)^T - W\,x(t-1)\,x(t-1)^T}{\sigma^2}.$$
Fisher kernel (simplified in the usual way, with the identity matrix instead of $F$) for two time series $s_i$ and $s_j$ with scores $U_i$ and $U_j$:
$$K(s_i,s_j) = \sum_{o=1}^{O}\sum_{n=1}^{N} (U_i \circ U_j)_{o,n},$$
where $\circ$ denotes the element-wise product.
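The score with respect to the readout matrix can be computed by accumulating residual-times-state outer products. The sketch assumes the Gaussian noise model of the previous slide, uses random arrays as stand-ins for the series and the reservoir states, and includes a log-likelihood (up to an additive constant) so the gradient can be checked numerically.

```python
import numpy as np

def log_likelihood(u, x, W, a, sigma2):
    """Gaussian log likelihood of u(1..l) given states, up to a constant."""
    resid = u[1:] - a - x[:-1] @ W.T          # u(t) - a - W x(t-1)
    return -np.sum(resid ** 2) / (2.0 * sigma2)

def fisher_score(u, x, W, a, sigma2):
    """U = sum_t (u(t) - a - W x(t-1)) x(t-1)^T / sigma^2, shape (O, N)."""
    U = np.zeros_like(W)
    for t in range(1, len(u)):
        resid = u[t] - a - W @ x[t - 1]
        U += np.outer(resid, x[t - 1])
    return U / sigma2

def fisher_kernel(U_i, U_j):
    """Simplified Fisher kernel: sum of the element-wise product of two scores."""
    return np.sum(U_i * U_j)

# Placeholder data: u[t] has O dimensions, x[t] has N dimensions.
rng = np.random.default_rng(0)
O, N, l = 2, 4, 30
u = rng.standard_normal((l + 1, O))
x = rng.uniform(-1.0, 1.0, size=(l + 1, N))
W = rng.standard_normal((O, N))
a = rng.standard_normal(O)

U = fisher_score(u, x, W, a, sigma2=0.5)
k_self = fisher_kernel(U, U)
```

Because a single generative model is shared by all series, only the scores differ between series, and the kernel reduces to an inner product of those score matrices.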
43 Other time series kernels
Kernel based on dynamic time warping (DTW):
- DTW warps the time axis of one (or both) sequences to achieve a better alignment.
- DTW can generate unintuitive alignments by mapping a single point of one time series onto a large subsection of another time series, leading to inferior results.
- Efficient versions of DTW-based kernels are available.
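For reference, the textbook dynamic-programming form of DTW on univariate series looks roughly like this. It is the plain quadratic-time version with an absolute-difference local cost, not one of the efficient variants mentioned above.

```python
import numpy as np

def dtw_distance(s, q):
    """Classic DTW: minimal accumulated cost over monotone alignments of s and q."""
    n, m = len(s), len(q)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # Extend the cheapest of: step in s, step in q, or step in both.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

d_same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])   # repeated point is absorbed
d_shifted = dtw_distance([0, 0], [1, 1])
```

The first example shows the behaviour criticised on the slide in its mildest form: one point of the shorter series is matched to two points of the longer one at no cost.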
44 Other time series kernels
Autoregressive kernel:
- A VAR model class of a given order is used to generate an infinite family of features from the time series.
- For a given time series $s$, the likelihood profile $p_\theta(s)$ across all possible parameter settings (under a matrix normal-inverse Wishart prior $\omega(\theta)$) forms a representation of $s$.
$$K(s_i,s_j) = \int_\theta p_\theta(s_i)\, p_\theta(s_j)\, \omega(d\theta)$$
45 Experimental Comparisons
Compared kernels:
- Autoregressive kernel: AR
- Fisher kernel with hidden Markov model: Fisher
- Dynamic time warping: DTW
Our kernels:
- Reservoir kernel: RV
- Reservoir kernel by sampling: SamplingRV
- Reservoir kernel by Gaussian mixture models: GMMRV
- Fisher kernel with reservoir models: FisherRV
46 Synthetic Data: NARMA
Order 10:
$$s(t+1) = 0.3\,s(t) + [\dots]\,s(t)\sum_{i=0}^{9} s(t-i) + 1.5\,u(t-9)\,u(t) + 0.1$$
Order 20:
$$s(t+1) = \tanh\Big(0.3\,s(t) + [\dots]\,s(t)\sum_{i=0}^{19} s(t-i) + 1.5\,u(t-19)\,u(t) + 0.01\Big) + 0.2$$
Order 30:
$$s(t+1) = 0.2\,s(t) + [\dots]\,s(t)\sum_{i=0}^{29} s(t-i) + 1.5\,u(t-29)\,u(t)$$
Here $[\dots]$ marks numeric coefficients that did not survive in this transcript.
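A generator for the order-10 recursion might look like the sketch below. Two things are assumptions rather than facts from the slide: the coefficient of the sum term is set to 0.05, the value commonly used for NARMA-10 in the literature, and the input is drawn uniformly from [0, 0.5].

```python
import numpy as np

def narma10(T, seed=0, coeff=0.05):
    """Generate an order-10 NARMA series s driven by i.i.d. input u.

    coeff is the multiplier of the sum term; 0.05 is an assumed, commonly
    used value and is not taken from the slide.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)       # assumed input distribution
    s = np.zeros(T)
    for t in range(9, T - 1):
        s[t + 1] = (0.3 * s[t]
                    + coeff * s[t] * np.sum(s[t - 9:t + 1])   # sum over s(t-i), i = 0..9
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, s

u, s = narma10(200)
```

The first ten outputs are left at zero because the recursion needs ten past values before it can start.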
47 Multi-Dimensional Scaling Representation
[Figure: points labelled order 10, order 20 and order 30]
Illustration of MDS on the model distance among reservoir weights.
48 Robustness to Noise
[Figure: generalization accuracy against Gaussian noise level for RV, AR, Fisher, DTW and NoKernel]
Illustration of the performance of the compared kernels at different noise levels.
49 Benchmark Data
Columns: Dataset, Length, Classes, Train, Test
Datasets: Symbols, OSULeaf, Oliveoil, Lighting, Beef, Car, Fish, Coffee, Adiac
[Table values not preserved in this transcript]
50 Results: Classification Accuracy
Columns: Dataset, DTW, AR, Fisher, RV, FisherRV, GMMRV, SamplingRV
Datasets: Symbols, OSULeaf, Oliveoil, Lighting, Beef, Car, Fish, Coffee, Adiac
[Table values not preserved in this transcript]
51 Results: CPU Time (s)
Columns: Dataset, DTW, AR, Fisher, RV, FisherRV, GMMRV, SamplingRV
Datasets: Symbols, OSULeaf, Oliveoil, Lighting, Beef, Car, Fish, Coffee, Adiac
[Only fragments of the table values survive in this transcript: Symbols 1,318 2,868 2,...; OSULeaf 6,030 1,375 3,...; Fish 3,...]
52 Performance on increasing length of time series
[Figure: two panels, generalization accuracy and CPU time, each plotted against the length of the time series for RV, AR, DTW and Fisher]
53 Multivariate Time Series
Our kernels are applicable to multivariate time series of variable length.
Columns: Dataset, dim, length, classes, train, test
Datasets: Libras, handwritten, AUSLAN
[Table values not preserved in this transcript]
54 Multivariate Time Series: Results
[Figure: two panels, generalization accuracy and CPU time, for AR, Fisher, DTW, RV and SamplingRV on Libras, handwritten and AUSLAN]
55 Online Kernel Construction
Reservoir readouts can be trained in an on-line fashion, e.g. using Recursive Least Squares. This enables us to construct and refine reservoir kernels.
$$w_{opt} = w_0 + k\,(b - a^T w_0)$$
$$k = \frac{(A_0^T A_0)^{-1} a}{\lambda + a^T (A_0^T A_0)^{-1} a}$$
$$(A^T A)^{-1} = \lambda^{-1}\,[I - k\,a^T]\,(A_0^T A_0)^{-1}$$
$\lambda$: forgetting factor
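A compact sketch of the recursive least squares update is given below. The matrix `P` plays the role of $(A_0^T A_0)^{-1}$; initializing it to a large multiple of the identity is a standard trick and an assumption of this sketch, as are the class and variable names.

```python
import numpy as np

class RecursiveLeastSquares:
    """On-line least squares with forgetting factor lam."""

    def __init__(self, dim, lam=1.0, delta=1e6):
        self.w = np.zeros(dim)
        self.P = delta * np.eye(dim)    # stands in for (A0^T A0)^{-1}
        self.lam = lam

    def update(self, a, b):
        """Absorb one new row a (regressors) with target b."""
        Pa = self.P @ a
        k = Pa / (self.lam + a @ Pa)                 # gain vector
        self.w = self.w + k * (b - a @ self.w)       # correct by the prediction error
        self.P = (self.P - np.outer(k, a @ self.P)) / self.lam
        return self.w

# Noise-free toy problem: the recursion should recover w_true.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
rls = RecursiveLeastSquares(dim=3)
for _ in range(50):
    a_t = rng.standard_normal(3)
    rls.update(a_t, a_t @ w_true)
```

In the reservoir setting, `a` would be the current reservoir state (plus a constant for the offset) and `b` the next item of the series, so the readout, and hence the kernel, can be refined as data arrive.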
56 Long Time series
PEMS-SF contains 15 months (Jan. 1st 2008 to Mar. 30th 2009) of daily data from the California Department of Transportation PEMS. The data describe the occupancy rate of different car lanes of San Francisco bay area freeways.
PEMS-SF consists of 440 time series of length 138,672.
57 Results
- The computational complexity of our kernels is O(l), where l is the length of the time series: they scale linearly with the length of the time series!
- The RV kernel achieves 86.13% accuracy.
- The best reported accuracies are 82% to 83%, for AR, global alignment, spline smoothing and bag-of-vectors kernels.
[Figure: accuracy against length of time series]
58 Incremental One Class Learning
1. Normal data: apply the reservoir model in a sliding window (size m) over the first t steps.
2. Kernel matrix: train a one-class SVM with
$$K_\sigma(s_i,s_j) = \exp\{-\sigma\, L_2(h_i,h_j)\}$$
3. If needed, incrementally create new clusters of models (signal "regimes").
59 NARMA task
NARMA sequences of orders 10, 20 and 30.
[Figure: sequences labelled order10, order20, order30]
60 Experiments: NARMA (3 classes)
Columns: Algorithm, Classes, Precision, Recall, Specificity
Algorithms: DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling), DRC-OCS
[Table values not preserved in this transcript]
61 Experiments: Van der Pol (4 classes)
Columns: Algorithm, Classes, Precision, Recall, Specificity
Algorithms: DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling), DRC-OCS
[Table values not preserved in this transcript]
62 Experiments: Three Tank (4 classes)
Columns: Algorithm, Classes, Precision, Recall, Specificity
Algorithms: DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling), DRC-OCS
[Table values not preserved in this transcript]
63 Experiments: Barcelona Water (32 classes)
Columns: Algorithm, Classes, Precision, Recall, Specificity
Algorithms: DBscan, ARMAX-OCS, RC-OCS, DRC-OCS(Sampling), DRC-OCS
[Table values not preserved in this transcript]
64 Further reading
- A. Rodan, P. Tino: Minimum Complexity Echo State Network. IEEE TNNLS, 22(1).
- A. Rodan, P. Tino: Simple Deterministically Constructed Cycle Reservoirs with Regular Jumps. Neural Computation, 24(7).
- P. Tino, A. Rodan: Short Term Memory in Input-Driven Linear Dynamical Systems. Neurocomputing, 112.
- H. Chen, P. Tino, X. Yao, A. Rodan: Learning in the Model Space for Fault Diagnosis. IEEE TNNLS, in print.
- H. Chen, F. Tang, P. Tino, X. Yao: Model-based Kernel for Efficient Time Series Analysis. KDD.
ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationThe Kernel Trick, Gram Matrices, and Feature Extraction. CS6787 Lecture 4 Fall 2017
The Kernel Trick, Gram Matrices, and Feature Extraction CS6787 Lecture 4 Fall 2017 Momentum for Principle Component Analysis CS6787 Lecture 3.1 Fall 2017 Principle Component Analysis Setting: find the
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationRecurrent Latent Variable Networks for Session-Based Recommendation
Recurrent Latent Variable Networks for Session-Based Recommendation Panayiotis Christodoulou Cyprus University of Technology paa.christodoulou@edu.cut.ac.cy 27/8/2017 Panayiotis Christodoulou (C.U.T.)
More informationRecurrent Neural Networks Deep Learning Lecture 5. Efstratios Gavves
Recurrent Neural Networks Deep Learning Lecture 5 Efstratios Gavves Sequential Data So far, all tasks assumed stationary data Neither all data, nor all tasks are stationary though Sequential Data: Text
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW1 due next lecture Project details are available decide on the group and topic by Thursday Last time Generative vs. Discriminative
More informationRecurrence Enhances the Spatial Encoding of Static Inputs in Reservoir Networks
Recurrence Enhances the Spatial Encoding of Static Inputs in Reservoir Networks Christian Emmerich, R. Felix Reinhart, and Jochen J. Steil Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld
More informationLecture 3: Pattern Classification
EE E6820: Speech & Audio Processing & Recognition Lecture 3: Pattern Classification 1 2 3 4 5 The problem of classification Linear and nonlinear classifiers Probabilistic classification Gaussians, mixtures
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationLecture 6. Regression
Lecture 6. Regression Prof. Alan Yuille Summer 2014 Outline 1. Introduction to Regression 2. Binary Regression 3. Linear Regression; Polynomial Regression 4. Non-linear Regression; Multilayer Perceptron
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationCOMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017
COMS 4721: Machine Learning for Data Science Lecture 10, 2/21/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University FEATURE EXPANSIONS FEATURE EXPANSIONS
More informationARCHITECTURAL DESIGNS OF ECHO STATE NETWORK
ARCHITECTURAL DESIGNS OF ECHO STATE NETWORK by ALI ABDALLAH ALI AL RODAN A thesis submitted to The University of Birmingham for the degree of DOCTOR OF PHILOSOPHY School of Computer Science College of
More informationForecasting Data Streams: Next Generation Flow Field Forecasting
Forecasting Data Streams: Next Generation Flow Field Forecasting Kyle Caudle South Dakota School of Mines & Technology (SDSMT) kyle.caudle@sdsmt.edu Joint work with Michael Frey (Bucknell University) and
More informationLearning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials Contents 1 Pseudo-code for the damped Gauss-Newton vector product 2 2 Details of the pathological synthetic problems
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationLinear Models for Regression. Sargur Srihari
Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood
More informationStat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2
Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate
More informationCS-E3210 Machine Learning: Basic Principles
CS-E3210 Machine Learning: Basic Principles Lecture 4: Regression II slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 61 Today s introduction
More informationA Sparse Linear Model and Significance Test. for Individual Consumption Prediction
A Sparse Linear Model and Significance Test 1 for Individual Consumption Prediction Pan Li, Baosen Zhang, Yang Weng, and Ram Rajagopal arxiv:1511.01853v3 [stat.ml] 21 Feb 2017 Abstract Accurate prediction
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationPattern Recognition. Parameter Estimation of Probability Density Functions
Pattern Recognition Parameter Estimation of Probability Density Functions Classification Problem (Review) The classification problem is to assign an arbitrary feature vector x F to one of c classes. The
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationLearning theory. Ensemble methods. Boosting. Boosting: history
Learning theory Probability distribution P over X {0, 1}; let (X, Y ) P. We get S := {(x i, y i )} n i=1, an iid sample from P. Ensemble methods Goal: Fix ɛ, δ (0, 1). With probability at least 1 δ (over
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationUnsupervised learning: beyond simple clustering and PCA
Unsupervised learning: beyond simple clustering and PCA Liza Rebrova Self organizing maps (SOM) Goal: approximate data points in R p by a low-dimensional manifold Unlike PCA, the manifold does not have
More informationSupport Vector Machines and Bayes Regression
Statistical Techniques in Robotics (16-831, F11) Lecture #14 (Monday ctober 31th) Support Vector Machines and Bayes Regression Lecturer: Drew Bagnell Scribe: Carl Doersch 1 1 Linear SVMs We begin by considering
More informationHidden Markov Models Part 1: Introduction
Hidden Markov Models Part 1: Introduction CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Modeling Sequential Data Suppose that
More informationGaussian Processes (10/16/13)
STA561: Probabilistic machine learning Gaussian Processes (10/16/13) Lecturer: Barbara Engelhardt Scribes: Changwei Hu, Di Jin, Mengdi Wang 1 Introduction In supervised learning, we observe some inputs
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationRecurrent Autoregressive Networks for Online Multi-Object Tracking. Presented By: Ishan Gupta
Recurrent Autoregressive Networks for Online Multi-Object Tracking Presented By: Ishan Gupta Outline Multi Object Tracking Recurrent Autoregressive Networks (RANs) RANs for Online Tracking Other State
More informationARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD
ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationNeutron inverse kinetics via Gaussian Processes
Neutron inverse kinetics via Gaussian Processes P. Picca Politecnico di Torino, Torino, Italy R. Furfaro University of Arizona, Tucson, Arizona Outline Introduction Review of inverse kinetics techniques
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationLecture 9. Time series prediction
Lecture 9 Time series prediction Prediction is about function fitting To predict we need to model There are a bewildering number of models for data we look at some of the major approaches in this lecture
More informationNonparametric Bayesian Methods (Gaussian Processes)
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationTowards a Calculus of Echo State Networks arxiv: v1 [cs.ne] 1 Sep 2014
his space is reserved for the Procedia header, do not use it owards a Calculus of Echo State Networks arxiv:49.28v [cs.ne] Sep 24 Alireza Goudarzi and Darko Stefanovic,2 Department of Computer Science,
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationDS-GA 1002 Lecture notes 12 Fall Linear regression
DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,
More informationMath 350: An exploration of HMMs through doodles.
Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or
More informationMaximum Likelihood Estimation. only training data is available to design a classifier
Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional
More informationPart of the slides are adapted from Ziko Kolter
Part of the slides are adapted from Ziko Kolter OUTLINE 1 Supervised learning: classification........................................................ 2 2 Non-linear regression/classification, overfitting,
More informationBagging During Markov Chain Monte Carlo for Smoother Predictions
Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods
More informationLinear vs Non-linear classifier. CS789: Machine Learning and Neural Network. Introduction
Linear vs Non-linear classifier CS789: Machine Learning and Neural Network Support Vector Machine Jakramate Bootkrajang Department of Computer Science Chiang Mai University Linear classifier is in the
More information