Kernels to detect abrupt changes in time series

1 Affiliations: 1 UMR 8524 CNRS - Université Lille 1; 2 Modal INRIA team-project; 3 SSB group, Paris. Joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot. Computational and statistical trade-offs in learning, IHES, Paris, March 22nd, 2016. 1/47

2 Outline 1 Motivating examples and framework (kernels) 2 KCP Algorithm and computational complexity 3 Where are the change-points (D fixed)? 4 How many change-points? 2/47

3 Change-point detection: 1-D signal (example). [Figure: noisy signal and piecewise-constant regression function vs. position t, with unknown change-point locations] 3/47

4 Detect abrupt changes... General purposes: 1 Detect changes in (features of) the distribution (not only in the mean) 4/47

5 Abrupt changes in high-order moments Detecting changes in the mean is useless 5/47

6 Detect abrupt changes... General purposes: 1 Detect changes in (features of) the distribution (not only in the mean) 2 Complex data: High-dimension: measures in R d, curves,... Structured: audio/video streams, graphs, DNA sequence,... 6/47

7 Motivating example 1: Structured objects. Description: video sequences from Le grand échiquier, a 70s-80s French talk show. At each time, one observes an image (high-dimensional). Each image is summarized by a histogram. 7/47

8 Motivating example 2: Structured objects. Observe networks over time. Goal: detect abrupt changes in some features of the network. 8/47

9 Detect abrupt changes... General purposes: 1 Detect changes in (features of) the distribution (not only in the mean) 2 Complex data: High-dimension: measures in R d, curves,... Structured: audio/video streams, graphs, DNA sequence,... 3 Fusion of heterogeneous data: deal simultaneously with different types of complex data 4 Efficient algorithms able to handle large data sets (Big Data challenge) 9/47

10 I Kernel framework 10/47

11 Kernel and Reproducing Kernel Hilbert Space (RKHS). X_1, ..., X_n ∈ X: initial observations. k(·, ·) : X × X → R: reproducing kernel (Aronszajn (1950)). H: RKHS associated with k(·, ·) (φ : X → H s.t. φ(x) = k(x, ·): canonical feature map). Assets: versatile tool to work with different types of data; complex data (high dimensional/structured). 11/47

12 Instances of kernels. Gaussian kernel (with R^d-valued data): k_δ(x, y) = exp( -‖x - y‖² / δ ), δ > 0. χ²-kernel (with histogram-valued data): k_I(p, q) = exp( -Σ_{i=1}^{I} (p_i - q_i)² / (p_i + q_i) ). 12/47
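For concreteness, here is a minimal numpy sketch of these two kernels; the function names and the handling of empty histogram bins are illustrative choices, not taken from the slides.

import numpy as np

def gaussian_kernel(x, y, delta):
    # Gaussian kernel k_delta(x, y) = exp(-||x - y||^2 / delta), delta > 0.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / delta)

def chi2_kernel(p, q):
    # Chi-square kernel for histograms: exp(-sum_i (p_i - q_i)^2 / (p_i + q_i)).
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    denom = p + q
    mask = denom > 0  # bins empty in both histograms contribute nothing
    return np.exp(-np.sum((p[mask] - q[mask]) ** 2 / denom[mask]))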

13 Model. For all 1 ≤ i ≤ n, Y_i = φ(X_i) = µ*_i + ε_i ∈ H, where µ*_i ∈ H is the mean element of P_{X_i} (distribution of X_i); ε_i := Y_i - µ*_i, with E[ε_i] = 0 and v_i := E[ ‖ε_i‖²_H ]. Mean element of P_{X_i} (H separable and E[ k(X, X) ] < +∞): ⟨µ*_i, f⟩_H = E_{X_i}[ ⟨φ(X_i), f⟩_H ] for all f ∈ H. With characteristic kernels, P_{X_i} ≠ P_{X_j} ⇒ µ*_i ≠ µ*_j. 13/47

14 Estimation rather than identification. Assumption: µ* = (µ*_1, ..., µ*_n) ∈ H^n is piecewise constant. [Figure: signal Y and piecewise-constant regression function s] Fact: with a finite sample, it is impossible to recover change-points in noisy regions. Purpose: estimate µ* to recover the change-points. Performance measure: ‖µ̂ - µ*‖² := Σ_{i=1}^{n} ‖µ̂_i - µ*_i‖²_H. 14/47

15 II Algorithm 15/47

16 Notation. Segmentation with D segments: τ = (τ_0, ..., τ_D), with 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n. Quality of a segmentation τ, following Harchaoui and Cappé (2007): R_n(τ) = (1/n) Σ_{i=1}^{n} k(X_i, X_i) - (1/n) Σ_{l=1}^{D} (1/(τ_l - τ_{l-1})) Σ_{i=τ_{l-1}+1}^{τ_l} Σ_{j=τ_{l-1}+1}^{τ_l} k(X_i, X_j). Rk: with the linear kernel k(x, x') = ⟨x, x'⟩ on X = R^d, R_n(τ) reduces to the usual least-squares empirical risk. 16/47
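A direct numpy translation of this formula, assuming the full n x n Gram matrix has been precomputed (function and variable names are illustrative):

import numpy as np

def empirical_risk(gram, tau):
    # R_n(tau) for tau = (0, tau_1, ..., n), given gram[i, j] = k(X_{i+1}, X_{j+1}).
    n = gram.shape[0]
    risk = np.trace(gram) / n
    for start, end in zip(tau[:-1], tau[1:]):
        # segment {start+1, ..., end} corresponds to the 0-indexed slice [start:end]
        block = gram[start:end, start:end]
        risk -= block.sum() / ((end - start) * n)
    return risk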

17 KCP Algorithm Input: observations: X 1,..., X n X, kernel: k : X X R, 17/47

18 KCP Algorithm. Input: observations X_1, ..., X_n ∈ X; kernel k : X × X → R. Step 1: for every 1 ≤ D ≤ D_max, compute τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) } (dynamic programming), where T_n^D = { (τ_0, ..., τ_D) ∈ N^{D+1} : 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n }. 17/47

19 KCP Algorithm. Input: observations X_1, ..., X_n ∈ X; kernel k : X × X → R. Step 1: for every 1 ≤ D ≤ D_max, compute τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) } (dynamic programming), where T_n^D = { (τ_0, ..., τ_D) ∈ N^{D+1} : 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n }. Step 2: find D̂ ∈ Argmin_{1 ≤ D ≤ D_max} { R_n(τ̂(D)) + pen(τ̂(D)) } (model selection). Output: sequence of change-points τ̂ = τ̂(D̂). 17/47

20 Computational complexity (naive approach). Dynamic programming (DP) update rule: for 2 ≤ D ≤ D_max, L_{D,n} = min_{t ≤ n-1} { L_{D-1,t} + C_{t,n} }, where L_{D-1,t} is the cost of the best segmentation with D-1 segments up to time t and C_{t,n} is the cost of the segment {t+1, ..., n}, with C_{s,t} = Σ_{i=s+1}^{t} k(X_i, X_i) - (1/(t-s)) Σ_{i=s+1}^{t} Σ_{j=s+1}^{t} k(X_i, X_j). Complexity (naive approach): time O(D_max n^4) (computation of {C_{s,t}}_{1 ≤ s,t ≤ n}); space O(n²) (storage of the cost matrix). 18/47

21 Computational complexity (improvement). Ideas (with G. Rigaill and G. Marot): never store the cost matrix; update each column C_{·,t+1} from C_{·,t}. Pseudo-code:
for t = 1 to n-1 do
    compute the (t+1)-th column C_{·,t+1} from C_{·,t}
    for D = 2 to min(t, D_max) do
        L_{D,t+1} = min_{s ≤ t} { L_{D-1,s} + C_{s,t+1} }
    end for
end for
Computational complexity: space O(D_max n) (only store C_{·,t} ∈ R^n); time O(D_max n²) (update rule + DP complexity). 19/47
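The following numpy sketch implements this column-update dynamic programme, assuming kernel(x, y) evaluates k in constant time; the array layout, back-pointer bookkeeping, and function names are my own choices, not the authors' reference implementation.

import numpy as np

def kcp_dynamic_programming(X, kernel, D_max):
    # L[D, t] = best cost of a segmentation of {1, ..., t} into D segments.
    # Only the current cost column C(., t) is kept in memory, never the n x n matrix.
    n = len(X)
    quad = np.zeros(n)   # quad[s] = sum_{i,j = s+1..t} k(X_i, X_j)  for the current t
    diag = np.zeros(n)   # diag[s] = sum_{i   = s+1..t} k(X_i, X_i)
    L = np.full((D_max + 1, n + 1), np.inf)
    back = np.zeros((D_max + 1, n + 1), dtype=int)
    L[0, 0] = 0.0
    for t in range(1, n + 1):                      # right end of the last segment
        x_new = X[t - 1]
        row = np.array([kernel(X[i], x_new) for i in range(t)])
        k_tt = row[t - 1]                          # k(X_t, X_t)
        cross = np.cumsum(row[::-1])[::-1]         # cross[s] = sum_{i=s+1..t} k(X_i, X_t)
        quad[:t] += 2.0 * cross - k_tt
        diag[:t] += k_tt
        cost_col = diag[:t] - quad[:t] / (t - np.arange(t))   # C(s, t), s = 0..t-1
        for D in range(1, min(t, D_max) + 1):
            cand = L[D - 1, :t] + cost_col
            s_best = int(np.argmin(cand))
            L[D, t] = cand[s_best]
            back[D, t] = s_best
    return L, back

def change_points(back, D, n):
    # Recover tau_hat(D) = (0, ..., n) from the back-pointers.
    tau = [n]
    for d in range(D, 0, -1):
        tau.append(back[d, tau[-1]])
    return tau[::-1]

For instance, L, back = kcp_dynamic_programming(X, kernel, D_max) followed by change_points(back, D, len(X)) returns τ̂(D); note that L[D, n] equals n · R_n(τ̂(D)).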

22 Runtime Open questions: Reduce computation time by low-rank matrix approx. Quantify what has been lost by the approx. 2/47

23 III Where are the change-points for a fixed D? 21/47

24 KCP Algorithm (reminder). Input: observations X_1, ..., X_n ∈ X; kernel k : X × X → R. Step 1: for every 1 ≤ D ≤ D_max, compute τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) } (dynamic programming), where T_n^D = { (τ_0, ..., τ_D) ∈ N^{D+1} : 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n }. Step 2: find D̂ ∈ Argmin_{1 ≤ D ≤ D_max} { R_n(τ̂(D)) + pen(τ̂(D)) } (model selection). Output: sequence of change-points τ̂ = τ̂(D̂). 22/47

25 Distance between segmentations. Hausdorff distance: d_H(τ, τ') = max{ max_{1 ≤ i ≤ D_τ - 1} min_{1 ≤ j ≤ D_{τ'} - 1} |τ_i - τ'_j|, max_{1 ≤ j ≤ D_{τ'} - 1} min_{1 ≤ i ≤ D_τ - 1} |τ_i - τ'_j| }. Frobenius distance: d_F(τ, τ') = ‖M_τ - M_{τ'}‖_F = ( Σ_{1 ≤ i,j ≤ n} (M^τ_{i,j} - M^{τ'}_{i,j})² )^{1/2}, where M^τ_{i,j} = 1{i and j belong to the same segment of τ} / Card(segment of τ containing i and j). 23/47
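A small numpy sketch of both distances (names are illustrative; the Hausdorff part assumes each segmentation has at least one interior change-point):

import numpy as np

def hausdorff_distance(tau1, tau2):
    # Hausdorff distance between the interior change-points of tau = (0, tau_1, ..., n).
    a = np.asarray(tau1[1:-1])
    b = np.asarray(tau2[1:-1])
    d = np.abs(a[:, None] - b[None, :])
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def segmentation_matrix(tau, n):
    # M_tau[i, j] = 1 / |segment| if i and j share a segment of tau, 0 otherwise.
    M = np.zeros((n, n))
    for start, end in zip(tau[:-1], tau[1:]):
        M[start:end, start:end] = 1.0 / (end - start)
    return M

def frobenius_distance(tau1, tau2, n):
    return np.linalg.norm(segmentation_matrix(tau1, n) - segmentation_matrix(tau2, n), "fro")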

26 Empirical assessment. Scenario 1: changes in (mean, variance). R-valued X_1, ..., X_n with n = 1; true partition of {1, ..., n} into D* = 11 segments; in each segment, randomly choose a distribution among 7 of them.

27 Scenario 1: changes in (mean, variance) with D* = 11. Hausdorff and Frobenius distances. [Figure: Frobenius and Hausdorff distances vs. dimension; (a) Gaussian (k_G), (b) Linear (k_Lin)] k_Lin puts changes in noise. 25/47

28 Scenario 1: changes in (mean, variance), cont. Change-point frequencies for D = D* (5 repetitions). [Figure: frequency of selected change-points vs. position; (a) Gaussian (k_G), (b) Linear (k_Lin)] k_Lin puts changes in noise. 26/47

29 Empirical assessment. Scenario 2: no change in (mean, variance). R-valued X_1, ..., X_n with n = 1; true partition of {1, ..., n} into D* = 11 segments; in each segment, randomly choose a distribution among 3 of them.

30 Scenario 2: no change in (mean, variance). Hausdorff and Frobenius distances. [Figure: Frobenius and Hausdorff distances vs. dimension; (a) Gaussian (k_G), (b) Linear (k_Lin)] k_Lin puts changes in noise. 28/47

31 Scenario 2: no change in (mean, variance), cont. Change-point frequencies for D = D*. [Figure: frequency of selected change-points vs. position; (a) Gaussian (k_G), (b) Linear (k_Lin)] k_Lin puts changes in noise. 29/47

32 Scenario 2: no change in (mean, variance), cont. Change-point frequencies for D = D*. [Figure: frequency of selected change-points vs. position; (a) Gaussian (k_G), (b) Hermite (k_H^5)] k_H^5 is less sensitive to changes than k_G (characteristic kernels). 30/47

33 Empirical assessment. Scenario 3: histogram-valued data. Histogram-valued X_1, ..., X_n with 2 bins and n = 1; true partition of {1, ..., n} into D* = 11 segments; in each segment, randomly choose DP(p_1, ..., p_2) (Dirichlet).

34 Scenario 3: histogram-valued data. Hausdorff and Frobenius distances. [Figure: Frobenius and Hausdorff distances vs. dimension; (a) χ² (k_χ²), (b) Gaussian (k_G)] k_G misses change-points by ignoring the structure of the data. 32/47

35 Scenario 3: histogram-valued data, cont. Change-point frequencies for D = D*. [Figure: frequency of selected change-points vs. position; (a) χ² (k_χ²), (b) Gaussian (k_G)] Potential gain in exploiting the structure of the data. 33/47

36 IV How many change-points? 34/47

37 KCP Algorithm (reminder). Input: observations X_1, ..., X_n ∈ X; kernel k : X × X → R. Step 1: for every 1 ≤ D ≤ D_max, compute τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) } (dynamic programming), where T_n^D = { (τ_0, ..., τ_D) ∈ N^{D+1} : 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n }. Step 2: find D̂ ∈ Argmin_{1 ≤ D ≤ D_max} { R_n(τ̂(D)) + pen(τ̂(D)) } (model selection). Output: sequence of change-points τ̂ = τ̂(D̂). 35/47

38 Empirical risk minimizer. Assumption: for all 1 ≤ i ≤ n, Y_i = µ*_i + ε_i, with µ* = (µ*_1, ..., µ*_n) piecewise constant. Model: τ = (τ_0, τ_1, ..., τ_D), with τ_0 = 0 and τ_D = n. Vector space (model): F_τ = { (f_1, ..., f_n) ∈ H^n : f_{τ_{l-1}+1} = ... = f_{τ_l}, 1 ≤ l ≤ D_τ } (D_τ: number of segments of τ). Estimator of µ*: µ̂_τ = Argmin_{f ∈ F_τ} { ‖Y - f‖² }, with ‖f‖² = Σ_{i=1}^{n} ‖f_i‖²_H; µ̂_τ = Π_{F_τ} Y: orthogonal projection onto F_τ. 36/47

39 Choose the number of change-points. Ideal penalty: τ* ∈ Argmin_{τ ∈ T_n} ‖µ* - µ̂_τ‖² (oracle segmentation) = Argmin_{τ ∈ T_n} { ‖Y - µ̂_τ‖² + pen_id(τ) }, with pen_id(τ) := 2 ‖Π_τ ε‖² - 2 ⟨(I - Π_τ)µ*, ε⟩. Strategy: 1. Concentration inequalities for the linear and quadratic terms. 2. Derive a tight upper bound pen ≥ pen_id with high probability. 37/47
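To see where pen_id comes from, one can expand both squared norms using µ̂_τ = Π_τ Y and the orthogonality of Π_τ and I - Π_τ; the -‖ε‖² term does not depend on τ and can be dropped from the minimization. A short derivation (my own rewriting, consistent with the definitions above):

\|\mu^\star - \hat\mu_\tau\|^2
  = \|(I - \Pi_\tau)\mu^\star\|^2 + \|\Pi_\tau \varepsilon\|^2
  = \|Y - \hat\mu_\tau\|^2
    + \underbrace{2\|\Pi_\tau \varepsilon\|^2 - 2\langle (I - \Pi_\tau)\mu^\star, \varepsilon\rangle}_{\mathrm{pen}_{\mathrm{id}}(\tau)}
    - \|\varepsilon\|^2 .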

40 Concentration of the quadratic term. Assumptions: max_i ‖Y_i‖_H ≤ M a.s. (Db); max_i E[ ‖ε_i‖²_H ] ≤ v_max (Vmax). Theorem (quadratic term). Assuming (Db)-(Vmax), for every τ ∈ T_n, x > 0, θ ∈ (0, 1], | ‖Π_τ ε‖² - E[ ‖Π_τ ε‖² ] | ≤ θ E[ ‖Π_τ µ* - µ̂_τ‖² ] + θ^{-1} L v_max x, with probability at least 1 - 2e^{-x}, where L is a constant. Rks: no Gaussian or constant-variance assumption; deals with Hilbert-valued vectors (not only in R^d); the x deviation term allows large collections. 38/47

41 Oracle inequality. Theorem. Assume (Db)-(Vmax). For every x > 0, let τ̂ ∈ Argmin_τ { ‖Y - µ̂_τ‖² + pen(τ) }, where pen(τ) = D_τ [ C_1 ln(n / D_τ) + C_2 ] (C_1, C_2 > 0). Then with probability at least 1 - 2e^{-x}, ‖µ* - µ̂_τ̂‖² ≤ Δ_1 inf_τ { ‖µ* - µ̂_τ‖² + pen(τ) } + Δ_2, where Δ_1 ≥ 1 and Δ_2 > 0 is a remainder term. Rk: in Birgé, Massart (2001), pen(τ) = D_τ [ c_1 ln(n / D_τ) + c_2 ]. 39/47

42 Model selection procedure. Algorithm: 1. For every 1 ≤ D ≤ D_max, τ̂(D) ∈ Argmin_{τ : D_τ = D} { ‖Y - µ̂_τ‖² }. 2. Define D̂ = Argmin_D { ‖Y - µ̂_{τ̂(D)}‖² + D [ C_1 ln(n / D) + C_2 ] }, where C_1, C_2 are computed by simulations (slope heuristic). 3. Final estimator: µ̂ := µ̂_{τ̂(D̂)}. 40/47
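Continuing the dynamic-programming sketch above, step 2 is a one-liner once the optimal costs L[D, n] = ‖Y - µ̂_{τ̂(D)}‖² are available; the constants C1 and C2 are assumed to have been calibrated beforehand (e.g. by the slope heuristic), and the names are illustrative.

import numpy as np

def select_dimension(L, n, C1, C2):
    # Step 2: D_hat = argmin_D { L[D, n] + D * (C1 * ln(n / D) + C2) }.
    D_max = L.shape[0] - 1
    dims = np.arange(1, D_max + 1)
    crit = L[1:, n] + dims * (C1 * np.log(n / dims) + C2)
    return int(dims[np.argmin(crit)])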

43 Scenario 1: changes in (mean, variance). Behavior of the penalized criterion. [Figure: penalized criterion, risk, and empirical risk vs. dimension; (a) Gaussian (k_G), (b) Hermite (k_H^5)] crit(τ̂(D)) looks like the risk for both k_G and k_H^5. 41/47

44 Scenario 1: changes in (mean, variance), cont. Change-point frequencies and D̂. [Figure: (a) frequencies of selected change-points (exact recovery), (b) selected dimension (D* = 11)] 42/47

45 Scenario 2: no change in (mean, variance). Behavior of the penalized criterion. [Figure: penalized criterion, risk, and empirical risk vs. dimension; (a) Gaussian (k_G), (b) Hermite (k_H^5)] crit(τ̂(D)) looks like the risk for both k_G and k_H^5. 43/47

46 Scenario 2: no change in (mean, variance), cont. Change-point frequencies and D̂. [Figure: (a) frequencies of selected change-points (exact recovery), (b) selected dimension (D* = 11)] 44/47

47 Scenario 3: histogram-valued data, cont. Behavior of the penalized criterion. [Figure: penalized criterion, risk, and empirical risk vs. dimension; (a) χ² (k_χ²), (b) Gaussian (k_G)] crit looks like the risk for both k_G and k_χ². 45/47

48 Concluding remarks Summary: detect changes in the distribution (not only in the mean) efficient and theoretically grounded procedure deal with both vectorial (R d ) and structured (graphs,... ) objects 46/47

49 Concluding remarks. Summary: detect changes in the distribution (not only in the mean); efficient and theoretically grounded procedure; deal with both vectorial (R^d) and structured (graphs, ...) objects. Statistical precision/computation trade-offs, open challenges: reduce the O(n²) time complexity (approximations to the Gram matrix); investigate the link between the kernel and the abrupt changes; revisit the slope heuristic to (i) preserve accuracy and (ii) save computational resources. Thank you! 46/47

52 Scenario 3: histogram-valued data, cont. Change-point frequencies and D̂. [Figure: frequency of selected change-points vs. position; (a) χ² (k_χ²), (b) Gaussian (k_G)] 48/47

53 Sketch of proof. 1. ‖Π_τ ε‖² = Σ_{λ ∈ Λ_m} (1/n_λ) ‖ Σ_{i ∈ λ} ε_i ‖²_H = Σ_{λ ∈ Λ_m} T_λ. 2. The { ‖ Σ_{i ∈ λ} ε_i ‖²_H }_{λ ∈ Λ_m} are independent r.v. 3. Bernstein's inequality applied to ‖Π_τ ε‖² (*). 4. For every q ≥ 2, upper bound on E[ T_λ^q ]. 5. Pinelis-Sakhanenko's inequality on ‖ Σ_{i ∈ λ} ε_i ‖_H: for all x > 0, P( ‖ Σ_{i ∈ λ} ε_i ‖_H > x ) ≤ 2 exp( -x² / (2 (σ²_λ + b_λ x)) ), with b_λ = 2M/3 and σ²_λ = Σ_{i ∈ λ} v_i. 49/47

54 Bernstein rather than Talagrand. Talagrand's inequality: ‖Π_τ ε‖ = sup_{f ∈ B_n} ⟨f, Π_τ ε⟩ = sup_{f ∈ B_n} Σ_{i=1}^{n} ⟨f_i, (Π_τ ε)_i⟩_H, and P( ‖Π_τ ε‖ ≥ E[ ‖Π_τ ε‖ ] + √(2 v x) + (b/3) x ) ≤ e^{-x}, with v = Σ_{i=1}^{n} sup_f E( ⟨f_i, (Π_τ ε)_i⟩²_H ) + 16 b E[ ‖Π_τ ε‖ ]. Bernstein's inequality: σ² = sup_f Σ_{i=1}^{n} E( ⟨f_i, (Π_τ ε)_i⟩²_H ) = E[ ‖Π_τ ε‖² ]. 50/47
