Kernel change-point detection


Slide 1: Sylvain Arlot (1,2), joint work with Alain Celisse (3) and Zaïd Harchaoui (4). (1) CNRS; (2) École Normale Supérieure (Paris), DIENS, Équipe Sierra; (3) Université Lille 1; (4) INRIA Grenoble. Workshop "Kernel methods for big data", Lille, 1st April.

Slide 2: 1-D signal (example). [Figure: a noisy signal plotted against position t.]

Slide 3: 1-D signal (example): find abrupt changes in the mean. [Figure: the signal together with a piecewise-constant regression function, plotted against position t.]

Slide 4: Estimation rather than identification. [Figure: signal $Y$ and regression function $s$.] With a finite sample, it is impossible to recover some change-points in noisy regions. Purpose: (1) estimate the regression function; (2) use the quadratic loss $\ell(u, v) = \|u - v\|^2$. Remark: unless the noise is too strong, all change-points are then recovered.

Slides 5-7: Detect abrupt changes... General purposes:
1. Detect changes in the whole distribution (not only in the mean). Mean, homoscedastic: Birgé & Massart (2001), Comte & Rozenholc (2002, 2004), Baraud, Giraud & Huet (2010)...; mean, heteroscedastic: A. & Celisse (2011); mean and variance: Picard et al. (2007).
2. Handle high-dimensional data of different natures: vectorial (measures in $\mathbb{R}^d$, curves such as sound recordings), non-vectorial (phenotypic data, graphs, DNA sequences), or a mix of both.
3. An efficient algorithm able to deal with large data sets.

Slide 8: Kernel and Reproducing Kernel Hilbert Space (RKHS). $\mathcal{X}$: initial input space; $X_1, \dots, X_n$: initial observations. $k(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: reproducing kernel ($\mathcal{H}$: its RKHS). $\phi : \mathcal{X} \to \mathcal{H}$ with $\phi(x) = k(x, \cdot)$: canonical feature map. Asset: enables working with high-dimensional, heterogeneous data. Remark: the estimators depend on the data only through the Gram matrix $K := \{k(X_i, X_j)\}_{1 \le i, j \le n}$.
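For concreteness, a minimal NumPy sketch (function name and arguments are illustrative, not from the slides) of the Gram matrix for the Gaussian kernel $k_h(x, y) = \exp(-\|x - y\|^2 / (2h^2))$ used in the experiments below:

    import numpy as np

    def gaussian_gram(X, h):
        # K[i, j] = exp(-||X_i - X_j||^2 / (2 h^2)) for the rows of X.
        sq = (X ** 2).sum(axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-np.maximum(d2, 0.0) / (2.0 * h ** 2))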

Slides 9-11: Model. Mapping of the initial data: for all $1 \le i \le n$, $Y_i = \phi(X_i) \in \mathcal{H}$, and $(t_1, Y_1), \dots, (t_n, Y_n) \in [0, 1] \times \mathcal{H}$ are independent. Write $Y_i = s_i + \varepsilon_i \in \mathcal{H}$, where $s_i \in \mathcal{H}$ is the mean element of $P_{X_i}$ (the distribution of $X_i$), that is, $\langle s_i, f \rangle_{\mathcal{H}} = \mathbb{E}_{X_i}[\langle \phi(X_i), f \rangle_{\mathcal{H}}]$ for all $f \in \mathcal{H}$; for all $i$, $\varepsilon_i := Y_i - s_i$, so that $\mathbb{E}[\varepsilon_i] = 0$, with $v_i := \mathbb{E}[\|\varepsilon_i\|_{\mathcal{H}}^2]$.
Assumptions:
1. $\max_i \|Y_i\|_{\mathcal{H}} \le M$ a.s. (Db).
2. $\max_i v_i \le v_{\max}$ (Vmax).
3. $s = (s_1, \dots, s_n) \in \mathcal{H}^n$ is piecewise constant.
Notation: $\|s - \mu\|^2 := \sum_{i=1}^{n} \|s_i - \mu_i\|_{\mathcal{H}}^2$. Goal: estimate $s$ to recover the change-points.

Slide 12: Least-squares estimator. Empirical risk minimizer over the model $S_m$: $\hat{s}_m \in \operatorname{argmin}_{u \in S_m} \hat{R}_n(u)$, where $\hat{R}_n(u) = \frac{1}{n} \|u - Y\|^2 = \frac{1}{n} \sum_{i=1}^{n} \|u_i - Y_i\|_{\mathcal{H}}^2$. This is a regressogram: $\hat{s}_m = \sum_{\lambda \in m} \hat{\beta}_\lambda \mathbf{1}_\lambda$ with $\hat{\beta}_\lambda = \frac{1}{\mathrm{Card}\{t_i \in \lambda\}} \sum_{t_i \in \lambda} Y_i$.
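Although the $Y_i$ live in $\mathcal{H}$, the empirical risk of a segmentation can be evaluated from the Gram matrix alone: for a segment $\lambda$, $\sum_{i \in \lambda} \|Y_i - \hat{\beta}_\lambda\|_{\mathcal{H}}^2 = \sum_{i \in \lambda} K_{ii} - \frac{1}{\mathrm{Card}(\lambda)} \sum_{i, j \in \lambda} K_{ij}$. A minimal sketch (the helper name is hypothetical):

    import numpy as np

    def segment_cost(K, a, b):
        # Kernel least-squares cost of the segment {a, ..., b-1} (half-open),
        # computed from the Gram matrix only: sum of diagonal entries of the
        # segment block minus (block sum) / (segment length).
        block = K[a:b, a:b]
        return np.trace(block) - block.sum() / (b - a)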

Slides 13-14: Model selection. Models: $\mathcal{M}_n = \{m : m \text{ a segmentation of } \{1, \dots, n\}\}$, with $D_m = \mathrm{Card}(m)$; a segmentation $m$ corresponds to intervals $\{I_1 = [0, t_{m_1}], I_2 = (t_{m_1}, t_{m_2}], \dots, I_{D_m} = (t_{m_{D_m - 1}}, 1]\}$. $S_m = \{\mu : (t_1, \dots, t_n) \to \mathcal{H}, \text{ piecewise constant on every } \lambda \in m\}$, a subspace of $\mathcal{H}^n$. Strategy: $(S_m)_{m \in \mathcal{M}_n} \to (\hat{s}_m)_{m \in \mathcal{M}_n} \to \hat{s}_{\hat{m}}$??? Oracle model: $m^\star \in \operatorname{argmin}_{m \in \mathcal{M}_n} \|s - \hat{s}_m\|^2$. Goal: an oracle inequality (in expectation, or with large probability): $\|s - \hat{s}_{\hat{m}}\|^2 \le C \inf_{m \in \mathcal{M}_n} \{\|s - \hat{s}_m\|^2 + R(m, n)\}$.

Slide 15: Choose $(D - 1)$ change-points. Assumption (Harchaoui & Cappé (2007)): the number $(D - 1)$ of change-points is known. Question: find the locations of the $(D - 1)$ change-points. Strategy ($D$ given): the best segmentation into $D$ pieces is obtained by applying the ERM algorithm over $\bigcup_{m : D_m = D} S_m$: $\hat{m}_{\mathrm{ERM}}(D) = \operatorname{argmin}_{m : D_m = D} \hat{R}_n(\hat{s}_m)$. Remark: computed by dynamic programming; a sketch follows.
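A naive sketch of that dynamic program, reusing segment_cost from above (names are illustrative; a production implementation would precompute or recursively update the segment costs rather than re-sum blocks):

    import numpy as np

    def best_segmentations(K, D_max):
        # L[D, t]: minimal empirical risk when splitting the first t points
        # into D segments; arg[D, t]: start of the last segment achieving it.
        n = K.shape[0]
        L = np.full((D_max + 1, n + 1), np.inf)
        arg = np.zeros((D_max + 1, n + 1), dtype=int)
        L[0, 0] = 0.0
        for D in range(1, D_max + 1):
            for t in range(D, n + 1):
                for a in range(D - 1, t):
                    c = L[D - 1, a] + segment_cost(K, a, t)
                    if c < L[D, t]:
                        L[D, t], arg[D, t] = c, a
        return L, arg

    def change_points(arg, D, n):
        # Backtrack the (D - 1) change-point locations for D segments.
        cps, t = [], n
        for d in range(D, 1, -1):
            t = arg[d, t]
            cps.append(t)
        return cps[::-1]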

Slide 16: Quality of the segmentations. [Figure only.]

Slides 17-18: Elementary calculations. Ideal criterion ($\Pi_m$: orthogonal projection operator onto $S_m$): $\|s - \hat{s}_m\|^2 = \|s - \Pi_m s\|^2 + \|\Pi_m \varepsilon\|^2$. Empirical risk: $\|Y - \hat{s}_m\|^2 = \|s - \Pi_m s\|^2 - \|\Pi_m \varepsilon\|^2 + 2 \langle (I - \Pi_m) s, \varepsilon \rangle + \|\varepsilon\|^2$. Expectations (with $v_\lambda := \frac{1}{\mathrm{Card}(\lambda)} \sum_{i \in \lambda} v_i$): $\mathbb{E}[\|s - \hat{s}_m\|^2] = \|s - \Pi_m s\|^2 + \sum_{\lambda \in m} v_\lambda$ and $\mathbb{E}[\|Y - \hat{s}_m\|^2] = \|s - \Pi_m s\|^2 - \sum_{\lambda \in m} v_\lambda + \mathrm{Cst}$. Conclusion: ERM prefers models with a large $\sum_{\lambda \in m} v_\lambda$, hence it overfits.

Slide 19: Choose the number of change-points. From $\{\hat{s}_{\hat{m}(D)}\}_D$, choosing $D$ amounts to choosing the best model. Ideal penalty: $m^\star \in \operatorname{argmin}_{m \in \mathcal{M}} \|s - \hat{s}_m\|^2 = \operatorname{argmin}_{m \in \mathcal{M}} \{\|Y - \hat{s}_m\|^2 + \mathrm{pen}_{\mathrm{id}}(m)\}$, with $\mathrm{pen}_{\mathrm{id}}(m) := 2 \|\Pi_m \varepsilon\|^2 - 2 \langle (I - \Pi_m) s, \varepsilon \rangle$. Strategy: (1) concentration inequalities for the linear and quadratic terms; (2) derive a tight upper bound $\mathrm{pen} \ge \mathrm{pen}_{\mathrm{id}}$ holding with high probability. Previous work: Birgé & Massart (2001), under a Gaussian assumption and for real-valued functions; it cannot be extended to the Hilbert framework.

Slide 20: Concentration of the linear term. Theorem (linear term). Assume (Db) and (Vmax) hold. Then, for every segmentation $m \in \mathcal{M}_n$ and every $x > 0$, with probability at least $1 - 2 e^{-x}$, for every $\theta > 0$: $|\langle \Pi_m s - s, \varepsilon \rangle| \le \theta \|\Pi_m s - s\|^2 + \left( \frac{v_{\max}}{\theta} + \frac{4 M^2}{3} \right) x$.

Slide 21: Concentration of the quadratic term. Theorem (quadratic term). Assume (Db), (Vmax), and, for some $\kappa \ge 1$, $0 < \frac{M^2}{\kappa} \le \min_i v_i$ (Vmin). Then, for every $m \in \mathcal{M}_n$, $x > 0$, and $\theta \in (0, 1]$, with probability at least $1 - 2 e^{-x}$: $\left| \|\Pi_m \varepsilon\|^2 - \mathbb{E}[\|\Pi_m \varepsilon\|^2] \right| \le \theta \, \mathbb{E}[\|\Pi_m s - \hat{s}_m\|^2] + \theta^{-1} L(\kappa) v_{\max} x$, where $L(\kappa)$ is a constant. Idea of the proof: Pinelis-Sakhanenko's inequality (for $\|\sum_{i \in \lambda} \varepsilon_i\|_{\mathcal{H}}$) and Bernstein's inequality (to upper bound the moments).

Slide 22: Oracle inequality. Theorem. Assume (Db), (Vmin), and (Vmax), and define $\hat{m} \in \operatorname{argmin}_m \left\{ \frac{1}{n} \|Y - \hat{s}_m\|^2 + \mathrm{pen}(m) \right\}$, where $\mathrm{pen}(m) = \frac{v_{\max} D_m}{n} \left[ C_1 \ln\left( \frac{n}{D_m} \right) + C_2 \right]$ for constants $C_1, C_2 > 0$. Then, for every $x \ge 1$, with probability at least $1 - 2 e^{-x}$: $\frac{1}{n} \|s - \hat{s}_{\hat{m}}\|^2 \le \Delta_1 \inf_{m \in \mathcal{M}_n} \left\{ \frac{1}{n} \|s - \hat{s}_m\|^2 + \mathrm{pen}(m) \right\} + \Delta_2 \frac{v_{\max} x}{n}$, where $\Delta_1 \ge 1$ and $\Delta_2 > 0$ are absolute constants. For comparison, in Birgé & Massart (2001), $\mathrm{pen}(m) = \frac{\sigma^2 D_m}{n} \left[ c_1 \ln\left( \frac{n}{D_m} \right) + c_2 \right]$.

Slide 23: Model selection procedure. Algorithm, with $\mathrm{pen}(m) = \frac{v_{\max} D_m}{n} \left[ C_1 \ln\left( \frac{n}{D_m} \right) + C_2 \right] = \mathrm{pen}(D_m)$:
1. For every $1 \le D \le D_{\max}$, compute $\hat{m}_D \in \operatorname{argmin}_{m : D_m = D} \left\{ \|Y - \hat{s}_m\|^2 \right\}$.
2. Define $\hat{D} \in \operatorname{argmin}_D \left\{ \frac{1}{n} \|Y - \hat{s}_{\hat{m}_D}\|^2 + \frac{v_{\max} D}{n} \left[ C_1 \ln\left( \frac{n}{D} \right) + C_2 \right] \right\}$, where $C_1, C_2$ are computed by simulation experiments.
3. Final estimator: $\hat{s}_{\hat{m}} := \hat{s}_{\hat{m}_{\hat{D}}}$.
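Step 2 in a minimal sketch (assumptions: the risks array comes from the dynamic program above, risks[D - 1] = L[D, n] / n, and v_max, C1, C2 are given):

    import numpy as np

    def select_D(risks, n, v_max, C1, C2):
        # risks[D - 1] = ||Y - s_hat_{m_D}||^2 / n for D = 1, ..., D_max;
        # return the D minimizing the penalized criterion of slide 23.
        D = np.arange(1, len(risks) + 1)
        pen = v_max * D / n * (C1 * np.log(n / D) + C2)
        return int(D[np.argmin(np.asarray(risks) + pen)])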

Slide 24: Changes in the distribution (synthetic data). Description: (1) $n = 1000$, $D - 1 = 9$ change-points, $N_{\mathrm{rep}} = \dots$ replications; (2) in each segment, observations are generated according to one distribution from a pool of 10 distributions sharing the same mean and variance; (3) the kernel-based approach can distinguish them (through higher-order moments); (4) Gaussian kernel: $k_h(x, y) = \exp\left( -\|x - y\|^2 / (2 h^2) \right)$.

Slide 25: Changes in the distribution (synthetic data), continued. Results: [figure comparing the empirical risk, the penalized criterion, and the true risk]. Hausdorff distance: $\dots \pm \dots$.
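The Hausdorff distance used here to score estimated change-points against the true ones is, in a minimal sketch:

    import numpy as np

    def hausdorff(cps_hat, cps_true):
        # Max over each set of the distance to the nearest point of the
        # other set; both sets are assumed non-empty.
        a = np.asarray(cps_hat, dtype=float)
        b = np.asarray(cps_true, dtype=float)
        d = np.abs(a[:, None] - b[None, :])
        return max(d.min(axis=1).max(), d.min(axis=0).max())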

Slide 26: Changes in the distribution (synthetic data), continued. Results: [figure of the estimated number of change-points].

Slide 27: "Le grand échiquier", a French talk show of the 70s-80s. Audio and video recordings. Audio: different situations can be distinguished from the sound recordings (music, applause, speech, ...). Video: different scenes can be distinguished by their backgrounds or by specific actions of the people filmed (clapping hands, discussing, ...).

Slide 28: Audio signal. Description: $n = 500$, $D - 1 = 4$ change-points; at each $t_i$, one observes a multivariate vector of dimension 12. Gaussian kernel: $k_h(x, y) = \exp\left( -\|x - y\|^2 / (2 h^2) \right)$. Results: Hausdorff distance $\dots \pm \dots$.

Slide 29: Video sequence. Description: $n = \dots$, $D - 1 = 4$ change-points; each image is summarized by a histogram with $\dots$ bins. $\chi^2$ kernel: $k_d(x, y) = \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{x_i + y_i}$. Results: Hausdorff distance $\dots \pm \dots$.
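A minimal sketch of the pairwise $\chi^2$ statistic between histogram rows, as written on the slide (a possible exponential envelope with a bandwidth, as for the Gaussian kernel, may have been lost in transcription; eps is an added guard against empty bins):

    import numpy as np

    def chi2_gram(X, eps=1e-12):
        # K[i, j] = sum_d (X[i, d] - X[j, d])^2 / (X[i, d] + X[j, d])
        # for histogram rows of X.
        num = (X[:, None, :] - X[None, :, :]) ** 2
        den = X[:, None, :] + X[None, :, :] + eps
        return (num / den).sum(axis=-1)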

Slides 30-31: Conclusion. Take-home message: a change-point detection algorithm for possibly high-dimensional or complex data; a data-driven choice of the number of change-points; a non-asymptotic oracle inequality (a guarantee on the risk); experiments on changes in less usual properties of the distribution, and on audio and video data.
Open questions: (1) influence of the choice of kernel; (2) data-driven choice of the kernel; (3) relaxing the assumption on the variance; (4) extending the model selection theorem to other regression settings.
