Kernel change-point detection
1. Kernel change-point detection
Sylvain Arlot 1,2 (joint work with Alain Celisse 3 & Zaïd Harchaoui 4)
1 CNRS; 2 École Normale Supérieure (Paris), DIENS, Équipe Sierra; 3 Université Lille 1; 4 INRIA Grenoble
Workshop "Kernel methods for big data", Lille, 1st April
2. 1-D signal (example)
[Figure: a 1-D signal plotted against position t.]
3. 1-D signal (example): find abrupt changes in the mean
[Figure: the signal with its piecewise-constant regression function; the change-point locations to find are marked "??" along position t.]
4. Estimation rather than identification
[Figure: signal Y and regression function s.]
With a finite sample, it is impossible to recover some change-points in noisy regions.
Purpose:
1. Estimate the regression function.
2. Use the quadratic loss $\ell(u, v) = \|u - v\|^2$.
Rk: without too strong a noise, all change-points are recovered.
5-7. Detect abrupt changes...
General purposes:
1. Detect changes in the whole distribution (not only in the mean).
   Mean, homoscedastic: Birgé & Massart (2001), Comte & Rozenholc (2002, 2004), Baraud, Giraud & Huet (2010), ...
   Mean, heteroscedastic: A. & Celisse (2011).
   Mean and variance: Picard et al. (2007).
2. High-dimensional data of different nature.
   Vectorial: measures in $\mathbb{R}^d$, curves (sound recordings, ...).
   Non-vectorial: phenotypic data, graphs, DNA sequences, ...
   Both vectorial and non-vectorial data.
3. An efficient algorithm able to deal with large data sets.
8. Kernel and Reproducing Kernel Hilbert Space (RKHS)
$\mathcal{X}$: initial input space. $X_1, \dots, X_n$: initial observations.
$k(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$: reproducing kernel ($\mathcal{H}$: its RKHS).
$\phi : \mathcal{X} \to \mathcal{H}$ such that $\phi(x) = k(x, \cdot)$: canonical feature map.
Asset: makes it possible to work with high-dimensional, heterogeneous data.
Rk: the estimators depend on the data only through the Gram matrix $K := \{k(X_i, X_j)\}_{1 \le i, j \le n}$ (a sketch of its computation follows).
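Since everything downstream is computed from $K$ alone, a small illustration may help. Below is a minimal sketch (ours, not code from the talk) of the Gram matrix of the Gaussian kernel $k_h$ used in the experiments later; the function name and NumPy broadcasting style are our assumptions:

```python
import numpy as np

def gaussian_gram(X, h):
    """Gram matrix K[i, j] = k_h(X_i, X_j) = exp(-||X_i - X_j||^2 / (2 h^2)),
    for an (n, d) array X whose rows are the observations."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))
```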
9. Model
Mapping of the initial data: for all $1 \le i \le n$, $Y_i = \phi(X_i) \in \mathcal{H}$.
$(t_1, Y_1), \dots, (t_n, Y_n) \in [0, 1] \times \mathcal{H}$: independent.
10-11. Model
For all $1 \le i \le n$, $Y_i = s_i + \varepsilon_i \in \mathcal{H}$, where
$s_i \in \mathcal{H}$ is the mean element of $P_{X_i}$ (the distribution of $X_i$): $\langle s_i, f \rangle_{\mathcal{H}} = \mathbb{E}_{X_i}[\langle \phi(X_i), f \rangle_{\mathcal{H}}]$ for all $f \in \mathcal{H}$;
for all $i$, $\varepsilon_i := Y_i - s_i$, with $\mathbb{E}[\varepsilon_i] = 0$ and $v_i := \mathbb{E}[\|\varepsilon_i\|_{\mathcal{H}}^2]$.
Assumptions:
1. $\max_i \|Y_i\|_{\mathcal{H}} \le M$ a.s. (Db).
2. $\max_i v_i \le v_{\max}$ (Vmax).
3. $s = (s_1, \dots, s_n) \in \mathcal{H}^n$: piecewise constant.
Norm on $\mathcal{H}^n$: $\|s - \mu\|^2 := \sum_{i=1}^n \|s_i - \mu_i\|_{\mathcal{H}}^2$.
Goal: estimate $s$ to recover the change-points.
12. Least-squares estimator
Empirical risk minimizer over $S_m$ (= model):
$\hat{s}_m \in \arg\min_{u \in S_m} \hat{R}_n(u)$ where $\hat{R}_n(u) = \frac{1}{n} \|u - Y\|^2 = \frac{1}{n} \sum_{i=1}^n \|u_i - Y_i\|_{\mathcal{H}}^2$.
Regressogram: $\hat{s}_m = \sum_{\lambda \in m} \hat{\beta}_\lambda \mathbf{1}_\lambda$ with $\hat{\beta}_\lambda = \frac{1}{\operatorname{Card}\{i : t_i \in \lambda\}} \sum_{i : t_i \in \lambda} Y_i$.
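The empirical risk of a segmentation never requires the feature map explicitly: by the kernel trick, the within-segment sum of squares expands as $\sum_{i \in \lambda} \|\phi(X_i) - \hat{\beta}_\lambda\|_{\mathcal{H}}^2 = \sum_{i \in \lambda} K_{ii} - \frac{1}{\operatorname{Card}(\lambda)} \sum_{i, j \in \lambda} K_{ij}$. A minimal sketch (our own; `segment_cost` is a hypothetical helper name) for a contiguous segment $\{a, \dots, b-1\}$, reusing the import above:

```python
def segment_cost(K, a, b):
    """Kernel least-squares cost of segment {a, ..., b-1}: trace of the
    diagonal block of K minus its total sum divided by the segment length."""
    block = K[a:b, a:b]
    return np.trace(block) - block.sum() / (b - a)
```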
13-14. Model selection
Models: $\mathcal{M}_n = \{m : m \text{ segmentation of } \{1, \dots, n\}\}$, $D_m = \operatorname{Card}(m)$.
$m \leftrightarrow \{I_1 = [0, t_{m_1}],\ I_2 = (t_{m_1}, t_{m_2}],\ \dots,\ I_{D_m} = (t_{m_{D_m - 1}}, 1]\}$.
$S_m = \{\mu : (t_1, \dots, t_n) \to \mathcal{H}, \text{ piecewise constant on every } \lambda \in m\}$: a subspace of $\mathcal{H}^n$.
Strategy: $(S_m)_{m \in \mathcal{M}_n} \to (\hat{s}_m)_{m \in \mathcal{M}_n} \to \hat{s}_{\hat{m}}$???
Oracle model: $m^* \in \operatorname{argmin}_{m \in \mathcal{M}_n} \|s - \hat{s}_m\|^2$.
Goal: an oracle inequality (in expectation, or with large probability):
$\|s - \hat{s}_{\hat{m}}\|^2 \le C \inf_{m \in \mathcal{M}_n} \{\|s - \hat{s}_m\|^2 + R(m, n)\}$.
15. Choose (D − 1) change-points...
Assumption (Harchaoui & Cappé (2007)): the number (D − 1) of change-points is known.
Question: find the locations of the (D − 1) change-points?
Strategy (D is given): the best segmentation in D pieces is obtained by applying the ERM algorithm over $\bigcup_{m : D_m = D} S_m$:
$\hat{m}_{\mathrm{ERM}}(D) = \operatorname{argmin}_{m : D_m = D} \hat{R}_n(\hat{s}_m)$.
Rk: based on dynamic programming (a sketch follows).
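The dynamic programming recursion behind this step is standard: if $L_{q,t}$ denotes the best cost of cutting the first $t$ points into $q$ segments, then $L_{q,t} = \min_{s < t} \{L_{q-1,s} + \operatorname{cost}(s, t)\}$. A minimal sketch under our own naming (the talk's actual implementation may differ; prefix sums make each segment cost O(1), for O(n²D) total time):

```python
def kernel_changepoints(K, D):
    """Best segmentation of {0, ..., n-1} into D segments by dynamic programming,
    minimizing the kernel least-squares risk. Returns (change-points, cost),
    where change-points are the start indices of segments 2, ..., D."""
    n = K.shape[0]
    S = np.zeros((n + 1, n + 1))                       # 2-D prefix sums of K
    S[1:, 1:] = K.cumsum(axis=0).cumsum(axis=1)
    diag = np.concatenate(([0.0], np.cumsum(np.diag(K))))

    def cost(a, b):                                    # segment {a, ..., b-1}, b > a
        block = S[b, b] - S[a, b] - S[b, a] + S[a, a]  # sum of K over the block
        return diag[b] - diag[a] - block / (b - a)

    L = np.full((D + 1, n + 1), np.inf)                # L[q, t]: q segments, t points
    back = np.zeros((D + 1, n + 1), dtype=int)
    L[0, 0] = 0.0
    for q in range(1, D + 1):
        for t in range(q, n + 1):
            cands = [L[q - 1, s] + cost(s, t) for s in range(q - 1, t)]
            best = int(np.argmin(cands))
            L[q, t], back[q, t] = cands[best], best + q - 1
    taus, t = [], n                                    # backtrack segment starts
    for q in range(D, 0, -1):
        t = back[q, t]
        taus.append(t)
    return sorted(taus)[1:], L[D, n]                   # drop the leading 0
```

For instance, `kernel_changepoints(gaussian_gram(X, h), D)` would return the D − 1 estimated change-point locations together with the empirical risk $\|Y - \hat{s}_{\hat{m}(D)}\|^2$.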
16. Quality of the segmentations
[Figure: illustration of the quality of the obtained segmentations.]
17-18. Elementary calculations
Ideal criterion ($\Pi_m$: orthogonal projection operator onto $S_m$):
$\|s - \hat{s}_m\|^2 = \|s - \Pi_m s\|^2 + \|\Pi_m \varepsilon\|^2$.
Empirical risk:
$\|Y - \hat{s}_m\|^2 = \|s - \Pi_m s\|^2 - \|\Pi_m \varepsilon\|^2 + 2\langle (I - \Pi_m)s, \varepsilon \rangle + \|\varepsilon\|^2$.
Expectations (with $v_\lambda = \frac{1}{\operatorname{Card}(\lambda)} \sum_{i \in \lambda} v_i$):
$\mathbb{E}[\|s - \hat{s}_m\|^2] = \|s - \Pi_m s\|^2 + \sum_{\lambda \in m} v_\lambda$,
$\mathbb{E}[\|Y - \hat{s}_m\|^2] = \|s - \Pi_m s\|^2 - \sum_{\lambda \in m} v_\lambda + \text{Cst}$.
Conclusion: ERM prefers models with a large $\sum_{\lambda \in m} v_\lambda$ (overfitting).
19. Choose the number of change-points
From $\{\hat{s}_{\hat{m}(D)}\}_D$, choosing D amounts to choosing the best model.
Ideal penalty:
$m^* \in \operatorname{argmin}_{m \in \mathcal{M}} \|s - \hat{s}_m\|^2 = \operatorname{argmin}_{m \in \mathcal{M}} \{\|Y - \hat{s}_m\|^2 + \mathrm{pen}_{\mathrm{id}}(m)\}$, with $\mathrm{pen}_{\mathrm{id}}(m) := 2\|\Pi_m \varepsilon\|^2 - 2\langle (I - \Pi_m)s, \varepsilon \rangle$.
Strategy:
1. Concentration inequalities for the linear and quadratic terms.
2. Derive a tight upper bound $\mathrm{pen} \ge \mathrm{pen}_{\mathrm{id}}$ with high probability.
Previous work: Birgé & Massart (2001): Gaussian assumption + real-valued functions; cannot be extended to the Hilbert framework.
20. Concentration of the linear term
Theorem (linear term). Assume (Db) and (Vmax) hold true. Then, for every segmentation $m \in \mathcal{M}_n$ and every $x > 0$, with probability at least $1 - 2e^{-x}$,
$|\langle \Pi_m s - s, \varepsilon \rangle| \le \theta \|\Pi_m s - s\|^2 + \left( \frac{v_{\max}}{\theta} + \frac{4M^2}{3} \right) x$ for every $\theta > 0$.
21. Concentration of the quadratic term
Theorem (quadratic term). Assume (Db), (Vmax), and (Vmin): there exists $\kappa \ge 1$ such that $0 < \frac{M^2}{\kappa} \le \min_i v_i$. Then, for every $m \in \mathcal{M}_n$, $x > 0$, and $\theta \in (0, 1]$, with probability at least $1 - 2e^{-x}$,
$\left| \|\Pi_m \varepsilon\|^2 - \mathbb{E}[\|\Pi_m \varepsilon\|^2] \right| \le \theta\, \mathbb{E}[\|\Pi_m s - \hat{s}_m\|^2] + \theta^{-1} L(\kappa) v_{\max} x$,
where $L(\kappa)$ is a constant.
Idea of the proof: Pinelis-Sakhanenko's inequality (for $\|\sum_{i \in \lambda} \varepsilon_i\|_{\mathcal{H}}$); Bernstein's inequality (upper bounding the moments).
22. Oracle inequality
Theorem. Assume (Db), (Vmin), (Vmax) and define
$\hat{m} \in \operatorname{argmin}_m \left\{ \frac{1}{n} \|Y - \hat{s}_m\|^2 + \mathrm{pen}(m) \right\}$, where $\mathrm{pen}(m) = \frac{v_{\max} D_m}{n} \left[ C_1 \ln\left( \frac{n}{D_m} \right) + C_2 \right]$ for constants $C_1, C_2 > 0$.
Then, for every $x \ge 1$, with probability at least $1 - 2e^{-x}$,
$\frac{1}{n} \|s - \hat{s}_{\hat{m}}\|^2 \le \Delta_1 \inf_{m \in \mathcal{M}_n} \left\{ \frac{1}{n} \|s - \hat{s}_m\|^2 + \mathrm{pen}(m) \right\} + \Delta_2 \frac{v_{\max} x}{n}$,
where $\Delta_1 \ge 1$ and $\Delta_2 > 0$ are absolute constants.
In Birgé & Massart (2001), $\mathrm{pen}(m) = \frac{\sigma^2 D_m}{n} \left[ c_1 \ln\left( \frac{n}{D_m} \right) + c_2 \right]$.
23. Model selection procedure
Algorithm. Note that $\mathrm{pen}(m) = \frac{v_{\max} D_m}{n} \left[ C_1 \ln\left( \frac{n}{D_m} \right) + C_2 \right] = \mathrm{pen}(D_m)$ depends only on $D_m$.
1. For every $1 \le D \le D_{\max}$, compute $\hat{m}(D) \in \operatorname{argmin}_{m : D_m = D} \{\|Y - \hat{s}_m\|^2\}$.
2. Choose $\hat{D} \in \operatorname{argmin}_D \left\{ \frac{1}{n} \|Y - \hat{s}_{\hat{m}(D)}\|^2 + \frac{v_{\max} D}{n} \left[ C_1 \ln\left( \frac{n}{D} \right) + C_2 \right] \right\}$, where $C_1, C_2$ are computed by simulation experiments.
3. Final estimator: $\hat{s}_{\hat{m}} := \hat{s}_{\hat{m}(\hat{D})}$ (a sketch of the whole procedure follows).
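Assuming the `kernel_changepoints` sketch above, the full procedure can be written as follows (again our own sketch: `v_max`, `C1`, and `C2` must be supplied, the latter two calibrated by simulation as the slide says):

```python
def select_number_of_segments(K, D_max, v_max, C1, C2):
    """Data-driven choice of the number of segments via the penalized criterion
    crit(D) = (1/n) ||Y - s_hat(D)||^2 + (v_max * D / n) * (C1 * log(n/D) + C2)."""
    n = K.shape[0]
    crit = []
    for D in range(1, D_max + 1):
        _, risk = kernel_changepoints(K, D)            # step 1: best m with D_m = D
        crit.append(risk / n + v_max * D / n * (C1 * np.log(n / D) + C2))
    return int(np.argmin(crit)) + 1                    # step 2: D_hat in {1, ..., D_max}
```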
24. Changes in the distribution (synthetic data)
Description:
1. $n = 1000$, $D - 1 = 9$, $N_{\mathrm{rep}} = \dots$
2. In each segment, observations are generated according to one distribution within a pool of 10 distributions with the same mean and variance.
3. The kernel-based approach makes it possible to distinguish them (through higher-order moments).
4. Gaussian kernel: $k_h(x, y) = \exp\left[ -\|x - y\|^2 / (2h^2) \right]$.
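A toy end-to-end run in the spirit of this slide (our own setup: the two distributions, segment lengths, seed, and bandwidth are arbitrary choices; only the matched mean and variance matter):

```python
rng = np.random.default_rng(0)
segments = [rng.standard_normal(100) if q % 2 == 0
            else rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), 100)  # mean 0, variance 1
            for q in range(10)]
X = np.concatenate(segments)[:, None]                  # n = 1000, D - 1 = 9
K = gaussian_gram(X, h=0.1)
print(kernel_changepoints(K, D=10)[0])                 # ideally near 100, 200, ...
```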
25. Changes in the distribution (synthetic data), cont.
Results: [Figure: empirical risk, penalized criterion, and true risk.]
Hausdorff distance: … ± …
26. Changes in the distribution (synthetic data), cont.
Results: [Figure: estimated number of change-points.]
27. Le grand échiquier, a French talk show from the 70s-80s
Audio and video recordings.
Audio: different situations can be distinguished from the sound recordings (music, applause, speech, ...).
Video: different scenes can be distinguished by their backgrounds or by specific actions of the people (clapping hands, discussing, ...).
28. Audio signal
Description: $n = 500$, $D - 1 = 4$. At each $t_i$, one observes a multivariate vector of dimension 12.
Gaussian kernel: $k_h(x, y) = \exp\left[ -\|x - y\|^2 / (2h^2) \right]$.
Results: Hausdorff distance … ± …
29. Video sequence
Description: $n = \dots$, $D - 1 = 4$. Each image is summarized by a histogram with … bins.
$\chi^2$ kernel: $k_d(x, y) = \sum_{i=1}^{d} \frac{(x_i - y_i)^2}{x_i + y_i}$.
Results: Hausdorff distance … ± …
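As transcribed, $k_d$ is the $\chi^2$ distance between histograms; a positive-definite kernel is commonly obtained by exponentiating it, analogously to the Gaussian kernel above. That exponentiation (and the small `eps` guard) is our assumption, not stated on the slide:

```python
def chi2_kernel(x, y, h=1.0, eps=1e-12):
    """exp(-(1/(2 h^2)) * sum_i (x_i - y_i)^2 / (x_i + y_i)) for histograms x, y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dist = np.sum((x - y) ** 2 / (x + y + eps))        # chi-square distance
    return np.exp(-dist / (2.0 * h ** 2))
```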
30-31. Conclusion
Take-home message:
- A change-point detection algorithm for possibly high-dimensional or complex data.
- Data-driven choice of the number of change-points.
- A non-asymptotic oracle inequality (guarantee on the risk).
- Experiments: changes in less usual properties of the distribution; audio and video data.
Open questions:
1. Influence of the choice of kernel.
2. Data-driven choice of the kernel.
3. Relax the assumption on the variance.
4. Extend our model selection theorem to other regression settings.
More information