Kernels to detect abrupt changes in time series
1 UMR 8524 CNRS - Université Lille 1; 2 Modal INRIA team-project; 3 SSB group, Paris. Joint work with S. Arlot, Z. Harchaoui, G. Rigaill, and G. Marot. Computational and statistical trade-offs in learning, IHES, Paris, March 22nd, 2016.
Outline
1. Motivating examples and framework (kernels)
2. KCP algorithm and computational complexity
3. Where are the change-points (D fixed)?
4. How many change-points?
Change-point detection: 1-D signal (example)
[Figure: a noisy 1-D signal and its piecewise-constant regression function, plotted against position t]
Detect abrupt changes... General purposes:
1. Detect changes in (features of) the distribution (not only in the mean)
Abrupt changes in high-order moments
[Figure: signal whose mean stays constant while higher-order moments change; detecting changes in the mean is useless here]
Detect abrupt changes... General purposes:
1. Detect changes in (features of) the distribution (not only in the mean)
2. Complex data:
- High-dimensional: measures in R^d, curves, ...
- Structured: audio/video streams, graphs, DNA sequences, ...
Motivating example 1: Structured objects
Description: video sequences from Le grand échiquier, a 70s-80s French talk show. At each time, one observes an image (high-dimensional). Each image is summarized by a histogram.
Motivating example 2: Structured objects
Observe networks over time. Goal: detect abrupt changes in some features of the network.
Detect abrupt changes... General purposes:
1. Detect changes in (features of) the distribution (not only in the mean)
2. Complex data:
- High-dimensional: measures in R^d, curves, ...
- Structured: audio/video streams, graphs, DNA sequences, ...
3. Fusion of heterogeneous data: deal simultaneously with different types of complex data
4. Efficient algorithm allowing to deal with large data sets (Big-data challenge)
Part I: Kernel framework
Kernel and Reproducing Kernel Hilbert Space (RKHS)
X_1, ..., X_n ∈ X: initial observations.
k(·,·): X × X → R: reproducing kernel (Aronszajn (1950)).
H: RKHS associated with k(·,·) (φ: X → H s.t. φ(x) = k(x,·): canonical feature map).
Assets: versatile tool to work with different types of data; handles complex (high-dimensional/structured) data.
Instances of kernels
Gaussian kernel (with R^d-valued data): k_δ(x, y) = exp( −‖x − y‖² / δ ), δ > 0.
χ²-kernel (with histogram-valued data): k_I(p, q) = exp( −Σ_{i=1}^I (p_i − q_i)² / (p_i + q_i) ).
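As an illustration (not the authors' code), both kernels can be written in a few lines; the function names and the zero-bin convention in the χ² kernel are ours:

```python
import numpy as np

def gaussian_kernel(x, y, delta=1.0):
    """Gaussian kernel on R^d: k_delta(x, y) = exp(-||x - y||^2 / delta)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-np.sum((x - y) ** 2) / delta))

def chi2_kernel(p, q):
    """Chi^2 kernel on histograms: exp(-sum_i (p_i - q_i)^2 / (p_i + q_i))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    num, den = (p - q) ** 2, p + q
    # Convention (ours): bins where p_i + q_i = 0 contribute nothing to the sum.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(np.exp(-terms.sum()))
```

Both return 1 when the two arguments coincide, which is the maximal similarity.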
Model
For 1 ≤ i ≤ n, Y_i = φ(X_i) = μ*_i + ε_i ∈ H, where
μ*_i ∈ H: mean element of P_{X_i} (distribution of X_i);
for all i, ε_i := Y_i − μ*_i, with E[ε_i] = 0 and v_i := E[‖ε_i‖²_H].
Mean element of P_{X_i} (H separable and E[k(X, X)] < +∞):
⟨μ*_i, f⟩_H = E_{X_i}[ ⟨φ(X_i), f⟩_H ], for all f ∈ H.
With characteristic kernels, P_{X_i} ≠ P_{X_j} ⟺ μ*_i ≠ μ*_j.
Estimation rather than identification
Assumption: μ* = (μ*_1, ..., μ*_n) ∈ H^n is piecewise constant.
[Figure: signal Y and regression function s]
Fact: with a finite sample, it is impossible to recover change-points in noisy regions.
Purpose: estimate μ* to recover change-points.
Performance measure: ‖μ̂ − μ*‖² := Σ_{i=1}^n ‖μ̂_i − μ*_i‖²_H.
Part II: Algorithm
Notation
Segmentation with D segments: τ = (τ_0, ..., τ_D), with 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n.
Quality of a segmentation τ (following Harchaoui and Cappé (2007)):
R_n(τ) = (1/n) Σ_{i=1}^n k(X_i, X_i) − (1/n) Σ_{l=1}^D (1/(τ_l − τ_{l−1})) Σ_{i=τ_{l−1}+1}^{τ_l} Σ_{j=τ_{l−1}+1}^{τ_l} k(X_i, X_j).
Remark: with the linear kernel k(x, x′) = ⟨x, x′⟩ on X = R^d, R_n(τ) reduces to the usual least-squares empirical risk.
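Given a precomputed Gram matrix, R_n(τ) is a one-liner per segment. The sketch below is ours (0-based, half-open segment convention, i.e. segment l covers indices τ_{l−1}..τ_l − 1):

```python
import numpy as np

def kernel_risk(K, tau):
    """Empirical risk R_n(tau) of a segmentation, from the Gram matrix K.

    tau = (tau_0, ..., tau_D) with 0 = tau_0 < ... < tau_D = n.
    """
    n = K.shape[0]
    risk = np.trace(K) / n  # (1/n) sum_i k(X_i, X_i)
    for a, b in zip(tau[:-1], tau[1:]):
        # subtract the normalized within-segment double sum
        risk -= K[a:b, a:b].sum() / ((b - a) * n)
    return float(risk)
```

Sanity check: with the linear kernel K = x xᵀ on 1-D data, this equals (1/n) times the within-segment sum of squares, the least-squares empirical risk.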
KCP Algorithm
Input: observations X_1, ..., X_n ∈ X; kernel k: X × X → R.
Step 1: for 1 ≤ D ≤ D_max, compute (by dynamic programming):
τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) },
where T_n^D = { (τ_0, ..., τ_D) ∈ N^{D+1} : 0 = τ_0 < τ_1 < τ_2 < ... < τ_D = n }.
Step 2 (model selection): find
D̂ ∈ Argmin_{1 ≤ D ≤ D_max} { R_n(τ̂(D)) + pen(τ̂(D)) }.
Output: sequence of change-points τ̂ = τ̂(D̂).
Computational complexity (naive approach)
Dynamic programming (DP) update rule: for 2 ≤ D ≤ D_max,
L_{D,n} = min_{t ≤ n−1} { L_{D−1,t} + C_{t,n} },
where L_{D−1,t} is the cost of the best segmentation in D−1 segments up to time t, and C_{t,n} is the cost of the segment (t, n]:
C_{s,t} = Σ_{i=s+1}^t k(X_i, X_i) − (1/(t − s)) Σ_{i=s+1}^t Σ_{j=s+1}^t k(X_i, X_j).
Complexity (naive approach):
time: O(D_max n⁴) (computation of {C_{s,t}}_{1 ≤ s,t ≤ n});
space: O(n²) (storage of the cost matrix).
Computational complexity (improvement)
Ideas (with G. Rigaill and G. Marot):
- Never store the cost matrix.
- Update each column C_{·,t+1} from C_{·,t}.
Pseudo-code:
1: for t = 1 to n − 1 do
2:   Compute the (t + 1)-th column C_{·,t+1} from C_{·,t}
3:   for D = 2 to min(t, D_max) do
4:     L_{D,t+1} = min_{s ≤ t} { L_{D−1,s} + C_{s,t+1} }
5:   end for
6: end for
Computational complexity:
space: O(D_max n) (only store C_{·,t} ∈ R^n);
time: O(D_max n²) (update rule + DP complexity).
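The pseudo-code above can be sketched in NumPy. This is our illustrative implementation (function names kcp_dp and backtrack are ours); it keeps only the running segment costs for the current right endpoint t, as in the column-update idea, though it still takes the full Gram matrix K as input:

```python
import numpy as np

def kcp_dp(K, D_max):
    """DP for best segmentations in 1..D_max segments (column-update costs).

    Returns L (L[D, t] = cost of the best D-segment segmentation of X_1..X_t)
    and back-pointers to recover the change-points.
    """
    n = K.shape[0]
    L = np.full((D_max + 1, n + 1), np.inf)
    back = np.zeros((D_max + 1, n + 1), dtype=int)
    L[0, 0] = 0.0
    d = np.zeros(n)   # d[s]  = double sum of K over the segment (s, t]
    tr = np.zeros(n)  # tr[s] = sum of diagonal terms k(X_i, X_i) over (s, t]
    for t in range(1, n + 1):
        x = K[:t, t - 1]                 # k(X_i, X_t) for i = 1..t
        csum = np.cumsum(x[::-1])[::-1]  # csum[s] = sum_{i=s+1}^{t} k(X_i, X_t)
        d[:t] += 2.0 * csum - K[t - 1, t - 1]  # extend each segment (s, t-1] to (s, t]
        tr[:t] += K[t - 1, t - 1]
        C = tr[:t] - d[:t] / (t - np.arange(t))  # C[s] = cost of segment (s, t]
        for D in range(1, min(t, D_max) + 1):
            vals = L[D - 1, :t] + C
            s_best = int(np.argmin(vals))
            L[D, t], back[D, t] = vals[s_best], s_best
    return L, back

def backtrack(back, D, n):
    """Recover tau_hat(D) = (tau_0, ..., tau_D) from the back-pointers."""
    tau = [n]
    for d in range(D, 0, -1):
        tau.append(int(back[d, tau[-1]]))
    return tau[::-1]
```

With a linear kernel on a piecewise-constant 1-D signal, the recovered change-points coincide with the true ones and the 2-segment cost is zero.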
Runtime
Open questions:
- Reduce computation time by low-rank approximations of the Gram matrix.
- Quantify what has been lost by the approximation.
Part III: Where are the change-points for a fixed D?
KCP Algorithm (reminder)
Input: observations X_1, ..., X_n ∈ X; kernel k: X × X → R.
Step 1: for 1 ≤ D ≤ D_max, compute (by dynamic programming): τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) }.
Step 2 (model selection): find D̂ ∈ Argmin_{1 ≤ D ≤ D_max} { R_n(τ̂(D)) + pen(τ̂(D)) }.
Output: sequence of change-points τ̂ = τ̂(D̂).
Distance between segmentations
Hausdorff distance:
d_H(τ, τ′) = max { max_{1 ≤ i ≤ D_τ − 1} min_{1 ≤ j ≤ D_{τ′} − 1} |τ_i − τ′_j|, max_{1 ≤ j ≤ D_{τ′} − 1} min_{1 ≤ i ≤ D_τ − 1} |τ_i − τ′_j| }.
Frobenius distance:
d_F(τ, τ′) = ‖M^τ − M^{τ′}‖_F = √( Σ_{1 ≤ i,j ≤ n} (M^τ_{i,j} − M^{τ′}_{i,j})² ),
where M^τ_{i,j} = 1{i and j belong to the same segment of τ} / Card(segment of τ containing i and j).
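Both distances are easy to compute; the sketch below is ours (0-based, half-open segments, and the Hausdorff distance assumes each segmentation has at least one interior change-point):

```python
import numpy as np

def hausdorff_dist(tau1, tau2):
    """Hausdorff distance between the interior change-points of two segmentations."""
    a = np.asarray(tau1[1:-1], float)  # drop the fixed endpoints 0 and n
    b = np.asarray(tau2[1:-1], float)
    gaps = np.abs(a[:, None] - b[None, :])
    return float(max(gaps.min(axis=1).max(), gaps.min(axis=0).max()))

def membership_matrix(tau, n):
    """M^tau with M_ij = 1/|segment| if i and j share a segment of tau, else 0."""
    M = np.zeros((n, n))
    for a, b in zip(tau[:-1], tau[1:]):
        M[a:b, a:b] = 1.0 / (b - a)
    return M

def frobenius_dist(tau1, tau2, n):
    """Frobenius distance between the membership matrices of two segmentations."""
    return float(np.linalg.norm(membership_matrix(tau1, n) - membership_matrix(tau2, n)))
```

The Frobenius distance is defined even when one segmentation has no interior change-point, which is convenient for comparing with the trivial one-segment segmentation.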
Empirical assessment
Scenario 1: changes in (mean, variance)
- R-valued X_1, ..., X_n with n = 1000
- True partition of ⟦1, n⟧ into D* = 11 segments
- In each segment, randomly choose a distribution among 7 of them
Scenario 1: changes in (mean, variance) with D = 11
Hausdorff and Frobenius distances
[Figure: Frobenius and Hausdorff distances vs dimension; (a) Gaussian (k_G), (b) Linear (k_Lin)]
k_Lin puts changes in noise.
Scenario 1: changes in (mean, variance) (cont.)
Change-point frequencies for D = D* (500 repetitions)
[Figure: frequency of selected change-points vs position; (a) Gaussian (k_G), (b) Linear (k_Lin)]
k_Lin puts changes in noise.
Empirical assessment
Scenario 2: no change in (mean, variance)
- R-valued X_1, ..., X_n with n = 1000
- True partition of ⟦1, n⟧ into D* = 11 segments
- In each segment, randomly choose a distribution among 3 of them
Scenario 2: no change in (mean, variance)
Hausdorff and Frobenius distances
[Figure: Frobenius and Hausdorff distances vs dimension; (a) Gaussian (k_G), (b) Linear (k_Lin)]
k_Lin puts changes in noise.
Scenario 2: no change in (mean, variance) (cont.)
Change-point frequencies for D = D*
[Figure: frequency of selected change-points vs position; (a) Gaussian (k_G), (b) Linear (k_Lin)]
k_Lin puts changes in noise.
Scenario 2: no change in (mean, variance) (cont.)
Change-point frequencies for D = D*
[Figure: frequency of selected change-points vs position; (a) Gaussian (k_G), (b) Hermite (k_H5)]
k_H5 is less sensitive to changes than k_G (characteristic kernels).
Empirical assessment
Scenario 3: histogram-valued data
- Histogram-valued X_1, ..., X_n with 20 bins and n = 1000
- True partition of ⟦1, n⟧ into D* = 11 segments
- In each segment, randomly choose the parameters (p_1, ..., p_20) of a Dirichlet distribution
Scenario 3: histogram-valued data
Hausdorff and Frobenius distances
[Figure: Frobenius and Hausdorff distances vs dimension; (a) χ² (k_χ²), (b) Gaussian (k_G)]
k_G misses change-points by ignoring the structure of the data.
Scenario 3: histogram-valued data (cont.)
Change-point frequencies for D = D*
[Figure: frequency of selected change-points vs position; (a) χ² (k_χ²), (b) Gaussian (k_G)]
Potential gain in exploiting the structure of the data.
Part IV: How many change-points?
KCP Algorithm (reminder)
Input: observations X_1, ..., X_n ∈ X; kernel k: X × X → R.
Step 1: for 1 ≤ D ≤ D_max, compute (by dynamic programming): τ̂(D) ∈ Argmin_{τ ∈ T_n^D} { R_n(τ) }.
Step 2 (model selection): find D̂ ∈ Argmin_{1 ≤ D ≤ D_max} { R_n(τ̂(D)) + pen(τ̂(D)) }.
Output: sequence of change-points τ̂ = τ̂(D̂).
Empirical risk minimizer
Assumption: for 1 ≤ i ≤ n, Y_i = μ*_i + ε_i, with μ* = (μ*_1, ..., μ*_n) piecewise constant.
Model: for τ = (τ_0, τ_1, ..., τ_D) (with τ_0 = 0 and τ_D = n), define the vector space (model)
F_τ = { (f_1, ..., f_n) ∈ H^n : f_{τ_{l−1}+1} = ... = f_{τ_l}, 1 ≤ l ≤ D_τ }
(D_τ: number of segments of τ).
Estimator of μ*: μ̂_τ = Argmin_{f ∈ F_τ} { ‖Y − f‖² }, with ‖f‖² = Σ_{i=1}^n ‖f_i‖²_H.
μ̂_τ = Π_{F_τ} Y: orthogonal projection of Y onto F_τ.
Choose the number of change-points
Ideal penalty:
τ* ∈ Argmin_{τ ∈ T_n} ‖μ* − μ̂_τ‖²  (oracle segmentation)
= Argmin_{τ ∈ T_n} { ‖Y − μ̂_τ‖² + pen_id(τ) },
with pen_id(τ) := 2‖Π_τ ε‖² − 2⟨(I − Π_τ)μ*, ε⟩.
Strategy:
1. Concentration inequalities for the linear and quadratic terms.
2. Derive a tight upper bound pen ≥ pen_id with high probability.
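The form of pen_id follows from a standard bias-variance decomposition, using that Π_τ is an orthogonal projection (so the cross term between (I − Π_τ)μ* and Π_τ ε vanishes):

```latex
\begin{aligned}
\|\mu^\star - \hat\mu_\tau\|^2
  &= \|(I-\Pi_\tau)\mu^\star\|^2 + \|\Pi_\tau\varepsilon\|^2 ,\\
\|Y - \hat\mu_\tau\|^2
  &= \|(I-\Pi_\tau)\mu^\star\|^2
   + 2\,\langle (I-\Pi_\tau)\mu^\star,\,\varepsilon\rangle
   + \|\varepsilon\|^2 - \|\Pi_\tau\varepsilon\|^2 .
\end{aligned}
```

Subtracting the second identity from the first gives ‖μ* − μ̂_τ‖² = ‖Y − μ̂_τ‖² + pen_id(τ) − ‖ε‖²; since ‖ε‖² does not depend on τ, minimizing ‖Y − μ̂_τ‖² + pen_id(τ) over τ is equivalent to minimizing the oracle risk.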
Concentration of the quadratic term
Assumptions:
(Db): max_i ‖Y_i‖_H ≤ M a.s.
(Vmax): max_i E[‖ε_i‖²_H] ≤ v_max.
Theorem (quadratic term): assuming (Db) and (Vmax), for every τ ∈ T_n, x > 0, θ ∈ (0, 1],
| ‖Π_τ ε‖² − E[‖Π_τ ε‖²] | ≤ θ E[‖Π_τ μ* − μ̂_τ‖²] + θ⁻¹ L v_max x,
with probability at least 1 − 2e⁻ˣ, where L is a constant.
Remarks:
- No Gaussian or constant-variance assumption.
- Deals with Hilbert-valued vectors (not only in R^d).
- The x deviation term allows large collections.
Oracle inequality
Theorem: assume (Db) and (Vmax). For every x > 0, let
τ̂ ∈ Argmin_τ { ‖Y − μ̂_τ‖² + pen(τ) },
where pen(τ) = D_τ [ C₁ ln(n / D_τ) + C₂ ] (C₁, C₂ > 0).
Then, with probability at least 1 − 2e⁻ˣ,
‖μ* − μ̂_τ̂‖² ≤ Δ₁ inf_τ { ‖μ* − μ̂_τ‖² + pen(τ) } + Δ₂,
where Δ₁ ≈ 1 and Δ₂ > 0 is a remainder term.
Remark: in Birgé and Massart (2001), pen(τ) = D_τ [ c₁ ln(n / D_τ) + c₂ ].
Model selection procedure
Algorithm:
1. For every 1 ≤ D ≤ D_max, τ̂(D) ∈ Argmin_{τ : D_τ = D} { ‖Y − μ̂_τ‖² }.
2. Define D̂ = Argmin_D { ‖Y − μ̂_{τ̂(D)}‖² + D [ Ĉ₁ ln(n / D) + Ĉ₂ ] },
where Ĉ₁, Ĉ₂ are computed by simulations (slope heuristics).
3. Final estimator: μ̂ := μ̂_{τ̂(D̂)}.
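Step 2 is a one-dimensional minimization once the empirical risks ‖Y − μ̂_{τ̂(D)}‖² are known. A minimal sketch (function name ours, constants C1, C2 taken as given rather than calibrated by the slope heuristic):

```python
import numpy as np

def select_dimension(emp_risk, n, C1, C2):
    """Pick D_hat minimizing emp_risk[D] + D * (C1 * log(n / D) + C2).

    emp_risk[D] holds ||Y - mu_hat_{tau_hat(D)}||^2 for D = 1..D_max
    (index 0 is unused); C1, C2 would come from the slope heuristic.
    """
    D_max = len(emp_risk) - 1
    crit = [np.inf]  # D = 0 is not allowed
    for D in range(1, D_max + 1):
        crit.append(emp_risk[D] + D * (C1 * np.log(n / D) + C2))
    return int(np.argmin(crit))
```

The empirical risk decreases with D, while the penalty grows roughly linearly in D, so the criterion picks the "elbow" of the risk curve.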
Scenario 1: changes in (mean, variance)
Behavior of the penalized criterion
[Figure: penalized criterion, risk, and empirical risk vs dimension; (a) Gaussian (k_G), (b) Hermite (k_H5)]
crit(τ̂(D)) looks like the risk for both k_G and k_H5.
Scenario 1: changes in (mean, variance) (cont.)
Change-point frequencies and D̂
[Figure: (a) frequencies (exact recovery), (b) selected dimension (D* = 11)]
Scenario 2: no change in (mean, variance)
Behavior of the penalized criterion
[Figure: penalized criterion, risk, and empirical risk vs dimension; (a) Gaussian (k_G), (b) Hermite (k_H5)]
crit(τ̂(D)) looks like the risk for both k_G and k_H5.
Scenario 2: no change in (mean, variance) (cont.)
Change-point frequencies and D̂
[Figure: (a) frequencies (exact recovery), (b) selected dimension (D* = 11)]
Scenario 3: histogram-valued data (cont.)
Behavior of the penalized criterion
[Figure: penalized criterion, risk, and empirical risk vs dimension; (a) χ² (k_χ²), (b) Gaussian (k_G)]
The criterion looks like the risk for both k_G and k_χ².
Concluding remarks
Summary:
- detect changes in the distribution (not only in the mean)
- efficient and theoretically grounded procedure
- deals with both vectorial (R^d) and structured (graphs, ...) objects
Statistical precision/computation trade-offs: open challenges
- Reduce the O(n²) time complexity via approximations of the Gram matrix
- Investigate the link between the kernel and the abrupt changes
- Revisit the slope heuristic to: (i) preserve accuracy, and (ii) save computational resources
Thank you!
Appendix. Scenario 3: histogram-valued data (cont.)
Change-point frequencies and D̂
[Figure: frequency of selected change-points vs position; (a) χ² (k_χ²), (b) Gaussian (k_G)]
Sketch of proof
1. ‖Π_τ ε‖² = Σ_{λ ∈ m} (1/n_λ) ‖Σ_{i ∈ λ} ε_i‖²_H =: Σ_{λ ∈ m} T_λ (sum over the segments λ of τ, with n_λ the length of λ).
2. { ‖Σ_{i ∈ λ} ε_i‖²_H }_{λ ∈ m} are independent random variables.
3. Apply Bernstein's inequality to ‖Π_τ ε‖² (step 1).
4. For every q ≥ 2, upper bound E[T_λ^q].
5. Pinelis-Sakhanenko's inequality on ‖Σ_{i ∈ λ} ε_i‖_H: for all x > 0,
P( ‖Σ_{i ∈ λ} ε_i‖_H > x ) ≤ 2 exp( −x² / (2(σ²_λ + b_λ x)) ),
with b_λ = 2M/3 and σ²_λ = Σ_{i ∈ λ} v_i.
Bernstein rather than Talagrand
Talagrand's inequality: writing ‖Π_τ ε‖ = sup_{f ∈ B_n} ⟨f, Π_τ ε⟩ = sup_{f ∈ B_n} Σ_{i=1}^n ⟨f_i, (Π_τ ε)_i⟩_H,
P( ‖Π_τ ε‖ ≥ E[‖Π_τ ε‖] + √(2vx) + (b/3) x ) ≤ e⁻ˣ,
with v = Σ_{i=1}^n sup_f E[ ⟨f_i, (Π_τ ε)_i⟩²_H ] + 16 b E[‖Π_τ ε‖].
Bernstein's inequality: σ² = sup_f Σ_{i=1}^n E[ ⟨f_i, (Π_τ ε)_i⟩²_H ] = E[‖Π_τ ε‖²].
BOLLETTINO DI GEOFISICA TEORICA ED APPLICATA VOL. 40, N. 3-4, pp.6-66; SEP.-DEC. 999 Two-step data analysis for future satellite gravity field solutions: a simulation study J. KUSCHE, K. H. ILK and S.
More informationSemi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances
Semi-Supervised Learning in Reproducing Kernel Hilbert Spaces Using Local Invariances Wee Sun Lee,2, Xinhua Zhang,2, and Yee Whye Teh Department of Computer Science, National University of Singapore. 2
More informationBrownian Motion. 1 Definition Brownian Motion Wiener measure... 3
Brownian Motion Contents 1 Definition 2 1.1 Brownian Motion................................. 2 1.2 Wiener measure.................................. 3 2 Construction 4 2.1 Gaussian process.................................
More informationFast learning rates for plug-in classifiers under the margin condition
Fast learning rates for plug-in classifiers under the margin condition Jean-Yves Audibert 1 Alexandre B. Tsybakov 2 1 Certis ParisTech - Ecole des Ponts, France 2 LPMA Université Pierre et Marie Curie,
More informationShort Course Robust Optimization and Machine Learning. 3. Optimization in Supervised Learning
Short Course Robust Optimization and 3. Optimization in Supervised EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 Outline Overview of Supervised models and variants
More informationJoint distribution optimal transportation for domain adaptation
Joint distribution optimal transportation for domain adaptation Changhuang Wan Mechanical and Aerospace Engineering Department The Ohio State University March 8 th, 2018 Joint distribution optimal transportation
More informationApproximate Kernel PCA with Random Features
Approximate Kernel PCA with Random Features (Computational vs. Statistical Tradeoff) Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Journées de Statistique Paris May 28,
More informationWhen is MLE appropriate
When is MLE appropriate As a rule of thumb the following to assumptions need to be fulfilled to make MLE the appropriate method for estimation: The model is adequate. That is, we trust that one of the
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationKernel methods and the exponential family
Kernel methods and the exponential family Stéphane Canu 1 and Alex J. Smola 2 1- PSI - FRE CNRS 2645 INSA de Rouen, France St Etienne du Rouvray, France Stephane.Canu@insa-rouen.fr 2- Statistical Machine
More informationSparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28
Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:
More informationKaggle.
Administrivia Mini-project 2 due April 7, in class implement multi-class reductions, naive bayes, kernel perceptron, multi-class logistic regression and two layer neural networks training set: Project
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationBayesian Support Vector Machines for Feature Ranking and Selection
Bayesian Support Vector Machines for Feature Ranking and Selection written by Chu, Keerthi, Ong, Ghahramani Patrick Pletscher pat@student.ethz.ch ETH Zurich, Switzerland 12th January 2006 Overview 1 Introduction
More informationGroup lasso for genomic data
Group lasso for genomic data Jean-Philippe Vert Mines ParisTech / Curie Institute / Inserm Machine learning: Theory and Computation workshop, IMA, Minneapolis, March 26-3, 22 J.P Vert (ParisTech) Group
More informationUpper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1
Upper Bound for Intermediate Singular Values of Random Sub-Gaussian Matrices 1 Feng Wei 2 University of Michigan July 29, 2016 1 This presentation is based a project under the supervision of M. Rudelson.
More informationMessage passing and approximate message passing
Message passing and approximate message passing Arian Maleki Columbia University 1 / 47 What is the problem? Given pdf µ(x 1, x 2,..., x n ) we are interested in arg maxx1,x 2,...,x n µ(x 1, x 2,..., x
More informationAdvances in Manifold Learning Presented by: Naku Nak l Verm r a June 10, 2008
Advances in Manifold Learning Presented by: Nakul Verma June 10, 008 Outline Motivation Manifolds Manifold Learning Random projection of manifolds for dimension reduction Introduction to random projections
More informationFactor-Adjusted Robust Multiple Test. Jianqing Fan (Princeton University)
Factor-Adjusted Robust Multiple Test Jianqing Fan Princeton University with Koushiki Bose, Qiang Sun, Wenxin Zhou August 11, 2017 Outline 1 Introduction 2 A principle of robustification 3 Adaptive Huber
More informationInformation Recovery from Pairwise Measurements
Information Recovery from Pairwise Measurements A Shannon-Theoretic Approach Yuxin Chen, Changho Suh, Andrea Goldsmith Stanford University KAIST Page 1 Recovering data from correlation measurements A large
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More informationComputation time/accuracy trade-off and linear regression
Computation time/accuracy trade-off and linear regression Maxime BRUNIN & Christophe BIERNACKI & Alain CELISSE Laboratoire Paul Painlevé, Université de Lille, Science et Technologie INRIA Lille-Nord Europe,
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationConvergence rates of spectral methods for statistical inverse learning problems
Convergence rates of spectral methods for statistical inverse learning problems G. Blanchard Universtität Potsdam UCL/Gatsby unit, 04/11/2015 Joint work with N. Mücke (U. Potsdam); N. Krämer (U. München)
More information