Convergence rates of spectral methods for statistical inverse learning problems
1 Convergence rates of spectral methods for statistical inverse learning problems. G. Blanchard, Universität Potsdam. UCL/Gatsby unit, 04/11/2015. Joint work with N. Mücke (U. Potsdam) and N. Krämer (U. München). G. Blanchard Rates for statistical inverse learning 1 / 39
2 Outline: 1. The inverse learning setting; 2. Rates for linear spectral regularization methods; 3. Rates for conjugate gradient regularization.
3 Outline: 1. The inverse learning setting; 2. Rates for linear spectral regularization methods; 3. Rates for conjugate gradient regularization.
4 DETERMINISTIC AND STATISTICAL INVERSE PROBLEMS. Let A be a bounded operator between Hilbert spaces H_1 → H_2 (assumed known). Classical (deterministic) inverse problem: observe y_σ = Af + ση, (IP) under the assumption ‖η‖ ≤ 1. Note: the H_2-norm measures the observation error; the H_1-norm measures the reconstruction error. Classical deterministic theory: see Engl, Hanke and Neubauer (2000).
5 DETERMINISTIC AND STATISTICAL INVERSE PROBLEMS. Inverse problem y_σ = Af + ση. (IP) What if the noise is random? Classical statistical inverse problem model: η is a Gaussian white noise process on H_2. Note: in this case (IP) is not an equation between elements of H_2, but is to be interpreted as a process on H_2. Under a Hölder source condition of order r and polynomial ill-posedness (eigenvalue decay) of order 1/s, sharp minimax rates are known in this setting: ‖(A*A)^θ(f̂ − f)‖_{H_1} = O(σ^{2(r+θ)/(2r+1+s)}) = O(σ^{2(ν+bθ)/(2ν+b+1)}), for θ ∈ [0, 1/2] (θ = 0: inverse problem; θ = 1/2: direct problem). (Alternate parametrization: b := 1/s, ν := rb the intrinsic regularity.)
6 LINEAR SPECTRAL REGULARIZATION METHODS. Inverse problem (deterministic or statistical) where A is known. First consider the so-called "normal equation": A* y_σ = (A*A)f + σ(A*η). Linear spectral methods: let ζ_λ : R_+ → R_+ be a real function of one real variable which is an approximation of 1/x, and λ > 0 a tuning parameter. Define f̂_λ = ζ_λ(A*A) A* y_σ. Examples: Tikhonov ζ_λ(x) = (x + λ)^{-1}, spectral cut-off ζ_λ(x) = x^{-1} 1{x ≥ λ}, Landweber iteration polynomials, ν-methods... Under general conditions on ζ_λ, optimal/minimax rates can be attained by such methods (deterministic: Engl et al., 2000; stochastic noise: Bissantz et al., 2007).
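As a numerical illustration (not part of the slides), the two filters named above can be applied to a finite-dimensional operator A through an eigendecomposition of A^*A. The sketch below assumes a well-conditioned random matrix and a noiseless observation, so both filters with a tiny λ essentially invert the problem:

```python
import numpy as np

# Sketch: spectral filtering f_lambda = zeta_lambda(A^T A) A^T y for a
# finite-dimensional operator A (illustrative data, not from the talk).

def tikhonov(t, lam):
    return 1.0 / (t + lam)

def spectral_cutoff(t, lam):
    # regularized inverse: 1/t on eigenvalues >= lambda, 0 below
    return np.where(t >= lam, 1.0 / np.maximum(t, lam), 0.0)

def spectral_estimate(A, y, lam, zeta):
    """f_lambda = zeta_lambda(A^T A) A^T y via eigendecomposition of A^T A."""
    evals, evecs = np.linalg.eigh(A.T @ A)   # A^T A is symmetric PSD
    filtered = zeta(evals, lam)              # apply zeta_lambda to the spectrum
    return evecs @ (filtered * (evecs.T @ (A.T @ y)))

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
f_true = rng.standard_normal(5)
y = A @ f_true                               # noiseless observation

f_tik = spectral_estimate(A, y, 1e-8, tikhonov)
f_cut = spectral_estimate(A, y, 1e-8, spectral_cutoff)
```

With noisy data, larger λ trades off the inversion error against noise amplification, which is exactly the role of the tuning parameter.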
7 STATISTICAL LEARNING. Learning usually refers to the following setting: (X_i, Y_i), i = 1, ..., n, i.i.d. ~ P_{XY} on X × Y, where Y ⊂ R. Goal: estimate some functional related to the dependency between X and Y, for instance (nonparametric) least squares regression: estimate f*(x) := E[Y | X = x], and measure the quality of an estimator f̂ via ‖f̂ − f*‖²_{L²(P_X)} = E_{X ~ P_X}[(f̂(X) − f*(X))²].
8 SETTING: INVERSE LEARNING PROBLEM. We refer to inverse learning for an inverse problem where we have noisy observations at random design points: (X_i, Y_i), i = 1, ..., n, i.i.d.: Y_i = (Af)(X_i) + ε_i. (ILP) The goal is to recover f ∈ H_1. Early works on closely related subjects: from the splines literature in the 1980s (e.g. O'Sullivan 1990).
9 MAIN ASSUMPTION FOR INVERSE LEARNING. Model: Y_i = (Af)(X_i) + ε_i, i = 1, ..., n, where A : H_1 → H_2. (ILP) Observe: H_2 should be a space of real-valued functions on X; the geometrical structure of the measurement errors will be dictated by the statistical properties of the sampling scheme; we do not need to assume or consider any a priori Hilbert structure on H_2. The crucial structural assumption we make is the following. Assumption: the family of evaluation functionals (S_x), x ∈ X, defined by S_x : H_1 → R, f ↦ S_x(f) := (Af)(x), is uniformly bounded, i.e., there exists κ < ∞ such that for any x ∈ X, |S_x(f)| ≤ κ ‖f‖_{H_1}.
10 GEOMETRY OF INVERSE LEARNING. The inverse learning setting was essentially introduced by Caponnetto et al. (2006). Riesz's theorem implies the existence, for any x ∈ X, of F_x ∈ H_1 such that for all f ∈ H_1: (Af)(x) = ⟨f, F_x⟩. K(x, y) := ⟨F_x, F_y⟩ defines a positive semidefinite kernel on X with associated reproducing kernel Hilbert space (RKHS) denoted H_K. As a pure function space, H_K coincides with Im(A). Assuming A injective, A is in fact an isometric isomorphism between H_1 and H_K.
11 GEOMETRY OF INVERSE LEARNING. The main assumption implies that, as a function space, Im(A) is endowed with a natural RKHS structure with a kernel K bounded by κ. Furthermore this RKHS H_K is isometric to H_1 (through A^{-1}). Therefore, the inverse learning problem is formally equivalent to the kernel learning problem Y_i = h*(X_i) + ε_i, i = 1, ..., n, where h* ∈ H_K, and we measure the quality of an estimator ĥ ∈ H_K via the RKHS norm ‖ĥ − h*‖_{H_K}. Indeed, if we put f̂ := A^{-1}ĥ, then ‖f̂ − f‖_{H_1} = ‖A(f̂ − f)‖_{H_K} = ‖ĥ − h*‖_{H_K}.
12 SETTING, REFORMULATED. We are actually back to the familiar regression setting on a random design, Y_i = h*(X_i) + ε_i, where (X_i, Y_i), 1 ≤ i ≤ n, is an i.i.d. sample from P_{XY} on the space X × R, with E[ε_i | X_i] = 0. Noise assumption: (BernsteinNoise) E[|ε_i|^p | X_i] ≤ (1/2) p! M^p, for all p ≥ 2. h* is assumed to lie in a (known) RKHS H_K with bounded kernel K. The criterion for measuring the quality of an estimator ĥ is the RKHS norm ‖ĥ − h*‖_{H_K}.
13 Outline: 1. The inverse learning setting; 2. Rates for linear spectral regularization methods; 3. Rates for conjugate gradient regularization.
14 EMPIRICAL AND POPULATION OPERATORS. Define: the (random) empirical evaluation operator T_n : h ∈ H ↦ (h(X_1), ..., h(X_n)) ∈ R^n, and its population counterpart the inclusion operator T : h ∈ H ↦ h ∈ L²(X, P_X); the (random) empirical kernel integral operator T_n^* : (v_1, ..., v_n) ∈ R^n ↦ (1/n) Σ_{i=1}^n K(X_i, ·) v_i ∈ H, and its population counterpart, the kernel integral operator T^* : f ∈ L²(X, P_X) ↦ T^*(f) = ∫ f(x) K(x, ·) dP_X(x) ∈ H. Finally, define the empirical covariance operator S_n = T_n^* T_n and its population counterpart S = T^* T. Observe that S_n, S are both operators H_K → H_K; the intuition is that S_n is a (random) approximation of S.
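A concrete finite-dimensional sanity check (assumed setup, not from the talk): with the linear kernel K(x, z) = ⟨x, z⟩ on R^d, the RKHS is R^d itself and the operators above become plain matrices, T_n h = X h, T_n^* v = (1/n) Σ_i v_i X_i, so S_n = (1/n) X^t X and K_n = T_n T_n^* = (1/n) X X^t, and one can verify directly that S_n and K_n share their nonzero spectrum:

```python
import numpy as np

# Linear-kernel instantiation of the empirical operators (illustrative sizes).
rng = np.random.default_rng(3)
n, d = 25, 3
X = rng.standard_normal((n, d))            # rows are the design points X_i

S_n = X.T @ X / n                          # empirical covariance operator on H = R^d
K_n = X @ X.T / n                          # normalized kernel Gram matrix on R^n

# S_n = T_n^* T_n and K_n = T_n T_n^* have the same nonzero eigenvalues
# (here d of them, since rank(X) = d almost surely).
spec_S = np.sort(np.linalg.eigvalsh(S_n))[::-1]
spec_K = np.sort(np.linalg.eigvalsh(K_n))[::-1][:d]
```

This shared spectrum is what makes the kernel-space formulation on the next slides computable from the n × n Gram matrix alone.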
15 Recall the model with h* ∈ H_K: Y_i = h*(X_i) + ε_i, i.e. Y = T_n h* + ε, where Y := (Y_1, ..., Y_n). Associated "normal equation": Z = T_n^* Y = T_n^* T_n h* + T_n^* ε = S_n h* + T_n^* ε. Idea (Rosasco, Caponnetto, De Vito, Odone): use methods from the inverse problems literature. Observe that there is also an error on the operator. Use concentration principles to bound ‖T_n^* ε‖ and ‖S_n − S‖.
16 LINEAR SPECTRAL REGULARIZATION METHODS. Linear spectral methods: ĥ_ζ = ζ(S_n) Z for some well-chosen function ζ : R → R acting on the spectrum and approximating the function x ↦ x^{-1}. Examples: Tikhonov ζ_λ(t) = (t + λ)^{-1}, spectral cut-off ζ_λ(t) = t^{-1} 1{t ≥ λ}, Landweber iteration polynomials, ν-methods...
17 SPECTRAL REGULARIZATION IN KERNEL SPACE. Linear spectral regularization in kernel space is written ĥ_ζ = ζ(S_n) T_n^* Y. Notice ζ(S_n) T_n^* = ζ(T_n^* T_n) T_n^* = T_n^* ζ(T_n T_n^*) = T_n^* ζ(K_n), where K_n = T_n T_n^* : R^n → R^n is the normalized kernel Gram matrix, K_n(i, j) = (1/n) K(X_i, X_j). Equivalently: ĥ_ζ = Σ_{i=1}^n α_{ζ,i} K(X_i, ·), with α_ζ = (1/n) ζ(K_n) Y.
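A sketch of this recipe (the Gaussian kernel and the Tikhonov filter below are illustrative choices, not prescribed by the slides): compute the normalized Gram matrix, apply the filter to its spectrum, and read off the coefficients α = (1/n) ζ_λ(K_n) Y.

```python
import numpy as np

# Sketch: kernel spectral regularization h_zeta = sum_i alpha_i K(X_i, .),
# alpha = (1/n) zeta_lambda(K_n) Y, with K_n(i,j) = K(X_i, X_j)/n.

def gauss_kernel(x, z, width=1.0):
    return np.exp(-(x[:, None] - z[None, :]) ** 2 / (2 * width ** 2))

def kernel_spectral_coeffs(K_raw, Y, lam):
    n = len(Y)
    K_n = K_raw / n                                  # normalized Gram matrix T_n T_n^*
    evals, evecs = np.linalg.eigh(K_n)
    filtered = 1.0 / (np.maximum(evals, 0.0) + lam)  # Tikhonov filter on the spectrum
    return (evecs @ (filtered * (evecs.T @ Y))) / n  # alpha = (1/n) zeta(K_n) Y

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=30)
Y = np.sin(np.pi * X) + 0.1 * rng.standard_normal(30)
K_raw = gauss_kernel(X, X)
alpha = kernel_spectral_coeffs(K_raw, Y, lam=1e-3)
h_hat = lambda x: gauss_kernel(np.atleast_1d(x), X) @ alpha   # h(x) = sum_i alpha_i K(x, X_i)
```

For the Tikhonov filter this reduces to α = (1/n)(K_n + λI)^{-1} Y = (𝐊 + nλ I)^{-1} Y with 𝐊 the raw Gram matrix, i.e. exactly the kernel ridge regression coefficients; other filters only change the function applied to the spectrum.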
18 STRUCTURAL ASSUMPTIONS. Two parameters determine the attainable convergence rates. (Hölder) Source condition for the signal: for r > 0, define SC(r, R): h* = S^r h_0 with ‖h_0‖ ≤ R (can be generalized to extended source conditions, see e.g. Mathé and Pereverzev 2003). Ill-posedness: if (λ_i)_{i≥1} is the sequence of positive eigenvalues of S in nonincreasing order, then define IP_+(s, β): λ_i ≤ β i^{-1/s}, and IP_-(s, β'): λ_i ≥ β' i^{-1/s}.
19 ERROR/RISK MEASURE. We measure the error (risk) of an estimator ĥ in the family of norms ‖S^θ(ĥ − h*)‖_{H_K}, θ ∈ [0, 1/2]. Note: θ = 0: inverse problem; θ = 1/2: direct problem, since ‖S^{1/2}(ĥ − h*)‖_{H_K} = ‖ĥ − h*‖_{L²(P_X)}.
20 PREVIOUS RESULTS. [1]: Smale and Zhou (2007); [2]: Bauer, Pereverzev, Rosasco (2007); [3]: Caponnetto, De Vito (2007); [4]: Caponnetto (2006).

Error                  [1]                      [2]                      [3]                        [4]
‖ĥ − h*‖_{L²(P_X)}     (1/n)^{(2r+1)/(2r+2)}    (1/n)^{(2r+1)/(2r+2)}    (1/n)^{(2r+1)/(2r+1+s)}    (1/n)^{(2r+1)/(2r+1+s)}
‖ĥ − h*‖_{H_K}         (1/n)^{r/(r+1)}          (1/n)^{r/(r+1)}          --                         --
Assumptions            r ≤ 1/2                  r ≤ q − 1/2              r ≤ 1/2                    r ≤ q − 1/2 (q: qualification); + unlabeled data if 2r + s < 1
Method                 Tikhonov                 General                  Tikhonov                   General

Matching lower bound: only for ‖ĥ − h*‖_{L²(P_X)} [2]. Compare to results known for regularization methods under the Gaussian white noise model: Mair and Ruymgaart (1996), Nussbaum and Pereverzev (1999), Bissantz, Hohage, Munk and Ruymgaart (2007).
21 ASSUMPTIONS ON REGULARIZATION FUNCTION. From now on we assume κ = 1 for simplicity. Standard assumptions on the regularization family ζ_λ : [0, 1] → R are: (i) there exists a constant D < ∞ such that sup_{0<λ≤1} sup_{0<t≤1} |t ζ_λ(t)| ≤ D; (ii) there exists a constant M < ∞ such that sup_{0<λ≤1} sup_{0<t≤1} λ |ζ_λ(t)| ≤ M; (iii) qualification: for all λ ≤ 1, sup_{0<t≤1} |1 − t ζ_λ(t)| t^ν ≤ γ_ν λ^ν, holding for ν = 0 and ν = q > 0.
22 UPPER BOUND ON RATES. Theorem (Mücke, Blanchard). Assume r, R, s, β are fixed positive constants and let P(r, R, s, β) denote the set of distributions on X × Y satisfying (IP_+)(s, β), (SC)(r, R) and (BernsteinNoise). Define ĥ^{(n)}_{λ_n} = ζ_{λ_n}(S_n) Z^{(n)} using a regularization family (ζ_λ) satisfying the standard assumptions with qualification q ≥ r + θ, and the parameter choice rule λ_n = (σ²/(R² n))^{1/(2r+1+s)}. Then it holds for any θ ∈ [0, 1/2], η ∈ (0, 1): sup_{P ∈ P(r,R,s,β)} P^n( ‖S^θ(h* − ĥ^{(n)}_{λ_n})‖_{H_K} > C log(η^{-1}) R (σ²/(R² n))^{(r+θ)/(2r+1+s)} ) ≤ η.
23 COMMENTS. It follows that the convergence rate obtained is of order C · R (σ²/(R² n))^{(r+θ)/(2r+1+s)}. The constant C depends on the various parameters entering the assumptions, but not on n, R, σ! The result applies to all linear spectral regularization methods, but assumes a precise tuning of the regularization constant λ as a function of the assumed regularity parameters of the target: it is not adaptive.
24 WEAK LOWER BOUND ON RATES. Theorem (Mücke, Blanchard). Assume r, R, s, β' are fixed positive constants and let P_-(r, R, s, β') denote the set of distributions on X × Y satisfying (IP_-)(s, β'), (SC)(r, R) and (BernsteinNoise). (We assume this set to be non-empty!) Then lim sup_n inf_ĥ sup_{P ∈ P_-(r,R,s,β')} P^n( ‖S^θ(h* − ĥ)‖_{H_K} > C R (σ²/(R² n))^{(r+θ)/(2r+1+s)} ) > 0. Proof: Fano's lemma technique.
25 STRONG LOWER BOUND ON RATES. Assume additionally "no big jumps in eigenvalues": inf_{k≥1} λ_{2k}/λ_k > 0. Theorem (Mücke, Blanchard). Assume r, R, s, β' are fixed positive constants and let P_-(r, R, s, β') denote the set of distributions on X × Y satisfying (IP_-)(s, β'), (SC)(r, R) and (BernsteinNoise). (We assume this set to be non-empty!) Then lim inf_n inf_ĥ sup_{P ∈ P_-(r,R,s,β')} P^n( ‖S^θ(h* − ĥ)‖_{H_K} > C R (σ²/(R² n))^{(r+θ)/(2r+1+s)} ) > 0. Proof: Fano's lemma technique.
26 COMMENTS. The obtained rates are minimax (but not adaptive) in the parameters R, n, σ, provided the set of distributions satisfying both (IP_-)(s, β') and (IP_+)(s, β) is not empty.
27 STATISTICAL ERROR CONTROL. Error controls were introduced and used by Caponnetto and De Vito (2007), Caponnetto (2007), using Bernstein's inequality for Hilbert space-valued variables (see Pinelis and Sakhanenko; Yurinskii). Theorem (Caponnetto, De Vito). Define N(λ) = Tr((S + λ)^{-1} S); then under assumption (BernsteinNoise) we have the following: P[ ‖(S + λ)^{-1/2}(T_n^* Y − S_n h*)‖ ≤ 2M ( √(N(λ)/n) + 2/(√λ n) ) log(6/δ) ] ≥ 1 − δ. Also, the following holds: P[ ‖(S + λ)^{-1/2}(S_n − S)‖_{HS} ≤ 2 ( √(N(λ)/n) + 2/(√λ n) ) log(6/δ) ] ≥ 1 − δ.
28 Outline: 1. The inverse learning setting; 2. Rates for linear spectral regularization methods; 3. Rates for conjugate gradient regularization.
29 PARTIAL LEAST SQUARES REGULARIZATION. Consider first the classical linear regression setting Y = Xω + ε, where Y := (Y_1, ..., Y_n); X := (X_1, ..., X_n)^t; ε = (ε_1, ..., ε_n). Algorithmic description of Partial Least Squares: find the direction v_1 such that v_1 = Argmax_{v ∈ R^d} Ĉov(⟨X, v⟩, Y)/‖v‖ = Argmax_{v ∈ R^d} ⟨v, X^t Y⟩/‖v‖; project Y orthogonally on Xv_1, yielding Ŷ_1; iterate the procedure on the residual Y − Ŷ_1. The fit at step m is Σ_{i=1}^m Ŷ_i. Regularization is obtained by early stopping.
30 PLS AND CONJUGATE GRADIENT. An equivalent definition of PLS: ω_m = Argmin_{ω ∈ K_m(X^t X, X^t Y)} ‖Y − Xω‖², where K_m(A, z) = span{z, Az, ..., A^{m−1} z} is the Krylov space of order m. This definition is equivalent to m steps of the conjugate gradient algorithm applied to iteratively solve the linear equation Aω = X^t Xω = X^t Y = z. For any fixed m, the fit Ŷ_m = Xω_m is a nonlinear function of Y.
31 PROPERTIES OF CONJUGATE GRADIENT. By definition, ω_m has the form ω_m = p_m(A) z = X^t p_m(XX^t) Y, where p_m is a polynomial of degree m − 1. Of particular interest are the residual polynomials r_m(t) = 1 − t p_m(t); Y − Ŷ_m = r_m(XX^t) Y. The polynomials r_m form a family of orthogonal polynomials for the inner product ⟨p, q⟩ = ⟨p(XX^t) Y, XX^t q(XX^t) Y⟩, with the normalization r_m(0) = 1. The polynomials r_m follow an order-2 recurrence relation of the type (hence a simple implementation): r_{m+1}(t) = a_m t r_m(t) + b_m r_m(t) + c_m r_{m−1}(t).
32 ALGORITHM FOR CG/PLS.
Initialize: ω_0 = 0; r_0 = X^t Y; g_0 = r_0
for m = 0, ..., (m_max − 1) do
  α_m = ‖X r_m‖² / ‖X^t X g_m‖²
  ω_{m+1} = ω_m + α_m g_m  (update)
  r_{m+1} = r_m − α_m X^t X g_m  (residuals)
  β_m = ‖X r_{m+1}‖² / ‖X r_m‖²
  g_{m+1} = r_{m+1} + β_m g_m  (next direction)
end for
Return: approximate solution ω_{m_max}
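The loop above translates directly into code. The sketch below (with made-up data, not from the slides) implements exactly these updates; since the iterates are conjugate gradient iterates for the normal equation, running d = dim(ω) steps on a full-rank problem recovers the ordinary least-squares solution:

```python
import numpy as np

def cg_pls(X, Y, m_max):
    """CG/PLS iterations as on the slide: omega_0 = 0, r_0 = g_0 = X^t Y."""
    omega = np.zeros(X.shape[1])
    r = X.T @ Y                                   # residual of the normal equation
    g = r.copy()                                  # search direction
    for _ in range(m_max):
        Xr2 = np.sum((X @ r) ** 2)
        if Xr2 < 1e-14:                           # converged: residual numerically zero
            break
        alpha = Xr2 / np.sum((X.T @ (X @ g)) ** 2)
        omega = omega + alpha * g                 # update
        r_new = r - alpha * (X.T @ (X @ g))       # residuals
        beta = np.sum((X @ r_new) ** 2) / Xr2
        g = r_new + beta * g                      # next direction
        r = r_new
    return omega

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 4))
Y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.standard_normal(40)
omega_full = cg_pls(X, Y, m_max=4)                # d steps: least-squares solution
```

Regularization corresponds to stopping the loop well before m_max = d, i.e. returning an early iterate instead of the full least-squares solution.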
35 KERNEL-CG REGULARIZATION ("KERNEL PARTIAL LEAST SQUARES"). Define the m-th iterate of CG as ĥ_CG(m) = Argmin_{h ∈ H, h ∈ K_m(S_n, T_n^* Y)} ‖T_n^*(Y − T_n h)‖, where K_m denotes the Krylov space K_m(A, z) = span{z, Az, ..., A^{m−1} z}. Equivalently: α_CG(m) = Argmin_{α ∈ K_m(K_n, Y)} ‖K_n^{1/2}(Y − K_n α)‖², and ĥ_CG(m) = Σ_{i=1}^n α_{CG(m),i} K(X_i, ·).
36 RATES FOR CG. Consider the following stopping rule, for some fixed τ: m̂ := min{ m ≥ 0 : ‖T_n^*(T_n ĥ_CG(m) − Y)‖ ≤ τ (1/n)^{(r+1)/(2r+1+s)} log²(6/δ) }. (1) Theorem (Blanchard, Krämer). Assume (BernsteinNoise), SC(r, R), IP(s, β) hold; let θ ∈ [0, 1/2). Then for τ large enough, with probability larger than 1 − δ: ‖S^θ(ĥ_CG(m̂) − h*)‖_{H_K} ≤ c(r, R, s, β, τ) (1/n)^{(r+θ)/(2r+1+s)} log²(6/δ). Technical tools: again, concentration of the error in an appropriate norm, and a suitable reworking of the arguments of Nemirovskii (1980) for deterministic CG.
37 OUTER RATES. It is natural (for the prediction problem) to consider an extension of the source condition for h* ∈ H (now assuming only h* ∈ L²(P_X)), SC_outer(r, R): ‖B^{−(r+1/2)} h*‖_{L²} ≤ R (for B := TT^*), so as to include the possible range r ∈ (−1/2, 0]. For such outer source conditions, even for kernel ridge regression and for the direct (= prediction) problem, there are no known results reaching the optimal rate O(n^{−(r+1/2)/(2r+1+s)}) without additional assumptions. Mendelson and Neeman (2009) make assumptions on the sup norm of the eigenfunctions of the integral operator. Caponnetto (2006) assumes additional unlabeled examples X_{n+1}, ..., X_ñ are available, with ñ/n = O(n^{(1−2r−s)_+/(2r+1+s)}).
38 CONSTRUCTION WITH UNLABELED DATA. Assume ñ i.i.d. X-examples are available, out of which n are labeled. Extend the n-vector Y to an ñ-vector Ỹ = (ñ/n)(Y_1, ..., Y_n, 0, ..., 0). Perform the same algorithm as before on X̃, Ỹ. Notice in particular that T̃_ñ^* Ỹ = T_n^* Y. Recall: ĥ_CG1(m) = Argmin_{h ∈ K_m(S̃_ñ, T̃_ñ^* Ỹ)} ‖T̃_ñ^*(Ỹ − T̃_ñ h)‖; equivalently: α̃ = Argmin_{α ∈ K_m(K̃_ñ, Ỹ)} ‖K̃_ñ^{1/2}(Ỹ − K̃_ñ α)‖.
40 OUTER RATES FOR CG REGULARIZATION. Consider the following stopping rule for some fixed τ > 3/2: m̂ := min{ m ≥ 0 : ‖T_n^*(T_n ĥ_CG(m) − Y)‖ ≤ τ m (4β/n)^{(r+1)/(2r+1+s)} log²(6/δ) }. (2) Furthermore assume (BoundedY): |Y| ≤ M a.s. Theorem. Assume (BoundedY), SC_outer(r, R), IP_+(s, β), and r ∈ (−min(s, 1/2), 0). Assume unlabeled data is available with ñ ≥ n (n/(16β²))^{−2r/(2r+1+s)} log²(6/δ). Then for θ ∈ [0, r + 1/2), with probability larger than 1 − δ: ‖B^{−θ}(T ĥ_m̂ − h*)‖_{L²} ≤ c(r, τ)(M + R) (16β²/n)^{(r+1/2−θ)/(2r+1+s)} log²(6/δ).
41 OVERVIEW: inverse problem setting under a random i.i.d. design scheme ("learning setting"). Source condition: Hölder of order r; ill-posedness: polynomial decay of eigenvalues of order 1/s. Rates of the form (for θ ∈ [0, 1/2]): ‖S^θ(h* − ĥ)‖_{H_K} = O(n^{−(r+θ)/(2r+1+s)}). Rates established for general linear spectral methods, as well as CG. Matching lower bound. Matches the classical rates in the white noise model (= sequence model) with σ² ~ 1/n. Extension to outer rates (r ∈ (−1/2, 0)) if additional unlabeled data is available.
42 CONCLUSION/PERSPECTIVES. We filled gaps in the existing picture for inverse learning methods... Adaptivity? Ideally, attain optimal rates without a priori knowledge of r or s! Lepski's method/balancing principle: in progress. Needs a good estimator for N(λ)! (Prior work on this: Caponnetto; some sharper bound is needed.) Hold-out principle: only valid for the direct problem? But the optimal parameter does not depend on the risk norm: hope for validity in the inverse case.
More informationJoint distribution optimal transportation for domain adaptation
Joint distribution optimal transportation for domain adaptation Changhuang Wan Mechanical and Aerospace Engineering Department The Ohio State University March 8 th, 2018 Joint distribution optimal transportation
More informationPreconditioned Newton methods for ill-posed problems
Preconditioned Newton methods for ill-posed problems Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultäten der Georg-August-Universität zu Göttingen vorgelegt
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationPlug-in Approach to Active Learning
Plug-in Approach to Active Learning Stanislav Minsker Stanislav Minsker (Georgia Tech) Plug-in approach to active learning 1 / 18 Prediction framework Let (X, Y ) be a random couple in R d { 1, +1}. X
More informationFinite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )
More informationInverse scattering problem from an impedance obstacle
Inverse Inverse scattering problem from an impedance obstacle Department of Mathematics, NCKU 5 th Workshop on Boundary Element Methods, Integral Equations and Related Topics in Taiwan NSYSU, October 4,
More informationGeometry on Probability Spaces
Geometry on Probability Spaces Steve Smale Toyota Technological Institute at Chicago 427 East 60th Street, Chicago, IL 60637, USA E-mail: smale@math.berkeley.edu Ding-Xuan Zhou Department of Mathematics,
More information3 Compact Operators, Generalized Inverse, Best- Approximate Solution
3 Compact Operators, Generalized Inverse, Best- Approximate Solution As we have already heard in the lecture a mathematical problem is well - posed in the sense of Hadamard if the following properties
More informationECE 636: Systems identification
ECE 636: Systems identification Lectures 3 4 Random variables/signals (continued) Random/stochastic vectors Random signals and linear systems Random signals in the frequency domain υ ε x S z + y Experimental
More informationarxiv: v1 [stat.ml] 5 Nov 2018
Kernel Conjugate Gradient Methods with Random Projections Junhong Lin and Volkan Cevher {junhong.lin, volkan.cevher}@epfl.ch Laboratory for Information and Inference Systems École Polytechnique Fédérale
More informationLearning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013
Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description
More informationsparse and low-rank tensor recovery Cubic-Sketching
Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru
More informationConvergence of Eigenspaces in Kernel Principal Component Analysis
Convergence of Eigenspaces in Kernel Principal Component Analysis Shixin Wang Advanced machine learning April 19, 2016 Shixin Wang Convergence of Eigenspaces April 19, 2016 1 / 18 Outline 1 Motivation
More informationD I S C U S S I O N P A P E R
I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationStatistical Convergence of Kernel CCA
Statistical Convergence of Kernel CCA Kenji Fukumizu Institute of Statistical Mathematics Tokyo 106-8569 Japan fukumizu@ism.ac.jp Francis R. Bach Centre de Morphologie Mathematique Ecole des Mines de Paris,
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationMIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 04: Features and Kernels. Lorenzo Rosasco
MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications Class 04: Features and Kernels Lorenzo Rosasco Linear functions Let H lin be the space of linear functions f(x) = w x. f w is one
More informationDue Giorni di Algebra Lineare Numerica (2GALN) Febbraio 2016, Como. Iterative regularization in variable exponent Lebesgue spaces
Due Giorni di Algebra Lineare Numerica (2GALN) 16 17 Febbraio 2016, Como Iterative regularization in variable exponent Lebesgue spaces Claudio Estatico 1 Joint work with: Brigida Bonino 1, Fabio Di Benedetto
More informationRegularization in Reproducing Kernel Banach Spaces
.... Regularization in Reproducing Kernel Banach Spaces Guohui Song School of Mathematical and Statistical Sciences Arizona State University Comp Math Seminar, September 16, 2010 Joint work with Dr. Fred
More informationLess is More: Computational Regularization by Subsampling
Less is More: Computational Regularization by Subsampling Lorenzo Rosasco University of Genova - Istituto Italiano di Tecnologia Massachusetts Institute of Technology lcsl.mit.edu joint work with Alessandro
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationThe Learning Problem and Regularization
9.520 Class 02 February 2011 Computational Learning Statistical Learning Theory Learning is viewed as a generalization/inference problem from usually small sets of high dimensional, noisy data. Learning
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 9, 2011 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationStochastic Spectral Approaches to Bayesian Inference
Stochastic Spectral Approaches to Bayesian Inference Prof. Nathan L. Gibson Department of Mathematics Applied Mathematics and Computation Seminar March 4, 2011 Prof. Gibson (OSU) Spectral Approaches to
More informationKernel Methods. Jean-Philippe Vert Last update: Jan Jean-Philippe Vert (Mines ParisTech) 1 / 444
Kernel Methods Jean-Philippe Vert Jean-Philippe.Vert@mines.org Last update: Jan 2015 Jean-Philippe Vert (Mines ParisTech) 1 / 444 What we know how to solve Jean-Philippe Vert (Mines ParisTech) 2 / 444
More informationOnline gradient descent learning algorithm
Online gradient descent learning algorithm Yiming Ying and Massimiliano Pontil Department of Computer Science, University College London Gower Street, London, WCE 6BT, England, UK {y.ying, m.pontil}@cs.ucl.ac.uk
More informationCONVERGENCE RATES OF GENERAL REGULARIZATION METHODS FOR STATISTICAL INVERSE PROBLEMS AND APPLICATIONS
CONVERGENCE RATES OF GENERAL REGULARIZATION METHODS FOR STATISTICAL INVERSE PROBLEMS AND APPLICATIONS BY N. BISSANTZ, T. HOHAGE, A. MUNK AND F. RUYMGAART UNIVERSITY OF GÖTTINGEN AND TEXAS TECH UNIVERSITY
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationRegularization in Banach Space
Regularization in Banach Space Barbara Kaltenbacher, Alpen-Adria-Universität Klagenfurt joint work with Uno Hämarik, University of Tartu Bernd Hofmann, Technical University of Chemnitz Urve Kangro, University
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationNumerical differentiation by means of Legendre polynomials in the presence of square summable noise
www.oeaw.ac.at Numerical differentiation by means of Legendre polynomials in the presence of square summable noise S. Lu, V. Naumova, S. Pereverzyev RICAM-Report 2012-15 www.ricam.oeaw.ac.at Numerical
More information3. Some tools for the analysis of sequential strategies based on a Gaussian process prior
3. Some tools for the analysis of sequential strategies based on a Gaussian process prior E. Vazquez Computer experiments June 21-22, 2010, Paris 21 / 34 Function approximation with a Gaussian prior Aim:
More informationThe Conjugate Gradient Method
The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large
More informationKernel Learning via Random Fourier Representations
Kernel Learning via Random Fourier Representations L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Module 5: Machine Learning L. Law, M. Mider, X. Miscouridou, S. Ip, A. Wang Kernel Learning via Random
More informationError Estimates for Multi-Penalty Regularization under General Source Condition
Error Estimates for Multi-Penalty Regularization under General Source Condition Abhishake Rastogi Department of Mathematics Indian Institute of Technology Delhi New Delhi 006, India abhishekrastogi202@gmail.com
More informationBayesian Regularization
Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors
More informationAnalysis Preliminary Exam Workshop: Hilbert Spaces
Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H
More informationApproximate Kernel Methods
Lecture 3 Approximate Kernel Methods Bharath K. Sriperumbudur Department of Statistics, Pennsylvania State University Machine Learning Summer School Tübingen, 207 Outline Motivating example Ridge regression
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationUnlabeled Data: Now It Helps, Now It Doesn t
institution-logo-filena A. Singh, R. D. Nowak, and X. Zhu. In NIPS, 2008. 1 Courant Institute, NYU April 21, 2015 Outline institution-logo-filena 1 Conflicting Views in Semi-supervised Learning The Cluster
More informationMercer s Theorem, Feature Maps, and Smoothing
Mercer s Theorem, Feature Maps, and Smoothing Ha Quang Minh, Partha Niyogi, and Yuan Yao Department of Computer Science, University of Chicago 00 East 58th St, Chicago, IL 60637, USA Department of Mathematics,
More informationStatistical Optimality of Stochastic Gradient Descent through Multiple Passes
Statistical Optimality of Stochastic Gradient Descent through Multiple Passes Francis Bach INRIA - Ecole Normale Supérieure, Paris, France ÉCOLE NORMALE SUPÉRIEURE Joint work with Loucas Pillaud-Vivien
More information