A talk on Oracle inequalities and regularization, by Sara van de Geer


1 A talk on Oracle inequalities and regularization, by Sara van de Geer. Workshop Regularization in Statistics, Banff International Regularization Station, September 6-11, 2003.

2 Aim: to compare $\ell_1$ and other penalties.
- Explain the bias-variance paradigm
- Consider penalized least squares in general
- Study the $\ell_1$ penalty in robust regression
- Study classification problems, $\ell_1$ penalties and an $\ell_{1/2}$ penalty

3 1. Regression model. Let $\epsilon_1, \ldots, \epsilon_n$ be independent $\mathcal{N}(0, \sigma^2)$, $x_i \in \mathcal{X}$, $i = 1, \ldots, n$, and $f_0 : [0,1] \to \mathbb{R}$ an unknown function. The model is
$$Y_i = f_0(x_i) + \epsilon_i, \quad i = 1, \ldots, n.$$

4 An example. $x_i = i/n$, $i = 1, \ldots, n$. Estimator:
$$\hat f_n = \arg\min_f \left\{ \frac{1}{n}\sum_{i=1}^n |Y_i - f(x_i)|^2 + \lambda^2 \int_0^1 |f'(x)|^2\,dx \right\}.$$
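A minimal numpy sketch of a discretized version of this estimator, assuming the design $x_i = i/n$ and a forward-difference approximation to $f'$ (the function name, test signal and noise level below are illustrative, not from the talk):

```python
import numpy as np

def smooth_ls(y, lam):
    """Discretized penalized least squares:
    minimize (1/n) * sum_i (y_i - f_i)^2 + lam^2 * int_0^1 |f'|^2 dx,
    with the integral approximated by scaled forward differences on x_i = i/n.
    Setting the gradient to zero gives the linear system solved below."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)                 # (n-1) x n difference matrix
    A = np.eye(n) + (n**2) * lam**2 * (D.T @ D)    # normal equations
    return np.linalg.solve(A, y)

rng = np.random.default_rng(0)
x = np.arange(1, 101) / 100
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)
f_hat = smooth_ls(y, lam=0.1)
```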

5 [Figure: the true $f$, and the same curve with noise added, noise level = 0.01.]

6 [Figure: denoised fits; lambda = 0.2 (Fit = 9.0531e) and lambda = 0.1 (Fit = 3.4322e-04).]

7 [Figure: denoised fits; lambda = 0.1 (Error = 2.8119e) and lambda = 0.05 (Error = 7.8683e-05).]

8 Intermezzo: continuous version:
$$\hat f = \arg\min_f \left\{ \int_0^1 |y(x) - f(x)|^2\,dx + \lambda^2 \int_0^1 |f'(x)|^2\,dx \right\}.$$
Lemma 1.1. The solution is
$$\hat f(x) = \frac{C}{\lambda}\cosh\Big(\frac{x}{\lambda}\Big) + \frac{1}{\lambda}\int_0^x y(u)\sinh\Big(\frac{u-x}{\lambda}\Big)\,du,$$
where
$$C = \left\{ Y(1) + \frac{1}{\lambda}\int_0^1 Y(u)\sinh\Big(\frac{1-u}{\lambda}\Big)\,du \right\} \Big/ \sinh\Big(\frac{1}{\lambda}\Big),$$
with $Y(x) = \int_0^x y(u)\,du$.

9 Choice of the regularization parameter $\lambda$?

10 Mimic the oracle: choose $\lambda$ as if the unknown $f_0$ were known? Our choice will be
$$\hat f_n = \arg\min_f \min_\lambda \left\{ \frac{1}{n}\sum_{i=1}^n |Y_i - f(x_i)|^2 + \lambda^2 \int_0^1 |f'(x)|^2\,dx + \frac{c}{n\lambda} \right\}.$$
See later.
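In the same discretized setting as above, this double minimization can be mimicked on a grid of $\lambda$ values; a sketch, with the constant $c$, the grid and the data chosen arbitrarily for illustration:

```python
import numpy as np

def criterion(y, lam, c=1.0):
    """Penalized criterion of slide 10 for the discretized estimator:
    fit + lam^2 * roughness + c/(n*lam)."""
    n = len(y)
    D = np.diff(np.eye(n), axis=0)
    f = np.linalg.solve(np.eye(n) + n**2 * lam**2 * (D.T @ D), y)
    fit = np.mean((y - f) ** 2)
    rough = n * np.sum((D @ f) ** 2)      # approximates int_0^1 |f'|^2 dx
    return fit + lam**2 * rough + c / (n * lam)

rng = np.random.default_rng(1)
y = np.sin(2 * np.pi * np.arange(1, 201) / 200) + 0.1 * rng.standard_normal(200)
lam_hat = min(np.logspace(-3, 0, 30), key=lambda lam: criterion(y, lam))
```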

11 2. The sequence space model.
$$Y_j = \theta_j + \epsilon_j, \quad j = 1, \ldots, n,$$
with $\epsilon_1, \ldots, \epsilon_n$ independent $\mathcal{N}(0, \sigma^2/n)$. Define
$$\|\vartheta\|_n^2 = \sum_{j=1}^n \vartheta_j^2, \quad \vartheta \in \mathbb{R}^n.$$


13 Let $J \subset \{1, \ldots, n\}$. Define
$$\hat\theta_j(J) = \begin{cases} Y_j & \text{if } j \in J \\ 0 & \text{if } j \notin J. \end{cases}$$
Then
$$E\|\hat\theta(J) - \theta\|_n^2 = \sum_{j \notin J} \theta_j^2 + |J|\,\frac{\sigma^2}{n} = \text{bias}^2 + \text{variance} = \text{approximation error} + \text{estimation error}.$$
The oracle $\theta_{\text{oracle}}$ uses the optimal tradeoff between bias$^2$ and variance:
$$\|\theta_{\text{oracle}} - \theta\|_n^2 = \min_{J \subset \{1, \ldots, n\}} \left\{ \sum_{j \notin J} \theta_j^2 + |J|\,\frac{\sigma^2}{n} \right\}.$$
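Since the minimum over subsets $J$ decomposes coordinatewise, the oracle keeps coordinate $j$ exactly when $\theta_j^2 > \sigma^2/n$; a small sketch with a made-up signal:

```python
import numpy as np

def oracle_risk(theta, sigma2_n):
    """Oracle tradeoff of slide 13: the best subset J keeps coordinate j
    iff theta_j^2 exceeds the per-coordinate variance sigma^2/n."""
    keep = theta**2 > sigma2_n
    bias2 = np.sum(theta[~keep] ** 2)
    variance = keep.sum() * sigma2_n
    return bias2 + variance

theta = 1.0 / np.arange(1, 1001)           # hypothetical decaying signal
print(oracle_risk(theta, sigma2_n=1e-4))
```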

14 3. Hard and soft thresholding. Let $\lambda = \sqrt{2\sigma^2 \log n / n}$ be the threshold (= regularization parameter). Hard thresholding:
$$\hat\theta_j(\text{hard}) = \begin{cases} Y_j & \text{if } |Y_j| > \lambda \\ 0 & \text{if } |Y_j| \le \lambda, \end{cases} \quad j = 1, \ldots, n.$$
Thus $\hat\theta(\text{hard})$ minimizes
$$\sum_{j=1}^n (Y_j - \vartheta_j)^2 + \lambda^2\,\#\{\vartheta_j \neq 0\} = \sum_{j=1}^n (Y_j - \vartheta_j)^2 + \text{pen}(\vartheta),$$
where
$$\text{pen}(\vartheta) = \lambda^2\,\#\{\vartheta_j \neq 0\} = 2\log n\,|J(\vartheta)|\,\frac{\sigma^2}{n}, \qquad J(\vartheta) = \{j : \vartheta_j \neq 0\}.$$
So the penalty is, up to log-terms, equal to the variance.

15 Soft thresholding:
$$\hat\theta_j(\text{soft}) = \begin{cases} Y_j - \lambda & \text{if } Y_j > \lambda \\ Y_j + \lambda & \text{if } Y_j < -\lambda \\ 0 & \text{if } |Y_j| \le \lambda, \end{cases} \quad j = 1, \ldots, n.$$
Thus $\hat\theta(\text{soft})$ minimizes
$$\sum_{j=1}^n (Y_j - \vartheta_j)^2 + 2\lambda\sum_{j=1}^n |\vartheta_j| = \sum_{j=1}^n (Y_j - \vartheta_j)^2 + \text{pen}(\vartheta),$$
where
$$\text{pen}(\vartheta) = 2\lambda\sum_{j=1}^n |\vartheta_j| = 2\sqrt{\frac{2\log n}{n}}\,\sigma\sum_{j=1}^n |\vartheta_j|.$$
This penalty is generally much, much larger than the variance!
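Both rules are one-liners; a sketch, with a hypothetical sparse signal and the threshold of slide 14:

```python
import numpy as np

def hard_threshold(y, lam):
    """Keep Y_j when |Y_j| > lam, set it to zero otherwise (slide 14)."""
    return np.where(np.abs(y) > lam, y, 0.0)

def soft_threshold(y, lam):
    """Shrink Y_j towards zero by lam, clipping at zero (slide 15)."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

rng = np.random.default_rng(2)
n, sigma = 1024, 1.0
theta = np.zeros(n); theta[:10] = 3.0 / np.sqrt(n)   # sparse toy signal
y = theta + sigma / np.sqrt(n) * rng.standard_normal(n)
lam = np.sqrt(2 * sigma**2 * np.log(n) / n)
theta_hard = hard_threshold(y, lam)
theta_soft = soft_threshold(y, lam)
```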

16 The estimators $\hat\theta(\text{hard})$ and $\hat\theta(\text{soft})$ have similar oracle properties.
Lemma. Let $\hat\theta \in \{\hat\theta(\text{hard}), \hat\theta(\text{soft})\}$. We have
$$E\|\hat\theta - \theta\|_n^2 \le c\,\min_\vartheta \{\text{blue}(\vartheta) + \text{red}(\vartheta)\},$$
where
$$\text{blue}(\vartheta) = \|\vartheta - \theta\|_n^2 = \text{approximation error} = \text{bias}^2,$$
and
$$\text{red}(\vartheta) = \log n\,|J(\vartheta)|\,\frac{\sigma^2}{n} = \text{estimation error } (\approx \text{variance}).$$

17 4. Discretization.
$$Y_i = f_0(x_i) + \epsilon_i, \quad i = 1, \ldots, n,$$
with $\epsilon_1, \ldots, \epsilon_n$ independent $\mathcal{N}(0,1)$, and $x_i \in \mathcal{X}$, $Y_i \in \mathbb{R}$, $i = 1, \ldots, n$. Let $\mathcal{F}$ be a finite collection of functions, and
$$\hat f = \arg\min_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n |Y_i - f(x_i)|^2.$$
Define $f^*$ by
$$\|f^* - f_0\|_n = \min_{f \in \mathcal{F}} \|f - f_0\|_n.$$
Lemma 4.1. We have
$$E\|\hat f - f_0\|_n^2 \le 2\|f^* - f_0\|_n^2 + c\,\frac{\log|\mathcal{F}|}{n} = \text{approximation error} + \text{estimation error}.$$

18 Let $\{\mathcal{F}_m\}_{m \in M}$ be a collection of increasing, nested finite models, and let $\mathcal{F} = \cup_{m \in M} \mathcal{F}_m$. Define
$$\text{pen}(f) = \min_{m:\, f \in \mathcal{F}_m} c\,\frac{\log|\mathcal{F}_m|}{n}.$$
Let
$$\hat f = \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^n |Y_i - f(x_i)|^2 + \text{pen}(f) \right\},$$
and
$$f^* = \arg\min_{f \in \mathcal{F}} \left\{ \|f - f_0\|_n^2 + \text{pen}(f) \right\} = \arg\min_{f \in \mathcal{F}} \{\text{blue}(f) + \text{red}(f)\}.$$
Lemma 4.2. We have
$$E[\|\hat f - f_0\|_n^2 + \text{pen}(\hat f)] \le 2[\text{blue}(f^*) + \text{red}(f^*)] + \frac{c}{n}.$$
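A sketch of this model selection step when the nested families are enumerated explicitly; representing each $\mathcal{F}_m$ as an array of candidate fitted values is an illustration, not the talk's construction:

```python
import numpy as np

def select_model(y, models, c=2.0):
    """Penalized least squares over nested finite models (slide 18).
    Each element of `models` is a 2-D array whose rows are candidate fits
    (f(x_1), ..., f(x_n)) of one model F_m, ordered from small to large."""
    n = len(y)
    best_fit, best_crit = None, np.inf
    for fam in models:
        pen = c * np.log(len(fam)) / n
        sse = np.mean((y[None, :] - fam) ** 2, axis=1)   # per-row (1/n)*SSE
        k = int(np.argmin(sse))
        if sse[k] + pen < best_crit:
            best_fit, best_crit = fam[k], sse[k] + pen
    return best_fit
```

Since the penalty of an $f$ is determined by the smallest family containing it, for nested families the minimum over pairs $(m, f)$ computed above agrees with the penalized criterion of the slide.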

19 5. General penalties.
$$Y_i = f_0(x_i) + \epsilon_i, \quad i = 1, \ldots, n,$$
with $\epsilon_1, \ldots, \epsilon_n$ independent $\mathcal{N}(0,1)$, and $x_i \in \mathcal{X}$, $Y_i \in \mathbb{R}$, $i = 1, \ldots, n$. Let
$$\hat f = \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n}\sum_{i=1}^n |Y_i - f(x_i)|^2 + \text{pen}(f) \right\},$$
and let the oracle be
$$f^* = \arg\min_{f \in \mathcal{F}} \left\{ \|f - f_0\|_n^2 + \text{pen}(f) \right\}.$$
Definition 5.1. The $\delta$-entropy $H(\delta, \mathcal{F}, Q_n)$ is the logarithm of the minimum number of balls with radius $\delta$ necessary to cover $\mathcal{F}$.

20 Lemma 5.2. Suppose that for $n\delta_n^2 \ge c$ we have
$$\int_0^{\delta_n} H^{1/2}\big(u, \{f : \|f - f^*\|_n^2 + \text{pen}(f) \le \delta_n^2\}, Q_n\big)\,du \le \sqrt{n}\,\delta_n^2.$$
Then
$$E[\|\hat f - f_0\|_n^2 + \text{pen}(\hat f)] \le 2[\|f^* - f_0\|_n^2 + \text{pen}(f^*) + \delta_n^2] + \frac{c}{n}.$$

21 Example: Sobolev penalties. $I_s^2(f) = \int_0^1 |f^{(s)}(x)|^2\,dx$.
a) Penalty on $I_s$ with $s$ fixed: $\text{pen}(f) = \lambda^2 I_s^2(f)$. We find
$$E[\|\hat f - f_0\|_n^2 + \lambda^2 I_s^2(\hat f)] \le 2[\|f^* - f_0\|_n^2 + \lambda^2 I_s^2(f^*)] + \frac{c}{n\lambda^{1/s}} + \frac{c}{n}.$$
b) Penalty on $I_s$ with $s$ fixed, and on $\lambda$:
$$\text{pen}(f) = \inf_{0 < \lambda < \infty}\left\{\lambda^2 I_s^2(f) + \frac{c}{n\lambda^{1/s}}\right\} = \text{red}(f).$$
Then
$$E\|\hat f - f_0\|_n^2 \le 2\left[\|f^* - f_0\|_n^2 + c_0\,n^{-\frac{2s}{2s+1}} I_s^{\frac{2}{2s+1}}(f^*)\right] + \frac{c\log n}{n} = 2[\text{blue}(f^*) + \text{red}(f^*)] + \frac{c\log n}{n}.$$
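To see where the rate in (b) comes from, the infimum over $\lambda$ can be carried out by elementary calculus (a sketch; $c_0$ absorbs constants depending on $c$ and $s$). Writing $I = I_s(f)$,
$$\frac{d}{d\lambda}\left[\lambda^2 I^2 + \frac{c}{n\lambda^{1/s}}\right] = 2\lambda I^2 - \frac{c}{sn}\,\lambda^{-1/s-1} = 0 \quad\Longrightarrow\quad \lambda_* = \left(\frac{c}{2snI^2}\right)^{\frac{s}{2s+1}},$$
and substituting $\lambda_*$ back gives
$$\text{red}(f) = \lambda_*^2 I^2 + \frac{c}{n\lambda_*^{1/s}} = c_0\, n^{-\frac{2s}{2s+1}}\, I^{\frac{2}{2s+1}}.$$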

22 c) Penalty on $I_s$ and on $s$:
$$\text{pen}(f) = \min_{1 \le s \le s_{\max}}\left\{\lambda^2 I_s^2(f) + \frac{c_0 s_{\max}^3}{n\lambda^{1/s}}\right\}.$$
d) Penalty on $I_s$ and on $s$, and on $\lambda$:
$$\text{pen}(f) = \inf_{0 < \lambda < \infty}\,\min_{1 \le s \le s_{\max}}\left\{\lambda^2 I_s^2(f) + \frac{c_0 s_{\max}^3}{n\lambda^{1/s}}\right\}.$$

23 6. Robust regression. Let $Y_i$ depend on some covariable $x_i$, $i = 1, \ldots, n$. Assume $Y_1, \ldots, Y_n$ are independent. Let $\gamma : \mathbb{R} \to \mathbb{R}$ be a convex loss function satisfying the Lipschitz condition
$$|\gamma(\tilde y) - \gamma(y)| \le |\tilde y - y|, \quad \tilde y, y \in \mathbb{R}.$$
Consider
$$\hat f_n = \arg\min_f \left\{ \frac{1}{n}\sum_{i=1}^n \gamma(Y_i - f(x_i)) + \text{pen}(f) \right\}.$$
Examples:
- Least absolute deviations: $\gamma(y) = |y|$.
- Quantile loss: $\gamma(y) = \tau|y|\,l\{y < 0\} + (1-\tau)|y|\,l\{y > 0\}$, $y \in \mathbb{R}$, where $0 < \tau < 1$ is fixed.
- The Huber loss function $\gamma$.
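Sketches of these three losses ($k = 1.345$ is a conventional Huber constant, not from the talk; note the Huber loss is Lipschitz with constant $k$ rather than 1):

```python
import numpy as np

def lad(y):
    """Least absolute deviations loss."""
    return np.abs(y)

def quantile_loss(y, tau):
    """Check loss of slide 23; tau in (0,1) selects the quantile."""
    return tau * np.abs(y) * (y < 0) + (1 - tau) * np.abs(y) * (y > 0)

def huber(y, k=1.345):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = np.abs(y)
    return np.where(a <= k, 0.5 * y**2, k * (a - 0.5 * k))
```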

24 The true regression function: $f_0 = \arg\min_f \Gamma(f)$, where
$$\Gamma(f) = \frac{1}{n}\sum_{i=1}^n E\gamma(Y_i - f(x_i)).$$

25 6.1. Standard identifiability condition. Suppose that for some (unknown) $\sigma > 0$,
$$\Gamma(f) - \Gamma(f_0) \ge \|f - f_0\|^2/\sigma.$$
Then one can prove results similar to those for the penalized least squares estimator.
More general margin condition. Suppose that for some unknown $\kappa \ge 1$ and $\sigma > 0$,
$$\Gamma(f) - \Gamma(f_0) \ge \|f - f_0\|^{2\kappa}/\sigma.$$
The estimation error then depends on the unknown $\kappa$!

26 6.4. Oracle. Let
$$f = \sum_{j=1}^n \vartheta_j \psi_j, \qquad J_f = \{j : |\vartheta_j| > 0\},$$
and
$$f^* = \arg\min_f \{\text{blue}(f) + \text{red}(f)\},$$
where
$$\text{blue}(f) = \Gamma(f) - \Gamma(f_0), \qquad \text{red}(f) = \left[\frac{\log n}{n}\,|J_f|\right]^{\frac{\kappa}{2\kappa - 1}}.$$

27 6.5. $\ell_1$ penalty.
$$\hat f_n = \arg\min_f \left\{ \frac{1}{n}\sum_{i=1}^n \gamma(Y_i - f(x_i)) + \text{pen}(f) \right\},$$
with
$$\text{pen}(f) = c\sqrt{\frac{\log n}{n}}\sum_{j=1}^n |\vartheta_j|.$$
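A direct sketch of this estimator for a linear expansion $f_\vartheta = \sum_j \vartheta_j \psi_j$, minimizing the convex objective with a general-purpose derivative-free solver; the data-generating choices are illustrative, and this is not the algorithm behind the talk's experiments:

```python
import numpy as np
from scipy.optimize import minimize

def l1_robust_fit(X, y, lam, tau=0.5):
    """l1-penalized robust regression (slide 27 flavor):
    mean quantile check loss plus lam * ||theta||_1."""
    n, p = X.shape

    def objective(theta):
        r = y - X @ theta
        check = tau * np.abs(r) * (r < 0) + (1 - tau) * np.abs(r) * (r > 0)
        return np.mean(check) + lam * np.sum(np.abs(theta))

    # Powell is derivative-free, so the non-smooth objective is acceptable
    return minimize(objective, np.zeros(p), method="Powell").x

rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.standard_normal((n, p))
theta0 = np.zeros(p); theta0[:2] = [1.0, -2.0]
y = X @ theta0 + rng.standard_t(df=2, size=n)     # heavy-tailed noise
theta_hat = l1_robust_fit(X, y, lam=np.sqrt(np.log(n) / n))
```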

28 Oracle inequality. One has
$$E(\Gamma(\hat f_n) - \Gamma(f_0)) \le C\{\text{blue}(f^*) + \text{red}(f^*)\} = C\left\{\Gamma(f^*) - \Gamma(f_0) + \left[\frac{\log n}{n}\,|J_{f^*}|\right]^{\frac{\kappa}{2\kappa - 1}}\right\}.$$
In other words, $\hat f_n$ adapts to the smoothness of $f_0$ as well as to $\kappa$.

29 Typical example: $\|f^* - f_0\| \asymp |J_{f^*}|^{-s}$. Then
$$\Gamma(\hat f_n) - \Gamma(f_0) = O\!\left(\left(\frac{\log n}{n}\right)^{\frac{2\kappa s}{4\kappa s - 2s + 1}}\right).$$

30 [Figure: simulation results comparing FY3PQ (time = 3.56 s) and IRLS, reporting SMSE and MAD.]

31 Conclusion: the advantage of the $\ell_1$ penalty over (nonrandom) penalties based on bias-variance considerations is that it is not only adaptive to the smoothness, but also adaptive to the margin.

32 The classification problem. 1. Introduction. $Y \in \{0, 1\}$ binary response variable, $X \in \mathcal{X}$ covariable. Aim: predict $Y$ given $X$. Examples:
- recognition of speech or handwriting
- classifying an object in an image
- classification of gene expression levels
- etc.
Training set: $n$ i.i.d. copies $(X_i, Y_i)_{i=1}^n$ of $(X, Y)$.

33 [Figure: learning data and its learning error (training errors on the input data); testing data and its testing error (testing errors of the classification).]

34 2. Bayes classifier. When using $G$ as classifier, the PREDICTION ERROR is
$$R(G) = P(Y \neq l_G(X)),$$
where $l_G$ is the indicator of $G$. The BAYES RULE is
$$G_0 = \arg\min_G R(G),$$
where the minimum is over all sets $G \subset \mathcal{X}$. Thus
$$G_0 = \{x : \eta(x) \ge 1/2\},$$
where $\eta(x)$ is the regression of $Y$ on $X = x$:
$$\eta(x) = P(Y = 1 \mid X = x).$$
So the Bayes rule predicts the most likely label.
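A sketch: given the regression function $\eta$, the Bayes classifier is just a level-set rule (the particular $\eta$ below is hypothetical):

```python
import numpy as np

def bayes_rule(eta):
    """Bayes classifier of slide 34: predict label 1 exactly where eta(x) >= 1/2."""
    return lambda x: (eta(x) >= 0.5).astype(int)

eta = lambda x: 1.0 / (1.0 + np.exp(-4 * (x - 0.5)))   # illustrative eta
G0 = bayes_rule(eta)
print(G0(np.array([0.2, 0.5, 0.9])))                   # -> [0 1 1]
```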

35 [Figure: Bayes rule. The regression $\eta$ is plotted against the level 1/2; the Bayes rule $G_0$ is here a union of two intervals.]

36 3. Empirical risk minimization. Let $\mathcal{G}$ be a collection of sets. The EMPIRICAL RISK MINIMIZER is
$$\hat G_n = \arg\min_{G \in \mathcal{G}} R_n(G),$$
where
$$R_n(G) = \frac{1}{n}\sum_{i=1}^n |Y_i - l_G(X_i)|$$
is the EMPIRICAL RISK.
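A sketch of empirical risk minimization over a toy class of half-line sets $G_t = \{x : x \ge t\}$; this class and the simulated data are illustrative only:

```python
import numpy as np

def erm(X, Y, thresholds):
    """Empirical risk minimization (slide 36) over G_t = [t, infinity):
    classify as 1 iff X >= t, and pick t with smallest empirical risk."""
    risks = [np.mean(Y != (X >= t)) for t in thresholds]
    t_hat = thresholds[int(np.argmin(risks))]
    return t_hat, min(risks)

rng = np.random.default_rng(4)
X = rng.uniform(size=500)
Y = (rng.uniform(size=500) < 1 / (1 + np.exp(-8 * (X - 0.5)))).astype(int)
t_hat, risk = erm(X, Y, np.linspace(0, 1, 101))
```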


38 5. Mammen & Tsybakov margin condition. Let
$$G \,\triangle\, G_0 = (G \setminus G_0) \cup (G_0 \setminus G)$$
be the symmetric difference between the two sets.
[Figure: the symmetric difference $G \,\triangle\, G_0$.]

39 The EXCESS RISK (= approximation error) at $G$ is $R(G) - R(G_0)$.
Margin condition (Mammen and Tsybakov (1999), Tsybakov (2003)). For some (unknown) constants $\sigma > 0$ and $\kappa \ge 1$,
$$R(G) - R(G_0) \ge \sigma Q^{\kappa}(G \,\triangle\, G_0)$$
for all sets $G \subset \mathcal{X}$, where $Q$ denotes the distribution of $X$.

40 6. Boundary fragments. We assume $\mathcal{X} = [0,1]^{d+1}$ and write $X = (S, T)$, with $S \in [0,1]^d$, $T \in [0,1]$. For a function $f : [0,1]^d \to [0,1]$, we define the boundary fragment
$$G_f = \{x = (s, t) : t \le f(s)\}.$$

41 [Figure: the symmetric difference $G_f \,\triangle\, G_{\tilde f}$ for the boundary fragments $G_f$ and $G_{\tilde f}$ formed by the subgraphs of $f$ and $\tilde f$.]

42 7. Oracle. Define the oracle
$$G^* = \arg\min_{G \in \mathcal{G}} [\text{red}(G) + \text{blue}(G)],$$
where $\text{red}(G)$ = estimation error, and $\text{blue}(G) = R(G) - R(G_0)$ = approximation error. Thus $G^*$ gives the best trade-off between estimation error and approximation error.

43 8. Oracle inequality. Let $f_\vartheta = \sum_j \vartheta_j \psi_j$, and
$$\mathcal{G} = \{\text{boundary fragments } G_{f_\vartheta} : \vartheta \in \mathbb{R}^n\}.$$
We take a square root penalty (or $\ell_{1/2}$ penalty)
$$\text{pen}(G_{f_\vartheta}) = \lambda_n \sum_{j,l} 2^{dl/2}\,|\vartheta_{j,l}|^{1/2}.$$
Theorem (Tsybakov and van de Geer (2003)). Let
$$\hat G_n = \arg\min_{G \in \mathcal{G}} \{R_n(G) + \text{pen}(G)\},$$
where
$$\lambda_n = c\,\frac{\log^4 n}{n},$$
and where $c$ is a (large enough) universal constant. Then
$$P\Big(R(\hat G_n) - R(G_0) \ge c_{\sigma,\kappa}\,[\text{red}(G^*) + \text{blue}(G^*)]\Big) \le \exp[-c\,\log^4 n].$$

44 Conclusion. The estimator with the square root penalty adapts, up to log-factors, to the smoothness as well as to the margin... but we cannot compute it!

45 9. Surrogate loss functions. We now code the label as $Y \in \{\pm 1\}$. Let
$$f_\vartheta(x) = \sum_{j=1}^N \vartheta_j \psi_j(x).$$
Introduce a margin $0 \le \lambda < 1$. We call $(X, Y)$ well classified by $f_\vartheta$ if
$$Y f_\vartheta(X) \ge \lambda \|\vartheta\|_1.$$
Here $\|\vartheta\|_1 = \sum_{j=1}^N |\vartheta_j|$.

46 Define
$$R_n(f_\vartheta, \lambda) = \frac{\#\{Y_i f_\vartheta(X_i) < \lambda \|\vartheta\|_1\}}{n}.$$
Then by Chebyshev's inequality, for any non-negative, increasing function $\phi$,
$$R_n(f_\vartheta, \lambda) \le \frac{L_n(f_\vartheta)}{\phi(1 - \lambda\|\vartheta\|_1)},$$
where
$$L_n(f) = \frac{1}{n}\sum_{i=1}^n \phi(1 - Y_i f(X_i))$$
(indeed, $l\{Y_i f_\vartheta(X_i) < \lambda\|\vartheta\|_1\} \le \phi(1 - Y_i f_\vartheta(X_i))/\phi(1 - \lambda\|\vartheta\|_1)$, since $\phi$ is non-negative and increasing). This leads to minimizing the penalized loss function $L_n(f_\vartheta)\,\text{pen}(\vartheta)$, where $\text{pen}(\vartheta) = 1/\phi(1 - \lambda\|\vartheta\|_1)$.

47 Example: support vector machine loss. Adaptive estimation: minimize
$$\frac{1}{n}\sum_{i=1}^n (1 - Y_i f_\vartheta(X_i))_+ + \lambda\|\vartheta\|_1,$$
with $\lambda = c\sqrt{\log n / n}$.
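A sketch of this penalized hinge-loss criterion, again with a generic solver and a made-up basis matrix $\Psi$; a dedicated SVM or lasso-type algorithm would be used in practice:

```python
import numpy as np
from scipy.optimize import minimize

def hinge_l1_fit(Psi, y, lam):
    """Hinge loss with an l1 penalty (slide 47):
    (1/n) * sum (1 - y_i * f_theta(x_i))_+ + lam * ||theta||_1,
    where f_theta(x_i) = (Psi @ theta)_i for a basis matrix Psi."""
    n, N = Psi.shape

    def objective(theta):
        margins = 1.0 - y * (Psi @ theta)
        return np.mean(np.maximum(margins, 0.0)) + lam * np.sum(np.abs(theta))

    return minimize(objective, np.zeros(N), method="Powell").x

rng = np.random.default_rng(5)
n, N = 300, 8
Psi = rng.standard_normal((n, N))
theta0 = np.zeros(N); theta0[:2] = [2.0, -1.0]
y = np.sign(Psi @ theta0 + 0.5 * rng.standard_normal(n))
theta_hat = hinge_l1_fit(Psi, y, lam=np.sqrt(np.log(n) / n))
```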

48 Conclusion. Adaptation using empirical risk minimization leads to $\ell_{1/2}$ penalties, and is computationally very hard. In combination with surrogate loss functions, $\ell_1$ penalties are natural, and also computationally simple. But the resulting oracle inequalities generally do not yield fast optimal rates for the excess risk.

49 Some references:
- D.L. Donoho and I.M. Johnstone (1996). Neo-classical minimax problems, thresholding and adaptation. Bernoulli.
- L. Birgé and P. Massart (1997). From model selection to adaptive estimation. In: Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (Eds. D. Pollard, E. Torgersen and G. Yang), 55-87. Springer, New York.
- G. Lugosi and A. Nobel (1999). Adaptive model selection using empirical complexities. Ann. Statist.
- E. Mammen and A.B. Tsybakov (1999). Smooth discriminant analysis. Ann. Statist.
- S. van de Geer (2001). Least squares estimation with complexity penalties. Mathematical Methods of Statistics.
- S. van de Geer (2002). M-estimation using penalties or sieves. Journal of Statistical Planning and Inference.

50 References, continued:
- J.-M. Loubes and S. van de Geer (2002). Adaptive estimation in regression, using soft thresholding type penalties. Statistica Neerlandica.
- A.B. Tsybakov (2003). Optimal aggregation of classifiers in statistical learning. To appear in Ann. Statist.
- A.B. Tsybakov and S.A. van de Geer (2003). Square root penalty: adaptation to the margin in classification and in edge estimation. Prépublication PMA-820, Lab. de Probab. et Modèles Aléatoires, Université Paris VII (submitted).
