A Risk Comparison of Ordinary Least Squares vs Ridge Regression
Journal of Machine Learning Research 14 (2013) Submitted 5/12; Revised 3/13; Published 6/13

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

Paramveer S. Dhillon (DHILLON@CIS.UPENN.EDU)
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA

Dean P. Foster (FOSTER@WHARTON.UPENN.EDU)
Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA

Sham M. Kakade (SKAKADE@MICROSOFT.COM)
Microsoft Research, One Memorial Drive, Cambridge, MA 02142, USA

Lyle H. Ungar (UNGAR@CIS.UPENN.EDU)
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA

Editor: Gabor Lugosi

Abstract

We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite dimensional subspace (as specified by a principal component analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method (PCA-OLS) is within a constant factor (namely 4) of the risk of ridge regression (RR).

Keywords: risk inflation, ridge regression, pca

1. Introduction

Consider the fixed design setting where we have a set of $n$ vectors $X = \{X_i\}$, and let $X$ denote the matrix where the $i$-th row of $X$ is $X_i$. The observed label vector is $Y \in \mathbb{R}^n$. Suppose that

$$Y = X\beta + \varepsilon,$$

where $\varepsilon$ is independent noise in each coordinate, with the variance of $\varepsilon_i$ being $\sigma^2$. The objective is to learn $\mathbb{E}[Y] = X\beta$. The expected loss of a vector estimator $\beta$ is

$$L(\beta) = \frac{1}{n}\,\mathbb{E}_Y\big[\|Y - X\beta\|^2\big].$$

Let $\hat\beta$ be an estimator of $\beta$ (constructed with a sample $Y$). Denoting

$$\Sigma := \frac{1}{n} X^\top X,$$

© 2013 Paramveer S. Dhillon, Dean P. Foster, Sham M. Kakade and Lyle H. Ungar.
we have that the risk (i.e., expected excess loss) is

$$\mathrm{Risk}(\hat\beta) := \mathbb{E}_{\hat\beta}\big[L(\hat\beta) - L(\beta)\big] = \mathbb{E}_{\hat\beta}\,\|\hat\beta - \beta\|_\Sigma^2,$$

where $\|x\|_\Sigma^2 = x^\top \Sigma\, x$ and where the expectation is with respect to the randomness in $Y$. We show that a simple variant of ordinary (un-regularized) least squares always compares favorably to ridge regression (as measured by the risk). This observation is based on the following bias-variance decomposition:

$$\mathrm{Risk}(\hat\beta) = \underbrace{\mathbb{E}\,\|\hat\beta - \bar\beta\|_\Sigma^2}_{\text{Variance}} + \underbrace{\|\bar\beta - \beta\|_\Sigma^2}_{\text{Prediction Bias}}, \qquad (1)$$

where $\bar\beta = \mathbb{E}[\hat\beta]$.

1.1 The Risk of Ridge Regression (RR)

Ridge regression or Tikhonov regularization (Tikhonov, 1963) penalizes the $\ell_2$ norm of a parameter vector $\beta$ and shrinks it towards zero, penalizing large values more. The estimator is

$$\hat\beta_\lambda = \arg\min_\beta \Big\{ \frac{1}{n}\|Y - X\beta\|^2 + \lambda \|\beta\|^2 \Big\}.$$

The closed form estimate is then

$$\hat\beta_\lambda = (\Sigma + \lambda I)^{-1} \Big(\frac{1}{n}\Big) X^\top Y.$$

Note that

$$\hat\beta_0 = \hat\beta_{\lambda=0} = \arg\min_\beta \frac{1}{n}\|Y - X\beta\|^2$$

is the ordinary least squares estimator. Without loss of generality, rotate $X$ such that

$$\Sigma = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p),$$

where the $\lambda_j$'s are ordered in decreasing order. To see the nature of this shrinkage, observe that

$$[\hat\beta_\lambda]_j = \frac{\lambda_j}{\lambda_j + \lambda}\,[\hat\beta_0]_j,$$

where $\hat\beta_0$ is the ordinary least squares estimator. Using the bias-variance decomposition (Equation 1), we have that:

Lemma 1.
$$\mathrm{Risk}(\hat\beta_\lambda) = \sum_{j=1}^{p} \left[ \frac{\sigma^2}{n} \Big(\frac{\lambda_j}{\lambda_j + \lambda}\Big)^2 + \frac{\beta_j^2\, \lambda_j}{(1 + \lambda_j/\lambda)^2} \right].$$

The proof is straightforward and is provided in the appendix.
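As a small sanity check (not part of the paper), the closed-form ridge estimate and its coordinate-wise shrinkage of the OLS estimate in the PCA basis can be verified numerically; the sizes, seed, and penalty below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
lam = 0.7  # ridge penalty (illustrative)

X = rng.standard_normal((n, p))
# Rotate X so that Sigma = (1/n) X^T X is diagonal (the PCA coordinate system).
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X = X @ Vt.T
Sigma = X.T @ X / n
lam_j = np.diag(Sigma)  # eigenvalues lambda_1 >= ... >= lambda_p

beta = rng.standard_normal(p)
Y = X @ beta + rng.standard_normal(n)

# Closed-form ridge estimate: (Sigma + lam I)^{-1} (1/n) X^T Y
beta_ridge = np.linalg.solve(Sigma + lam * np.eye(p), X.T @ Y / n)
# OLS estimate (lam = 0)
beta_ols = np.linalg.solve(Sigma, X.T @ Y / n)

# Shrinkage identity: [beta_lam]_j = lambda_j / (lambda_j + lam) * [beta_0]_j
assert np.allclose(beta_ridge, lam_j / (lam_j + lam) * beta_ols)
```

The rotation step matters: the shrinkage is coordinate-wise only after $X$ has been rotated so that $\Sigma$ is diagonal.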
2. Ordinary Least Squares with PCA (PCA-OLS)

Now let us construct a simple estimator based on $\lambda$. Note that our rotated coordinate system, where $\Sigma$ is equal to $\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$, corresponds to the PCA coordinate system. Consider the following ordinary least squares estimator on the "top" PCA subspace; it uses the least squares estimate on coordinate $j$ if $\lambda_j \ge \lambda$ and $0$ otherwise:

$$[\hat\beta_{\mathrm{PCA},\lambda}]_j = \begin{cases} [\hat\beta_0]_j & \text{if } \lambda_j \ge \lambda, \\ 0 & \text{otherwise.} \end{cases}$$

The following claim shows this estimator compares favorably to the ridge estimator (for every $\lambda$), no matter how the $\lambda$ is chosen, for example, using cross validation or any other strategy. Our main theorem (Theorem 2) bounds the risk ratio / risk inflation¹ of the PCA-OLS and the RR estimators.

Theorem 2 (Bounded Risk Inflation). For all $\lambda \ge 0$, we have that

$$0 \le \frac{\mathrm{Risk}(\hat\beta_{\mathrm{PCA},\lambda})}{\mathrm{Risk}(\hat\beta_\lambda)} \le 4,$$

and the left hand inequality is tight.

Proof. Using the bias-variance decomposition of the risk, we can write the risk as

$$\mathrm{Risk}(\hat\beta_{\mathrm{PCA},\lambda}) = \sum_{j} \frac{\sigma^2}{n}\,\mathbb{1}[\lambda_j \ge \lambda] + \sum_{j:\,\lambda_j < \lambda} \lambda_j \beta_j^2.$$

The first term represents the variance and the second the bias. The ridge regression risk is given by Lemma 1. We now show that the $j$-th term in the expression for the PCA risk is within a factor 4 of the $j$-th term of the ridge regression risk. First, let us consider the case when $\lambda_j \ge \lambda$; then the ratio of the $j$-th terms is

$$\frac{\frac{\sigma^2}{n}}{\frac{\sigma^2}{n}\big(\frac{\lambda_j}{\lambda_j+\lambda}\big)^2 + \frac{\beta_j^2 \lambda_j}{(1+\lambda_j/\lambda)^2}} \;\le\; \frac{\frac{\sigma^2}{n}}{\frac{\sigma^2}{n}\big(\frac{\lambda_j}{\lambda_j+\lambda}\big)^2} = \Big(1 + \frac{\lambda}{\lambda_j}\Big)^2 \le 4.$$

Similarly, if $\lambda_j < \lambda$, the ratio of the $j$-th terms is

$$\frac{\lambda_j \beta_j^2}{\frac{\sigma^2}{n}\big(\frac{\lambda_j}{\lambda_j+\lambda}\big)^2 + \frac{\beta_j^2 \lambda_j}{(1+\lambda_j/\lambda)^2}} \;\le\; \frac{\lambda_j \beta_j^2}{\frac{\beta_j^2 \lambda_j}{(1+\lambda_j/\lambda)^2}} = \Big(1 + \frac{\lambda_j}{\lambda}\Big)^2 \le 4.$$

Since each term is within a factor of 4, the proof is complete.

It is worth noting that the converse is not true: the ridge regression estimator (RR) can be arbitrarily worse than the PCA-OLS estimator. An example which shows that the left hand inequality is tight is given in the Appendix.

1. Risk inflation has also been used as a criterion for evaluating feature selection procedures (Foster and George, 1994).
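Theorem 2 can be checked numerically by plugging arbitrary eigenvalues and coefficients into the two exact risk expressions above (a sketch, not from the paper; the eigenvalue range, coefficients, and $\lambda$ grid below are illustrative choices):

```python
import numpy as np

def ridge_risk(lam, lam_j, beta, sigma2, n):
    """Exact ridge risk from Lemma 1."""
    var = (sigma2 / n) * (lam_j / (lam_j + lam)) ** 2
    bias = beta ** 2 * lam_j / (1.0 + lam_j / lam) ** 2
    return float(np.sum(var + bias))

def pca_ols_risk(lam, lam_j, beta, sigma2, n):
    """Exact PCA-OLS risk: variance on kept coordinates, bias on dropped ones."""
    keep = lam_j >= lam
    return float((sigma2 / n) * keep.sum() + np.sum(lam_j[~keep] * beta[~keep] ** 2))

rng = np.random.default_rng(1)
n, p, sigma2 = 50, 20, 1.0
lam_j = np.sort(rng.uniform(0.1, 3.0, size=p))[::-1]  # decreasing eigenvalues
beta = rng.standard_normal(p)

# The ratio stays in [0, 4] for every regularization level, as Theorem 2 states.
ratios = [pca_ols_risk(l, lam_j, beta, sigma2, n) / ridge_risk(l, lam_j, beta, sigma2, n)
          for l in np.linspace(0.01, 5.0, 100)]
assert all(0.0 <= r <= 4.0 for r in ratios)
```

Because both risks are available in closed form here, no simulation noise is involved; the bound holds exactly term by term.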
3. Experiments

First, we generated synthetic data with $p = 100$ and varying values of $n = \{20, 50, 80, 110\}$. The data were generated in a fixed design setting as $Y = X\beta + \varepsilon$, where $\varepsilon_i \sim N(0,1)\ \forall i = 1,\ldots,n$. Furthermore, $X_j \sim MVN(0, I)\ \forall j$, where $MVN(\mu, \Sigma)$ is the multivariate normal distribution with mean vector $\mu$ and variance-covariance matrix $\Sigma$, and $\beta_j \sim N(0,1)\ \forall j = 1,\ldots,p$. The results are shown in Figure 1. As can be seen, the risk ratio of PCA (PCA-OLS) and ridge regression (RR) is never worse than 4, and often it is better than 1, as dictated by Theorem 2.

Next, we chose two real world data sets, namely USPS ($n = 1500$, $p = 241$) and BCI ($n = 400$, $p = 117$).² Since we do not know the true model for these data sets, we used all the $n$ observations to fit an OLS regression and used it as an estimate of the true parameter $\beta$. This is a reasonable approximation to the true parameter, as we estimate the ridge regression (RR) and PCA-OLS models on a small subset of these $n$ observations. Next we chose a random subset of the observations, namely $0.2p$, $0.5p$ and $0.8p$, to fit the ridge regression (RR) and PCA-OLS models. The results are shown in Figure 2. As can be seen, the risk ratio of PCA-OLS to ridge regression (RR) is again within a factor of 4, and often PCA-OLS is better, that is, the ratio is $< 1$.

4. Conclusion

We showed that the risk inflation of a particular ordinary least squares estimator (on the "top" PCA subspace) is within a factor 4 of the ridge estimator. It turns out the converse is not true: this PCA estimator may be arbitrarily better than the ridge one.

Appendix A.

Proof of Lemma 1. We analyze the bias-variance decomposition in Equation 1. For the variance,

$$\mathbb{E}_Y\|\hat\beta_\lambda - \bar\beta_\lambda\|_\Sigma^2 = \sum_j \lambda_j\, \mathbb{E}_Y\big([\hat\beta_\lambda]_j - [\bar\beta_\lambda]_j\big)^2 = \sum_j \lambda_j \Big(\frac{\lambda_j}{\lambda_j+\lambda}\Big)^2 \frac{1}{(n\lambda_j)^2}\, \mathbb{E}\Big[\Big(\sum_{i=1}^n (Y_i - \mathbb{E}[Y_i])[X_i]_j\Big)^2\Big]$$
$$= \sum_j \lambda_j \Big(\frac{\lambda_j}{\lambda_j+\lambda}\Big)^2 \frac{1}{(n\lambda_j)^2} \sum_{i=1}^n \mathrm{Var}(Y_i)\,[X_i]_j^2 = \sum_j \lambda_j \Big(\frac{\lambda_j}{\lambda_j+\lambda}\Big)^2 \frac{\sigma^2\, n\lambda_j}{(n\lambda_j)^2} = \sum_j \frac{\sigma^2}{n} \Big(\frac{\lambda_j}{\lambda_j+\lambda}\Big)^2,$$

using that $\sum_{i=1}^n [X_i]_j^2 = n\lambda_j$ in the rotated coordinate system.

2. The details about the data sets can be found here:
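The synthetic experiment above can be sketched by Monte Carlo at a smaller scale than the paper's $p = 100$ (this is an illustrative reconstruction, not the authors' code; the sizes, penalty, seed, and trial count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, trials, lam = 20, 50, 200, 0.5  # illustrative, smaller than the paper's setup

X = rng.standard_normal((n, p))
_, _, Vt = np.linalg.svd(X, full_matrices=False)
X = X @ Vt.T                      # rotate into the PCA basis
Sigma = X.T @ X / n
lam_j = np.diag(Sigma)
beta = rng.standard_normal(p)

def ridge(Y):
    return np.linalg.solve(Sigma + lam * np.eye(p), X.T @ Y / n)

def pca_ols(Y):
    b0 = np.linalg.solve(Sigma, X.T @ Y / n)
    return np.where(lam_j >= lam, b0, 0.0)

def estimate_risk(estimator):
    """Average ||b_hat - beta||_Sigma^2 over fresh noise draws (fixed design)."""
    errs = []
    for _ in range(trials):
        Y = X @ beta + rng.standard_normal(n)
        d = estimator(Y) - beta
        errs.append(d @ Sigma @ d)
    return float(np.mean(errs))

ratio = estimate_risk(pca_ols) / estimate_risk(ridge)
print(f"estimated risk ratio (PCA-OLS / RR): {ratio:.2f}")
```

Up to Monte Carlo noise, the printed ratio should sit below 4, mirroring the behavior the paper reports for its synthetic data.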
[Figure panels omitted; only the captions survive.]

Figure 1: Plots showing the risk ratio as a function of $\lambda$, the regularization parameter, and $n$ (panels: $n = 0.2p$, $0.5p$, $0.8p$, $1.1p$), for the synthetic data set. $p = 100$ in all the cases. The error bars correspond to one standard deviation for 100 such random trials.

Figure 2: Plots showing the risk ratio as a function of $\lambda$, the regularization parameter, and $n$ (panels: $n = 0.2p$, $0.5p$, $0.8p$, $1.1p$), for two real world data sets (BCI and USPS, top to bottom).
Similarly, for the bias,

$$\|\bar\beta_\lambda - \beta\|_\Sigma^2 = \sum_j \lambda_j \big([\bar\beta_\lambda]_j - \beta_j\big)^2 = \sum_j \lambda_j \beta_j^2 \Big(\frac{\lambda_j}{\lambda_j+\lambda} - 1\Big)^2 = \sum_j \frac{\lambda_j \beta_j^2}{(1 + \lambda_j/\lambda)^2},$$

which completes the proof.

The risk for RR can be arbitrarily worse than that of the PCA-OLS estimator. Consider the standard OLS setting described in Section 1, in which $X$ is an $n \times p$ matrix and $Y$ is an $n \times 1$ vector. Let $X = \mathrm{diag}(\sqrt{1+\alpha}, 1, \ldots, 1)$, so that $\Sigma = X^\top X = \mathrm{diag}(1+\alpha, 1, \ldots, 1)$ for some $\alpha > 0$, and also choose $\beta = [2+\alpha, 0, \ldots, 0]^\top$. For convenience, let us also choose $\sigma^2 = n$ (so that $\sigma^2/n = 1$). Then, using Lemma 1, we get the risk of the RR estimator as

$$\mathrm{Risk}(\hat\beta_\lambda) = \underbrace{\Big(\frac{1+\alpha}{1+\alpha+\lambda}\Big)^2}_{\mathrm{I}} + \underbrace{\frac{p-1}{(1+\lambda)^2}}_{\mathrm{II}} + \underbrace{\frac{(2+\alpha)^2 (1+\alpha)}{\big(1 + \frac{1+\alpha}{\lambda}\big)^2}}_{\mathrm{III}}.$$

Let us consider two cases.

Case 1: if $\lambda < (p-1)^{1/3} - 1$, then $\mathrm{II} > (p-1)^{1/3}$.

Case 2: if $\lambda > 1$, then $1 + \frac{1+\alpha}{\lambda} < 2+\alpha$, hence $\mathrm{III} > (1+\alpha)$.

Combining these two cases (which together cover every $\lambda$ when $(p-1)^{1/3} - 1 \ge 1$), we get for all $\lambda$,

$$\mathrm{Risk}(\hat\beta_\lambda) > \min\big((p-1)^{1/3},\, 1+\alpha\big).$$

If we choose $p$ such that $p - 1 = (1+\alpha)^3$, then $\mathrm{Risk}(\hat\beta_\lambda) > 1+\alpha$.

The PCA-OLS risk (from the proof of Theorem 2) is

$$\mathrm{Risk}(\hat\beta_{\mathrm{PCA},\lambda}) = \sum_j \mathbb{1}[\lambda_j \ge \lambda] + \sum_{j:\,\lambda_j < \lambda} \lambda_j \beta_j^2.$$

Considering $\lambda \in (1, 1+\alpha)$, the first term contributes 1 to the risk and everything else is 0. So the risk of PCA-OLS is 1 and the risk ratio is

$$\frac{\mathrm{Risk}(\hat\beta_{\mathrm{PCA},\lambda})}{\mathrm{Risk}(\hat\beta_\lambda)} \le \frac{1}{1+\alpha}.$$

Now, for large $\alpha$, the risk ratio $\to 0$.
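The appendix construction can be verified numerically with the exact risk formulas (a sketch; $\alpha = 9$ and the $\lambda$ grid are illustrative choices, and $\sigma^2/n = 1$ is assumed as in the construction):

```python
import numpy as np

def ridge_risk(lam, lam_j, beta):
    """Lemma 1 risk with sigma^2 / n = 1."""
    return float(np.sum((lam_j / (lam_j + lam)) ** 2
                        + beta ** 2 * lam_j / (1.0 + lam_j / lam) ** 2))

def pca_risk(lam, lam_j, beta):
    """PCA-OLS risk with sigma^2 / n = 1."""
    keep = lam_j >= lam
    return float(keep.sum() + np.sum(lam_j[~keep] * beta[~keep] ** 2))

a = 9.0                                   # alpha, chosen for illustration
p = int((1 + a) ** 3) + 1                 # p - 1 = (1 + a)^3 = 1000
lam_j = np.concatenate([[1 + a], np.ones(p - 1)])   # Sigma = diag(1+a, 1, ..., 1)
beta = np.concatenate([[2 + a], np.zeros(p - 1)])   # beta = (2+a, 0, ..., 0)

# Any lam in (1, 1+a) keeps only the top PCA coordinate, so the PCA-OLS risk is 1.
assert pca_risk(2.0, lam_j, beta) == 1.0

# The ridge risk stays above 1 + a over a wide grid of lam, so the ratio <= 1/(1+a).
lams = np.linspace(1e-3, 1e3, 2000)
assert min(ridge_risk(l, lam_j, beta) for l in lams) > 1 + a
```

As $\alpha$ grows, the gap widens, matching the claim that the ridge estimator can be arbitrarily worse than PCA-OLS on this example.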
References

D. P. Foster and E. I. George. The risk inflation criterion for multiple regression. The Annals of Statistics, 1994.

A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl., 4, 1963.
More informationMATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4
MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.
More information1 Review and Overview
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationEstimation of Gumbel Parameters under Ranked Set Sampling
Joural of Moder Applied Statistical Methods Volume 13 Issue 2 Article 11-2014 Estimatio of Gumbel Parameters uder Raked Set Samplig Omar M. Yousef Al Balqa' Applied Uiversity, Zarqa, Jorda, abuyaza_o@yahoo.com
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit theorems Throughout this sectio we will assume a probability space (Ω, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationLinear Support Vector Machines
Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate
More informationSolutions to Odd Numbered End of Chapter Exercises: Chapter 4
Itroductio to Ecoometrics (3 rd Updated Editio) by James H. Stock ad Mark W. Watso Solutios to Odd Numbered Ed of Chapter Exercises: Chapter 4 (This versio July 2, 24) Stock/Watso - Itroductio to Ecoometrics
More informationLecture 12: February 28
10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationAdvanced Stochastic Processes.
Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.
More information