KERNEL MODELS AND SUPPORT VECTOR MACHINES

Size: px
Start display at page:

Download "KERNEL MODELS AND SUPPORT VECTOR MACHINES"

Transcription

1 COMPUAIONAL INELLIGENCE Vol. I - Kerel Models ad Support Vector Machies - K azushi Ikeda KERNEL MODELS AND SUPPOR VECOR MACHINES Kazushi Ikeda Nara Istitute of Sciece ad echology, Ikoma, Nara, Japa Keywords: kerel, margi, statistical learig theory Cotets. Itroductio. Kerel Fuctio ad Feature Space 3. Represeter heorem 4. Example 5. Pre-Image Problem 6. Properties of Kerel Methods 7. Statistical Learig heory 8. Support Vector Machies 9. Variatios of SVMs 0. Coclusios Glossary Bibliography Biographical Sketches Summary Kerel models are a method for itroducig oliear preprocessig to liear models. A kerel fuctio determies a feature space called a reproducig kerel Hilbert space. he solutio of a optimizatio problem with kerel models is expressed as a liear combiatio of give examples thaks to the represeter theorem. his meas that o explicit feature vectors are required ad hece kerel models are applicable to eve sequeces or graphs. he first half of this chapter itroduces the readers to the fudametals of kerel models such as the kerel trick ad the represeter theorem. he most successful applicatio of kerel models is support vector machies (SVMs). SVMs are derived based o the statistical learig theory, which does ot assume ay statistical model beforehad ad leads to margi maximizatio for higher geeralizatio ability. Although a SVM is costructed so that oly the upper boud of the geeralizatio error i a worst case is miimized, it performs better tha covetioal methods o average. Aother advatage of SVMs is its covexity, that is, it has a uique solutio differetly from other hierarchical learig models. he idea of margi maximizatio has bee applied to regressio, desity estimatio ad other areas. he secod half of this chapter itroduces the readers to the derivatio of SVMs ad their properties as well as several variatios.. Itroductio he simplest model of a euro is Perceptro that calculates the weighted sum of iput sigals ad outputs + whe the sum exceeds a threshold ad otherwise (Misky ad Ecyclopedia of Life Support Systems(EOLSS) 63

2 Papert, 969). he model attracted much attetio as a biary classifier or dichotomy with a adaptive algorithm for a give dataset called the Perceptro learig rule because it has good properties such as the covergece theorem that assures fiite times of iteratios for a liearly separable dataset ad the cyclig theorem that proves periodical solutios otherwise. Oe way to itroduce a oliear preprocessig to Perceptro is the multi-layer Perceptro model (MLP), where the threshold fuctio is replaced with a sigmoid fuctio such as the hyperbolic taget ad the preprocessor is autoomously acquired through learig. his model is simple ad works very well but it suffers a lot of local optimal solutios ad plateaus i a gradiet learig method whe the cost fuctio is give as the sum of squared errors. A alterative to the MLP is a kerel model. he preprocessor of a kerel model must be fixed a priori but it has a large variety of choices. I fact, we eed oly the ier product of preprocessed data, without describig them i a explicit form. his eables the feature space to have eve ifiite dimesios, or treat graphs directly, which leads to a wide rage of applicatios. Nowadays, kerel models are applied to ot oly classifiers but also regressio problems ad other liear sigal processig fields. From the mathematical viewpoit, a kerel model is based o its positive (or oegative) defiiteess. Positive defiite kerels appeared i the literature early i the twetieth cetury ad their geeral theory was established as the reproducig kerel Hilbert space i 950 s. However, the idea of kerel models has become popular after it was itroduced as a oliearity for support vector machies or SVMs i 990 s. A SVM is a liear dichotomy like Perceptro but it maximizes the margi, that is, the miimum distace of examples from the discrimiative boudary. he idea of margi maximizatio is based o Vapik s statistical learig theory that miimizes the structural risk i the framework of the probably approximately correct (PAC) learig itroduced by Valiat i 984. Oe of the good properties of SVMs is the structural risk miimizatio or SRM. he predictio error by SVMs is theoretically bouded eve i the worst case ad is ofte much lower tha the boud or that of MLPs. Aother pro is the uiqueess of the solutio. A SVM results i a covex quadratic programmig problem that has a uique solutio. Moreover, recet developmet of computer sciece ad egieerig such as ier poit methods makes the solvable size larger ad larger. haks to such superiority, SVMs have attracted much attetio ad their theoretical properties as well as their variats have bee well studied.. Kerel Fuctio ad Feature Space A kerel method maps a iput vector x X to the correspodig feature vector f( x ) i a high-dimesioal feature space F ad processes it liearly, such as Perceptro, the liear regressio or the pricipal compoet aalysis (PCA). Let us cosider the Perceptro learig here as a simple example. 64

3 Give the t - th example ( x () t, y( t) ) where () t { k} k= () x x is the t -th iput vector ad y t is the true label for the iput vector, the Perceptro learig rule i the feature space F is formulated as () t = ( t ) + y() t ( () t ) y() t () t ( () t ) () t = w( t ) otherwise, w w f x if sg w f x, w () where w () t is the weight vector of Perceptro at iteratio t ad deotes the traspose. Here, we assume that the weight vector ca be expressed as a weighted sum of iputs, that is, w() t = αk () t f( x k) () k= he, the algorithm () is simply rewritte as α α () t α ( t ) y() t if y() t sg () t ( () t ), () t () t = α ( t ) otherwise, k k k k = + w f x x = x k (3) for all k,, =. his is called the kerel Perceptro. Here, the ier product of w ( t) ad f( x ) for a arbitrary x X is calculated as ( t) ( ) = α ( t) ( ) ( ) = α ( t) K( ) w f x f x f x x, x. (4) k k k k k= k= where K ( xk, x) = f ( xk) f( x ) is called a kerel fuctio. Hece, although we do ot kow the feature vector f( x ) itself, we ca calculate the ier product. his makes the feature vector very high-dimesioal or eve ifiite without icrease of computatioal complexity, which is called the kerel trick. Oe example of kerel fuctios is the p -th order polyomial kerel K xx, = x x + p. Whe x R ad =, a possible feature vector is ( ) ( ) ( ) = ( x x x x x x ) f x,,,,,, (5) 65

4 ad hece w () t ( ) f x ca express ay polyomial fuctio with order two or less. x x Aother popular kerel fuctio is Gaussia kerel K ( xx, ) = exp, whose σ feature vector is essetially the Fourier trasform as show below: Suppose that the feature of x idexed by ω R is exp( iω x) where ω ad x are oe-dimesioal for simplicity. he, the ier product of exp( iω x) ad exp( iω x ) is calculated as ω K( x, x ) = exp( iωx) exp( iωx ) exp dω π ( ω i( x x )) ( x x ) = exp exp dω (6) π ( x x ) = exp where ( ω ) exp π is a measure of the space of ω. A advatage of kerel models is that we eed ot cosider the feature space for a kerel fuctio explicitly. his property makes kerel models possible to apply to the spaces of statistical models, sequeces, ad eve graphs. I fact, Mercer s theorem assures the existece of a ier product space F for ay positive semi defiite (or o-egative defiite) kerel fuctio K (, ), where a symmetric cotiuous fuctio K (, ) is said to be positive semi defiite if ad oly if it satisfies cc k mk( xk, x m) 0 (7) k= m= k k= c k k= for ay vector { x }, ay real umber { } ad ay positive iteger. he Gaussia kerel, for example, has a feature space thaks to this property although it is difficult to describe explicitly. From the practical viewpoit, however, provig a kerel fuctio beig positive semi defiite is more difficult tha costructig a feature space explicitly. Hece, popular kerel fuctios are simple oes such as the polyomial kerel or Gaussia kerel, or their combiatios. I fact, whe two fuctios K ad K are positive semi defiite, so are the followig fuctios: ( ) ( ) ( ) ( ) ck + ck for c, c 0, KK, exp K, g x K x, x g x. (8) 66

5 he polyomial kerel as well as Gaussia kerel are prove to be positive semi defiite usig (8). What kerel should we use? he aswer strogly depeds o the applicatio we cosider but a alterative approach is kerel learig, that is, automatic selectio of optimal of a c i i= kerel from data. Oe typical example is to optimize the coefficiets { } weighted sum of kerels, becomes best. 3. Represeter heorem i= ck i i, so that the performace of the learig machie I the previous sectio, we assumed that the weight vector could be expressed as a weighted sum of iputs, that is, w( x) = αkk ( x, x k). (9) k= A geeral theory to ustify the above assumptio is the Represeter heorem, which says the solutio of the optimizatio problem below has a solutio w of the form i (9): mi { }, { }, L xk yk w( xk) + ch i i( xk) +Ψ( w ), (0), k= k= w Hc R i= k = { } = where (, y ) x is a dataset, K is a kerel fuctio of x, H is the reproducig k k k kerel Hilbert space of K, { i} i h = is a set of basis fuctios of x, ad () Ψ is a mootoically icreasig fuctio (regularizatio term). Ituitively speakig, ay compoet i the complemet space H 0 of H0 = spa ({ K(, x k )} k= ) i w does ot chage the cost L but oly icreases the ormalizatio term Ψ. Oe simple example of L is mi w H ( ) y k w( x k ) + λ w H k= () ad aother example is the support vector machie as show later. haks to the kerel trick, the computatioal complexity of kerel methods does ot deped o the dimesios of the feature space but oly o the umber of examples. 67

6 Perceptro (MLP) PAC learig Perceptro Pricipal Compoet Aalysis (PCA) Represeter heorem Support Vector Machie (SVM) Support Vector Regressio (SVR) sigmoid fuctio. It ca approximate ay cotiuous fuctio with a arbitrary precisio if it has eough hidde uits. : Probably Approximately correct learig. It cosiders the coditio uder which the probability that the error of a learig algorithm exceeds a specific value is bouded by a specific value. : A liear dichotomy. It outputs the sig of the ier product betwee a weight vector ad a iput vector. : A dimesioality reductio method. It leaves the space spaed by eigevectors with large eigevalues (pricipal compoets) ad removes the compoets iduced by oise. : A theorem that assures the solutio of a optimizatio problem is described as a weighted sum of data. his is the theoretical base of kerel models. : he liear dichotomy that maximizes the margi to give examples. Sice such oe does ot exist for a liearly iseparable dataset, there are variatios to cope with this difficulty. : he liear regressio that employs the ε -isesitive error fuctio. his is robust agaist outliers due to the L-orm regularizer as well as a small umber of support vectors due to the isesitivity. Bibliography Misky M.L., Papert S.A. (969). Perceptros, MI Press, Cambridge, MA. [his is a comprehesive textbook for perceptros]. Rumelhert D.E., McClellad J.L. (986). Parallel Distributed Processig, MI Press, Cambridge, MA. [his presets a theory for multi-layer perceptros ad their applicatios]. Fukumizu K (00). Itroductio to Kerel Methods, Asakura-Shote, okyo. [his presets a theoretical backgroud of kerel methods for begiers. Note that it is writte i Japaese]. Wataabe S (00). Algebraic Aalysis for Noidetifiable Learig Machies. Neural Computatio, 3(4), [his is a epoch-makig paper itroducig algebraic geometry to machie learig theory]. Vapik V.N. (995). he Nature of Statistical Learig heory, Spriger, Heidelberg. [his is the first book itroducig support vector machies]. Valiat L.G. (984). A heory of Learable. Commuicatios of the ACM, 7(), [his paper presets a framework of the PAC learig]. Kwok J.., sag I.W. (004). he Pre-Image Problem i Kerel Methods, IEEE rasactios o Neural Networks, 5(6), [his gives a cocrete method to cope with the pre-image problem of kerel methods]. Schoelkopf B., Smola A.J., Williamso R.C., Bartlett P.L. (000). New Support Vector Algorithms, Neural Computatio,, [his paper itroduces the framework of the ν -SVMs]. Beett K.P., Bredesteier E.J. (000). Duality ad Geometry i SVM Classifiers, Proc. ICML, [his is the first paper to show aother view of the ν -SVMs, that is, the covex hulls of data]. Ikeda K. (004). A Asymptotic Statistical heory of Polyomial Kerel Methods. Neural Computatio, 6(8), [his paper solves the cotradictio of the theoretical results o SVMs]. 77

7 Ikeda K., Aoishi. (005). A Asymptotic Statistical Aalysis of Support Vector Machies with Soft Margis, Neural Computatio, 8, [his paper gives a theoretical aalysis of the effect of soft margis]. Schoelkopf B., Platt J.C., Shawe-aylor J., Smola A.J. (00). Estimatig the Support of a High-Dimesioal Distributio, Neural Computatio, 3, [his paper applies the idea of support vector macchies to a desity estimatio by formulatig it as a oe-class SVM]. Biographical Sketch Kazushi Ikeda received the B.E., M.E. ad Ph.D. degrees i Mathematical Egieerig ad Iformatio Physics from the Uiversity of okyo, okyo, Japa, i 989, 99 ad 994, respectively. He the oied the Departmet of Electrical ad Computer Egieerig, Kaazawa Uiversity, Kaazawa, Japa, as a Assistat Professor ad moved to the Departmet of Systems Sciece, Kyoto Uiversity, Kyoto, Japa, as a Associate Professor, i 998. Sice 008, he has bee a Professor i the Graduate School of Iformatio Sciece, Nara Istitute of Sciece ad echology, Nara, Japa. His research iterests iclude machie learig theory such as support vector machies ad iformatio geometry, ad applicatios to adaptive systems ad brai iformatics. He is curretly the Editor-i-Chief of Joural of Japaese Neural Network Society (JNNS), a Actio Editor of Neural Networks (Elsevier) ad a Associate Editor of IEEE rasactios o Neural Networks ad Learig Systems ad IEICE rasactios o Iformatio ad Systems. He has served as a member of Board of Goverors of JNNS. 78

Linear Classifiers III

Linear Classifiers III Uiversität Potsdam Istitut für Iformatik Lehrstuhl Maschielles Lere Liear Classifiers III Blaie Nelso, Tobias Scheffer Cotets Classificatio Problem Bayesia Classifier Decisio Liear Classifiers, MAP Models

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 11 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract We will itroduce the otio of reproducig kerels ad associated Reproducig Kerel Hilbert Spaces (RKHS). We will cosider couple

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12

Machine Learning Theory Tübingen University, WS 2016/2017 Lecture 12 Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

18.657: Mathematics of Machine Learning

18.657: Mathematics of Machine Learning 8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric

More information

MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm

More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods Support Vector Machies ad Kerel Methods Daiel Khashabi Fall 202 Last Update: September 26, 206 Itroductio I Support Vector Machies the goal is to fid a separator betwee data which has the largest margi,

More information

Chapter 7. Support Vector Machine

Chapter 7. Support Vector Machine Chapter 7 Support Vector Machie able of Cotet Margi ad support vectors SVM formulatio Slack variables ad hige loss SVM for multiple class SVM ith Kerels Relevace Vector Machie Support Vector Machie (SVM)

More information

State Space Representation

State Space Representation Optimal Cotrol, Guidace ad Estimatio Lecture 2 Overview of SS Approach ad Matrix heory Prof. Radhakat Padhi Dept. of Aerospace Egieerig Idia Istitute of Sciece - Bagalore State Space Represetatio Prof.

More information

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Brief Review of Functions of Several Variables

Brief Review of Functions of Several Variables Brief Review of Fuctios of Several Variables Differetiatio Differetiatio Recall, a fuctio f : R R is differetiable at x R if ( ) ( ) lim f x f x 0 exists df ( x) Whe this limit exists we call it or f(

More information

Machine Learning. Ilya Narsky, Caltech

Machine Learning. Ilya Narsky, Caltech Machie Learig Ilya Narsky, Caltech Lecture 4 Multi-class problems. Multi-class versios of Neural Networks, Decisio Trees, Support Vector Machies ad AdaBoost. Reductio of a multi-class problem to a set

More information

CS537. Numerical Analysis and Computing

CS537. Numerical Analysis and Computing CS57 Numerical Aalysis ad Computig Lecture Locatig Roots o Equatios Proessor Ju Zhag Departmet o Computer Sciece Uiversity o Ketucky Leigto KY 456-6 Jauary 9 9 What is the Root May physical system ca be

More information

Information Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame

Information Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for

More information

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short

More information

Machine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring

Machine Learning Regression I Hamid R. Rabiee [Slides are based on Bishop Book] Spring Machie Learig Regressio I Hamid R. Rabiee [Slides are based o Bishop Book] Sprig 015 http://ce.sharif.edu/courses/93-94//ce717-1 Liear Regressio Liear regressio: ivolves a respose variable ad a sigle predictor

More information

10/2/ , 5.9, Jacob Hays Amit Pillay James DeFelice

10/2/ , 5.9, Jacob Hays Amit Pillay James DeFelice 0//008 Liear Discrimiat Fuctios Jacob Hays Amit Pillay James DeFelice 5.8, 5.9, 5. Miimum Squared Error Previous methods oly worked o liear separable cases, by lookig at misclassified samples to correct

More information

CMSE 820: Math. Foundations of Data Sci.

CMSE 820: Math. Foundations of Data Sci. Lecture 17 8.4 Weighted path graphs Take from [10, Lecture 3] As alluded to at the ed of the previous sectio, we ow aalyze weighted path graphs. To that ed, we prove the followig: Theorem 6 (Fiedler).

More information

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.

w (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ. 2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For

More information

A survey on penalized empirical risk minimization Sara A. van de Geer

A survey on penalized empirical risk minimization Sara A. van de Geer A survey o pealized empirical risk miimizatio Sara A. va de Geer We address the questio how to choose the pealty i empirical risk miimizatio. Roughly speakig, this pealty should be a good boud for the

More information

Quantile regression with multilayer perceptrons.

Quantile regression with multilayer perceptrons. Quatile regressio with multilayer perceptros. S.-F. Dimby ad J. Rykiewicz Uiversite Paris 1 - SAMM 90 Rue de Tolbiac, 75013 Paris - Frace Abstract. We cosider oliear quatile regressio ivolvig multilayer

More information

Learning Bounds for Support Vector Machines with Learned Kernels

Learning Bounds for Support Vector Machines with Learned Kernels Learig Bouds for Support Vector Machies with Leared Kerels Nati Srebro TTI-Chicago Shai Be-David Uiversity of Waterloo Mostly based o a paper preseted at COLT 06 Kerelized Large-Margi Liear Classificatio

More information

CS321. Numerical Analysis and Computing

CS321. Numerical Analysis and Computing CS Numerical Aalysis ad Computig Lecture Locatig Roots o Equatios Proessor Ju Zhag Departmet o Computer Sciece Uiversity o Ketucky Leigto KY 456-6 September 8 5 What is the Root May physical system ca

More information

CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) CSCI567 Machie Learig (Fall 2014) Drs. Sha & Liu {feisha,yaliu.cs}@usc.edu October 9, 2014 Drs. Sha & Liu ({feisha,yaliu.cs}@usc.edu) CSCI567 Machie Learig (Fall 2014) October 9, 2014 1 / 49 Outlie Admiistratio

More information

TEACHER CERTIFICATION STUDY GUIDE

TEACHER CERTIFICATION STUDY GUIDE COMPETENCY 1. ALGEBRA SKILL 1.1 1.1a. ALGEBRAIC STRUCTURES Kow why the real ad complex umbers are each a field, ad that particular rigs are ot fields (e.g., itegers, polyomial rigs, matrix rigs) Algebra

More information

Machine Learning Brett Bernstein

Machine Learning Brett Bernstein Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We

More information

Math Solutions to homework 6

Math Solutions to homework 6 Math 175 - Solutios to homework 6 Cédric De Groote November 16, 2017 Problem 1 (8.11 i the book): Let K be a compact Hermitia operator o a Hilbert space H ad let the kerel of K be {0}. Show that there

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Math 61CM - Solutions to homework 3

Math 61CM - Solutions to homework 3 Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Discrete Orthogonal Moment Features Using Chebyshev Polynomials

Discrete Orthogonal Moment Features Using Chebyshev Polynomials Discrete Orthogoal Momet Features Usig Chebyshev Polyomials R. Mukuda, 1 S.H.Og ad P.A. Lee 3 1 Faculty of Iformatio Sciece ad Techology, Multimedia Uiversity 75450 Malacca, Malaysia. Istitute of Mathematical

More information

Chandrasekhar Type Algorithms. for the Riccati Equation of Lainiotis Filter

Chandrasekhar Type Algorithms. for the Riccati Equation of Lainiotis Filter Cotemporary Egieerig Scieces, Vol. 3, 00, o. 4, 9-00 Chadrasekhar ype Algorithms for the Riccati Equatio of Laiiotis Filter Nicholas Assimakis Departmet of Electroics echological Educatioal Istitute of

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 +

62. Power series Definition 16. (Power series) Given a sequence {c n }, the series. c n x n = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + 62. Power series Defiitio 16. (Power series) Give a sequece {c }, the series c x = c 0 + c 1 x + c 2 x 2 + c 3 x 3 + is called a power series i the variable x. The umbers c are called the coefficiets of

More information

Machine Learning Theory (CS 6783)

Machine Learning Theory (CS 6783) Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT

More information

A New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem

A New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem This is the Pre-Published Versio. A New Solutio Method for the Fiite-Horizo Discrete-Time EOQ Problem Chug-Lu Li Departmet of Logistics The Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog Phoe: +852-2766-7410

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Physics 324, Fall Dirac Notation. These notes were produced by David Kaplan for Phys. 324 in Autumn 2001.

Physics 324, Fall Dirac Notation. These notes were produced by David Kaplan for Phys. 324 in Autumn 2001. Physics 324, Fall 2002 Dirac Notatio These otes were produced by David Kapla for Phys. 324 i Autum 2001. 1 Vectors 1.1 Ier product Recall from liear algebra: we ca represet a vector V as a colum vector;

More information

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients.

Definitions and Theorems. where x are the decision variables. c, b, and a are constant coefficients. Defiitios ad Theorems Remember the scalar form of the liear programmig problem, Miimize, Subject to, f(x) = c i x i a 1i x i = b 1 a mi x i = b m x i 0 i = 1,2,, where x are the decisio variables. c, b,

More information

A Note on Matrix Rigidity

A Note on Matrix Rigidity A Note o Matrix Rigidity Joel Friedma Departmet of Computer Sciece Priceto Uiversity Priceto, NJ 08544 Jue 25, 1990 Revised October 25, 1991 Abstract I this paper we give a explicit costructio of matrices

More information

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques

More information

Machine Learning for Data Science (CS 4786)

Machine Learning for Data Science (CS 4786) Machie Learig for Data Sciece CS 4786) Lecture 9: Pricipal Compoet Aalysis The text i black outlies mai ideas to retai from the lecture. The text i blue give a deeper uderstadig of how we derive or get

More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

1.3 Convergence Theorems of Fourier Series. k k k k. N N k 1. With this in mind, we state (without proof) the convergence of Fourier series.

1.3 Convergence Theorems of Fourier Series. k k k k. N N k 1. With this in mind, we state (without proof) the convergence of Fourier series. .3 Covergece Theorems of Fourier Series I this sectio, we preset the covergece of Fourier series. A ifiite sum is, by defiitio, a limit of partial sums, that is, a cos( kx) b si( kx) lim a cos( kx) b si(

More information

On the convergence rates of Gladyshev s Hurst index estimator

On the convergence rates of Gladyshev s Hurst index estimator Noliear Aalysis: Modellig ad Cotrol, 2010, Vol 15, No 4, 445 450 O the covergece rates of Gladyshev s Hurst idex estimator K Kubilius 1, D Melichov 2 1 Istitute of Mathematics ad Iformatics, Vilius Uiversity

More information

Math 312 Lecture Notes One Dimensional Maps

Math 312 Lecture Notes One Dimensional Maps Math 312 Lecture Notes Oe Dimesioal Maps Warre Weckesser Departmet of Mathematics Colgate Uiversity 21-23 February 25 A Example We begi with the simplest model of populatio growth. Suppose, for example,

More information

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion

Topics Machine learning: lecture 2. Review: the learning problem. Hypotheses and estimation. Estimation criterion cont d. Estimation criterion .87 Machie learig: lecture Tommi S. Jaakkola MIT CSAIL tommi@csail.mit.edu Topics The learig problem hypothesis class, estimatio algorithm loss ad estimatio criterio samplig, empirical ad epected losses

More information

Practical Spectral Anaysis (continue) (from Boaz Porat s book) Frequency Measurement

Practical Spectral Anaysis (continue) (from Boaz Porat s book) Frequency Measurement Practical Spectral Aaysis (cotiue) (from Boaz Porat s book) Frequecy Measuremet Oe of the most importat applicatios of the DFT is the measuremet of frequecies of periodic sigals (eg., siusoidal sigals),

More information

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector

Dimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et

More information

Similarity Solutions to Unsteady Pseudoplastic. Flow Near a Moving Wall

Similarity Solutions to Unsteady Pseudoplastic. Flow Near a Moving Wall Iteratioal Mathematical Forum, Vol. 9, 04, o. 3, 465-475 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/0.988/imf.04.48 Similarity Solutios to Usteady Pseudoplastic Flow Near a Movig Wall W. Robi Egieerig

More information

Sieve Estimators: Consistency and Rates of Convergence

Sieve Estimators: Consistency and Rates of Convergence EECS 598: Statistical Learig Theory, Witer 2014 Topic 6 Sieve Estimators: Cosistecy ad Rates of Covergece Lecturer: Clayto Scott Scribe: Julia Katz-Samuels, Brado Oselio, Pi-Yu Che Disclaimer: These otes

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Lecture 2 Clustering Part II

Lecture 2 Clustering Part II COMS 4995: Usupervised Learig (Summer 8) May 24, 208 Lecture 2 Clusterig Part II Istructor: Nakul Verma Scribes: Jie Li, Yadi Rozov Today, we will be talkig about the hardess results for k-meas. More specifically,

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

6.867 Machine learning, lecture 7 (Jaakkola) 1

6.867 Machine learning, lecture 7 (Jaakkola) 1 6.867 Machie learig, lecture 7 (Jaakkola) 1 Lecture topics: Kerel form of liear regressio Kerels, examples, costructio, properties Liear regressio ad kerels Cosider a slightly simpler model where we omit

More information

A collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation

A collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation Iteratioal Joural of Mathematics Research. ISSN 0976-5840 Volume 9 Number 1 (017) pp. 45-51 Iteratioal Research Publicatio House http://www.irphouse.com A collocatio method for sigular itegral equatios

More information

Numerical Solution of the Two Point Boundary Value Problems By Using Wavelet Bases of Hermite Cubic Spline Wavelets

Numerical Solution of the Two Point Boundary Value Problems By Using Wavelet Bases of Hermite Cubic Spline Wavelets Australia Joural of Basic ad Applied Scieces, 5(): 98-5, ISSN 99-878 Numerical Solutio of the Two Poit Boudary Value Problems By Usig Wavelet Bases of Hermite Cubic Splie Wavelets Mehdi Yousefi, Hesam-Aldie

More information

REGRESSION WITH QUADRATIC LOSS

REGRESSION WITH QUADRATIC LOSS REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d

More information

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special

More information

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of

More information

10.6 ALTERNATING SERIES

10.6 ALTERNATING SERIES 0.6 Alteratig Series Cotemporary Calculus 0.6 ALTERNATING SERIES I the last two sectios we cosidered tests for the covergece of series whose terms were all positive. I this sectio we examie series whose

More information

Law of the sum of Bernoulli random variables

Law of the sum of Bernoulli random variables Law of the sum of Beroulli radom variables Nicolas Chevallier Uiversité de Haute Alsace, 4, rue des frères Lumière 68093 Mulhouse icolas.chevallier@uha.fr December 006 Abstract Let be the set of all possible

More information

Signal Processing in Mechatronics. Lecture 3, Convolution, Fourier Series and Fourier Transform

Signal Processing in Mechatronics. Lecture 3, Convolution, Fourier Series and Fourier Transform Sigal Processig i Mechatroics Summer semester, 1 Lecture 3, Covolutio, Fourier Series ad Fourier rasform Dr. Zhu K.P. AIS, UM 1 1. Covolutio Covolutio Descriptio of LI Systems he mai premise is that the

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

x c the remainder is Pc ().

x c the remainder is Pc (). Algebra, Polyomial ad Ratioal Fuctios Page 1 K.Paulk Notes Chapter 3, Sectio 3.1 to 3.4 Summary Sectio Theorem Notes 3.1 Zeros of a Fuctio Set the fuctio to zero ad solve for x. The fuctio is zero at these

More information

Properties of Fuzzy Length on Fuzzy Set

Properties of Fuzzy Length on Fuzzy Set Ope Access Library Joural 206, Volume 3, e3068 ISSN Olie: 2333-972 ISSN Prit: 2333-9705 Properties of Fuzzy Legth o Fuzzy Set Jehad R Kider, Jaafar Imra Mousa Departmet of Mathematics ad Computer Applicatios,

More information

A new iterative algorithm for reconstructing a signal from its dyadic wavelet transform modulus maxima

A new iterative algorithm for reconstructing a signal from its dyadic wavelet transform modulus maxima ol 46 No 6 SCIENCE IN CHINA (Series F) December 3 A ew iterative algorithm for recostructig a sigal from its dyadic wavelet trasform modulus maxima ZHANG Zhuosheg ( u ), LIU Guizhog ( q) & LIU Feg ( )

More information

Week 1, Lecture 2. Neural Network Basics. Announcements: HW 1 Due on 10/8 Data sets for HW 1 are online Project selection 10/11. Suggested reading :

Week 1, Lecture 2. Neural Network Basics. Announcements: HW 1 Due on 10/8 Data sets for HW 1 are online Project selection 10/11. Suggested reading : ME 537: Learig-Based Cotrol Week 1, Lecture 2 Neural Network Basics Aoucemets: HW 1 Due o 10/8 Data sets for HW 1 are olie Proect selectio 10/11 Suggested readig : NN survey paper (Zhag Chap 1, 2 ad Sectios

More information

Regression with quadratic loss

Regression with quadratic loss Regressio with quadratic loss Maxim Ragisky October 13, 2015 Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X,Y, where, as before,

More information

Study the bias (due to the nite dimensional approximation) and variance of the estimators

Study the bias (due to the nite dimensional approximation) and variance of the estimators 2 Series Methods 2. Geeral Approach A model has parameters (; ) where is ite-dimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite

More information

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014.

Product measures, Tonelli s and Fubini s theorems For use in MAT3400/4400, autumn 2014 Nadia S. Larsen. Version of 13 October 2014. Product measures, Toelli s ad Fubii s theorems For use i MAT3400/4400, autum 2014 Nadia S. Larse Versio of 13 October 2014. 1. Costructio of the product measure The purpose of these otes is to preset the

More information

Lecture 3. Properties of Summary Statistics: Sampling Distribution

Lecture 3. Properties of Summary Statistics: Sampling Distribution Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary

More information

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan

G. R. Pasha Department of Statistics Bahauddin Zakariya University Multan, Pakistan Deviatio of the Variaces of Classical Estimators ad Negative Iteger Momet Estimator from Miimum Variace Boud with Referece to Maxwell Distributio G. R. Pasha Departmet of Statistics Bahauddi Zakariya Uiversity

More information

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory

ACO Comprehensive Exam 9 October 2007 Student code A. 1. Graph Theory 1. Graph Theory Prove that there exist o simple plaar triagulatio T ad two distict adjacet vertices x, y V (T ) such that x ad y are the oly vertices of T of odd degree. Do ot use the Four-Color Theorem.

More information

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense,

3. Z Transform. Recall that the Fourier transform (FT) of a DT signal xn [ ] is ( ) [ ] = In order for the FT to exist in the finite magnitude sense, 3. Z Trasform Referece: Etire Chapter 3 of text. Recall that the Fourier trasform (FT) of a DT sigal x [ ] is ω ( ) [ ] X e = j jω k = xe I order for the FT to exist i the fiite magitude sese, S = x [

More information

Advanced Course of Algorithm Design and Analysis

Advanced Course of Algorithm Design and Analysis Differet complexity measures Advaced Course of Algorithm Desig ad Aalysis Asymptotic complexity Big-Oh otatio Properties of O otatio Aalysis of simple algorithms A algorithm may may have differet executio

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information

Singular value decomposition. Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaine

Singular value decomposition. Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaine Lecture 11 Sigular value decompositio Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaie V1.2 07/12/2018 1 Sigular value decompositio (SVD) at a glace Motivatio: the image of the uit sphere S

More information

Self-normalized deviation inequalities with application to t-statistic

Self-normalized deviation inequalities with application to t-statistic Self-ormalized deviatio iequalities with applicatio to t-statistic Xiequa Fa Ceter for Applied Mathematics, Tiaji Uiversity, 30007 Tiaji, Chia Abstract Let ξ i i 1 be a sequece of idepedet ad symmetric

More information

Lecture 9: Boosting. Akshay Krishnamurthy October 3, 2017

Lecture 9: Boosting. Akshay Krishnamurthy October 3, 2017 Lecture 9: Boostig Akshay Krishamurthy akshay@csumassedu October 3, 07 Recap Last week we discussed some algorithmic aspects of machie learig We saw oe very powerful family of learig algorithms, amely

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula Joural of Multivariate Aalysis 102 (2011) 1315 1319 Cotets lists available at ScieceDirect Joural of Multivariate Aalysis joural homepage: www.elsevier.com/locate/jmva Superefficiet estimatio of the margials

More information

Efficient GMM LECTURE 12 GMM II

Efficient GMM LECTURE 12 GMM II DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet

More information

Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space

More information

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32 Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260

More information

A Risk Comparison of Ordinary Least Squares vs Ridge Regression

A Risk Comparison of Ordinary Least Squares vs Ridge Regression Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer

More information

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution

Discrete-Time Systems, LTI Systems, and Discrete-Time Convolution EEL5: Discrete-Time Sigals ad Systems. Itroductio I this set of otes, we begi our mathematical treatmet of discrete-time s. As show i Figure, a discrete-time operates or trasforms some iput sequece x [

More information

PAijpam.eu ON TENSOR PRODUCT DECOMPOSITION

PAijpam.eu ON TENSOR PRODUCT DECOMPOSITION Iteratioal Joural of Pure ad Applied Mathematics Volume 103 No 3 2015, 537-545 ISSN: 1311-8080 (prited versio); ISSN: 1314-3395 (o-lie versio) url: http://wwwijpameu doi: http://dxdoiorg/1012732/ijpamv103i314

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

The Method of Least Squares. To understand least squares fitting of data.

The Method of Least Squares. To understand least squares fitting of data. The Method of Least Squares KEY WORDS Curve fittig, least square GOAL To uderstad least squares fittig of data To uderstad the least squares solutio of icosistet systems of liear equatios 1 Motivatio Curve

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information