1 Definition of Rademacher Complexity


COS 511: Theoretical Machine Learning
Lecturer: Rob Schapire. Lecture #9. Scribe: Josh Chen. March 5, 2013

We've spent the past few classes proving bounds on the generalization error of PAC-learning algorithms for the cases of consistent and inconsistent hypotheses selected from finite and infinite hypothesis spaces. In particular, last time, we proved bounds for the case of inconsistent hypotheses selected from infinite hypothesis spaces. However, recall that each time we encountered the problem of an infinite hypothesis space, we had to resort to techniques like using ghost samples or the VC-dimension of a concept class. In this lecture, we introduce a more modern and elegant approach, using a concept called Rademacher complexity. This approach turns out to include each of the bounds we've proved in the past few lectures as special cases.

1 Definition of Rademacher Complexity

1.1 Some usual definitions

Before getting into the definition of Rademacher complexity, we remind ourselves of the usual setup:

Let the sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$ where, unlike before, $y_i \in \{-1, +1\}$.

Let the hypothesis $h : X \to \{-1, +1\}$.

To measure how well $h$ fits $S$, let the training error be $\widehat{\mathrm{err}}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{h(x_i) \neq y_i\}$.

Note that, since we are using $y \in \{-1, +1\}$ instead of $y \in \{0, 1\}$ as in previous lectures (for simplicity), we can provide an alternative definition of training error:

\begin{align}
\widehat{\mathrm{err}}(h) &= \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{h(x_i) \neq y_i\} \tag{1} \\
&= \frac{1}{m} \sum_{i=1}^{m} \begin{cases} 1 & \text{if } (h(x_i), y_i) = (1, -1) \text{ or } (-1, 1) \\ 0 & \text{if } (h(x_i), y_i) = (1, 1) \text{ or } (-1, -1) \end{cases} \tag{2} \\
&= \frac{1}{m} \sum_{i=1}^{m} \frac{1 - y_i h(x_i)}{2} \tag{3} \\
&= \frac{1}{2} - \frac{1}{2m} \sum_{i=1}^{m} y_i h(x_i) \tag{4}
\end{align}

The term $\frac{1}{m} \sum_{i=1}^{m} y_i h(x_i)$ can be interpreted as the correlation of the predictions $h(x_i)$ with the labels $y_i$. We see that correlation is related to training error as correlation $= 1 - 2\,\widehat{\mathrm{err}}(h)$. To find a hypothesis $h$ that minimizes training error, we can thus equivalently seek the $h$ satisfying:

\[ \arg\max_{h \in H} \frac{1}{m} \sum_{i=1}^{m} y_i h(x_i) \tag{5} \]
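The relation between training error and correlation in (4) can be checked numerically. The following is a minimal sketch on hypothetical toy data; the arrays `y` and `h_x` and the helper names are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def training_error(preds, labels):
    """Fraction of examples where the prediction differs from the label."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    return float(np.mean(preds != labels))

def correlation(preds, labels):
    """Average of y_i * h(x_i) for labels and predictions in {-1, +1}."""
    preds = np.asarray(preds)
    labels = np.asarray(labels)
    return float(np.mean(preds * labels))

# Hypothetical toy data: five labels and the predictions of some hypothesis h.
y = np.array([+1, -1, +1, +1, -1])
h_x = np.array([+1, +1, +1, -1, -1])

err = training_error(h_x, y)   # two mistakes out of five examples
corr = correlation(h_x, y)     # (3 agreements - 2 disagreements) / 5

print(err, corr, 1 - 2 * err)
```

Maximizing `corr` over a set of candidate hypotheses is therefore the same as minimizing `err`, which is the content of (5).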
1.2 Playing with correlation

Imagine, now, an experiment where we replace a sample's true labels $y_i$ with the Rademacher random variables $\sigma_i$:

\[ \sigma_i = \begin{cases} +1 & \text{with prob. } 1/2 \\ -1 & \text{with prob. } 1/2 \end{cases} \tag{6} \]

This gives a modified expression for correlation:

\[ \arg\max_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \tag{7} \]

Instead of selecting the hypothesis in $H$ that correlates best with the labels, this now selects the hypothesis $h$ in $H$ that correlates best with the random noise variables $\sigma_i$. Since the selected $h$ depends on the random variables $\sigma_i$, however, to measure how well $H$ can correlate with random noise, we take the expectation of this correlation over the random variables $\sigma_i$ and find:

\[ E_\sigma\left[ \max_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \right] \tag{8} \]

This intuitively measures the expressiveness of $H$. We can bound this expression using two extreme cases: $|H| = 1$, where we only have one choice for a hypothesis, and $|H| = 2^m$, where $H$ shatters $S$. In the first case, our expectation equals 0 since the max term disappears; in the second case our expectation equals 1 since there always exists a hypothesis matching any set of $\sigma_i$'s. Thus our measure, as defined above, must fall between 0 and 1.

1.3 Generalizing correlation

Instead of working with hypotheses $h : X \to \{-1, +1\}$, let's generalize our class of functions to the set of all real-valued functions. Replace $H$ with $F$, which we define to be any family of functions $f : Z \to \mathbb{R}$. Now, given sample $S = (z_1, \ldots, z_m)$ with $z_i \in Z$, if we apply our expression from above to $F$, we arrive at the empirical Rademacher complexity of a family of functions $F$ with respect to a sample $S$:¹

\[ \hat{R}_S(F) := E_\sigma\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i) \right] \tag{9} \]

Again, this expression measures how well, on average, the function class $F$ correlates with random noise over the sample $S$. However, often we want to measure the correlation of $F$ with respect to a distribution $D$ over $Z$, rather than with respect to a particular sample $S$. To find this, we take the expectation of $\hat{R}_S(F)$ over all samples of size $m$ drawn according to $D$:

\[ R_m(F) := E_S\left[ \hat{R}_S(F) \right] \tag{10} \]

This is the Rademacher complexity, or for clarity, the expected Rademacher complexity, of $F$.
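For a finite class and a small sample, the empirical Rademacher complexity in (9) can be computed exactly by enumerating all $2^m$ sign vectors. The sketch below assumes the class is given as a matrix of function values on the sample; the function name `empirical_rademacher` and the toy classes are illustrative assumptions. It reproduces the two extreme cases discussed above.

```python
import itertools
import numpy as np

def empirical_rademacher(values):
    """Exact empirical Rademacher complexity of a finite function class.

    values: array of shape (num_functions, m); row j holds f_j(z_1), ..., f_j(z_m).
    Averages, over all 2^m sign vectors sigma, the best correlation
    max_j (1/m) * sum_i sigma_i * f_j(z_i).
    """
    values = np.asarray(values, dtype=float)
    m = values.shape[1]
    total = 0.0
    for sigma in itertools.product([-1.0, 1.0], repeat=m):
        total += np.max(values @ np.array(sigma)) / m
    return total / 2 ** m

m = 4
# |H| = 1: a single hypothesis that always predicts +1.
single = np.ones((1, m))
# |H| = 2^m: every sign pattern on the sample is realized, so H shatters S.
shattering = np.array(list(itertools.product([-1, 1], repeat=m)))

r_single = empirical_rademacher(single)
r_shatter = empirical_rademacher(shattering)
print(r_single, r_shatter)
```

Exact enumeration costs $2^m$ evaluations, so for larger samples one would instead average over randomly drawn sign vectors.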
We now have the definitions we need, and are finally ready to present our first generalization bounds based on Rademacher complexity.

¹ Note: Since $F$ can be the family of all real-valued functions, the max may not exist. Thus we use sup instead, which is defined as the least upper bound on the elements in a set. For example, the sup of the set $\{.9, .99, .999, \ldots\}$ is 1.
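2 Generalization bounds based on Rademacher complexity

2.1 Bounds for general function classes F

The following theorem will serve as a very general tool for proving uniform convergence bounds via the concept of Rademacher complexity:

Theorem 1. Let $F$ be a family of functions mapping from $Z$ to $[0, 1]$, and let sample $S = (z_1, \ldots, z_m)$ where $z_i \sim D$ for some distribution $D$ over $Z$. Define $E[f] := E_{z \sim D}[f(z)]$, and define $\hat{E}_S[f] := \frac{1}{m} \sum_{i=1}^{m} f(z_i)$. With probability $\geq 1 - \delta$, for all $f \in F$:²

\[ E[f] \leq \hat{E}_S[f] + 2 R_m(F) + O\left( \sqrt{\frac{\ln(1/\delta)}{m}} \right) \tag{11} \]

\[ E[f] \leq \hat{E}_S[f] + 2 \hat{R}_S(F) + O\left( \sqrt{\frac{\ln(1/\delta)}{m}} \right) \tag{12} \]

² Note that the Big-Oh terms in the two expressions have different constants.

Proof. We derive a bound for $E[f] - \hat{E}_S[f]$ for all $f \in F$, or equivalently, bound $\sup_{f \in F} (E[f] - \hat{E}_S[f])$. Note that this expression is a random variable that depends on $S$. So we want to bound the following random variable:

\[ \Phi(S) = \sup_{f \in F} \left( E[f] - \hat{E}_S[f] \right) \tag{13} \]

Step 1: We show that, with probability $\geq 1 - \delta$, $\Phi(S) \leq E_S[\Phi(S)] + \sqrt{\frac{\ln(1/\delta)}{2m}}$.

This step allows us to go from working with $\Phi(S)$ to working with $E_S[\Phi(S)]$. Recall that McDiarmid's inequality states that, for independent random variables $X_1, \ldots, X_m$, if for every $i$:

\[ \left| f(x_1, \ldots, x_i, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m) \right| \leq c_i \tag{14} \]

then:

\[ \Pr\left[ f(X_1, \ldots, X_m) \geq E[f(X_1, \ldots, X_m)] + \epsilon \right] \leq \exp\left( \frac{-2 \epsilon^2}{\sum_{i=1}^{m} c_i^2} \right) \tag{15} \]

From the definition of $\Phi(S)$, we have:

\begin{align}
\Phi(S) &= \sup_{f \in F} \left( E[f] - \hat{E}_S[f] \right) \tag{16} \\
&= \sup_{f \in F} \left( E[f] - \frac{1}{m} \sum_{i=1}^{m} f(z_i) \right) \tag{17}
\end{align}

Since $f(z_i) \in [0, 1]$ for all $z_i$, changing any one example $z_i$ to $z_i'$ in the training set $S$ will change $\frac{1}{m} \sum_{i} f(z_i)$ by at most $\frac{1}{m}$. Thus changing any one example affects $\Phi(S)$ by at most this amount, implying that $|\Phi((z_1, \ldots, z_i, \ldots, z_m)) - \Phi((z_1, \ldots, z_i', \ldots, z_m))| \leq \frac{1}{m}$. This fits the condition of McDiarmid's inequality (see (14)) with $c_i = \frac{1}{m}$, so we can apply McDiarmid's inequality and arrive at the bound shown.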
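The last sentence of Step 1 compresses a short calculation. As a sketch of the omitted arithmetic, substituting $c_i = 1/m$ into McDiarmid's inequality and setting the failure probability equal to $\delta$ gives the deviation term:

```latex
\begin{align*}
\sum_{i=1}^{m} c_i^2 &= m \cdot \frac{1}{m^2} = \frac{1}{m}, \\
\Pr\big[\Phi(S) \ge E_S[\Phi(S)] + \epsilon\big] &\le \exp\!\left(-2\epsilon^2 m\right), \\
\exp\!\left(-2\epsilon^2 m\right) = \delta
\;&\Longleftrightarrow\;
\epsilon = \sqrt{\frac{\ln(1/\delta)}{2m}}.
\end{align*}
```

So with probability at least $1 - \delta$ the event $\Phi(S) \ge E_S[\Phi(S)] + \epsilon$ does not occur, which is the claim of Step 1.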
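Step 2: Define a ghost sample $S' = (z_1', \ldots, z_m')$, $z_i' \sim D$. We show that $E_S[\Phi(S)] \leq E_{S,S'}\left[ \sup_{f \in F} (\hat{E}_{S'}[f] - \hat{E}_S[f]) \right]$:

\begin{align}
E_S[\Phi(S)] &= E_S\left[ \sup_{f \in F} \left( E[f] - \hat{E}_S[f] \right) \right] \tag{18} \\
&= E_S\left[ \sup_{f \in F} \left( E_{S'}\big[\hat{E}_{S'}[f]\big] - \hat{E}_S[f] \right) \right] \tag{19} \\
&= E_S\left[ \sup_{f \in F} \left( E_{S'}\big[\hat{E}_{S'}[f] - \hat{E}_S[f]\big] \right) \right] \tag{20} \\
&\leq E_{S,S'}\left[ \sup_{f \in F} \left( \hat{E}_{S'}[f] - \hat{E}_S[f] \right) \right] \tag{21}
\end{align}

Note that we arrive at (19) since the true expectation $E[f]$ is equal to the expectation, over all ghost samples $S'$, of the empirical average of $f$ on $S'$, that is, $E_{S'}[\hat{E}_{S'}[f]]$. We arrive at (20) since $\hat{E}_S[f]$ does not depend on $S'$. We also arrive at (21) by moving the expectation over $S'$ in (20) outside of the sup; this can be done since the expectation of a sup over a family of functions is at least the sup of the expectations over that family.

Step 3: We show $E_{S,S'}\left[ \sup_{f \in F} (\hat{E}_{S'}[f] - \hat{E}_S[f]) \right] = E_{S,S',\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i (f(z_i') - f(z_i)) \right]$.

We use the ghost sampling technique for this step. In particular, for each pair of elements $z_i, z_i'$ in $S, S'$ respectively, swap the two with probability 1/2. Let the resulting two sets of examples be $T, T'$. Since $S, S'$ each initially represented i.i.d. samples from $D$, the pair $T, T'$ has the same distribution as the pair $S, S'$. This implies (with $\sim$ denoting equality in distribution):

\begin{align}
\hat{E}_{S'}[f] - \hat{E}_S[f] &\sim \hat{E}_{T'}[f] - \hat{E}_T[f] \tag{22} \\
&= \frac{1}{m} \sum_{i=1}^{m} \begin{cases} f(z_i') - f(z_i) & \text{with prob. } 1/2 \\ f(z_i) - f(z_i') & \text{with prob. } 1/2 \end{cases} \tag{23} \\
&= \frac{1}{m} \sum_{i=1}^{m} \sigma_i \left( f(z_i') - f(z_i) \right) \tag{24}
\end{align}

Thus the expressions $\sup_{f \in F} (\hat{E}_{S'}[f] - \hat{E}_S[f])$ and $\sup_{f \in F} \frac{1}{m} \sum_{i} \sigma_i (f(z_i') - f(z_i))$ are equally distributed. The latter depends on an additional set of random variables $\sigma$, however, so we must take the expectation of the latter over $\sigma$ as well as $S, S'$. Taking the expectation of the former over $S, S'$ as well, we arrive at the expression shown.

Step 4: We show $E_{S,S',\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i (f(z_i') - f(z_i)) \right] \leq 2 R_m(F)$.

\begin{align}
E_{S,S',\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i \left( f(z_i') - f(z_i) \right) \right]
&\leq E_{S,S',\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i') + \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} (-\sigma_i) f(z_i) \right] \tag{25} \\
&= E_{S',\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i') \right] + E_{S,\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} (-\sigma_i) f(z_i) \right] \tag{26} \\
&= R_m(F) + R_m(F) \tag{27}
\end{align}

where we arrive at (27) because $-\sigma_i$ has the same distribution as $\sigma_i$.

Conclusion: Combining all the pieces together, we finally have that, with probability $\geq 1 - \delta$, for all $f \in F$:

\[ E[f] - \hat{E}_S[f] \leq 2 R_m(F) + \sqrt{\frac{\ln(1/\delta)}{2m}} \tag{28} \]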
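As a compact summary of how Steps 1 through 4 fit together (a restatement of the argument above, not a new result), the chain of inequalities behind (28) can be sketched as follows, where $\epsilon = \sqrt{\ln(1/\delta)/(2m)}$ is shorthand introduced here for the deviation term:

```latex
\begin{align*}
\Phi(S)
&\le E_S[\Phi(S)] + \epsilon
  && \text{(Step 1, with probability} \ge 1-\delta\text{)} \\
&\le E_{S,S'}\Big[\sup_{f \in F}\big(\hat{E}_{S'}[f] - \hat{E}_S[f]\big)\Big] + \epsilon
  && \text{(Step 2)} \\
&= E_{S,S',\sigma}\Big[\sup_{f \in F}\frac{1}{m}\sum_{i=1}^{m}\sigma_i\big(f(z_i') - f(z_i)\big)\Big] + \epsilon
  && \text{(Step 3)} \\
&\le 2R_m(F) + \epsilon
  && \text{(Step 4)}
\end{align*}
```

Only Step 1 is probabilistic; Steps 2 through 4 are statements about expectations and hold deterministically. Since $E[f] - \hat{E}_S[f] \le \Phi(S)$ for every $f \in F$, the bound holds for all $f$ simultaneously.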
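To derive the bound involving $\hat{R}_S(F)$, we use McDiarmid's inequality again. Recall the definition $\hat{R}_S(F) := E_\sigma\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i) \right]$. Since $f \in [0, 1]$, changing one element in $S$ changes $\hat{R}_S(F)$ by at most $\frac{1}{m}$. We can therefore apply McDiarmid's inequality again (to $-\hat{R}_S(F)$, whose expectation is $-R_m(F)$), finding, with probability $\geq 1 - \delta$:

\[ R_m(F) \leq \hat{R}_S(F) + \sqrt{\frac{\ln(1/\delta)}{2m}} \tag{29} \]

Using $\delta' = \delta/2$ in each of (28) and (29) and applying the union bound, we have our result. With probability $\geq 1 - \delta$, for all $f \in F$:

\[ E[f] \leq \hat{E}_S[f] + 2 \hat{R}_S(F) + O\left( \sqrt{\frac{\ln(1/\delta)}{m}} \right) \tag{30} \]

2.2 Bounds for hypothesis spaces H

To get from this generalization bound on classes of real-valued functions to classes of hypotheses, define the following:

\begin{align}
Z &= X \times \{-1, +1\} \tag{31} \\
f_h(x, y) &= \mathbf{1}\{h(x) \neq y\} \tag{32} \\
F_H &= \{ f_h : h \in H \} \tag{33}
\end{align}

Note that, due to (33), each $f_h \in F_H$ corresponds to some $h \in H$. Also note that, by these definitions, we have:

\begin{align}
\mathrm{err}(h) &= E_{(x,y) \sim D}\left[ \mathbf{1}\{h(x) \neq y\} \right] = E[f_h] \tag{34} \\
\widehat{\mathrm{err}}(h) &= \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{h(x_i) \neq y_i\} = \hat{E}_S[f_h] \tag{35}
\end{align}

Evidently we can use our bound from Theorem 1 to bound $\mathrm{err}(h) - \widehat{\mathrm{err}}(h)$. To state the result in terms of $H$, we relate $\hat{R}_S(F_H)$ to $\hat{R}_S(H)$:

\begin{align}
\hat{R}_S(F_H) &= E_\sigma\left[ \sup_{f_h \in F_H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f_h(x_i, y_i) \right] \tag{36} \\
&= E_\sigma\left[ \sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i \left( \frac{1 - y_i h(x_i)}{2} \right) \right] \tag{37} \\
&= E_\sigma\left[ \frac{1}{2m} \sum_{i=1}^{m} \sigma_i + \sup_{h \in H} \frac{1}{2m} \sum_{i=1}^{m} (-y_i \sigma_i) h(x_i) \right] \tag{38} \\
&= \frac{1}{2} E_\sigma\left[ \sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} (-y_i \sigma_i) h(x_i) \right] \tag{39} \\
&= \frac{1}{2} E_\sigma\left[ \sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \right] \tag{40} \\
&= \frac{1}{2} \hat{R}_S(H) \tag{41}
\end{align}

Note that we arrive at (39) since $E_\sigma[\sigma_i] = 0$, and at (40) since $(-y_i \sigma_i)$ has the same distribution as $\sigma_i$. Now, combining (30), (34), (35), and (41), we have, with probability $\geq 1 - \delta$, for all $h \in H$:

\[ \mathrm{err}(h) \leq \widehat{\mathrm{err}}(h) + \hat{R}_S(H) + O\left( \sqrt{\frac{\ln(1/\delta)}{m}} \right) \tag{42} \]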
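The identity (41) can be verified numerically on a small example by computing both sides exactly. This is a minimal sketch: the labels `y`, the prediction matrix `H_preds`, and the helper `empirical_rademacher` are hypothetical illustrations, and the expectation over $\sigma$ is taken by enumerating all $2^m$ sign vectors.

```python
import itertools
import numpy as np

def empirical_rademacher(values):
    """Exact empirical Rademacher complexity of a finite function class.

    values: array of shape (num_functions, m) holding each function's
    values on the sample. Enumerates all 2^m sign vectors sigma.
    """
    values = np.asarray(values, dtype=float)
    m = values.shape[1]
    total = 0.0
    for sigma in itertools.product([-1.0, 1.0], repeat=m):
        total += np.max(values @ np.array(sigma)) / m
    return total / 2 ** m

# Hypothetical toy setup: m = 4 labelled points and three hypotheses,
# each row giving h(x_1), ..., h(x_4) in {-1, +1}.
y = np.array([+1, -1, -1, +1])
H_preds = np.array([
    [+1, +1, -1, -1],
    [+1, -1, +1, +1],
    [-1, -1, -1, +1],
])

# Loss class F_H: f_h(x_i, y_i) = 1{h(x_i) != y_i}, one row per hypothesis.
loss = (H_preds != y).astype(float)

r_loss = empirical_rademacher(loss)     # left-hand side of (41)
r_hyp = empirical_rademacher(H_preds)   # \hat{R}_S(H)
print(r_loss, 0.5 * r_hyp)
```

The two printed values agree because the map $\sigma \mapsto -y \sigma$ is a bijection on sign vectors, which is the exact-enumeration counterpart of the distributional argument used for (40).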
More informationSome basic inequalities. Definition. Let V be a vector space over the complex numbers. An inner product is given by a function, V V C
Some basc nequaltes Defnton. Let V be a vector space over the complex numbers. An nner product s gven by a functon, V V C (x, y) x, y satsfyng the followng propertes (for all x V, y V and c C) (1) x +
More informationLecture 2: Prelude to the big shrink
Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120astyle regresson
More informationOn the number of regions in an mdimensional space cut by n hyperplanes
6 On the nuber of regons n an densonal space cut by n hyperplanes Chungwu Ho and Seth Zeran Abstract In ths note we provde a unfor approach for the nuber of bounded regons cut by n hyperplanes n general
More informationMore metrics on cartesian products
More metrcs on cartesan products If (X, d ) are metrc spaces for 1 n, then n Secton II4 of the lecture notes we defned three metrcs on X whose underlyng topologes are the product topology The purpose of
More informationStatistics II Final Exam 26/6/18
Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the
More informationModule 9. Lecture 6. Duality in Assignment Problems
Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept
More informationAppendix B. Criterion of RiemannStieltjes Integrability
Appendx B. Crteron of RemannSteltes Integrablty Ths note s complementary to [R, Ch. 6] and [T, Sec. 3.5]. The man result of ths note s Theorem B.3, whch provdes the necessary and suffcent condtons for
More informationLECTURE 89: THE BAKERCAMPBELLHAUSDORFF FORMULA
LECTURE 89: THE BAKERCAMPBELLHAUSDORFF FORMULA As we have seen, 1. Taylor s expanson on Le group, Y ] a(y ). So f G s an abelan group, then c(g) : G G s the entty ap for all g G. As a consequence, a()
More information