# 1 Definition of Rademacher Complexity


COS 511: Theoretical Machine Learning
Lecturer: Rob Schapire, Lecture #9, Scribe: Josh Chen, March 5, 2013

We've spent the past few classes proving bounds on the generalization error of PAC-learning algorithms for the cases of consistent and inconsistent hypotheses selected from finite and infinite hypothesis spaces. In particular, last time, we proved bounds for the case of inconsistent hypotheses selected from infinite hypothesis spaces. However, recall that each time we encountered the problem of an infinite hypothesis space, we had to resort to techniques like using ghost samples or the VC-dimension of a concept class. In this lecture, we introduce a more modern and elegant approach, using a concept called Rademacher complexity. This approach turns out to include each of the bounds we've proved in the past few lectures as special cases.

## 1 Definition of Rademacher Complexity

### 1.1 Some usual definitions

Before getting into the definition of Rademacher complexity, we remind ourselves of the usual setup:

- Let the sample $S = ((x_1, y_1), \ldots, (x_m, y_m))$ where, unlike before, $y_i \in \{-1, +1\}$.
- Let the hypothesis $h : X \to \{-1, +1\}$.
- To measure how well $h$ fits $S$, let the training error be $\widehat{\operatorname{err}}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{h(x_i) \neq y_i\}$.

Note that, since we are using $y \in \{-1, +1\}$ instead of $y \in \{0, 1\}$ as in previous lectures (for simplicity), we can provide an alternative form of the training error:

$$\widehat{\operatorname{err}}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{h(x_i) \neq y_i\} \tag{1}$$

$$= \frac{1}{m} \sum_{i=1}^{m} \begin{cases} 1 & \text{if } (h(x_i), y_i) = (1, -1) \text{ or } (-1, 1) \\ 0 & \text{if } (h(x_i), y_i) = (1, 1) \text{ or } (-1, -1) \end{cases} \tag{2}$$

$$= \frac{1}{m} \sum_{i=1}^{m} \frac{1 - y_i h(x_i)}{2} \tag{3}$$

$$= \frac{1}{2} - \frac{1}{2m} \sum_{i=1}^{m} y_i h(x_i) \tag{4}$$

The term $\frac{1}{m} \sum_{i=1}^{m} y_i h(x_i)$ can be interpreted as the correlation of the predictions $h(x_i)$ with the labels $y_i$. We see that correlation is related to training error by $\text{correlation} = 1 - 2\,\widehat{\operatorname{err}}(h)$. To find a hypothesis $h$ that minimizes training error, we can thus equivalently seek the $h$ satisfying

$$\arg\max_{h \in H} \frac{1}{m} \sum_{i=1}^{m} y_i h(x_i). \tag{5}$$
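The identity $\text{correlation} = 1 - 2\,\widehat{\operatorname{err}}(h)$ from (4) is easy to check numerically. A minimal sketch (the labels and predictions below are synthetic, chosen only for illustration):

```python
import numpy as np

# Illustrative check of correlation = 1 - 2 * err_hat(h) for labels in {-1,+1}.
# The data and "hypothesis predictions" are made up for demonstration.
rng = np.random.default_rng(0)
m = 1000
y = rng.choice([-1, 1], size=m)                # true labels
preds = np.where(rng.random(m) < 0.8, y, -y)   # h(x_i): agrees with y w.p. 0.8

train_err = np.mean(preds != y)                # fraction of mistakes
correlation = np.mean(y * preds)               # (1/m) sum_i y_i h(x_i)

print(correlation, 1 - 2 * train_err)          # the two quantities coincide
```

Since $y_i h(x_i)$ is $+1$ on an agreement and $-1$ on a mistake, the equality holds exactly, not just in expectation.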

### 1.2 Playing with correlation

Imagine, now, an experiment where we replace a sample's true labels $y_i$ with the Rademacher random variables

$$\sigma_i = \begin{cases} +1 & \text{with prob. } 1/2 \\ -1 & \text{with prob. } 1/2 \end{cases} \tag{6}$$

This gives a modified expression for correlation:

$$\arg\max_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i) \tag{7}$$

Instead of selecting the hypothesis in $H$ that correlates best with the labels, this now selects the hypothesis $h \in H$ that correlates best with the random noise variables $\sigma_i$. Since the chosen $h$ depends on the random variables $\sigma_i$, to measure how well $H$ can correlate with random noise we take the expectation of this correlation over the $\sigma_i$:

$$E_\sigma\left[\max_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i)\right] \tag{8}$$

This intuitively measures the expressiveness of $H$. We can bound this expression using two extreme cases: $|H| = 1$, where we have only one choice of hypothesis, and $|H| = 2^m$, where $H$ shatters $S$. In the first case, the expectation equals 0, since the max disappears and each term satisfies $E_\sigma[\sigma_i h(x_i)] = 0$; in the second case, the expectation equals 1, since there always exists a hypothesis matching any realization of the $\sigma_i$'s. Thus our measure, as defined above, must fall between 0 and 1.

### 1.3 Generalizing correlation

Instead of working with hypotheses $h : X \to \{-1, +1\}$, let's generalize our class of functions to real-valued functions. Replace $H$ with $\mathcal{F}$, which we define to be any family of functions $f : Z \to \mathbb{R}$. Now, given a sample $S = (z_1, \ldots, z_m)$ with $z_i \in Z$, if we apply our expression from above to $\mathcal{F}$, we arrive at the *empirical Rademacher complexity* of a family of functions $\mathcal{F}$ with respect to a sample $S$:

$$\hat{R}_S(\mathcal{F}) := E_\sigma\left[\sup_{f \in \mathcal{F}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i)\right] \tag{9}$$

Again, this expression measures how well, on average, the function class $\mathcal{F}$ correlates with random noise over the sample $S$. Often, however, we want to measure the correlation of $\mathcal{F}$ with respect to a distribution $D$ over $Z$, rather than with respect to a particular sample $S$. To find this, we take the expectation of $\hat{R}_S(\mathcal{F})$ over all samples of size $m$ drawn according to $D$:

$$R_m(\mathcal{F}) := E_{S \sim D^m}[\hat{R}_S(\mathcal{F})] \tag{10}$$

This is the *Rademacher complexity*, or for clarity, the *expected Rademacher complexity*, of $\mathcal{F}$.
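The two extreme cases above can be checked directly. The sketch below (a hypothetical toy setup) estimates the expectation in (8)/(9) by Monte Carlo for a finite class $H$, represented as a matrix whose rows are prediction vectors: a singleton class gives a value near 0, while a class that shatters the sample gives exactly 1.

```python
import numpy as np

# Monte Carlo estimate of the empirical Rademacher complexity (9) for a
# finite class H, given as a matrix whose rows are (h(x_1),...,h(x_m))
# in {-1,+1}^m. Toy setup for illustration only.
def empirical_rademacher(H, n_trials=20000, seed=0):
    rng = np.random.default_rng(seed)
    m = H.shape[1]
    sigma = rng.choice([-1, 1], size=(n_trials, m))   # Rademacher draws
    # For each sigma, the best correlation any h in H achieves:
    best = (sigma @ H.T / m).max(axis=1)
    return best.mean()

m = 6
# Extreme case 1: |H| = 1, so the complexity is close to 0.
H_single = np.ones((1, m))
# Extreme case 2: H shatters S (all 2^m sign patterns), so for every sigma
# some h matches it exactly and the complexity is exactly 1.
H_full = np.array([[1 if (k >> i) & 1 else -1 for i in range(m)]
                   for k in range(2 ** m)])

print(empirical_rademacher(H_single))   # near 0
print(empirical_rademacher(H_full))     # 1.0
```

The shattering case needs no averaging at all: the max inside the expectation is 1 for every draw of $\sigma$.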
We now have the definitions we need, and are finally ready to present our first generalization bounds based on Rademacher complexity.

(Note: since $\mathcal{F}$ can be the family of all real-valued functions, the max may not exist. Thus we use sup instead, which is defined as the least upper bound on the elements of a set. For example, the sup of the set $\{.9, .99, .999, \ldots\}$ is 1.)

## 2 Generalization bounds based on Rademacher complexity

### 2.1 Bounds for general function classes $\mathcal{F}$

The following theorem will serve as a very general tool for proving uniform convergence bounds via the concept of Rademacher complexity:

**Theorem 1.** Let $\mathcal{F}$ be a family of functions mapping from $Z$ to $[0, 1]$, and let the sample $S = (z_1, \ldots, z_m)$ where $z_i \sim D$ for some distribution $D$ over $Z$. Define $E[f] := E_{z \sim D}[f(z)]$, and define $\hat{E}_S[f] := \frac{1}{m} \sum_{i=1}^{m} f(z_i)$. Then with probability at least $1 - \delta$, for all $f \in \mathcal{F}$:

$$E[f] \le \hat{E}_S[f] + 2 R_m(\mathcal{F}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{11}$$

$$E[f] \le \hat{E}_S[f] + 2 \hat{R}_S(\mathcal{F}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{12}$$

(Note that the big-Oh terms in the two expressions have different constants.)

**Proof.** We derive a bound on $E[f] - \hat{E}_S[f]$ holding for all $f \in \mathcal{F}$, or equivalently, we bound $\sup_{f \in \mathcal{F}} (E[f] - \hat{E}_S[f])$. Note that this expression is a random variable that depends on $S$. So we want to bound the following random variable:

$$\Phi(S) = \sup_{f \in \mathcal{F}} (E[f] - \hat{E}_S[f]) \tag{13}$$

**Step 1:** We show that, with probability at least $1 - \delta$, $\Phi(S) \le E_S[\Phi(S)] + \sqrt{\frac{\ln(1/\delta)}{2m}}$. This step allows us to go from working with $\Phi(S)$ to working with $E_S[\Phi(S)]$.

Recall that McDiarmid's inequality states that if, for all $i$ and all $x_1, \ldots, x_m, x_i'$,

$$|f(x_1, \ldots, x_i, \ldots, x_m) - f(x_1, \ldots, x_i', \ldots, x_m)| \le c_i \tag{14}$$

then

$$\Pr\left[f(X_1, \ldots, X_m) \ge E[f(X_1, \ldots, X_m)] + \epsilon\right] \le \exp\!\left(-2\epsilon^2 \Big/ \sum_{i=1}^{m} c_i^2\right) \tag{15}$$

From the definition of $\Phi(S)$, we have:

$$\Phi(S) = \sup_{f \in \mathcal{F}} (E[f] - \hat{E}_S[f]) \tag{16}$$

$$= \sup_{f \in \mathcal{F}} \left(E[f] - \frac{1}{m} \sum_{i=1}^{m} f(z_i)\right) \tag{17}$$

Since $f(z) \in [0, 1]$ for all $z$, changing any one example $z_i$ to $z_i'$ in the training set $S$ changes $\frac{1}{m} \sum_{i=1}^{m} f(z_i)$ by at most $\frac{1}{m}$. Thus changing any one example affects $\Phi(S)$ by at most this amount, implying $|\Phi((z_1, \ldots, z_i, \ldots, z_m)) - \Phi((z_1, \ldots, z_i', \ldots, z_m))| \le \frac{1}{m}$. This fits the condition of McDiarmid's inequality (see (14)) with $c_i = \frac{1}{m}$, so we can apply McDiarmid's inequality with $\epsilon = \sqrt{\ln(1/\delta)/(2m)}$, which makes the right-hand side of (15) equal to $\delta$, and arrive at the bound shown.
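As a sanity check on Step 1's use of McDiarmid's inequality with $c_i = 1/m$: the empirical mean of $m$ independent $[0,1]$-valued variables should exceed its expectation by $\epsilon = \sqrt{\ln(1/\delta)/(2m)}$ in at most a $\delta$ fraction of trials. A simulation sketch (uniform variables and parameters chosen arbitrarily for illustration):

```python
import numpy as np

# Sketch, not a proof: for the empirical mean of m independent [0,1]-valued
# variables, McDiarmid with c_i = 1/m gives
#     Pr[mean - E >= eps] <= exp(-2 eps^2 m),
# and setting the right side to delta yields eps = sqrt(ln(1/delta)/(2m)).
rng = np.random.default_rng(0)
m, delta = 200, 0.05
eps = np.sqrt(np.log(1 / delta) / (2 * m))

trials = 20000
samples = rng.random((trials, m))               # uniform on [0,1], so E = 0.5
violations = np.mean(samples.mean(axis=1) - 0.5 >= eps)

print(violations, "<=", delta)   # observed violation rate vs. the bound
```

The observed rate is typically far below $\delta$; McDiarmid makes no use of the variables' small variance, so the bound is loose here.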

**Step 2:** Define a ghost sample $S' = (z_1', \ldots, z_m')$ with $z_i' \sim D$. We show that $E_S[\Phi(S)] \le E_{S, S'}\left[\sup_{f \in \mathcal{F}} (\hat{E}_{S'}[f] - \hat{E}_S[f])\right]$:

$$E_S[\Phi(S)] = E_S\left[\sup_{f \in \mathcal{F}} (E[f] - \hat{E}_S[f])\right] \tag{18}$$

$$= E_S\left[\sup_{f \in \mathcal{F}} \left(E_{S'}[\hat{E}_{S'}[f]] - \hat{E}_S[f]\right)\right] \tag{19}$$

$$= E_S\left[\sup_{f \in \mathcal{F}} E_{S'}\left[\hat{E}_{S'}[f] - \hat{E}_S[f]\right]\right] \tag{20}$$

$$\le E_{S, S'}\left[\sup_{f \in \mathcal{F}} \left(\hat{E}_{S'}[f] - \hat{E}_S[f]\right)\right] \tag{21}$$

Note that we arrive at (19) since the true expectation $E[f]$ is equal to the expectation, over all samples $S'$, of the empirical average over those samples, i.e. $E[f] = E_{S'}[\hat{E}_{S'}[f]]$. We arrive at (21) by moving the expectation over $S'$ in (20) outside of the sup; this can be done since the expectation of a sup of some function is at least the sup of the expectation of that function.

**Step 3:** We show that $E_{S, S'}\left[\sup_{f} (\hat{E}_{S'}[f] - \hat{E}_S[f])\right] = E_{S, S', \sigma}\left[\sup_{f} \frac{1}{m} \sum_{i=1}^{m} \sigma_i (f(z_i') - f(z_i))\right]$.

We use the ghost-sampling technique for this step. In particular, for each pair of elements $z_i, z_i'$ in $S, S'$ respectively, swap the two with probability $1/2$. Let the resulting two samples be $T, T'$. Since $S, S'$ each initially consist of i.i.d. samples from $D$, the pair $(T, T')$ has the same distribution as $(S, S')$. This implies:

$$\hat{E}_{S'}[f] - \hat{E}_S[f] \;\sim\; \hat{E}_{T'}[f] - \hat{E}_T[f] \tag{22}$$

$$= \frac{1}{m} \sum_{i=1}^{m} \begin{cases} f(z_i') - f(z_i) & \text{with prob. } 1/2 \\ f(z_i) - f(z_i') & \text{with prob. } 1/2 \end{cases} \tag{23}$$

$$= \frac{1}{m} \sum_{i=1}^{m} \sigma_i (f(z_i') - f(z_i)) \tag{24}$$

Thus the expressions $\sup_f (\hat{E}_{S'}[f] - \hat{E}_S[f])$ and $\sup_f \frac{1}{m} \sum_i \sigma_i (f(z_i') - f(z_i))$ are identically distributed. The latter depends on an additional set of random variables $\sigma$, however, so we must take the expectation of the latter over $\sigma$ as well as over $S, S'$. Taking the expectation of the former over $S, S'$ as well, we arrive at the expression shown.

**Step 4:** We show that $E_{S, S', \sigma}\left[\sup_f \frac{1}{m} \sum_i \sigma_i (f(z_i') - f(z_i))\right] \le 2 R_m(\mathcal{F})$:

$$E_{S, S', \sigma}\left[\sup_f \frac{1}{m} \sum_{i=1}^{m} \sigma_i (f(z_i') - f(z_i))\right] \le E_{S, S', \sigma}\left[\sup_f \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i') + \sup_f \frac{1}{m} \sum_{i=1}^{m} (-\sigma_i) f(z_i)\right] \tag{25}$$

$$= E_{S', \sigma}\left[\sup_f \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i')\right] + E_{S, \sigma}\left[\sup_f \frac{1}{m} \sum_{i=1}^{m} (-\sigma_i) f(z_i)\right] \tag{26}$$

$$= R_m(\mathcal{F}) + R_m(\mathcal{F}) = 2 R_m(\mathcal{F}) \tag{27}$$

where we arrive at (27) because $-\sigma_i$ has the same distribution as $\sigma_i$.

**Conclusion:** Combining all the pieces together, we finally have that, with probability at least $1 - \delta$, for all $f \in \mathcal{F}$:

$$E[f] \le \hat{E}_S[f] + 2 R_m(\mathcal{F}) + \sqrt{\frac{\ln(1/\delta)}{2m}} \tag{28}$$
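Steps 2 through 4 involve only expectations, so on a small enough example they can be verified exactly by enumeration. The sketch below (a hypothetical two-function class on $Z = \{0, 1, 2\}$ with $D$ uniform and $m = 2$) checks that the two sides of the Step 3 identity agree and that the Step 4 bound holds:

```python
import itertools
import numpy as np

# Exact check of Steps 3 and 4 on a tiny example: Z = {0,1,2}, D uniform,
# m = 2, and a small hypothetical class F given as a table of values in [0,1].
# All samples S, ghost samples S', and sign vectors sigma are enumerated,
# so every expectation below is computed exactly, not estimated.
F = np.array([[0.0, 1.0, 0.5],    # f_1(z) for z = 0, 1, 2
              [1.0, 0.2, 0.9]])   # f_2(z)
Z, m = [0, 1, 2], 2

samples = list(itertools.product(Z, repeat=m))
sigmas = list(itertools.product([-1, 1], repeat=m))

def emp(f, S):                    # empirical average E_hat_S[f]
    return np.mean([f[z] for z in S])

# Left side of Step 3: E_{S,S'}[ sup_f (E_hat_S'[f] - E_hat_S[f]) ]
lhs = np.mean([max(emp(f, S2) - emp(f, S1) for f in F)
               for S1 in samples for S2 in samples])

# Right side of Step 3: E_{S,S',sigma}[ sup_f (1/m) sum_i s_i (f(z'_i)-f(z_i)) ]
rhs = np.mean([max(np.mean([s[i] * (f[S2[i]] - f[S1[i]]) for i in range(m)])
                   for f in F)
               for S1 in samples for S2 in samples for s in sigmas])

# Step 4 bound: rhs <= 2 * R_m(F), with R_m(F) also computed exactly.
R_m = np.mean([max(np.mean([s[i] * f[S[i]] for i in range(m)]) for f in F)
               for S in samples for s in sigmas])

print(lhs, rhs, 2 * R_m)   # lhs and rhs agree (up to rounding); both <= 2*R_m
```

For each fixed $\sigma$, swapping the coordinates where $\sigma_i = -1$ is a bijection on pairs $(S, S')$, which is why the enumeration reproduces the Step 3 equality exactly.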

To derive the bound involving $\hat{R}_S(\mathcal{F})$, we use McDiarmid's inequality again. Recall the definition $\hat{R}_S(\mathcal{F}) := E_\sigma\left[\sup_f \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i)\right]$. Since $f \in [0, 1]$, changing one element of $S$ changes $\hat{R}_S(\mathcal{F})$ by at most $\frac{1}{m}$. We can apply McDiarmid's inequality again, finding that, with probability at least $1 - \delta$,

$$R_m(\mathcal{F}) \le \hat{R}_S(\mathcal{F}) + \sqrt{\frac{\ln(1/\delta)}{2m}} \tag{29}$$

Using $\delta' = \delta/2$ in place of $\delta$ and applying the union bound to (28) and (29), we have our result. With probability at least $1 - \delta$, for all $f \in \mathcal{F}$:

$$E[f] \le \hat{E}_S[f] + 2 \hat{R}_S(\mathcal{F}) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{30}$$

### 2.2 Bounds for hypothesis spaces $H$

To get from this generalization bound for classes of real-valued functions to one for classes of hypotheses, define the following:

$$Z = X \times \{-1, +1\} \tag{31}$$

$$f_h(x, y) = \mathbf{1}\{h(x) \neq y\} \tag{32}$$

$$\mathcal{F}_H = \{f_h : h \in H\} \tag{33}$$

Note that, due to (33), each $f_h \in \mathcal{F}_H$ corresponds to some $h \in H$. Also note that, by these definitions, we have:

$$\operatorname{err}(h) = E_{(x, y) \sim D}[\mathbf{1}\{h(x) \neq y\}] = E[f_h] \tag{34}$$

$$\widehat{\operatorname{err}}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{h(x_i) \neq y_i\} = \hat{E}_S[f_h] \tag{35}$$

Evidently we can use our bound from Theorem 1 to bound $\operatorname{err}(h) - \widehat{\operatorname{err}}(h)$:

$$\hat{R}_S(\mathcal{F}_H) = E_\sigma\left[\sup_{f_h \in \mathcal{F}_H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f_h(x_i, y_i)\right] \tag{36}$$

$$= E_\sigma\left[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i \left(\frac{1 - y_i h(x_i)}{2}\right)\right] \tag{37}$$

$$= E_\sigma\left[\frac{1}{m} \sum_{i=1}^{m} \frac{\sigma_i}{2} + \sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \frac{-y_i \sigma_i}{2}\, h(x_i)\right] \tag{38}$$

$$= \frac{1}{2}\, E_\sigma\left[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} (-y_i \sigma_i)\, h(x_i)\right] \tag{39}$$

$$= \frac{1}{2}\, E_\sigma\left[\sup_{h \in H} \frac{1}{m} \sum_{i=1}^{m} \sigma_i h(x_i)\right] \tag{40}$$

$$= \frac{1}{2}\, \hat{R}_S(H) \tag{41}$$

Note that we arrive at (39) since $E_\sigma\left[\frac{1}{m} \sum_i \frac{\sigma_i}{2}\right] = 0$, and at (40) since $-y_i \sigma_i$ has the same distribution as $\sigma_i$. Now, combining (30), (34), (35), and (41), we have:

$$\operatorname{err}(h) \le \widehat{\operatorname{err}}(h) + \hat{R}_S(H) + O\!\left(\sqrt{\frac{\ln(1/\delta)}{m}}\right) \tag{42}$$
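The factor-of-one-half relationship (41) between the loss class $\mathcal{F}_H$ and $H$ can likewise be verified exactly on a toy example by enumerating all $2^m$ sign vectors (the class $H$ and labels below are made up for illustration):

```python
import itertools
import numpy as np

# Exact check of (41): the empirical Rademacher complexity of the 0-1 loss
# class F_H equals half that of H. All 2^m sign vectors are enumerated, so
# the expectations over sigma are computed exactly. Toy H and labels.
m = 4
y = np.array([1, -1, 1, 1])
# A small hypothetical hypothesis class: rows are (h(x_1),...,h(x_m)).
H = np.array([[1, 1, 1, 1],
              [1, -1, -1, 1],
              [-1, -1, 1, 1]])
F = (1 - y * H) / 2        # f_h(x_i, y_i) = 1{h(x_i) != y_i}, values in {0,1}

sigmas = np.array(list(itertools.product([-1, 1], repeat=m)))
rad_H = np.mean((sigmas @ H.T / m).max(axis=1))
rad_F = np.mean((sigmas @ F.T / m).max(axis=1))

print(rad_F, rad_H / 2)    # equal, as derived in (36)-(41)
```

The key point mirrors the derivation: under full enumeration, the map $\sigma \mapsto (-y_1\sigma_1, \ldots, -y_m\sigma_m)$ is a bijection on $\{-1,+1\}^m$, and the leftover $\frac{1}{m}\sum_i \sigma_i/2$ term averages to zero exactly.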
