Questions and answers, kernel part


October 8, 2015

1 Questions

1.1 Question 1: properties of kernels, PCA, representer theorem

1. [2 points] Let $\mathcal{F}$ be an RKHS defined on some domain $\mathcal{X}$, with feature map $\phi(x)$ for all $x \in \mathcal{X}$ and reproducing kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{F}}$. Recall the reproducing property: for all $f(\cdot) \in \mathcal{F}$,
$$\langle f(\cdot), \phi(x) \rangle_{\mathcal{F}} = \langle f(\cdot), k(x, \cdot) \rangle_{\mathcal{F}} = f(x) \qquad (1)$$
(we will equivalently use the shorthand $f \in \mathcal{F}$). Given that $f$ takes the form $f(\cdot) = \sum_{i=1}^{n} a_i k(x_i, \cdot)$, show that
$$\|f(\cdot)\|_{\mathcal{F}}^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i k(x_i, x_j) a_j.$$

2. [3 points] Show that for a function $f \in \mathcal{F}$, $\max_{x \in \mathcal{X}} f(x) < \infty$ when the kernel is bounded, $k(x, x') \le K < \infty$ for all $x, x' \in \mathcal{X}$. You will need Cauchy-Schwarz, $\langle f_1, f_2 \rangle_{\mathcal{F}} \le \|f_1\|_{\mathcal{F}} \|f_2\|_{\mathcal{F}}$ for all $f_1, f_2 \in \mathcal{F}$, and the knowledge that $\|f\|_{\mathcal{F}} < \infty$, since otherwise $f$ would not be in $\mathcal{F}$.

3. [5 points] Define the empirical feature space covariance (ignore centering) as
$$\widehat{C}_{XX} := \sum_{i=1}^{n} \phi(x_i) \otimes \phi(x_i), \qquad \text{where} \quad (f_1 \otimes f_2) f_3 = \langle f_2, f_3 \rangle_{\mathcal{F}} f_1 \quad \forall f_1, f_2, f_3 \in \mathcal{F}.$$
The eigenfunctions $f$ of $\widehat{C}_{XX}$ are the solutions of $\lambda f = \widehat{C}_{XX} f$.
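As a quick numerical companion to parts 1 and 2 (a sketch added for illustration, not part of the original exam), the following numpy snippet builds $f = \sum_i a_i k(x_i, \cdot)$ for a Gaussian kernel (bounded, with $K = 1$), computes $\|f\|_{\mathcal{F}}^2 = a^\top K a$, and checks the bound $|f(x)| \le \|f\|_{\mathcal{F}} \sqrt{K}$ on a grid; all names and constants here are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel k(x, x') = exp(-gamma (x - x')^2); bounded, with K = 1."""
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10)   # centres x_1, ..., x_n
a = rng.normal(size=10)               # coefficients a_1, ..., a_n

K = rbf_kernel(x, x)                  # K_ij = k(x_i, x_j)
norm_sq = a @ K @ a                   # ||f||_F^2 = sum_ij a_i k(x_i, x_j) a_j

grid = np.linspace(-2.0, 2.0, 401)
f_grid = rbf_kernel(grid, x) @ a      # f(x) = sum_i a_i k(x_i, x)

# Cauchy-Schwarz: |f(x)| = |<f, phi(x)>| <= ||f||_F sqrt(k(x, x)) <= ||f||_F * 1
assert np.max(np.abs(f_grid)) <= np.sqrt(norm_sq) + 1e-12
```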

Assuming $f(\cdot) = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot)$, show that $\alpha \in \mathbb{R}^n$ is given by the solutions to $\lambda \alpha = K \alpha$, where $K_{ij} = k(x_i, x_j)$, assuming $K$ invertible.

4. [5 points] We have a set of $n$ paired observations $(x_1, y_1), \ldots, (x_n, y_n)$ (regression or classification). We are given the learning problem
$$f^* = \arg\min_{f \in \mathcal{F}} J(f), \qquad (2)$$
$$J(f) = L_y\left( f(x_1), \ldots, f(x_n) \right) + \Omega\left( \|f\|_{\mathcal{F}}^2 \right),$$
where the loss $L$ depends on the $x_i$ only via $f(x_i)$, $\Omega$ is non-decreasing, and $y$ is the vector of the $y_i$. Prove that a solution takes the form
$$f^* = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot)$$
(this is the representer theorem).

5. [5 points] A symmetric function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is positive definite if, for all $n \ge 1$, all $(a_1, \ldots, a_n) \in \mathbb{R}^n$, and all $(x_1, \ldots, x_n) \in \mathcal{X}^n$,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j) \ge 0, \qquad (3)$$
and strictly positive definite if the equality to zero holds only when $a_i = 0$ for all $i \in \{1, \ldots, n\}$. We consider the case where the positive definiteness is not strict. In this case, there exists some set of weights $\{a_i\}$ and corresponding points $\{x_i\}$ such that $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j) = 0$. Show that the function
$$f(x_{n+1}) = \sum_{i=1}^{n} a_i k(x_i, x_{n+1}) = 0$$
at every point $x_{n+1} \in \mathcal{X}$. This is a powerful result: it shows that $\|f\|_{\mathcal{H}} = 0 \implies f(x) = 0$ for all $x \in \mathcal{X}$. Hints: since $k$ is positive definite, it remains true that
$$\sum_{i=1}^{n+1} \sum_{j=1}^{n+1} a_i a_j k(x_i, x_j) \ge 0.$$
Find the condition on $a_{n+1}$ to ensure this holds for every possible $x_{n+1}$. Check whether this condition can still be enforced when $f(x_{n+1}) = \sum_{i=1}^{n} a_i k(x_i, x_{n+1}) \ne 0$.
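Part 3 above reduces (uncentred) kernel PCA to the $n \times n$ eigenproblem $\lambda \alpha = K \alpha$; a minimal numpy sketch of that reduction (illustrative, not part of the exam):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
K = rbf_kernel(x, x)                   # K_ij = k(x_i, x_j)

lam, alpha = np.linalg.eigh(K)         # solves K alpha = lambda alpha (ascending order)

# Each eigenfunction is f(.) = sum_i alpha_i k(x_i, .); its squared RKHS norm is
# alpha^T K alpha = lambda, so alpha / sqrt(lambda) gives a unit-norm eigenfunction.
top = alpha[:, -1] / np.sqrt(lam[-1])  # leading principal direction
f_at_x = K @ top                       # evaluations f(x_q) = sum_i alpha_i k(x_q, x_i)
```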

1.2 Question 2: covariance, dependence

1. [3 points] Let $\mathcal{F}$ be a reproducing kernel Hilbert space defined on a domain $\mathcal{X}$, and $\mathcal{G}$ be a reproducing kernel Hilbert space defined on a domain $\mathcal{Y}$. The RKHS $\mathcal{F}$ has kernel $k(x, x')$ and feature map $\phi(x)$, and $\mathcal{G}$ has kernel $l(y, y')$ and feature map $\psi(y)$. Given the random variables $X \sim P_x$ on $\mathcal{X}$ and $Y \sim P_y$ on $\mathcal{Y}$, we define $\mu_X \in \mathcal{F}$ and $\mu_Y \in \mathcal{G}$ to be mean embeddings satisfying
$$\langle \mu_X, f \rangle_{\mathcal{F}} = \mathbb{E}_X f(X) \;\; \forall f \in \mathcal{F}, \quad \text{and in particular} \quad \langle \mu_X, \phi(x) \rangle_{\mathcal{F}} = \langle \mu_X, k(x, \cdot) \rangle_{\mathcal{F}} = \mathbb{E}_X k(x, X), \qquad (4)$$
$$\langle \mu_Y, g \rangle_{\mathcal{G}} = \mathbb{E}_Y g(Y) \;\; \forall g \in \mathcal{G}, \quad \text{and in particular} \quad \langle \mu_Y, \psi(y) \rangle_{\mathcal{G}} = \langle \mu_Y, l(y, \cdot) \rangle_{\mathcal{G}} = \mathbb{E}_Y l(y, Y). \qquad (5)$$
The Hilbert-Schmidt operators mapping from $\mathcal{G}$ to $\mathcal{F}$ form a Hilbert space, written $\mathrm{HS}(\mathcal{G}, \mathcal{F})$. The inner product is
$$\langle L, M \rangle_{\mathrm{HS}} = \sum_{j \in J} \langle L f_j, M f_j \rangle_{\mathcal{F}}, \qquad (6)$$
independent of the choice of orthonormal basis $\{f_j\}$ of $\mathcal{G}$ (however, you don't need to use this information to answer the question). Define the tensor product $f \otimes g \in \mathrm{HS}(\mathcal{G}, \mathcal{F})$ such that
$$(f \otimes g) h = \langle g, h \rangle_{\mathcal{G}} f. \qquad (7)$$
Show that
$$\|\mu_X \otimes \mu_Y\|_{\mathrm{HS}}^2 = \mathbb{E}_{XX'} k(X, X') \, \mathbb{E}_{YY'} l(Y, Y'), \qquad (8)$$
where $X'$ has distribution $P_x$ and is independent of $X$, and $Y'$ has distribution $P_y$ and is independent of $Y$. You may use without proof that
$$\langle A, f \otimes g \rangle_{\mathrm{HS}} = \langle f, A g \rangle_{\mathcal{F}}, \qquad (9)$$
where $A \in \mathrm{HS}(\mathcal{G}, \mathcal{F})$. Please reference the numbers of the above equations as you use them in your proof.

2. [4 points] Given a probability distribution $P_{xy}$ over the pair of random variables $(X, Y)$ with respective marginal distributions $P_x$ and $P_y$, the uncentered covariance operator $C_{XY}$ is an element of $\mathrm{HS}(\mathcal{G}, \mathcal{F})$ defined such that
$$\langle C_{XY}, A \rangle_{\mathrm{HS}} = \mathbb{E}_{XY} \langle \phi(X) \otimes \psi(Y), A \rangle_{\mathrm{HS}}. \qquad (10)$$
The Hilbert-Schmidt Independence Criterion is defined in terms of kernels as
$$\mathrm{HSIC}^2(\mathcal{F}, \mathcal{G}, P_{xy}) = \|C_{XY} - \mu_X \otimes \mu_Y\|_{\mathrm{HS}}^2.$$
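Equation (8) has a direct empirical analogue: replacing each expectation over an independent pair by an average over all index pairs, $\|\hat{\mu}_X \otimes \hat{\mu}_Y\|_{\mathrm{HS}}^2$ is simply the product of the means of the two kernel matrices. A small numpy sketch (illustrative, not part of the exam):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = rng.normal(size=100)
K, L = rbf_kernel(x, x), rbf_kernel(y, y)

# ||mu_X (x) mu_Y||^2 = E_{XX'} k(X, X') * E_{YY'} l(Y, Y'); plug-in estimate:
norm_sq_est = K.mean() * L.mean()
```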

Prove that the population expression for $\mathrm{HSIC}^2$ in terms of expectations of kernels takes the form
$$\mathrm{HSIC}^2(\mathcal{F}, \mathcal{G}, P_{xy}) = \mathbb{E}_{XY} \mathbb{E}_{X'Y'} [k(X, X') l(Y, Y')] + \mathbb{E}_{XX'} k(X, X') \, \mathbb{E}_{YY'} l(Y, Y') - 2 \mathbb{E}_{XY} \left[ \mathbb{E}_{X'} k(X, X') \, \mathbb{E}_{Y'} l(Y, Y') \right],$$
where the pair $(X', Y')$ has distribution $P_{xy}$ and is independent of $(X, Y)$. You will need eq. (8) from the previous section.

3. [2 points] Show that at independence, i.e., when $P_{xy} = P_x P_y$, then $\mathrm{HSIC}^2(\mathcal{F}, \mathcal{G}, P_{xy}) = 0$.

4. [2 points] Given a sample $z := \{(x_1, y_1), \ldots, (x_n, y_n)\}$ drawn i.i.d. from $P_{xy}$, write an unbiased empirical estimate of $\|C_{XY}\|_{\mathrm{HS}}^2$.

5. [5 points] Derive a biased estimate of $\|C_{XY}\|_{\mathrm{HS}}^2$ by computing $\|\widehat{C}_{XY}\|_{\mathrm{HS}}^2$, where
$$\widehat{C}_{XY} := \frac{1}{n} \sum_{i=1}^{n} \phi(x_i) \otimes \psi(y_i).$$
Derive an expression for the bias in the latter expression, i.e., the expected difference between this estimate and the unbiased estimate, in terms of expectations of kernel functions. What happens to the bias as $n$ increases?

6. [4 points] Consider a relation between $x$ and $y$ given as $y_i = x_i^2 + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ is Gaussian noise, and $x_i \sim U([-1, 1])$ is drawn from the uniform distribution on $[-1, 1]$. See Figure 1 for an illustration of pairs $(x_i, y_i)$ drawn i.i.d. according to this relation. What is the population HSIC when both $k$ and $l$ are linear, i.e. $k(x_i, x_j) = x_i x_j$ and $l(y_i, y_j) = y_i y_j$? No proof is needed; a description of your reasons is sufficient. Next, define the maximum singular vectors $f \in \mathcal{F}$ and $g \in \mathcal{G}$ of the centered empirical covariance operator as
$$\arg\max_{\|f\|_{\mathcal{F}} \le 1,\; \|g\|_{\mathcal{G}} \le 1} \left\langle f, \left( \widehat{C}_{XY} - \hat{\mu}_X \otimes \hat{\mu}_Y \right) g \right\rangle_{\mathcal{F}},$$
where $\hat{\mu}_X$ and $\hat{\mu}_Y$ are the empirical estimates of the respective mean embeddings. Sketch $f$ and $g$ when $k(x_i, x_j) = \exp\left( -(x_i - x_j)^2 / \gamma \right)$ is the RBF kernel, and $l(y_i, y_j)$ is the linear kernel (note: $g$ can only be a straight line in this case). Again, no proof is needed, only a sketch of what you expect to see.
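Each of the three population terms above has a plug-in estimate built from the kernel matrices $K_{ij} = k(x_i, x_j)$ and $L_{ij} = l(y_i, y_j)$; a minimal sketch of the resulting (biased) estimator, again illustrative rather than part of the exam:

```python
import numpy as np

def hsic_sq_biased(K, L):
    """Plug-in estimate of HSIC^2 from the three-term population expression,
    with each expectation replaced by an average over sample index pairs."""
    term1 = (K * L).mean()                            # E_{XY} E_{X'Y'} [k l]
    term2 = K.mean() * L.mean()                       # E_{XX'} k * E_{YY'} l
    term3 = (K.mean(axis=1) * L.mean(axis=1)).mean()  # E_{XY}[E_{X'} k E_{Y'} l]
    return term1 + term2 - 2.0 * term3
```

One can check that this equals $\mathrm{tr}(KHLH)/n^2$ with $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, the centred form in which the biased HSIC statistic is usually written.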

Figure 1: Sample of relation between $x$ and $y$ (scatter plot of $Y$ against $X$).

1.3 Question 3: kernel ranking

Ranking problem: we receive $n$ pairs $\{(x_i, y_i)\}_{i=1}^{n}$, where the $x_i$ are the objects to be ranked, and $y_i \in \{1, 2, \ldots, M\}$ are the associated ranks. $M$ is the highest rank, $1$ is the lowest rank; two points can have an equal rank, in which case $y_i = y_j$; we also assume $M < n$, and that at least one example is seen for every allowable $y$ value. We represent the input points in terms of feature maps $\phi(x_i)$ to a reproducing kernel Hilbert space $\mathcal{H}$ with kernel $k(x, x')$. We set up the following optimization problem:
$$\min_{w \in \mathcal{H},\; \xi^u, \xi^l \in \mathbb{R}^n,\; b \in \mathbb{R}^{M+1}} \|w\|_{\mathcal{H}}^2 + C \sum_{i=1}^{n} (\xi_i^l + \xi_i^u), \qquad (11)$$
subject to
$$\langle w, \phi(x_i) \rangle_{\mathcal{H}} \le b_{y_i} - 1 + \xi_i^l, \qquad (12)$$
$$\langle w, \phi(x_i) \rangle_{\mathcal{H}} \ge b_{y_i - 1} + 1 - \xi_i^u, \qquad (13)$$
$$\xi_i^u, \xi_i^l \ge 0,$$
where $\{b_y\}_{y=0}^{M}$ are parameters of the algorithm which must be learned, and $C > 0$ is a user-defined constant.

1. (4 points) Sketch a figure describing what the above optimization problem is doing.

2. (7 points) Write the Lagrangian for the kernel ranking problem. State the KKT conditions as they apply to the problem (you are given that strong duality holds; please define the meaning of strong duality). You may use
$$\frac{d}{dw} \|w\|_{\mathcal{H}}^2 = 2w, \qquad \frac{d}{dw} \langle w, \phi(x_i) \rangle_{\mathcal{H}} = \phi(x_i).$$

3. (5 points) Show that the Lagrange dual function for this optimization problem takes the form
$$g(\alpha^u, \alpha^l) = -\frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i^u - \alpha_i^l)(\alpha_j^u - \alpha_j^l) k(x_i, x_j).$$
Hint: from the previous part, you should have a form for $w$ that looks like
$$w = \frac{1}{2} \sum_{i=1}^{n} (\alpha_i^u - \alpha_i^l) \phi(x_i).$$

4. (4 points) What do the KKT conditions imply about the allowable range of the $\alpha_i$? Describe where points with $\alpha_i^u = 0$, $\alpha_i^u = C$, and $\alpha_i^u \in (0, C)$ are situated. Please provide proofs to justify your answers. You do not need to provide an accompanying figure (although you are welcome to do so if you find this makes things easier to explain).
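For concreteness, the primal (11)-(13) can be prototyped directly with a linear kernel ($\phi(x) = x$); a hedged sketch assuming the cvxpy package, with data, ranks, and constants as illustrative placeholders (this is not part of the original exam):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
n, d, M = 30, 2, 3
X = rng.normal(size=(n, d))             # objects to rank (linear kernel: phi(x) = x)
ranks = rng.integers(1, M + 1, size=n)  # ranks y_i in {1, ..., M}
C = 1.0

w = cp.Variable(d)
b = cp.Variable(M + 1)                  # thresholds b_0, ..., b_M
xi_l = cp.Variable(n, nonneg=True)
xi_u = cp.Variable(n, nonneg=True)

scores = X @ w
cons = [scores[i] <= b[ranks[i]] - 1 + xi_l[i] for i in range(n)]       # eq. (12)
cons += [scores[i] >= b[ranks[i] - 1] + 1 - xi_u[i] for i in range(n)]  # eq. (13)

objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi_l + xi_u))    # eq. (11)
prob = cp.Problem(objective, cons)
prob.solve()
print(prob.value, b.value)              # optimal objective and learned thresholds
```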

2 Answers

2.1 Question 1

1. The norm is written
$$\|f(\cdot)\|_{\mathcal{F}}^2 = \langle f(\cdot), f(\cdot) \rangle_{\mathcal{F}} = \left\langle \sum_{i=1}^{n} a_i k(x_i, \cdot), \sum_{j=1}^{n} a_j k(x_j, \cdot) \right\rangle_{\mathcal{F}} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \langle k(x_i, \cdot), k(x_j, \cdot) \rangle_{\mathcal{F}} = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j),$$
where the reproducing property is used in the final step.

2. The proof is:
$$\max_{x \in \mathcal{X}} f(x) = \max_{x \in \mathcal{X}} \langle f, \phi(x) \rangle_{\mathcal{F}} \le \|f\|_{\mathcal{F}} \max_{x \in \mathcal{X}} \|\phi(x)\|_{\mathcal{F}} = \|f\|_{\mathcal{F}} \max_{x \in \mathcal{X}} \sqrt{\langle \phi(x), \phi(x) \rangle_{\mathcal{F}}} \le \|f\|_{\mathcal{F}} \sqrt{K} < \infty.$$

3. First substituting in the covariance on the right hand side, we have
$$\lambda f = \widehat{C}_{XX} f = \left( \sum_{i=1}^{n} \phi(x_i) \otimes \phi(x_i) \right) f = \sum_{i=1}^{n} \phi(x_i) \left\langle \phi(x_i), \sum_{j=1}^{n} \alpha_j \phi(x_j) \right\rangle_{\mathcal{F}} = \sum_{i=1}^{n} \phi(x_i) \sum_{j=1}^{n} \alpha_j k(x_i, x_j).$$
Now project both sides onto each of the $\phi(x_q)$, $q \in \{1, \ldots, n\}$: the left hand side gives
$$\langle \phi(x_q), \lambda f \rangle_{\mathcal{F}} = \lambda \langle \phi(x_q), f \rangle_{\mathcal{F}} = \lambda \sum_{i=1}^{n} \alpha_i k(x_q, x_i),$$
and the right hand side gives $\sum_{i=1}^{n} k(x_q, x_i) \sum_{j=1}^{n} \alpha_j k(x_i, x_j)$. Writing this as a matrix equation,
$$\lambda K \alpha = K^2 \alpha \quad \text{or} \quad \lambda \alpha = K \alpha.$$

4. Denote by $f_s$ the projection of $f$ onto the subspace
$$\mathrm{span}\{k(x_i, \cdot) : 1 \le i \le n\}, \qquad (14)$$
such that $f = f_s + f_\perp$, where $f_s = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot)$.

Regularizer: $\|f\|_{\mathcal{F}}^2 = \|f_s\|_{\mathcal{F}}^2 + \|f_\perp\|_{\mathcal{F}}^2 \ge \|f_s\|_{\mathcal{F}}^2$, so
$$\Omega\left( \|f\|_{\mathcal{F}}^2 \right) \ge \Omega\left( \|f_s\|_{\mathcal{F}}^2 \right),$$
and this term is minimized for $f = f_s$.

Individual terms $f(x_i)$ in the loss: $f(x_i) = \langle f, k(x_i, \cdot) \rangle_{\mathcal{F}} = \langle f_s + f_\perp, k(x_i, \cdot) \rangle_{\mathcal{F}} = \langle f_s, k(x_i, \cdot) \rangle_{\mathcal{F}}$, hence
$$L_y(f(x_1), \ldots, f(x_n)) = L_y(f_s(x_1), \ldots, f_s(x_n)).$$
In summary: the loss $L(\ldots)$ only depends on the component of $f$ in the data subspace, and the regularizer $\Omega(\ldots)$ is minimized when $f = f_s$. Note: if $\Omega$ is strictly non-decreasing, then $\|f_\perp\|_{\mathcal{F}} = 0$ is required at the minimum. If $\Omega$ is strictly increasing, the minimum is unique.

5. For $k$ identically zero, the statement holds trivially. Assume that $k$ is not identically zero. We expand out
$$\sum_{i=1}^{n+1} \sum_{j=1}^{n+1} a_i a_j k(x_i, x_j) = \underbrace{\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j k(x_i, x_j)}_{=0} + 2 a_{n+1} \underbrace{\sum_{i=1}^{n} a_i k(x_i, x_{n+1})}_{:=b} + a_{n+1}^2 \underbrace{k(x_{n+1}, x_{n+1})}_{:=c}.$$
The minimum of the above expression over $a_{n+1}$ occurs when $a_{n+1} = -b/c$ (knowing $k$ is not identically zero). For the expression to be non-negative at this minimum,
$$0 \le c \frac{b^2}{c^2} - 2 \frac{b^2}{c} = -\frac{b^2}{c}.$$
However $c > 0$, so the only possibility is $b = 0$, i.e.
$$\sum_{i=1}^{n} a_i k(x_i, x_{n+1}) = 0 \quad \forall x_{n+1} \in \mathcal{X}.$$
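The representer theorem of part 4 is what licenses kernel ridge regression, where $L_y$ is the squared loss and $\Omega(\|f\|_{\mathcal{F}}^2) = \lambda \|f\|_{\mathcal{F}}^2$: the minimizer is $f = \sum_i \alpha_i k(x_i, \cdot)$ with $\alpha$ solving a linear system. A small illustrative sketch (not part of the original answers):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=10.0):
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, size=40)
y = x**2 + 0.1 * rng.normal(size=40)

lam = 1e-2
K = rbf_kernel(x, x)
# Minimizing sum_i (f(x_i) - y_i)^2 + lam ||f||_F^2 over the whole RKHS: by the
# representer theorem f = sum_i alpha_i k(x_i, .), with (K + lam I) alpha = y.
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)

x_test = np.linspace(-1.0, 1.0, 5)
f_test = rbf_kernel(x_test, x) @ alpha   # f(x) = sum_i alpha_i k(x_i, x)
```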

2.2 Question 2

1. The proof is:
$$\langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y \rangle_{\mathrm{HS}} \overset{(a)}{=} \langle \mu_X, (\mu_X \otimes \mu_Y) \mu_Y \rangle_{\mathcal{F}} \overset{(b)}{=} \langle \mu_X, \mu_X \rangle_{\mathcal{F}} \langle \mu_Y, \mu_Y \rangle_{\mathcal{G}} \overset{(c)}{=} \mathbb{E}_X \mu_X(X) \, \mathbb{E}_Y \mu_Y(Y) \overset{(d)}{=} \mathbb{E}_X \langle \mu_X, k(X, \cdot) \rangle \, \mathbb{E}_Y \langle \mu_Y, l(Y, \cdot) \rangle \overset{(c)}{=} \mathbb{E}_{XX'} k(X, X') \, \mathbb{E}_{YY'} l(Y, Y'),$$
where in step (a) we apply (9), in step (b) we apply (7), and in the two steps (c) we apply (4) and (5). Step (d) is the reproducing property.

2. We begin with the expansion
$$\mathrm{HSIC}^2(\mathcal{F}, \mathcal{G}, P_{xy}) = \|C_{XY} - \mu_X \otimes \mu_Y\|_{\mathrm{HS}}^2 = \langle C_{XY}, C_{XY} \rangle_{\mathrm{HS}} + \langle \mu_X \otimes \mu_Y, \mu_X \otimes \mu_Y \rangle_{\mathrm{HS}} - 2 \langle C_{XY}, \mu_X \otimes \mu_Y \rangle_{\mathrm{HS}}. \qquad (15)$$
There are three terms in the expansion of (15). To write the first in terms of kernels, we apply (9) and then (10) twice, denoting by $(X', Y')$ an independent copy of the pair of variables $(X, Y)$:
$$\langle C_{XY}, C_{XY} \rangle_{\mathrm{HS}} = \|C_{XY}\|_{\mathrm{HS}}^2 = \mathbb{E}_{X,Y} \langle \phi(X) \otimes \psi(Y), C_{XY} \rangle_{\mathrm{HS}} = \mathbb{E}_{X,Y} \mathbb{E}_{X',Y'} \langle \phi(X) \otimes \psi(Y), \phi(X') \otimes \psi(Y') \rangle_{\mathrm{HS}}$$
$$= \mathbb{E}_{X,Y} \mathbb{E}_{X',Y'} \langle \phi(X), [\phi(X') \otimes \psi(Y')] \psi(Y) \rangle_{\mathcal{F}} = \mathbb{E}_{X,Y} \mathbb{E}_{X',Y'} \left[ \langle \phi(X), \phi(X') \rangle_{\mathcal{F}} \langle \psi(Y), \psi(Y') \rangle_{\mathcal{G}} \right] = \mathbb{E}_{X,Y} \mathbb{E}_{X',Y'} [k(X, X') l(Y, Y')]. \qquad (16)$$
The second term was proved previously, in eq. (8). For the cross-term,
$$\langle C_{XY}, \mu_X \otimes \mu_Y \rangle_{\mathrm{HS}} = \mathbb{E}_{X,Y} \langle \phi(X) \otimes \psi(Y), \mu_X \otimes \mu_Y \rangle_{\mathrm{HS}} = \mathbb{E}_{X,Y} \left( \langle \phi(X), \mu_X \rangle_{\mathcal{F}} \langle \psi(Y), \mu_Y \rangle_{\mathcal{G}} \right) = \mathbb{E}_{X,Y} \left[ \mathbb{E}_{X'} k(X, X') \, \mathbb{E}_{Y'} l(Y, Y') \right].$$

3. At independence, the expectations on the pair $(X, Y)$ factorize as products of expectations on $X$ and $Y$, hence
$$\mathrm{HSIC}^2(\mathcal{F}, \mathcal{G}, P_{xy}) = \mathbb{E}_{XX'} k(X, X') \, \mathbb{E}_{YY'} l(Y, Y') + \mathbb{E}_{XX'} k(X, X') \, \mathbb{E}_{YY'} l(Y, Y') - 2 \mathbb{E}_{XX'} k(X, X') \, \mathbb{E}_{YY'} l(Y, Y') = 0.$$
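Part 3 can be sanity-checked numerically: drawing $X$ and $Y$ independently and evaluating a plug-in HSIC estimate (the same three-term estimator sketched earlier) gives values shrinking toward zero as $n$ grows. Illustrative only:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

def hsic_sq_biased(K, L):
    return (K * L).mean() + K.mean() * L.mean() \
        - 2.0 * (K.mean(axis=1) * L.mean(axis=1)).mean()

rng = np.random.default_rng(4)
for n in (50, 200, 800):
    x = rng.normal(size=n)   # X and Y drawn independently, so P_xy = P_x P_y
    y = rng.normal(size=n)
    print(n, hsic_sq_biased(rbf_kernel(x, x), rbf_kernel(y, y)))
```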

4. An unbiased estimate of $A := \|C_{XY}\|_{\mathrm{HS}}^2$ is
$$\widehat{A} := \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \ne i} k_{ij} l_{ij},$$
where we use the shorthand $k_{ij} = k(x_i, x_j)$. Note that $\mathbb{E}(\widehat{A}) = \mathbb{E}_{X,Y} \mathbb{E}_{X',Y'} k(X, X') l(Y, Y') = \|C_{XY}\|_{\mathrm{HS}}^2$ from eq. (16).

5. The biased estimate of $A := \|C_{XY}\|_{\mathrm{HS}}^2$ is
$$\widehat{A}_b := \|\widehat{C}_{XY}\|_{\mathrm{HS}}^2 = \left\langle \frac{1}{n} \sum_{i=1}^{n} \phi(x_i) \otimes \psi(y_i), \frac{1}{n} \sum_{j=1}^{n} \phi(x_j) \otimes \psi(y_j) \right\rangle_{\mathrm{HS}} = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} k_{ij} l_{ij} = \frac{1}{n^2} \mathrm{tr}(KL).$$
The difference between the biased and unbiased estimates is
$$\widehat{A}_b - \widehat{A} = \frac{1}{n^2} \sum_{i,j=1}^{n} k_{ij} l_{ij} - \frac{1}{n(n-1)} \sum_{j \ne i} k_{ij} l_{ij} = \frac{1}{n^2} \sum_{i=1}^{n} k_{ii} l_{ii} + \left( \frac{1}{n^2} - \frac{1}{n(n-1)} \right) \sum_{j \ne i} k_{ij} l_{ij} = \frac{1}{n^2} \sum_{i=1}^{n} k_{ii} l_{ii} - \frac{1}{n^2(n-1)} \sum_{j \ne i} k_{ij} l_{ij},$$
thus the expectation of this difference (i.e., the bias) is
$$\mathbb{E}\left( \widehat{A}_b - \widehat{A} \right) = \frac{1}{n} \left( \mathbb{E}_{XY}[k(X, X) l(Y, Y)] - \mathbb{E}_{X,Y} \mathbb{E}_{X',Y'}[k(X, X') l(Y, Y')] \right),$$
and is therefore $O(1/n)$.

6. When both kernels are linear, the population HSIC will be zero, as there is no pair of functions in these function classes which can transform the variables to have a high linear covariance (indeed $\mathrm{cov}(X, Y) = \mathbb{E}[X(X^2 + \varepsilon)] = \mathbb{E}[X^3] = 0$, since $X$ is uniform on $[-1, 1]$). When $k$ is an RBF kernel and $l$ is a linear kernel, we expect the mappings in Figure 2.

Figure 2: Maximum singular vectors of covariance operator. Left plot is the original point cloud; center plots contain both mappings (the dependence witnesses $f(x)$ vs. $x$ and $g(y)$ vs. $y$); right plot contains the mapped variables $g(y)$ vs. $f(x)$ (correlation: 0.94).
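A numerical companion to answers 4 and 5 (an illustrative sketch, not part of the original): compute both estimates of $\|C_{XY}\|_{\mathrm{HS}}^2$ on dependent data and verify that their gap matches the $O(1/n)$ expression above.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    return np.exp(-gamma * (X[:, None] - Y[None, :]) ** 2)

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(-1.0, 1.0, size=n)
y = x**2 + 0.1 * rng.normal(size=n)          # dependent pair, as in Question 2.6

K, L = rbf_kernel(x, x), rbf_kernel(y, y)
off_diag = ~np.eye(n, dtype=bool)

A_unbiased = (K * L)[off_diag].sum() / (n * (n - 1))  # 1/(n(n-1)) sum_{j != i} k_ij l_ij
A_biased = np.trace(K @ L) / n**2                     # ||C_hat_XY||^2 = tr(KL) / n^2

# Sample version of the gap: (1/n)(mean_i k_ii l_ii - A_unbiased), an O(1/n) bias.
gap = A_biased - A_unbiased
check = (np.mean(np.diag(K) * np.diag(L)) - A_unbiased) / n
assert np.isclose(gap, check)
```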

2.3 Question 3

This is a minor modification of the ranking algorithm in [1, Section 8.1.1].

1. The sketch is in Figure 3. The algorithm sets the thresholds $\{b_j\}_{j=0}^{M}$ such that $\langle w, \phi(x_i) \rangle$ is beneath the threshold $b_{y_i}$ by a margin $1/\|w\|$, but above the threshold $b_{y_i - 1}$ by a margin $1/\|w\|$. Some points are allowed within the margins; however, these attract a penalty of $\xi_i^l$ or $\xi_i^u$, respectively (the sum of these penalties constitutes the loss). The parameter $C$ trades off the margin size with the loss.

2. Strong duality means that the maximum of the dual function coincides with the minimum of the primal function subject to the problem constraints. Recall the optimization problem:
$$\min_{w \in \mathcal{H},\; \xi^u, \xi^l \in \mathbb{R}^n,\; b \in \mathbb{R}^{M+1}} \|w\|_{\mathcal{H}}^2 + C \sum_{i=1}^{n} (\xi_i^l + \xi_i^u), \qquad (17)$$
subject to
$$\langle w, \phi(x_i) \rangle_{\mathcal{H}} \le b_{y_i} - 1 + \xi_i^l, \qquad (18)$$
$$\langle w, \phi(x_i) \rangle_{\mathcal{H}} \ge b_{y_i - 1} + 1 - \xi_i^u, \qquad (19)$$
$$\xi_i^u, \xi_i^l \ge 0.$$

Figure 3: Sketch of the ranking algorithm, showing the projections of $\phi(x_1)$, $\phi(x_2)$, $\phi(x_3)$ onto $w$, and the thresholds $b_0$, $b_1$, $b_2$, $b_3$.

The Lagrangian is:
$$L := \|w\|_{\mathcal{H}}^2 + C \sum_{i=1}^{n} (\xi_i^l + \xi_i^u) - \sum_{i=1}^{n} (\eta_i^l \xi_i^l + \eta_i^u \xi_i^u) + \sum_{i=1}^{n} \alpha_i^l \left( \langle w, \phi(x_i) \rangle_{\mathcal{H}} - b_{y_i} + 1 - \xi_i^l \right) + \sum_{i=1}^{n} \alpha_i^u \left( -\langle w, \phi(x_i) \rangle_{\mathcal{H}} + b_{y_i - 1} + 1 - \xi_i^u \right).$$

The KKT conditions: knowing strong duality holds, and using the general notation
$$\text{minimize } f_0(x) \quad \text{subject to} \quad f_i(x) \le 0, \quad i = 1, \ldots, m \qquad (20)$$
for convex $f_0, \ldots, f_m$, the KKT conditions are
$$f_i(x) \le 0, \quad i = 1, \ldots, m,$$
$$\lambda_i \ge 0, \quad i = 1, \ldots, m,$$
$$\lambda_i f_i(x) = 0, \quad i = 1, \ldots, m, \qquad (21)$$
$$\nabla f_0(x) + \sum_{i=1}^{m} \lambda_i \nabla f_i(x) = 0.$$
These are necessary and sufficient for optimality under strong duality. The condition $\lambda_i f_i = 0$ translates to
$$0 = \eta_i^l \xi_i^l,$$
$$0 = \eta_i^u \xi_i^u,$$
$$0 = \alpha_i^l \left( \langle w, \phi(x_i) \rangle_{\mathcal{H}} - b_{y_i} + 1 - \xi_i^l \right),$$
$$0 = \alpha_i^u \left( -\langle w, \phi(x_i) \rangle_{\mathcal{H}} + b_{y_i - 1} + 1 - \xi_i^u \right).$$
The dual variables satisfy $\alpha_i^l, \alpha_i^u, \eta_i^l, \eta_i^u \ge 0$. Taking derivatives with respect to the primal parameters and setting to zero gives the remaining KKT conditions for this problem:

$$\frac{\partial L}{\partial w} = 2w + \sum_{i=1}^{n} \alpha_i^l \phi(x_i) - \sum_{i=1}^{n} \alpha_i^u \phi(x_i) = 0 \qquad (22)$$
$$\frac{\partial L}{\partial \xi_i^l} = C - \alpha_i^l - \eta_i^l = 0 \qquad (23)$$
$$\frac{\partial L}{\partial \xi_i^u} = C - \alpha_i^u - \eta_i^u = 0 \qquad (24)$$
$$\frac{\partial L}{\partial b_y} = -\sum_{i : y_i = y} \alpha_i^l + \sum_{i : y_i = y+1} \alpha_i^u = 0 \qquad (25)$$
$$\frac{\partial L}{\partial b_0} = \sum_{i : y_i = 1} \alpha_i^u = 0 \qquad (26)$$
$$\frac{\partial L}{\partial b_M} = -\sum_{i : y_i = M} \alpha_i^l = 0, \qquad (27)$$
where (25) applies for each $y \in \{1, \ldots, M-1\}$. We interpret (25) to state that $b_y$ is the upper threshold for points with rank $y$, and the lower threshold for points of rank $y + 1$.

3. We use the minimum of the Lagrangian with respect to the primal parameters, which we can readily compute since we have the point at which the primal derivatives are zero. From (22),
$$w = \frac{1}{2} \sum_{i=1}^{n} (\alpha_i^u - \alpha_i^l) \phi(x_i).$$
Substituting the KKT conditions back into the Lagrangian, we get the Lagrange dual function,
$$g(\alpha^u, \alpha^l) := \frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i^u - \alpha_i^l)(\alpha_j^u - \alpha_j^l) k(x_i, x_j) + C \sum_{i=1}^{n} (\xi_i^l + \xi_i^u)$$
$$+ \sum_{i=1}^{n} \alpha_i^l \left( \frac{1}{2} \sum_{j=1}^{n} (\alpha_j^u - \alpha_j^l) k(x_i, x_j) - b_{y_i} + 1 - \xi_i^l \right) + \sum_{i=1}^{n} \alpha_i^u \left( -\frac{1}{2} \sum_{j=1}^{n} (\alpha_j^u - \alpha_j^l) k(x_i, x_j) + b_{y_i - 1} + 1 - \xi_i^u \right)$$
$$- \sum_{i=1}^{n} \left[ \xi_i^l (C - \alpha_i^l) + \xi_i^u (C - \alpha_i^u) \right] = -\frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i^u - \alpha_i^l)(\alpha_j^u - \alpha_j^l) k(x_i, x_j).$$
To get the desired solution, it must be maximized with respect to $\alpha_i^u, \alpha_i^l$.
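Once dual variables are available, both the dual value and the learned ranking score are simple kernel computations; a short sketch under the stated form of $w$ (the $\alpha$'s passed in are placeholders satisfying the box constraints, not solutions of the dual, and the function names are ours):

```python
import numpy as np

def dual_value(alpha_u, alpha_l, K):
    """g(alpha^u, alpha^l) = -(1/4) (alpha^u - alpha^l)^T K (alpha^u - alpha^l)."""
    d = alpha_u - alpha_l
    return -0.25 * d @ K @ d

def ranking_score(alpha_u, alpha_l, K_test_train):
    """<w, phi(x)> = (1/2) sum_i (alpha^u_i - alpha^l_i) k(x_i, x)."""
    return 0.5 * K_test_train @ (alpha_u - alpha_l)
```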

4. There are three cases:

(a) When $\alpha_i^u = C$, then from (24), $\eta_i^u = 0$ for these points, and it is possible for $\xi_i^u > 0$ from (21). Next,
$$0 = -\langle w, \phi(x_i) \rangle_{\mathcal{H}} + b_{y_i - 1} + 1 - \xi_i^u \implies \langle w, \phi(x_i) \rangle_{\mathcal{H}} = b_{y_i - 1} + 1 - \xi_i^u,$$
and the projection $\langle w, \phi(x_i) \rangle_{\mathcal{H}}$ is above the threshold $b_{y_i - 1}$ by $1 - \xi_i^u$ (potentially within the margin, or even on the wrong side of the threshold for large enough $\xi_i^u$).

(b) When $\alpha_i^u = 0$, then $\eta_i^u = C$, hence $\xi_i^u = 0$, and
$$-\langle w, \phi(x_i) \rangle_{\mathcal{H}} + b_{y_i - 1} + 1 - \xi_i^u \le 0 \implies \langle w, \phi(x_i) \rangle_{\mathcal{H}} \ge b_{y_i - 1} + 1,$$
and the point is on or above the margin for the lower threshold.

(c) When $\alpha_i^u \in (0, C)$, then $\eta_i^u \ne 0$, hence $\xi_i^u = 0$. Moreover
$$0 = -\langle w, \phi(x_i) \rangle_{\mathcal{H}} + b_{y_i - 1} + 1 - \xi_i^u \implies \langle w, \phi(x_i) \rangle_{\mathcal{H}} = b_{y_i - 1} + 1,$$
and these points are on the margin above the lower threshold $b_{y_i - 1}$.

References

[1] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK, 2004.
