Near-Orthogonality Regularization in Kernel Methods: Supplementary Material
1 ADMM-Based Algorithms

1.1 Detailed Derivation of Solving $A$

Given $H \in \mathbb{R}^{K \times K}$ where $H_{ij} = \langle f_i, \hat{f}_j \rangle$, the sub-problem defined over $A$ is

  $\min_A \ \lambda D_\phi(A, I) - \langle P, A \rangle - \langle Q, A \rangle + \frac{\rho}{2}\|H - A\|_F^2 + \frac{\rho}{2}\|H^\top - A\|_F^2.$  (1)

$D_\phi(A, I)$ has three cases, which we discuss separately.

When $D_\phi(A, I)$ is the squared Frobenius norm, the problem becomes

  $\min_A \ \lambda \|A - I\|_F^2 - \langle P, A \rangle - \langle Q, A \rangle + \frac{\rho}{2}\|H - A\|_F^2 + \frac{\rho}{2}\|H^\top - A\|_F^2.$  (2)

Taking the derivative and setting it to zero, we get the optimal solution for $A$:

  $A = \big(2\lambda I + P + Q + \rho(H + H^\top)\big)/(2\lambda + 2\rho).$  (3)

When $D_\phi(A, I)$ is the log-determinant divergence, the problem is specialized to

  $\min_A \ \lambda(\mathrm{tr}(A) - \log\det(A)) - \langle P, A \rangle - \langle Q, A \rangle + \frac{\rho}{2}\|H - A\|_F^2 + \frac{\rho}{2}\|H^\top - A\|_F^2 \quad \text{s.t.}\ A \succ 0.$  (4)

Taking the derivative of the objective function w.r.t. $A$ and setting it to zero gives $\lambda(I - A^{-1}) - P - Q + 2\rho A - \rho(H + H^\top) = 0$; multiplying on the right by $A/(2\rho)$, we get

  $A^2 + \frac{1}{2\rho}\big(\lambda I - P - Q - \rho(H + H^\top)\big)A - \frac{\lambda}{2\rho} I = 0.$  (5)

Let $B = \frac{1}{2\rho}\big(\lambda I - P - Q - \rho(H + H^\top)\big)$ and $C = -\frac{\lambda}{2\rho} I$; Eq.(5) can be written as

  $A^2 + BA + C = 0.$  (6)

According to (Higham and Kim, 2001), since $B$ and $C$ commute, the solution of this equation is

  $A = \frac{-B + \sqrt{B^2 - 4C}}{2}.$  (7)

Taking an eigendecomposition $B = \Phi \Sigma \Phi^\top$, we can compute $A$ as $A = \Phi \hat{\Sigma} \Phi^\top$, where $\hat{\Sigma}$ is a diagonal matrix with $\hat{\Sigma}_{kk} = \frac{1}{2}\big({-\Sigma_{kk}} + \sqrt{\Sigma_{kk}^2 + 2\lambda/\rho}\big)$. Since $\sqrt{\Sigma_{kk}^2 + 2\lambda/\rho} > |\Sigma_{kk}|$, we know $\hat{\Sigma}_{kk} > 0$. Hence $A$ is positive definite.

When $D_\phi(A, I)$ is the von Neumann divergence, the problem becomes

  $\min_A \ \lambda\,\mathrm{tr}(A \log A - A) - \langle P, A \rangle - \langle Q, A \rangle + \frac{\rho}{2}\|H - A\|_F^2 + \frac{\rho}{2}\|H^\top - A\|_F^2 \quad \text{s.t.}\ A \succeq 0.$  (8)
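Before turning to the von Neumann case, the two updates derived so far can be sanity-checked numerically. The sketch below is a hypothetical small instance (the random $P$, $Q$, $H$ merely stand in for the actual ADMM quantities): it implements the closed-form update of Eq.(3) and the eigendecomposition solution of Eqs.(5)-(7).

```python
# Numeric sanity check for the squared-Frobenius update, Eq.(3), and the
# log-determinant update, Eqs.(5)-(7). P, Q, H are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
K, lam, rho = 5, 0.7, 1.3

P = rng.standard_normal((K, K)); P = (P + P.T) / 2   # symmetrized stand-in
Q = rng.standard_normal((K, K)); Q = (Q + Q.T) / 2
H = rng.standard_normal((K, K))

# Squared Frobenius norm: Eq.(3).
A_sfn = (2 * lam * np.eye(K) + P + Q + rho * (H + H.T)) / (2 * lam + 2 * rho)

# Log-determinant divergence: solve A^2 + B A + C = 0 per eigenvalue of B.
B = (lam * np.eye(K) - P - Q - rho * (H + H.T)) / (2 * rho)
Sigma, Phi = np.linalg.eigh(B)                        # B is symmetric
Sigma_hat = (-Sigma + np.sqrt(Sigma**2 + 2 * lam / rho)) / 2
A_ldd = Phi @ np.diag(Sigma_hat) @ Phi.T

# A_ldd should satisfy the quadratic matrix equation and be positive definite.
C = -(lam / (2 * rho)) * np.eye(K)
residual = A_ldd @ A_ldd + B @ A_ldd + C
print(np.abs(residual).max())                # tiny (float round-off)
print(np.linalg.eigvalsh(A_ldd).min() > 0)   # True
```

The log-determinant update satisfies the matrix equation exactly because $A$ and $B$ share the eigenbasis $\Phi$, so Eq.(6) reduces to $K$ scalar quadratics in the eigenvalues.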
Setting the gradient of the objective function w.r.t. $A$ to zero, we get

  $\lambda \log A + 2\rho A = P + Q + \rho(H + H^\top).$  (9)

Let $D = P + Q + \rho(H + H^\top)$. We perform an eigendecomposition $D = \Phi \Sigma \Phi^\top$ and parameterize $A$ as $A = \Phi \hat{\Sigma} \Phi^\top$; then we obtain

  $\lambda \log A + 2\rho A = \Phi\big(\lambda \log \hat{\Sigma} + 2\rho \hat{\Sigma}\big)\Phi^\top.$  (10)

Plugging this equation into Eq.(9), we get the following equation:

  $\lambda \log \hat{\Sigma} + 2\rho \hat{\Sigma} = \Sigma,$  (11)

which amounts to solving $K$ independent one-variable equations taking the form

  $\lambda \log \hat{\Sigma}_{ii} + 2\rho \hat{\Sigma}_{ii} = \Sigma_{ii},$  (12)

where $i = 1, \dots, K$. This equation has a closed-form solution:

  $\hat{\Sigma}_{ii} = \frac{\lambda}{2\rho}\,\omega\Big(\frac{\Sigma_{ii}}{\lambda} - \log\frac{\lambda}{2\rho}\Big),$  (13)

where $\omega(\cdot)$ is the Wright omega function (Gorenflo et al., 2007). Due to the presence of $\log$, $\hat{\Sigma}_{ii}$ is required to be positive, and the solution always exists since the ranges of $\lambda \log \hat{\Sigma}_{ii} + 2\rho \hat{\Sigma}_{ii}$ and $\Sigma_{ii}$ are both $(-\infty, \infty)$. Hence $A$ is guaranteed to be positive definite.

1.2 ADMM-Based Algorithm for BMD-KSC

BMD-KSC has two sets of parameters: sparse codes $\{a_n\}_{n=1}^N$ and a dictionary of RKHS functions $\{f_i\}_{i=1}^K$. We use a coordinate descent algorithm to learn these two parameter sets, which iteratively performs the following two steps until convergence: (1) fixing $\{f_i\}_{i=1}^K$, solve $\{a_n\}_{n=1}^N$; (2) fixing $\{a_n\}_{n=1}^N$, solve $\{f_i\}_{i=1}^K$.

We first discuss step (1). The sub-problem defined over $a_n$ is

  $\min_{a_n} \ \frac{1}{2}\Big\|k(x_n, \cdot) - \sum_{i=1}^K a_{ni} f_i\Big\|_{\mathcal{H}}^2 + \lambda_1 \|a_n\|_1,$  (14)

where $\big\|k(x_n, \cdot) - \sum_{i=1}^K a_{ni} f_i\big\|_{\mathcal{H}}^2 = k(x_n, x_n) - 2 a_n^\top h_n + a_n^\top G a_n$ with $h_n \in \mathbb{R}^K$, $(h_n)_i = \langle f_i, k(x_n, \cdot) \rangle$, and $G \in \mathbb{R}^{K \times K}$, $G_{ij} = \langle f_i, f_j \rangle$. This is a standard Lasso (Tibshirani, 1996) problem and can be solved using many algorithms.

Next we discuss step (2), which learns $\{f_i\}_{i=1}^K$ using the ADMM-based algorithm outlined in the main paper. The updates of all variables are the same as those in BMD-KDML, except $f_i$. Let $b_n^i = k(x_n, \cdot) - \sum_{j \neq i} a_{nj} f_j$; then the sub-problem defined over $f_i$ is

  $\min_{f_i} \ \frac{1}{2}\sum_{n=1}^N \|b_n^i - a_{ni} f_i\|_{\mathcal{H}}^2 + \frac{\lambda_2}{2}\|f_i\|_{\mathcal{H}}^2 + \langle g_i, f_i \rangle + \sum_{j=1}^K P_{ij}\langle f_i, \hat{f}_j \rangle + \sum_{j=1}^K Q_{ij}\langle f_i, \hat{f}_j \rangle + \frac{\rho}{2}\sum_{j=1}^K \big(\langle f_i, \hat{f}_j \rangle - A_{ij}\big)^2 + \frac{\rho}{2}\sum_{j=1}^K \big(\langle f_i, \hat{f}_j \rangle - A_{ji}\big)^2.$  (15)

This problem can be solved with functional gradient descent.
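As a concrete illustration of step (1): the Lasso sub-problem reduces to $\min_a \frac{1}{2} a^\top G a - a^\top h_n + \lambda_1 \|a\|_1$, which can be handled by any of the "many algorithms" mentioned above. Below is a minimal ISTA sketch; the $G$ and $h$ here are synthetic stand-ins for the RKHS quantities, not values from the paper.

```python
# ISTA sketch for the reduced Lasso sub-problem of step (1):
#   min_a 1/2 a^T G a - a^T h + lambda1 ||a||_1
# G and h are synthetic stand-ins for the Gram matrix and kernel evaluations.
import numpy as np

rng = np.random.default_rng(1)
K, lam1 = 8, 0.1

M = rng.standard_normal((K, K))
G = M @ M.T / K + np.eye(K)       # well-conditioned PSD stand-in Gram matrix
h = rng.standard_normal(K)

def soft(v, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

a = np.zeros(K)
step = 1.0 / np.linalg.eigvalsh(G).max()   # 1/L, L = Lipschitz const. of grad
for _ in range(1000):
    a = soft(a - step * (G @ a - h), step * lam1)

# KKT check: at a Lasso optimum, |G a - h| <= lambda1 componentwise.
print(np.all(np.abs(G @ a - h) <= lam1 + 1e-6))  # True
```

Proximal-gradient methods fit this sub-problem well because the smooth part is a fixed $K \times K$ quadratic, so each iteration costs only $O(K^2)$.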
The functional gradient of the objective function w.r.t. $f_i$ is

  $\sum_{n=1}^N a_{ni}\big(a_{ni} f_i - b_n^i\big) + \lambda_2 f_i + g_i + \sum_{j=1}^K \big(P_{ij} + Q_{ij} + 2\rho \langle f_i, \hat{f}_j \rangle - \rho(A_{ij} + A_{ji})\big)\hat{f}_j.$  (16)

2 Proof of Lemma 1

For ease of notation, we use $\langle \cdot, \cdot \rangle$ to denote the inner product in the RKHS and $\|\cdot\|$ to denote the Hilbert norm. To prove Lemma 1, we need the following lemma, which is an extension of Lemma 4 in (Xie et al., 2015a) from vectors to RKHS functions.

Lemma 2. Let $\mathcal{F} = \{f_i\}_{i=1}^K$ and let $G$ be the Gram matrix defined on $\mathcal{F}$. Let $g_i$ be the functional gradient of $\det(G)$ w.r.t. $f_i$; then $\langle g_i, f_j \rangle = 0$ for all $j \neq i$, and $\langle g_i, f_i \rangle > 0$.
Proof. We decompose $f_i$ into $f_i = f_i^{\parallel} + f_i^{\perp}$, where $f_i^{\parallel}$ is in the span of $\mathcal{F} \setminus \{f_i\}$: $f_i^{\parallel} = \sum_{j \neq i} a_j f_j$ with linear coefficients $\{a_j\}_{j \neq i}$, and $f_i^{\perp}$ is orthogonal to $\mathcal{F} \setminus \{f_i\}$: $\langle f_i^{\perp}, f_j \rangle = 0$ for all $j \neq i$. Let $c_j$ denote the $j$-th column of $G$. Subtracting $\sum_{j \neq i} a_j c_j$ from the $i$-th column, we get

  $\det(G) = \det\begin{pmatrix} \langle f_1, f_1 \rangle & \cdots & 0 & \cdots & \langle f_1, f_K \rangle \\ \vdots & & \vdots & & \vdots \\ \langle f_i, f_1 \rangle & \cdots & \langle f_i, f_i^{\perp} \rangle & \cdots & \langle f_i, f_K \rangle \\ \vdots & & \vdots & & \vdots \\ \langle f_K, f_1 \rangle & \cdots & 0 & \cdots & \langle f_K, f_K \rangle \end{pmatrix}.$  (17)

Expanding the determinant according to the $i$-th column, we get

  $\det(G) = \det(G_{-i})\,\langle f_i, f_i^{\perp} \rangle,$  (18)

where $G_{-i}$ is the Gram matrix defined on $\mathcal{F} \setminus \{f_i\}$. Then the functional gradient $g_i$ of $\det(G)$ w.r.t. $f_i$ is $2\det(G_{-i}) f_i^{\perp}$, which is orthogonal to $f_j$ for all $j \neq i$. Since $G$ is full rank, we know $\det(G) > 0$ and $\det(G_{-i}) > 0$; from Eq.(18), $\langle f_i, f_i^{\perp} \rangle = \det(G)/\det(G_{-i}) > 0$. Hence $\langle g_i, f_i \rangle = 2\det(G_{-i})\langle f_i, f_i^{\perp} \rangle > 0$.

Now we are ready to prove Lemma 1. Some of the proof techniques draw inspiration from (Xie et al., 2015a). Let $\tilde{f}_i = f_i + \eta(2 f_i - \hat{g}_i)$ denote the updated functions. We first compute $\hat{s}_{ij}$:

  $\hat{s}_{ij} = \frac{\langle \tilde{f}_i, \tilde{f}_j \rangle}{\|\tilde{f}_i\|\,\|\tilde{f}_j\|} = \frac{\langle f_i + \eta(2 f_i - \hat{g}_i),\ f_j + \eta(2 f_j - \hat{g}_j) \rangle}{\|f_i + \eta(2 f_i - \hat{g}_i)\|\,\|f_j + \eta(2 f_j - \hat{g}_j)\|}.$  (19)

The functional gradient of $\log\det(G)$ w.r.t. $f_i$ is computed as $\hat{g}_i = g_i/\det(G)$. According to Lemma 2 and the fact that $\det(G) > 0$, we know $\langle \hat{g}_i, f_j \rangle = 0$ for all $j \neq i$, and $\langle \hat{g}_i, f_i \rangle > 0$. Then we have

  $\langle \tilde{f}_i, \tilde{f}_j \rangle = \big\langle (1 + 2\eta) f_i - \eta \hat{g}_i,\ (1 + 2\eta) f_j - \eta \hat{g}_j \big\rangle = (1 + 2\eta)^2 \langle f_i, f_j \rangle + \eta^2 \langle \hat{g}_i, \hat{g}_j \rangle = \langle f_i, f_j \rangle \Big( (1 + 2\eta)^2 + \eta^2 \frac{\langle \hat{g}_i, \hat{g}_j \rangle}{\langle f_i, f_j \rangle} \Big)$  (20)

and

  $\|\tilde{f}_i\|^2 = (1 + 2\eta)^2 \|f_i\|^2 - 2\eta(1 + 2\eta)\langle f_i, \hat{g}_i \rangle + \eta^2 \|\hat{g}_i\|^2.$  (21)

According to the Taylor expansion $\sqrt{1 + x} = 1 + \frac{x}{2} + o(x)$, we have

  $\|\tilde{f}_i\| = (1 + 2\eta)\|f_i\| \sqrt{1 - \frac{2\eta \langle f_i, \hat{g}_i \rangle}{(1 + 2\eta)\|f_i\|^2} + \frac{\eta^2 \|\hat{g}_i\|^2}{(1 + 2\eta)^2 \|f_i\|^2}} = (1 + 2\eta)\|f_i\| \Big( 1 - \eta \frac{\langle f_i, \hat{g}_i \rangle}{\|f_i\|^2} + o(\eta) \Big).$  (22)
Similarly, $\|\tilde{f}_j\| = (1 + 2\eta)\|f_j\|\big(1 - \eta \langle f_j, \hat{g}_j \rangle / \|f_j\|^2 + o(\eta)\big)$. Then, using $\frac{1}{1 + x} = 1 - x + o(x)$,

  $\hat{s}_{ij} = \frac{\langle f_i, f_j \rangle \Big( (1 + 2\eta)^2 + \eta^2 \frac{\langle \hat{g}_i, \hat{g}_j \rangle}{\langle f_i, f_j \rangle} \Big)}{(1 + 2\eta)^2 \|f_i\| \|f_j\| \Big( 1 - \eta \frac{\langle f_i, \hat{g}_i \rangle}{\|f_i\|^2} + o(\eta) \Big)\Big( 1 - \eta \frac{\langle f_j, \hat{g}_j \rangle}{\|f_j\|^2} + o(\eta) \Big)}$  (26)

  $= s_{ij} \Big( 1 + \eta \Big( \frac{\langle f_i, \hat{g}_i \rangle}{\|f_i\|^2} + \frac{\langle f_j, \hat{g}_j \rangle}{\|f_j\|^2} \Big) + o(\eta) \Big) > s_{ij}$  (27)

for sufficiently small $\eta > 0$, since $\langle f_i, \hat{g}_i \rangle > 0$ and $\langle f_j, \hat{g}_j \rangle > 0$. This holds for all $i \neq j$; hence $s(\tilde{\mathcal{F}}) > s(\mathcal{F})$. The proof completes.

3 Proof of Theorem 1

For ease of presentation, we use $n$ to denote the number of data examples. Note that this number is denoted by $N$ in the main paper. Some of the proof techniques draw inspiration from (Xie et al., 2015b). A well-established result in learning theory is that the generalization error can be upper bounded by the Rademacher complexity. We start from the Rademacher complexity, seek a further upper bound of it, and show how $s(\mathcal{F})$ affects this upper bound.

The Rademacher complexity $R_n(\mathcal{A})$ of the loss function set $\mathcal{A}$ is defined as

  $R_n(\mathcal{A}) = \mathbb{E}\Big[\sup_{l \in \mathcal{A}} \frac{1}{n}\sum_{i=1}^n \sigma_i \, l(u(x_i, y_i), t_i)\Big],$  (28)

where the $\sigma_i$ are uniform over $\{-1, 1\}$ and $\{(x_i, y_i, t_i)\}_{i=1}^n$ are i.i.d. samples drawn from $p^*$. Another form of Rademacher complexity (Bartlett and Mendelson, 2003) can be written as $R_n(\mathcal{A}) = \mathbb{E}\big[\sup_{l \in \mathcal{A}} \big|\frac{1}{n}\sum_{i=1}^n \sigma_i \, l(u(x_i, y_i), t_i)\big|\big]$. The Rademacher complexity can be utilized to upper bound the estimation error, as shown in Lemma 3.

Lemma 3. (Anthony and Bartlett, 1999; Bartlett and Mendelson, 2003; Liang, 2015) With probability at least $1 - \delta$, for $\gamma \geq \sup_{x, y, t, u} l(u(x, y), t)$,

  $L(\hat{u}) - L(u^*) \leq 4 R_n(\mathcal{A}) + 2\gamma\sqrt{\frac{2\log(2/\delta)}{n}}.$  (29)

Our analysis starts from this lemma, and we seek a further upper bound of $R_n(\mathcal{A})$. The analysis needs an upper bound of the Rademacher complexity of the hypothesis set $\mathcal{U}$, which is given in Lemma 4.

Lemma 4. Let $R_n(\mathcal{U})$ denote the Rademacher complexity of the hypothesis set $\mathcal{U}$; then

  $R_n(\mathcal{U}) \leq \frac{8 B^2(k) B_2^2(k, C) K}{\sqrt{n}}.$  (30)

Proof.
Let $\mathcal{V} = \{v : (x, y) \mapsto (f(x) - f(y))^2,\ f \in \mathcal{H}\}$ denote the set of hypotheses $v(x, y) = (f(x) - f(y))^2$. We have

  $R_n(\mathcal{U}) = \mathbb{E}\Big[\sup_{u \in \mathcal{U}} \Big|\frac{1}{n}\sum_{i=1}^n \sigma_i \sum_{j=1}^K (f_j(x_i) - f_j(y_i))^2\Big|\Big] = \mathbb{E}\Big[\sup_{u \in \mathcal{U}} \Big|\frac{1}{n}\sum_{j=1}^K \sum_{i=1}^n \sigma_i (f_j(x_i) - f_j(y_i))^2\Big|\Big] \leq K\,\mathbb{E}\Big[\sup_{u \in \mathcal{U}} \max_j \Big|\frac{1}{n}\sum_{i=1}^n \sigma_i (f_j(x_i) - f_j(y_i))^2\Big|\Big] = K\,\mathbb{E}\Big[\sup_{v \in \mathcal{V}} \Big|\frac{1}{n}\sum_{i=1}^n \sigma_i (f(x_i) - f(y_i))^2\Big|\Big] = K R_n(\mathcal{V}).$  (31)
Let $\mathcal{G} = \{g : (x, y) \mapsto f(x) - f(y),\ f \in \mathcal{H}\}$ denote the set of hypotheses $g(x, y) = f(x) - f(y)$, and let $h(x) = x^2$; then $R_n(\mathcal{V}) = R_n(h \circ \mathcal{G})$. We have $h(0) = 0$, and $h$ is Lipschitz continuous on the range of $\mathcal{G}$ with Lipschitz constant $L$, which can be bounded as follows:

  $L = \sup_{g \in \mathcal{G}} |h'(g)| = \sup_{f, x, y} 2|f(x) - f(y)| \leq 4 \sup_{f, x} |f(x)| = 4 \sup_{f, x} |\langle f, k(x, \cdot) \rangle| \leq 4 \sup_{f, x} \|f\|\,\|k(x, \cdot)\| \leq 4 B(k) B_2(k, C).$  (32)

According to the composition property of Rademacher complexity (Theorem 12 in Bartlett and Mendelson (2003)), we have

  $R_n(h \circ \mathcal{G}) \leq 4 B(k) B_2(k, C)\, R_n(\mathcal{G}).$  (33)

Now we bound $R_n(\mathcal{G})$:

  $R_n(\mathcal{G}) = \mathbb{E}\Big[\sup_{g \in \mathcal{G}} \Big|\frac{1}{n}\sum_{i=1}^n \sigma_i (f(x_i) - f(y_i))\Big|\Big] = \mathbb{E}\Big[\sup_{g \in \mathcal{G}} \Big|\frac{1}{n}\sum_{i=1}^n \sigma_i \big(\langle f, k(x_i, \cdot) \rangle - \langle f, k(y_i, \cdot) \rangle\big)\Big|\Big] \leq \frac{1}{n}\mathbb{E}\Big[\sup_{g \in \mathcal{G}} \|f\|\,\Big\|\sum_{i=1}^n \sigma_i \big(k(x_i, \cdot) - k(y_i, \cdot)\big)\Big\|\Big] \leq \frac{B(k)}{n}\,\mathbb{E}\Big[\Big\|\sum_{i=1}^n \sigma_i \big(k(x_i, \cdot) - k(y_i, \cdot)\big)\Big\|\Big]$
  $= \frac{B(k)}{n}\,\mathbb{E}_{(x, y)}\Big[\mathbb{E}_{\sigma}\Big[\Big\|\sum_{i=1}^n \sigma_i \big(k(x_i, \cdot) - k(y_i, \cdot)\big)\Big\|\Big]\Big] \leq \frac{B(k)}{n}\,\mathbb{E}_{(x, y)}\Big[\sqrt{\mathbb{E}_{\sigma}\Big[\Big\|\sum_{i=1}^n \sigma_i \big(k(x_i, \cdot) - k(y_i, \cdot)\big)\Big\|^2\Big]}\Big] \quad (\text{concavity of } \sqrt{\cdot})$
  $= \frac{B(k)}{n}\,\mathbb{E}_{(x, y)}\Big[\sqrt{\sum_{i=1}^n \big\|k(x_i, \cdot) - k(y_i, \cdot)\big\|^2}\Big] \quad (\sigma_i \perp \sigma_j \text{ for } i \neq j)$
  $= \frac{B(k)}{n}\,\mathbb{E}_{(x, y)}\Big[\sqrt{\sum_{i=1}^n \big(k(x_i, x_i) + k(y_i, y_i) - 2 k(x_i, y_i)\big)}\Big] \leq \frac{2 B(k) B_2(k, C)}{\sqrt{n}}.$  (34)

Putting Eq.(33) and Eq.(34) together, we have

  $R_n(\mathcal{V}) \leq \frac{8 B^2(k) B_2^2(k, C)}{\sqrt{n}}.$  (35)

Plugging into $R_n(\mathcal{U}) \leq K R_n(\mathcal{V})$ completes the proof.

In addition, we need the following bound on $u(x, y)$.

Lemma 5.

  $\sup_{x, y, u} u(x, y) \leq J,$  (36)

where $J = 4 B^2(k) B_2^2(k, C)\big((K - 1) s(\mathcal{F}) + 1\big)$.

Proof. Let $F = [f_1, \dots, f_K]$. We have

  $u(x, y) = \sum_{j=1}^K (f_j(x) - f_j(y))^2 = \sum_{j=1}^K \big(\langle f_j, k(x, \cdot) \rangle - \langle f_j, k(y, \cdot) \rangle\big)^2 = \big\|F^\top\big(k(x, \cdot) - k(y, \cdot)\big)\big\|^2 \leq \|F\|_{op}^2\,\big\|k(x, \cdot) - k(y, \cdot)\big\|^2 = \|F\|_{op}^2\,\big(k(x, x) + k(y, y) - 2 k(x, y)\big) \leq 4 B_2^2(k, C)\,\|F\|_{op}^2,$  (37)

where $\|\cdot\|_{op}$ denotes the operator norm. Furthermore,

  $\|F\|_{op}^2 = \sup_{\|a\| = 1} \|F a\|^2 = \sup_{\|a\| = 1} \sum_{i=1}^K \sum_{j=1}^K a_i a_j \langle f_i, f_j \rangle \leq \sup_{\|a\| = 1} \sum_{i=1}^K \sum_{j=1}^K |a_i|\,|a_j|\,\|f_i\|\,\|f_j\|\,|\cos\theta_{ij}| \leq B^2(k) \sup_{\|a\| = 1} \Big( \sum_{i=1}^K \sum_{j \neq i} |a_i|\,|a_j|\,s(\mathcal{F}) + \sum_{i=1}^K a_i^2 \Big),$  (38)
where $\theta_{ij}$ is the angle between $f_i$ and $f_j$. Define $\bar{a} = [|a_1|, \dots, |a_K|]^\top$ and $Q \in \mathbb{R}^{K \times K}$ with $Q_{ij} = s(\mathcal{F})$ for $i \neq j$ and $Q_{ii} = 1$; then $\|\bar{a}\| = \|a\|$ and

  $\|F\|_{op}^2 \leq B^2(k) \sup_{\|\bar{a}\| = 1} \bar{a}^\top Q \bar{a} \leq B^2(k) \sup_{\|\bar{a}\| = 1} \lambda_{\max}(Q)\,\|\bar{a}\|^2 = B^2(k)\,\lambda_{\max}(Q),$  (39)

where $\lambda_{\max}(Q)$ is the largest eigenvalue of $Q$. By simple linear algebra we can get $\lambda_{\max}(Q) = (K - 1) s(\mathcal{F}) + 1$, so

  $\|F\|_{op}^2 \leq B^2(k)\big((K - 1) s(\mathcal{F}) + 1\big).$  (40)

Substituting into Eq.(37), we have

  $u(x, y) \leq 4 B^2(k) B_2^2(k, C)\big((K - 1) s(\mathcal{F}) + 1\big).$  (41)

Then $\sup_{x, y, u} u(x, y) \leq J$ with

  $J = 4 B^2(k) B_2^2(k, C)\big((K - 1) s(\mathcal{F}) + 1\big).$  (42)

The proof completes.

Given these lemmas, we proceed to prove Theorem 1. Since $\big|\frac{\partial l(u(x, y), t)}{\partial u(x, y)}\big| = \frac{1}{1 + \exp(t\,u(x, y))} \leq \frac{1}{1 + \exp(-J)}$, $l(u(x, y), t)$ is Lipschitz continuous with respect to the first argument, with constant $\frac{1}{1 + \exp(-J)}$. Applying the composition property of Rademacher complexity, we have

  $R_n(\mathcal{A}) \leq \frac{1}{1 + \exp(-J)}\,R_n(\mathcal{U}).$  (43)

Using Lemma 4, we have

  $R_n(\mathcal{A}) \leq \frac{8 B^2(k) B_2^2(k, C) K}{(1 + \exp(-J))\sqrt{n}}.$  (44)

In addition, $\sup_{x, y, t, u} l(u(x, y), t) \leq \log(1 + \exp(J))$. Substituting this inequality and Eq.(44) into Lemma 3 completes the proof.

4 Proof of Theorem 2

First, we derive an upper bound of the Babel function (Tropp, 2004):

  $\mu_m\big(\{f_i\}_{i=1}^K\big) = \max_{i \in \{1, \dots, K\}}\ \max_{\Lambda \subseteq \{1, \dots, K\} \setminus \{i\},\ |\Lambda| = m}\ \sum_{j \in \Lambda} \big|\langle f_j, f_i \rangle\big| \leq \max_{i}\ \max_{\Lambda}\ \sum_{j \in \Lambda} \|f_i\|\,\|f_j\|\,s(\mathcal{F}) \leq m B^2(k)\,s(\mathcal{F}).$  (45)

In Theorem 4 of Vainsencher et al. (2011), we set the upper bound of $\mu_m(\{f_i\}_{i=1}^K)$ to $m B^2(k) s(\mathcal{F})$ and then get Theorem 2.

5 Visualization

Given the learned RKHS functions $\{f_i\}_{i=1}^K$, we compute a $K \times K$ matrix $S$ where $S_{ij}$ is the absolute value of the cosine similarity between $f_i$ and $f_j$. We then visualize this matrix using a heatmap obtained with the imagesc function in MATLAB. Figure 1 shows the heatmaps of the matrices learned by different methods on the MIMIC-III dataset. The number of RKHS functions was fixed to 100. For all matrices, the diagonal entries equal 1. From the visualization, we observed the following. For the BMD-KDML methods KDML-(SFN,VND,LDD)-(RTR,RFF), the heatmaps have low energy on the off-diagonal entries, which indicates that these matrices are close to an identity matrix and hence the learned RKHS functions are close to being orthogonal.
For the unregularized KDML, the heatmap has high energy on the off-diagonal entries. The matrices of KDML-(DPP,Angle) have lower energy compared with KDML and KDML-SHN, but their energy is higher than that of the BMD-KDML methods.
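The visualization itself is straightforward to reproduce. Below is a minimal sketch with synthetic stand-in functions (random vectors rather than learned RKHS functions), using matplotlib's imshow as an analogue of MATLAB's imagesc:

```python
# Build the K x K matrix S of absolute cosine similarities and display it as
# a heatmap. F is a random stand-in for the learned functions f_i.
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
K, d = 100, 50

F = rng.standard_normal((K, d))           # rows stand in for the functions f_i
G = F @ F.T                               # Gram matrix <f_i, f_j>
norms = np.sqrt(np.diag(G))
S = np.abs(G / np.outer(norms, norms))    # S_ij = |cos(f_i, f_j)|, S_ii = 1

plt.imshow(S, cmap="viridis", vmin=0, vmax=1)  # analogue of MATLAB imagesc
plt.colorbar()
plt.title("Pairwise absolute cosine similarities")
plt.savefig("similarity_heatmap.png", dpi=150)
```

Fixing the color scale to [0, 1] (via vmin/vmax) makes the off-diagonal "energy" comparable across methods, which is what the observations above rely on.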
[Figure 1: Heatmaps of the pairwise cosine similarities. Panels: KDML, KDML-SHN, KDML-DPP, KDML-Angle, KDML-SFN-RTR, KDML-VND-RTR, KDML-LDD-RTR, KDML-SFN-RFF, KDML-VND-RFF, KDML-LDD-RFF.]

References

Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

Peter L. Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2003.

Rudolf Gorenflo, Yuri Luchko, and Francesco Mainardi. Analytical properties and applications of the Wright function. arXiv preprint math-ph/0701069, 2007.

Nicholas J. Higham and Hyun-Min Kim. Solving a quadratic matrix equation by Newton's method with exact line searches. SIAM Journal on Matrix Analysis and Applications, 23(2):303-316, 2001.

Percy Liang. Lecture notes on statistical learning theory, 2015.

Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.

Joel A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231-2242, 2004.

Daniel Vainsencher, Shie Mannor, and Alfred M. Bruckstein. The sample complexity of dictionary learning. Journal of Machine Learning Research, 12:3259-3281, 2011.

P. Xie, Y. Deng, and E. Xing. Diversifying restricted Boltzmann machine for document modeling. In SIGKDD, 2015a.

P. Xie, Y. Deng, and E. Xing. On the generalization error bounds of neural networks under mutual angular regularization. arXiv:1511.07110, 2015b.
More informationIntroduction to Optimization Techniques. How to Solve Equations
Itroductio to Optimizatio Techiques How to Solve Equatios Iterative Methods of Optimizatio Iterative methods of optimizatio Solutio of the oliear equatios resultig form a optimizatio problem is usually
More informationb i u x i U a i j u x i u x j
M ath 5 2 7 Fall 2 0 0 9 L ecture 1 9 N ov. 1 6, 2 0 0 9 ) S ecod- Order Elliptic Equatios: Weak S olutios 1. Defiitios. I this ad the followig two lectures we will study the boudary value problem Here
More information1+x 1 + α+x. x = 2(α x2 ) 1+x
Math 2030 Homework 6 Solutios # [Problem 5] For coveiece we let α lim sup a ad β lim sup b. Without loss of geerality let us assume that α β. If α the by assumptio β < so i this case α + β. By Theorem
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationSupplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting
Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity
More informationTRACES OF HADAMARD AND KRONECKER PRODUCTS OF MATRICES. 1. Introduction
Math Appl 6 2017, 143 150 DOI: 1013164/ma201709 TRACES OF HADAMARD AND KRONECKER PRODUCTS OF MATRICES PANKAJ KUMAR DAS ad LALIT K VASHISHT Abstract We preset some iequality/equality for traces of Hadamard
More informationSolutions for Math 411 Assignment #2 1
Solutios for Math 4 Assigmet #2 A2. For each of the followig fuctios f : C C, fid where f(z is complex differetiable ad where f(z is aalytic. You must justify your aswer. (a f(z = e x2 y 2 (cos(2xy + i
More informationSequences and Limits
Chapter Sequeces ad Limits Let { a } be a sequece of real or complex umbers A ecessary ad sufficiet coditio for the sequece to coverge is that for ay ɛ > 0 there exists a iteger N > 0 such that a p a q
More informationLinear Classifiers III
Uiversität Potsdam Istitut für Iformatik Lehrstuhl Maschielles Lere Liear Classifiers III Blaie Nelso, Tobias Scheffer Cotets Classificatio Problem Bayesia Classifier Decisio Liear Classifiers, MAP Models
More informationOrthogonal Gaussian Filters for Signal Processing
Orthogoal Gaussia Filters for Sigal Processig Mark Mackezie ad Kiet Tieu Mechaical Egieerig Uiversity of Wollogog.S.W. Australia Abstract A Gaussia filter usig the Hermite orthoormal series of fuctios
More informationMath 104: Homework 2 solutions
Math 04: Homework solutios. A (0, ): Sice this is a ope iterval, the miimum is udefied, ad sice the set is ot bouded above, the maximum is also udefied. if A 0 ad sup A. B { m + : m, N}: This set does
More informationInformation-based Feature Selection
Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with
More information6.883: Online Methods in Machine Learning Alexander Rakhlin
6.883: Olie Methods i Machie Learig Alexader Rakhli LECTURES 5 AND 6. THE EXPERTS SETTING. EXPONENTIAL WEIGHTS All the algorithms preseted so far halluciate the future values as radom draws ad the perform
More informationMath 128A: Homework 1 Solutions
Math 8A: Homework Solutios Due: Jue. Determie the limits of the followig sequeces as. a) a = +. lim a + = lim =. b) a = + ). c) a = si4 +6) +. lim a = lim = lim + ) [ + ) ] = [ e ] = e 6. Observe that
More information[ 11 ] z of degree 2 as both degree 2 each. The degree of a polynomial in n variables is the maximum of the degrees of its terms.
[ 11 ] 1 1.1 Polyomial Fuctios 1 Algebra Ay fuctio f ( x) ax a1x... a1x a0 is a polyomial fuctio if ai ( i 0,1,,,..., ) is a costat which belogs to the set of real umbers ad the idices,, 1,...,1 are atural
More informationPAijpam.eu ON TENSOR PRODUCT DECOMPOSITION
Iteratioal Joural of Pure ad Applied Mathematics Volume 103 No 3 2015, 537-545 ISSN: 1311-8080 (prited versio); ISSN: 1314-3395 (o-lie versio) url: http://wwwijpameu doi: http://dxdoiorg/1012732/ijpamv103i314
More informationComplex Analysis Spring 2001 Homework I Solution
Complex Aalysis Sprig 2001 Homework I Solutio 1. Coway, Chapter 1, sectio 3, problem 3. Describe the set of poits satisfyig the equatio z a z + a = 2c, where c > 0 ad a R. To begi, we see from the triagle
More informationMath 61CM - Solutions to homework 3
Math 6CM - Solutios to homework 3 Cédric De Groote October 2 th, 208 Problem : Let F be a field, m 0 a fixed oegative iteger ad let V = {a 0 + a x + + a m x m a 0,, a m F} be the vector space cosistig
More informationStochastic Matrices in a Finite Field
Stochastic Matrices i a Fiite Field Abstract: I this project we will explore the properties of stochastic matrices i both the real ad the fiite fields. We first explore what properties 2 2 stochastic matrices
More informationA note on the Frobenius conditional number with positive definite matrices
Li et al. Joural of Iequalities ad Applicatios 011, 011:10 http://www.jouralofiequalitiesadapplicatios.com/cotet/011/1/10 RESEARCH Ope Access A ote o the Frobeius coditioal umber with positive defiite
More informationLECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK)
LECTURE 8: ORTHOGONALITY (CHAPTER 5 IN THE BOOK) Everythig marked by is ot required by the course syllabus I this lecture, all vector spaces is over the real umber R. All vectors i R is viewed as a colum
More informationSession 5. (1) Principal component analysis and Karhunen-Loève transformation
200 Autum semester Patter Iformatio Processig Topic 2 Image compressio by orthogoal trasformatio Sessio 5 () Pricipal compoet aalysis ad Karhue-Loève trasformatio Topic 2 of this course explais the image
More informationLecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting
Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would
More informationA remark on p-summing norms of operators
A remark o p-summig orms of operators Artem Zvavitch Abstract. I this paper we improve a result of W. B. Johso ad G. Schechtma by provig that the p-summig orm of ay operator with -dimesioal domai ca be
More informationOn Nonsingularity of Saddle Point Matrices. with Vectors of Ones
Iteratioal Joural of Algebra, Vol. 2, 2008, o. 4, 197-204 O Nosigularity of Saddle Poit Matrices with Vectors of Oes Tadeusz Ostrowski Istitute of Maagemet The State Vocatioal Uiversity -400 Gorzów, Polad
More information