MGMT 69000: Topics in High-dimensional Data Analysis                    Fall 2016

Lecture 5: Spectral Clustering: Overview (contd) and Analysis

Lecturer: Jiaming Xu        Scribes: Adarsh Barik, Taotao He        September 3, 2016

Outline:
- Review of Singular Value Decomposition
- Spectral Clustering under Gaussian Mixture Model (continued from previous lecture)
- Analysis of spectral clustering

5.1 Review of Singular Value Decomposition (SVD)

Recall from the previous lecture that the singular value decomposition of a matrix A is

    A = Σ_{i=1}^r σ_i u_i v_i^T,

where σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 are the singular values and u_1, ..., u_r and v_1, ..., v_r are the corresponding left and right singular vectors, respectively. Below we present a summary of the results on the geometric interpretation of the SVD discussed in the previous lecture:

1. The leading right singular vector v_1 of A can be interpreted as the best-fit vector for the rows of A. It is also the leading eigenvector of A^T A.
2. The leading singular value σ_1 measures the total length of the projections of the rows of A onto the linear subspace spanned by v_1: ‖A v_1‖ = σ_1, and ‖A v_1‖^2 is the sum of the squared lengths of these projections.
3. The previous result extends to higher dimensions: the best-fit k-dimensional subspace for the rows of A is given by span{v_1, v_2, ..., v_k}, where

    v_1 = arg max_{‖v‖=1} ‖A v‖,    v_k = arg max_{‖v‖=1, v ⊥ v_1, ..., v_{k-1}} ‖A v‖.

   The collection v_1, v_2, ..., v_k can be chosen as the top-k eigenvectors of A^T A.
4. The left singular vectors are defined as u_i = A v_i / σ_i. Combined with the previous property, this implies that u_i ⊥ u_j if i ≠ j and ‖u_i‖ = 1.
5. A = Σ_i σ_i u_i v_i^T = Σ_{i=1}^r σ_i u_i v_i^T for some r, assuming σ_{r+1} = ... = 0. In this case, let row(A) denote the row space of A. Then row(A) = span{v_1, v_2, ..., v_r}.
6. The Frobenius norm ‖A‖_F satisfies

    ‖A‖_F^2 = Σ_{j=1}^r σ_j^2 = Σ_{j=1}^r ‖A v_j‖^2 = Σ_{i=1}^m ‖A_i‖^2 = Σ_{i,j} A_{ij}^2,

   where A_i is the i-th row of A. The second equality holds because σ_j = ‖A v_j‖, and the third equality holds because of the previous property.
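The facts above can be checked numerically. Below is a minimal pure-Python sketch (the matrix A and the helper computations are illustrative, not from the lecture): power iteration on A^T A recovers the leading right singular vector v_1 (property 1), ‖A v_1‖ equals σ_1 (property 2), and ‖A‖_F^2 equals the sum of the squared singular values, i.e. the trace of A^T A (property 6).

```python
# Sanity check of SVD properties on an arbitrary 3x2 example matrix.
import math

A = [[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]]

# Gram matrix G = A^T A (2x2 here).
G = [[sum(r[a] * r[b] for r in A) for b in range(2)] for a in range(2)]

# Power iteration: v <- G v / ||G v|| converges to the top eigenvector
# of A^T A, which is the leading right singular vector v1.
v = [1.0, 0.0]
for _ in range(200):
    w = [G[0][0] * v[0] + G[0][1] * v[1], G[1][0] * v[0] + G[1][1] * v[1]]
    s = math.hypot(w[0], w[1])
    v = [w[0] / s, w[1] / s]

Av = [r[0] * v[0] + r[1] * v[1] for r in A]
sigma1 = math.sqrt(sum(t * t for t in Av))       # ||A v1|| = sigma1

# Closed-form top eigenvalue of the symmetric 2x2 G, for comparison.
lam1 = (G[0][0] + G[1][1]) / 2 + math.hypot((G[0][0] - G[1][1]) / 2, G[0][1])

frob_sq = sum(x * x for row in A for x in row)   # ||A||_F^2 = sum of A_ij^2
trace_G = G[0][0] + G[1][1]                      # sigma1^2 + sigma2^2

print(abs(sigma1 - math.sqrt(lam1)) < 1e-9, abs(frob_sq - trace_G) < 1e-12)
# True True
```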
5.2 Spectral Clustering under Gaussian Mixture Model

Recall that if we know the cluster mean μ a priori, then we can project our sample points onto span{μ}. This can help us reduce the dimension of the problem (in the simple example of 2 clusters we can reduce d dimensions to 1 dimension). However, in general we do not have prior knowledge of μ. As discussed in the previous lecture, we could try a random projection, but we showed that it does not help. Here, we discuss another projection scheme called spectral projection.

5.2.1 Spectral Projection

We start with our basic example of 2 clusters centered at μ and −μ with variance σ^2. Then we extend the model to the more general case.

Idea. We have been given the following information:

    X = [X_1^T; X_2^T; ...; X_n^T] ∈ R^{n×d},    X_i ~ (1/2) N(μ, σ^2 I_{d×d}) + (1/2) N(−μ, σ^2 I_{d×d})  i.i.d.

Based on this, we can say that

    E[X] = [ μ^T; ...; μ^T; −μ^T; ...; −μ^T ] = z μ^T,

where z ∈ {+1, −1}^n is the vector of cluster labels. Observe that points from the first cluster contribute μ^T rows to E[X] and points from the second cluster contribute −μ^T rows. This matrix can be further decomposed into a vector of {+1, −1} entries and μ^T, which can be treated as the left singular vector and right singular vector (upon normalization) of E[X], respectively. This gives us the intuition that if X is close to E[X], then we would expect the leading right singular vector of X to be close to μ/‖μ‖. Note that E[X] is a rank-1 matrix; however, X may not be rank 1. Now suppose

    X = Σ_{i=1}^r σ_i u_i v_i^T,    so that    X v_1 = σ_1 u_1.
If X is close to E[X], then u_1 will be close to z/√n, the normalized label vector. We can treat the problem of clustering the X_i as the problem of clustering the entries of u_1, and this gives us an algorithm.

5.2.2 Spectral clustering algorithm for k = 2, μ_1 = μ, μ_2 = −μ

1. Compute the leading left singular vector of X; say it is given by u_1.
2. (a) If u_{1,i} < 0, assign X_i to the first cluster.
   (b) If u_{1,i} > 0, assign X_i to the second cluster.
   (c) If u_{1,i} = 0, assign X_i to an arbitrarily chosen cluster.

We can easily generalize our results to a general clustering problem following a Gaussian mixture model with k clusters centered at μ_1, μ_2, ..., μ_k, respectively. We can again check that E[X] factors as

    E[X] = [ 1_{S_1}  1_{S_2}  ...  1_{S_k} ] [ μ_1^T; μ_2^T; ...; μ_k^T ],

where the rows of E[X] belonging to the a-th cluster equal μ_a^T. Column a of the left factor (up to normalization) acts as an indicator vector 1_{S_a} for cluster a; the left factor plays the role of the left singular vectors and the rows μ_a^T the role of the right singular vectors. The matrix of indicators itself is known as the membership matrix. It is easy to extend our previous algorithm to deal with the general case.

5.2.3 Spectral clustering algorithm in the general case

1. Compute the SVD of X, i.e. X = Σ_{i=1}^r σ_i u_i v_i^T.
2. Form U = [u_1, u_2, ..., u_k] ∈ R^{n×k}.
3. Run k-means on the rows of U.

Clustering U is easy if U is close to the membership matrix.

Note: We are treating [μ_1, μ_2, ..., μ_k]^T as right singular vectors up to normalization. However, they may not be orthogonal to each other. Our argument still works because we are only interested in the space spanned by them.
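The k = 2 algorithm above can be sketched end-to-end on synthetic data. In this illustrative example (all parameters, the random seed, and the power-iteration details are assumptions for illustration, not from the notes), we draw rows from the two-cluster mixture in d = 2, check that the leading right singular vector of X is nearly aligned with μ/‖μ‖, and then cluster each point by the sign of the corresponding entry of X v_1 (which is proportional to u_1).

```python
# End-to-end sketch of spectral clustering for k = 2 on synthetic data.
import math
import random

random.seed(1)
n, s, mu = 200, 0.4, [1.0, -0.8]
z = [1.0 if i < n // 2 else -1.0 for i in range(n)]          # true labels
X = [[z[i] * m + random.gauss(0.0, s) for m in mu] for i in range(n)]

# Leading right singular vector v1 via power iteration on X^T X.
G = [[sum(r[a] * r[b] for r in X) for b in range(2)] for a in range(2)]
v = [1.0, 0.0]
for _ in range(100):
    w = [G[0][0] * v[0] + G[0][1] * v[1], G[1][0] * v[0] + G[1][1] * v[1]]
    nw = math.hypot(w[0], w[1])
    v = [w[0] / nw, w[1] / nw]

# v1 should be nearly aligned with mu / ||mu||, as argued above.
nmu = math.hypot(mu[0], mu[1])
align = abs(v[0] * mu[0] + v[1] * mu[1]) / nmu

# u1 is proportional to X v1, so only the signs of the entries matter.
Xv = [r[0] * v[0] + r[1] * v[1] for r in X]
pred = [1.0 if t > 0 else -1.0 for t in Xv]

# u1 is determined only up to a global sign flip, so score both labelings.
agree = sum(p == l for p, l in zip(pred, z))
accuracy = max(agree, n - agree) / n
print(round(align, 3), accuracy)
```

On this instance the alignment is close to 1 and nearly all points are clustered correctly, consistent with the intuition that u_1 ≈ z/√n when X is close to E[X].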
Recall our spectral relaxation of the k-means problem:

    Ŷ = arg max_Y ⟨X X^T, Y⟩    such that    Y = Σ_{i=1}^k w_i w_i^T,  ‖w_i‖ = 1,  w_i ⊥ w_j  ∀ i ≠ j.

The optimal solution of the above is Y* = Σ_{i=1}^k u_i u_i^T = U U^T, where u_1, ..., u_k are the top-k left singular vectors of X and U = [u_1, u_2, ..., u_k]. Notice how U appears in the spectral clustering algorithm as well.

5.3 Analysis of Spectral Clustering

The algorithms mentioned in the previous section depend on our assumption that X is close to E[X]. In this section, we try to quantify this closeness. We will use Davis-Kahan's sin Θ theorem to analyze spectral clustering, but before we move there, we define some notation. Suppose we have two matrices A and B such that B = A + Δ, where Δ is called the perturbation. Suppose A and B have decompositions similar to the SVD, given by

    A = [E_0  E_1] [A_0  0; 0  A_1] [G_0  G_1]^T,
    B = [F_0  F_1] [B_0  0; 0  B_1] [H_0  H_1]^T,

where

    A ∈ R^{m×n},  E = [E_0 E_1] ∈ R^{m×m},  G = [G_0 G_1] ∈ R^{n×n},  E_0 ∈ R^{m×k},  E_1 ∈ R^{m×(m−k)},  G_0 ∈ R^{n×k},  G_1 ∈ R^{n×(n−k)},
    B ∈ R^{m×n},  F = [F_0 F_1] ∈ R^{m×m},  H = [H_0 H_1] ∈ R^{n×n},  F_0 ∈ R^{m×k},  F_1 ∈ R^{m×(m−k)},  H_0 ∈ R^{n×k},  H_1 ∈ R^{n×(n−k)},
    A_0 ∈ R^{k×k},  A_1 ∈ R^{(m−k)×(n−k)},  B_0 ∈ R^{k×k},  B_1 ∈ R^{(m−k)×(n−k)},

and assume

    E E^T = E^T E = I_{m×m},  G G^T = G^T G = I_{n×n},  F^T F = F F^T = I_{m×m},  H^T H = H H^T = I_{n×n}.

Clearly,

    A = E_0 A_0 G_0^T + E_1 A_1 G_1^T,    B = F_0 B_0 H_0^T + F_1 B_1 H_1^T.

In our case, we can view A = E[X] and B = X. Our goal is to define a distance d(E_0, F_0) between E_0 and F_0 and upper bound it as a function of ‖Δ‖. Davis-Kahan's sin Θ theorem helps us do that. But before we move to the actual theorem, we define some specific distances and look into their properties.
5.3.1 Projection distance

Definition 5.1. d_p(E_0, F_0) ≜ ‖E_0 E_0^T − F_0 F_0^T‖.

Lemma 5.1. d_p(E_0, F_0) = ‖F_1^T E_0‖ = ‖E_1^T F_0‖.

Proof. Left for homework.

To get intuition behind the lemma, take a simple example where E_0 and F_0 are one-dimensional unit vectors and F_1 ⊥ F_0. With θ the angle between E_0 and F_0, it is easy to see that

    ‖E_0 E_0^T − F_0 F_0^T‖ = ‖F_1^T E_0‖ = sin θ.

Notice how we can express the projection distance in terms of sin θ. We now generalize this notion and present a way to view the projection distance in terms of principal angles. Write the SVD

    E_0^T F_0 = U cos Θ V^T,    where    Θ = diag(θ_1, ..., θ_k),    cos Θ = diag(cos θ_1, ..., cos θ_k),

with 0 ≤ θ_1 ≤ θ_2 ≤ ... ≤ θ_k ≤ π/2. We can do this because E_0 and F_0 are orthonormal bases, so the singular values of E_0^T F_0 are at most 1. Also note that U, V ∈ O(k), where O(k) is the set of k × k orthonormal matrices. In our one-dimensional example above, E_0^T F_0 = cos θ.
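The one-dimensional picture can be verified numerically. Below is a small pure-Python check (the angle t and the 2x2 helper are illustrative assumptions): for unit vectors E_0 = (1, 0) and F_0 = (cos t, sin t), the spectral norm of E_0 E_0^T − F_0 F_0^T equals sin t, as Lemma 5.1 predicts.

```python
# Check d_p(E0, F0) = sin(t) for one-dimensional subspaces at angle t.
import math

def spec_norm_sym2(a, b, c):
    # Spectral norm of the symmetric 2x2 matrix [[a, b], [b, c]]:
    # max absolute eigenvalue, via the closed-form eigenvalues.
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return max(abs(mid + rad), abs(mid - rad))

t = 0.7  # an arbitrary principal angle in [0, pi/2]
E0 = (1.0, 0.0)
F0 = (math.cos(t), math.sin(t))

# Entries of D = E0 E0^T - F0 F0^T.
a = E0[0] ** 2 - F0[0] ** 2
b = E0[0] * E0[1] - F0[0] * F0[1]
c = E0[1] ** 2 - F0[1] ** 2

d_p = spec_norm_sym2(a, b, c)
print(abs(d_p - math.sin(t)) < 1e-9)
# True
```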
Lemma 5.2. ‖F_1^T E_0‖ = ‖sin Θ‖ = sin θ_k.

Proof.

    ‖F_1^T E_0‖^2 = ‖E_0^T F_1 F_1^T E_0‖
                  = ‖E_0^T (I − F_0 F_0^T) E_0‖
                  = ‖E_0^T E_0 − E_0^T F_0 F_0^T E_0‖
                  = ‖I_{k×k} − U cos Θ V^T V cos Θ U^T‖
                  = ‖I_{k×k} − U cos^2 Θ U^T‖
                  = ‖I_{k×k} − cos^2 Θ‖
                  = ‖sin^2 Θ‖ = sin^2 θ_k.

The second equality holds because F F^T = I_{m×m}, so F_1 F_1^T = I − F_0 F_0^T; the third and fourth equalities use the SVD of E_0^T F_0; and the fifth and sixth equalities hold because left and right multiplication by U and U^T, respectively, only causes a rotation, which does not affect the spectral norm. ∎

5.3.2 Spectral distance

Definition 5.2 (Spectral distance). d_s(E_0, F_0) ≜ min_{Q,R ∈ O(k)} ‖E_0 Q − F_0 R‖ = min_{R ∈ O(k)} ‖E_0 − F_0 R‖.

The equality holds because we can interpret Q and R as rotation matrices. For example, if E_0 and F_0 are unit vectors in R^n (the case k = 1), we only need to multiply one of the two vectors by ±1 to get the quantity to be minimized.

Lemma 5.3. d_s(E_0, F_0) = 2 ‖sin(Θ/2)‖ = 2 sin(θ_k/2).
Proof.

    d_s^2(E_0, F_0) = min_{R ∈ O(k)} ‖E_0 − F_0 R‖^2
                    = min_{R ∈ O(k)} ‖(E_0 − F_0 R)^T (E_0 − F_0 R)‖
                    = min_{R ∈ O(k)} ‖E_0^T E_0 − R^T F_0^T E_0 − E_0^T F_0 R + R^T F_0^T F_0 R‖
                    = min_{R ∈ O(k)} ‖2 I − R^T V cos Θ U^T − U cos Θ V^T R‖
                    = min_{R ∈ O(k)} ‖U^T (2 I − R^T V cos Θ U^T − U cos Θ V^T R) U‖
                    = min_{R ∈ O(k)} ‖2 I − U^T R^T V cos Θ − cos Θ V^T R U‖.

Let R' ≜ V^T R U. Since the product of two orthogonal matrices is also an orthogonal matrix, we have R' ∈ O(k), and minimizing over R is the same as minimizing over R'. Next, we bound the quantity d_s^2(E_0, F_0) on both sides. On the one hand, we have

    d_s^2(E_0, F_0) = min_{R' ∈ O(k)} ‖2 I − (R')^T cos Θ − cos Θ R'‖ ≤ ‖2 I − 2 cos Θ‖ = 2 (1 − cos θ_k) = 4 sin^2(θ_k/2).

The inequality holds by letting R' be a feasible solution, i.e. I_{k×k}. On the other hand, we have

    d_s^2(E_0, F_0) ≥ min_{R' ∈ O(k)} max_{‖x‖=1} x^T (2 I − (R')^T cos Θ − cos Θ R') x
                    ≥ min_{R' ∈ O(k)} e_k^T (2 I − (R')^T cos Θ − cos Θ R') e_k
                    = min_{R' ∈ O(k)} (2 − 2 R'_{kk} cos θ_k)
                    ≥ 2 − 2 cos θ_k = 4 sin^2(θ_k/2).

The second inequality is true by letting x = e_k, and the last holds because R'_{kk} ≤ 1. ∎

Corollary 5.1. d_p(E_0, F_0) ≤ d_s(E_0, F_0) ≤ √2 d_p(E_0, F_0).

Proof. By Lemmas 5.1 and 5.2, we have d_p(E_0, F_0) = sin θ_k = 2 sin(θ_k/2) cos(θ_k/2). From Lemma 5.3, we have d_s(E_0, F_0) = 2 sin(θ_k/2). Since 0 ≤ θ_k ≤ π/2, we have 1/√2 ≤ cos(θ_k/2) ≤ 1. Therefore d_p(E_0, F_0) ≤ d_s(E_0, F_0) ≤ √2 d_p(E_0, F_0). ∎
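Lemma 5.3 and Corollary 5.1 can be checked numerically in the one-dimensional case (a sketch; the angle t is an arbitrary assumption): for unit vectors at principal angle t, the spectral distance min over the sign r of ‖E_0 − r F_0‖ equals 2 sin(t/2), and d_p = sin t is sandwiched as d_p ≤ d_s ≤ √2 d_p.

```python
# Check d_s = 2 sin(t/2) and d_p <= d_s <= sqrt(2) d_p for k = 1.
import math

t = 1.1  # an arbitrary angle in [0, pi/2]
E0 = (1.0, 0.0)
F0 = (math.cos(t), math.sin(t))

def dist(u, v, r):
    # Euclidean distance ||u - r v|| for a sign r in {+1, -1}.
    return math.hypot(u[0] - r * v[0], u[1] - r * v[1])

d_s = min(dist(E0, F0, 1.0), dist(E0, F0, -1.0))
d_p = math.sin(t)

print(abs(d_s - 2 * math.sin(t / 2)) < 1e-9,
      d_p <= d_s <= math.sqrt(2) * d_p)
# True True
```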
5.3.3 Davis-Kahan sin-Θ Theorem

Theorem 5.1 (Davis-Kahan sin-Θ theorem). Let Sval(A_0) and Sval(B_1) be the sets of singular values of A_0 and B_1, respectively. If Sval(A_0) ⊂ [0, α] and Sval(B_1) ⊂ [α + δ, ∞) for some α ≥ 0 and δ > 0, then we have

    d_p(E_0, F_0) ≤ ‖Δ‖ / δ.    (5.1)

In the theorem, δ is called the spectral gap. Before going on to prove the theorem, we discuss an application of the Davis-Kahan sin-Θ theorem to spectral clustering.

Example 5.1 (Application of the D-K sin-Θ theorem to spectral clustering). Recall the two-cluster setting: cluster one is centered at μ ∈ R^d, cluster two at −μ, and X ∈ R^{n×d} is the data matrix. Let A ≜ X, B ≜ E[X], and Δ ≜ X − E[X]. Then we have the SVD of A,

    A = σ_1 u_1 v_1^T + Σ_{i=2}^r σ_i u_i v_i^T,

and

    B = z μ^T = (√n ‖μ‖) · (z/√n) (μ/‖μ‖)^T ≜ (√n ‖μ‖) β (μ/‖μ‖)^T,    β ≜ z/√n,

where z ∈ {+1, −1}^n is the vector of cluster labels. The goal is to derive an upper bound on the distance between β and the singular vector u_1 in terms of ‖X − E[X]‖. We can apply the Davis-Kahan sin-Θ theorem (Theorem 5.1) with E_0 = β, F_0 = u_1, A_1 = diag(σ_2, σ_3, ..., σ_r) and B_0 = √n ‖μ‖, and obtain

    d_p(β, u_1) ≤ ‖X − E[X]‖ / δ,
where a lower bound on δ is given as below. We need to obtain an upper bound on the singular value set {σ_2, ..., σ_r}. From Weyl's theorem, we know

    |σ_i(A) − σ_i(B)| ≤ ‖A − B‖.

Since σ_i(E[X]) = 0 for i ≥ 2, the singular value set {σ_2, ..., σ_r} is bounded by ‖X − E[X]‖. Hence δ ≥ √n ‖μ‖ − ‖X − E[X]‖, and we have

    d_p(β, u_1) ≤ ‖X − E[X]‖ / δ ≤ ‖X − E[X]‖ / (√n ‖μ‖ − ‖X − E[X]‖).

We need one more lemma to prove the Davis-Kahan sin-Θ theorem.

Lemma 5.4. Let P ∈ R^{n×n}, Q ∈ R^{m×m}, X ∈ R^{n×m} and Y ∈ R^{n×m}. Assume ‖P‖ ≤ α and σ_min(Q) ≥ α + δ for some α ∈ R_+ and δ ∈ R_+. Let C ≜ X Q − P Y. Then we have

    ‖C‖ ≥ (α + δ) ‖X‖ − α ‖Y‖.

Proof. First, we have

    ‖C‖ = ‖X Q − P Y‖ ≥ ‖X Q‖ − ‖P Y‖

by the subadditivity of a norm. Then, we derive a lower bound on ‖X Q‖:

    ‖X‖ = ‖X Q Q^{-1}‖ ≤ ‖X Q‖ ‖Q^{-1}‖ ≤ ‖X Q‖ / (α + δ),

where the second inequality holds because for any two matrices A, B we have ‖A B‖ ≤ ‖A‖ ‖B‖, and ‖Q^{-1}‖ = 1/σ_min(Q) ≤ 1/(α + δ). Thus ‖X Q‖ ≥ (α + δ) ‖X‖. We also have an upper bound on ‖P Y‖:

    ‖P Y‖ ≤ α ‖Y‖.

Hence, ‖C‖ ≥ (α + δ) ‖X‖ − α ‖Y‖. ∎

Proof of the Davis-Kahan sin-Θ theorem. Recall

    A = [E_0  E_1] [A_0  0; 0  A_1] [G_0  G_1]^T,    B = [F_0  F_1] [B_0  0; 0  B_1] [H_0  H_1]^T,    Δ = B − A.

Then, since E, F ∈ O(m) and G, H ∈ O(n),

    E_0^T Δ H_1 = E_0^T (B − A) H_1 = E_0^T B H_1 − E_0^T A H_1 = E_0^T F_1 B_1 − A_0 G_0^T H_1.

Letting E_0^T F_1 be X, B_1 be Q, A_0 be P, and G_0^T H_1 be Y, by Lemma 5.4 we have

    ‖E_0^T Δ H_1‖ ≥ (α + δ) ‖E_0^T F_1‖ − α ‖G_0^T H_1‖.

Similarly, we have

    ‖F_1^T Δ G_0‖ ≥ (α + δ) ‖G_0^T H_1‖ − α ‖E_0^T F_1‖.
Let t_1 = ‖G_0^T H_1‖ and t_2 = ‖E_0^T F_1‖. Since ‖E_0^T Δ H_1‖ ≤ ‖Δ‖ and ‖F_1^T Δ G_0‖ ≤ ‖Δ‖, the two inequalities above give

    t_2 ≤ (α t_1 + ‖Δ‖) / (α + δ)    and    t_1 ≤ (α t_2 + ‖Δ‖) / (α + δ).

Applying these to t = max{t_1, t_2} yields t (α + δ) ≤ α t + ‖Δ‖, i.e. t δ ≤ ‖Δ‖. Therefore

    max{t_1, t_2} ≤ ‖Δ‖ / δ.

By Lemma 5.1, d_p(E_0, F_0) = ‖F_1^T E_0‖ = t_2 ≤ ‖Δ‖ / δ. ∎
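The bound in Example 5.1 can be illustrated on one random instance of the two-cluster model. Below is a pure-Python sketch (the parameters, seed, and helper routines are assumptions for illustration): it computes d_p(β, u_1) = sin of the angle between the normalized label vector β = z/√n and the leading left singular vector u_1 of X, and compares it with ‖Δ‖/δ, where Δ = X − E[X] and δ = √n‖μ‖ − ‖Δ‖.

```python
# Numerical illustration of the Davis-Kahan bound for spectral clustering.
import math
import random

random.seed(2)
n, s, mu = 300, 0.25, [0.9, 0.6]
z = [1.0 if i < n // 2 else -1.0 for i in range(n)]
X = [[z[i] * m + random.gauss(0.0, s) for m in mu] for i in range(n)]
D = [[X[i][j] - z[i] * mu[j] for j in range(2)] for i in range(n)]  # Delta

def gram(M):
    # 2x2 Gram matrix M^T M.
    return [[sum(r[a] * r[b] for r in M) for b in range(2)] for a in range(2)]

def top_sval(M):
    # Largest singular value via the top eigenvalue of M^T M (closed form).
    g = gram(M)
    mid = (g[0][0] + g[1][1]) / 2.0
    rad = math.hypot((g[0][0] - g[1][1]) / 2.0, g[0][1])
    return math.sqrt(mid + rad)

def leading_left_vec(M):
    # u1 = M v1 / ||M v1||, with v1 from power iteration on M^T M.
    g = gram(M)
    v = [1.0, 0.0]
    for _ in range(200):
        w = [g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1]]
        nw = math.hypot(w[0], w[1])
        v = [w[0] / nw, w[1] / nw]
    u = [r[0] * v[0] + r[1] * v[1] for r in M]
    nu = math.sqrt(sum(t * t for t in u))
    return [t / nu for t in u]

u1 = leading_left_vec(X)
beta = [zi / math.sqrt(n) for zi in z]
cos_a = abs(sum(b * u for b, u in zip(beta, u1)))
d_p = math.sqrt(max(0.0, 1.0 - cos_a ** 2))     # sin of the angle

nmu = math.hypot(mu[0], mu[1])
delta = math.sqrt(n) * nmu - top_sval(D)        # spectral-gap lower bound
print(d_p <= top_sval(D) / delta)               # the bound holds here
```

In this instance the observed d_p is far below the bound, which is typical: the Davis-Kahan guarantee is a worst-case statement, while a random perturbation usually tilts u_1 away from β much less than ‖Δ‖/δ allows.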
More informationPrinciple Of Superposition
ecture 5: PREIMINRY CONCEP O RUCUR NYI Priciple Of uperpositio Mathematically, the priciple of superpositio is stated as ( a ) G( a ) G( ) G a a or for a liear structural system, the respose at a give
More informationDistributional Similarity Models (cont.)
Distributioal Similarity Models (cot.) Regia Barzilay EECS Departmet MIT October 19, 2004 Sematic Similarity Vector Space Model Similarity Measures cosie Euclidea distace... Clusterig k-meas hierarchical
More informationThe Basic Space Model
The Basic Space Model Let x i be the ith idividual s (i=,, ) reported positio o the th issue ( =,, m) ad let X 0 be the by m matrix of observed data here the 0 subscript idicates that elemets are missig
More informationLecture 12: February 28
10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:
More informationSolutions to home assignments (sketches)
Matematiska Istitutioe Peter Kumli 26th May 2004 TMA401 Fuctioal Aalysis MAN670 Applied Fuctioal Aalysis 4th quarter 2003/2004 All documet cocerig the course ca be foud o the course home page: http://www.math.chalmers.se/math/grudutb/cth/tma401/
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationAxis Aligned Ellipsoid
Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machie Learig Lecturer: Philippe Rigollet Lecture 0 Scribe: Ade Forrow Oct. 3, 05 Recall the followig defiitios from last time: Defiitio: A fuctio K : X X R is called a positive symmetric
More informationECE 901 Lecture 13: Maximum Likelihood Estimation
ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered
More informationSection 1.1. Calculus: Areas And Tangents. Difference Equations to Differential Equations
Differece Equatios to Differetial Equatios Sectio. Calculus: Areas Ad Tagets The study of calculus begis with questios about chage. What happes to the velocity of a swigig pedulum as its positio chages?
More informationRademacher Complexity
EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for
More information6.895 Essential Coding Theory October 20, Lecture 11. This lecture is focused in comparisons of the following properties/parameters of a code:
6.895 Essetial Codig Theory October 0, 004 Lecture 11 Lecturer: Madhu Suda Scribe: Aastasios Sidiropoulos 1 Overview This lecture is focused i comparisos of the followig properties/parameters of a code:
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More informationBertrand s Postulate
Bertrad s Postulate Lola Thompso Ross Program July 3, 2009 Lola Thompso (Ross Program Bertrad s Postulate July 3, 2009 1 / 33 Bertrad s Postulate I ve said it oce ad I ll say it agai: There s always a
More informationTechnical Proofs for Homogeneity Pursuit
Techical Proofs for Homogeeity Pursuit bstract This is the supplemetal material for the article Homogeeity Pursuit, submitted for publicatio i Joural of the merica Statistical ssociatio. B Proofs B. Proof
More informationAgnostic Learning and Concentration Inequalities
ECE901 Sprig 2004 Statistical Regularizatio ad Learig Theory Lecture: 7 Agostic Learig ad Cocetratio Iequalities Lecturer: Rob Nowak Scribe: Aravid Kailas 1 Itroductio 1.1 Motivatio I the last lecture
More information