Supplementary Material: A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data
Jinfeng Yi (JINFENGY@US.IBM.COM), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
Lijun Zhang (ZHANGLJ@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Jun Wang (WANGJUN@US.IBM.COM), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
Rong Jin (RONGJIN@CSE.MSU.EDU) and Anil K. Jain (JAIN@CSE.MSU.EDU), Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA

Theorem 1. Let \delta \le 1/(6m) be a parameter to control the success probability. Assume

    \Delta_1 \le \max\left( \frac{\rho}{2}, \; c_1 \sqrt{s}\,\lambda \right),    (2)

    T \ge \max\left( \frac{8}{\mu_0} \ln\frac{K}{\delta}, \; 3 c_2 \left(\frac{\eta_0}{\lambda}\right)^2, \; 6 c_3 \frac{\sigma^2}{\lambda^2} (\ln n + \ln d) \right),    (3)

where c_1, c_2 and c_3 are some universal constants. Then, with a probability at least 1 - 6m\delta, we have

    \Delta_{m+1} = \max_{1 \le i \le K} \|\hat{c}^{m+1}_i - c_i\| \le \max\left( \frac{\Delta_1}{2^m}, \; c_1 \sqrt{s}\,\lambda \right).

Corollary 1. The convergence rate for \Delta, the maximum difference between the optimal cluster centers and the estimated ones, is O(\sqrt{s \log d / n}) before reaching the optimal difference.

1. Proof of Corollary 1

According to the assumption of \lambda, and since the value of T is dominated by the last term in the right side of (3), we have

    T \lambda^2 \asymp \sigma^2 (\ln n + \ln d), \quad \text{which implies} \quad \sqrt{s}\,\lambda \asymp \sqrt{\frac{s \log d}{T_m}}.

Combining with the conclusion of Theorem 1 about \Delta_{m+1}, we have

    \Delta_{m+1} \lesssim \sqrt{\frac{s \log d}{n}}.
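To see how the bound of Theorem 1 behaves numerically, the following is a minimal sketch; the values of s, d, sigma, T, Delta_1 and the constant c1 are illustrative assumptions, not taken from the paper. The error bound halves at every epoch until it hits the statistical floor of order sqrt(s log d / T), which is exactly the rate stated in Corollary 1.

```python
import math

# Illustrative parameters (assumptions for this sketch, not from the paper)
s, d = 10, 10_000      # sparsity of the centers and ambient dimension
sigma = 1.0            # noise level of the Gaussian mixture
c1 = 1.0               # stand-in for the universal constant c_1
T = 50_000             # number of samples processed per epoch
delta1 = 4.0           # initial estimation error Delta_1

lam = sigma * math.sqrt(math.log(d) / T)  # lambda on the order of sigma * sqrt(log d / T)
floor = c1 * math.sqrt(s) * lam           # optimal difference ~ sqrt(s log d / T)

err = delta1
for m in range(1, 11):
    # Theorem 1: Delta_{m+1} <= max(Delta_1 / 2^m, c1 * sqrt(s) * lambda)
    err = max(err / 2.0, floor)
    print(f"epoch {m:2d}: Delta bound = {err:.4f}   (floor {floor:.4f})")
```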
Lemma 1. Let \Delta_t be the maximum difference between the optimal cluster centers and the ones estimated from iteration t, and \delta \in (0, 1) be the failure probability. Assume

    \Delta_t \le \frac{\rho}{2}, \qquad |S_t| \ge \frac{8}{\mu_0} \ln\frac{K}{\delta}, \qquad \rho - 2\Delta_t \ge 2 (1 + \Delta_t)\, \sigma \sqrt{5 \ln\frac{3K}{\delta}},    (4)

    \lambda \ge c_1 \Delta_t \exp\left( -\frac{(\rho - 2\Delta_t)^2}{8 (1 + \Delta_t)^2 \sigma^2} \right),    (5)

    \lambda \ge c_2 \frac{(\eta_0 + 3\sigma) \ln |S_t|}{|S_t|} + c_3 \sigma \sqrt{\frac{\ln |S_t| + \ln d}{|S_t|}},    (6)

for some constants c_1, c_2 and c_3. Then with a probability at least 1 - 6\delta, we have

    \Delta_{t+1} \le 2 \sqrt{s}\, \lambda_t.

2. Proof of Lemma 1

For the simplicity of analysis, we will drop the superscript t throughout this analysis.

2.1. Preliminaries

We denote by C_k the support of c_k and by \bar{C}_k = [d] \setminus C_k its complement. For any vector z, z(C) is defined as [z(C)]_i = z_i if i \in C, and zero otherwise. For any x_i \in S, we use k_i to denote the index of the true cluster, and \hat{k}_i to denote the index of the cluster assigned by the nearest neighbor search, i.e.,

    x_i = c_{k_i} + g_i \ \text{with}\ g_i \sim N(0, \sigma^2 I), \qquad \hat{k}_i = \arg\max_{j \in [K]} \hat{c}_j^\top x_i.

Then, we can partition the data points in S based on either the ground truth or the assigned cluster. Let S_k be the subset of data points in S that belong to the k-th cluster, i.e.,

    S_k = \{ x_i \in S : x_i = c_k + g_i \ \text{and}\ g_i \sim N(0, \sigma^2 I) \}.    (7)

Let \hat{S}_k be the subset of data points that are assigned to the k-th cluster based on the nearest neighbor search, i.e.,

    \hat{S}_k = \{ x_i \in S : k = \arg\max_{j \in [K]} \hat{c}_j^\top x_i \}.    (8)

2.2. The Main Analysis

Let L_k(c) be the objective function for updating the k-th center in Algorithm 1. We expand L_k(c) as

    L_k(c) = \lambda \|c\|_1 + \frac{1}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k} \|x_i - c\|^2
           = \lambda \|c\|_1 + \|c - c_k\|^2 + \frac{1}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k} \|x_i - c_k\|^2
             - (c - c_k)^\top \underbrace{\frac{2}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k \setminus S_k} (c_{k_i} - c_k)}_{A_k}
             - (c - c_k)^\top \underbrace{\frac{2}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k} g_i}_{B_k}.    (9)
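The expansion in (9) makes clear that each center update is an l1-regularized least-squares problem, whose closed-form minimizer is soft-thresholding of the empirical mean of the assigned points. The sketch below illustrates this; it assumes the update step has exactly the form lambda*||c||_1 + (1/|S_hat_k|) * sum ||x_i - c||^2, and the helper name soft_threshold and all dimensions are illustrative.

```python
import numpy as np

def soft_threshold(v: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of tau*||.||_1: argmin_c tau*|c| + (1/2)(c - v)^2."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# Minimizing L_k(c) = lam * ||c||_1 + (1/n) * sum_i ||x_i - c||^2 is equivalent
# to minimizing lam * ||c||_1 + ||c - m||^2, where m is the mean of the points
# assigned to cluster k, so the minimizer is soft-thresholding of m at lam/2.
rng = np.random.default_rng(0)
lam = 0.3
X_assigned = rng.normal(0.0, 1.0, size=(200, 50))  # points assigned to cluster k
m = X_assigned.mean(axis=0)
c_star = soft_threshold(m, lam / 2.0)

# Sanity check against a brute-force coordinate-wise grid search.
grid = np.linspace(-2, 2, 4001)
obj = lam * np.abs(grid)[None, :] + (grid[None, :] - m[:, None]) ** 2
c_grid = grid[np.argmin(obj, axis=1)]
print("max deviation from grid minimizer:", np.max(np.abs(c_star - c_grid)))
```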
Let c^*_k be the optimal solution that minimizes L_k(c), and define f_k = c^*_k - c_k. We have

    0 \ge L_k(c^*_k) - L_k(c_k)
      = \lambda \|f_k + c_k\|_1 + \|f_k\|^2 - f_k^\top A_k - f_k^\top B_k - \lambda \|c_k\|_1
      \ge -\lambda \|f_k(C_k)\|_1 + \lambda \|f_k(\bar{C}_k)\|_1 + \|f_k\|^2 - f_k^\top A_k - f_k^\top B_k
      \ge -(\lambda + \|A_k\|_\infty + \|B_k\|_\infty) \|f_k(C_k)\|_1 + (\lambda - \|A_k\|_\infty - \|B_k\|_\infty) \|f_k(\bar{C}_k)\|_1 + \|f_k\|^2,

where the second inequality uses that c_k is supported on C_k. Thus, if we have \lambda \ge \|A_k\|_\infty + \|B_k\|_\infty,

    \|f_k\|^2 \le (\lambda + \|A_k\|_\infty + \|B_k\|_\infty) \|f_k(C_k)\|_1 \le 2\lambda \|f_k(C_k)\|_1 \le 2\lambda \sqrt{|C_k|}\, \|f_k(C_k)\| \le 2\lambda \sqrt{|C_k|}\, \|f_k\|,

and thus \|f_k\| \le 2\lambda \sqrt{|C_k|} \le 2\sqrt{s}\,\lambda. In summary, if we have \lambda \ge \|A_k\|_\infty + \|B_k\|_\infty for all k \in [K], then

    \max_{1 \le k \le K} \|c^*_k - c_k\| \le 2\sqrt{s}\,\lambda.

In the following, we discuss how to bound \|A_k\|_\infty and \|B_k\|_\infty.

2.3. Bound for \|A_k\|_\infty

From the definition of A_k in (9), we have

    \|A_k\|_\infty \le \frac{2 \eta_0 |\hat{S}_k \setminus S_k|}{|\hat{S}_k|}.

2.3.1. LOWER BOUND OF |\hat{S}_k|

First, we show that the size of S_k is lower-bounded, which means a significant amount of the data points in S belong to the k-th cluster. Recall that \mu_1, \ldots, \mu_K are the weights of the Gaussian mixture, and \mu_0 = \min_{1 \le i \le K} \mu_i. According to the Chernoff bound (Angluin & Valiant, 1979) provided in Appendix A, we have, with a probability at least 1 - \delta,

    |S_k| \ge \mu_k |S| - \sqrt{2 \mu_k |S| \ln\frac{K}{\delta}} \ge \frac{2}{3} \mu_k |S|, \quad \forall k \in [K],    (10)

where the last step uses the condition |S| \ge (8/\mu_0) \ln(K/\delta) in (4).

Next, we prove that a large fraction of the data points in S_k also belong to \hat{S}_k. We begin by analyzing the probability that the assigned cluster \hat{k}_i of x_i is the true cluster k_i. The similarities between x_i and the estimated cluster centers can be bounded by

    \hat{c}_{k_i}^\top x_i = \hat{c}_{k_i}^\top (c_{k_i} + g_i) = \|c_{k_i}\|^2 + [\hat{c}_{k_i} - c_{k_i}]^\top c_{k_i} + \hat{c}_{k_i}^\top g_i \ge 1 - \Delta - (1 + \Delta) \max_{j \in [K]} \left| \left\langle \frac{\hat{c}_j}{\|\hat{c}_j\|}, g_i \right\rangle \right|,

    \hat{c}_j^\top x_i = \hat{c}_j^\top (c_{k_i} + g_i) = c_j^\top c_{k_i} + [\hat{c}_j - c_j]^\top c_{k_i} + \hat{c}_j^\top g_i \le 1 - \rho + \Delta + (1 + \Delta) \max_{j \in [K]} \left| \left\langle \frac{\hat{c}_j}{\|\hat{c}_j\|}, g_i \right\rangle \right|, \quad j \ne k_i.

Hence, x_i will be assigned to cluster k_i if

    \max_{j \in [K]} \left| \left\langle \frac{\hat{c}_j}{\|\hat{c}_j\|}, g_i \right\rangle \right| \le g_0 := \frac{\rho - 2\Delta}{2(1 + \Delta)},    (11)

and by (4), g_0 \ge \sigma \sqrt{5 \ln(3K/\delta)} \ge \sigma \sqrt{2 \ln(3K/\delta)}. It is easy to verify that for any fixed direction \hat{c} with \|\hat{c}\| = 1, g_i^\top \hat{c} is a Gaussian random variable with mean 0 and variance \sigma^2. Based on the tail bound for the Gaussian distribution (Chang et al., 2011) provided in Appendix B and the union bound, we have

    \Pr\left[ \max_{j \in [K]} \left| \left\langle \frac{\hat{c}_j}{\|\hat{c}_j\|}, g_i \right\rangle \right| \ge g_0 \right] \le 2K \exp\left( -\frac{g_0^2}{2\sigma^2} \right).

Define

    \delta_1 = 2K \exp\left( -\frac{g_0^2}{2\sigma^2} \right) \le \frac{2\delta}{3}.    (12)

In summary, we have proved the following lemma.

Lemma 2. Under the condition in (4), with a probability at least 1 - \delta_1, a point x_i = c_{k_i} + g_i \in S_{k_i} \subseteq S satisfies \max_{j \in [K]} |\langle \hat{c}_j / \|\hat{c}_j\|, g_i \rangle| \le g_0, and is assigned to the correct cluster k_i based on the nearest neighbor search (i.e., \hat{k}_i = k_i).
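As a sanity check on Lemma 2, the following Monte Carlo sketch estimates the misassignment probability of the nearest-neighbor rule and compares it with the union-bound tail 2K*exp(-g0^2/(2*sigma^2)). All parameter values, and the way the estimated centers are perturbed, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, sigma = 5, 200, 0.15
delta_c = 0.05  # perturbation of the estimated centers (plays the role of Delta_t)

# Unit-norm true centers, plus perturbed estimates standing in for c_hat.
C = rng.normal(size=(K, d))
C /= np.linalg.norm(C, axis=1, keepdims=True)
C_hat = C + delta_c * rng.normal(size=(K, d)) / np.sqrt(d)

n = 20_000
labels = rng.integers(0, K, size=n)
X = C[labels] + sigma * rng.normal(size=(n, d))

assigned = np.argmax(X @ C_hat.T, axis=1)  # nearest neighbor search
err_rate = np.mean(assigned != labels)

# Union-bound tail from Lemma 2, with g0 as in (11).
rho = 1.0 - np.max((C @ C.T)[~np.eye(K, dtype=bool)])  # separation 1 - max_{j!=k} <c_j, c_k>
g0 = (rho - 2 * delta_c) / (2 * (1 + delta_c))
bound = 2 * K * np.exp(-g0**2 / (2 * sigma**2))
print(f"empirical misassignment rate {err_rate:.4f} <= bound {min(bound, 1.0):.4f}")
```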
Define

    \tilde{S}_k = \left\{ x_i \in S_k : \max_{j \in [K]} \left| \left\langle \frac{\hat{c}_j}{\|\hat{c}_j\|}, g_i \right\rangle \right| \le g_0 \right\} \subseteq \hat{S}_k \cap S_k.    (13)

Since each data point in S_k has a probability at least 1 - \delta_1 of belonging to the set \tilde{S}_k, using the Chernoff bound again, we have, with a probability at least 1 - \delta,

    |\hat{S}_k| \ge |\hat{S}_k \cap \tilde{S}_k| = |\tilde{S}_k| \ge \mathbb{E}[|\tilde{S}_k|] - \sqrt{2\, \mathbb{E}[|\tilde{S}_k|] \ln\frac{K}{\delta}} \ge (1 - \delta_1) |S_k| - \sqrt{2 |S_k| \ln\frac{K}{\delta}} \ge \frac{1}{3} |S_k|, \quad \forall k \in [K],    (14)

where the last step uses (4), (10) and (12). Combined with (10), this gives |\hat{S}_k| \ge \frac{2}{9} \mu_k |S|.

2.3.2. UPPER BOUND OF |\hat{S}_k \setminus S_k|

Define O_1 = \bigcup_{k=1}^K \tilde{S}_k \subseteq S and O_2 = \bigcup_{k=1}^K (\hat{S}_k \setminus \tilde{S}_k) = S \setminus O_1 \subseteq S. From Lemma 2, we know that with a probability at least 1 - \delta_1, each x_i \in S_k belongs to the set \tilde{S}_k \subseteq O_1. Thus, with probability at least 1 - \delta_1, each x_i \in S belongs to O_1. In other words, with probability at most \delta_1, a point x_i \in S belongs to O_2. Based on the Chernoff bound, we have, with a probability at least 1 - \delta,

    |O_2| \le \mathbb{E}[|O_2|] + 2 \ln\frac{1}{\delta} + \sqrt{2\, \mathbb{E}[|O_2|] \ln\frac{1}{\delta}} \le 2 \delta_1 |S| + 3 \ln\frac{1}{\delta}.    (15)

Since \tilde{S}_k \subseteq S_k, we have \hat{S}_k \setminus S_k \subseteq \hat{S}_k \setminus \tilde{S}_k \subseteq O_2. Therefore, with a probability at least 1 - \delta, we have

    |\hat{S}_k \setminus S_k| \le 2 \delta_1 |S| + 3 \ln\frac{1}{\delta}, \quad \forall k \in [K].    (16)
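The count bound (15)-(16) can be checked by direct simulation: each point lands in O_2 independently with probability at most delta_1, so |O_2| is a Bernoulli sum that concentrates around delta_1*|S|. The sketch below, with illustrative values, compares the simulated distribution of |O_2| against the bound 2*delta_1*|S| + 3*ln(1/delta).

```python
import numpy as np

rng = np.random.default_rng(2)
S_size, delta1, delta = 5_000, 0.01, 0.05
trials = 2_000

# |O_2| is a sum of |S| independent Bernoulli(delta1) indicators.
O2 = rng.binomial(S_size, delta1, size=trials)
bound = 2 * delta1 * S_size + 3 * np.log(1 / delta)

frac_within = np.mean(O2 <= bound)
print(f"bound {bound:.1f}; P[|O2| <= bound] ~= {frac_within:.4f} (should be >= {1 - delta})")
```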
Combining (10), (14) and (16), we have, with probability at least 1 - 3\delta,

    \|A_k\|_\infty \le \frac{2 \eta_0 |\hat{S}_k \setminus S_k|}{|\hat{S}_k|} \le \frac{9 \eta_0}{\mu_k |S|} \left( 2 \delta_1 |S| + 3 \ln\frac{1}{\delta} \right) = O(\delta_1 \eta_0) + O\left( \frac{\eta_0 \ln(1/\delta)}{\mu_0 |S|} \right), \quad \forall k \in [K].    (17)

2.4. Bound for \|B_k\|_\infty

Notice that \{g_i : x_i \in \hat{S}_k\}, determined by the estimated centers \hat{c}_1, \ldots, \hat{c}_K, is a specific subset of \{g_i : x_i \in S\}. Although each g_i is drawn from the Gaussian distribution N(0, \sigma^2 I), the distribution of the elements in \{g_i : x_i \in \hat{S}_k\} is unknown. As a result, we cannot directly apply concentration inequalities for Gaussian random vectors to bound \|B_k\|_\infty.

Let U \in \mathbb{R}^{d \times K} be a matrix whose columns are basis vectors of the subspace spanned by \hat{c}_1, \ldots, \hat{c}_K, and U_\perp \in \mathbb{R}^{d \times (d-K)} be a matrix whose columns are basis vectors of the complementary subspace. We then divide each g_i as

    g_i = g_i^{\parallel} + g_i^{\perp}, \quad \text{where}\ g_i^{\parallel} = U U^\top g_i \ \text{and}\ g_i^{\perp} = U_\perp U_\perp^\top g_i.

First, we upper bound \|B_k\|_\infty as

    \|B_k\|_\infty \le \underbrace{\frac{2}{|\hat{S}_k|} \left\| \sum_{x_i \in \hat{S}_k} g_i^{\perp} \right\|_\infty}_{B_k^1} + \underbrace{\frac{2}{|\hat{S}_k|} \left\| \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} g_i^{\parallel} \right\|}_{B_k^2} + \underbrace{\frac{2}{|\hat{S}_k|} \left\| \sum_{x_i \in \hat{S}_k \cap \tilde{S}_k} g_i^{\parallel} \right\|}_{B_k^3}.    (18)

In the following, we discuss how to bound each term in the right hand side of (18).

2.4.1. UPPER BOUND OF B_k^1

Following the property of Gaussian random vectors, and since the nearest neighbor assignment depends on g_i only through U^\top g_i, the vector U_\perp^\top ( \sum_{x_i \in \hat{S}_k} g_i ) / (\sigma \sqrt{|\hat{S}_k|}) can be treated as a (d-K)-dimensional standard Gaussian random vector. As a result, each element of U_\perp U_\perp^\top ( \sum_{x_i \in \hat{S}_k} g_i ) / (\sigma \sqrt{|\hat{S}_k|}) is a Gaussian random variable with variance smaller than 1. Based on the tail bound for the Gaussian distribution (Chang et al., 2011) provided in Appendix B and the union bound, with a probability at least 1 - \delta, we have

    \frac{1}{\sigma \sqrt{|\hat{S}_k|}} \left\| U_\perp U_\perp^\top \sum_{x_i \in \hat{S}_k} g_i \right\|_\infty \le \sqrt{2 \ln\frac{Kd}{\delta}}, \quad \forall k \in [K],

which implies

    B_k^1 \le 2\sigma \sqrt{\frac{2 \ln(Kd/\delta)}{|\hat{S}_k|}} \le 2\sigma \sqrt{\frac{9 \ln(Kd/\delta)}{\mu_k |S|}} = O\left( \sigma \sqrt{\frac{\ln d}{\mu_0 |S|}} \right), \quad \forall k \in [K],    (19)

where the second inequality uses (10) and (14).
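The splitting of each noise vector into its component inside span(c_hat_1, ..., c_hat_K) and the orthogonal complement can be realized with a QR factorization. Below is a minimal sketch with illustrative sizes; the point of the construction is that U_perp^T g is untouched by the assignment rule, which only examines projections onto the span of the estimated centers.

```python
import numpy as np

rng = np.random.default_rng(3)
d, K, sigma = 100, 4, 0.5
C_hat = rng.normal(size=(d, K))  # estimated centers as columns (illustrative)

# Orthonormal basis U of span(c_hat_1, ..., c_hat_K) and a basis U_perp of its complement.
Q, _ = np.linalg.qr(C_hat, mode="complete")  # Q is a d x d orthogonal matrix
U, U_perp = Q[:, :K], Q[:, K:]

g = sigma * rng.normal(size=d)
g_par = U @ (U.T @ g)             # component inside the span of the estimated centers
g_perp = U_perp @ (U_perp.T @ g)  # component in the complementary subspace

assert np.allclose(g, g_par + g_perp)                      # g = g^par + g^perp
assert np.isclose(g @ g, g_par @ g_par + g_perp @ g_perp)  # Pythagoras
# argmax_j <c_hat_j, x> depends on g only through U^T g, since each c_hat_j
# lies in the column span of U; U_perp^T g plays no role in the assignment.
print("||g_par|| =", np.linalg.norm(g_par), " ||g_perp|| =", np.linalg.norm(g_perp))
```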
2.4.2. UPPER BOUND OF B_k^2

First, we have

    \left\| \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} g_i^{\parallel} \right\| = \left\| \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} U U^\top g_i \right\| \le \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} \| U^\top g_i \|.    (20)

Since U^\top g_i / \sigma can be treated as a K-dimensional standard Gaussian random vector, based on the tail bound for the \chi^2 distribution (Laurent & Massart, 2000), we have, with a probability at least 1 - \delta,

    \| U^\top g_i \| \le \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{1}{\delta}} \right).

Applying the union bound again, with a probability at least 1 - \delta, we have

    \max_i \| U^\top g_i \| \le \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right).    (21)

Combining (20) and (21) with (15), we have

    B_k^2 \le \frac{2 |\hat{S}_k \setminus \tilde{S}_k|}{|\hat{S}_k|}\, \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right) \le \frac{9 \sigma}{\mu_k |S|} \left( 2 \delta_1 |S| + 3 \ln\frac{1}{\delta} \right) \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right)
         = O\left( \delta_1 \sigma \left( \sqrt{K} + \sqrt{\log\frac{|S|}{\delta}} \right) \right) + O\left( \frac{\sigma \sqrt{K \log(|S|/\delta)} \ln(1/\delta)}{\mu_0 |S|} \right), \quad \forall k \in [K].    (22)

2.4.3. UPPER BOUND OF B_k^3

First, we have

    \frac{1}{|\hat{S}_k|} \left\| \sum_{x_i \in \hat{S}_k \cap \tilde{S}_k} g_i^{\parallel} \right\| = \frac{|\hat{S}_k \cap \tilde{S}_k|}{|\hat{S}_k|} \left\| U \cdot \frac{1}{|\hat{S}_k \cap \tilde{S}_k|} \sum_{x_i \in \hat{S}_k \cap \tilde{S}_k} U^\top g_i \right\| \le \left\| \frac{1}{|\hat{S}_k \cap \tilde{S}_k|} \sum_{x_i \in \hat{S}_k \cap \tilde{S}_k} U^\top g_i \right\| := u_k.    (23)

Recall the definition of \tilde{S}_k in (13). Due to the fact that the constraining domain is symmetric, we have \mathbb{E}[U^\top g_i] = 0. Under the condition in (21), we can invoke the following lemma to bound u_k.

Lemma 3 (Lemma 2 from (Smale & Zhou, 2007)). Let H be a Hilbert space and \xi be a random variable on (Z, \rho) with values in H. Assume \|\xi\| \le M < \infty almost surely. Denote \sigma^2(\xi) = \mathbb{E}[\|\xi\|^2]. Let \{z_i\}_{i=1}^m be independent random drawers of \rho. For any 0 < \delta < 1, with confidence 1 - \delta,

    \left\| \frac{1}{m} \sum_{i=1}^m (\xi_i - \mathbb{E}[\xi_i]) \right\| \le \frac{2 M \ln(2/\delta)}{m} + \sqrt{\frac{2 \sigma^2(\xi) \ln(2/\delta)}{m}}.

From Lemma 3 and the union bound, with a probability at least 1 - \delta, we have

    u_k \le \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right) \left( \frac{2 \ln(2K/\delta)}{|\hat{S}_k \cap \tilde{S}_k|} + \sqrt{\frac{2 \ln(2K/\delta)}{|\hat{S}_k \cap \tilde{S}_k|}} \right), \quad \forall k \in [K].    (24)

Combining (23) and (24) with (10) and (14), we have

    B_k^3 \le 2 \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right) \left( \frac{9 \ln(2K/\delta)}{\mu_k |S|} + \sqrt{\frac{9 \ln(2K/\delta)}{\mu_k |S|}} \right) = O\left( \sigma \sqrt{\frac{K \ln(K/\delta) \log(|S|/\delta)}{\mu_0 |S|}} \right), \quad \forall k \in [K].    (25)

In summary, under the condition that (10), (14) and (15) are true, we have, with a probability at least 1 - 3\delta,

    \|B_k\|_\infty \le O\left( \delta_1 \sigma \left( \sqrt{K} + \sqrt{\log\frac{|S|}{\delta}} \right) \right) + O\left( \sigma \sqrt{\frac{(\ln|S| + \ln d) \ln(K/\delta)}{\mu_0 |S|}} \right), \quad \forall k \in [K].    (26)

A. Chernoff Bound

Theorem 2 (Multiplicative Chernoff Bound (Angluin & Valiant, 1979)). Let X_1, X_2, \ldots, X_n be independent binary random variables with \Pr[X_i = 1] = p_i. Denote S = \sum_{i=1}^n X_i and \mu = \mathbb{E}[S] = \sum_{i=1}^n p_i. We have

    \Pr[S \le (1 - \varepsilon) \mu] \le \exp\left( -\frac{\varepsilon^2 \mu}{2} \right), \quad \text{for}\ 0 < \varepsilon < 1,

    \Pr[S \ge (1 + \varepsilon) \mu] \le \exp\left( -\frac{\varepsilon^2 \mu}{2 + \varepsilon} \right), \quad \text{for}\ \varepsilon > 0.

Therefore,

    \Pr\left[ S \le \mu - \sqrt{2 \mu \ln\frac{1}{\delta}} \right] \le \delta, \quad \text{for}\ \exp\left( -\frac{\mu}{2} \right) < \delta < 1,

    \Pr\left[ S \ge \mu + 2 \ln\frac{1}{\delta} + \sqrt{2 \mu \ln\frac{1}{\delta}} \right] \le \delta, \quad \text{for}\ 0 < \delta < 1.
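A quick numerical sanity check of Theorem 2, with illustrative parameters: simulate sums of independent Bernoulli variables and compare the empirical tails with the two multiplicative Chernoff bounds stated above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, eps, trials = 1_000, 0.3, 0.2, 200_000

S = rng.binomial(n, p, size=trials)  # each draw is a sum of n Bernoulli(p) variables
mu = n * p

lower_emp = np.mean(S <= (1 - eps) * mu)
upper_emp = np.mean(S >= (1 + eps) * mu)
lower_bnd = np.exp(-eps**2 * mu / 2)
upper_bnd = np.exp(-eps**2 * mu / (2 + eps))

print(f"P[S <= (1-eps)mu] ~= {lower_emp:.2e} <= {lower_bnd:.2e}")
print(f"P[S >= (1+eps)mu] ~= {upper_emp:.2e} <= {upper_bnd:.2e}")
```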
B. Tail bounds for the Gaussian distribution

Theorem 3 (Chernoff-type upper bound for the Q-function (Chang et al., 2011)). The Q-function, defined as

    Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp\left( -\frac{t^2}{2} \right) dt,

is the tail probability of the standard Gaussian distribution. When x > 0, we have

    Q(x) \le \frac{1}{2} \exp\left( -\frac{x^2}{2} \right).

Let X \sim N(0, 1) be a Gaussian random variable. According to Theorem 3, we have

    \Pr[|X| \ge \varepsilon] \le \exp\left( -\frac{\varepsilon^2}{2} \right), \quad \text{or equivalently,} \quad \Pr\left[ |X| \le \sqrt{2 \ln\frac{1}{\delta}} \right] \ge 1 - \delta.

References

Angluin, D. and Valiant, L. G. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18(2):155-193, 1979.

Chang, S.-H., Cosman, P. C., and Milstein, L. B. Chernoff-type bounds for the Gaussian error function. IEEE Transactions on Communications, 59(11):2939-2944, 2011.

Laurent, B. and Massart, P. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302-1338, 2000.

Smale, S. and Zhou, D.-X. Learning theory estimates via integral operators and their approximations. Constructive Approximation, 26(2):153-172, 2007.