Supplementary Material: A Single-Pass Algorithm for Efficiently Recovering Sparse Cluster Centers of High-dimensional Data


Jinfeng Yi (JINFENGY@US.IBM.COM), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
Lijun Zhang (ZHANGLJ@LAMDA.NJU.EDU.CN), National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Jun Wang (WANGJUN@US.IBM.COM), IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
Rong Jin (RONGJIN@CSE.MSU.EDU) and Anil K. Jain (JAIN@CSE.MSU.EDU), Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA

Theorem 1. Let $\delta \le 1/(6m)$ be a parameter to control the success probability. Assume

$$\Delta_1 \le \frac{\rho}{2}, \qquad \lambda \le \frac{\Delta_1}{2\sqrt{s}}, \qquad (2)$$

$$T \ge \frac{18}{\mu_0} \max\left( \ln\frac{K}{\delta},\ \frac{3 c_2^2 \eta_0^2}{\lambda^2},\ \frac{6 c_3^2 \sigma^2}{\lambda^2} (\ln n + \ln d) \right), \qquad (3)$$

where $c_1$, $c_2$ and $c_3$ are some universal constants. Then, with a probability at least $1 - 6m\delta$, we have

$$\Delta_{m+1} = \max_{1 \le i \le K} \left\| \hat{c}^{\,m+1}_i - c_i \right\|_2 \le \max\left( \frac{\Delta_1}{2^m},\ 2 c_1 \sqrt{s}\, \lambda \right).$$

Corollary 2. The convergence rate for $\Delta$, the maximum difference between the optimal cluster centers and the estimated ones, is $O(\sqrt{s \log d / n})$ before reaching the optimal difference.

1. Proof of Corollary 2

According to the assumption on $\lambda$ in (2), $\lambda$ can be chosen as small as the constraints permit. Since the value of $T$ is dominated by the last term on the right-hand side of (3), we have $T = \Theta\big(\sigma^2 (\ln n + \ln d) / (\mu_0 \lambda^2)\big)$, which implies

$$\sqrt{s}\, \lambda = \Theta\left( \sigma \sqrt{\frac{s (\ln n + \ln d)}{\mu_0 T}} \right) = O\left( \sqrt{\frac{s \log d}{T}} \right).$$

Combining this with the conclusion $\Delta_{m+1} \le \max(\Delta_1 / 2^m,\ 2 c_1 \sqrt{s}\, \lambda)$ of Theorem 1, and noting that the $n$ streamed data points are split over the $m$ iterations so that $T = \Theta(n/m)$, we have, once $\Delta_1 / 2^m$ drops below the second term,

$$\Delta_{m+1} = O\left( \sqrt{\frac{m\, s \log d}{n}} \right) = \tilde{O}\left( \sqrt{\frac{s \log d}{n}} \right).$$
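To make the statement of Theorem 1 and Corollary 2 concrete, the following minimal sketch (not from the paper) iterates the error recursion $\Delta_{t+1} = \max(\Delta_t/2,\ 2c_1\sqrt{s}\,\lambda)$ implied by Theorem 1. The constant $c_1 = 1$, the choice $m \approx \log_2 n$, and the setting of $\lambda$ from the last term of (3) are all illustrative assumptions.

```python
# Sketch: iterate Delta_{t+1} = max(Delta_t / 2, 2 * c1 * sqrt(s) * lam) and
# report the floor 2 * c1 * sqrt(s) * lam = O(sqrt(s * log(d) / n)).
# c1, m, and lam below are illustrative assumptions, not the paper's constants.
import math

def error_floor(s, d, n, sigma=1.0, c1=1.0):
    """lam chosen so the last term of (3) is tight with T = n / m points per pass."""
    m = max(1, int(math.log2(n)))      # assume m ~ log2(n) iterations
    T = n / m                          # stream budget per iteration
    lam = sigma * math.sqrt((math.log(n) + math.log(d)) / T)
    return 2 * c1 * math.sqrt(s) * lam

def iterate(delta1, s, d, n, iters=30):
    floor = error_floor(s, d, n)
    delta = delta1
    for _ in range(iters):
        delta = max(delta / 2, floor)  # halve until the floor is reached
    return delta, floor

delta, floor = iterate(delta1=1.0, s=10, d=10_000, n=1_000_000)
print(f"final error {delta:.4f}; floor 2*sqrt(s)*lam = {floor:.4f}")
```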

Lemma 1. Let $\Delta_t$ be the maximum difference between the optimal cluster centers and the ones estimated at iteration $t$, and let $\delta \in (0, 1)$ be the failure probability. Assume

$$\Delta_t \le \frac{\rho}{2} - \sigma \sqrt{5 \ln\frac{3K}{\delta}}, \qquad |S_t| \ge \frac{18}{\mu_0} \ln\frac{K}{\delta}, \qquad (4)$$

$$\lambda_t \ge c_1 \exp\left( -\frac{(\rho - 2\Delta_t)^2}{8 (1 + \Delta_t)^2 \sigma^2} \right) \left( \eta_0 + \sigma \sqrt{2 \ln |S_t|} \right) \qquad (5)$$

$$\phantom{\lambda_t \ge}\ + c_2\, \eta_0 \sqrt{\frac{\ln(K/\delta)}{|S_t|}} + c_3\, \sigma \sqrt{\frac{\ln(K/\delta) + \ln d}{|S_t|}}, \qquad (6)$$

for some constants $c_1$, $c_2$ and $c_3$. Then, with a probability at least $1 - 6\delta$, we have

$$\Delta_{t+1} \le 2 \sqrt{s}\, \lambda_t.$$

2. Proof of Lemma 1

For simplicity of the analysis, we drop the superscript $t$ throughout.

2.1. Preliminaries

We denote by $C_k$ the support of $c_k$ and by $\bar{C}_k = [d] \setminus C_k$ its complement. For any vector $z$, $z(C)$ is defined as $[z(C)]_i = z_i$ if $i \in C$ and zero otherwise. For any $x_i \in S$, we use $k_i$ to denote the index of the true cluster, and $\hat{k}_i$ to denote the index of the cluster assigned by the nearest neighbor search, i.e.,

$$x_i = c_{k_i} + g_i \ \text{ with } g_i \sim \mathcal{N}(0, \sigma^2 I), \qquad \hat{k}_i = \arg\max_{j \in [K]} \hat{c}_j^\top x_i.$$

Then we can partition the data points in $S$ based on either the ground truth or the assigned cluster. Let $S_k$ be the subset of data points in $S$ that belong to the $k$-th cluster, i.e.,

$$S_k = \left\{ x_i \in S : x_i = c_k + g_i,\ g_i \sim \mathcal{N}(0, \sigma^2 I) \right\}. \qquad (7)$$

Let $\hat{S}_k$ be the subset of data points that are assigned to the $k$-th cluster by the nearest neighbor search, i.e.,

$$\hat{S}_k = \left\{ x_i \in S : k = \arg\max_{j \in [K]} \hat{c}_j^\top x_i \right\}. \qquad (8)$$

2.2. The Main Analysis

Let $L_k(c)$ be the objective function in Step 2 of Algorithm 1. We expand $L_k(c)$ as

$$L_k(c) = \lambda \|c\|_1 + \frac{1}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k} \|x_i - c\|_2^2 = \lambda \|c\|_1 + \|c - c_k\|_2^2 - (c - c_k)^\top A_k - (c - c_k)^\top B_k + \text{const}, \qquad (9)$$

where

$$A_k = \frac{2}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k \setminus S_k} (c_{k_i} - c_k), \qquad B_k = \frac{2}{|\hat{S}_k|} \sum_{x_i \in \hat{S}_k} g_i.$$
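Before analyzing the minimizer of $L_k(c)$, here is a minimal numerical sketch of the center update itself. Assuming, as in (9), that Step 2 of Algorithm 1 solves $\min_c\ \lambda\|c\|_1 + \frac{1}{|\hat{S}_k|}\sum_{x_i\in\hat{S}_k}\|x_i - c\|_2^2$, setting the subgradient to zero gives coordinate-wise soft-thresholding of the cluster mean at level $\lambda/2$; the dimensions and the value of $\lambda$ below are illustrative.

```python
# Sketch of the sparse center update analyzed in Section 2.2: the minimizer of
# lam * ||c||_1 + (1/n_k) * sum_i ||x_i - c||^2 is the soft-thresholded mean
# (threshold lam / 2 under this scaling of the quadratic term).
import numpy as np

def sparse_center(X, lam):
    """X: (n_k, d) points assigned to one cluster; returns the sparse center."""
    mean = X.mean(axis=0)
    return np.sign(mean) * np.maximum(np.abs(mean) - lam / 2, 0.0)

rng = np.random.default_rng(0)
d, s, n_k, sigma = 1000, 10, 200, 0.5
c_true = np.zeros(d)
c_true[:s] = 1.0                                      # s-sparse true center
X = c_true + sigma * rng.standard_normal((n_k, d))    # x_i = c_k + g_i
lam = 2 * sigma * np.sqrt(np.log(d) / n_k)            # illustrative lambda
print("recovery error:", np.linalg.norm(sparse_center(X, lam) - c_true))
```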

Let $c_k^*$ be the optimal solution that minimizes $L_k(c)$, and define $f_k = c_k^* - c_k$. We have

$$0 \ge L_k(c_k^*) - L_k(c_k) = \lambda \|c_k + f_k\|_1 - \lambda \|c_k\|_1 + \|f_k\|_2^2 - f_k^\top A_k - f_k^\top B_k$$

$$\ge \lambda \|f_k(\bar{C}_k)\|_1 - \lambda \|f_k(C_k)\|_1 + \|f_k\|_2^2 - f_k^\top A_k - f_k^\top B_k$$

$$\ge \left( \lambda - \|A_k\|_\infty - \|B_k\|_\infty \right) \|f_k(\bar{C}_k)\|_1 - \left( \lambda + \|A_k\|_\infty + \|B_k\|_\infty \right) \|f_k(C_k)\|_1 + \|f_k\|_2^2.$$

Thus, if we have $\lambda \ge 2(\|A_k\|_\infty + \|B_k\|_\infty)$, then

$$\|f_k\|_2^2 \le \left( \lambda + \|A_k\|_\infty + \|B_k\|_\infty \right) \|f_k(C_k)\|_1 \le \frac{3\lambda}{2} \sqrt{|C_k|}\, \|f_k\|_2 \le 2 \lambda \sqrt{|C_k|}\, \|f_k\|_2,$$

and thus $\|f_k\|_2 \le 2 \lambda \sqrt{|C_k|} \le 2 \sqrt{s}\, \lambda$. In summary, if we have $\lambda \ge 2(\|A_k\|_\infty + \|B_k\|_\infty)$ for all $k \in [K]$, then

$$\max_{k \in [K]} \|c_k^* - c_k\|_2 \le 2 \sqrt{s}\, \lambda.$$

In the following, we discuss how to bound $A_k$ and $B_k$.

2.3. Bound for $A_k$

From the definition of $A_k$ in (9) and the fact that $\|c_{k_i} - c_k\|_2 \le \eta_0$, we have

$$\|A_k\|_\infty \le \|A_k\|_2 \le \frac{2 \eta_0\, |\hat{S}_k \setminus S_k|}{|\hat{S}_k|}.$$

2.3.1. LOWER BOUND OF $|\hat{S}_k \cap S_k|$

First, we show that the size of $S_k$ is lower-bounded, i.e., a significant fraction of the data points in $S$ belong to the $k$-th cluster. Recall that $\mu_1, \ldots, \mu_K$ are the weights of the Gaussian mixture, and $\mu_0 = \min_{1 \le i \le K} \mu_i$. According to the Chernoff bound (Angluin & Valiant, 1979) provided in Appendix A, we have, with a probability at least $1 - \delta$,

$$|S_k| \ge |S| \mu_k - \sqrt{2 |S| \mu_k \ln\frac{K}{\delta}} \ge \frac{2}{3} |S| \mu_k, \quad k \in [K], \qquad (10)$$

where the last step uses the lower bound on $|S|$ in (4).
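The lower bound (10) is easy to check by simulation. The sketch below (the number of clusters, weights, and sample sizes are illustrative choices) draws multinomial cluster sizes and estimates how often $|S_k|$ falls below $|S|\mu_k - \sqrt{2|S|\mu_k\ln(K/\delta)}$; at this sample size the threshold itself sits above $\frac{2}{3}|S|\mu_k$, as required by (4).

```python
# Monte-Carlo check of (10): cluster sizes |S_k| drawn from a multinomial rarely
# fall below |S| mu_k - sqrt(2 |S| mu_k ln(K / delta)). Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(1)
K, n, delta, trials = 4, 2000, 0.05, 10_000
mu = np.full(K, 1.0 / K)                               # equal mixture weights
counts = rng.multinomial(n, mu, size=trials)           # rows of cluster sizes
threshold = n * mu[0] - np.sqrt(2 * n * mu[0] * np.log(K / delta))
print(f"empirical failure rate {np.mean(counts[:, 0] < threshold):.4f}"
      f" vs delta/K = {delta / K:.4f}")
print(f"(2/3)|S|mu_k = {2 * n * mu[0] / 3:.1f} <= threshold = {threshold:.1f}")
```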

Next, we prove that a large fraction of the data points in $S_k$ also belong to $\hat{S}_k$. We begin by analyzing the probability that the assigned cluster $\hat{k}_i$ of $x_i$ is the true cluster $k_i$. The similarities between $x_i$ and the estimated cluster centers can be bounded by

$$\hat{c}_{k_i}^\top x_i = \hat{c}_{k_i}^\top (c_{k_i} + g_i) = \|c_{k_i}\|_2^2 + (\hat{c}_{k_i} - c_{k_i})^\top c_{k_i} + \hat{c}_{k_i}^\top g_i \ge 1 - \Delta - |\hat{c}_{k_i}^\top g_i|,$$

$$\hat{c}_j^\top x_i = \hat{c}_j^\top (c_{k_i} + g_i) = c_j^\top c_{k_i} + (\hat{c}_j - c_j)^\top c_{k_i} + \hat{c}_j^\top g_i \le 1 - \rho + \Delta + |\hat{c}_j^\top g_i|, \quad j \ne k_i,$$

where we use $\|c_k\|_2 = 1$ and the separation condition $c_j^\top c_k \le 1 - \rho$ for $j \ne k$. Hence, $x_i$ will be assigned to cluster $k_i$ if

$$1 - \Delta - |\hat{c}_{k_i}^\top g_i| \ge 1 - \rho + \Delta + |\hat{c}_j^\top g_i|, \quad \forall j \ne k_i, \qquad (11)$$

which leads to the following sufficient condition:

$$\max_{j \in [K]} |\hat{c}_j^\top g_i| \le g_0 := \frac{\rho - 2\Delta}{2} \ge \sigma \sqrt{5 \ln\frac{3K}{\delta}},$$

where the last inequality follows from the assumption in (4). It is easy to verify that for any fixed direction $\hat{c}$ with $\|\hat{c}\|_2 = 1$, $\hat{c}^\top g_i$ is a Gaussian random variable with mean $0$ and variance $\sigma^2$. Based on the tail bound for the Gaussian distribution (Chang et al., 2011) provided in Appendix B, we have

$$\Pr\left[ \max_{j \in [K]} |\hat{c}_j^\top g_i| \ge g_0 \right] \le 2K \exp\left( -\frac{g_0^2}{2\sigma^2} \right).$$

Define

$$\delta_1 = 2K \exp\left( -\frac{g_0^2}{2\sigma^2} \right) \le \frac{2\delta}{3}. \qquad (12)$$

In summary, we have proved the following lemma.

Lemma 2. Under the condition in (4), with a probability at least $1 - \delta_1$, a point $x_i = c_{k_i} + g_i \in S_{k_i} \subseteq S$ satisfies $\max_{j \in [K]} |\hat{c}_j^\top g_i| \le g_0$, and is assigned to the correct cluster $k_i$ by the nearest neighbor search (i.e., $\hat{k}_i = k_i$).

Define

$$\tilde{S}_k = \left\{ x_i \in S_k : \max_{j \in [K]} |\hat{c}_j^\top g_i| \le g_0 \right\} \subseteq \hat{S}_k \cap S_k. \qquad (13)$$

Since each data point in $S_k$ has a probability at least $1 - \delta_1$ of falling in the set $\tilde{S}_k$, using the Chernoff bound again, we have, with a probability at least $1 - \delta$,

$$|\hat{S}_k \cap S_k| \ge |\tilde{S}_k| \ge \mathrm{E}[|\tilde{S}_k|] - \sqrt{2\, \mathrm{E}[|\tilde{S}_k|] \ln\frac{K}{\delta}} \ge (1 - \delta_1) |S_k| - \sqrt{2 |S_k| \ln\frac{K}{\delta}} \ge \frac{1}{3} |S_k| \ge \frac{2}{9} \mu_k |S|, \quad k \in [K], \qquad (14)$$

where the last two steps use (12), (10) and the lower bound on $|S|$ in (4).

2.3.2. UPPER BOUND OF $|\hat{S}_k \setminus S_k|$

Define $O_1 = \bigcup_{k=1}^K \tilde{S}_k \subseteq S$ and $O_2 = \bigcup_{k=1}^K (\hat{S}_k \setminus \tilde{S}_k) = S \setminus O_1 \subseteq S$. From Lemma 2, we know that with a probability at least $1 - \delta_1$, each $x_i \in S_k$ belongs to the set $\tilde{S}_k \subseteq O_1$. Thus, with probability at least $1 - \delta_1$, each $x_i \in S$ belongs to $O_1$; in other words, with probability at most $\delta_1$, $x_i$ belongs to $O_2$. Based on the Chernoff bound, we have, with a probability at least $1 - \delta$,

$$|O_2| \le \mathrm{E}[|O_2|] + \ln\frac{1}{\delta} + \sqrt{2\, \mathrm{E}[|O_2|] \ln\frac{1}{\delta}} \le 2 \delta_1 |S| + 2 \ln\frac{1}{\delta}. \qquad (15)$$

Since $\tilde{S}_k \subseteq S_k$, we have $\hat{S}_k \setminus S_k \subseteq \hat{S}_k \setminus \tilde{S}_k \subseteq O_2$. Therefore, with a probability at least $1 - \delta$, we have

$$|\hat{S}_k \setminus S_k| \le 2 \delta_1 |S| + 2 \ln\frac{1}{\delta}, \quad k \in [K]. \qquad (16)$$

Combining (10), (14) and (16), we have, with a probability at least $1 - 3\delta$,

$$\|A_k\|_\infty \le \frac{2 \eta_0 \left( 2 \delta_1 |S| + 2 \ln(1/\delta) \right)}{(2/9)\, \mu_k |S|} = \frac{18 \eta_0 \left( \delta_1 |S| + \ln(1/\delta) \right)}{\mu_k |S|} = O(\delta_1 \eta_0) + O\left( \frac{\eta_0 \ln(1/\delta)}{\mu_k |S|} \right), \quad k \in [K]. \qquad (17)$$
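The quantity $\delta_1$ in (12) can be estimated directly. The following sketch (random unit vectors stand in for $\hat{c}_1, \ldots, \hat{c}_K$, and all parameters are illustrative) compares the empirical frequency of $\max_{j}|\hat{c}_j^\top g_i| \ge g_0$ with the bound $2K\exp(-g_0^2/(2\sigma^2))$.

```python
# Monte-Carlo sketch of the tail bound behind (12):
# Pr[max_j |c_j^T g| >= g0] <= 2 K exp(-g0^2 / (2 sigma^2)).
import numpy as np

rng = np.random.default_rng(2)
d, K, sigma, delta, trials = 200, 5, 1.0, 0.05, 50_000
C = rng.standard_normal((K, d))
C /= np.linalg.norm(C, axis=1, keepdims=True)          # unit-norm directions
G = sigma * rng.standard_normal((trials, d))           # g ~ N(0, sigma^2 I)
g0 = sigma * np.sqrt(2 * np.log(3 * K / delta))        # illustrative g0
emp = np.mean(np.max(np.abs(G @ C.T), axis=1) >= g0)
bound = 2 * K * np.exp(-g0**2 / (2 * sigma**2))
print(f"empirical {emp:.5f} <= bound {bound:.5f}")
```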

2.4. Bound for $B_k$

Notice that $\{g_i : x_i \in \hat{S}_k\}$, being determined by the estimated centers $\hat{c}_1, \ldots, \hat{c}_K$, is a specific subset of $\{g_i : x_i \in S\}$. Although each $g_i$ is drawn from the Gaussian distribution $\mathcal{N}(0, \sigma^2 I)$, the distribution of the elements of $\{g_i : x_i \in \hat{S}_k\}$ is unknown. As a result, we cannot directly apply concentration inequalities for Gaussian random vectors to bound $B_k$.

Let $U \in \mathbb{R}^{d \times K}$ be a matrix whose columns are basis vectors of the subspace spanned by $\hat{c}_1, \ldots, \hat{c}_K$, and let $U_\perp \in \mathbb{R}^{d \times (d - K)}$ be a matrix whose columns are basis vectors of the complementary subspace. We then divide each $g_i$ as

$$g_i = g_i^{\parallel} + g_i^{\perp}, \ \text{ where } g_i^{\parallel} = U U^\top g_i \ \text{ and } g_i^{\perp} = U_\perp U_\perp^\top g_i.$$

First, we upper bound $\|B_k\|_\infty$ as

$$\|B_k\|_\infty \le \underbrace{\frac{2}{|\hat{S}_k|} \Big\| \sum_{x_i \in \hat{S}_k} g_i^{\perp} \Big\|_\infty}_{B_k^1} + \underbrace{\frac{2}{|\hat{S}_k|} \Big\| \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} g_i^{\parallel} \Big\|_2}_{B_k^2} + \underbrace{\frac{2}{|\hat{S}_k|} \Big\| \sum_{x_i \in \hat{S}_k \cap \tilde{S}_k} g_i^{\parallel} \Big\|_2}_{B_k^3}. \qquad (18)$$

In the following, we discuss how to bound each term on the right-hand side of (18).

2.4.1. UPPER BOUND OF $B_k^1$

Since the cluster assignment depends on $g_i$ only through the projections $\hat{c}_j^\top g_i$, the orthogonal components are independent of the assignment. Following the property of Gaussian random vectors, $U_\perp^\top \big( \sum_{x_i \in \hat{S}_k} g_i \big) / \big( \sigma \sqrt{|\hat{S}_k|} \big)$ can therefore be treated as a $(d - K)$-dimensional Gaussian random vector, and each element of $U_\perp U_\perp^\top \big( \sum_{x_i \in \hat{S}_k} g_i \big) / \big( \sigma \sqrt{|\hat{S}_k|} \big)$ is a Gaussian random variable with variance smaller than $1$. Based on the tail bound for the Gaussian distribution (Chang et al., 2011) provided in Appendix B and the union bound, with a probability at least $1 - \delta$, we have

$$\Big\| \sum_{x_i \in \hat{S}_k} g_i^{\perp} \Big\|_\infty \le \sigma \sqrt{|\hat{S}_k|} \sqrt{2 \ln\frac{2Kd}{\delta}}, \quad k \in [K],$$

which, together with (10) and (14), implies

$$B_k^1 \le \frac{2 \sigma \sqrt{2 \ln(2Kd/\delta)}}{\sqrt{|\hat{S}_k|}} \le \frac{2 \sigma \sqrt{2 \ln(2Kd/\delta)}}{\sqrt{(2/9)\, \mu_k |S|}} = O\left( \sigma \sqrt{\frac{\ln d}{\mu_k |S|}} \right), \quad k \in [K]. \qquad (19)$$

2.4.2. UPPER BOUND OF $B_k^2$

First, we have

$$\Big\| \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} g_i^{\parallel} \Big\|_2 = \Big\| \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} U U^\top g_i \Big\|_2 \le \sum_{x_i \in \hat{S}_k \setminus \tilde{S}_k} \| U^\top g_i \|_2. \qquad (20)$$

Since $U^\top g_i / \sigma$ can be treated as a $K$-dimensional standard Gaussian random vector, based on the tail bound for the $\chi^2$ distribution (Laurent & Massart, 2000), we have, with a probability at least $1 - \delta$,

$$\| U^\top g_i \|_2 \le \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{1}{\delta}} \right).$$

Applying the union bound again, with a probability at least $1 - \delta$, we have

$$\max_i \| U^\top g_i \|_2 \le \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right). \qquad (21)$$

Combining (20) and (21) with (14) and (16), we have

$$B_k^2 \le \frac{2 \left( 2 \delta_1 |S| + 2 \ln(1/\delta) \right)}{(2/9)\, \mu_k |S|}\, \sigma \left( \sqrt{K} + \sqrt{2 \log\frac{|S|}{\delta}} \right) = O\left( \delta_1 \sigma \sqrt{K + \log\frac{|S|}{\delta}} \right) + O\left( \frac{\sigma \ln(1/\delta)}{\mu_k |S|} \sqrt{K + \log\frac{|S|}{\delta}} \right), \quad k \in [K]. \qquad (22)$$
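The split $g_i = g_i^{\parallel} + g_i^{\perp}$ that drives all three bounds in this section can be reproduced numerically. The sketch below (with synthetic stand-ins for the estimated centers) builds $U$ and $U_\perp$ from a QR factorization; only $g_i^{\parallel}$ can depend on the cluster assignment, which is why $B_k^1$ admits a direct Gaussian tail bound.

```python
# Sketch of the subspace decomposition used for B_k: U spans the estimated
# centers, U_perp spans the complement, and g = g_par + g_perp exactly.
import numpy as np

rng = np.random.default_rng(3)
d, K = 50, 4
C_hat = rng.standard_normal((d, K))                    # stand-in estimated centers
Q, _ = np.linalg.qr(C_hat, mode="complete")            # full orthonormal basis of R^d
U, U_perp = Q[:, :K], Q[:, K:]                         # span(c_hat) / complement

g = rng.standard_normal(d)
g_par = U @ (U.T @ g)                                  # component inside span(c_hat)
g_perp = U_perp @ (U_perp.T @ g)                       # orthogonal component
assert np.allclose(g, g_par + g_perp)
print(f"||g_par|| = {np.linalg.norm(g_par):.3f}, ||g_perp|| = {np.linalg.norm(g_perp):.3f}")
```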

2.4.3. UPPER BOUND OF $B_k^3$

First, we have

$$\frac{1}{|\tilde{S}_k|} \Big\| \sum_{x_i \in \tilde{S}_k} g_i^{\parallel} \Big\|_2 = \frac{1}{|\tilde{S}_k|} \Big\| U \sum_{x_i \in \tilde{S}_k} U^\top g_i \Big\|_2 \le \frac{1}{|\tilde{S}_k|} \Big\| \sum_{x_i \in \tilde{S}_k} U^\top g_i \Big\|_2 := u_k. \qquad (23)$$

Recall the definition of $\tilde{S}_k$ in (13). Due to the fact that the constraint set $\{g : \max_{j \in [K]} |\hat{c}_j^\top g| \le g_0\}$ is symmetric, we have $\mathrm{E}[U^\top g_i] = 0$ for $x_i \in \tilde{S}_k$. Under the condition in (21), we can invoke the following lemma to bound $u_k$.

Lemma 3 (Lemma 2 from (Smale & Zhou, 2007)). Let $H$ be a Hilbert space and $\xi$ be a random variable on $(Z, \rho)$ with values in $H$. Assume $\|\xi\| \le M < \infty$ almost surely. Denote $\sigma^2(\xi) = \mathrm{E}(\|\xi\|^2)$. Let $\{z_i\}_{i=1}^m$ be independent random drawers of $\rho$. For any $0 < \delta < 1$, with confidence $1 - \delta$,

$$\Big\| \frac{1}{m} \sum_{i=1}^m \left( \xi_i - \mathrm{E}[\xi_i] \right) \Big\| \le \frac{2 M \ln(2/\delta)}{m} + \sqrt{\frac{2 \sigma^2(\xi) \ln(2/\delta)}{m}}.$$

From Lemma 3 and the union bound, with a probability at least $1 - \delta$, we have

$$u_k \le \frac{2 \sigma \left( \sqrt{K} + \sqrt{2 \log(|S|/\delta)} \right) \ln(2K/\delta)}{|\tilde{S}_k|} + \sigma \sqrt{\frac{2 K \ln(2K/\delta)}{|\tilde{S}_k|}}, \quad k \in [K]. \qquad (24)$$

Combining (23) and (24) with (10) and (14) (and using $\tilde{S}_k \subseteq \hat{S}_k$, so that $B_k^3 \le 2 u_k$), we have

$$B_k^3 \le 2 u_k \le \frac{18 \sigma \left( \sqrt{K} + \sqrt{2 \log(|S|/\delta)} \right) \ln(2K/\delta)}{\mu_k |S|} + 6 \sigma \sqrt{\frac{K \ln(2K/\delta)}{\mu_k |S|}} = O\left( \sigma \sqrt{\frac{K \ln(K/\delta)}{\mu_k |S|}} \right), \quad k \in [K]. \qquad (25)$$

In summary, under the condition that (10), (14) and (15) hold, we have, with a probability at least $1 - 3\delta$,

$$\|B_k\|_\infty \le O\left( \delta_1 \sigma \sqrt{K + \log\frac{|S|}{\delta}} \right) + O\left( \sigma \sqrt{\frac{(K + \ln d) \ln(K/\delta)}{\mu_k |S|}} \right), \quad k \in [K]. \qquad (26)$$
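Lemma 3 holds for any bounded Hilbert-space-valued variable, so it can be sanity-checked on a toy distribution. The sketch below (coordinates uniform on $[-1, 1]$, a purely illustrative choice) verifies that the norm of the centered empirical mean stays below $2M\ln(2/\delta)/m + \sqrt{2\sigma^2(\xi)\ln(2/\delta)/m}$ at least a $1 - \delta$ fraction of the time.

```python
# Monte-Carlo check of the vector concentration bound in Lemma 3 for a toy
# bounded distribution: xi uniform on [-1, 1]^dim, so ||xi|| <= sqrt(dim) =: M
# and E||xi||^2 = dim / 3 =: sigma2.
import numpy as np

rng = np.random.default_rng(4)
dim, m, delta, trials = 20, 500, 0.05, 2000
M, sigma2 = np.sqrt(dim), dim / 3.0
bound = 2 * M * np.log(2 / delta) / m + np.sqrt(2 * sigma2 * np.log(2 / delta) / m)

norms = np.array([np.linalg.norm(rng.uniform(-1, 1, (m, dim)).mean(axis=0))
                  for _ in range(trials)])
print(f"fraction above bound {np.mean(norms > bound):.4f} (should be <= {delta})")
```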

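The appendices below collect the concentration tools used throughout the proofs. As a quick numerical sanity check of the derived upper-tail form of the Chernoff bound stated in Appendix A, the following sketch (with illustrative Bernoulli parameters) estimates $\Pr[S \ge \mu + \ln(1/\delta) + \sqrt{2\mu\ln(1/\delta)}]$ and compares it with $\delta$.

```python
# Monte-Carlo check of the upper-tail corollary of Theorem 2 (Appendix A):
# Pr[S >= mu + ln(1/delta) + sqrt(2 mu ln(1/delta))] <= delta.
import numpy as np

rng = np.random.default_rng(5)
n, p, delta, trials = 1000, 0.1, 0.05, 100_000
mu = n * p
threshold = mu + np.log(1 / delta) + np.sqrt(2 * mu * np.log(1 / delta))
S = rng.binomial(n, p, size=trials)                    # sums of independent Bernoullis
print(f"empirical tail {np.mean(S >= threshold):.5f} <= delta = {delta}")
```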
A. Chernoff Bound

Theorem 2 (Multiplicative Chernoff Bound (Angluin & Valiant, 1979)). Let $X_1, X_2, \ldots, X_n$ be independent binary random variables with $\Pr[X_i = 1] = p_i$. Denote $S = \sum_{i=1}^n X_i$ and $\mu = \mathrm{E}[S] = \sum_{i=1}^n p_i$. We have

$$\Pr[S \le (1 - \varepsilon)\mu] \le \exp\left( -\frac{\varepsilon^2 \mu}{2} \right), \quad \text{for } 0 < \varepsilon < 1,$$

$$\Pr[S \ge (1 + \varepsilon)\mu] \le \exp\left( -\frac{\varepsilon^2 \mu}{2 + \varepsilon} \right), \quad \text{for } \varepsilon > 0.$$

Therefore,

$$\Pr\left[ S \le \mu - \sqrt{2 \mu \ln\frac{1}{\delta}} \right] \le \delta, \quad \text{for } \exp(-\mu/2) < \delta < 1,$$

$$\Pr\left[ S \ge \mu + \ln\frac{1}{\delta} + \sqrt{2 \mu \ln\frac{1}{\delta}} \right] \le \delta, \quad \text{for } 0 < \delta < 1.$$

B. Tail bounds for the Gaussian distribution

Theorem 3 (Chernoff-type upper bound for the Q-function (Chang et al., 2011)). The Q-function, defined as

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \exp\left( -\frac{t^2}{2} \right) dt,$$

is the tail probability of the standard Gaussian distribution. When $x > 0$, we have

$$Q(x) \le \frac{1}{2} \exp\left( -\frac{x^2}{2} \right).$$

Let $X \sim \mathcal{N}(0, 1)$ be a Gaussian random variable. According to Theorem 3, we have

$$\Pr[|X| \ge \varepsilon] \le \exp\left( -\frac{\varepsilon^2}{2} \right), \quad \text{or equivalently} \quad \Pr\left[ |X| \ge \sqrt{2 \ln\frac{1}{\delta}} \right] \le \delta.$$

References

Angluin, D. and Valiant, L. G. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18(2):155-193, 1979.

Chang, Seok-Ho, Cosman, Pamela C., and Milstein, Laurence B. Chernoff-type bounds for the Gaussian error function. IEEE Transactions on Communications, 59(11):2939-2944, 2011.

Laurent, B. and Massart, P. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5):1302-1338, 2000.

Smale, Steve and Zhou, Ding-Xuan. Learning theory estimates via integral operators and their approximations. Constructive Approximation, 26(2):153-172, 2007.
