BINARY STABLE EMBEDDING VIA PAIRED COMPARISONS. Andrew K. Massimino and Mark A. Davenport
2016 IEEE Statistical Signal Processing Workshop (SSP)

Georgia Institute of Technology, School of Electrical and Computer Engineering

ABSTRACT

Suppose that we wish to estimate a vector $x$ from a set of binary paired comparisons of the form "$x$ is closer to $p$ than to $q$" for various choices of vectors $p$ and $q$. The problem of estimating $x$ from this type of observation arises in a variety of contexts, including nonmetric multidimensional scaling, unfolding, and ranking problems, often because it provides a powerful and flexible model of preference. The main contribution of this paper is to show that under a randomized model for $p$ and $q$, a suitable number of binary paired comparisons yields a stable embedding of the space of target vectors.

Index Terms— paired comparisons, binary stable embeddings, recommendation systems, 1-bit compressive sensing.

1. INTRODUCTION

The central problem we consider in this paper is the estimation of a vector $x \in \mathbb{R}^n$ where, rather than directly observing $x$, we are restricted to $m$ observations of the form "$x$ is closer to $p_i$ than to $q_i$," where $p_i, q_i \in \mathbb{R}^n$ correspond to points whose locations are known. Each such comparison essentially divides $\mathbb{R}^n$ in half and tells us on which side of a hyperplane the point $x$ lies. Observations of this form arise in a variety of contexts, but a particularly important class of applications involves recommendation systems, targeted advertisement, and psychological studies, where $x$ represents an "ideal point" that models a particular user's preferences and the $p_i$ and $q_i$ represent items that the user compares [1]. Items which are close to $x$ are those most preferred by the user. Paired comparisons arise naturally in this context since precise numerical scores quantifying a user's preference are generally much more difficult to assign than comparative judgements [2, 3].
Moreover, data consisting of paired comparisons is often generated implicitly in contexts where the user has the option to act on two or more alternatives; for instance, they may choose to watch a particular movie, or click a particular advertisement, out of those displayed to them [4]. In such contexts, the true distances in the ideal point model's preference space are generally inaccessible in any direct way, but it is nevertheless still possible to obtain a good estimate of a user's ideal point.

(This work was supported by grants NRL N C00, AFOSR FA, NSF CCF-35066, CCF, and CMMI.)

The fundamental question which interests us in this paper is how many comparisons suffice in order to guarantee that the number of differing paired comparisons generated by $x$ and $y$ is roughly proportional to the Euclidean distance between $x$ and $y$. This allows us to understand how well $x$ can potentially be estimated from highly quantized information, and gives an idea of the stability of the problem in the presence of labeling errors. We consider the case where we are given an existing embedding of the items, as in a mature recommender system, and focus on the on-line problem of locating a single new user from feedback consisting of binary data generated from paired comparisons. The item embedding could be generated using a variety of methods, such as multidimensional scaling applied to a set of item features, or even using the results of previous paired comparisons via an approach like that in [5]. We wish to understand how many comparisons are then required to accurately localize a user's ideal point. Any precise answer to this question would depend on the underlying geometry of the item embedding, since some sets of hyperplanes will yield better tessellations of the preference space than others.
Thus, to gain some intuition on this problem without reference to the geometry of a particular embedding, we will instead consider a probabilistic model in which the items are generated at random from a particular distribution. It is important to note that the ideal point model, while similar, is distinct from the low-rank factor (or attribute) model used in the matrix completion approaches which have recently gained much attention as applied to recommendation systems, e.g., [6, 7]. Although both models suppose user choices are guided by a number of attributes, the ideal point model leads to preferences that are non-monotonic functions of those attributes. There is also empirical evidence that the ideal point model captures user behavior more accurately than factorization-based approaches do [8, 9]. There is a large body of work that studies the problem of learning to rank items from various sources of data, including paired comparisons of the sort we consider in this paper; see, for example, [10, 11, 12] and references therein. We first note that in most work on rankings, the central focus is on learning a correct rank-ordered list for a particular user, without providing any guarantees on recovering a correct parameterization for the user's preferences as we do here. Perhaps most
closely related to our work is that of [10, 11], which examines the problem of learning a rank ordering using the same ideal point model considered in this paper. The message of this work is broadly consistent with ours, in that the number of comparisons required should scale with the dimension of the preference space, not the total number of items. Also closely related is the work in [13, 14, 15], which considers paired comparisons and more general ordinal measurements in the similar, but as discussed above subtly different, context of low-rank factorizations. Finally, while seemingly unrelated, we note that our work builds on the growing body of literature on 1-bit compressive sensing. In particular, our results are largely inspired by those in [16, 17], and borrow techniques from [18] in the proofs of some of our main results.

2. THE RANDOM OBSERVATION MODEL

For the moment we will consider the noise-free setting where a user always prefers the item closest to the user's ideal point $x$. In this case we can represent the observed comparisons mathematically by letting $A_i(x)$ denote the $i$th observation, which consists of a comparison between $p_i$ and $q_i$, and setting
$$A_i(x) := \mathrm{sign}\left(\|x - q_i\|^2 - \|x - p_i\|^2\right) = \begin{cases} +1 & \text{if } x \text{ is closer to } p_i, \\ -1 & \text{if } x \text{ is closer to } q_i. \end{cases}$$
We will also use $A(x) = [A_1(x), \ldots, A_m(x)]^T$ to denote the vector of all observations resulting from $m$ comparisons. Note that if we set $a_i = p_i - q_i$ and $\tau_i = (\|p_i\|^2 - \|q_i\|^2)/2$, then we can re-write our observation model as
$$A_i(x) = \mathrm{sign}\left(2(a_i^T x - \tau_i)\right) = \mathrm{sign}\left(a_i^T x - \tau_i\right). \tag{2.1}$$
This is reminiscent of the one-bit compressive sensing setting with dithers [16, 17], with the important differences that: (i) we have not yet made any kind of sparsity or other structural assumption on $x$, and (ii) the dithers $\tau_i$, at least in this formulation, are dependent on the $a_i$, which results in difficulty applying existing results from this theory to this setting.
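As a concrete illustration of the observation model, the following sketch (our own NumPy example; the dimension, number of comparisons, and parameter values are arbitrary choices, not the paper's) verifies numerically that the distance-comparison form of $A_i(x)$ and the hyperplane form $\mathrm{sign}(a_i^T x - \tau_i)$ coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 1000
sigma = 1 / np.sqrt(n)                     # variance sigma^2 = 1/n

x = rng.standard_normal(n)
x /= 2 * np.linalg.norm(x)                 # place the ideal point inside the unit ball

p = sigma * rng.standard_normal((m, n))    # items p_i, q_i ~ N(0, sigma^2 I)
q = sigma * rng.standard_normal((m, n))

# Comparison form: A_i(x) = sign(||x - q_i||^2 - ||x - p_i||^2)
A_dist = np.sign(np.sum((x - q) ** 2, axis=1) - np.sum((x - p) ** 2, axis=1))

# Hyperplane form: a_i = p_i - q_i, tau_i = (||p_i||^2 - ||q_i||^2) / 2
a = p - q
tau = (np.sum(p ** 2, axis=1) - np.sum(q ** 2, axis=1)) / 2
A_hyp = np.sign(a @ x - tau)

assert np.array_equal(A_dist, A_hyp)       # the two forms agree for every comparison
```

Each observation is thus a one-bit measurement of $x$ against a hyperplane with normal $a_i$ and offset $\tau_i$.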
However, many of the techniques from this literature will nevertheless be helpful in analyzing this problem. To see this, we consider a randomized observation model in which the pairs $p_i, q_i$ are chosen independently with i.i.d. entries drawn according to a normal distribution, i.e., $p_i, q_i \sim \mathcal{N}(0, \sigma^2 I)$. In this case, the entries of our sensing vectors are i.i.d. with $a_{ij} \sim \mathcal{N}(0, 2\sigma^2)$. Moreover, if we define $b_i = (p_i + q_i)/2$, then we also have $b_i \sim \mathcal{N}(0, (\sigma^2/2) I)$, and we can write $\tau_i = a_i^T b_i$. Note that while $\tau_i$ is clearly dependent on $a_i$, the vectors $a_i$ and $b_i$ are independent. To further simplify things, we re-normalize by dividing by $\|a_i\|$, i.e., we set $\tilde{a}_i = a_i/\|a_i\|$ and $\tilde{\tau}_i = \tau_i/\|a_i\|$, so that
$$A_i(x) = \mathrm{sign}\left(\tilde{a}_i^T x - \tilde{\tau}_i\right).$$
It is easy to see that $\tilde{a}_i$ is distributed uniformly on the sphere $S^{n-1} = \{a \in \mathbb{R}^n : \|a\| = 1\}$. Note also that we can write $\tilde{\tau}_i = \tilde{a}_i^T b_i$. Since $a_i$ and $b_i$ are independent, $\tilde{a}_i$ and $b_i$ are also independent. Moreover, for any fixed unit vector $\tilde{a}_i$, if $b_i \sim \mathcal{N}(0, (\sigma^2/2) I)$ then $\tilde{a}_i^T b_i \sim \mathcal{N}(0, \sigma^2/2)$. Thus, we must have $\tilde{\tau}_i \sim \mathcal{N}(0, \sigma^2/2)$, independent of $\tilde{a}_i$, which is the key insight that enables the analysis below.

3. MAIN RESULT

Under the model described above, we would like to understand how much the sign patterns of two different signals can differ as a function of their Euclidean distance. Our main result is that, given enough comparisons, our measurement model provides an approximate embedding of the preference space into $\{-1, +1\}^m$. We assume any signal of interest has norm at most 1, i.e., we consider signals $x, y \in B_2^n$, the $n$-dimensional ball of radius 1. As an example application, $x$ might be the true user's ideal point and $y$ some other candidate estimate. Theorem 3.1 states that if $x$ and $y$ are sufficiently nearby, then the respective sign-measurement patterns $A(x)$ and $A(y)$ closely match. We denote by $d_H$ the Hamming distance, i.e., $d_H$ counts the number of $\{-1, +1\}$ sign measurement disagreements:
$$d_H(A(x), A(y)) := \sum_{i=1}^{m} \mathbb{1}\{A_i(x) \neq A_i(y)\}. \tag{3.1}$$
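The key fact derived above, that $\tilde{\tau}_i \sim \mathcal{N}(0, \sigma^2/2)$ independently of $\tilde{a}_i$, and the embedding behavior it enables can both be checked by simulation. The sketch below (our own illustration; the dimension, sample size, and test points are arbitrary choices) builds normalized measurements, compares the empirical standard deviation of $\tilde{\tau}_i$ with its predicted value, and prints the normalized Hamming distance for pairs of points at increasing Euclidean distance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 100_000
sigma = 1 / np.sqrt(n)                        # sigma^2 = 1/n, the scaling used below

p = sigma * rng.standard_normal((m, n))       # random items p_i, q_i ~ N(0, sigma^2 I)
q = sigma * rng.standard_normal((m, n))
a = p - q                                     # a_i ~ N(0, 2 sigma^2 I)
b = (p + q) / 2                               # b_i ~ N(0, (sigma^2 / 2) I), independent of a_i
norms = np.linalg.norm(a, axis=1)
tau_t = np.sum(a * b, axis=1) / norms         # tau~_i = a~_i^T b_i

print(tau_t.std(), sigma / np.sqrt(2))        # empirical vs. predicted standard deviation

def A(x):
    """Normalized observation map: A_i(x) = sign(a~_i^T x - tau~_i)."""
    return np.sign((a @ x) / norms - tau_t)

def dH(x, y):
    """Normalized Hamming distance (1/m) d_H(A(x), A(y))."""
    return float(np.mean(A(x) != A(y)))

u = rng.standard_normal(n)
u /= np.linalg.norm(u)                        # random unit direction
x = 0.4 * u                                   # a point inside the unit ball
ts = [0.05, 0.1, 0.2, 0.4]
ds = [dH(x, x - t * u) for t in ts]           # compare against points at distance t
print(ds)                                     # grows roughly in proportion to t
```

The printed distances increase with $t$, consistent with a stable embedding; the exact constants of proportionality are what the analysis below quantifies.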
Note that the following result applies uniformly to all $x$ and $y$; it follows from Lemma 3.2, which in contrast applies only to a fixed pair of points.

Theorem 3.1. Let $\eta > 0$ and $\zeta > 0$. Suppose $m$ total measurements of the form (2.1) are taken, i.e., each $\tilde{a}_i$ is drawn uniformly on the unit sphere and $\tilde{\tau} \in \mathbb{R}^m$ is a vector with entries $\tilde{\tau}_i \sim \mathcal{N}(0, \sigma^2/2)$ independent of the $\tilde{a}_i$, where $\sigma^2 = 1/n$. If
$$m \;\geq\; \frac{1}{\zeta^2}\left(n \log\frac{6}{\zeta} + \log\frac{2}{\eta}\right),$$
then there are constants $c_1, c_2, c_3, c_4$ such that with probability at least $1 - \eta$, for all points $x, y \in B_2^n$,
$$c_1\|x-y\| - c_2\zeta \;\leq\; \frac{1}{m}\, d_H(A(x), A(y)) \;\leq\; c_3\|x-y\| + \left(c_4 + \sqrt{\frac{n}{\pi}}\right)\zeta.$$

Proof. By Lemma 3.2, for any fixed pair $w, z \in B_2^n$, we have bounds on the Hamming distance that hold with probability at least $1 - 2\exp(-2\zeta^2 m)$ for all $u \in B_\delta(w)$ and $v \in B_\delta(z)$. Since the ball of radius 1 can be covered by a set $U$ of radius-$\delta$ balls with $|U| \leq (3/\delta)^n$, by a union bound we have with
probability at least $1 - 2(3/\delta)^{2n}\exp(-2\zeta^2 m)$ that for all $w, z \in U$ and all $u \in B_\delta(w)$, $v \in B_\delta(z)$,
$$c_1\|w-z\| - c_2\delta - \zeta \;\leq\; \frac{1}{m}\, d_H(A(u), A(v)) \;\leq\; c_3\|w-z\| + 2\delta\sqrt{\frac{n}{\pi}} + \zeta.$$
Since $\|x-y\| - 2\delta \leq \|w-z\| \leq \|x-y\| + 2\delta$,
$$c_1\|x-y\| - (2c_1 + c_2)\delta - \zeta \;\leq\; \frac{1}{m}\, d_H(A(x), A(y)) \;\leq\; c_3\|x-y\| + 2c_3\delta + 2\delta\sqrt{\frac{n}{\pi}} + \zeta.$$
Letting $\delta = \zeta/2$, this becomes
$$c_1\|x-y\| - \left(c_1 + \frac{c_2}{2} + 1\right)\zeta \;\leq\; \frac{1}{m}\, d_H(A(x), A(y)) \;\leq\; c_3\|x-y\| + \left(c_3 + 1 + \sqrt{\frac{n}{\pi}}\right)\zeta,$$
which is the stated bound after absorbing constants. Lower bounding the probability of success by $1 - \eta$ requires
$$2(3/\delta)^{2n}\exp(-2\zeta^2 m) \leq \eta;$$
rearranging, with $\delta = \zeta/2$, yields the assumed condition on $m$. $\square$

We comment that Theorem 3.1 concerns a particular choice of the variance parameter $\sigma^2$. A natural question is what would happen with a different choice of $\sigma^2$. In fact, this assumption is critical: intuitively, if $\sigma^2$ is too small, then nearly all of the hyperplanes induced by the comparisons will pass very close to the origin, so that accurate estimation of even $\|x\|$ becomes impossible. On the other hand, if $\sigma^2$ is too large, then an increasing number of these hyperplanes will not even intersect the ball of radius 1 in which $x$ is presumed to lie, thus yielding no new information.

Lemma 3.2. Let $w, z \in B_2^n$ be fixed, and let $m$ measurements of the form (2.1) be taken, in which each $\tilde{a}_i$ is drawn uniformly on the unit sphere and $\tilde{\tau} \in \mathbb{R}^m$ is a vector with entries $\tilde{\tau}_i \sim \mathcal{N}(0, \sigma^2/2)$ independent of the $\tilde{a}_i$. Fix $\zeta > 0$ and $\delta > 0$, and define $B_\delta(w) := \{u \in B_2^n : \|u - w\| \leq \delta\}$. Then there are constants $c_1$, $c_2$, and $c_3$ such that with probability at least $1 - 2\exp(-2\zeta^2 m)$, for all $u \in B_\delta(w)$ and $v \in B_\delta(z)$,
$$c_1\|w-z\| - c_2\delta - \zeta \;\leq\; \frac{1}{m}\, d_H(A(u), A(v)) \;\leq\; c_3\|w-z\| + 2\delta\sqrt{\frac{n}{\pi}} + \zeta,$$
where $d_H$ is the Hamming distance (3.1).

Proof. Fix $\delta > 0$ and let $u \in B_\delta(w)$, $v \in B_\delta(z)$. Recall that the Hamming distance $d_H$ is a sum of independent, identically distributed Bernoulli random variables, so we may bound it using Hoeffding's inequality. Since our probabilistic upper and lower bounds must hold for all $u, v$ as described above, we introduce quantities $L_0$ and $L_1$ which represent the two extreme cases of the Bernoulli variables:
$$L_0 := \frac{1}{m}\sum_{i=1}^{m}\; \sup_{u \in B_\delta(w),\, v \in B_\delta(z)} \mathbb{1}\{A_i(u) \neq A_i(v)\}, \qquad L_1 := \frac{1}{m}\sum_{i=1}^{m}\; \inf_{u \in B_\delta(w),\, v \in B_\delta(z)} \mathbb{1}\{A_i(u) \neq A_i(v)\}.$$
Then we have
$$L_1 \;\leq\; \frac{1}{m}\, d_H(A(u), A(v)) \;\leq\; L_0.$$
Denote $P_0 = 1 - \mathbb{E}[L_0]$ and $P_1 = \mathbb{E}[L_1]$, i.e.,
$$P_0 = \mathbb{P}\{\forall u \in B_\delta(w),\, v \in B_\delta(z) : A_i(u) = A_i(v)\},$$
$$P_1 = \mathbb{P}\{\forall u \in B_\delta(w),\, v \in B_\delta(z) : A_i(u) \neq A_i(v)\}.$$
We give a lower bound for $P_0$ in Lemma 3.3 and for $P_1$ in Lemma 3.4. Now, by Hoeffding's inequality,
$$\mathbb{P}\{L_0 > 1 - P_0 + \zeta\} \leq \exp(-2m\zeta^2), \qquad \mathbb{P}\{L_1 < P_1 - \zeta\} \leq \exp(-2m\zeta^2).$$
Hence, with probability at least $1 - 2\exp(-2m\zeta^2)$,
$$P_1 - \zeta \;\leq\; \frac{1}{m}\, d_H(A(u), A(v)) \;\leq\; 1 - P_0 + \zeta.$$
Since, by Lemma 3.4,
$$P_1 \;\geq\; \frac{\|w-z\| - 4\delta}{16\sqrt{2\pi e}},$$
and since, by Lemma 3.3 and recalling that $\sigma^2 = 1/n$,
$$1 - P_0 \;\leq\; \frac{1}{\sigma\pi}\,\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}\,\|w-z\| + \frac{2\delta}{\sigma\sqrt{\pi}} \;\leq\; \frac{\sqrt{2}}{\pi}\,\|w-z\| + 2\delta\sqrt{\frac{n}{\pi}},$$
the lemma is satisfied with appropriate choices of $c_1$, $c_2$, and $c_3$. $\square$

3.1. Probability estimates

Lemma 3.3. Let $w, z \in B_2^n$ be distinct and fix $\delta > 0$. Denote by $P_0$ the probability that all points $u$ and $v$ within $\delta$ of $w$ and $z$, respectively, do not differ in the random measurement $A_i$, i.e., that the two $\delta$-balls are not split by hyperplane $i$. The direction and threshold of hyperplane $i$ are denoted by $a$ and $\tau$, respectively. That is,
$$P_0 = \mathbb{P}\{\forall u \in B_\delta(w),\, v \in B_\delta(z) : A_i(u) = A_i(v)\}.$$
Then
$$P_0 \;\geq\; 1 - \frac{1}{\sigma\pi}\,\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}\,\|w-z\| - \frac{2\delta}{\sigma\sqrt{\pi}}.$$

Proof. This result will require the following integral:
$$\int_{S^{n-1}} |a^T(w-z)|\, \nu(da) = \frac{1}{\sqrt{\pi}}\,\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}\,\|w-z\|,$$
where $\nu$ is the uniform probability measure on the unit sphere. We need a lower bound on $P_0$, which is equivalent to an upper bound on
$$1 - P_0 = \mathbb{P}\{A_i(u) \neq A_i(v) \text{ for some } u \in B_\delta(w),\, v \in B_\delta(z)\}.$$
Assume $a^T w > a^T z$. Then this probability is just
$$\mathbb{P}\{a^T v < \tau < a^T u \text{ for some } u \in B_\delta(w),\, v \in B_\delta(z)\} = \mathbb{P}\left\{\min_{v \in B_\delta(z)} a^T v < \tau < \max_{u \in B_\delta(w)} a^T u\right\}.$$
But
$$\min_{v \in B_\delta(z)} a^T v \geq a^T z - \delta \qquad \text{and} \qquad \max_{u \in B_\delta(w)} a^T u \leq a^T w + \delta,$$
so we have
$$1 - P_0 \;\leq\; \mathbb{P}\{a^T z - \delta < \tau < a^T w + \delta\} = \int_{S^{n-1}} \left[\Phi\!\left(\frac{a^T w + \delta}{\sigma/\sqrt{2}}\right) - \Phi\!\left(\frac{a^T z - \delta}{\sigma/\sqrt{2}}\right)\right] \nu(da)$$
$$\leq\; \frac{1}{\sigma\sqrt{\pi}} \int_{S^{n-1}} \left(|a^T(w-z)| + 2\delta\right) \nu(da) \;\leq\; \frac{1}{\sigma\pi}\,\frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{n+1}{2})}\,\|w-z\| + \frac{2\delta}{\sigma\sqrt{\pi}}. \qquad \square$$

Lemma 3.4 (Probability of separation with a single random measurement). Let $w, z \in B_2^n$ be distinct and fix $\delta > 0$. Denote by $P_1$ the probability that all points $u$ and $v$ within $\delta$ of $w$ and $z$, respectively, differ in the random measurement $A_i$, i.e., that the two $\delta$-balls are separated by hyperplane $i$. The direction and threshold of hyperplane $i$ are denoted by $a$ and $\tau$, respectively. That is,
$$P_1 = \mathbb{P}\{\forall u \in B_\delta(w),\, v \in B_\delta(z) : A_i(u) \neq A_i(v)\}.$$
Set $\epsilon_0 = \|w - z\|$. Then
$$P_1 \;\geq\; \frac{\epsilon_0 - 4\delta}{16\sqrt{2\pi e}}.$$

Proof. Assume $\|w\| \geq \|z\|$; otherwise, swap $w$ and $z$ in the argument that follows. Denote $\Delta = w - z$ and $\epsilon = \|\Delta\|$. If $B_\delta(w) \cap B_\delta(z) \neq \emptyset$, then the chance that a hyperplane with orientation $a$ and offset $\tau$ splits the two balls is zero. Hence, we may restrict attention to the portion of the sphere where $a^T(w-z) \geq 2\delta$. Further, since the distribution of $\tau$ is symmetric, we can restrict to $C_\alpha = \{a : a^T(w-z) \geq \alpha\epsilon\}$ for some $\alpha\epsilon \geq 2\delta$ and double the integral; $C_\alpha$ is a hyper-spherical cap of height $\alpha\epsilon$. Then
$$P_1 = \mathbb{P}\{a^T z + \delta \leq \tau \leq a^T w - \delta\} + \mathbb{P}\{a^T w + \delta \leq \tau \leq a^T z - \delta\} \;\geq\; 2\int_{C_\alpha} \left[\Phi\!\left(\frac{a^T w - \delta}{\sigma/\sqrt{2}}\right) - \Phi\!\left(\frac{a^T z + \delta}{\sigma/\sqrt{2}}\right)\right] \nu(da).$$
To obtain a lower bound, we consider the area of a subset $\tilde{C}_\alpha \subseteq C_\alpha$ and multiply this by the minimum value of the integrand over that set. Let $W = \{a : a^T w \geq \xi\|w\| - \delta\}$ and $Z = \{a : a^T z \geq \xi\|z\| - \delta\}$.
Note that for any $a \in Z^c$ we have $a^T z + \delta \leq \xi\|z\| \leq \xi$, and for any $a \in W^c$ we have $a^T w - \delta \leq \xi\|w\| \leq \xi$, since $w, z \in B_2^n$. The sets $W^c$ and $Z^c$ are each the complement of a spherical cap and are of equal area. The set we consider is $\tilde{C}_\alpha = C_\alpha \cap W^c \cap Z^c$. One can show, by properties of the normal distribution, that
$$\min_{a \in \tilde{C}_\alpha} \left[\Phi\!\left(\frac{a^T w - \delta}{\sigma/\sqrt{2}}\right) - \Phi\!\left(\frac{a^T z + \delta}{\sigma/\sqrt{2}}\right)\right] \;\geq\; \frac{\alpha\epsilon - 2\delta}{\sigma/\sqrt{2}}\,\phi\!\left(\frac{\sqrt{2}\,\xi}{\sigma}\right),$$
since $a^T(w-z) \geq \alpha\epsilon$ on $C_\alpha$ and both arguments of $\Phi$ are bounded in magnitude by $\sqrt{2}\,\xi/\sigma$ on $\tilde{C}_\alpha$. The set $\tilde{C}_\alpha$ has maximal area when $W = Z$, i.e., when $z$ is a scalar multiple of $w$; however, this argument provides a lower bound for arbitrary $w$ and $z$. Setting $\alpha$ and $\xi$ effects a trade-off between the area of the spherical section $\tilde{C}_\alpha$ and the Gaussian density; recall that by assumption $\sigma^2 = 1/n$. By integrating, it can be shown that the normalized area $\nu(\tilde{C}_\alpha)$ is bounded below by a constant depending only on the choice of $\xi$ and $\alpha$. Combining the two previous bounds and choosing $\xi$ and $\alpha$ appropriately yields the stated result,
$$P_1 \;\geq\; \frac{\epsilon_0 - 4\delta}{16\sqrt{2\pi e}}. \qquad \square$$
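The separation probabilities $P_0$ and $P_1$ bounded in Lemmas 3.3 and 3.4 are easy to estimate by Monte Carlo simulation. The sketch below (our own check; the configuration of $w$, $z$, and $\delta$ is an arbitrary choice, and it validates the geometry of the two events rather than the constants in the lemmas) draws random hyperplanes and classifies each as leaving both $\delta$-balls on one side, separating them, or grazing one of them:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, delta = 10, 200_000, 0.05
sigma = 1 / np.sqrt(n)                         # sigma^2 = 1/n

w = np.zeros(n); w[0] = 0.5                    # two fixed points with ||w - z|| = 1
z = np.zeros(n); z[0] = -0.5

a = rng.standard_normal((m, n))
a /= np.linalg.norm(a, axis=1, keepdims=True)  # directions uniform on the sphere
tau = (sigma / np.sqrt(2)) * rng.standard_normal(m)   # thresholds ~ N(0, sigma^2 / 2)

mw, mz = a @ w - tau, a @ z - tau              # signed margins of the two centers

# A delta-ball lies strictly on one side of the hyperplane iff |margin| > delta.
# P0: both balls on the same side; P1: the hyperplane strictly separates them.
P0 = np.mean((np.sign(mw) == np.sign(mz)) & (np.abs(mw) > delta) & (np.abs(mz) > delta))
P1 = np.mean((np.sign(mw) != np.sign(mz)) & (np.abs(mw) > delta) & (np.abs(mz) > delta))

print(P0, P1)    # P0 + P1 < 1; the gap is the chance of grazing one of the balls
```

A ball of radius $\delta$ lies strictly on one side of the hyperplane $\{x : \tilde{a}^T x = \tilde{\tau}\}$ exactly when the signed margin of its center exceeds $\delta$ in absolute value, which is the same criterion used in the proofs above.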
4. REFERENCES

[1] C. Coombs, "Psychological scaling without a unit of measurement," Psych. Rev., vol. 57, no. 3, 1950.
[2] G. Miller, "The magical number seven, plus or minus two: Some limits on our capacity for processing information," Psychological Review, vol. 63, no. 2, pp. 81–97, 1956.
[3] H. David, The Method of Paired Comparisons. London, UK: Charles Griffin & Company, 1963.
[4] F. Radlinski and T. Joachims, "Active exploration for learning rankings from clickthrough data," in Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD), San Jose, CA, Aug. 2007.
[5] S. Agarwal, J. Wills, L. Cayton, G. Lanckriet, D. Kriegman, and S. Belongie, "Generalized non-metric multidimensional scaling," in Proc. Int. Conf. on Artificial Intelligence and Statistics (AISTATS), San Juan, Puerto Rico, Mar. 2007.
[6] J. Rennie and N. Srebro, "Fast maximum margin matrix factorization for collaborative prediction," in Proc. Int. Conf. Machine Learning (ICML), Bonn, Germany, Aug. 2005.
[7] E. Candès and B. Recht, "Exact matrix completion via convex optimization," Found. Comput. Math., vol. 9, no. 6, pp. 717–772, 2009.
[8] B. Dubois, "Ideal point versus attribute models of brand preference: A comparison of predictive validity," Adv. Consumer Research, 1975.
[9] A. Maydeu-Olivares and U. Böckenholt, "Modeling preference data," in The SAGE Handbook of Quantitative Methods in Psychology, 2009.
[10] K. Jamieson and R. Nowak, "Active ranking using pairwise comparisons," in Proc. Adv. in Neural Information Processing Systems (NIPS), Granada, Spain, Dec. 2011.
[11] K. Jamieson and R. Nowak, "Low-dimensional embedding using adaptively selected ordinal data," in Proc. Allerton Conf. on Communication, Control, and Computing, Monticello, IL, 2011.
[12] F. Wauthier, M. Jordan, and N. Jojic, "Efficient ranking from pairwise comparisons," in Proc. Int. Conf. Machine Learning (ICML), Atlanta, GA, June 2013.
[13] Y. Lu and S. Negahban, "Individualized rank aggregation using nuclear norm regularization," arXiv preprint, 2014.
[14] D. Park, J. Neeman, J. Zhang, S. Sanghavi, and I. Dhillon, "Preference completion: Large-scale collaborative ranking from pairwise comparisons," in Proc. Int. Conf. Machine Learning (ICML), Lille, France, July 2015.
[15] S. Oh, K. Thekumparampil, and J. Xu, "Collaboratively learning preferences from ordinal data," in Proc. Adv. in Neural Information Processing Systems (NIPS), Montréal, Québec, Dec. 2014.
[16] K. Knudson, R. Saab, and R. Ward, "One-bit compressive sensing with norm estimation," IEEE Trans. Inform. Theory, vol. 62, no. 5, 2016.
[17] R. Baraniuk, S. Foucart, D. Needell, Y. Plan, and M. Wootters, "Exponential decay of reconstruction error from binary measurements of sparse signals," arXiv preprint, 2014.
[18] L. Jacques, J. Laska, P. Boufounos, and R. Baraniuk, "Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors," IEEE Trans. Inform. Theory, vol. 59, no. 4, pp. 2082–2102, 2013.
Sample Complexity of Learning Mahalanobis Distance Metrics Nakul Verma Janelia, HHMI feature 2 Mahalanobis Metric Learning Comparing observations in feature space: x 1 [sq. Euclidean dist] x 2 (all features
More informationSparsest Solutions of Underdetermined Linear Systems via l q -minimization for 0 < q 1
Sparsest Solutions of Underdetermined Linear Systems via l q -minimization for 0 < q 1 Simon Foucart Department of Mathematics Vanderbilt University Nashville, TN 3740 Ming-Jun Lai Department of Mathematics
More informationLattices and Lattice Codes
Lattices and Lattice Codes Trivandrum School on Communication, Coding & Networking January 27 30, 2017 Lakshmi Prasad Natarajan Dept. of Electrical Engineering Indian Institute of Technology Hyderabad
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationLearning Bound for Parameter Transfer Learning
Learning Bound for Parameter Transfer Learning Wataru Kumagai Faculty of Engineering Kanagawa University kumagai@kanagawa-u.ac.jp Abstract We consider a transfer-learning problem by using the parameter
More informationLecture 13 October 6, Covering Numbers and Maurey s Empirical Method
CS 395T: Sublinear Algorithms Fall 2016 Prof. Eric Price Lecture 13 October 6, 2016 Scribe: Kiyeon Jeon and Loc Hoang 1 Overview In the last lecture we covered the lower bound for p th moment (p > 2) and
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationMATRIX RECOVERY FROM QUANTIZED AND CORRUPTED MEASUREMENTS
MATRIX RECOVERY FROM QUANTIZED AND CORRUPTED MEASUREMENTS Andrew S. Lan 1, Christoph Studer 2, and Richard G. Baraniuk 1 1 Rice University; e-mail: {sl29, richb}@rice.edu 2 Cornell University; e-mail:
More informationImproved Group Testing Rates with Constant Column Weight Designs
Improved Group esting Rates with Constant Column Weight Designs Matthew Aldridge Department of Mathematical Sciences University of Bath Bath, BA2 7AY, UK Email: maldridge@bathacuk Oliver Johnson School
More informationRandom projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016
Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use
More informationSignal Recovery from Permuted Observations
EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,
More informationA Large Deviation Bound for the Area Under the ROC Curve
A Large Deviation Bound for the Area Under the ROC Curve Shivani Agarwal, Thore Graepel, Ralf Herbrich and Dan Roth Dept. of Computer Science University of Illinois Urbana, IL 680, USA {sagarwal,danr}@cs.uiuc.edu
More informationNetBox: A Probabilistic Method for Analyzing Market Basket Data
NetBox: A Probabilistic Method for Analyzing Market Basket Data José Miguel Hernández-Lobato joint work with Zoubin Gharhamani Department of Engineering, Cambridge University October 22, 2012 J. M. Hernández-Lobato
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More informationarxiv: v5 [math.na] 16 Nov 2017
RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem
More information1 Regression with High Dimensional Data
6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:
More informationConvex Phase Retrieval without Lifting via PhaseMax
Convex Phase etrieval without Lifting via PhaseMax Tom Goldstein * 1 Christoph Studer * 2 Abstract Semidefinite relaxation methods transform a variety of non-convex optimization problems into convex problems,
More informationImproved Bounds on the Dot Product under Random Projection and Random Sign Projection
Improved Bounds on the Dot Product under Random Projection and Random Sign Projection Ata Kabán School of Computer Science The University of Birmingham Birmingham B15 2TT, UK http://www.cs.bham.ac.uk/
More informationAn iterative hard thresholding estimator for low rank matrix recovery
An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical
More informationA Generalized Restricted Isometry Property
1 A Generalized Restricted Isometry Property Jarvis Haupt and Robert Nowak Department of Electrical and Computer Engineering, University of Wisconsin Madison University of Wisconsin Technical Report ECE-07-1
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationGREEDY SIGNAL RECOVERY REVIEW
GREEDY SIGNAL RECOVERY REVIEW DEANNA NEEDELL, JOEL A. TROPP, ROMAN VERSHYNIN Abstract. The two major approaches to sparse recovery are L 1-minimization and greedy methods. Recently, Needell and Vershynin
More informationRank Determination for Low-Rank Data Completion
Journal of Machine Learning Research 18 017) 1-9 Submitted 7/17; Revised 8/17; Published 9/17 Rank Determination for Low-Rank Data Completion Morteza Ashraphijuo Columbia University New York, NY 1007,
More informationOn the Observability of Linear Systems from Random, Compressive Measurements
On the Observability of Linear Systems from Random, Compressive Measurements Michael B Wakin, Borhan M Sanandaji, and Tyrone L Vincent Abstract Recovering or estimating the initial state of a highdimensional
More informationSample width for multi-category classifiers
R u t c o r Research R e p o r t Sample width for multi-category classifiers Martin Anthony a Joel Ratsaby b RRR 29-2012, November 2012 RUTCOR Rutgers Center for Operations Research Rutgers University
More informationLOW ALGEBRAIC DIMENSION MATRIX COMPLETION
LOW ALGEBRAIC DIMENSION MATRIX COMPLETION Daniel Pimentel-Alarcón Department of Computer Science Georgia State University Atlanta, GA, USA Gregory Ongie and Laura Balzano Department of Electrical Engineering
More informationBreaking the Limits of Subspace Inference
Breaking the Limits of Subspace Inference Claudia R. Solís-Lemus, Daniel L. Pimentel-Alarcón Emory University, Georgia State University Abstract Inferring low-dimensional subspaces that describe high-dimensional,
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationStrengthened Sobolev inequalities for a random subspace of functions
Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)
More informationManifold Regularization
9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,
More informationOn Optimal Frame Conditioners
On Optimal Frame Conditioners Chae A. Clark Department of Mathematics University of Maryland, College Park Email: cclark18@math.umd.edu Kasso A. Okoudjou Department of Mathematics University of Maryland,
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationSuccinct Data Structures for Approximating Convex Functions with Applications
Succinct Data Structures for Approximating Convex Functions with Applications Prosenjit Bose, 1 Luc Devroye and Pat Morin 1 1 School of Computer Science, Carleton University, Ottawa, Canada, K1S 5B6, {jit,morin}@cs.carleton.ca
More information1 Differential Privacy and Statistical Query Learning
10-806 Foundations of Machine Learning and Data Science Lecturer: Maria-Florina Balcan Lecture 5: December 07, 015 1 Differential Privacy and Statistical Query Learning 1.1 Differential Privacy Suppose
More informationConstructing Explicit RIP Matrices and the Square-Root Bottleneck
Constructing Explicit RIP Matrices and the Square-Root Bottleneck Ryan Cinoman July 18, 2018 Ryan Cinoman Constructing Explicit RIP Matrices July 18, 2018 1 / 36 Outline 1 Introduction 2 Restricted Isometry
More informationLearning Kernels -Tutorial Part III: Theoretical Guarantees.
Learning Kernels -Tutorial Part III: Theoretical Guarantees. Corinna Cortes Google Research corinna@google.com Mehryar Mohri Courant Institute & Google Research mohri@cims.nyu.edu Afshin Rostami UC Berkeley
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Week #1
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future
More informationMIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design
MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications Class 19: Data Representation by Design What is data representation? Let X be a data-space X M (M) F (M) X A data representation
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More informationSparse and low-rank decomposition for big data systems via smoothed Riemannian optimization
Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Yuanming Shi ShanghaiTech University, Shanghai, China shiym@shanghaitech.edu.cn Bamdev Mishra Amazon Development
More informationProbabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini,
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Problem Set 3 Issued: Thursday, September 25, 2014 Due: Thursday,
More information