1-Bit Matrix Completion

Size: px

Start display at page:

Download "1-Bit Matrix Completion"

Naomi Robbins
5 years ago
Views:

1 1-Bit Matrix Completion Mark A. Davenport School of Electrical and Computer Engineering Georgia Institute of Technology Yaniv Plan Mary Wootters Ewout van den Berg

2 Matrix Completion d When is it possible to recover the original matrix? d How can we do this efficiently? How many samples will we need?

3 Low-Rank Matrices r d d r Singular value decomposition: M = U V ¼ dr d 2 degrees of freedom

4 Collaborative Filtering The Netflix Problem M i;j = how much user i likes movie j Rank 1 model: u i = how much user i likes romantic movies v j = amount of romance in movie j M i;j = u i v j Rank 2 model: w i = how much user i likes zombie movies x j = amount of zombies in movie j M i;j = u i v j + w i x j

5 Beyond Netflix Recovery of incomplete survey data Analysis of voting data Sensor localization Quantum state tomography

6 Low-Rank Matrix Recovery Given: a d d matrix Mof rank r samples of M on the set : Y = M How can we recover M? cm = arg inf X:X =Y rank(x) Can we replace this with something computationally feasible?

7 Nuclear Norm Minimization Convex relaxation! Replace with cm = arg inf X:X =Y kxk If j j = O(r d log d), under certain assumptions, this procedure can recover M!

8 Matrix Completion in Practice Noise Y = (M + Z) Quantization Netflix: Ratings are integers between 1 and 5 Survey responses: True/False, Yes/No, Agree/Disagree Voting data: Yea/Nay Quantum state tomography: Binary outcomes Extreme quantization destroys low-rank structure

9 1-Bit Matrix Completion Extreme case Y = sign(m ) Claim: Recovering M from Y is impossible! M = No matter how many samples we obtain, all we can learn is whether > 0 or < 0

10 1-Bit Matrix Completion Extreme case Y = sign(m ) Claim: Recovering M from Y is impossible! M = uv eu = sign(u) ev = sign(v) fm = euev sign(fm) = sign(m)

11 Is There Any Hope? If we consider a noisy version of the problem, recovery becomes feasible! Y = sign(m + Z ) M = Fraction of positive/negative observations tells us something about Example of the power of dithering

12 Observation Model For (i; j) 2 we observe ( +1 with probability f(m i;j ) Y i;j = 1 with probability 1 f(m i;j ) If f behaves like a CDF, then this is equivalent to Y i;j = sign(m i;j + Z i;j ) where Z i;j is drawn according to a suitable distribution We will assume that is drawn uniformly at random

13 Examples Logistic regression / Logistic noise f(x) = ex 1 + e x Z i;j» logistic distribution Probit regression / Gaussian noise f(x) = (x=¾) Z i;j» N(0; ¾ 2 )

14 Assumptions M = 2 3 d If the upper-left corner of information M is not sampled, we have no Solution: Assume that M is spread 1 kmk d pr kmk 1 = max i;j jm i;jj ¼ O(1)

15 Maximum Likelihood Estimation Log-likelihood function: F(X) = X log(f(x i;j )) + (i;j)2 + X (i;j)2 log(1 f(x i;j )) cm = arg max F(X) X s.t. rank(x) r

16 Maximum Likelihood Estimation Log-likelihood function: F(X) = X log(f(x i;j )) + (i;j)2 + X (i;j)2 log(1 f(x i;j )) cm = arg max F(X) X s.t. 1 d kxk pr kxk 1

17 Recovery of the Matrix Theorem (Upper bound achieved by convex ML estimator) 1 Assume that kmk and. If is chosen at d pr kmk 1 random with Ej j = m > dlog d, then with high probability r 1 d kc M Mk 2 rd 2 F C L m where L := sup jxj jf 0 (x)j f(x)(1 f(x)) := sup jxj f(x)(1 f(x)) (f 0 (x)) 2 Is this bound tight?

18 Recovery of the Matrix Theorem (Upper bound achieved by convex ML estimator) 1 Assume that kmk and. If is chosen at d pr kmk 1 random with Ej j = m > dlog d, then with high probability r 1 d kc M Mk 2 rd 2 F C L m Theorem (Lower bound on any estimator) There exist M satisfying the assumptions above such that for any set with j j = m, we have (under mild technical assumptions) that r 1 q inf E cm d kc M Mk 2 rd 2 F c 3 4 m

19 Logistic Model L = 1 ¼ e Theorem (Upper bound achieved by convex ML estimator) r 1 d kc M Mk 2 2 F C e rd m Theorem (Lower bound on any estimator) inf cm E 1 d kc M Mk 2 2 F c e 3 8 r rd m

20 Probit Model L ¼ ¾ + 1 ¾ ¼ ¾ 2 e 2 =2¾ 2 Theorem (Upper bound achieved by convex ML estimator) 1 ³ d kc M Mk 2 1 r 2 F C ¾ + e 2 =2¾ 2 rd ¾ m Two regimes High signal-to-noise ratio: ¾ Low signal-to-noise ratio: ¾ Compare to how well we can estimate noisy measurements M from unquantized,

21 Probit Model (High SNR) Theorem (Upper bound achieved by convex ML estimator) 1 d 2 kc M Mk 2 F C 2 e 2 =2¾2r rd m Theorem (Lower bound on any estimator with unquantized measurements) r 1 inf E cm d kc M Mk 2 rd 2 F c ¾ m

22 Probit Model (Low SNR) Theorem (Upper bound achieved by convex ML estimator) r 1 d kc M Mk 2 rd 2 F C ¾ m Theorem (Lower bound on any estimator with unquantized measurements) r 1 inf E cm d kc M Mk 2 rd 2 F c ¾ m More noise can lead to improved performance!

23 Recovery of the Distribution It is also possible to establish bounds concerning the recovery of the distribution f(m), i.e., the matrix where each entry gives us the probability of observing +1 when we sample that entry We obtain matching upper and lower bounds on the average Hellinger distance between f(m) and f( M) c When lim L < 1, we can recover the distribution f(m)!1 without any assumptions on kmk 1 logistic model not probit model any model where the noise has heavy tails

24 Proof Methods Upper bounds Probability in Banach spaces Random matrix theory Lower bounds Information theoretic arguments Fano s inequality Packing sets of low-rank matrices

25 Tiny Sketch of Proof of Upper Bound Recall that we maximize the log-likelihood F(X) For a fixed matrix X, E[F(M) F(X)] = c D(f(X)jjf(M)) 1 Lemma: Let K = fx : kxk. With high d prg probability, sup X2K jf(x) EF(X)j ± By definition, F(cM) F(M) 0 F (M) F ( M) c h E F (M) F ( M) c i 2± = c D(f(cM)jjf(M)) 2± Thus, D(f( c M)jjf(M)) 2 c ±

26 Synthetic Simulations d = 500 m = :15d 2 r = 5 r = 3 r = 2 k c M Mk F kmk F log 10 ¾

27 Voting Simulation Binary (incomplete) data: Voting history of 105 US senators on 299 bills from First singular vector of c M Senator party affiliations First singular vector of Y

28 Voting Simulation Randomly delete 90% of the entries First singular vector of c M Senator party affiliations First singular vector of Y

29 Voting Simulation Randomly delete 95% of the entries First singular vector of c M Senator party affiliations First singular vector of Y 86% of missing votes correctly predicted

30 MovieLens Data Set 100,000 movie ratings (1000 users, 1700 movies) on a scale from 1 to 5 Convert to binary outcomes by comparing each rating to the average rating in the data set Evaluate by checking if we predict the correct sign Training on 95,000 ratings and testing on remainder standard matrix completion: 68% accuracy 1-bit matrix completion: 74% accuracy

31 Thank You!

1-Bit Matrix Completion

1-Bit Matrix Completion Mark A. Davenport School of Electrical and Computer Engineering Georgia Institute of Technology Yaniv Plan Mary Wootters Ewout van den Berg Matrix Completion d When is it possible