1-Bit Matrix Completion

Size: px

Start display at page:

Download "1-Bit Matrix Completion"

Evan Wade
5 years ago
Views:

1 1-Bit Matrix Completion Mark A. Davenport School of Electrical and Computer Engineering Georgia Institute of Technology Yaniv Plan Mary Wootters Ewout van den Berg

2 Matrix Completion d When is it possible to recover the original matrix? d How can we do this efficiently? How many samples will we need?

3 Low-Rank Matrices r d d r Singular value decomposition: M = U V ¼ dr d 2 degrees of freedom

4 Collaborative Filtering The Netflix Problem M i;j = how much user i likes movie j Rank 1 model: u i = how much user i likes romantic movies v j = amount of romance in movie j M i;j = u i v j Rank 2 model: w i = how much user i likes zombie movies x j = amount of zombies in movie j M i;j = u i v j + w i x j

6 Beyond Netflix Recovery of incomplete survey data Analysis of voting data Sensor localization Student response data Quantum state tomography

7 Low-Rank Matrix Recovery Given: a d d matrix Mof rank r samples of M on the set : Y = M How can we recover M? cm = arg inf X:X =Y rank(x) Can we replace this with something computationally feasible?

8 Nuclear Norm Minimization Convex relaxation! Replace with cm = arg inf X:X =Y kxk If j j = O(r d log d), under certain assumptions, this procedure can recover M!

9 Matrix Completion in Practice Noise Y = (M + Z) Quantization Netflix: Ratings are integers between 1 and 5 Survey responses: True/False, Yes/No, Agree/Disagree Voting data: Yea/Nay Quantum state tomography: Binary outcomes Extreme quantization destroys low-rank structure

10 What s the Problem?

11 1-Bit Matrix Completion Extreme case Y = sign(m ) Claim: Recovering M from Y is impossible! M = No matter how many samples we obtain, all we can learn is whether > 0 or < 0

12 Is There Any Hope? If we consider a noisy version of the problem, recovery becomes feasible! Y = sign(m + Z ) M + Z = Z 1;1 + Z 1;2 + Z 1;3 + Z 1;4 6 + Z 2;1 + Z 2;2 + Z 2;3 + Z 2; Z 3;1 + Z 3;2 + Z 3;3 + Z 3;4 5 + Z 4;1 + Z 4;2 + Z 4;3 + Z 4;4 Fraction of positive/negative observations tells us something about Example of the power of dithering

13 Observation Model For (i; j) 2 we observe ( +1 with probability f(m i;j ) Y i;j = 1 with probability 1 f(m i;j ) If f behaves like a CDF, then this is equivalent to Y i;j = sign(m i;j + Z i;j ) where Z i;j is drawn according to a suitable distribution We will assume that is drawn uniformly at random

14 Examples Logistic regression / Logistic noise f(x) = ex 1 + e x Z i;j» logistic distribution Probit regression / Gaussian noise f(x) = (x=¾) Z i;j» N(0; ¾ 2 )

15 Maximum Likelihood Estimation Log-likelihood function: F(X) = X log(f(x i;j )) + (i;j)2 + X (i;j)2 log(1 f(x i;j )) cm = arg max F(X) X s.t. rank(x) r

16 Maximum Likelihood Estimation Log-likelihood function: F(X) = X log(f(x i;j )) + (i;j)2 + X (i;j)2 log(1 f(x i;j )) cm = arg max F(X) X s.t. 1 d kxk pr kxk 1

17 Recovery of the Matrix Theorem (Upper bound achieved by convex ML estimator) 1 Assume that kmk and. If is chosen at d pr kmk 1 random with Ej j = m > dlog d, then with high probability r 1 d kc M Mk 2 rd 2 F C L m where L := sup jxj jf 0 (x)j f(x)(1 f(x)) := sup jxj f(x)(1 f(x)) (f 0 (x)) 2 Is this bound tight?

18 Recovery of the Matrix Theorem (Upper bound achieved by convex ML estimator) 1 Assume that kmk and. If is chosen at d pr kmk 1 random with Ej j = m > dlog d, then with high probability r 1 d kc M Mk 2 rd 2 F C L m Theorem (Lower bound on any estimator) There exist M satisfying the assumptions above such that for any set with j j = m, we have (under mild technical assumptions) that r 1 q inf E cm d kc M Mk 2 rd 2 F c 3 4 m

19 Logistic Model L = 1 ¼ e Theorem (Upper bound achieved by convex ML estimator) r 1 d kc M Mk 2 2 F C e rd m Theorem (Lower bound on any estimator) inf cm E 1 d kc M Mk 2 2 F c e 3 8 r rd m

20 Probit Model L ¼ ¾ + 1 ¾ ¼ ¾ 2 e 2 =2¾ 2 Theorem (Upper bound achieved by convex ML estimator) 1 ³ d kc M Mk 2 1 r 2 F C ¾ + e 2 =2¾ 2 rd ¾ m Two regimes High signal-to-noise ratio: ¾ Low signal-to-noise ratio: ¾ Compare to how well we can estimate noisy measurements M from unquantized,

21 Probit Model (High SNR) Theorem (Upper bound achieved by convex ML estimator) 1 d 2 kc M Mk 2 F C 2 e 2 =2¾2r rd m Theorem (Lower bound on any estimator with unquantized measurements) r 1 inf E cm d kc M Mk 2 rd 2 F c ¾ m

22 Probit Model (Low SNR) Theorem (Upper bound achieved by convex ML estimator) r 1 d kc M Mk 2 rd 2 F C ¾ m Theorem (Lower bound on any estimator with unquantized measurements) r 1 inf E cm d kc M Mk 2 rd 2 F c ¾ m More noise can lead to improved performance!

23 Recovery of the Distribution It is also possible to establish bounds concerning the recovery of the distribution f(m), i.e., the matrix where each entry gives us the probability of observing +1 when we sample that entry We obtain matching upper and lower bounds on the average Hellinger distance between f(m) and f( M) c When lim L < 1, we can recover the distribution f(m)!1 without any assumptions on kmk 1 logistic model not probit model any model where the noise has heavy tails

24 Tiny Sketch of Proof of Upper Bound Recall that we maximize the log-likelihood F(X) For a fixed matrix X, E[F(M) F(X)] = c D(f(X)jjf(M)) 1 Lemma: Let K = fx : kxk. With high d prg probability, sup X2K jf(x) EF(X)j ± By definition, F(cM) F(M) 0 F (M) F ( M) c h E F (M) F ( M) c i 2± = c D(f(cM)jjf(M)) 2± Thus, D(f( c M)jjf(M)) 2 c ±

25 Implementation minimize x f(x) subject to x 2 C f(x) is smooth, convex C is a closed, convex set Nonmonotone spectral projected-gradient (SPG) algorithm Iterative algorithm, each iteration requires computation of f(x) rf(x)

26 Synthetic Simulations d = 500 m = :15d 2 r = 5 r = 3 r = 2 k c M Mk F kmk F log 10 ¾

27 MovieLens Data Set 100,000 movie ratings on a scale from 1 to 5 Convert to binary outcomes by comparing each rating to the average rating in the data set Evaluate by checking if we predict the correct sign Training on 95,000 ratings and testing on remainder standard matrix completion: 60% accuracy 1: 64% 2: 56% 3: 44% 4: 65% 5: 74% 1-bit matrix completion: 73% accuracy 1: 79% 2: 73% 3: 58% 4: 75% 5: 89%

28 Conclusions 1-bit matrix completion is hard! What did you really expect? Sometimes 1-bit is all we can get We have algorithms that are near optimal Open questions Are there simpler/better/faster/stronger algorithms? What about 2.32-bit matrix completion?

29 Thank You!

1-Bit Matrix Completion

1-Bit Matrix Completion Mark A. Davenport School of Electrical and Computer Engineering Georgia Institute of Technology Yaniv Plan Mary Wootters Ewout van den Berg Matrix Completion d When is it possible