Binary matrix completion
1 Binary matrix completion
Yaniv Plan, University of Michigan
SAMSI, LDHD workshop, 2013
Joint work with (a) Mark Davenport, (b) Ewout van den Berg, (c) Mary Wootters
Yaniv Plan (U. Mich.), Binary matrix completion, 1 / 28
2 Low-rank matrices
Example: the Netflix matrix M, with entries M_{i,j} = how much user i likes movie j.
Rank-1 model: a_j = amount of action in movie j, x_i = how much user i likes action, and M_{i,j} = x_i a_j.
Rank-2 model: add b_j = amount of comedy in movie j and y_i = how much user i likes comedy, so that M_{i,j} = x_i a_j + y_i b_j.
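A tiny NumPy sketch of the rank-1 and rank-2 models above; all latent factors here are synthetic stand-ins for the movie and user characteristics named on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6  # a small users-by-movies example

a = rng.uniform(0.1, 1.0, size=d)  # amount of action in movie j
b = rng.uniform(0.1, 1.0, size=d)  # amount of comedy in movie j
x = rng.normal(size=d)             # how much user i likes action
y = rng.normal(size=d)             # how much user i likes comedy

M1 = np.outer(x, a)                # rank-1 model: M[i, j] = x[i] * a[j]
M2 = M1 + np.outer(y, b)           # rank-2 model: add a comedy factor
```

With generic (continuous, random) factors, `M1` has rank 1 and `M2` has rank 2, which is the whole point of the factor model: d^2 entries described by O(d) parameters.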
5 Low-rank assumption
The number of characteristics that determine user preferences should be smaller than the number of movies or users.
M should depend linearly on the characteristics: M_{i,j} = x_i a_j, not, say, M_{i,j} = exp(x_i a_j).
These characteristics need not be known.
6 Matrix completion
Completion of M from a subset of its entries:
[ ?        M_{1,2}  ?       ]       [ M_{1,1}  M_{1,2}  M_{1,3} ]
[ ?        ?        M_{2,3} ]  -->  [ M_{2,1}  M_{2,2}  M_{2,3} ]
[ M_{3,1}  ?        M_{3,3} ]       [ M_{3,1}  M_{3,2}  M_{3,3} ]
[ M_{4,1}  M_{4,2}  ?       ]       [ M_{4,1}  M_{4,2}  M_{4,3} ]
[Incomplete set of researchers: Srebro, Fazel, Candès, Recht, Rennie, Jaakkola, Montanari, Oh, Wainwright, Negahban, Yu, Koltchinskii, Lounici, Tsybakov, Klopp, Cai, Zhou, P., present]
7 Binary matrix completion: motivation
8 Binary data with missing entries
A partially observed ±1 matrix Y arises in many settings (rows × columns):
- Senate voting: senators × Senate bills
- Mathoverflow: math enthusiasts × math questions
- Pandora: people × songs
- Research literature: researchers × papers
- Netflix: people × movies
- Reddit: people × articles
- Incomplete binary survey: people × survey questions
- Sensor triangulation: sensors × sensors
17 A classical numerical experiment with voting data.
18 Senate voting
Y: voting history of US senators on 299 bills.
Figures: first singular vector of Y; Senate party affiliations.
A low-rank model? What matrix has low rank?
21 Low-rank assumption
Consider the Senate voting example. Can the voting preferences of a given senator be predicted from only a few characteristics of that senator? Does Y depend linearly on these characteristics?
22 Generalized linear model
P(Y_{i,j} = +1) = f(M_{i,j})
M: preference matrix; Y: (incomplete) binary data matrix.
- M is unknown and has (approximately) low rank.
- f: R → [0, 1] is a known function (e.g., the logistic curve).
- M ∈ R^{d×d}, Y ∈ {−1, +1}^{d×d}.
- Ω ⊂ {1, 2, ..., d} × {1, 2, ..., d}. You see Y_Ω.
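A short simulation of this observation model: a synthetic low-rank M, the logistic link, and a uniform-at-random sampling set Ω. This is illustrative code, not the talk's, and every parameter value is made up.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 50, 2

# Synthetic rank-r preference matrix M (unknown to the estimator in practice).
M = rng.normal(size=(d, r)) @ rng.normal(size=(r, d)) / np.sqrt(r)

def f(x):
    return 1.0 / (1.0 + np.exp(-x))  # logistic link: P(Y = +1) = f(M)

# Draw the full binary matrix Y, then keep only a random sampling set Omega.
Y = np.where(rng.random((d, d)) < f(M), 1, -1)
omega = rng.random((d, d)) < 0.3     # each entry observed with probability 0.3
Y_obs = np.where(omega, Y, 0)        # 0 marks "not observed"
```

Note that `Y` is a random function of M: even with every entry of Y observed, M can only be estimated, not read off.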
25 Latent variable formulation
Y_{i,j} = sign(M_{i,j} + Z_{i,j}) = { +1 if M_{i,j} + Z_{i,j} ≥ 0;  −1 if M_{i,j} + Z_{i,j} < 0 }
Z is an iid noise matrix, f(x) := P(Z_{1,1} ≥ −x), and you see Y_Ω.
26 Main assumption, main goal
Goal: efficiently approximate M and/or f(M).
Data: the partially observed ±1 matrix Y_Ω.
Assumption: M has (approximately) low rank.
27 Approximately low-rank
Assumption: ||M||_* ≤ d√r, i.e., M ∈ conv{rank-r matrices with Frobenius norm d} (a nuclear-norm ball), where ||M||_* = Σ_i σ_i(M) = ||(σ_1, σ_2, ..., σ_d)||_1.
- A robust extension of the rank [Chatterjee 2013].
- Allows for convex programming reconstruction.
Figure: the nuclear-norm ball in high dimensions.
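A quick numerical check of the containment behind this convex hull: a rank-r matrix with Frobenius norm d satisfies ||·||_* ≤ √r · d by Cauchy-Schwarz over its r nonzero singular values (sketch, synthetic matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 40, 3

# A rank-r matrix rescaled to Frobenius norm d, as in the convex hull above.
A = rng.normal(size=(d, r)) @ rng.normal(size=(r, d))
A *= d / np.linalg.norm(A)            # now ||A||_F = d and rank(A) = r

sigma = np.linalg.svd(A, compute_uv=False)
nuclear = sigma.sum()                  # ||A||_* = l1 norm of the spectrum

# Cauchy-Schwarz over the r nonzero singular values: ||A||_* <= sqrt(r) * ||A||_F.
assert nuclear <= np.sqrt(r) * d + 1e-6
```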
28 Maximum likelihood estimation
Take our estimate M̂ to be the solution to the following convex program:
    max_X F_{Ω,Y}(X)   such that   (1/d) ||X||_* ≤ √r,
where F_{Ω,Y} is the log-likelihood function.
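The talk does not specify a solver. One simple option for a nuclear-norm constraint is the Frank-Wolfe (conditional gradient) method, since its linear subproblem over {||X||_* ≤ τ} is solved by the top singular pair of the gradient. A minimal sketch for the logistic likelihood; `fw_binary_mle` and all parameter choices are my own, not the authors' implementation.

```python
import numpy as np

def fw_binary_mle(Y, omega, tau, iters=150):
    """Frank-Wolfe for: maximize sum over Omega of log sigmoid(Y * X),
    subject to ||X||_* <= tau.  Y has +-1 entries; omega is a boolean mask."""
    X = np.zeros(Y.shape)
    for t in range(iters):
        # Gradient of the observed log-likelihood: Y * sigmoid(-Y * X) on Omega.
        G = omega * Y / (1.0 + np.exp(Y * X))
        # Linear maximizer over the nuclear-norm ball: tau * (top singular pair of G).
        U, _, Vt = np.linalg.svd(G)
        S = tau * np.outer(U[:, 0], Vt[0])
        X += (2.0 / (t + 2.0)) * (S - X)   # standard step size gamma_t = 2/(t+2)
    return X

def loglik(X, Y, omega):
    """Observed-entry log-likelihood under the logistic link."""
    return np.sum(np.log(1.0 / (1.0 + np.exp(-Y * X)))[omega])
```

Each iteration costs one SVD, and every iterate stays feasible because it is a convex combination of points with nuclear norm at most τ.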
29 Estimation of the distribution, f(M)
Theorem (upper bound, achieved by convex programming). Let f be the logistic function. Assume that (1/d)||M||_* ≤ √r. Suppose the sampling set is chosen at random with E|Ω| = m ≥ d log d. Then, with high probability,
    (1/d²) Σ_{i,j} d_H²(f(M̂_{i,j}), f(M_{i,j})) ≤ C min(√(rd/m), 1),
where d_H(p, q)² := (√p − √q)² + (√(1−p) − √(1−q))² is the squared Hellinger distance.
Is this bound tight?
Theorem (lower bound, achievable by any estimator). In the setup of the above theorem,
    inf_{M̂(Y)} sup_M E (1/d²) Σ_{i,j} d_H²(f(M̂_{i,j}), f(M_{i,j})) ≥ c min(√(rd/m), 1).
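For concreteness, the per-entry squared Hellinger distance appearing in these bounds, as a small sketch:

```python
import numpy as np

def hellinger_sq(p, q):
    """Squared Hellinger distance d_H(p, q)^2 between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return (np.sqrt(p) - np.sqrt(q)) ** 2 + (np.sqrt(1 - p) - np.sqrt(1 - q)) ** 2

print(hellinger_sq(0.5, 0.5))   # identical distributions: 0.0
print(hellinger_sq(1.0, 0.0))   # maximally separated: 2.0
```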
31 Estimation of M
Additional assumption: ||M||_∞ ≤ α.
    max_X F_{Ω,Y}(X)   such that   (1/(dα)) ||X||_* ≤ √r and ||X||_∞ ≤ α.
32 1-bit matrix completion vs. noisy matrix completion: error bounds
Let Y⁰ := M + Z, where Z is a matrix of iid Gaussian noise with variance σ², so that f follows the probit model, and let Y_{i,j} := sign(Y⁰_{i,j}).
Question: how much harder is it to estimate M from Y than from Y⁰?
Two regimes: high SNR and low SNR.
33 Case 1: σ ≤ α (high signal-to-noise ratio)
Theorem (upper bound, convex programming, quantized input Y). Let f be the probit function. Assume that (1/(dα))||M||_* ≤ √r and ||M||_∞ ≤ α. Suppose the sampling set is chosen at random with E|Ω| = m ≥ d log d. Then, with high probability,
    (1/d²) ||M̂ − M||_F² ≤ C α² exp(α²/(2σ²)) √(rd/m).
Theorem (lower bound, achievable by any estimator, unquantized input). In the setup of the above theorem, and under mild technical conditions,
    inf_{M̂(Y⁰)} sup_M E (1/d²) ||M̂ − M||_F² ≥ c α σ √(rd/m).
34 Case 2: σ ≥ α (low signal-to-noise ratio)
Theorem (upper bound, convex programming, quantized input Y). Let f be the probit function. Assume that (1/(dα))||M||_* ≤ √r and ||M||_∞ ≤ α. Suppose the sampling set is chosen at random with E|Ω| = m ≥ d log d. Then, with high probability,
    (1/d²) ||M̂ − M||_F² ≤ C α σ √(rd/m).
Theorem (lower bound, achievable by any estimator, unquantized input). In the setup of the above theorem, and under mild technical conditions,
    inf_{M̂(Y⁰)} sup_M E (1/d²) ||M̂ − M||_F² ≥ c α σ √(rd/m).
35 Dithering: noise helps!
Conclusion: when the noise is larger than the signal, quantizing to a single bit loses almost no information. When the noise is (significantly) smaller than the signal, increasing the noise improves recovery from quantized measurements.
Figure: relative error ||M̂ − M||_F² / ||M||_F² versus log₁₀ σ, for the programs in (3) and (4).
36 Why we need noise
Take Y_{i,j} = sign(M_{i,j} + Z_{i,j}), and now remove the noise:
    Y_{i,j} = sign(M_{i,j}).
Claim: accurate reconstruction of M is impossible!
Suppose M is the rank-1 matrix whose entries all equal λ. Then Y_{i,j} = sign(λ) for every entry, so approximation of M is impossible even if every entry of Y is seen.
More generally, suppose we know that M has rank 1, so that M = u vᵀ for some vectors u, v ∈ R^d. Then M̃ = ũ ṽᵀ leads to exactly the same binary data as M whenever the signs of ũ match the signs of u, and similarly for ṽ and v. Again, approximation of M is impossible even if every entry of Y is seen.
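The all-λ example can be checked numerically: without noise the one-bit observations carry no information about λ, while with a Gaussian dither the empirical frequency of +1's pins λ down through the probit link. A small sketch, with every numeric value synthetic:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
lam, sigma, n = 0.7, 1.0, 200_000   # unknown entry value, dither level, samples

# Noiseless quantization: every observation equals sign(lam), so lam is lost.
assert np.all(np.sign(lam * np.ones(n)) == 1.0)

# Dithered quantization: Y = sign(lam + Z) with Z ~ N(0, sigma^2).
y = np.sign(lam + sigma * rng.normal(size=n))
p_hat = np.mean(y > 0)                         # estimates Phi(lam / sigma)
lam_hat = sigma * NormalDist().inv_cdf(p_hat)  # invert the probit link
```

Here `lam_hat` recovers λ to a few decimal places from signs alone, precisely because the noise spreads the quantization threshold across the distribution of λ + Z.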
41 Related literature
[Srebro, Rennie, Jaakkola et al. '04] Model-free: if an estimate has low nuclear norm and matches the signs of the observed entries by a significant margin, then the error on unobserved entries is small.
Our results: if the model is correct, then the overall error is nearly minimax. Noise helps!
[Cai, Zhou]: extension to non-uniform sampling.
42 General f
Let
    L_α := sup_{|x| ≤ α} |f'(x)| / (f(x)(1 − f(x)))   and   β_α := sup_{|x| ≤ α} f(x)(1 − f(x)) / (f'(x))².
Theorem (upper bound, achieved by convex programming):
    (1/d²) ||M̂ − M||_F² ≤ C α L_α β_α √(rd/m).
Theorem (lower bound, achievable by any estimator):
    (1/d²) ||M̂ − M||_F² ≥ c α √(β_α) √(rd/m).
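For the logistic link these constants are easy to evaluate: f'(x) = f(x)(1 − f(x)), so L_α = 1 for every α, while β_α = 1/f'(α) grows as α grows. A numerical check (sketch):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

alpha = 1.0
x = np.linspace(-alpha, alpha, 100_001)
f = logistic(x)
fp = f * (1.0 - f)                            # f'(x) for the logistic link

L_alpha = np.max(fp / (f * (1.0 - f)))        # identically 1 for the logistic link
beta_alpha = np.max(f * (1.0 - f) / fp ** 2)  # = 1 / f'(alpha), attained at the endpoints
```

So for the logistic model the general bounds specialize with L_α = 1 and β_α = 1/(f(α)(1 − f(α))), e.g. β_1 ≈ 5.09.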
43 General f (continued)
Theorem (upper bound, achieved by convex programming):
    d_H²(f(M̂), f(M)) ≤ C α L_α √(rd/m).
Theorem (lower bound, achievable by any estimator):
    d_H²(f(M), f(M̂)) ≥ c α L_α^{−1} √(rd/m).
44 Voting simulation
Binary data: voting history of US senators on 299 bills.
Figures: first singular vector of M̂; Senate party affiliations; first singular vector of Y_Ω.
The first five singular values of M̂: 463, 216, 40, 32, 29.
48 Voting simulation
Randomly delete 90% of entries.
Figures: first singular vector of M̂; Senate party affiliations; first singular vector of Y_Ω.
49 Voting simulation
Randomly delete 95% of entries.
Figures: first singular vector of M̂; Senate party affiliations; first singular vector of Y_Ω.
50 Voting simulation
With 95% of votes deleted, 86% of missing votes were correctly predicted (averaged over 20 experiments).
Figure: percent of missed predictions versus model rank r, comparing (i) the rank-r approximation of Y_Ω and (ii) nuclear-norm constrained maximum-likelihood estimation.
51 MovieLens data set
- 100,000 movie ratings on a scale from 1 to 5.
- Convert to binary outcomes by comparing each rating to the average rating in the data set.
- Evaluate by checking whether we predict the correct sign.
- Train on 95,000 ratings, test on the remainder.
Standard matrix completion: 60% accuracy (by true rating: 1: 64%, 2: 56%, 3: 44%, 4: 65%, 5: 74%).
Binary matrix completion: 73% accuracy (by true rating: 1: 79%, 2: 73%, 3: 58%, 4: 75%, 5: 89%).
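The binarization and scoring steps described above, sketched on synthetic stand-in ratings (the real experiment uses the MovieLens data; only the conversion logic is shown, and the trivial majority-sign predictor is mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
ratings = rng.integers(1, 6, size=1000).astype(float)  # stand-in 1..5 ratings

# Compare each rating to the global average to get a +-1 outcome.
avg = ratings.mean()
y = np.where(ratings > avg, 1, -1)

# A sign prediction is scored by its agreement with y on held-out entries.
train, test = y[:950], y[950:]
baseline = 1 if train.mean() >= 0 else -1   # trivial majority-sign predictor
accuracy = np.mean(test == baseline)
```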
52 Restaurant recommendations
[REU with Gao, Wootters, Vershynin]
100 restaurants, 107 users, 11 yes/no answers per user.
> 75% success rate in recommending 1 restaurant per user (estimated using cross validation).
53 Methods of proof
Upper bounds: probability in Banach spaces / random matrix theory.
Lower bounds: information-theoretic techniques (Fano's inequality).
54 Tiny sketch of the upper bound proof
Recall: F_{Ω,Y}(X) is the log-likelihood of X (which we maximize).
1. For a fixed matrix X, E[F_{Ω,Y}(M) − F_{Ω,Y}(X)] = c · D(f(M) ‖ f(X)).
2. Lemma: |F_{Ω,Y}(X) − E F_{Ω,Y}(X)| ≤ δ for all X satisfying (1/(dα)) ||X||_* ≤ √r.
3. The maximizer M̂ satisfies F_{Ω,Y}(M̂) ≥ F_{Ω,Y}(M).
4. Hence 0 ≥ F_{Ω,Y}(M) − F_{Ω,Y}(M̂) ≥ E[F_{Ω,Y}(M) − F_{Ω,Y}(M̂)] − 2δ = c · D(f(M) ‖ f(M̂)) − 2δ.
Thus D(f(M) ‖ f(M̂)) ≤ (2/c) δ. Key step: the proof of the lemma.
55 Thank you!
More informationLecture 9: September 28
0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These
More informationConditions for Robust Principal Component Analysis
Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and
More informationExponential decay of reconstruction error from binary measurements of sparse signals
Exponential decay of reconstruction error from binary measurements of sparse signals Deanna Needell Joint work with R. Baraniuk, S. Foucart, Y. Plan, and M. Wootters Outline Introduction Mathematical Formulation
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationCollaborative Filtering Matrix Completion Alternating Least Squares
Case Study 4: Collaborative Filtering Collaborative Filtering Matrix Completion Alternating Least Squares Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade May 19, 2016
More informationCollaborative Filtering: A Machine Learning Perspective
Collaborative Filtering: A Machine Learning Perspective Chapter 6: Dimensionality Reduction Benjamin Marlin Presenter: Chaitanya Desai Collaborative Filtering: A Machine Learning Perspective p.1/18 Topics
More informationLecture Notes 9: Constrained Optimization
Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form
More informationStatistical Performance of Convex Tensor Decomposition
Slides available: h-p://www.ibis.t.u tokyo.ac.jp/ryotat/tensor12kyoto.pdf Statistical Performance of Convex Tensor Decomposition Ryota Tomioka 2012/01/26 @ Kyoto University Perspectives in Informatics
More informationEstimating the Largest Elements of a Matrix
Estimating the Largest Elements of a Matrix Samuel Relton samuel.relton@manchester.ac.uk @sdrelton samrelton.com blog.samrelton.com Joint work with Nick Higham nick.higham@manchester.ac.uk May 12th, 2016
More informationESTIMATION OF (NEAR) LOW-RANK MATRICES WITH NOISE AND HIGH-DIMENSIONAL SCALING
Submitted to the Annals of Statistics ESTIMATION OF (NEAR) LOW-RANK MATRICES WITH NOISE AND HIGH-DIMENSIONAL SCALING By Sahand Negahban and Martin J. Wainwright University of California, Berkeley We study
More informationMatrix Completion from Fewer Entries
from Fewer Entries Stanford University March 30, 2009 Outline The problem, a look at the data, and some results (slides) 2 Proofs (blackboard) arxiv:090.350 The problem, a look at the data, and some results
More informationThe Pros and Cons of Compressive Sensing
The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal
More informationHomework 4. Convex Optimization /36-725
Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationThe Convex Geometry of Linear Inverse Problems
Found Comput Math 2012) 12:805 849 DOI 10.1007/s10208-012-9135-7 The Convex Geometry of Linear Inverse Problems Venkat Chandrasekaran Benjamin Recht Pablo A. Parrilo Alan S. Willsky Received: 2 December
More informationJoint Capped Norms Minimization for Robust Matrix Recovery
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-7) Joint Capped Norms Minimization for Robust Matrix Recovery Feiping Nie, Zhouyuan Huo 2, Heng Huang 2
More informationEstimation of (near) low-rank matrices with noise and high-dimensional scaling
Estimation of (near) low-rank matrices with noise and high-dimensional scaling Sahand Negahban Department of EECS, University of California, Berkeley, CA 94720, USA sahand n@eecs.berkeley.edu Martin J.
More informationSampling and high-dimensional convex geometry
Sampling and high-dimensional convex geometry Roman Vershynin SampTA 2013 Bremen, Germany, June 2013 Geometry of sampling problems Signals live in high dimensions; sampling is often random. Geometry in
More informationGROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA
GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA D Pimentel-Alarcón 1, L Balzano 2, R Marcia 3, R Nowak 1, R Willett 1 1 University of Wisconsin - Madison, 2 University of Michigan - Ann Arbor, 3 University
More informationarxiv: v1 [math.st] 13 May 2017
Submitted to the Annals of Statistics BLIND REGRESSION VIA NEAREST NEIGHBORS UNDER LATENT VARIABLE MODELS BY CHRISTINA LEE, YIHUA LI, DEVAVRAT SHAH AND DOGYOON SONG arxiv:705.04867v [math.st] 3 May 07
More informationCompressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery
Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad
More informationLow Resolution Adaptive Compressed Sensing for mmwave MIMO receivers
Low Resolution Adaptive Compressed Sensing for mmwave MIMO receivers Cristian Rusu, Nuria González-Prelcic and Robert W. Heath Motivation 2 Power consumption at mmwave in a MIMO receiver Power at 60 GHz
More informationA Novel Approach to Quantized Matrix Completion Using Huber Loss Measure
1 A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure Ashkan Esmaeili and Farokh Marvasti ariv:1810.1260v1 [stat.ml] 29 Oct 2018 Abstract In this paper, we introduce a novel and robust
More informationAnalysis of Robust PCA via Local Incoherence
Analysis of Robust PCA via Local Incoherence Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 3244 hzhan23@syr.edu Yi Zhou Department of EECS Syracuse University Syracuse, NY 3244 yzhou35@syr.edu
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationRestricted Boltzmann Machines for Collaborative Filtering
Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem
More informationAn algebraic perspective on integer sparse recovery
An algebraic perspective on integer sparse recovery Lenny Fukshansky Claremont McKenna College (joint work with Deanna Needell and Benny Sudakov) Combinatorics Seminar USC October 31, 2018 From Wikipedia:
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More information