1-bit Matrix Completion. PAC-Bayes and Variational Approximation

1 1-bit Matrix Completion: PAC-Bayes and Variational Approximation (with P. Alquier). PhD supervisor: N. Chopin. Junior Conference on Data Science 2016, Université Paris-Saclay, September 2016

3 Introduction: Matrix Completion
Incomplete matrix: black cells are known, white cells are unknown.
Matrix completion, main questions:
When is it possible?
What algorithm?
What is the rate of convergence?

4 Introduction: Low Rank Assumption
$M = L R^\top$, where $M$ has $m_1 \times m_2$ entries and the factors $L$ and $R$ together have $m_1 r + m_2 r$ entries.
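The entry counts on this slide can be checked numerically. The following is a minimal sketch; the sizes `m1`, `m2`, `r` and the Gaussian factors are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2, r = 200, 100, 3  # hypothetical dimensions, not from the talk

# Low-rank factors: L is m1 x r, R is m2 x r, and M = L R^T.
L = rng.normal(size=(m1, r))
R = rng.normal(size=(m2, r))
M = L @ R.T

full_entries = m1 * m2              # entries needed to store M directly
factor_entries = m1 * r + m2 * r    # entries needed to store L and R
rank_M = np.linalg.matrix_rank(M)   # numerical rank of the product

print(full_entries, factor_entries, rank_M)
```

With these sizes the factorization stores 900 numbers instead of 20000, which is why a low-rank matrix can plausibly be recovered from a small fraction of its cells.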

6 Introduction: Recovery
Data: given $n$ observations $Y_i \in \mathbb{R}$, $X_i \in \{1,\dots,m_1\} \times \{1,\dots,m_2\}$.
Heuristic: best low-rank approximation of the observations.
Method, penalized least squares:
$$\hat{M} = \arg\min_M \sum_{i=1}^n (Y_i - M_{X_i})^2 + \lambda\,\mathrm{rank}(M)$$
Convex penalty (nuclear norm):
$$\hat{M} = \arg\min_M \sum_{i=1}^n (Y_i - M_{X_i})^2 + \lambda \|M\|_*$$
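One common way to minimize a nuclear-norm penalized least squares criterion is proximal gradient descent with singular value soft-thresholding. The sketch below uses that technique as an illustration; it is not claimed to be the algorithm used in the talk or in the cited works. It assumes each cell is observed at most once (encoded by a 0/1 `mask`), and the sizes, `lam` and `n_iter` are hypothetical.

```python
import numpy as np

def svt(A, tau):
    """Singular value soft-thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(M, Y, mask, lam):
    """Squared error on observed cells plus lam times the nuclear norm of M."""
    fit = np.sum(mask * (Y - M) ** 2)
    return fit + lam * np.linalg.svd(M, compute_uv=False).sum()

def complete(Y, mask, lam, n_iter=200):
    """Proximal gradient descent; step 1/2 matches the gradient's Lipschitz constant 2."""
    M = np.zeros_like(Y)
    step = 0.5
    for _ in range(n_iter):
        grad = 2.0 * mask * (M - Y)           # gradient of the squared-error term
        M = svt(M - step * grad, step * lam)  # proximal step for the penalty
    return M

# Hypothetical toy problem: rank-2 matrix, about half of the cells observed.
rng = np.random.default_rng(0)
m1, m2, r = 30, 20, 2
M_true = rng.normal(size=(m1, r)) @ rng.normal(size=(m2, r)).T
mask = (rng.random((m1, m2)) < 0.5).astype(float)
Y = mask * M_true
lam = 0.5

M_hat = complete(Y, mask, lam)
print(objective(np.zeros_like(Y), Y, mask, lam), objective(M_hat, Y, mask, lam))
```

The rank penalty itself is not convex, which is why the convex nuclear-norm surrogate is the version that lends itself to this kind of iteration.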

8 Introduction: 1-bit matrix completion
Observations are binary: $Y_i \in \{-1, +1\}$.
+1: like / vote yes
-1: dislike / vote no
Toy example: movie recommendation.
Movies (columns): James Bond, Toy Story, Batman, Heat, Psycho, ...
Users (rows): Michel, Vincent, Pierre, ..., Keefe, Gosia, Emma
[Table of ±1 ratings, users by movies; entries not recoverable]

10 Introduction - Previous Models
Statistical model:
$$Y \mid \{X = x\} = \begin{cases} +1 & \text{with prob. } f(M^0_x) \\ -1 & \text{with prob. } 1 - f(M^0_x) \end{cases}$$
$f : \mathbb{R} \to [0, 1]$ is the link function, $M^0$ the parameter.
Assumptions and results:
Assumption: $M^0$ has low rank.
Estimator:
$$\hat{M} = \arg\min_M \left\{ -\sum_{i=1}^n \log L_i(M) + \lambda \|M\|_* \right\}$$
Results: recovery of $M^0$.
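To make the statistical model concrete, here is a small simulation sketch. The slide only requires a link function f from the reals to [0, 1]; the logistic link used below is an assumption, as are the sizes and the random low-rank parameter `M0`.

```python
import numpy as np

def logistic(t):
    """Logistic link (an assumed choice of f; the slide leaves f generic)."""
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
m1, m2, r, n = 50, 40, 2, 500  # hypothetical sizes

# Hypothetical low-rank parameter M0 and uniformly sampled cells X_i = (rows[i], cols[i]).
M0 = rng.normal(size=(m1, r)) @ rng.normal(size=(m2, r)).T
rows = rng.integers(0, m1, size=n)
cols = rng.integers(0, m2, size=n)

# Y_i = +1 with probability f(M0 at X_i), and -1 otherwise.
p_plus = logistic(M0[rows, cols])
Y = np.where(rng.random(n) < p_plus, 1, -1)

def neg_log_lik(M):
    """Negative log-likelihood of the observed labels under parameter M."""
    p = logistic(M[rows, cols])
    lik = np.where(Y == 1, p, 1.0 - p)
    return -np.sum(np.log(lik))

print(neg_log_lik(np.zeros((m1, m2))), neg_log_lik(M0))
```

The estimator on the slide adds a nuclear-norm penalty to this negative log-likelihood; with the zero matrix every label has probability 1/2, so the first printed value equals n log 2.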

11 Plan 1 Introduction 2 PAC-Bayes Estimation 3 Theoretical Results 4 Example

13 PAC-Bayes Estimation
Different background: classification (machine learning).
$M \in \mathbb{R}^{m_1 \times m_2}$, $(y, x) \in \{-1, 1\} \times \{1,\dots,m_1\} \times \{1,\dots,m_2\}$.
Relevant loss, 0-1: $\ell_M(y, x) = \mathbb{1}(y \neq \mathrm{sign}(M_x))$
Convex surrogate, hinge loss: $\ell^h_M(y, x) = \max(0, 1 - y M_x)$
Integrated 0-1 risk of $M$: $R(M) = \mathbb{E}[\ell_M(Y, X)]$
Empirical hinge risk of $M$: $r_n^h(M) = \frac{1}{n} \sum_{i=1}^n \ell^h_M(Y_i, X_i)$
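The two losses and the empirical hinge risk are short enough to write out directly. This is a minimal sketch; the 2 x 2 matrix and the four labelled cells are made-up values chosen so the arithmetic is easy to follow.

```python
import numpy as np

def zero_one_loss(M, y, x):
    """0-1 loss: 1 if the sign of the entry M[x] disagrees with the label y."""
    i, j = x
    return float(y != np.sign(M[i, j]))

def hinge_loss(M, y, x):
    """Hinge loss max(0, 1 - y * M[x]), a convex upper bound on the 0-1 loss."""
    i, j = x
    return max(0.0, 1.0 - y * M[i, j])

def empirical_hinge_risk(M, Y, X):
    """Average hinge loss over the n observed pairs (Y_i, X_i)."""
    return sum(hinge_loss(M, y, x) for y, x in zip(Y, X)) / len(Y)

# Hypothetical example values.
M = np.array([[2.0, -0.5],
              [0.3, -3.0]])
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [1, 1, -1, -1]

hinge_risk = empirical_hinge_risk(M, Y, X)
zero_one_rate = sum(zero_one_loss(M, y, x) for y, x in zip(Y, X)) / len(Y)
print(hinge_risk, zero_one_rate)  # hinge losses 0, 1.5, 1.3, 0; 0-1 losses 0, 1, 1, 0
```

On every pair the hinge loss is at least the 0-1 loss, which is what makes it usable as a surrogate for the misclassification risk.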

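14 PAC-Bayes Estimation - Practical issues
Prior distribution on matrices. Factorization $M = L R^\top$ and:
$$L_{i,k},\, R_{j,k} \mid \gamma_k \sim \mathcal{N}(0, \gamma_k), \qquad \frac{1}{\gamma_k} \sim \Gamma(a, b)$$
Parameter: $\theta = (L, R, \gamma)$.
Object of interest: the pseudo-posterior distribution
$$p(d\theta) \propto \exp\left[-\lambda r_n^h(L R^\top)\right] \pi(d\theta)$$
Big issue: hard to use.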
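The prior and the pseudo-posterior weight can be sketched as follows. Several readings are assumptions here: that the second argument of the normal distribution is a variance, that 1/γ_k follows a Gamma distribution with shape a and rate b, and that the hinge risk is the average hinge loss over observed cells. The function names and the numerical values are hypothetical.

```python
import numpy as np

def sample_prior(m1, m2, K, a, b, rng):
    """Draw (L, R, gamma): 1/gamma_k ~ Gamma(shape a, rate b), entries ~ N(0, gamma_k)."""
    gamma = 1.0 / rng.gamma(shape=a, scale=1.0 / b, size=K)  # rate b assumed
    L = rng.normal(size=(m1, K)) * np.sqrt(gamma)  # column k has variance gamma_k
    R = rng.normal(size=(m2, K)) * np.sqrt(gamma)
    return L, R, gamma

def log_pseudo_posterior_weight(L, R, Y, rows, cols, lam):
    """Log of exp[-lam * r_n^h(L R^T)]: the pseudo-posterior density relative to the prior."""
    M = L @ R.T
    hinge = np.maximum(0.0, 1.0 - Y * M[rows, cols])
    return -lam * hinge.mean()

rng = np.random.default_rng(3)
L, R, gamma = sample_prior(6, 5, 3, a=2.0, b=1.0, rng=rng)

# Three hypothetical observations (Y_i, X_i).
rows = np.array([0, 1, 5])
cols = np.array([0, 4, 2])
Y = np.array([1, -1, 1])

w = log_pseudo_posterior_weight(L, R, Y, rows, cols, lam=10.0)
print(gamma, w)
```

Evaluating this weight at one θ is cheap; the difficulty is that the normalizing constant and exact samples are not available, which is what motivates an approximation.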

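15 PAC-Bayes Estimation - Variational Approximation
Fast method: search for an approximation in a family $\mathcal{F}$:
$$\tilde\rho = \arg\min_{\rho \in \mathcal{F}} KL(\rho, p)$$
Practical aspects:
Difficulty: the family $\mathcal{F}$ must be small for fast computation, yet large enough to give a close approximation.
$KL(\rho, p)$ is intractable. Optimize instead an upper bound, which may not be far off:
$$\forall \rho \in \mathcal{F}, \quad KL(\rho, p) \le r_n^h(M(\rho)) + R(\rho)$$
$$\tilde\rho = \arg\min_{\rho} \left\{ r_n^h(M(\rho)) + R(\rho) \right\}$$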
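For context, the divergence to a pseudo-posterior of the form p(dθ) ∝ exp[-λ r_n^h] π(dθ) splits into three terms. This is the standard identity for Gibbs-type distributions, written here for any ρ absolutely continuous with respect to π; how the slide's bound is derived from it is not spelled out in the source, so the link is only suggested.

```latex
\begin{align*}
KL(\rho, p)
  &= \int \log\frac{d\rho}{dp}\, d\rho \\
  &= \lambda \int r_n^h \, d\rho \;+\; KL(\rho, \pi)
     \;+\; \log \int \exp\left(-\lambda r_n^h\right) d\pi .
\end{align*}
```

The last term does not depend on ρ, so minimizing KL(ρ, p) over the family amounts to minimizing the integrated empirical hinge risk plus a prior-divergence term, without ever computing the normalizing constant.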

16 Outline 1 Introduction 2 PAC-Bayes Estimation 3 Theoretical Results 4 Example

18 Theoretical Results
Best predictor. Bayes predictor: $\forall x$, $M^B_x = \mathrm{sign}(\mathbb{E}[Y \mid X = x])$.
Risk control in a restrictive case:
Margin assumption.
$Y$ is observed without noise.
$M^B$ has rank $r$ (hopefully small).
Then, with probability $1 - \epsilon$,
$$\int R \, d\tilde\rho \le C \left[ \frac{r(m_1 + m_2)(\log n + l) + \log\frac{1}{\epsilon}}{n} \right]$$

20 Theoretical Results
Best predictor. Bayes predictor: $\forall x$, $M^B_x = \mathrm{sign}(\mathbb{E}[Y \mid X = x])$.
Risk control in a restrictive case:
Margin assumption.
$Y$ is observed with a switch noise, with prob. $p$.
$M^B$ has rank $r$ (hopefully small).
Then, with probability $1 - \epsilon$,
$$\int R \, d\tilde\rho \le C \left[ \frac{r(m_1 + m_2)(\log n + l) + \log\frac{1}{\epsilon}}{n} \right] + (2 + \delta) p$$
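The switch-noise setting can be simulated to see why a term in p appears in the bound. The sketch below assumes a rank-1 sign matrix as Bayes predictor and a flip probability below 1/2; the sizes, the seed and the value of `p` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m1, m2, n, p = 60, 50, 20000, 0.1  # hypothetical sizes and flip probability

# Hypothetical rank-1 Bayes predictor with entries in {-1, +1}.
u = rng.choice([-1, 1], size=m1)
v = rng.choice([-1, 1], size=m2)
MB = np.outer(u, v)

# Sample cells uniformly, then flip each clean label with probability p.
rows = rng.integers(0, m1, size=n)
cols = rng.integers(0, m2, size=n)
clean = MB[rows, cols]
flip = rng.random(n) < p
Y = np.where(flip, -clean, clean)

# For p < 1/2 the sign of E[Y | X = x] is still MB at x,
# so the Bayes predictor errs exactly on the flipped labels.
error_rate = np.mean(Y != clean)
print(np.linalg.matrix_rank(MB), error_rate)
```

Even the best possible predictor misclassifies a fraction close to p of the noisy labels, so a risk bound in this setting cannot go to zero with n alone.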

21 Plan 1 Introduction 2 PAC-Bayes Estimation 3 Theoretical Results 4 Example

22 Example
[Figure: misclassification rate against level of noise, for the methods Freq. Logit, HL G and HL IG.]
Figure: Different levels of switch noise

23 Conclusion
Summary:
New approach to 1-bit matrix completion (MC).
PAC-Bayes estimation.
Theoretical bounds, not yet conclusive enough.
Works well in practice.
Future work:
Different models for MC.
New estimation methods to get better bounds.

24 Bibliography
OLS Matrix Completion: Koltchinskii, Lounici, Tsybakov, Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion.
1-bit Matrix Completion: Lafond, Klopp, Moulines, Salmon, Probabilistic low-rank matrix completion on finite alphabets, 2014.
Variational Approximation of PAC-Bayes Estimation: Alquier, Ridgway, Chopin, On the properties of variational approximations of Gibbs posteriors, 2015.
Our paper: Cottet, Alquier, 1-bit Matrix Completion: PAC-Bayesian Analysis of a Variational Approximation, 2016.

25 Thank you. Questions?
