Free Energy Rates for Parameter Rich Optimization Problems


1 Free Energy Rates for Parameter Rich Optimization Problems. Joachim M. Buhmann (1), Julien Dumazert (1), Alexey Gronskiy (1), Wojciech Szpankowski (2). (1) ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch, julien.dumazert@gmail.com; (2) Purdue University, spa@cs.purdue.edu. AofA Workshop, Princeton, 20 June 2017.

2 Roadmap: Generic problem; Case study: sparse minimum bisection; Main result formulation; Proof outline; Application; Corollary.

3 Generic Problem

6 Generic Problem: Gibbs Sampling. We search for solutions c for a given input X (e.g. a graph), and the input X is random! We still want the solution c to be stable, so we define a Maximum Entropy Gibbs measure over the solutions c:

$$p(c \mid X) \propto \exp\big(-\beta\,\mathrm{cost}(c, X)\big).$$

[Figure: Gibbs measures over the solution space.]
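As a concrete illustration (not part of the talk), here is a minimal Python sketch of such a Gibbs measure over an explicitly enumerated solution space; the toy instance and cut cost are hypothetical stand-ins:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

n = 8                                  # toy instance: complete graph on n vertices
W = rng.normal(1.0, 0.5, size=(n, n))  # random edge weights, symmetrized below
W = (W + W.T) / 2

def cost(c, W):
    """Cut weight between a vertex subset c and its complement."""
    mask = np.zeros(len(W), dtype=bool)
    mask[list(c)] = True
    return W[mask][:, ~mask].sum()

# Enumerate all solutions c: vertex subsets of size d.
d = 3
solutions = list(itertools.combinations(range(n), d))
beta = 1.0

# Gibbs measure: p(c | X) proportional to exp(-beta * cost(c, X)), computed stably.
logits = np.array([-beta * cost(c, W) for c in solutions])
p = np.exp(logits - logits.max())
p /= p.sum()

print("most probable solution:", solutions[int(p.argmax())])
```

At small β the measure is nearly uniform over solutions; at large β it concentrates on the empirical minimizer.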

7 Generic Problem: Free Energy. Now

$$p(c \mid X) = \exp\big(-\beta\,\mathrm{cost}(c, X) - F(X)\big),$$

where $F(X) = \log Z(X)$ is the free energy and $Z(X)$ is the partition function. Goal: compute the expected free energy, which is hard:

$$E_X F(X) = E_X \log Z(X).$$
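For tiny instances, $\log Z(X)$ can be computed exactly by enumeration and $E_X \log Z(X)$ approximated by resampling inputs; a minimal sketch (my illustration, continuing the hypothetical toy setup above) of why this only works at small scale:

```python
import itertools
import numpy as np
from scipy.special import logsumexp

def log_Z(W, d, beta):
    """Exact log partition function by enumerating all size-d subsets."""
    n = len(W)
    energies = []
    for c in itertools.combinations(range(n), d):
        mask = np.zeros(n, dtype=bool)
        mask[list(c)] = True
        energies.append(W[mask][:, ~mask].sum())
    return logsumexp(-beta * np.asarray(energies))

def expected_log_Z(n=8, d=3, beta=1.0, trials=200, seed=1):
    """Monte Carlo estimate of E_X log Z(X) over random inputs X."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(trials):
        W = rng.normal(1.0, 0.5, size=(n, n))
        W = (W + W.T) / 2
        vals.append(log_Z(W, d, beta))
    return float(np.mean(vals))

print(expected_log_Z())  # enumeration has C(n, d) terms: hopeless for large n
```

The enumeration has $\binom{n}{d}$ terms, which is why the analytic asymptotics of $E \log Z$ are the interesting object.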

8 Case Study: Sparse Minimum Bisection

9 Case Study: Sparse Minimum Bisection. Find two subgraphs of some small size d with minimum edge cost between them.

15 Case Study: Sparse Minimum Bisection. Flashback to the generic problem: in high dimensions this may require solving via sampling! The solutions are dependent on each other, so this is not a Random Energy Model (next slide)! Contribution: the free energy asymptotics are solved! Another case study, the Lawler Quadratic Assignment Problem, is solved as well (see the paper).
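To make the case study concrete, here is a brute-force sketch of sparse minimum bisection on a toy instance (my illustration; the interesting regimes are high-dimensional, which is exactly why sampling is needed instead):

```python
import itertools
import numpy as np

def sparse_min_bisection(W, d):
    """Brute force: two disjoint size-d vertex sets minimizing the
    total edge weight between them."""
    n = len(W)
    best_cost, best_pair = np.inf, None
    for a in itertools.combinations(range(n), d):
        rest = [v for v in range(n) if v not in a]
        for b in itertools.combinations(rest, d):
            c = W[np.ix_(a, b)].sum()
            if c < best_cost:
                best_cost, best_pair = c, (a, b)
    return best_cost, best_pair

rng = np.random.default_rng(2)
W = rng.normal(1.0, 0.5, size=(10, 10))
W = (W + W.T) / 2
print(sparse_min_bisection(W, d=3))  # ~ C(n,d) * C(n-d,d) candidate solutions
```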

17 Free Energy: History and Advances. [Derrida 80] established the Random Energy Model (REM): no dependencies between solution energies. [Talagrand 02]: systematization; a simple proof of the REM free energy phase transition and beyond:

$$\lim_{n\to\infty} \frac{E[\log Z(\beta, X)]}{n} = \begin{cases} \dfrac{\beta^2}{4} + \log 2, & \beta < 2\sqrt{\log 2},\\[4pt] \beta\sqrt{\log 2}, & \beta \ge 2\sqrt{\log 2}. \end{cases}$$
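A quick sanity check of the high-temperature branch (my addition; the standard annealed computation for the REM with $2^n$ states and i.i.d. energies $E_i \sim \mathcal{N}(0, n/2)$):

$$\frac{1}{n}\log E[Z] = \frac{1}{n}\log\!\Big(2^n\, E\big[e^{-\beta E_1}\big]\Big) = \log 2 + \frac{\beta^2}{4},$$

and by Jensen's inequality $E[\log Z] \le \log E[Z]$, so the first branch is always an upper bound; the substance of the result is that it is tight exactly up to $\beta_c = 2\sqrt{\log 2}$.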

18 Main Result

21 Main Result: Setting. m is the number of solutions, with log m = o(n); N is the size of one solution (the number of parameters); we require the problem to be parameter rich: N ≥ 10 log m (with, as before, log m = o(n)).

22 Main Result: 2nd Order Phase Transition. Main theorem. Consider the sparse MBP on a complete graph, with edge weights that are mutually independent within any given solution and have mean μ and variance σ². Then, with $\hat\beta$ the suitably rescaled inverse temperature, the following holds:

$$\lim_{n\to\infty} \frac{E[\log Z] + \hat\beta\,\mu\sqrt{N \log m}}{\log m} = \begin{cases} 1 + \hat\beta^2\sigma^2/2, & \hat\beta < \sqrt{2}/\sigma,\\ \sqrt{2}\,\hat\beta\sigma, & \hat\beta \ge \sqrt{2}/\sigma, \end{cases}$$

provided $\log n \ll d \ll n^{2/7}/\log n$.
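Why this is a second-order transition (a check I am adding, using only the displayed cases): write $\phi(\hat\beta)$ for the right-hand side. At $\hat\beta_c = \sqrt{2}/\sigma$ both branches give $\phi = 2$, and

$$\frac{d}{d\hat\beta}\Big(1 + \tfrac12\hat\beta^2\sigma^2\Big) = \hat\beta\sigma^2 \Big|_{\hat\beta_c} = \sqrt{2}\,\sigma = \frac{d}{d\hat\beta}\Big(\sqrt{2}\,\hat\beta\sigma\Big),$$

so $\phi$ and $\phi'$ are continuous at $\hat\beta_c$ while $\phi''$ jumps from $\sigma^2$ to $0$: a second-order phase transition.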

23 Proof Outline

24 Proof Outline I. Introduce the solution overlap D: the average intersection of two bisections. D is the key to understanding the dependencies (remember, we are not in the REM setting!). Lemma 1. The following holds: $E_{\mathrm{rand.\,choice}}[D] = O(d^4/n)$. The proof then proceeds in three steps: (i) compute the dependencies, bounding P(Z close to EZ) via Var Z; (ii) break E log Z into two conditional parts; (iii) final bounding via adjusting a parameter.
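A small experiment (my addition) to build intuition for the overlap: sample pairs of random sparse bisections and count the cut edges they share. The paper's precise definition of D may differ from this formalization; the point is only that the overlap vanishes as n grows:

```python
import numpy as np

def random_bisection(n, d, rng):
    """One solution: two disjoint vertex sets of size d."""
    v = rng.permutation(n)
    return set(v[:d]), set(v[d:2 * d])

def edge_overlap(sol1, sol2):
    """Number of cut edges shared by two bisections."""
    (a1, b1), (a2, b2) = sol1, sol2
    e1 = {(min(u, v), max(u, v)) for u in a1 for v in b1}
    e2 = {(min(u, v), max(u, v)) for u in a2 for v in b2}
    return len(e1 & e2)

rng = np.random.default_rng(3)
d = 4
for n in [50, 100, 200, 400]:
    est = np.mean([edge_overlap(random_bisection(n, d, rng),
                                random_bisection(n, d, rng))
                   for _ in range(2000)])
    print(f"n={n:4d}  mean shared cut edges ~ {est:.4f}")
```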

25 Proof Outline II. Introduce the event A that Z is close to EZ, i.e. $A := \{Z \ge \varepsilon\, EZ\}$. The goal is to control P(A). Fact 1. P(A) can be bounded via Var Z using Chebyshev's inequality. Lemma 2 [Buhmann et al., 2014]. Var Z can be asymptotically approximated via $E_{\mathrm{rand.\,choice}}[D]$:

$$\mathrm{Var}\,Z \approx (EZ)^2\,\big(\sigma^2\beta^2\, E_{\mathrm{rand.\,choice}}[D]\big).$$
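The Chebyshev step spelled out (my addition; for $\varepsilon \in (0, 1)$):

$$P(\bar A) = P\big(Z < \varepsilon\, EZ\big) \le P\big(|Z - EZ| \ge (1-\varepsilon)\, EZ\big) \le \frac{\mathrm{Var}\,Z}{(1-\varepsilon)^2 (EZ)^2},$$

which, combined with Lemma 2, is small whenever $\sigma^2\beta^2 E[D] \to 0$, e.g. under the overlap bound of Lemma 1.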

26 Proof Outline III. Break E log Z into two parts. Fact 3.

$$E[\log Z] = E[\log Z \mid A]\, P(A) + E\big[\log Z \cdot \mathbf{1}(\bar A)\big] \ge (\log EZ + \log \varepsilon)\, P(A) + E\big[\log Z \cdot \mathbf{1}(\bar A)\big].$$

One can expand log EZ via a Taylor expansion (using the assumptions of the Theorem) and bound P(A) by the previous step. Fact 4. For $E[\log Z \cdot \mathbf{1}(\bar A)]$ a loose bound is enough.
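The expansion of $\log EZ$ sketched (my addition; assuming each solution's cost is a sum of N independent weights $w$ with mean $\mu$ and variance $\sigma^2$):

$$\log EZ = \log\!\Big(\sum_{c} E\,e^{-\beta\,\mathrm{cost}(c,X)}\Big) = \log m + N \log E\,e^{-\beta w} \approx \log m - \beta\mu N + \tfrac12 \beta^2\sigma^2 N,$$

and after adding back the mean term, dividing by $\log m$, and rescaling $\beta$ to $\hat\beta$, this is exactly the first branch $1 + \hat\beta^2\sigma^2/2$ of the Theorem.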

27 Proof Outline IV. Finally, the right choice of ε for the two regimes of β gives the phase transition in the lower bound:

$$\liminf_{n\to\infty} \frac{E[\log Z] + \hat\beta\,\mu\sqrt{N \log m}}{\log m} \ge \begin{cases} 1 + \hat\beta^2\sigma^2/2, & \hat\beta < \sqrt{2}/\sigma,\\ \sqrt{2}\,\hat\beta\sigma, & \hat\beta \ge \sqrt{2}/\sigma. \end{cases}$$

The same phase transition holds for the upper bound, which is easier to prove (no dependencies need to be computed).

28 Possible Application

31 Application Setting: Gibbs Sampling Regularizer. Remember our approach. Q: what is regularizing here? A: the choice of β essentially controls the width of the Gibbs distribution. Q: how to choose β? A: maximize the expected log-convolution of the posteriors over two independent instances X′ and X′′:

$$\beta^{*} = \arg\max_{\beta}\, E_{X', X''}\bigg[\log \sum_{c} p_\beta(c \mid X')\, p_\beta(c \mid X'')\bigg].$$
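A toy implementation of this criterion (my sketch, reusing the hypothetical enumeration from the earlier examples; a single pair of instances stands in for the expectation): draw two noisy versions X′, X′′ of the same ground truth, score a grid of β, and keep the maximizer.

```python
import itertools
import numpy as np
from scipy.special import logsumexp

def energies(W, d):
    """Cost of every size-d subset cut (toy solution space)."""
    n = len(W)
    out = []
    for c in itertools.combinations(range(n), d):
        mask = np.zeros(n, dtype=bool)
        mask[list(c)] = True
        out.append(W[mask][:, ~mask].sum())
    return np.asarray(out)

def log_conv_score(E1, E2, beta):
    """log sum_c p_beta(c|X') p_beta(c|X'') over enumerated solutions."""
    lp1 = -beta * E1 - logsumexp(-beta * E1)  # log p_beta(c | X')
    lp2 = -beta * E2 - logsumexp(-beta * E2)  # log p_beta(c | X'')
    return logsumexp(lp1 + lp2)

rng = np.random.default_rng(4)
n, d = 8, 3
truth = rng.normal(1.0, 0.5, size=(n, n))
X1 = truth + rng.normal(0, 0.3, size=(n, n))  # two noisy observations
X2 = truth + rng.normal(0, 0.3, size=(n, n))  # of the same ground truth
E1 = energies((X1 + X1.T) / 2, d)
E2 = energies((X2 + X2.T) / 2, d)

betas = np.linspace(0.1, 20.0, 100)
scores = [log_conv_score(E1, E2, b) for b in betas]
print("beta* ~", betas[int(np.argmax(scores))])
```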

33 Application Setting: Intuition. The log-convolution is almost a cross entropy; it stabilizes the solution output. [Figure: three panels showing $p_\beta(\cdot \mid X')$, $p_\beta(\cdot \mid X'')$ and the optima $c^{\mathrm{opt}}(X')$, $c^{\mathrm{opt}}(X'')$ over the solution space, for β = 0.25 (underfitting), β = 2.8 (near-optimal, β ≈ β*) and β = 19.5 (overfitting).]

34 Corollary and Application

36 Corollary: Log-Convolution. The log-convolution score can be rewritten as

$$E \log \sum_{c} p_\beta(c \mid X')\, p_\beta(c \mid X'') = \underbrace{E \log Z(\beta, X' + X'')}_{\text{Thm}} - \underbrace{2\, E \log Z(\beta, X)}_{\text{Thm}},$$

so both terms, and hence the score, can be computed analytically via the Theorem: more noise calls for a smaller β, less noise for a greater β. [Plot: the optimal β as a function of the noise-to-signal ratio.]
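The rewriting follows in two lines when the cost is additive in the input, i.e. $\mathrm{cost}(c, X') + \mathrm{cost}(c, X'') = \mathrm{cost}(c, X' + X'')$ (my sketch of the step):

$$\sum_c p_\beta(c \mid X')\, p_\beta(c \mid X'') = \frac{\sum_c e^{-\beta[\mathrm{cost}(c, X') + \mathrm{cost}(c, X'')]}}{Z(\beta, X')\, Z(\beta, X'')} = \frac{Z(\beta, X' + X'')}{Z(\beta, X')\, Z(\beta, X'')},$$

and taking logarithms and expectations (with X′, X′′ i.i.d. copies of X) gives the displayed formula, so both terms fall under the Theorem's asymptotics.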

37 Conclusion. We computed the value of E log Z asymptotically precisely for solution spaces with small dependencies. This has applications to model validation in machine learning (submitted to the J. of Theor. CS): the method involves comparing two fluctuating Gibbs distributions. We have still not found a fully general way (and probably there is no such way).

38 Thanks for your attention!

39 References I

E. T. Jaynes, Information Theory and Statistical Mechanics. Phys. Rev. 106, 1957.

W. Szpankowski, Combinatorial optimization problems for which almost every algorithm is asymptotically optimal. Optimization 33, 1995.

M. Talagrand, Spin Glasses: A Challenge for Mathematicians. Springer, NY, 2003.

40 References II

J. Vannimenus and M. Mézard, On the Statistical Mechanics of Optimization Problems of the Traveling Salesman Type. J. de Physique Lettres 45, 1984.

J. M. Buhmann, A. Gronskiy and W. Szpankowski, Free Energy Rates for a Class of Combinatorial Optimization Problems. AofA '14, Paris.

J. M. Buhmann, A. Gronskiy, J. Dumazert and W. Szpankowski, Phase Transitions in Parameter Rich Optimization Problems. SODA/ANALCO '17, Barcelona.
