Phase transitions in low-rank matrix estimation


Phase transitions in low-rank matrix estimation. Florent Krzakala. http://krzakala.github.io/lowramp/ (arXiv:1503.00338 & 1507.03857)

Thibault Lesieur, Lenka Zdeborová, Caterina De Bacco, Cris Moore, Jess Banks, Jiaming Xu, Jean Barbier, Nicolas Macris, Mohamad Dia

Low-rank matrix estimation. $W = \frac{1}{\sqrt{n}} XX^T$ or $W = \frac{1}{\sqrt{n}} UV^T$, with $X \in \mathbb{R}^{n\times r}$ (or $U \in \mathbb{R}^{m\times r}$, $V \in \mathbb{R}^{n\times r}$) unknown. The matrix $W$ has low (finite) rank $r \ll n, m$, and is observed element-wise through a noisy channel $P_{\rm out}(y_{ij}\,|\,w_{ij})$. Goal: estimate the unknown $X$ (or $U$ & $V$) from the known $Y$.
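To make the observation model concrete, here is a minimal Python sketch (my own illustration, not part of the talk) that samples a symmetric low-rank instance with one-hot rows, as in the community-detection example below, and passes it through an additive Gaussian channel; the function name and parameter values are arbitrary choices.

```python
# Minimal sketch of the observation model: W = X X^T / sqrt(n) with low-rank X,
# observed entry-wise through a Gaussian channel of variance Delta.
import numpy as np

def sample_instance(n=1000, r=2, Delta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(r, size=n)
    X = np.eye(r)[labels]                      # one-hot rows, shape (n, r)
    W = X @ X.T / np.sqrt(n)                   # rank-r matrix, entries O(1/sqrt(n))
    noise = rng.standard_normal((n, n))
    Y = W + np.sqrt(Delta) * (noise + noise.T) / np.sqrt(2)   # symmetric noisy observation
    return X, Y

X, Y = sample_instance()
print(X.shape, Y.shape)
```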

Some examples. $x_i^T = (0,\dots,0,1,0,\dots,0)$, an $r$-dimensional variable ($r$ = rank); $w_{ij} = x_i^T x_j/\sqrt{n}$. (Goal: estimate the unknown $X$ from the known $Y$.)

Some examples. $x_i^T = (0,\dots,0,1,0,\dots,0)$, $w_{ij} = x_i^T x_j/\sqrt{n}$. Additive white Gaussian noise (sub-matrix localization): $P_{\rm out}(y\,|\,w) = \frac{1}{\sqrt{2\pi\Delta}}\, e^{-\frac{(y-w)^2}{2\Delta}}$. (Goal: estimate the unknown $X$ from the known $Y$.)

Some examples. $x_i^T = (0,\dots,0,1,0,\dots,0)$, $w_{ij} = x_i^T x_j/\sqrt{n}$. Additive white Gaussian noise (sub-matrix localization): $P_{\rm out}(y\,|\,w) = \frac{1}{\sqrt{2\pi\Delta}}\, e^{-\frac{(y-w)^2}{2\Delta}}$. Dense stochastic block model (community detection): $P_{\rm out}(y_{ij}=1\,|\,w_{ij}) = p_{\rm out} + \mu w_{ij}$, $P_{\rm out}(y_{ij}=0\,|\,w_{ij}) = 1 - p_{\rm out} - \mu w_{ij}$, with $p_{\rm in} = p_{\rm out} + \mu/\sqrt{n}$ and $\Delta = \frac{p_{\rm out}(1-p_{\rm out})}{\mu^2}$. (Goal: estimate the unknown $X$ from the known $Y$.)

Some examples. $u_i \in \mathbb{R}^R$, $v_i^T = (0,\dots,0,1,0,\dots,0)$, $W = \frac{1}{\sqrt{n}} UV^T$, $Y = W + \mathcal{N}(0,\sigma^2)$. Additive white Gaussian noise (clustering a mixture of Gaussians). (Goal: estimate the unknown $U$ and $V$ from the known $Y$.)

Even more examples: sparse PCA, robust PCA; collaborative filtering (low-rank matrix completion); 1-bit collaborative filtering (like/unlike); bi-clustering; planted clique (cf. Andrea Montanari yesterday); etc.

QUESTIONS. Many interesting problems can be formulated this way. Q1: When is it possible to perform a good factorization? Q2: When is it algorithmically tractable? Q3: How good are spectral methods (the main tool)? Yesterday: Andrea taught us how to analyze AMP for such problems with the state-evolution approach. Today: we continue in this direction and answer these questions in a probabilistic setting, with instances randomly generated from a model.

Probabilistic setting. Assume $X$ is generated from $P(X) = \prod_{i=1}^n P_X(x_i)$. The posterior distribution reads $P(X\,|\,Y) = \frac{1}{Z(Y)} \prod_{i=1}^n P_X(x_i) \prod_{i<j} P_{\rm out}\!\big(y_{ij}\,\big|\,\tfrac{x_i^T x_j}{\sqrt{n}}\big)$: a graphical model where the $y_{ij}$ are pair-wise observations of the variables. The MMSE estimator (minimal error) is given by the marginal probabilities of the posterior. EXACTLY solvable? When $P_X$ and $P_{\rm out}$ are known, $n \to \infty$, $r = O(1)$: approximate message passing = dense-factor-graph simplification of belief propagation, hopefully giving asymptotically exact marginals of the posterior distribution. Exact analysis is possible with statistical-physics style methods, and many rigorous proofs are possible when one works a bit harder.
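As a sanity check on what "marginals of the posterior" means here, the toy script below (my own, not from the slides) enumerates the posterior exactly for a tiny system with a uniform ±1 prior and a Gaussian channel, and returns the posterior mean of $xx^T$, which is the object AMP approximates at large $n$.

```python
# Brute-force posterior for a tiny instance:
# P(x|Y) ∝ prod_{i<j} exp(-(Y_ij - x_i x_j/sqrt(n))^2 / (2 Delta)),
# with a uniform prior over x in {-1,+1}^n (the prior factor is constant and drops out).
import itertools
import numpy as np

def exact_posterior_mean_of_xxT(Y, Delta):
    n = Y.shape[0]
    iu = np.triu_indices(n, 1)
    logw, configs = [], []
    for x in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(x)
        w = np.outer(x, x) / np.sqrt(n)
        logw.append(-np.sum((Y[iu] - w[iu]) ** 2) / (2 * Delta))
        configs.append(x)
    logw = np.array(logw) - max(logw)
    p = np.exp(logw); p /= p.sum()
    # E[x_i | Y] = 0 by the global x -> -x symmetry, so the informative object is E[x x^T | Y]
    return sum(pk * np.outer(xk, xk) for pk, xk in zip(p, configs))

rng = np.random.default_rng(1)
n, Delta = 8, 0.5
x0 = rng.choice([-1.0, 1.0], size=n)
noise = rng.standard_normal((n, n))
Y = np.outer(x0, x0) / np.sqrt(n) + np.sqrt(Delta) * (noise + noise.T) / np.sqrt(2)
print(np.round(exact_posterior_mean_of_xxT(Y, Delta)[:3, :3], 2))
```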

OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof

OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof

AMP FOR THE GAUSSIAN ADDITIVE CHANNEL. $w_{ij} = x_i^T x_j/\sqrt{n}$, AWGN: $y_{ij} = w_{ij} + \sqrt{\Delta}\,\xi_{ij}$. With the rescaled matrix $S = Y/\Delta$, the AMP iteration reads
$B^t = \frac{1}{\sqrt{n}}\, S a^t - \frac{1}{n\Delta}\Big(\sum_i v_i^t\Big) a^{t-1}, \qquad A^t = \frac{1}{n\Delta}\sum_i a_i^t (a_i^t)^T.$
Mean and variance of the marginals: $a_i^{t+1} = f(A^t, B_i^t)$, $v_i^{t+1} = \frac{\partial f}{\partial B}(A^t, B_i^t)$. First written for rank $r=1$ in Rangan, Fletcher '12. The dependence on the prior enters only via a thresholding function $f(A,B)$, given by the expectation of $P(x) = \frac{1}{Z(A,B)}\, P_X(x)\,\exp\!\big(B^\top x - \tfrac{x^\top A x}{2}\big)$.

AMP FOR THE GAUSSIAN ADDITIVE CHANNEL. $w_{ij} = x_i^T x_j/\sqrt{n}$, AWGN: $y_{ij} = w_{ij} + \sqrt{\Delta}\,\xi_{ij}$, $S = Y/\Delta$,
$B^t = \frac{1}{\sqrt{n}}\, S a^t - \frac{1}{n\Delta}\Big(\sum_i v_i^t\Big) a^{t-1}, \qquad A^t = \frac{1}{n\Delta}\sum_i a_i^t (a_i^t)^T.$
Mean and variance of the marginals: $a_i^{t+1} = f(A^t, B_i^t)$, $v_i^{t+1} = \frac{\partial f}{\partial B}(A^t, B_i^t)$. First written for rank $r=1$ in Rangan, Fletcher '12. Note: for $x = \pm 1$, these are nothing but the TAP equations for the Ising Sherrington-Kirkpatrick model (on the Nishimori line): $a_i^{t+1} = \tanh(B_i^t)$, $v_i^{t+1} = 1 - (a_i^{t+1})^2$.
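Below is a compact runnable sketch of these iterations for the uniform ±1 prior (for which $f = \tanh$ and the $A$ term only contributes a constant); it is my own illustration written from the equations above, with arbitrary initialization and parameter values.

```python
# AMP for y_ij = x_i x_j / sqrt(n) + sqrt(Delta) * noise with a uniform +-1 prior.
import numpy as np

def amp_pm1(Y, Delta, iters=50, seed=0):
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    S = Y / Delta                               # Fisher-score matrix of the Gaussian channel
    a = 1e-3 * rng.standard_normal(n)           # small random initialization
    a_old = np.zeros(n)
    v = np.ones(n)
    for _ in range(iters):
        B = S @ a / np.sqrt(n) - (v.sum() / (n * Delta)) * a_old   # field + Onsager correction
        a_old = a
        a = np.tanh(B)                          # f(A, B) for the +-1 prior
        v = 1.0 - a ** 2                        # derivative of f w.r.t. B (marginal variances)
    return a

# demo on a synthetic instance below the transition (Delta < 1 for this prior)
rng = np.random.default_rng(1)
n, Delta = 2000, 0.5
x = rng.choice([-1.0, 1.0], size=n)
noise = rng.standard_normal((n, n))
Y = np.outer(x, x) / np.sqrt(n) + np.sqrt(Delta) * (noise + noise.T) / np.sqrt(2)
a = amp_pm1(Y, Delta)
print("overlap |a.x|/n =", abs(a @ x) / n)
```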

State evolution: a single-letter characterization of AMP. With the overlap $M^t = \frac{1}{n}\,\hat{x}^{\rm AMP}\!\cdot x$,
$M^{t+1} = \mathbb{E}_{x,z}\!\left[ x\, f\!\Big( \frac{M^t}{\Delta},\ \frac{M^t}{\Delta}\, x + \sqrt{\frac{M^t}{\Delta}}\, z \Big) \right].$
It depends on the channel only through its Fisher information: a one-parameter family of channels having the same MMSE. Cf. yesterday's talk: rigorously proven in Montanari, Bayati '10 and Montanari, Deshpande '14.

State evolution: a single-letter characterization of AMP. $M^t = \frac{1}{n}\,\hat{x}^{\rm AMP}\!\cdot x$,
$M^{t+1} = \mathbb{E}_{x,z}\!\left[ x\, f\!\Big( \frac{M^t}{\Delta},\ \frac{M^t}{\Delta}\, x + \sqrt{\frac{M^t}{\Delta}}\, z \Big) \right].$
Note: for $x = \pm 1$, this is nothing but the replica-symmetric equation for the Ising Sherrington-Kirkpatrick model (on the Nishimori line):
$M^{t+1} = \mathbb{E}_{x,z}\!\left[ x \tanh\!\Big( \frac{M^t}{\Delta}\, x + \sqrt{\frac{M^t}{\Delta}}\, z \Big) \right].$
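The recursion above is easy to iterate numerically; the short sketch below (my own, using Gauss-Hermite quadrature for the Gaussian average) tracks the overlap fixed point for the ±1 prior at a few noise levels chosen only for illustration.

```python
# State evolution for the +-1 prior: M <- E_z[ tanh( M/Delta + sqrt(M/Delta) z ) ].
import numpy as np

def se_fixed_point(Delta, M0=1e-4, iters=500, n_nodes=101):
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)   # nodes/weights for weight e^{-z^2/2}
    w = w / w.sum()                                       # normalize to a standard Gaussian average
    M = M0
    for _ in range(iters):
        snr = M / Delta
        M = float(np.sum(w * np.tanh(snr + np.sqrt(snr) * z)))
    return M

for Delta in [0.5, 0.9, 1.1]:
    print(f"Delta = {Delta}:  M* = {se_fixed_point(Delta):.4f}")
```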

Replica mutual information. Most quantities of interest can be computed from the mutual information (the free energy, for physicists): $I(X;Y) = \int \mathrm{d}x\,\mathrm{d}y\, P(x,y)\log\frac{P(x,y)}{P(x)P(y)}$. The replica method predicts an asymptotic formula for $i = I/n$:
$i_{RS}(m) = \frac{m^2 + \mathbb{E}_x(x^2)^2}{4\Delta} - \mathbb{E}_{x,z}\!\left[\mathcal{J}\!\Big(\frac{m}{\Delta},\ \frac{m x}{\Delta} + \sqrt{\frac{m}{\Delta}}\, z\Big)\right], \qquad \mathcal{J}(A,B) = \log\int \mathrm{d}x\, P_X(x)\, e^{Bx - Ax^2/2}.$
FACT: the state-evolution recursion for AMP is a fixed-point iteration on the stationarity condition of $i_{RS}(m)$. CONJECTURE: $\lim_{n\to\infty}\frac{I(X;Y)}{n} = \min_m i_{rs}(m)$.
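For a concrete feel of this potential, the sketch below (my own; it assumes the ±1 prior, for which the inner integral is explicit, $\mathcal{J}(A,B) = -A/2 + \log\cosh B$) evaluates $i_{RS}(m)$ on a grid and locates its minimizer; the noise values are illustrative only.

```python
# Numerical evaluation of i_RS(m) for the +-1 prior (E[x^2] = 1).
import numpy as np

def i_rs(m, Delta, n_nodes=201):
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / w.sum()
    A = m / Delta
    B = A + np.sqrt(A) * z                     # by the +-x symmetry we can fix x = +1
    EJ = np.sum(w * (-A / 2 + np.logaddexp(B, -B) - np.log(2)))   # E_z[ J(A, B) ]
    return (m ** 2 + 1.0) / (4 * Delta) - EJ

for Delta in [0.8, 1.2]:
    ms = np.linspace(1e-6, 1.0, 400)
    vals = [i_rs(m, Delta) for m in ms]
    print(f"Delta = {Delta}:  argmin_m i_RS = {ms[int(np.argmin(vals))]:.3f}")
```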

Replica mutual information. [Figure: $i_{RS}(E)$, with the formula of the previous slide, plotted as a function of $E$ for $\Delta = 0.0008, 0.0012, 0.00125, 0.0015$; the panels are marked "Good for AMP" or "Bad for AMP".] Here $E$ = MSE of the AMP estimate $= \mathbb{E}[x^2] - m$.

Replica mutual information. Can we prove $\lim_{n\to\infty}\frac{I(X;Y)}{n} = \min_m i_{rs}(m)$? Yes!
$i_{RS}(m) = \frac{m^2 + \mathbb{E}_x(x^2)^2}{4\Delta} - \mathbb{E}_{x,z}\!\left[\mathcal{J}\!\Big(\frac{m}{\Delta},\ \frac{m x}{\Delta} + \sqrt{\frac{m}{\Delta}}\, z\Big)\right]$, with $\mathcal{J}(A,B) = \log\int \mathrm{d}x\, P_X(x)\, e^{Bx - Ax^2/2}$.
* Proven for not-too-sparse PCA (Montanari & Deshpande '14) and for symmetric community detection (Montanari, Abbe & Deshpande '16). * Proven for the planted SK model by Korada & Macris '10.
FK, Xu & Zdeborová '16 (Guerra interpolation): upper bound, $\frac{I}{n} \le i_{rs}(m)$. Barbier, Dia, Macris, FK & Zdeborová '16 (spatial coupling + thermodynamic integration / I-MMSE): $\lim_{n\to\infty}\frac{I}{n} \ge \min_m i_{rs}(m)$.

FORMULAS FOR $UV^T$. AMP:
$B_u^{t+1} = \frac{1}{\sqrt{n}}\, S\, f_v(A_v^t, B_v^t) - \frac{1}{n\Delta}\,\big\langle f_v'(A_v^t, B_v^t)\big\rangle\, f_u(A_u^{t-1}, B_u^{t-1}), \qquad A_u^{t+1} = \frac{1}{n\Delta}\, f_v(A_v^t, B_v^t)\, f_v(A_v^t, B_v^t)^T,$
and symmetrically (with $S^T$ and the roles of $u$ and $v$ exchanged) for $B_v^{t+1}$, $A_v^{t+1}$.
State evolution: with $M_u^t = \frac{1}{n}\,\hat{u}^{\rm AMP}\!\cdot u$ and $M_v^t = \frac{1}{m}\,\hat{v}^{\rm AMP}\!\cdot v$,
$M_u^{t+1} = \mathbb{E}_{u,z}\!\left[u\, f_u\!\Big(\frac{M_v^t}{\Delta},\ \frac{M_v^t}{\Delta}\,u + \sqrt{\frac{M_v^t}{\Delta}}\,z\Big)\right], \qquad M_v^{t+1} = \mathbb{E}_{v,z}\!\left[v\, f_v\!\Big(\frac{M_u^t}{\Delta},\ \frac{M_u^t}{\Delta}\,v + \sqrt{\frac{M_u^t}{\Delta}}\,z\Big)\right].$
Mutual information:
$i_{RS}(m_u, m_v) = \frac{m_u m_v + [\mathbb{E}(u^2)][\mathbb{E}(v^2)]}{2\Delta} - \mathbb{E}_{u,z}\!\left[\mathcal{J}_u\!\Big(\frac{m_v}{\Delta},\ \frac{m_v u}{\Delta} + \sqrt{\frac{m_v}{\Delta}}\,z\Big)\right] - \mathbb{E}_{v,z}\!\left[\mathcal{J}_v\!\Big(\frac{m_u}{\Delta},\ \frac{m_u v}{\Delta} + \sqrt{\frac{m_u}{\Delta}}\,z\Big)\right].$

OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof

WHAT ABOUT OTHER CHANNELS? $w_{ij} = x_i^T x_j/\sqrt{n}$ is observed through $P_{\rm out}(y_{ij}\,|\,w_{ij})$, yielding $y_{ij}$.

CHANNEL UNIVERSALITY. $w_{ij} = x_i^T x_j/\sqrt{n}$ observed through $P_{\rm out}(y_{ij}\,|\,w_{ij})$. The dependence on the channel is only via the Fisher-score matrix
$S_{ij} = \left.\frac{\partial \log P_{\rm out}(y_{ij}\,|\,w)}{\partial w}\right|_{y_{ij},\,w=0}$
and the Fisher information
$\frac{1}{\Delta} = \mathbb{E}_{P_{\rm out}(y|w=0)}\!\left[\left(\left.\frac{\partial \log P_{\rm out}(y|w)}{\partial w}\right|_{y,\,w=0}\right)^{\!2}\right].$
Effective Gaussian channel with $y = \Delta S$: $w_{ij} = x_i^T x_j/\sqrt{n}$, AWGN, $y_{ij} = w_{ij} + \sqrt{\Delta}\,\xi_{ij}$.

CHANNEL UNIVERSALITY: a physicist's argument. $W = \frac{1}{\sqrt{n}} XX^T$, so each $W_{ij}$ is a small quantity and we can expand:
$P_{\rm out}(Y_{ij}\,|\,W_{ij}) = e^{\log P_{\rm out}(Y_{ij}|W_{ij})} = P_{\rm out}(Y_{ij}\,|\,0)\, e^{W_{ij} S_{ij} + \frac{1}{2} W_{ij}^2 S'_{ij} + O(n^{-3/2})}.$
Multiplying the $n^2$ terms:
$P_{\rm out}(Y\,|\,W) = P_{\rm out}(Y\,|\,0)\, e^{\sum_{i\le j}\big(W_{ij} S_{ij} + \frac{1}{2} W_{ij}^2 S'_{ij}\big) + O(\sqrt{n})},$
and by concentration
$P_{\rm out}(Y\,|\,W) \approx P_{\rm out}(Y\,|\,0)\, e^{\sum_{i\le j}\big(W_{ij} S_{ij} - \frac{1}{2\Delta} W_{ij}^2\big) + O(\sqrt{n})}.$
This is an effective Gaussian posterior probability:
$P(W\,|\,Y) \propto P(W)\, e^{-\frac{1}{2\Delta}\sum_{i\le j}(\Delta S_{ij} - W_{ij})^2 + O(\sqrt{n})}.$
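The two channel quantities are easy to check numerically for a concrete example; the snippet below (my own check, not from the talk) does it for the dense-SBM output channel of the earlier slide and recovers $\Delta = p_{\rm out}(1-p_{\rm out})/\mu^2$. The values of $p_{\rm out}$ and $\mu$ are arbitrary.

```python
# Fisher score S(y) = d/dw log P_out(y|w) at w = 0 and Fisher information 1/Delta
# for the dense-SBM channel P_out(y=1|w) = p_out + mu*w.
import numpy as np

p_out, mu, eps = 0.3, 0.8, 1e-6

def log_pout(y, w):
    p1 = p_out + mu * w
    return np.log(p1) if y == 1 else np.log(1.0 - p1)

S = {y: (log_pout(y, eps) - log_pout(y, -eps)) / (2 * eps) for y in (0, 1)}  # central differences
fisher = p_out * S[1] ** 2 + (1 - p_out) * S[0] ** 2     # expectation over P_out(y | w=0)
print("S(1) (should be mu/p_out):", S[1], "  S(0) (should be -mu/(1-p_out)):", S[0])
print("Delta =", 1 / fisher, " vs  p_out(1-p_out)/mu^2 =", p_out * (1 - p_out) / mu ** 2)
```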

CHANNEL UNIVERSALITY. FK, Xu & Zdeborová '16, arXiv:1603.08447. Conjectured in Lesieur, FK & Zdeborová '15; the rank-1 SBM case was used & proven in Abbe, Deshpande, Montanari '16.

OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof

Minimum mean-square error. $x_i^T = (0,\dots,0,1,0,\dots,0)$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y \sim P_{\rm out}(\cdot\,|\,W)$. Rank $r = 2$, size $n = 20000$. [Figure: MSE as a function of the noise $\Delta$ (inverse Fisher information).]

LARGER RANK? $x_i^T = (0,\dots,0,1,0,\dots,0)$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y \sim P_{\rm out}(\cdot\,|\,W)$. Rank $r \ge 4$. [Figure: MSE as a function of $\Delta$.]

Two phase transitions for $r > 4$. EASY phase: $\Delta < \Delta_c$. [Figure: MSE vs. $\Delta$.]

Two phase transitions for $r > 4$. HARD phase: $\Delta_c < \Delta < \Delta_s$. [Figure: MSE vs. $\Delta$.]

Two phase transitions for $r > 4$. IMPOSSIBLE phase: $\Delta > \Delta_s$. [Figure: MSE vs. $\Delta$.]

Two phase transitions for $r > 4$.
Easy phase: $\Delta < \frac{1}{r^2} = \Delta_c$, i.e. $p_{\rm in} - p_{\rm out} > \frac{r}{\sqrt{n}}\sqrt{p_{\rm out}(1-p_{\rm out})}$. Same transition as spectral methods (~BBP '05).
Hard phase: $\frac{1}{r^2} < \Delta < \frac{1}{4 r \ln r}\,[1 + o_r(1)] = \Delta_s$ (Decelle, Moore, FK, Zdeborová '11). Conjectured to be hard for all polynomial algorithms.
Impossible phase: $\Delta > \frac{1}{4 r \ln r}\,[1 + o_r(1)] = \Delta_s$, i.e. $p_{\rm in} - p_{\rm out} < \frac{2\sqrt{r\log r}}{\sqrt{n}}\sqrt{p_{\rm out}(1-p_{\rm out})}$. Related to the Potts glass temperature (Kanter, Gross, Sompolinsky '85).
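To see how quickly the two thresholds separate, one can plug a few ranks into the leading-order formulas above (the $o_r(1)$ correction is dropped, so small $r$ is only indicative); this tiny script is my own illustration.

```python
# Leading-order comparison of the algorithmic and information-theoretic thresholds.
import math

for r in [10, 50, 100, 1000]:
    delta_c = 1.0 / r ** 2                     # easy/hard boundary (spectral, AMP)
    delta_s = 1.0 / (4 * r * math.log(r))      # hard/impossible boundary, leading order
    print(f"r = {r:5d}   Delta_c = {delta_c:.1e}   Delta_s = {delta_s:.1e}   ratio = {delta_s / delta_c:.1f}")
```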

Non-symmetric community detection. Two communities $\pm$, of sizes $\rho n$ (+) and $(1-\rho)n$ (-), with connectivities
$p_{++} = p + \frac{\mu}{\sqrt{n}}\,\frac{1-\rho}{\rho}, \qquad p_{--} = p + \frac{\mu}{\sqrt{n}}\,\frac{\rho}{1-\rho}, \qquad p_{+-} = p - \frac{\mu}{\sqrt{n}},$
prior $P(x) = \rho\,\delta\!\big(x - \sqrt{\tfrac{1-\rho}{\rho}}\big) + (1-\rho)\,\delta\!\big(x + \sqrt{\tfrac{\rho}{1-\rho}}\big)$, and effective noise $\Delta = \frac{p(1-p)}{\mu^2}$. [Figure, left: transition $\Delta$ vs. $\rho$ for AMP, optimal, and spectral; right: MSE($\Delta$) at $\rho = 0.05$, AMP vs. optimal.]

CLUSTERING MIXTURES OF GAUSSIANS IN HIGH DIMENSIONS. $u_i \in \mathbb{R}^R$, $v_i^T = (0,\dots,0,1,0,\dots,0)$, $W = \frac{1}{\sqrt{n}} UV^T$, $Y = W + \mathcal{N}(0, 1/\Delta)$, with $\alpha = m/n = 1$. The algorithm was first proposed by Matsushita, Tanaka '13. [Figure: clustering results for 2 groups and for 20 groups.]

What about RANK-1 sparse PCA? $X_i \in \{0,1\}$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y = W + \mathcal{N}(0,\Delta)$, prior $P(x) = \rho\,\delta_{x,1} + (1-\rho)\,\delta_{x,0}$. Rank $r = 1$. For $\rho > 0.05$, Montanari and Deshpande showed that AMP achieves the optimal MMSE. [Figure: phase diagram in the $(\Delta, \rho)$ plane, with the line $\rho = 0.05$ marked.]

What about RANK-1 sparse PCA? $X_i \in \{0,1\}$, $W = \frac{1}{\sqrt{n}} XX^T$, $Y = W + \mathcal{N}(0,\Delta)$, $P(x) = \rho\,\delta_{x,1} + (1-\rho)\,\delta_{x,0}$, rank $r = 1$. AMP achieves the optimal MMSE everywhere EXCEPT between the blue and red curves. [Figure: phase diagram in the $(\Delta, \rho)$ plane for $\rho \le 0.05$, $\Delta \le 0.004$, showing the spectral-methods line, the AMP spinodal, the information-theoretic transition, and the 2nd spinodal.]
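The spinodal structure behind this picture can be explored with the state evolution of the previous slides specialized to the Bernoulli prior. The sketch below is my own; the parameter values are only illustrative (in the range of the plotted phase diagram, without checking which of them actually fall in the hard region). It iterates the recursion from an uninformative and from an informative start, which disagree exactly when $\Delta$ lies between the two spinodals.

```python
# State evolution for rank-1 sparse PCA with prior P(x) = rho*delta_{x,1} + (1-rho)*delta_{x,0}.
import numpy as np

def denoiser(A, B, rho):
    # posterior mean of x in {0,1} under P_X(x) exp(B x - A x^2 / 2)
    w1 = rho * np.exp(B - A / 2.0)
    return w1 / (w1 + (1.0 - rho))

def se_mse(Delta, rho, M0, iters=1000, n_nodes=201):
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / w.sum()
    M = M0
    for _ in range(iters):
        A = M / Delta
        M = rho * float(np.sum(w * denoiser(A, A + np.sqrt(A) * z, rho)))  # only x = 1 contributes
    return rho - M                             # MSE = E[x^2] - M, with E[x^2] = rho

rho = 0.02
for Delta in [0.0008, 0.0012, 0.0015]:
    mse_amp = se_mse(Delta, rho, M0=1e-8)      # uninformative start (the branch AMP follows)
    mse_inf = se_mse(Delta, rho, M0=rho)       # informative start
    print(f"Delta = {Delta}:  MSE(uninformative) = {mse_amp:.2e}   MSE(informative) = {mse_inf:.2e}")
```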

Easy, hard and impossible inference. A very common phenomenon: information is hidden by metastability (a 1st-order transition, in physics terms). [Diagram: amount of information in the measurements on the horizontal axis, with two thresholds $c_k$ and $c_s$ separating the impossible, hard, and easy regions (known bounds on one side, known algorithms on the other).] The hard phase has been quantified also in: planted constraint satisfaction, compressed sensing, the stochastic block model, dictionary learning, blind source separation, sparse PCA, error-correcting codes, the hidden clique problem, and others. Conjecture: hard for all polynomial algorithms.

OUTLINE 1) Message passing, State evolution, Mutual information 2) Universality property 3) Main results 4) Sketch of proof

SKETCH OF THE PROOF. Result 1 [FK, Xu & Zdeborová]: upper bound,
$\frac{I}{n} \le i_{\rm Bethe}(m) = \frac{m^2 + \mathbb{E}_x(x^2)^2}{4\Delta} - \mathbb{E}_{x,z}\!\left[\mathcal{J}\!\Big(\frac{m}{\Delta},\ \frac{m x}{\Delta} + \sqrt{\frac{m}{\Delta}}\, z\Big)\right]$
(note that this is true at any value of $n$, not only asymptotically). Method: Guerra's interpolation + Nishimori identities.

SKETCH OF THE PROOF. Result 1 [FK, Xu & Zdeborová]: upper bound. Interpolate from a denoising problem at $t=0$ to the factorization problem at $t=1$:
$P_t(x\,|\,Y,D) = \frac{1}{Z_t}\, p_0(x)\, \exp\!\left\{ -t \sum_{i\le j} \frac{\big(\frac{x_i x_j}{\sqrt{n}} - Y_{ij}\big)^2}{2\Delta} - (1-t)\sum_i \frac{(D_i - x_i)^2}{2\Delta_D} \right\}.$
Factorization: $w_{ij} = x_i^T x_j/\sqrt{n}$ observed through $\mathcal{N}(0, \Delta/t)$ gives $y_{ij}$. Denoising: $x_i$ observed through $\mathcal{N}(0, \Delta_D/(1-t))$ gives $D_i$, with $\Delta_D = \Delta/m$. Then
$I_{t=1}(x;Y) = I_{t=0}(x;y) + \int_0^1 \mathrm{d}t\ \partial_t I_t(x;Y,y).$

SKETCH OF THE PROOF. Result 1 [FK, Xu & Zdeborová]: upper bound (continued). With the interpolating posterior $P_t$ above, using Stein's lemma and the Nishimori identities one can show that
$\frac{I(x;Y)}{n} = i_{rs}(m) - \int_0^1 \mathrm{d}t\ \frac{\big(m - \mathbb{E}_t[x\,x']\big)^2}{4\Delta} \le i_{rs}(m).$
Remark: for estimation problems, Guerra's interpolation yields an upper bound, while in the usual case (e.g. CSPs) it yields a lower bound.

SKETCH OF THE PROOF. Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: converse lower bound. [Diagram: vertical $\Delta$ axis with marks at $\Delta_{\rm RS}$, $\Delta_{\rm AMP}$, and 0.]

SKETCH OF THE PROOF. Result 2 [Barbier, Dia, Macris, FK & Zdeborová]: converse lower bound. Step 1: for $\Delta < \Delta_{\rm AMP}$, $\mathrm{MSE}_{\rm AMP}(\Delta) \ge \mathrm{MMSE}(\Delta)$.

SKETCH OF THE PROOF, Result 2, Step 1 (continued). I-MMSE theorem (Guo-Verdú; for physicists, this is just thermodynamics): $\frac{\mathrm{d}\, i(\Delta)}{\mathrm{d}\Delta^{-1}} = \frac{1}{4}\,\mathrm{MMSE}(\Delta)$.

SKETCH OF THE PROOF, Result 2, Step 1 (continued). Since for $\Delta < \Delta_{\rm AMP}$ the minimizer of $i_{RS}$ tracks the AMP fixed point, $\frac{\mathrm{d}}{\mathrm{d}\Delta^{-1}}\big[\min_E i_{RS}(E;\Delta)\big] = \frac{1}{4}\,\mathrm{MSE}_{\rm AMP}(\Delta) \ge \frac{1}{4}\,\mathrm{MMSE}(\Delta) = \frac{\mathrm{d}}{\mathrm{d}\Delta^{-1}}\, i(\Delta)$.

SKETCH OF THE PROOF, Result 2, Step 1 (continued). Equivalently, $\frac{\mathrm{d}}{\mathrm{d}\Delta}\big[\min_E i_{RS}(E;\Delta)\big] \le \frac{\mathrm{d}}{\mathrm{d}\Delta}\, i(\Delta)$.

SKETCH OF THE PROOF, Result 2, Step 1 (continued). Integrating from $0$ to $\Delta$: $\min_E i_{RS}(E;\Delta) - \min_E i_{RS}(E;0) \le i(\Delta) - i(0)$, and both terms at $\Delta = 0$ are equal to $H(x)$.

SKETCH OF THE PROOF, Result 2, Step 1 (conclusion). Hence $\min_E i_{RS}(\Delta, E) \le i(\Delta)$ for $\Delta < \Delta_{\rm AMP}$.

SKETCH OF THE PROOF, Result 2, Step 2: prove that the true thermodynamic potential is analytic until $\Delta_{\rm RS}$. This is the hardest part. Sketch: 1) map the problem to its spatially coupled version; 2) prove both models have the same mutual information (with a Guerra-type interpolation); 3) prove the spatially coupled version is analytic until $\Delta_{\rm RS}$ (threshold saturation). More in Nicolas Macris's talk this afternoon.

SKETCH OF THE PROOF, Result 2, Step 3: for $\Delta > \Delta_{\rm RS}$, AMP provides back the MMSE. Use again the I-MMSE theorem, from $\Delta_{\rm RS}$ up to $\Delta > \Delta_{\rm RS}$.

Conclusions. The MMSE in a random setting of low-rank matrix estimation is evaluated (proven rigorously for symmetric problems, open for the generic $UV^T$ factorization). A promising proof strategy for the replica formula in estimation problems (possible generalization to larger ranks, tensors, etc.). Dependence on the output channel only through its Fisher information: universality. Two phase transitions: an algorithmic one and an information-theoretic one; sometimes equal, but not always. AMP reaches the MMSE in a large region. When the problem is not balanced (i.e. if the prior has a non-zero mean), AMP has a better detectability transition than spectral methods. A promising algorithm with good convergence properties for applications beyond the random setting (example: Restricted Boltzmann Machines). Open-source implementation available at http://krzakala.github.io/lowramp/