Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Size: px

Start display at page:

Download "Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning"

Stanley Dawson
5 years ago
Views:

1 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION Approxiae Message Passing wih Consisen Paraeer Esiaion and Applicaions o Sparse Learning Ulugbe S. Kailov, Suden Meber, IEEE, Sundeep Rangan, Meber, IEEE, Alyson K. Flecher, Meber, IEEE, Michael Unser, Fellow, IEEE, arxiv: v3 [cs.it] Dec 202 Absrac We consider he esiaion of an i.i.d. (possibly non-gaussian) vecor x R n fro easureens y R obained by a general cascade odel consising of a nown linear ransfor followed by a probabilisic coponenwise (possibly nonlinear) easureen channel. A novel ehod, called adapive generalized approxiae essage passing (Adapive GAMP), ha enables join learning of he saisics of he prior and easureen channel along wih esiaion of he unnown vecor x is presened. The proposed algorih is a generalizaion of a recenly-developed EM-GAMP ha uses expecaionaxiizaion (EM) ieraions where he poseriors in he E-seps are copued via approxiae essage passing. The ehodology can be applied o a large class of learning probles including he learning of sparse priors in copressed sensing or idenificaion of linear-nonlinear cascade odels in dynaical syses and neural spiing processes. We prove ha for large i.i.d. Gaussian ransfor arices he asypoic coponenwise behavior of he adapive GAMP algorih is prediced by a siple se of scalar sae evoluion equaions. In addiion, we show ha when a cerain axiu-lielihood esiaion can be perfored in each sep, he adapive GAMP ehod can yield asypoically consisen paraeer esiaes, which iplies ha he algorih achieves a reconsrucion qualiy equivalen o he oracle algorih ha nows he correc paraeer values. Rearably, his resul applies o essenially arbirary paraerizaions of he unnown disribuions, including ones ha are nonlinear and non-gaussian. The adapive GAMP ehodology hus provides a syseaic, general and copuaionally efficien ehod applicable o a large range of coplex linear-nonlinear odels wih provable guaranees. I. INTRODUCTION Consider he esiaion of a rando vecor x R n fro he easureen odel illusraed in Figure. The rando vecor x, which is assued o have independen and idenically disribued (i.i.d.) coponens x j P X, is passed hrough a nown linear ransfor ha oupus z = Ax. The coponens of y R are generaed by a coponenwise ransfer funcion P Y Z. This wor addresses he cases where he disribuions P X and P Y Z have soe unnown paraeers, The wor of S. Rangan was suppored by he Naional Science Foundaion under Gran No U. S. Kailov and M. Unser were suppored by he European Coission under Gran ERC-200-AdG FUN-SP. U. S. Kailov (eail: ulugbe.ailov@epfl.ch) is wih Bioedical Iaging Group, École Polyechnique Fédérale de Lausanne, Swizerland S. Rangan (eail: srangan@poly.edu) is wih Polyechnic Insiue of New Yor Universiy, Broolyn, NY. A. K. Flecher (eail: aflecher@ucsc.edu) is wih he Deparen of Elecrical Engineering, Universiy of California, Sana Cruz. M. Unser (eail: ichael.unser@epfl.ch) is wih Bioedical Iaging Group, École Polyechnique Fédérale de Lausanne, Swizerland Signal Prior Unnown i.i.d. signal Mixing Marix Unnown Linear Measureens Coponenwise Oupu Channel Available Measureens Fig. : Measureen odel considered in his wor. λ x and λ z, ha us be learned in addiion o he esiaion of x. Such join esiaion and learning probles wih linear ransfors and coponenwise nonlineariies arise in a range of applicaions, including epirical Bayesian approaches o inverse probles in signal processing, linear regression and classificaion [], [2], and, ore recenly, Bayesian copressed sensing for esiaion of sparse vecors x fro underdeerined easureens [3] [5]. Also, since he paraeers in he oupu ransfer funcion P Y Z can odel unnown nonlineariies, his proble forulaion can be applied o he idenificaion of linear-nonlinear cascade odels of dynaical syses, in paricular for neural spie responses [6] [8]. When he disribuions P X and P Y Z are nown, or reasonably bounded, here are a nuber of ehods available ha can be used for he esiaion of he unnown vecor x. In recen years, here has been significan ineres in so-called approxiae essage passing (AMP) and relaed ehods based on Gaussian approxiaions of loopy belief propagaion (LBP) [9] [8]. These ehods originae fro CDMA uliuser deecion probles in [9] [], and have received considerable recen aenion in he conex of copressed sensing [3] [7]. See, also he survey aricle [9]. The Gaussian approxiaions used in AMP are also closely relaed o sandard expecaion propagaion echniques [20], [2], bu wih addiional siplificaions ha exploi he linear coupling beween he variables x and z. The ey benefis of AMP ehods are heir copuaional sipliciy, large doain of applicaion, and, for cerain large rando A, heir exac asypoic perforance characerizaions wih esable condiions for opialiy [], [2], [6], [7]. This paper considers he so-called generalized AMP (GAMP) ehod of [8] ha exends he algorih in [3] o arbirary oupu disribuions P Y Z (any original forulaions assued addiive whie Gaussian noise (AWGN) easureens). However, alhough he curren forulaion of AMP and GAMP ehods is aracive concepually, in pracice, one ofen does no now he prior and noise disribuions exacly.

2 2 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION To overcoe his iaion, Vila and Schnier [22], [23] and Krzaala e al. [24], [25] have recenly proposed exension of AMP and GAMP based on Expecaion Maxiizaion (EM) ha enable join learning of he paraeers (λ x, λ z ) along wih he esiaion of he vecor x. While siulaions indicae excellen perforance, he analysis of hese ehods is difficul. This wor provides a unifying analyic fraewor for such AMP-based join esiaion and learning ehods. The ain conribuions of his paper are as follows: Generalizaion of he GAMP ehod of [8] o a class of algorihs we call adapive GAMP ha enables join esiaion of he paraeers λ x and λ z along wih vecor x. The ehods are copuaionally fas and general wih poenially large doain of applicaion. In addiion, he adapive GAMP ehods include he EM-GAMP algorihs of [22] [25] as special cases. Exac characerizaion of he asypoic behavior of adapive GAMP. We show ha, siilar o he analysis of he AMP and GAMP algorihs in [], [2], [6] [8], he coponenwise asypoic behavior of adapive GAMP can be described exacly by a siple scalar sae evoluion (SE) equaions. Deonsraion of asypoic consisency of adapive GAMP wih axiu-lielihood (ML) paraeer esiaion. Our ain resul shows ha when he ML paraeer esiaion is copued exacly, he esiaed paraeers converge o he rue values and he perforance of adapive GAMP asypoically coincides wih he perforance of he oracle GAMP algorih ha nows he correc paraeer values. Rearably, his resul applies o essenially arbirary paraeerizaions of he unnown disribuions P X and P Y Z, hus enabling provably consisen esiaion on non-convex and nonlinear probles. Experienal evaluaion of he algorih for he probles of learning of sparse priors in copressed sensing and idenificaion of linear-nonlinear cascade odels in neural spiing processes. Our siulaions illusrae he perforance gain of adapive GAMP and is asypoic consisency. Adapive GAMP hus provides a copuaionallyefficien ehod for a large class of join esiaion and learning probles wih a siple, exac perforance characerizaion and provable condiions for asypoic consisency. A. Relaed Lieraure As enioned above, he adapive GAMP ehod proposed here can be seen as a generalizaion of he EM ehods in [22] [25]. In [22], [23], he prior P X is described by a generic L-er Gaussian ixure (GM) whose paraeers are idenified by an EM procedure [26]. The expecaion or E-sep is perfored by GAMP, which can approxiaely deerine he arginal poserior disribuions of he coponens x j given he observaions y and he curren paraeer esiaes of he GM disribuion P X. A relaed EM-GAMP algorih has also appeared in [24], [25] for he case of cerain sparse priors and AWGN oupus. Siulaions in [22], [23] show rearably good perforance and copuaional speed for EM-GAMP over a wide class of disribuions, paricularly in he conex of copressed sensing. Also, using arguens fro saisical physics, [24], [25] presens sae evoluion (SE) equaions for he join evoluion of he paraeers and vecor esiaes and confirs he nuerically. As discussed in Secion III-B, EM-GAMP is a special case of adapive GAMP wih a paricular choice of he adapaion funcions. Therefore, one conribuion of his paper is o provide a rigorous heoreical jusificaion of he EM- GAMP ehodology. In paricular, he curren wor provides a rigorous jusificaion of he SE analysis in [24], [25] along wih exensions o ore general inpu and oupu channels and adapaion ehods. However, he ehodology in [24], [25] in oher ways is ore general in ha i can also sudy seeded or spaially-coupled arices as proposed in [24], [25], [27]. An ineresing open quesion is wheher he analysis ehods in his paper can be exended o hese scenarios as well. An alernae ehod for join learning and esiaion has been presened in [28], which assues ha he disribuions on he source and oupu channels are heselves described by graphical odels wih he paraeers λ x and λ z appearing as unnown variables. The ehod in [28], called Hybrid-GAMP, ieraively cobines sandard loopy BP wih AMP ehods. One avenue of fuure wor is o see if ehodology in his paper can be applied o analyze he Hybrid-GAMP ehods as well. Finally, i should be poined ou ha while siulaneous recovery of unnown paraeers is appealing concepually, i is no a sric requireen. An alernae soluion o he proble is o assue ha he signal belongs o a nown class of disribuions and o iniize he axial ean-squared error (MSE) for he class. This iniax approach [29] was proposed for AMP recovery of sparse signals in [3]. Alhough iniax approach resuls in he esiaors ha are uniforly good over he enire class of disribuions, here ay be a significan gap beween he MSE achieved by he iniax approach and he oracle algorih ha nows he disribuion exacly. Indeed, his gap was he ain jusificaion of he EM- GAMP ehods in [22], [23]. Due o is asypoic consisency, adapive GAMP provably achieves he perforance of he oracle algorih. B. Ouline of he Paper The paper is organized as follows: In Secion II, we review he non-adapive GAMP and corresponding sae evoluion equaions. In Secion III, we presen adapive GAMP and describe ML paraeer learning. In Secion IV, we provide he ain heores characerizing asypoic perforance of adapive GAMP and deonsraing is consisency. In Secion V, we provide nuerical experiens deonsraing he applicaions of he ehod. Secion VI concludes he paper. A conference version of his paper has appeared in [30]. This paper conains all he proofs, ore deailed descripions and addiional siulaions.

3 KAMILOV, RANGAN, FLETCHER AND UNSER 3 A. GAMP Algorih II. REVIEW OF GAMP Before describing he adapive GAMP algorih, i is useful o review he basic (non-adapive) GAMP algorih of [8]. Consider he esiaion proble in Fig. where he coponenwise disribuions on he inpus and oupus have soe paraeric for, P X (x λ x ), P Y Z (y z, λ z ), () where λ x Λ x and λ z Λ z represen paraeers of he disribuions and Λ x and Λ z soe paraeer ses. The GAMP algorih of [8] can be seen as a class of ehods for esiaing he vecors x and z for he case when he paraeers λ x and λ z are nown. In conras, he adapive GAMP ehod ha is discussed in Secion III enables join esiaion of he paraeers λ x and λ z along wih he vecors x and z. In order ha o undersand how he adapaion wors, i is bes o describe he basic GAMP algorih as a special case of he ore general adapive GAMP procedure. The basic GAMP algorih corresponds o he special case of Algorih when he adapaion funcions H x and H z oupu fixed values H z(p, y, τ p) = λ z, H x(r, τ r) = λ x, (2) for soe pre-copued sequence of paraeers λ x and λ z. By pre-copued, we ean ha he values do no depend on he daa hrough he vecors p, y, and r. In he oracle scenario λ x and λ z are se o he rue values of he paraeers and do no change wih he ieraion nuber. The esiaion funcions G x, G z and G s deerine he esiaes for he vecors x and z, given he paraeer values λ x and λ z. As described in [8], here are wo iporan ses of choices for he esiaion funcions, resuling in wo varians of GAMP: Su-produc GAMP: In his case, he esiaion funcions are seleced so ha GAMP provides a Gaussian approxiaion of su-produc loopy BP. The esiaes x and ẑ hen represen approxiaions of he MMSE esiaes of he vecors x and z. Max-su GAMP: In his case, he esiaion funcions are seleced so ha GAMP provides a quadraic approxiaion of ax-su loopy BP and x and ẑ represen approxiaions of he MAP esiaes. The esiaion funcions of he su-produc GAMP are equivalen o scalar MMSE esiaion probles for he coponens of he vecors x and z observed in Gaussian noise. For ax-su GAMP, he esiaion funcions correspond o scalar MAP probles. Thus, for boh versions, he GAMP ehod reduces he vecor-valued MMSE and MAP esiaion probles o a sequence of scalar AWGN probles cobined wih linear ransfors by A and A T. GAMP is hus copuaionally siple, wih each ieraion involving only scalar nonlinear operaions followed by linear ransfors. The operaions are siilar in for o separable and proxial iniizaion ehods widely used for such probles [3] [35]. Appendix A reviews he equaions for he su-produc GAMP. More deails, as well as he equaions for ax-su GAMP can be found in [8]. B. Sae Evoluion Analysis In addiion o is copuaional sipliciy and generaliy, a ey oivaion of he GAMP algorih is ha is asypoic behavior can be precisely characerized when A is a large i.i.d. Gaussian ransfor. The asypoic behavior is described by wha is nown as a sae evoluion (SE) analysis. By now, here are a large nuber of SE resuls for AMP-relaed algorihs [9], [] [8]. Here, we review he paricular SE analysis fro [8] which is based on he fraewor in [6]. Assupion : Consider a sequence of rando realizaions of he GAMP algorih, indexed by he diension n, saisfying he following assupions: (a) For each n, he arix A R n has i.i.d. coponens wih A ij N (0, /) and he diension = (n) is a deerinisic funcion of n saisfying n/ β for soe β > 0 as n. (b) The inpu vecors x and iniial condiion x 0 are deerinisic sequences whose coponens converge epirically wih bounded oens of order s = 2 2 as (x, x0 ) PL(s) = (X, X 0 ), (3) o soe rando vecor (X, X 0 ) for soe 2. See Appendix B for he precise definiion of his for of convergence. (c) The oupu vecors z and y R are generaed by z = Ax, and y i = h(z i, w i ) for all i =,...,, (4) for soe scalar funcion h(z, w) and vecor w R represening an oupu disurbance. I is assued ha he oupu disurbance vecor w is deerinisic, bu epirically converges as w PL(s) = W, (5) where s = 2 2 is as in Assupion (b) and W is soe rando variable. We le P Y Z denoe he condiional disribuion of he rando variable Y = h(z, W ). (d) The esiaion funcion G x(r, τ r, λ x ) and is derivaive wih respec o r, is Lipschiz coninuous in r a (τ r, λ x ) = (τ r, λ x), where τ r is a deerinisic paraeer fro he SE equaions below. A siilar assupions holds for G z(p, τ p, λ z ). (e) The adapaion funcions Hx and Hz are se o (2) for soe deerinisic sequence of paraeers λ x and λ z. Also, in he esiaion seps in lines 7, 8 and 5 of Algorih, he values of he τp and τr are replaced wih he deerinisic paraeers τ p and τ r fro he SE equaions defined below. Assupion (a) siply saes ha we are considering large, Gaussian i.i.d. arices A. Assupions (b) and (c) sae ha he inpu vecor x and oupu disurbance w are odeled as deerinisic, bu whose epirical disribuions asypoically appear as i.i.d. This deerinisic odel is one of ey feaures

4 4 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION of Bayai and Monanari s analysis in [6]. Assupion (d) is a ild coninuiy condiion. Assupion (e) defines he resricion of adapive GAMP o he non-adapive GAMP algorih. We will reove his final assupion laer. Noe ha, for now, here is no assupion ha he rue disribuion of X or he rue condiional disribuion of Y given Z us belong o he class of disribuions () for any paraeers λ x and λ z. The analysis can hus odel he effecs of odel isach. Now, given he above assupions, define he ses of vecors θx := {(x j, rj, x + j ), j =,..., n}, (6a) θz := {(z i, ẑi, y i, p i), i =,..., }. (6b) The firs vecor se, θ x, represens he coponens of he he rue, bu unnown, inpu vecor x, is GAMP esiae x as well as r. The second vecor, θ z, conains he coponens he rue, bu unnown, oupu vecor z, is GAMP esiae ẑ, as well as p and he observed oupu y. The ses θ x and θ z are iplicily funcions of he diension n. The ain resul of [8] shows ha if we fix he ieraion, and le n, he asypoic join epirical disribuion of he coponens of hese wo ses θ x and θ z converges o rando vecors of he for θ x := (X, R, X + ), θ z := (Z, Ẑ, Y, P ). (7) We precisely sae he naure of convergence oenarily (see Theore ). In (7), X is he rando variable in Assupion (b), while R and X + are given by R = α X + V, V N (0, ξ r), (8a) X + = G x(r, τ r, λ x) (8b) for soe deerinisic consans α r, ξ r, and τ r ha are defined below. Siilarly, (Z, P ) N (0, K p) for soe covariance arix K p, and Y = h(z, W ), Ẑ = G z(p, Y, τ p, λ z), (9) where W is he rando variable in (5) and K p conains deerinisic consans. The deerinisic consans αr, ξr, τ r and K p represen paraeers of he disribuions of θ x and θ z and depend on boh he disribuions of he inpu and oupus as well as he choice of he esiaion and adapaion funcions. The SE equaions provide a siple ehod for recursively copuing hese paraeers. The equaions are bes described algorihically as shown in Algorih 2. In order ha we do no repea ourselves, in Algorih 2, we have wrien he SE equaions for adapive GAMP: For non-adapive GAMP, he updaes (9b) and (20a) can be ignored as he values of λ z and λ x are pre-copued. Wih hese definiions, we can sae he ain resul fro [8]. Theore ( [8]): Consider he rando vecors θ x and θ z generaed by he oupus of GAMP under Assupion. Le θ x and θ z be he rando vecors in (7) wih he paraeers Algorih Adapive GAMP Require: Marix A, esiaion funcions G x, G s and G z and adapaion funcions H x and H z. : Se 0, s 0 and selec soe iniial values for x 0 and τ 0 x. 2: repea 3: {Oupu node updae} 4: τ p A 2 F τ x/ 5: p A x s τp 6: λ z Hz(p, y, τp) 7: ẑi G z(p i, y i, τp, λ z) for all i =,..., 8: s i G s(p i, y i, τp, λ z) for all i =,..., 9: τs (/) i G s(p i, y i, τp, λ z)/ p i 0: : {Inpu node updae} 2: /τr A 2 F τ s/n 3: r = x + τra T s 4: λ x Hx(r, τr) 5: x + j G x(rj, τ r, λ x) for all j =,..., n 6: τx + (τr/n) j G x(rj, τ r, λ x)/ r j 7: unil Terinaed deerined by he SE equaions in Algorih 2. Then, for any fixed, he coponens of θ x and θ z converge epirically wih bounded oens of order as θ PL() x = θ x, θ PL() z = θ z. (0) where θ x and θ z are given in (7). In addiion, for any, he is τ r = τ r, τ p = τ p, () also hold alos surely. The heore shows ha he behavior of any coponen of he vecors x and z and heir GAMP esiaes x and ẑ are disribued idenically o a siple scalar equivalen syse wih rando variables X, Z, X and Ẑ. This scalar equivalen odel appears in several analyses and can be hough of as a single-leer characerizaion [36] of he syse. Rearably, his iing propery holds for essenially arbirary disribuions and esiaion funcions, even ones ha arise fro probles ha are highly nonlinear or noncovex. Fro he singleleer characerizaion, one can copue he asypoic value of essenially any coponenwise perforance eric such as ean-squared error or deecion accuracy. Siilar singleleer characerizaions can also be derived by arguens fro saisical physics [24], [37] [40]. III. ADAPTIVE GAMP As described in he previous secion, he sandard GAMP algorih of [8] considers he case when he paraeers λ x and λ z in he disribuions in () are nown. The adapive GAMP ehod proposed in his paper, and shown in Algorih, is an exension of he sandard GAMP procedure ha enables siulaneous idenificaion of he paraeers λ x and λ z along wih esiaion of he vecors x and z. The

5 KAMILOV, RANGAN, FLETCHER AND UNSER 5 ey odificaion is he inroducion of he wo adapaion funcions: H z(p, y, τ p) and H x(r, τ r). In each ieraion, hese funcions oupu esiaes, λ z and λ x of he paraeers based on he daa p, y, r, τ p and τ r. We saw he sandard GAMP ehod corresponds o he adapaion funcions in (2) which oupus fixed values λ z and λ x ha do no depend on he daa, and can be used when he rue paraeers are nown. For he case when he rue paraeers are no nown, we will see ha a siple axiu lielihood (ML) can be used o esiae he paraeers fro he daa. A. ML Paraeer Esiaion To undersand how o esiae paraeers via he adapaion funcions, observe ha fro Theore, we now ha he disribuion of he coponens of r are disribued idenically o he scalar R in (8). Now, he disribuion of R only depends on hree paraeers α r, ξ r and λ x. I is hus naural o aep o esiae hese paraeers fro he epirical disribuion of he coponens of r. To his end, le φ x (r, λ x, α r, ξ r ) be he log lielihood φ x (r, λ x, α r, ξ r ) = log p R (r λ x, α r, ξ r ), (2) where he righ-hand side is he probabiliy densiy of a rando variable R wih disribuion R = α r X + V, X P X ( λ x ), V N (0, ξ r ). Then, a any ieraion, we can aep o perfor a axiu-lielihood (ML) esiae λ x = H x(r, τ r) = arg ax λ x Λ x ax (α r,ξ r) S x(τ r ) n n φ x (rj, λ x, α r, ξ r ).(3) j= Here, he se S x (τ r) is a se of possible values for he paraeers α r, ξ r. The se ay depend on he easured variance τ r. We will see he precise role of his se below. Siilarly, he join disribuion of he coponens of p and y are disribued according o he scalar (P, Y ) which depend only on he paraeers K p and λ z. Thus, we can define he lielihood φ z (p, y, λ z, K p ) = log p P,Y (p, y λ z, K p ), (4) where he righ-hand side is he join probabiliy densiy of (P, Y ) wih disribuion Y P Y Z ( Z, λ z ), (Z, P ) N (0, K p ). Then, we can aep o esiae λ z via he ML esiae λ z = H z(p, y, τ p) = arg ax λ z Λ z ax K p S z(τ p ) { } φ z (p i, y i, K p ). (5) i= Again, he se S z (τ p) is a se of possible covariance arices K p. B. Relaion o EM-GAMP I is useful o briefly copare he above ML paraeer esiaion wih he EM-GAMP ehod proposed by Vila and Schnier in [22], [23] and Krzaala e. al. in [24], [25]. Boh of hese ehods cobine he Bayesian AMP [4], [5] or GAMP algorihs [8] wih a sandard EM procedure [26] as follows. Firs, he algorihs use he su-produc version of he AMP / GAMP algorihs, so ha he oupus can provide an esiae of he poserior disribuions on he coponens of x given he curren paraeer esiaes. Specifically, a any ieraion, define he disribuion P j (x j rj, τr, λ x ) = [ Z exp ] 2τr x j rj 2 P X (x j λ x ). (6) For he su-produc AMP or GAMP algorihs, i is shown in [8] ha he SE equaions siplify so ha αr = and ξr = τ r, if he paraeers were seleced correcly. Therefore, fro Theore, he condiional disribuion P (x j rj ) should approxiaely ach he disribuion (6) for large n. If, in addiion, we rea rj and τ r as sufficien saisics for esiaing x j given y and A, hen P j can be reaed as an approxiaion for he poserior disribuion of x j given he curren paraeer esiae λ x. Soe jusificaion for his las sep can be found in [], [2], [7]. Using he approxiaion, we can approxiaely ipleen he EM procedure o updae he paraeer esiae via a axiizaion λ x = H x(r, τ r) := arg ax λ x Λ x n n j= [ E log P X (x j λ x ) P ] j, (7) where he expecaion is wih respec o he disribuion in (6). In [22], [23], he paraeer updae (7) is perfored only once every few ieraions o allow P o converge o he approxiaion of he poserior disribuion of x j given he curren paraeer esiaes. In [24], [25], he paraeer esiae is updaed every ieraion. A siilar procedure can be perfored for he esiaion of λ z. We hus see ha he EM-GAMP procedures in [22], [23] and in [24], [25] are boh special cases of he adapive GAMP algorih in Algorih wih paricular choices of he adapaion funcions Hx and Hz. As a resul, our analysis in Theore 2 below can be applied o hese algorihs o provide rigorous asypoic characerizaions of he EM-GAMP perforance. However, a he curren ie, we can only prove he asypoic consisency resul, Theore 3, for he ML adapaion funcions (3) and (5) described above. Tha being said, i should be poined ou ha EM-GAMP updae (7) is generally copuaionally uch sipler han he ML updaes in (3) and (5). For exaple, when P X (x λ x ) is an exponenial faily, he opiizaion in (7) is convex. Also, he opiizaions in (3) and (5) require searches over addiional paraeers such as α r and ξ r. Thus, an ineresing avenue of fuure wor is o apply he analysis resul, Theore 3 below, o see if he EM-GAMP ehod or soe siilarly copuaionally siple echnique can be developed which also provides asypoic consisency.

6 6 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION Algorih 2 Adapive GAMP Sae Evoluion Given he disribuions in Assupion, copue he sequence of paraeers as follows: Iniializaion Se = 0 wih K 0 x = cov(x, X 0 ), τ 0 x = τ 0 x, (8) where he expecaion is over he rando variables (X, X 0 ) in Assupion (b) and τ 0 x is he iniial value in he GAMP algorih. Oupu node updae: Copue he variables associaed wih he oupu nodes Copue he variables τ p = βτ x, K p = βk x, (9a) λ z = Hz(P, Y, τ p), (9b) [ ] τ r = E p G s(p, Y, τ p, λ z), (9c) [ ] ξr = (τ r) 2 E G s(p, Y, τ p, λ z), (9d) [ ] αr = τ re z G s(p, h(z, W ), τ p, λ z), (9e) where he expecaions are over he rando variables (Z, P ) N (0, K p) and Y is given in (9). Inpu node updae: Copue λ x = Hx(R, τ r), (20a) [ ] τ + x = τ re r G x(r, τ r, λ x), (20b) K + x = cov(x, X + ), (20c) where he expecaions are over he rando variables in (8). IV. CONVERGENCE AND ASYMPTOTIC CONSISTENCY WITH GAUSSIAN TRANSFORMS A. General Sae Evoluion Analysis Before proving he asypoic consisency of he adapive GAMP ehod wih ML adapaion, we firs prove he following ore general convergence resul. Assupion 2: Consider he adapive GAMP algorih running on a sequence of probles indexed by he diension n, saisfying he following assupions: (a) Sae as Assupion (a) o (c) wih = 2. (b) For every, he adapaion funcion Hx(r, τ r ) can be regarded as a funcional over r saisfying he following wea pseudo-lipschiz coninuiy propery: Consider any sequence of vecors r = r (n) and sequence of scalars τ r = τ r (n), indexed by n saisfying PL() r(n) = R, τ r (n) = τ r, where R and τ r are he oupus of he sae evoluion equaions defined below. Then, H x(r (n), τ r (n) ) = Hx(R, τ r). Siilarly, Hz(y, p, τ p ) saisfies analogous coninuiy condiions in τ p and (y, p). See Appendix B for a general definiion of wealy pseudo-lipschiz coninuous funcionals. (c) The scalar funcion G x(r, τ r, λ x ) and is derivaive G x(r, τ r, λ x ) wih respec o r are coninuous in λ x uniforly over r in he following sense: For every ɛ > 0,, τr and λ x Λ x, here exiss an open neighborhood U of (τr, λ x) such ha for all (τ r, λ x ) U and r, G x(r, τ r, λ x ) G x(r, τ r, λ x) < ɛ, G x(r, τ r, λ x ) G x(r, τ r, λ x) < ɛ. In addiion, he funcions G x(r, τ r, λ x ) and G x(r, τ r, λ x ) us be Lipschiz coninuous in r wih a Lipschiz consan ha can be seleced coninuously in τ r and λ x. The funcions G s(p, y, τ p, λ z ), G z(p, y, τ p, λ z ) and heir derivaives G s(p, y, τ p, λ z ) G z(p, y, τ p, λ z ) saisfy analogous coninuiy assupions wih respec o p, y, τ p and λ z. Assupions (b) and (c) are soewha echnical, bu ild, coninuiy condiions ha can be saisfied by a large class of adapaion funcionals and esiaion funcions. For exaple, fro he definiions in Appendix B, he coninuiy assupion (d) will be saisfied for any funcional given by an epirical average Hx(r, τ r ) = n φ n x(r j, τ r ), j= where, for each, φ x(r j, τ r ) is pseudo-lipschiz coninuous in r of order p and coninuous in τ r uniforly over r. A siilar funcional can be used for H z. As we will see in Secion IV-B, he ML funcionals (3) and (5) will also saisfy he condiions of his assupion. Theore 2: Consider he rando vecors θ x and θ z generaed by he oupus of he adapive GAMP under Assupion 2. Le θ x and θ z be he rando vecors in (7) wih he paraeers deerined by he SE equaions in Algorih 2. Then, for any fixed, he coponens of θ x and θ z converge epirically wih bounded oens of order = 2 as θ PL() x = θ x, θ PL() z = θ z. (2) where θ x and θ z are given in (7). In addiion, for any, he is λ x = λ x, τ r = τ r, λ z = λ z, τ p = τ p, (22a) (22b) also hold alos surely. Proof: See Appendix C. The resul is a naural generalizaion of Theore and provides a siple exension of he SE analysis o incorporae he adapaion. The SE analysis applies o essenially arbirary adapaion funcions. I paricular, i can be used o analyze boh he behavior of he adapive GAMP algorih wih eiher ML and EM-GAMP adapaion funcions in he previous secion. The proof is sraighforward and is based on a coninuiy arguen also used in [4].

7 KAMILOV, RANGAN, FLETCHER AND UNSER 7 B. Asypoic Consisency wih ML Adapaion We ca now use Theore 2 o prove he asypoic consisency of he adapive GAMP ehod wih he ML paraeer esiaion described in Secion III-A. The following wo assupions can be regarded as idenifiabiliy condiions. Definiion : Consider a faily of disribuions, {P X (x λ x ), λ x Λ x }, a se S x of paraeers (α r, ξ r ) of a Gaussian channel and funcion φ x (r, λ x, α r, ξ r ). We say ha P X (x λ x ) is idenifiable wih Gaussian oupus wih paraeer se S x and funcion φ x if: (a) The ses S x and Λ x are copac. (b) For any rue paraeers λ x Λ x, and (α r, ξ r ) S x, he axiizaion λ x = arg ax λ x Λ x ax (α r,ξ r) S x E [φ x (αrx + V, λ x, α r, ξ r ) λ x, ξr ], (23) is well-defined, unique and reurns he rue value, λ x = λ x. The expecaion in (23) is wih respec o X P X ( λ x) and V N (0, ξr ). (c) For every λ x and α r, ξ r, he funcion φ x (r, λ x, α r, ξ r ) is pseudo-lipschiz coninuous of order = 2 in r. In addiion, i is coninuous in λ x, α r, ξ r uniforly over r in he following sense: For every ɛ > 0 and λ x, α r, ξ r, here exiss an open neighborhood U of λ x, α r, ξ r, such ha for all (λ x, α r, ξ r ) U and all r, φ x (r, λ x, α r, ξ r ) φ x (r, λ x, α r, ξ r ) < ɛ. Definiion 2: Consider a faily of condiional disribuions, {P Y Z (y z, λ z ), λ z Λ z } generaed by he apping Y = h(z, W, λ z ) where W P W is soe rando variable and h(z, w, λ z ) is a scalar funcion. Le S z be a se of covariance arices K p and le φ z (y, p, λ z, K p ) be soe funcion. We say ha condiional disribuion faily P Y Z (, λ z ) is idenifiable wih Gaussian inpus wih covariance se S z and funcion φ z if: (a) The paraeer ses S z and Λ z are copac. (b) For any rue paraeer λ z Λ z and rue covariance K p, he axiizaion λ z = arg ax ax λ z Λ z K p S z E [ φ z (Y, P, λ z, K p ) λ z, K p], (24) is well-defined, unique and reurns he rue value, λ z = λ z, The expecaion in (24) is wih respec o Y Z P Y Z (y z, λ z) and (Z, P ) N (0, K p). (c) For every λ z and K p, he funcion φ z (y, p, λ z, K p ) is pseudo-lipschiz coninuous in (p, y) of order = 2. In addiion, i is coninuous in λ p, K p uniforly over p and y. Definiions and 2 essenially require ha he paraeers λ x and λ z can be idenified hrough a axiizaion. The funcions φ x and φ z can be he log lielihood funcions (2) and (4), alhough we peri oher funcions as well, since he axiizaion ay be copuaionally sipler. Such funcions are soeies called pseudo-lielihoods. The exisence of a such a funcion is a ild condiion. Indeed, if such a funcion does no exiss, hen he disribuions on R or (Y, P ) us be he sae for a leas wo differen paraeer values. In ha case, one canno hope o idenify he correc value fro observaions of he vecors r or (y, p ). Assupion 3: Le P X (x λ x ) and P Y Z (y z, λ z ) be failies of disribuions and consider he adapive GAMP algorih, Algorih, run on a sequence of probles, indexed by he diension n saisfying he following assupions: (a) Sae as Assupion (a) o (c) wih = 2. In addiion, he disribuions for he vecor X is given by P X ( λ x) for soe rue paraeer λ x Λ x and he condiional disribuion of Y given Z is given by P Y Z (y z, λ z) for soe rue paraeer λ z Λ z. (b) Sae as Assupion 2(c). (c) The adapaion funcions are se o (3) and (5). Theore 3: Consider he oupus of he adapive GAMP algorih wih ML adapaion as described in Assupion 3. Then, for any fixed, (a) The coponens of θ x and θ z in (6) converge epirically wih bounded oens of order = 2 as in (2) and he is (22) hold alos surely. (b) In addiion, if (α r, ξ r) S x (τ r), and he faily of disribuions P X ( λ x ), λ x Λ x is idenifiable in Gaussian noise wih paraeer se S x (τ r) and pseudo-lielihood φ x (see Definiion ), hen λ x = λ x = λ x (25) alos surely. (c) Siilarly, if K p S z (τp) for soe, and he faily of disribuions P Y Z ( λ z ), λ z Λ z is idenifiable wih Gaussian inpus wih paraeer se S z (τp) and pseudolielihood φ z (see Definiion 2) hen λ z = λ z = λ z (26) alos surely. Proof: See Appendix D. The heore shows, rearably, ha for a very large class of he paraeerized disribuions, he adapive GAMP algorih is able o asypoically esiae he correc paraeers. Moreover, here is asypoically no perforance loss beween he adapive GAMP algorih and a corresponding oracle GAMP algorih ha nows he correc paraeers in he sense ha he epirical disribuions of he algorih oupus are described by he sae SE equaions. There are wo ey requireens: Firs, ha he opiizaions in (3) and (5) can be copued. These opiizaions ay be non-convex. Secondly, ha he opiizaions can be perfored are over sufficienly large ses of Gaussian channel paraeers S x and S z such ha i can be guaraneed ha he SE equaions evenually ener hese ses. In he exaples below, we will see ways o reduce he search space of Gaussian channel paraeers. V. NUMERICAL RESULTS A. Esiaion of a Gauss-Bernoulli inpu Recen resuls sugges ha here is considerable value in learning of priors P X in he conex of copressed sensing

8 8 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION MSE (db) Sae Evoluion LASSO Oracle GAMP Adapive GAMP MSE (db) Measureen raio ( ) (a) Noise variance ( ) (b) Fig. 2: Reconsrucion of a Gauss-Bernoulli signal fro noisy easureens. The average reconsrucion MSE is ploed agains (a) easureen raio /n and (b) AWGN variance σ 2. The plos illusrae ha adapive GAMP yields considerable iproveen over l -based LASSO esiaor. Moreover, i exacly aches he perforance of oracle GAMP ha nows he prior paraeers. [42], which considers he esiaion of sparse vecors x fro underdeerined easureens ( < n). I is nown ha esiaors such as LASSO offer cerain opial inax perforance over a large class of sparse disribuions [43]. However, for any paricular disribuions, here is a poenially large perforance gap beween LASSO and MMSE esiaor wih he correc prior. This gap was he ain oivaion for [22], [23] which showed large gains of he EM- GAMP ehod due o is abiliy o learn he prior. Here, we illusrae he perforance and asypoic consisency of adapive GAMP in a siple copressed sensing exaple. Specifically, we consider he esiaion of a sparse vecor x R n fro noisy easureens y = Ax + w = z + w, where he addiive noise w is rando wih i.i.d. enries w i N (0, σ 2 ). Here, he oupu channel is deerined by he saisics on w, which are assued o be nown o he esiaor. So, here are no unnown paraeers λ z. As a odel for he sparse inpu vecor x, we assued he coponens are i.i.d. wih he Gauss-Bernoulli disribuion, { 0 prob = ρ, x j N (0, σx) 2 (27) prob = ρ where ρ represens he probabiliy ha he coponen is nonzero (i.e. he vecor s sparsiy raio) and σ 2 x is he variance of he non-zero coponens. The paraeers λ x = (ρ, σ 2 x) are reaed as unnown. In he adapive GAMP algorih, we use he esiaion funcions G x, G s, and G z corresponding o he su-produc GAMP algorih. As described in Appendix A, for he suproduce GAMP he SE equaions siplify so ha α r = and ξ r = τ r. Since he noise variance is nown, he iniial oupu noise variance τ 0 r obained by adapive GAMP in Algorih exacly aches ha of oracle GAMP. Therefore, for = 0, he paraeers α r and ξ r do no need o be esiaed, and (3) convenienly siplifies o H x (r, τ r ) = arg ax λ x Λ x n n log p R (r j λ x, τ r ), (28) j= where Λ x = [0, ] [0, + ). For ieraion > 0, we rely on asypoic consisency, and assue ha he axiizaion (28) yields he correc paraeer esiaes, so ha λ x = λ x. Then, in principle, for > 0 adapive GAMP uses he correc paraeer esiaes and we expec i o ach he perforance of oracle GAMP. In our ipleenaion, we run EM updae (7) unil convergence o approxiae he ML adapaion (28). Fig. 2 illusraes he perforance of adapive GAMP on signals of lengh n = 400 generaed wih he paraeers λ x = (ρ = 0.2, σx 2 = 5). The perforance of adapive GAMP is copared o ha of LASSO wih MSE opial regularizaion paraeer, and oracle GAMP ha nows he paraeers of he prior exacly. For generaing he graphs, we perfored 000 rando rials by foring he easureen arix A fro i.i.d. zero-ean Gaussian rando variables of variance /. In Figure 2(a), we eep he variance of he noise fixed o σ 2 = 0. and plo he average MSE of he reconsrucion agains he easureen raio /n. In Figure 2(b), we eep he easureen raio fixed o /n = 0.75 and plo he average MSE of he reconsrucion agains he noise variance σ 2. For copleeness, we also provide he asypoic MSE values copued via SE recursion. The resuls illusrae ha GAMP significanly ouperfors LASSO over he whole range of /n and σ 2. Moreover, he resuls corroborae he consisency of adapive GAMP which achieves nearly idenical qualiy of reconsrucion wih oracle GAMP. The perforance resuls indicae ha adapive GAMP can be an effecive ehod for esiaion when he paraeers of he proble are difficul o characerize and us be esiaed fro daa.

9 Measureen raio ( ) 0 2 Noise variance ( ) (a) (b) KAMILOV, RANGAN, FLETCHER AND UNSER 9 MSE (db) Oracle Adapive (n = 000) Adapive (n = 0 000) Measureen Measureen raio raio ( (/n) ) Fig. 3: Idenificaion of linear-nonlinear-poisson cascade odel. The average reconsrucion MSE is ploed agains he easureen raio /n. This plos illusraes near consisency of adapive GAMP for large n. B. Esiaion of a Nonlinear Oupu Classificaion Funcion As a second exaple, we consider he esiaion of he linear-nonlinear-poisson (LNP) cascade odel [8]. The odel has been successfully used o characerize neural spie responses in early sensory pahways of he visual syse. In he conex of LNP cascade odel, he vecor x R n represens he linear filer, which odels he linear recepive field of he neuron. AMP echniques cobined wih he paraeer esiaion have been recenly proposed for neural recepive field esiaion and conneciviy deecion in [44]. As in Secion V-A, we odel x as a Gauss-Bernoulli vecor of unnown paraeers λ x = (ρ, σ 2 x). To obain he easureens y, he vecor z = Ax is passed hrough a coponenwise nonlineariy u o resul in u(z) =. (29) + e z Le he funcion ψ R r denoe a vecor ψ(z) = (, u(z),..., u(z) r ) T. (30) Then, he final easureen vecor y is generaed by a easureen channel wih a condiional densiy of he for p Y Z (y i z i, λ z ) = f(z i) yi e f(zi), (3) y! where f denoes he nonlineariy given by f(z) = exp ( λ T z ψ(z) ). Adapive GAMP can now be used o also esiae vecor of polynoial coefficiens λ z, which ogeher wih x, copleely characerizes he LNP syse. The esiaion of λ z is perfored wih ML esiaor described in Secion III-A. We assue ha he ean and variance of he vecor x are nown a ieraion = 0. This iplies ha for su-produc GAMP he covariance K 0 p is iniially nown and he opiizaion (5) siplifies o H z (p, y, τ p ) = arg ax λ z Λ z { } log p Y (y i λ z ), (32) i= where Λ z R r. The esiaion of λ x is perfored as in Secion V-A. As before, for ieraion > 0, we assue ha he axiizaions (28) and (32) yield correc paraeer esiaes λ x = λ x and λ z = λ z, respecively. Thus we can conclude by inducion ha for > 0 he adapive GAMP algorih should coninue aching oracle GAMP for large enough n. In our siulaions, we ipleened (32) wih a gradien ascend algorih and run i unil convergence. In Fig. 3, we copare he reconsrucion perforance of adapive GAMP agains he oracle version ha nows he rue paraeers (λ x, λ z ) exacly. We consider he vecor x generaed wih rue paraeers λ x = (ρ = 0., σ 2 x = 30). We consider he case r = 3 and se he paraeers of he oupu channel o λ z = [ 4.88, 7.4, 2.58]. To illusrae he asypoic consisency of he adapive algorih, we consider he signals of lengh n = 000 and n = We perfor 0 and 00 rando rials for long and shor signals, respecively, and plo he average MSE of he reconsrucion agains /n. As expeced, for large n, he perforance of adapive GAMP is nearly idenical (wihin 0.5) o ha of oracle GAMP. VI. CONCLUSIONS AND FUTURE WORK We have presened an adapive GAMP ehod for he esiaion of i.i.d. vecors x observed hrough a nown linear ransfors followed by an arbirary, coponenwise rando ransfor. The procedure, which is a generalizaion of EM- GAMP ehodology of [22] [25] ha esiaes boh he vecor x as well as paraeers in he source and coponenwise oupu ransfor. In he case of large i.i.d. Gaussian ransfors, i is shown ha he adapive GAMP ehod is provably asypoically consisen in ha he paraeer esiaes converge o he rue values. This convergence resul holds over a large class of odels wih essenially arbirarily coplex paraeerizaions. Moreover, he algorih is copuaionally efficien since i reduces he vecor-valued esiaion proble o a sequence of scalar esiaion probles in Gaussian noise. We believe ha his ehod is applicable o a large class of linear-nonlinear odels wih provable guaranees can have applicaions in a wide range of probles. We have enioned he use of he ehod for learning sparse priors in copressed sensing. Fuure wor will include learning of paraeers of oupu funcions as well as possible exensions o non-gaussian arices. APPENDIX A SUM-PRODUCT GAMP EQUATIONS As described in [8], he su-produc esiaion can be ipleened wih he esiaion funcions G x(r, τ r, λ x ) := E[X R = r, τ r, λ x ], (33a) G z(p, y, τ p, λ z ) := E[Z P = p, Y = y, τ p, λ z ], (33b) G s(p, y, τ p, λ z ) := (G τ z(p, y, τ p, λ ) z ) p, (33c) p where he expecaions are wih respec o he scalar rando variables R = X + V x, V x N (0, τ r ), X P X ( λ x ), Z = P + V z, V z N (0, τ p ), Y P Y Z ( Z, λ z ). (34a) (34b)

10 0 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION The paper [8] shows ha he derivaives of hese esiaion funcions for lines 9 and 6 are copued via he variances: τ r G x(r, τ r, λ x ) r = var[x R = r, τ r, λ x ] (35a) G s(p, y, τ p, λ z ) p ( = var[z P = p, Y = y, τ p, λ ) z ]. (35b) τ p τ p The esiaion funcions (33) correspond o scalar esiaes of rando variables in addiive whie Gaussian noise (AWGN). A ey resul of [8] is ha, when he paraeers are se o he rue values (i.e. ( λ x, λ z ) = (λ x, λ z )), he oupus x and ẑ can be inerpreed as su producs esiaes of he condiional expecaions E(x y) and E(z y). The algorih hus reduces he vecor-valued esiaion proble o a copuaionally siple sequence of scalar AWGN esiaion probles along wih linear ransfors. Moreover, he SE equaions in Algorih 2 reduce o a paricularly siple fors, where τ r and ξr in (9) are given by [ ] τ r = ξr = E 2 p 2 log p Y P (Y P ), (36a) where he expecaions are over he rando variables (Z, P ) N (0, K p) and Y is given in (9). The covariance arix K p has he for [ ] βτx0 βτ K x0 τ p p = βτ x0 τ p βτ x0 τ, (36b) p where τ x0 is he variance of X and β > 0 is he asypoic easureen raio (see Assupion for deails). The scaling consan (9e) becoes αr =. The updae rule for τ + x also siplifies o τ + x = E [ var ( X R )], (36c) where he expecaion is over he rando variables in (8). APPENDIX B CONVERGENCE OF EMPIRICAL DISTRIBUTIONS Bayai and Monanari s analysis in [6] eploys cerain deerinisic odels on he vecors and hen proves convergence properies of relaed epirical disribuions. To apply he sae analysis here, we need o review soe of heir definiions. We say a funcion φ : R r R s is pseudo-lipschiz of order >, if here exiss an L > 0 such for any x, y R r, φ(x) φ(y) L( + x + y ) x y. Now suppose ha for each n =, 2,..., v (n) is a se of vecors v (n) = {v i (n), i =,..., l(n)}, (37) where each eleen v i (n) R s and l(n) is he nuber of eleens in he se. Thus, v (n) can iself be regarded as a vecor wih sl(n) coponens. We say ha v (n) epirically converges wih bounded oens of order as n o a rando vecor V on R s if: For all pseudo-lipschiz coninuous funcions, φ, of order, n φ(v i (n)) = E(φ(V)) <. n i= When he naure of convergence is clear, we ay wrie (wih soe abuse of noaion) (n) PL() v V as n, or PL() v(n) = V. Finally, le P s be he se of probabiliy disribuions on Rs wih bounded h oens, and suppose ha H : P s Λ is a funcional P s o soe opological space Λ. Given a se v(n) as in (37), wrie H(v) for H(P v ) where P v is he epirical disribuion on he coponens of v. Also, given a rando vecor V wih disribuion P V wrie H(V) for H(P V ). Then, we will say ha he funcional H is wealy pseudo-lipschiz coninuous of order if PL() v(n) = V = H(v(n) ) = H(V), where he i on he righ hand side is in he opology of Λ. APPENDIX C PROOF OF THEOREM 2 The proof follows along he adapaion arguen of [4]. We use he ilde superscrip on quaniies such as x, r, τ r, p, τ p, s, and z o denoe values generaed via a non-adapive version of he GAMP. The non-adapive GAMP algorih has he sae iniial condiions as he adapive algorih (i.e. x 0 = x 0, τ 0 p = τ 0 p, s = s = 0), bu wih λ x and λ z replaced by heir deerinisic is λ x and λ z, respecively. Tha is, we replace lines 7, 8 and 5 wih z i = G z(p i, y i, τ p, λ z), s i = G s(p i, y i, τ p, λ z), x + j = G x(r j, τ r, λ x). This non-adapive algorih is precisely he sandard GAMP ehod analyzed in [8]. The resuls in ha paper show ha he oupus of he non-adapive algorih saisfy all he required is fro he SE analysis. Tha is, θ PL() x = θ x, θ PL() z = θ z, where θ x and θ z are he ses generaed by he non-adapive GAMP algorih: θ x := { (x j, r j, x + j ) : j =,..., n }, θ z = { (z i, z i, y i, p i) : i =,..., }. The is (2) are now proven hrough a coninuiy arguen ha shows ha he adapive and non-adapive quaniies us asypoically agree wih one anoher. Specifically, we will sar by proving ha he following is holds alos surely for all 0 x = n x x = 0,, (38a) τ p = τ p τ p = 0 (38b)

11 KAMILOV, RANGAN, FLETCHER AND UNSER where is usual he -nor. Moreover, in he course of proving (38), we will also show ha he following is hold alos surely p = r = s = p p = 0, n r r = 0, s s = 0, ẑ z = 0, z = τ r = τ r τ r = 0, λ x = λ x, λ z = λ z, (39a) (39b) (39c) (39d) (39e) (39f) (39g) The proof of he is (38) and (39) is achieved by an inducion on. Alhough we only need o show he above is for = 2, os of he arguens hold for arbirary 2. We hus presen he general derivaion where possible. To begin he inducion arguen, firs noe ha he nonadapive algorih has he sae iniial condiions as he adapive algorih. Thus he is (38) and (39c) hold for = 0 and =, respecively. We now proceed by inducion. Suppose ha 0 and he is (38) and (39c) hold for soe and, respecively. Since A has i.i.d. coponens wih zero ean and variance /, i follows fro he Marčeno-Pasur Theore [45] ha ha is 2-nor operaor nor is bounded. Tha is, here exiss a consan C A such ha A C A, AT C A. (40) This bound is he only par of he proof ha specifically requires = 2. Fro (40), we obain p p = A x τ ps A x + τ p s = A( x x ) + τ p( s s ) + ( τ p τ p) s A( x x ) + τ p s s + τ p τ p s (a) A x x + τ p s s + τ p τ p s C A x x + τ p s s + τ p τ p s (4) where (a) is due o he nor inequaliy Ax A x. Since, we have ha for any posiive nubers a and b (a + b) 2 (a + b ). (42) Applying he inequaliy (42) ino (4), we obain p p ) (C A x x + τ p s s + τp s ( ) 2 n C A x + 2 τp s + 2 ( τ p ) s. (43) Now, since s and τ p are he oupus of he non-adapive algorih hey saisfy he is s = s i = E [ S ] <, (44a) τ p = τ p <. i= (44b) Now, he inducion hypoheses sae ha x, s and τ p 0. Applying hese inducion hypoheses along he bounds (44a), and he fac ha n/ β we obain (39a). To prove (39g), we firs prove he epirical convergence of (p, y) o (P, Y ). Towards his end, le φ(p, y) be any pseudo-lipschiz coninuous funcion φ of order. Then φ(p i, y i ) E [ φ(p, Y ) ] i= φ(p i, y i ) φ( p i, y i ) i= + φ( p i, y i ) E [ φ(p, Y ) ] i= (a) L ( + p i + p i + y i ) p i p i i= + φ( p i, y i ) E [ φ(p, Y ) ] i= (b) LC p + φ( p i, y i ) E [ φ(p, Y ) ]. (45) i= In (a) we use he fac ha φ is pseudo-lipschiz, and in (b) we use Hölder s inequaliy x T y = x y q wih q = p/(p ) and define C as [ /( ) ( C := + p i + p i + y i )] i= ( + p i + p i + y i ) /( ) i= cons [ + ( p ) ( + p ) ( + y ) ], (46) where he firs sep is fro Jensen s inequaliy. Since ( p, y) saisfy he is for he non-adapive algorih we have: p = p i = E [ P ] < (47a) y = i= y i = E [ Y ] < (47b) i= Also, fro he inducion hypohesis (39a), i follows ha he adapive oupu us saisfy he sae i p = p i = E [ P ] <. (48) i=

12 2 APPROXIMATE MESSAGE PASSING WITH CONSISTENT PARAMETER ESTIMATION Cobining (45), (46), (47), (48), (39a) we conclude ha for all 0 (p, y) PL() = (P, Y ). (49) The i (49) along wih (38b) and he coninuiy condiion on H z in Assupion (d) prove he i in (39g). The i (39a) ogeher wih coninuiy condiions on G z in Assupions show ha (39c), (39d) and (39e) hold for. For exaple, o show (39d), we consider he i of he following expression ẑ z = G z(p, y, τ p, λ z) G z( p, y, τ p, λ z) (a) L p p = L p, where a (a) we used he Lipschiz coninuiy assupion. Siilar arguens can be used for (39c) and (39e). To show (39b), we proceed exacly as for (39a). Due o he coninuiy assupions on H x, his i in urn shows ha (39f) holds alos surely. Then, (38a) and (38b) follow direcly fro he coninuiy of G x in Assupions, ogeher wih (39b) and (39f). We have hus shown ha if he is (38) and (39) hold for soe, hey hold for +. Thus, by inducion hey hold for all. Finally, o show (2), le φ be any pseudo-lipschiz coninuous funcion φ(x, r, x), and define ɛ = n j= j= [ )] φ(x j, r j, x + j ) E φ(x, R, X +, (50) which, due o convergence of non-adapive GAMP, can be ade arbirarily sall by choosing n large enough. Then, consider [ )] φ(x j, ˆr n j, x + j ) E φ(x, R, X + j= ɛ n + n φ(x j, ˆr n j, x + j ) φ(x j, r j, x + j ) (a) ɛ n + L r r + L ˆx + x + n + L ( ˆr n j + r j ) ( ˆr j r j + x + j x + j ) + L n (b) j= n j= + L ( x ( x + j + x + j ) ( ˆr j r j + x + j x + j ) ) ɛ n + L ( r + L ( ) ( r ( ) ( ( ˆM + x + L ( ) x + M x ) + ( + M x ) + ( = n + ˆM x ) + ( M r) + ˆM x ) + ( M r) where L, L are consans independen of n and ˆx +, r, M + x = n x +, ˆM r = n M r = n r + ( ˆM ) r) ) + ( ˆM r) (5) In (a) we use he fac ha φ is pseudo-lipshiz, in (b) we use l p -nor equivalence x n /p x and Hölder s inequaliy x T y = x y q wih q = p/(p ). By applying + + of (38a), (39b) and since, ˆM x, M x, ˆM r, and M r converge o a finie value we can obain he firs equaion of (2) by aing n. The second equaion in (2) can be shown in a siilar way. This proves he is (2). Also, he firs wo is in (22) are a consequence of (39f) and (39f). The second wo is follow fro coninuiy assupions in Assupion (e) and he convergence of he epirical disribuions in (2). This coplees he proof. APPENDIX D PROOF OF THEOREM 3 Par (a) of Theore 3 is a direc applicaion of he general resul, Theore IV-A. To apply he general resul, firs observe ha Assupions 3(a) and (c) iediaely iply he corresponding ies in Assupions 2. So, we only need o verify he coninuiy condiion in Assupion 2(b) for he adapaion funcions in (3) and (5). We begin by proving he coninuiy of Hz. Fix, and le (y (n), p (n) ) be a sequence of vecors and τ p (n) be a sequence of scalars such ha (y(n), p (n) ) PL(p) = (Y, P ) τ p (n) = τ p, (52) where (Y, P ) and τ p are he oupus of he sae evoluion equaions. For each n, le λ (n) z = H z(y (n), p (n), τ (n) p ). (53) λ (n) z We wish o show ha λ z, he rue paraeer. Since λ (n) z Λ z and Λ z is copac, i suffices o show ha, any i poin of any convergen subsequence is equal o λ z. (n) So, suppose ha λ z λ z o soe i poin λ z on soe (n) subsequence λ z. (n) Fro λ z and he definiion (5) i follows ha φ z (y (n) i, p (n) i, τ (n) (n) p, λ z ) i= i= φ z (y (n) i λ (n) z Now, since τ p (n) τ p and coninuiy condiion in Definiion 2(c) o obain inf [ i= φ z (y (n) i, p (n) i, τ p (n), λ z). (54) λ z, we can apply he, p (n) i, τ p, λ z ) ] φ z (y (n) i, p (n) i, τ p, λ z) 0. (55) Also, he i (52) and he fac ha φ z is psuedo-lipschiz coninuous of order iplies ha E[φ z (Y, P, τ p, λ z )] E[φ z (Y, P, τ p, λ z)]. (56) Bu, propery (b) of Definiion 2 shows ha λ z is he axia of he righ-hand side, so E[φ z (Y, P, τ p, λ z )] = E[φ z (Y, P, τ p, λ z)]. (57)

Introduction to Numerical Analysis. In this lesson you will be taken through a pair of techniques that will be used to solve the equations of.

Introduction to Numerical Analysis. In this lesson you will be taken through a pair of techniques that will be used to solve the equations of. Inroducion o Nuerical Analysis oion In his lesson you will be aen hrough a pair of echniques ha will be used o solve he equaions of and v dx d a F d for siuaions in which F is well nown, and he iniial