Weighted Infinite Relational Model for Network Data


Journal of Communications Vol. 10, No. 6, June 2015

Xiaojuan Jiang and Wensheng Zhang
Institute of Automation, University of Chinese Academy of Sciences, Beijing 100190, PR China
Email: {xiaojuan.jiang, wensheng.zhang}@ia.ac.cn

Abstract: As the availability and scope of social networks and relational datasets increase, learning latent structure in complex networks has become an important problem for pattern recognition. To construct compact and flexible representations for weighted networks, a Weighted Infinite Relational Model (WIRM) is proposed to learn from both the presence and weight of links in networks. As a Bayesian nonparametric model based on the Dirichlet process prior, a distinctive feature of WIRM is its ability to learn the latent structure underlying weighted networks without specifying the number of clusters. This is particularly important for structure discovery in complex networks, especially for novel domains where we may have little prior knowledge. We develop a mean-field variational algorithm to efficiently approximate the model's posterior distribution over the infinite latent clusters. Experiments on a synthetic data set and real-world data sets demonstrate that WIRM can effectively capture the latent structure underlying complex weighted networks.

Index Terms: Pattern recognition, network modeling, Bayesian nonparametric model, Dirichlet process, Chinese restaurant process, exponential family, variational inference

I. INTRODUCTION

Statistical analysis of complex networks has been an active area of research for decades, and is becoming an increasingly important challenge in pattern recognition as the scope and availability of network datasets increase in various scientific fields [1]. Unlike traditional data collected from individual objects, the observations in network data are no longer independent or exchangeable because vertices are pairwise related. Independence or exchangeability is a key assumption made in machine learning and statistics for traditional attribute data [2]. This intrinsic difference in structure requires special treatment for network data.
Uncovering the latent structure based on the observed pairwise interactions between vertices [1], [3] has been a focus of attention in the network literature. Among all the statistical models proposed for this end, the Stochastic Block Model (SBM) [4] is an elegant generative model of group structure in unweighted networks. The Stochastic Block Model has been successfully used for modeling assortative network structure [5], disassortative structure [1] and bipartite structure [6]. The SBM has also been generalized for count-valued data, degree correction [7] and categorical values [8]. The Mixed Membership Stochastic Block Model [9] increases the expressiveness of the latent class models by allowing mixed membership, associating each object with a distribution vector over clusters.

Manuscript received February 11, 2015; revised June 24, 2015. This work was supported by the National Natural Science Foundation of China under Grant No. U1135005. Corresponding author email: xiaojuan.jiang@ia.ac.cn. doi:10.12720/jcm.10.6.442-449

A basic assumption in most of these models is that the networks are unweighted, where the presence or absence of an interaction is represented as a binary variable. However, most real-world networks contain information about link weights. For instance, in social networks the weights represent the strengths of social ties between people [1]. A widely used technique for analyzing weighted networks is to transform the data into the binary framework via thresholding. As a result, the potential loss of valuable information from thresholding may lead to obscuration or distortion in recovering the underlying structure [10], [11]. To directly learn the latent structure of weighted networks, an extension of the stochastic block model with Poisson likelihood [7], [12] was considered for count-valued pairwise interactions. Moreover, a generalization of the SBM, called the Weighted Stochastic Block Model (WSBM) [10], was introduced to learn from both the link-existence and link-weight information. However, the number of latent clusters (or blocks) in all these models is required to be specified. This may be very difficult to assess for real-world networks.
Usually, this parameter is set a priori or fixed via a computationally expensive model selection procedure [3], [10], [13]. To overcome this problem, the Infinite Relational Model (IRM) [14] and the Infinite Hidden Relational Model [15] use the Dirichlet process prior to define a nonparametric relational model for unweighted networks.

In this paper, we introduce the Weighted Infinite Relational Model (WIRM), a Bayesian nonparametric model that can learn a potentially infinite number of clusters from both the existence and weight of links. We take each weighted link as a draw from a parametric exponential family distribution, which includes as special cases most standard distributions, e.g. the Bernoulli, the Gaussian and their generalizations. With this general distributional form, we can directly use the weight information in recovering latent cluster or block structure. Moreover, the WIRM uses a nonparametric Bayesian approach to simultaneously infer the number of latent clusters and the cluster membership of each object, while at the same time inferring how cluster membership influences the observed weighted link interactions.

There are two models that are closely related to WIRM. The infinite relational model (IRM) [14] previously adapted the Dirichlet process to define a nonparametric relational model for network modeling. IRM fits only the link-existence information and ignores the link-weight information, whereas WIRM can learn from both types of information using a general form of exponential family distribution. On the other hand, WIRM can also be seen as a nonparametric extension of the WSBM proposed in [10], where the number of clusters must be chosen before the model can be applied to data. Compared to WSBM, a distinctive feature of WIRM is its ability to infer from the observed data how many latent clusters there are. This is particularly important when we may have little prior knowledge about the number of clusters, especially for structure discovery in novel domains.

The paper is arranged as follows. We first describe the generative process of our model in Section II. Then a variational inference algorithm is derived for performing approximate posterior inference and parameter estimation in Section III. Section IV compares the performance of the WIRM to alternative methods on two link prediction tasks, and analyzes the results. Section V concludes the paper.

II. WEIGHTED INFINITE RELATIONAL MODEL

Let $A$ be an $N \times N$ matrix that contains link information among objects in a directed relational network. Here, we consider the following two types of information in the link observations: information about link-existence (presence or absence of links) and information about link-weights (the weighted values). To specify these two types of information in the networks, we can take the adjacency matrix $A$ as a binary-valued matrix or a real-valued (or count-valued) matrix. We want to partition the set of objects into clusters, so that the relationships between objects can be predicted by the cluster assignments. The number of latent clusters present in the network, which is not known a priori, is denoted as $K$, so that the cluster assignment variable of each object is $z_i \in \{1, 2, \ldots, K\}$.

A. Modeling Observed Link Information

Suppose we are given the cluster assignment vector $Z = \{z_1, z_2, \ldots, z_N\}$.
For each pair of clusters $(k, k')$, we can model the 'bundle' of links from objects in cluster $k$ to those in cluster $k'$, using an exponential family distribution parameterized by $\theta_{kk'}$. That is, for object $i$ with cluster assignment $z_i = k$ and object $j$ with $z_j = k'$, the likelihood of observing a link $A_{ij}$ is

$$P(A_{ij} \mid Z, \theta) = \exp\big( T(A_{ij}) \cdot \eta(\theta_{z_i z_j}) \big) \tag{1}$$

where $T$ is the vector-valued function of sufficient statistics, and $\eta$ is the vector-valued function of natural parameters. The exponential family [16] comprises a set of flexible distributions ranging over both continuous and discrete random variables, including the Gaussian, the Bernoulli, the Poisson, the Gamma, the Geometric, the Negative Binomial, etc. Specifically, we can choose the Bernoulli distribution to model binary existence information of the links, setting $T = (x, 1)$ and $\eta = (\log(p/(1-p)), \log(1-p))$. For count-valued existence information of links, we may choose the Poisson distribution with $T = (x, 1)$ and $\eta = (\log \lambda, -\lambda)$. To model real-valued weight information of links, the normal distribution may be used by setting $T = (x, x^2, 1)$ and $\eta = (\mu/\sigma^2, -1/(2\sigma^2), -\mu^2/(2\sigma^2))$.

We may also incorporate the two types of information into the likelihood function via a simple relative importance parameter $c \in [0, 1]$:

$$\log P(A_{ij} \mid Z, \theta) = c\, T_e(A_{ij}) \cdot \eta_e\big(\theta^{(e)}_{z_i z_j}\big) + (1 - c)\, T_w(A_{ij}) \cdot \eta_w\big(\theta^{(w)}_{z_i z_j}\big) \tag{2}$$

where the pair $(T_e, \eta_e)$ denotes the family of link-existence distributions and $(T_w, \eta_w)$ denotes the family of link-weight distributions.

B. Nonparametric Prior on Cluster Assignment

In order to allow flexible representation of the latent structure of data, we use the Dirichlet process prior on cluster assignments. The Dirichlet Process, introduced in [17], is the underlying random measure of the Chinese restaurant process (CRP) [2], [18], which is widely used as a nonparametric prior for latent class models [19]. An important characteristic of the prior is that, conditioned on data, we can examine the posterior distribution of $Z$ to get a data-dependent distribution of the number of clusters. The CRP metaphor gives the intuition. Imagine a restaurant with an infinite number of tables, each with an infinite number of seats. The customers enter the restaurant one after another, and each chooses a table at random.
In the CRP with parameter $\alpha$, each customer chooses an occupied table with probability proportional to the number of occupants, and chooses the next vacant table with probability proportional to $\alpha$. This process continues until all customers have seats, defining a distribution over allocations of people to tables, and, more generally, objects to classes. One important (and surprising) property of this process is that the joint probability of the final assignment is not affected by the order in which customers enter the restaurant, which is called exchangeability [18].

The Chinese restaurant construction of the Dirichlet Process directly lends itself to a Gibbs sampler; for the variational inference of Bayesian nonparametric models, however, we turn to the stick-breaking construction of [20], which provides a concrete set of hidden variables on which to place an approximate posterior [21]-[23]. The stick-breaking representation of the cluster assignment $z_i \in \{1, 2, \ldots\}$ is as follows:

$$\begin{aligned} v_k &\sim \mathrm{Beta}(1, \alpha), & k &= 1, 2, \ldots \\ \pi_k(v) &= v_k \prod_{l=1}^{k-1} (1 - v_l), & k &= 1, 2, \ldots \\ z_i &\sim \mathrm{Mult}(\pi(v)), & i &= 1, 2, \ldots, N \end{aligned} \tag{3}$$

C. The Full Bayesian Model

To perform fully Bayesian inference, we need to specify the prior for the link bundle parameters $\theta$. For Bayesian models, inference with conjugate priors is considerably simplified because the posterior distributions have the same functional form as the priors. Here, for the given likelihood in exponential family form, the standard conjugate prior on $\theta$ [16] is

$$p(\theta) = \frac{1}{Z(\tau)} \exp\big( \tau \cdot \eta(\theta) \big) \tag{4}$$

where $\tau$ parameterizes the prior and $Z(\tau)$ is a normalizing factor. For notational convenience, we let $r$ index the link bundles between clusters; hence $\theta = (\theta_1, \theta_2, \ldots)$.

Now, we can summarize the whole generative process of the Weighted Infinite Relational Model as:

For each object $i$, assign a cluster membership $z_i$ as in (3).
For each pair of clusters $(k, k')$, draw a link bundle parameter $\theta_{kk'}$ according to (4).
For each pair of objects with indices $i$ and $j$, draw the link observation $A_{ij}$ from the exponential family in (1).

To specify the link-existence observation and the link-weight observation simultaneously, we can take (2) instead of (1) as the likelihood function in the generative process of WIRM.

III. INFERENCE

The WIRM defines a generative probabilistic process of network data with hidden structure. Given network link observations $A$, we need to recover the underlying structure of the network by inferring the posterior distribution of the latent variables. However, the posterior distribution of the latent variables under a Dirichlet Process prior is not available in closed form [19], [21], [24]. In this section, we present a variational algorithm [25] for WIRM with the likelihood function defined as in (1). For the general case with likelihood as in (2), the inference algorithm follows with minor modifications, and here we omit the redundant details.

A. Truncated Variational Distributions

The hidden variables that we are interested in are the auxiliary stick-breaking variables $V = \{v_1, v_2, \ldots\}$, the cluster assignments $Z = \{z_1, \ldots, z_N\}$, and the link parameters $\theta = \{\theta_1, \theta_2, \ldots\}$. To apply the variational approach, we take the truncated stick-breaking representation for the variational distributions. By setting $q(v_K = 1) = 1$ for a fixed $K$, we enforce the proportions $\pi_k(v)$ to be zero for $k > K$.
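As a minimal sketch (not the authors' implementation), the stick-breaking weights in (3) and the effect of fixing the last stick at $v_K = 1$ can be illustrated in plain Python. The function names `truncated_stick_breaking` and `sample_assignments`, the seed, and the values alpha = 1.0, K = 20 and N = 100 are illustrative assumptions. In the model itself only the variational distribution $q$ is truncated; this sketch truncates a draw from the prior purely to show why the weights beyond $K$ vanish and the first $K$ weights sum to one.

```python
import random


def truncated_stick_breaking(alpha, K, rng):
    """Return weights pi_1..pi_K with v_k ~ Beta(1, alpha) for k < K and v_K = 1."""
    # Hypothetical helper: the last stick takes all remaining mass (v_K = 1).
    v = [rng.betavariate(1.0, alpha) for _ in range(K - 1)] + [1.0]
    pi = []
    remaining = 1.0  # length of the stick that has not been broken off yet
    for v_k in v:
        pi.append(v_k * remaining)  # pi_k = v_k * prod_{l<k} (1 - v_l)
        remaining *= 1.0 - v_k
    return pi


def sample_assignments(pi, N, rng):
    """Draw z_i ~ Mult(pi) for i = 1..N, with clusters labelled 1..K."""
    return rng.choices(range(1, len(pi) + 1), weights=pi, k=N)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
pi = truncated_stick_breaking(alpha=1.0, K=20, rng=rng)
z = sample_assignments(pi, N=100, rng=rng)
print("number of occupied clusters:", len(set(z)))
```

Smaller values of alpha tend to concentrate the mass on the first few sticks, so fewer distinct clusters appear among the sampled assignments.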
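It is pointed out in [21] that the model follows a full Dirichlet process prior which is not truncated; only the variational posterior distribution is truncated. The truncation level $K$ is a variational parameter which can be freely set; it is not a part of the prior model specification. If $K$ is large enough, the fitted approximate posterior will exhibit fewer than $K$ clusters. We use the following fully factorized variational distribution for mean-field variational inference:

$$q(V, Z, \theta) = \prod_{k=1}^{K-1} q(v_k; \gamma_k) \prod_{i=1}^{N} q(z_i; \phi_i) \prod_{r} q(\theta_r; \tau_r)$$

where $q(v_k; \gamma_k)$ are beta distributions with shape parameters $\gamma_k = (\gamma_{k,1}, \gamma_{k,2})$, $q(z_i; \phi_i)$ are multinomial distributions, and $q(\theta_r; \tau_r)$ are exponential family distributions with natural parameters $\tau_r$ and sufficient statistics $\eta(\theta_r)$.

B. Lower Bound on the Marginal Likelihood

Using standard variational theory, we lower bound the marginal log likelihood of the observed data $A$ using Jensen's inequality:

$$\log p(A) \ge E_q[\log p(A, V, Z, \theta)] - E_q[\log q(V, Z, \theta)] \triangleq \mathcal{L}(q) \tag{5}$$

Here and elsewhere in the paper we omit the variational parameters when using $q$ as a subscript of an expectation. Now we expand the lower bound $\mathcal{L}(q)$ in (5) with the approximate posterior $q$. To simplify notation, let $\langle T \rangle_r$ and $\langle \eta \rangle_r$ be the expected values of the sufficient statistics $T$ and natural parameters $\eta$ under the approximating distribution $q$, that is,

$$\langle T \rangle_r = \sum_{i,j} \sum_{(z_i, z_j) = r} \phi_{i, z_i}\, \phi_{j, z_j}\, T(A_{ij}), \qquad \langle \eta \rangle_r = \frac{\partial}{\partial \tau} \log Z(\tau) \Big|_{\tau = \tau_r}.$$

By substituting $q$ and the conjugate prior $p$ in (5), writing $\tau_0$ for the prior parameter $\tau$ of (4), and evaluating all the expectations, we have:

$$\begin{aligned} \mathcal{L}(q) ={}& \sum_r \big( \langle T \rangle_r + \tau_0 - \tau_r \big) \cdot \langle \eta \rangle_r + \sum_r \log \frac{Z(\tau_r)}{Z(\tau_0)} \\ &+ (K-1) \log \alpha - \sum_{k=1}^{K-1} \log \frac{\Gamma(\gamma_{k,1} + \gamma_{k,2})}{\Gamma(\gamma_{k,1})\, \Gamma(\gamma_{k,2})} \\ &+ \sum_{k=1}^{K-1} \big\{ (1 - \gamma_{k,1})\, E_q[\log v_k] + (\alpha - \gamma_{k,2})\, E_q[\log(1 - v_k)] \big\} \\ &+ \sum_{i=1}^{N} \sum_{k=1}^{K} \phi_{i,k} \Big\{ E_q[\log v_k] + \sum_{l=1}^{k-1} E_q[\log(1 - v_l)] \Big\} - \sum_{i=1}^{N} \sum_{k=1}^{K} \phi_{i,k} \log \phi_{i,k} \end{aligned} \tag{6}$$

where, for $k < K$,

$$E_q[\log v_k] = \psi(\gamma_{k,1}) - \psi(\gamma_{k,1} + \gamma_{k,2}), \qquad E_q[\log(1 - v_k)] = \psi(\gamma_{k,2}) - \psi(\gamma_{k,1} + \gamma_{k,2}),$$

and $E_q[\log v_K] = 0$ because $q(v_K = 1) = 1$. The digamma function, denoted by $\psi$, arises from the derivative of the log normalization factor in the Beta distribution.

C. Coordinate Ascent Algorithm

Now, we present an explicit coordinate ascent algorithm for optimizing the bound (6). We iteratively optimize the variational lower bound with respect to each factor in turn. Convergence is guaranteed [16] because the bound $\mathcal{L}(q)$ is convex with respect to each of the factors in the variational distribution $q$. The details of the iteration are as follows:

Update for the link bundle parameters $\theta_r$: The variational distribution for the link bundle parameter $\theta_r$ is an exponential family distribution with sufficient statistics $\eta(\theta_r)$ and natural parameter $\tau_r$. The coordinate ascent update equation for the variational parameter $\tau_r$ is

$$\tau_r = \tau_0 + \langle T \rangle_r.$$

Update for the cluster assignment $z_i$: The variational parameters for the cluster assignment $z_i$ are $\{\phi_{i,k}\}$, and the update equation for $\{\phi_{i,k}\}$ is

$$\phi_{i,k} \propto \exp\Big( \sum_r \frac{\partial \langle T \rangle_r}{\partial \phi_{i,k}} \cdot \langle \eta \rangle_r + E_q[\log v_k] + \sum_{l=1}^{k-1} E_q[\log(1 - v_l)] \Big).$$

Update for the auxiliary stick-breaking variable $v_k$: The variational distribution for the auxiliary stick-breaking variable $v_k$ is a beta distribution parameterized by the shape parameters $(\gamma_{k,1}, \gamma_{k,2})$. The coordinate ascent update equations for these free variational parameters are

$$\gamma_{k,1} = 1 + \sum_{i=1}^{N} \phi_{i,k}, \qquad \gamma_{k,2} = \alpha + \sum_{i=1}^{N} \sum_{l=k+1}^{K} \phi_{i,l}.$$

Although the variational inference algorithm yields a bound for any starting values of the variational parameters, poor initialization can lead to local maxima that yield poor bounds [16]. In practice, we run the algorithm multiple times with random initializations and choose the final parameter settings that give the best bound on the marginal likelihood (6). To further improve the performance, we can follow a sequential initialization scheme [16] to initialize the variational distribution by incrementally updating the parameters according to a random permutation of the nodes in the network.

IV. EXPERIMENTS

In this section, we evaluate the performance of WIRM on a synthetic dataset and several real-world networks. Experiments were conducted for two purposes. First, we generate synthetic data to explore the ability of our model to infer the number of latent clusters using both the link-existence and link-weight information.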
Second, we compare the performance of our model with a number of state-of-the-art network models on two prediction tasks.

A. Synthetic Data

We consider a simple N=100 synthetic dataset generated with 4 known equal-size clusters, which has been used in [10]; see Fig. 1. The weights and existences of each link bundle are normally distributed and binary distributed, respectively, with different bundle-specific parameters. This dataset is interesting, as the bundle-specific parameters are shared in a subtle manner. Specifically, if we only consider the weight information of the network, the nodes can be naturally separated into two equal-size super-clusters: one is comprised of nodes indexed by {1,…,50}, the other of nodes indexed by {51,…,100}. Considering the existence information (ignoring the weights) leads to a different cluster assignment: one cluster is comprised of nodes indexed by {1,…,25,51,…,75}, the other of nodes indexed by {26,…,50,76,…,100}. To analyze this network, we set the truncation level to 20 and fit our model with pure weight information by setting c=0, pure existence information by setting c=1, and mixed information by setting c=0.5, respectively. The posterior cluster assignments over 20 possible clusters learned by WIRM are shown in Fig. 2. Examining the results, we can see that the latent structure learned by WIRM with c=0 exactly recovers the two super-clusters underlying the link-weight information (Fig. 2(a)), and with c=1 recovers the other two super-clusters for the existence information of interactions (Fig. 2(b)). Moreover, Fig. 2(c) demonstrates the ability of the fitted model with c=0.5 to capture the four ground-truth clusters by combining both types of information.

B. Real-World Networks

We now compare our model to several other network models for predicting the existence or the weight of some unobserved interactions on three real-world networks. The weighted networks used for the comparison are as follows:

Collaboration [26]. Vertices represent 226 nations on Earth, and each of the 20616 edges is weighted by a normalized count of academic papers whose author lists include that pair of nations.
Airport [27]. This is a network of the 500 busiest commercial airports in the United States, and each of the 5960 directed edges is weighted by the number of passengers traveling from one airport to another.
Forum [28]. The student social network at UC Irvine includes 1899 users that sent or received at least one message, and each of the 20291 directed edges is weighted by the number of messages sent between users.

For each of the two prediction tasks and for each dataset, we evaluate the following variants of our model: the 'pure' WIRM (pWIRM), using only weight information (c=0), the 'mixed' WIRM (mWIRM), using both edge and weight information (c=0.5), and the 'non-' WIRM (nWIRM), using only edge information (c=1). We use the normal distribution to model the weight of link interactions, and the Bernoulli distribution to model the existence of links. A comparative study with other typical models (namely, WSBM [10] and IRM [14]) is also performed.

In both prediction tasks, we treat all networks as directed, fit each model on 80% of the $N^2$ interactions, and use the remaining 20% as a test set. The truncation level is fixed at 50 for each variant of our model and each dataset. For those models that were initially established for unweighted networks (nWIRM and IRM), we take the partitions, compute the sample mean weight for each of the induced link bundles in the weighted network, and take this value as the predictor for the weight of any missing link in that bundle. For each model and each dataset, we run 5 repeats, each time with a different 80/20 cross-validation split and a different random initialization, and then compute the average mean-squared error (MSE) on the particular prediction task. To compare the results across different datasets, we normalized link weights to the interval [-1, 1] after applying a logarithmic transform.

To demonstrate the efficiency and stability of our approximate inference algorithm, we examine the change of the log marginal probability bound during the iterations for the variants of our model. The results on the three datasets are shown in Figs. 3-5. We can see that within several iterations, the log marginal probability bound converges to a particular region, and then remains stable during the following iterations.

We now report the prediction results. For both prediction tasks, we use the sequential initialization scheme to further improve the performance. Table I presents the results for predicting link existence and Table II presents the results for predicting link weights. For the link-existence prediction, nWIRM (c=1) and IRM, which use the Dirichlet process prior, clearly outperform WSBM on the Collaboration and Airport datasets, while the gap on the Forum dataset is small.
For the link-weight prediction, pWIRM (c=0), a model designed to learn only from link-weight information, is the most accurate of the WIRM variants on all three datasets and the most accurate model overall on Airport and Forum; on Collaboration, WSBM attains a slightly lower MSE (Table II). We also notice that mWIRM (c=0.5) produces very competitive results on both tasks by learning from both existence and weight information. This implies that the two types of information can be learned together without one confusing the other.

Fig. 1. Observed synthetic data example. (a) Observed synthetic 100×100 link-weight matrix. (b) Observed synthetic 100×100 link-existence matrix. White corresponds to zero, black to one.

Fig. 2. Results for synthetic data. (a) Posterior cluster assignments learned from link-weight information. (b) Posterior cluster assignments learned from link-existence information. (c) Posterior cluster assignments learned from both types of information.
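The synthetic setup of Section IV-A can be made concrete with a small generator. This is only a sketch under stated assumptions: the paper does not list the bundle-specific parameters, so the existence probabilities, means and standard deviation below are hypothetical, as are the function name `make_synthetic` and the choice to draw a weight for every ordered pair of nodes. What the sketch takes from the text is the layout: four equal clusters of 25 nodes, weight parameters shared within {1,…,50} and {51,…,100}, and existence parameters shared within {1,…,25,51,…,75} and {26,…,50,76,…,100}.

```python
import random


def make_synthetic(N=100, seed=0):
    """Generate a toy weighted network with 4 equal-size ground-truth clusters."""
    rng = random.Random(seed)
    block = N // 4
    # 0-based node i belongs to cluster 0..3 (nodes 1-25, 26-50, 51-75, 76-100).
    cluster = [i // block for i in range(N)]
    # Weight super-clusters: nodes 1-50 versus nodes 51-100.
    weight_group = [c // 2 for c in cluster]
    # Existence super-clusters: nodes 1-25, 51-75 versus nodes 26-50, 76-100.
    exist_group = [c % 2 for c in cluster]

    # Hypothetical bundle parameters (NOT taken from the paper).
    p_exist = [[0.8, 0.2], [0.2, 0.8]]   # Bernoulli link probabilities
    mu = [[1.0, -1.0], [-1.0, 1.0]]      # normal means of link weights
    sigma = 0.5                          # common standard deviation

    E = [[0] * N for _ in range(N)]      # link-existence matrix
    W = [[0.0] * N for _ in range(N)]    # link-weight matrix
    for i in range(N):
        for j in range(N):
            p = p_exist[exist_group[i]][exist_group[j]]
            E[i][j] = 1 if rng.random() < p else 0
            W[i][j] = rng.gauss(mu[weight_group[i]][weight_group[j]], sigma)
    return cluster, weight_group, exist_group, E, W


cluster, weight_group, exist_group, E, W = make_synthetic()
```

Each of the two groupings alone distinguishes only two super-clusters; the pair (weight group, existence group) identifies all four clusters, which is why a model using both kinds of information can separate them.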

Fig. 3. Log marginal probability bound during iterations for (a) pWIRM, (b) nWIRM and (c) mWIRM on the Collaboration dataset with 5 randomly initialized runs.

Fig. 4. Log marginal probability bound during iterations for (a) pWIRM, (b) nWIRM and (c) mWIRM on the Airport dataset with 5 randomly initialized runs.

Fig. 5. Log marginal probability bound during iterations for (a) pWIRM, (b) nWIRM and (c) mWIRM on the Forum dataset with 5 randomly initialized runs.

TABLE I: AVERAGE MSE ON LINK EXISTENCE PREDICTION

Dataset         pWIRM     mWIRM     nWIRM     WSBM      IRM
Collaboration   0.0738    0.0696    0.0682    0.1167    0.0687
Airport         0.0103    0.0089    0.0070    0.0156    0.0068
Forum           0.00548   0.00517   0.00509   0.00535   0.00516

TABLE II: AVERAGE MSE ON LINK WEIGHT PREDICTION

Dataset         pWIRM     mWIRM     nWIRM     WSBM      IRM
Collaboration   0.0443    0.0465    0.0549    0.0407    0.0798
Airport         0.0158    0.0180    0.0224    0.0486    0.0227
Forum           0.0491    0.0505    0.0519    0.0726    0.0549

V. CONCLUSIONS

In this paper, we propose a novel Bayesian nonparametric model to generalize the classic infinite relational model to the important case of weighted networks. This model uses a Dirichlet process prior in order to infer the number of latent clusters during the inference procedure. We develop an efficient coordinate ascent algorithm to perform variational inference for our model. The empirical results show that our model can efficiently capture the complex latent structure of weighted networks, and accurately predict the missing interactions and their weights.

REFERENCES

[1] M. E. Newman, Networks: An Introduction, Oxford University Press, 2010.
[2] D. J. Aldous, Exchangeability and Related Topics, Berlin: Springer, 1985.
[3] J. M. Hofman and C. H. Wiggins, "Bayesian approach to network modularity," Physical Review Letters, vol. 100, no. 25, 2008.
[4] K. Nowicki and T. A. B. Snijders, "Estimation and prediction for stochastic block structures," Journal of the American Statistical Association, vol. 96, no. 455, pp. 1077-1087, 2001.
[5] M. E. Newman, "Mixing patterns in networks," Physical Review E, vol. 67, no. 2, 2003.
[6] D. B. Larremore, A. Clauset, and A. Z. Jacobs, "Efficiently inferring community structure in bipartite networks," Physical Review E, vol. 90, no. 1, 2014.
[7] B. Karrer and M. E. Newman, "Stochastic block models and community structure in networks," Physical Review E, vol. 83, no. 1, 2011.
[8] R. Guimerà and M. Sales-Pardo, "A network inference method for large-scale unsupervised identification of novel drug-drug interactions," PLoS Computational Biology, vol. 9, no. 12, 2013.
[9] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, "Mixed membership stochastic block models," in Advances in Neural Information Processing Systems, 2009, pp. 33-40.
[10] C. Aicher, A. Z. Jacobs, and A. Clauset, "Learning latent block structure in weighted networks," Journal of Complex Networks, 2014.
[11] A. C. Thomas and J. K. Blitzstein, "Valued ties tell fewer lies: Why not to dichotomize network edges with thresholds," arXiv: 1101.0788, 2011.
[12] M. Mariadassou, S. Robin, and C. Vacher, "Uncovering latent structure in valued graphs: a variational approach," The Annals of Applied Statistics, vol. 4, no. 2, pp. 715-742, 2010.
[13] T. P. Peixoto, "Parsimonious module inference in large networks," Physical Review Letters, vol. 110, no. 14, 2013.
[14] C. Kemp, J. B. Tenenbaum, T. L. Griffiths, T. Yamada, and N. Ueda, "Learning systems of concepts with an infinite relational model," in Proc. AAAI, 2006.
[15] Z. Xu, V. Tresp, K. Yu, and H. P. Kriegel, "Infinite hidden relational models," in Proc. Twenty-Second Conference on Uncertainty in Artificial Intelligence, 2006.
[16] C. M. Bishop, Pattern Recognition and Machine Learning, New York: Springer, 2006.
[17] T. S. Ferguson, "A Bayesian analysis of some nonparametric problems," The Annals of Statistics, vol. 1, no. 2, pp. 209-230, 1973.
[18] J. Pitman, "Combinatorial stochastic processes," Technical Report 621, Dept. Statistics, UC Berkeley, 2002.
[19] C. Antoniak, "Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems," The Annals of Statistics, vol. 2, no. 6, pp. 1152-1174, 1974.
[20] J. Sethuraman, "A constructive definition of Dirichlet priors," Statistica Sinica, vol. 4, pp. 639-650, 1994.
[21] D. M. Blei and M. I. Jordan, "Variational inference for Dirichlet process mixtures," Bayesian Analysis, vol. 1, no. 1, pp. 121-143, 2006.
[22] K. Kurihara, M. Welling, and Y. W. Teh, "Collapsed variational Dirichlet process mixture models," in IJCAI, 2007, pp. 2796-2801.
[23] K. Kurihara, M. Welling, and N. A. Vlassis, "Accelerated variational Dirichlet process mixtures," in Advances in Neural Information Processing Systems, 2006, pp. 761-768.
[24] S. Jain and R. M. Neal, "A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model," Journal of Computational and Graphical Statistics, vol. 13, no. 1, pp. 158-182, 2004.
[25] H. Attias, "A variational Bayesian framework for graphical models," in Advances in Neural Information Processing Systems, 2000, pp. 209-215.
[26] R. K. Pan, K. Kaski, and S. Fortunato, "World citation and collaboration networks: uncovering the role of geography in science," Scientific Reports, vol. 2, 2012.
[27] V. Colizza, R. Pastor-Satorras, and A. Vespignani, "Reaction-diffusion processes and metapopulation models in heterogeneous networks," Nature Physics, vol. 3, no. 4, pp. 276-282, 2007.
[28] T. Opsahl and P. Panzarasa, "Clustering in weighted networks," Social Networks, vol. 31, no. 2, pp. 155-163, 2009.

Xiaojuan Jiang is a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences. Her research interests include machine learning, network modeling and probabilistic graphical models.

Wensheng Zhang is a professor and Ph.D. supervisor at the Institute of Automation, Chinese Academy of Sciences. His research interests include pattern recognition and machine learning, Big Data mining, probabilistic graphical models, deep neural networks, 3D numerical simulation and video image processing.