Beyond ERGMs. Scalable methods for the statistical modeling of networks. David Hunter. Department of Statistics Penn State University


1 Beyond ERGMs: Scalable methods for the statistical modeling of networks. David Hunter, Department of Statistics, Penn State University. Supported by ONR MURI Award Number N00014-08-1-1015. University of Texas at Austin, May 2013

2 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks

3 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks

4 A network model is a probability distribution (or family of distributions) on the set of all possible networks. Thus, we assign each possible network a probability, but we would like to avoid explicit enumeration. (Think of Occam's Razor.) ERGMs are one way to allow the assignment to depend (explicitly) on a relatively small number of parameters. ERGM = Exponential-family Random Graph Model

5 ERGM: Exponential-Family Random Graph Model. An ERGM (or p-star model) says
$P_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta, \mathcal{Y})}, \quad y \in \mathcal{Y}$,
where
Y is a random network;
$\mathcal{Y}$ is the set of all possible networks;
θ is a vector of parameters;
g(y) is a known vector of network statistics on y;
κ(θ, 𝒴) makes all the probabilities sum to 1.
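To make the normalizing constant concrete, here is a minimal brute-force sketch, assuming a tiny undirected network with edge and triangle statistics (the statistic choices and function names are illustrative, not from the talk):

```python
import numpy as np
from itertools import combinations, product

def ergm_probabilities(n, theta):
    """Enumerate every undirected n-node network, compute g(y) = (edges, triangles),
    and return exact ERGM probabilities exp{theta'g(y)} / kappa(theta).
    Feasible only for very small n, since there are 2^{n(n-1)/2} networks."""
    dyads = list(combinations(range(n), 2))
    weights, networks = [], []
    for bits in product([0, 1], repeat=len(dyads)):
        y = np.zeros((n, n), dtype=int)
        for (i, j), b in zip(dyads, bits):
            y[i, j] = y[j, i] = b
        edges = bits.count(1)
        triangles = int(np.trace(np.linalg.matrix_power(y, 3)) // 6)
        weights.append(np.exp(theta[0] * edges + theta[1] * triangles))
        networks.append(bits)
    kappa = sum(weights)
    return {net: w / kappa for net, w in zip(networks, weights)}

probs = ergm_probabilities(4, theta=np.array([-1.0, 0.5]))
print(sum(probs.values()))  # 1.0: kappa makes the probabilities sum to one
```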

6 The Gilbert-Erdős-Rényi model: The simplest ERGM. The function κ(θ, 𝒴) can be troublesome, but not always. Consider the following case (Gilbert, Ann. Math. Stat., 1959):

7 The Gilbert-Erdős-Rényi model: The simplest ERGM. Let p be some fixed constant between 0 and 1, and set
$P(Y = y) = p^{E(y)} (1 - p)^{\bar{E}(y)}$,
where E(y) is the number of edges in y and $\bar{E}(y)$ is the number of non-edges in y. Rewrite using θ = log p − log(1 − p):
$P(Y = y) = (1 - p)^{\text{const}} \left(\frac{p}{1-p}\right)^{\#\text{ of edges}} = \frac{\exp\{\theta \cdot \#\text{ of edges}\}}{\kappa(\theta)}$.

8 Dyadic independence ERGMs are generally tractable. Gilbert-Erdős-Rényi is a special case of dyadic independence:
$P_\theta(Y = y) = \prod_{i<j} P_\theta(D_{ij} = d_{ij})$
Dyad D_ij, directed case: the pair of potential edges i → j and j → i. Dyad D_ij, undirected case: the single potential edge between i and j. Dyadic independence models have drawbacks, but they facilitate estimation; facilitate simulation; and avoid the degeneracy issue (cf. Schweinberger). A simulation sketch follows below.
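Because the dyads are independent, simulation reduces to sampling each dyad from its own small distribution. A minimal sketch for a directed network (the particular dyad-state probabilities are made up for illustration):

```python
import numpy as np

def simulate_dyad_independent(n, dyad_probs, seed=0):
    """Simulate a directed network by sampling each unordered pair's dyad state
    independently. dyad_probs gives probabilities for the four states
    (null, i->j only, j->i only, mutual); within-dyad dependence such as
    reciprocity is allowed, across-dyad dependence is not."""
    rng = np.random.default_rng(seed)
    y = np.zeros((n, n), dtype=int)
    states = [(0, 0), (1, 0), (0, 1), (1, 1)]
    for i in range(n):
        for j in range(i + 1, n):
            y_ij, y_ji = states[rng.choice(4, p=dyad_probs)]
            y[i, j], y[j, i] = y_ij, y_ji
    return y

# e.g., a sparse network with a strong mutuality bonus
y = simulate_dyad_independent(200, dyad_probs=[0.90, 0.04, 0.04, 0.02])
print("edges:", y.sum(), " mutual dyads:", int((y * y.T).sum() // 2))
```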

9 Statistical inference is probability in reverse. The ERGM hypothesizes:
$P_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta, \mathcal{Y})}, \quad y \in \mathcal{Y}$
Probability runs from θ through the ERGM to data; statistics runs from observed data back to θ. Statistical goal: use observed data to select from the given ERGM class, i.e., to learn about θ. We might search for a best θ or a density p(θ | data).

10 The likelihood function is L(θ) = P_θ(Y = y_obs). The ERGM hypothesizes:
$P_\theta(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{\kappa(\theta, \mathcal{Y})}, \quad y \in \mathcal{Y}$
To choose a θ, we might search for a best θ by maximizing L(θ), or
$\ell(\theta) = \log L(\theta) = \theta^\top g(y_{\text{obs}}) - \log \kappa(\theta, \mathcal{Y})$.
Alternatively, a Bayesian approach tries to describe an entire distribution over θ values, the posterior: $p(\theta \mid Y = y_{\text{obs}}) \propto L(\theta)\, \pi(\theta)$.

11 Computing the likelihood is sometimes very difficult. The likelihood is L(θ) = P_θ(Y = y_obs), viewed as a function of θ. For an undirected, 34-node network there are $\binom{34}{2} = 561$ dyads, so computing ℓ(θ) directly may require summation over all $2^{561} \approx 7.5 \times 10^{168}$ possible networks.

12 The log-likelihood may be written as an expectation. Recall: $\ell(\theta) = \log L(\theta) = \theta^\top g(y_{\text{obs}}) - \log \kappa(\theta, \mathcal{Y})$. Suppose we fix θ⁰. A bit of algebra shows that
$\ell(\theta) - \ell(\theta^0) = (\theta - \theta^0)^\top g(y_{\text{obs}}) - \log E_{\theta^0}\!\left[\exp\{(\theta - \theta^0)^\top g(Y)\}\right]$
$\qquad\qquad\quad = (\theta - \theta^0)^\top g(y_{\text{obs}}) - \log E_{\theta^0}[\text{blah blah } Y \text{ blah}]$.
Thus, randomly sampling networks from $P_{\theta^0}$ allows approximation of ℓ(θ) − ℓ(θ⁰).
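A minimal sketch of that Monte Carlo approximation, assuming an edges-only model so that exact sampling from $P_{\theta^0}$ is easy (for richer ERGMs the samples would come from MCMC instead; the function name and toy inputs are illustrative):

```python
import numpy as np

def loglik_ratio_edges_model(theta, theta0, n_nodes, g_obs, n_samples=5000, seed=0):
    """Approximate l(theta) - l(theta0) for an undirected edges-only ERGM,
    where g(y) = number of edges and P_{theta0} makes edges i.i.d. Bernoulli."""
    rng = np.random.default_rng(seed)
    n_dyads = n_nodes * (n_nodes - 1) // 2
    p0 = 1.0 / (1.0 + np.exp(-theta0))                       # edge prob. under theta0
    g_samples = rng.binomial(n_dyads, p0, size=n_samples)    # sampled edge counts
    # log E_{theta0}[exp{(theta - theta0) g(Y)}], via a stable log-mean-exp
    a = (theta - theta0) * g_samples
    log_mean_exp = a.max() + np.log(np.mean(np.exp(a - a.max())))
    return (theta - theta0) * g_obs - log_mean_exp

print(loglik_ratio_edges_model(theta=-2.0, theta0=-1.5, n_nodes=34, g_obs=78))
```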

13 Example Network: High School Friendship Data. [Figure: friendship network for one school.] An edge indicates a mutual friendship. Colored labels give grade level. Circles = female, squares = male, triangles = unknown. N.B.: Missing data ignored here, though this could be altered.

14 Fitting an ERGM to the high school dataset. ERGM parameter estimates (with standard errors) from Hunter et al. (2008) include terms for: edges; GWESP, GWD, and GWDSP (geometrically weighted edgewise shared partner, degree, and dyadwise shared partner statistics); Node Factor terms for grade, race, and sex; Absolute Difference terms for grade; Differential Homophily terms for grade and race; and a Uniform Homophily term for sex. [Table of coefficient estimates and standard errors.] NF stands for Node Factor; AD stands for Absolute Difference; DH stands for Differential Homophily; UH stands for Uniform Homophily. Estimates are flagged as significant at the .05, .01, or .001 level.

15 But what about Large Networks? A single school's worth of nodes does not really qualify as large in this context, and the estimation techniques used previously do not scale well.

16 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks

17 Idea: Use counting process theory to model networks. [Figure: snapshots of an evolving network at successive event times.] Goal: Model a dynamically evolving network using counting processes. Methods should be applicable to large network datasets (tens or hundreds of thousands of nodes). Two modeling frameworks (terminology of Butts):
Egocentric: The counting process N_i(t) = cumulative number of events involving the ith node by time t.
Relational: The counting process N_ij(t) = cumulative number of events involving the (i, j)th node pair by time t.
NB: Events need not be edge additions.

18 Counting processes may be considered multivariate. Combine the N_i(t) to give a multivariate counting process N(t) = (N_1(t), ..., N_n(t)). Genuinely multivariate; no assumption about the independence of the N_i(t). [Figure: network snapshots over time and the corresponding counting paths N_1(t), ..., N_4(t).]

19 Citation networks may be modeled as egocentric. Data: theoretical physics articles on arXiv, comprising tens of thousands of articles, hundreds of thousands of citations, and a large number of unique event times. At arrival, a paper cites others that are already in the network. Main dynamic development: the number of citations received. N_i(t): number of citations to paper i by time t. At-risk indicator R_i(t): equal to I{t_i^arr < t}.

20 Twitter behavior provides another egocentric example. Data: vaccination-related tweets collected from Aug. 2009 to Jan. 2010, with more than 4 million follower edges among the users. Of particular interest: H1N1 vaccination sentiment. Some tweets express − or + sentiments regarding H1N1 vaccination; N_i^−(t) and N_i^+(t) are the numbers of such tweets by time t. Questions of interest: Which predictors (e.g., past behavior of self / followers / followees, position in the directed following network) predict the propensity to tweet − or +? Is tweeting behavior about H1N1 vaccination contagious?

21 A multivariate counting process is a submartingale. Each N_i(t) is nondecreasing in time, so N(t) may be considered a submartingale; i.e., it satisfies
$E[N(t) \mid \text{past up to time } s] \ge N(s)$ for all t > s.
[Figure: sample paths of N_1(t), ..., N_4(t).]

22 The so-called Doob-Meyer decomposition uniquely decomposes any submartingale:
$N(t) = \int_0^t \lambda(s)\, ds + M(t)$
λ(t) is the signal at time t, called the intensity function; M(t) is the noise, a continuous-time martingale. We will model each λ_i(t) or λ_ij(t).

23 We use standard models for the intensity processes. In the egocentric case, consider the Cox or Aalen model for the node i process:
Cox proportional hazards model, fixed coefficients:
$\lambda_i(t \mid H_t) = R_i(t)\, \alpha_0(t) \exp\!\big(\beta^\top s_i(t)\big)$
Aalen additive model, time-varying coefficients:
$\lambda_i(t \mid H_t) = R_i(t)\big(\beta_0(t) + \beta(t)^\top s_i(t)\big)$
where
R_i(t) = I(t > t_i^arr) is the at-risk indicator;
H_t is the past of the network up to but not including time t;
α_0(t) or β_0(t) is the baseline hazard function;
β is the p-vector of coefficients to estimate;
s_i(t) = (s_i1(t), ..., s_ip(t)) is a vector of statistics for node i.

24 The relational case is similar (cf. Perry and Wolfe).
Cox proportional hazards model, fixed coefficients:
$\lambda_{ij}(t \mid H_t) = R_{ij}(t)\, \alpha_0(t) \exp\!\big(\beta^\top s(i, j, t)\big)$
Aalen additive model, time-varying coefficients:
$\lambda_{ij}(t \mid H_t) = R_{ij}(t)\big(\beta_0(t) + \beta(t)^\top s(i, j, t)\big)$
where
R_ij(t) = I(max{t_i^arr, t_j^arr} < t < t_{e_ij}) is the at-risk indicator;
H_t is the past of the network up to but not including time t;
α_0(t) or β_0(t) is the baseline hazard function;
β or β(t) is the vector of coefficients to estimate;
s(i, j, t) is a p-vector of statistics for pair (i, j).

25 For large networks, maximizing the partial likelihood in the Cox model requires some computing tricks. Recall: the intensity process for node i is $\lambda_i(t \mid H_t) = R_i(t)\, \alpha_0(t) \exp(\beta^\top s_i(t))$. Treat α_0 as a nuisance parameter and take a partial likelihood approach: maximize
$L(\beta) = \prod_{e=1}^{m} \frac{\exp\big(\beta^\top s_{i_e}(t_e)\big)}{\sum_{i=1}^{n} R_i(t_e) \exp\big(\beta^\top s_i(t_e)\big)} = \prod_{e=1}^{m} \frac{\exp\big(\beta^\top s_{i_e}(t_e)\big)}{\kappa(t_e)}$.
Computational trick: write $\kappa(t_e) = \kappa(t_{e-1}) + \Delta\kappa(t_e)$ and update the denominator incrementally rather than recomputing it from scratch at every event.
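A minimal sketch of that incremental update, assuming only a handful of nodes change their statistics between events (the data structure and names are illustrative, not the implementation behind these slides):

```python
import numpy as np

class IncrementalCoxDenominator:
    """Maintain kappa(t_e) = sum_i R_i(t_e) * exp(beta' s_i(t_e)) across events,
    adjusting only the terms that actually change instead of re-summing all n."""

    def __init__(self, beta):
        self.beta = np.asarray(beta, dtype=float)
        self.term = {}        # node -> current exp(beta' s_i), for at-risk nodes
        self.kappa = 0.0

    def update_node(self, i, s_i=None):
        """s_i = None removes node i from the risk set; otherwise set its statistics."""
        old = self.term.pop(i, 0.0)
        new = 0.0 if s_i is None else float(np.exp(self.beta @ np.asarray(s_i, float)))
        if s_i is not None:
            self.term[i] = new
        self.kappa += new - old   # Delta-kappa contribution of node i

denom = IncrementalCoxDenominator(beta=[0.8, -0.2])
denom.update_node(0, [1.0, 2.0])      # node enters the risk set
denom.update_node(1, [0.0, 1.0])
denom.update_node(0, [2.0, 2.0])      # only node 0's statistics changed at this event
print(denom.kappa)
```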

26 Fitting the Aalen model uses weighted least squares. Recall: the intensity process for node i is $\lambda_i(t \mid H_t) = R_i(t)\big(\beta_0(t) + \beta(t)^\top s_i(t)\big)$. We do inference not for the β_k themselves but rather for their time-integrals $B_k(t) = \int_0^t \beta_k(s)\, ds$. Then (basically weighted least squares)
$\hat{B}(t) = \sum_{t_e \le t} J(t_e)\, \big[W(t_e)^\top W(t_e)\big]^{-1} W(t_e)^\top\, \Delta N(t_e)$,
where W(t) is an N(N−1) × p matrix with (i, j)th row $R_{ij}(t)\, s(i, j, t)^\top$, and J(t) is the indicator that W(t) has full column rank.
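A minimal numerical sketch of one term of that sum, assuming a small design matrix W(t_e) and the increment ΔN(t_e) at a single event time (all inputs are made up for illustration):

```python
import numpy as np

def aalen_increment(W, dN):
    """One jump of the Aalen estimator: [W'W]^{-1} W' dN at a single event time,
    returned as zeros when W lacks full column rank (the J(t) indicator)."""
    W, dN = np.asarray(W, float), np.asarray(dN, float)
    if np.linalg.matrix_rank(W) < W.shape[1]:
        return np.zeros(W.shape[1])
    return np.linalg.solve(W.T @ W, W.T @ dN)

# toy event: 4 at-risk pairs, 2 covariates; the second pair experienced the event
W = np.array([[1.0, 0.5], [1.0, 2.0], [0.0, 0.0], [1.0, 1.5]])  # rows R_ij(t) s(i,j,t)'
dN = np.array([0, 1, 0, 0])
print(aalen_increment(W, dN))   # accumulate these jumps over events to get B_hat(t)
```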

27 Example statistics: Preferential Attachment. For each cited paper j already in the network...
First-order PA: $s_{j1}(t) = \sum_{i=1}^{N} y_{ij}(t-)$. Rich-get-richer effect.
Second-order PA: $s_{j2}(t) = \sum_{i} \sum_{k} y_{ki}(t-)\, y_{ij}(t-)$. Effect due to being cited by well-cited papers.
Statistics in red are time-dependent; others are fixed once j joins the network. NB: y(t−) is the network just prior to time t.
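Both statistics are simple functions of the citation adjacency matrix, so they can be computed for all papers at once. A small sketch, assuming y[i, j] = 1 means paper i cites paper j:

```python
import numpy as np

def preferential_attachment_stats(y):
    """First- and second-order preferential attachment statistics for every paper j,
    given a snapshot y of the citation matrix with y[i, j] = 1 if i cites j."""
    in_deg = y.sum(axis=0)          # s_j1 = number of citations received by j
    second = in_deg @ y             # s_j2 = sum_i indeg(i) * y[i, j]
    return in_deg, second

y = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
s1, s2 = preferential_attachment_stats(y)
print(s1, s2)   # paper 2 is cited twice, once by a paper that is itself cited
```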

28 Example statistics: Recency PA Statistic. For each cited paper j already in the network...
Recency-based first-order PA (with a fixed window of T_w days): $s_{j3}(t) = \sum_{i=1}^{N} y_{ij}(t-)\, I(t - t_i^{\text{arr}} < T_w)$. Temporary elevation of citation intensity after recent citations.
Statistics in red are time-dependent; others are fixed once j joins the network. NB: y(t−) is the network just prior to time t.

29 Example statistics: Triangle Statistics. For each cited paper j already in the network...
Seller statistic: $s_{j4}(t) = \sum_{i} \sum_{k} y_{ki}(t-)\, y_{ij}(t)\, y_{kj}(t-)$.
Broker statistic: $s_{j5}(t) = \sum_{i} \sum_{k} y_{kj}(t)\, y_{ji}(t-)\, y_{ki}(t-)$.
Buyer statistic: $s_{j6}(t) = \sum_{i} \sum_{k} y_{jk}(t)\, y_{ki}(t)\, y_{ji}(t-)$.
[Diagram: triangle with Seller A, Broker B, Buyer C.]
Statistics in red are time-dependent; others are fixed once j joins the network. NB: y(t−) is the network just prior to time t.
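Ignoring the distinction between y(t) and y(t−) and working from a single snapshot, the three triangle statistics are also matrix expressions; a small sketch (the single-snapshot simplification is an assumption made here for illustration):

```python
import numpy as np

def triangle_stats(y):
    """Seller, broker, and buyer statistics for every paper j, computed from a
    single snapshot y of the citation matrix (y[i, j] = 1 if i cites j)."""
    yy = y @ y
    seller = (y * yy).sum(axis=0)              # s_j4 = sum_{i,k} y_ki y_ij y_kj
    broker = ((y @ y.T) * y.T).sum(axis=1)     # s_j5 = sum_{i,k} y_kj y_ji y_ki
    buyer = (yy * y).sum(axis=1)               # s_j6 = sum_{i,k} y_jk y_ki y_ji
    return seller, broker, buyer

# usage: seller, broker, buyer = triangle_stats(y) for a 0/1 citation matrix y
```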

30 Example statistics: Out-Path Statistics. For each cited paper j already in the network...
First-order out-degree (OD): $s_{j7}(t) = \sum_{i=1}^{N} y_{ji}(t-)$.
Second-order OD: $s_{j8}(t) = \sum_{i} \sum_{k} y_{jk}(t-)\, y_{ki}(t-)$.
Statistics in red are time-dependent; others are fixed once j joins the network. NB: y(t−) is the network just prior to time t.

31 Example statistics: Topic Modeling Statistics. Additional statistics, using abstract text when available, as follows: a latent Dirichlet allocation (LDA) model (Blei et al., 2003) is learned on the training set, giving each paper i a topic distribution θ_i. [Figure: LDA plate diagram from the Wikipedia entry for Latent Dirichlet Allocation; N = words per document, M = documents.] We construct a vector of similarity statistics
$s_j^{\text{LDA}}(t_i^{\text{arr}}) = \theta_i \circ \theta_j$,
where ∘ denotes the element-wise product of two vectors.

32 Coefficient Estimates for the LDA + PPTR Model.
Statistics with positive estimated coefficients: s_1 (PA), s_2 (2nd PA), s_3 (recency PA), s_7 (1st OD), s_8 (2nd OD).
Statistics with negative estimated coefficients: s_4 (Seller), s_5 (Broker), s_6 (Buyer).
All coefficient estimates are statistically significant. [Table of coefficient estimates; diagrams comparing citation configurations.]
Diverse seller effect: D more likely cited than A. Diverse buyer effect: E more likely cited than C.

33 Twitter and H1N1 Vaccination Sentiments.
[Figure from Salathé and Khandelwal: (A) total numbers of negative (red), positive (green), and neutral (blue) tweets relating to influenza A(H1N1) vaccination during the fall wave of the pandemic; (B) daily and moving-average sentiment score; (C) correlation between estimated vaccination rates and sentiment score by HHS region and state.]
Salathé and Khandelwal collected over 4 million tweets from Twitter users. Here, the counting process of interest is not the formation of ties; it is the expression of H1N1 vaccination sentiments:
N_i^+(t) = # of positive tweets by i before time t;
N_i^−(t) = # of negative tweets by i before time t.

34 Cox Model Coefficients for the Twitter Dataset. Four friend-sentiment statistics are used to model both responses, the intensity of positive tweeting and the intensity of negative tweeting:
f_1^+: # friends who tweet +;
f_2^+: interaction of i's own + tweets with + friends;
f_1^−: # friends who tweet −;
f_2^−: interaction of i's own − tweets with − friends.
[Table of estimated coefficients; several entries are significant at p < 10⁻³.]
Here, for example,
$f_1^+(i, t) = \sum_{j \in F(i,t)} \frac{\#\text{ of + tweets by } j \text{ up to time } t}{\text{total } \#\text{ of tweets by } j \text{ up to time } t}$,
where F(i, t) is the set of i's friends at time t; f_1^− is defined analogously for negative tweets, and the f_2 statistics are formed as interactions with i's own tweet history.

35 Sensitivity Analysis for the Twitter Dataset. The automatic sentiment-classification algorithm for tweets can err. Randomly re-classify all tweets using a small test dataset comparing human classification to automatic classification; repeat many times.
[Figure: distributions of re-estimated coefficients for positive-friend, positive-tweet, negative-friend, and negative-tweet statistics, for the responses λ_i^−(negative tweeting) and λ_i^+(positive tweeting), together with the percentages of confidence intervals that remain entirely positive or entirely negative.]

36 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks

37 Epinions.com: Example of a large network dataset. "Unbiased Reviews by Real People." Members of Epinions.com can decide whether to trust each other; the Web of Trust is combined with review ratings to determine which reviews are shown to the user. Dataset of Massa and Avesani:
n = 131,828 nodes;
n(n − 1) ≈ 17.4 billion ordered observations;
roughly 841,000 of these are nonzero (±).

38 The Goal: Cluster the 131,828 users. Basis for clustering: patterns of trusts and distrusts in the network. If possible: understand the features of the clusters by examining parameter estimates. Notation: throughout, we let y_ij be the rating of j by i and y = (y_ij). We would like to restrict attention to dyadic independence ERGMs in order to model the observed (y_ij) data.

39 To model dependence, add a K-component mixture structure. Suppose the nodes have latent (unobserved) colors Z_1, ..., Z_n. [Diagram: the colored network (reality) versus the uncolored network (what we observe).] Simplifying assumption:
$P(Y = y \mid Z) = \prod_{i<j} P(D_{ij} = d_{ij} \mid Z_i, Z_j)$,
where D_ij is the state in Y of the (i, j)th pair.

40 Consider two examples of conditional dyadic independence for the Epinions dataset.
The full model of Nowicki and Snijders (2001):
$P_\theta(D_{ij} = d \mid Z_i = k, Z_j = l) = \theta_{d;kl}$.
A more parsimonious model:
$P_\theta(D_{ij} = d_{ij} \mid Z_i = k, Z_j = l) \propto \exp\{\theta^-(y_{ij}^- + y_{ji}^-) + \theta_k y_{ji} + \theta_l y_{ij} + \theta^{--} y_{ij}^- y_{ji}^- + \theta^{++} y_{ij}^+ y_{ji}^+\}$,
where $y_{ij}^- = I\{Y_{ij} = -\}$ and $y_{ij}^+ = I\{Y_{ij} = +\}$. With K = 5 components, the full model has far more parameters than the parsimonious one.

41 There is a problem with the simplifying assumption. Our conditional independence model says
$Z_i \overset{\text{iid}}{\sim} \text{Multinomial}(1; \gamma_1, \ldots, \gamma_K)$;
$P_\theta(Y = y \mid Z) = \prod_{i<j} P_\theta(D_{ij} = d_{ij} \mid Z_i, Z_j)$.
Not so simple when we do not observe Z. The full (unconditional) loglikelihood is rather complicated:
$\ell(\gamma, \theta) = \log \sum_{z} P_\gamma(Z = z)\, P_\theta(Y = y \mid Z = z)$.

42 Approximate maximum likelihood estimation uses a variational EM algorithm. For MLE, the goal is to maximize the loglikelihood ℓ(γ, θ). Basic idea: establish a lower bound
$J(\gamma, \theta, \alpha) \le \ell(\gamma, \theta)$
after augmenting the parameters by adding α, then create an EM-like algorithm guaranteed to increase J(γ, θ, α) at each iteration. If we maximize the lower bound, then we are hoping that the inequality will be tight enough to put us close to a maximum of ℓ(γ, θ). We adapt the variational EM idea of Daudin, Picard, and Robin (2008).

43 We may derive a lower bound by simple algebra. Clever variational idea: augment the parameter set, letting α_ik play the role of P(Z_i = k) for all 1 ≤ i ≤ n and 1 ≤ k ≤ K, and let $A_\alpha(Z) = \prod_i \text{Mult}(z_i; \alpha_i)$ denote the joint distribution of Z. Direct calculation gives
$J(\gamma, \theta, \alpha) \stackrel{\text{def}}{=} \ell(\gamma, \theta) - \text{KL}\{A_\alpha(Z), P_{\gamma,\theta}(Z \mid Y)\} = \cdots = E_\alpha[\log P_{\gamma,\theta}(Y, Z)] + H[A_\alpha(Z)]$.
Thus, an EM-like algorithm consists of alternately:
maximizing J(γ, θ, α) with respect to α ("E-step");
maximizing $E_\alpha[\log P_{\gamma,\theta}(Y, Z)]$ with respect to γ, θ ("M-step").
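For concreteness, here is a minimal variational EM sketch for a plain binary, symmetric stochastic block model (a simplification of the signed dyad model above; the update scheme and all names are illustrative, not the implementation behind these slides):

```python
import numpy as np

def variational_em_sbm(Y, K, n_iter=50, seed=0):
    """Minimal variational EM for a binary, symmetric stochastic block model:
    alpha[i, k] approximates P(Z_i = k | Y); gamma and pi are model parameters."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    alpha = rng.dirichlet(np.ones(K), size=n)
    gamma = np.full(K, 1.0 / K)
    off_diag = 1.0 - np.eye(n)
    for _ in range(n_iter):
        # M-step: update mixing proportions and block edge probabilities
        gamma = alpha.mean(axis=0)
        pi = (alpha.T @ Y @ alpha) / np.maximum(alpha.T @ off_diag @ alpha, 1e-10)
        pi = np.clip(pi, 1e-6, 1 - 1e-6)
        # E-step: fixed-point update of each node's membership probabilities
        log_pi, log_1mpi = np.log(pi), np.log(1.0 - pi)
        for i in range(n):
            others = np.delete(np.arange(n), i)
            yi = Y[i, others].astype(float)[:, None]       # y_ij for j != i
            # ll[k] = log gamma_k + sum_{j != i} sum_l alpha_jl
            #         [ y_ij log pi_kl + (1 - y_ij) log(1 - pi_kl) ]
            ll = np.log(gamma) + (yi * (alpha[others] @ log_pi.T)
                                  + (1 - yi) * (alpha[others] @ log_1mpi.T)).sum(axis=0)
            ll -= ll.max()
            alpha[i] = np.exp(ll) / np.exp(ll).sum()
    return alpha, gamma, pi

# usage: alpha, gamma, pi = variational_em_sbm(Y, K=5) for a 0/1 symmetric adjacency Y
```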

44 The variational E-step may be modified using a (non-variational) MM algorithm. Idea: use a generalized variational E-step in which J(γ, θ, α) is increased but not necessarily maximized. To this end, we create a surrogate function Q(α; γ^(t), θ^(t), α^(t)) of α, where t is the iteration counter. The surrogate function is a minorizer of J(γ, θ, α): it has the property that maximizing or increasing its value guarantees an increase in the value of J(γ, θ, α). [Figure: the red curve minorizes f(x) at the current iterate.]

45 Construction of the minorizer of J(γ, θ, α) uses standard MM algorithm methods.
$J(\gamma, \theta, \alpha) = \sum_{i<j} \sum_{k=1}^{K} \sum_{l=1}^{K} \alpha_{ik}\,\alpha_{jl} \log \pi_{d_{ij};kl}(\theta) + \sum_{i=1}^{n} \sum_{k=1}^{K} \alpha_{ik}\big(\log \gamma_k - \log \alpha_{ik}\big) + C.$
We may define a minorizing function as follows:
$Q(\alpha, \gamma, \theta, \alpha^{(t)}) = \sum_{i<j} \sum_{k} \sum_{l} \frac{1}{2}\left(\frac{\alpha_{ik}^2\, \alpha_{jl}^{(t)}}{\alpha_{ik}^{(t)}} + \frac{\alpha_{jl}^2\, \alpha_{ik}^{(t)}}{\alpha_{jl}^{(t)}}\right) \log \pi_{d_{ij};kl}(\theta) + \sum_{i=1}^{n} \sum_{k=1}^{K} \alpha_{ik}\left(\log \gamma_k - \log \alpha_{ik}^{(t)} - \frac{\alpha_{ik}}{\alpha_{ik}^{(t)}} + 1\right).$
It can be maximized (in α) using quadratic programming.
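The minorization rests on two elementary inequalities applied termwise (a sketch of the reasoning, under the assumptions, not spelled out on the slide, that every $\log \pi_{d_{ij};kl}(\theta) \le 0$ and $\alpha^{(t)} > 0$). By the arithmetic-geometric mean inequality,
$\alpha_{ik}\,\alpha_{jl} \le \frac{1}{2}\left(\frac{\alpha_{ik}^2\, \alpha_{jl}^{(t)}}{\alpha_{ik}^{(t)}} + \frac{\alpha_{jl}^2\, \alpha_{ik}^{(t)}}{\alpha_{jl}^{(t)}}\right)$,
with equality at α = α^{(t)}; multiplying by log π ≤ 0 reverses the inequality, so the quadratic expression minorizes each product term. For the entropy terms, concavity of the logarithm gives $\log \alpha_{ik} \le \log \alpha_{ik}^{(t)} + \alpha_{ik}/\alpha_{ik}^{(t)} - 1$, hence $-\alpha_{ik} \log \alpha_{ik} \ge -\alpha_{ik}\big(\log \alpha_{ik}^{(t)} + \alpha_{ik}/\alpha_{ik}^{(t)} - 1\big)$, again with equality at α = α^{(t)}. Both surrogate pieces are quadratic in α, which is why the generalized E-step reduces to quadratic programming.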

46 The parsimonious model for the Epinions dataset:
$P_\theta(D_{ij} = d_{ij} \mid Z_i = k, Z_j = l) \propto \exp\{\theta^-(y_{ij}^- + y_{ji}^-) + \theta_k y_{ji} + \theta_l y_{ij} + \theta^{--} y_{ij}^- y_{ji}^- + \theta^{++} y_{ij}^+ y_{ji}^+\}$,
where $y_{ij}^- = I\{Y_{ij} = -\}$ and $y_{ij}^+ = I\{Y_{ij} = +\}$.
NB: The term $\theta^+(y_{ij}^+ + y_{ji}^+)$ is omitted to avoid perfect collinearity.
θ^−: overall tendency toward distrust;
θ_k: category-specific trustedness;
θ^{−−}: lex talionis tendency (an eye for an eye);
θ^{++}: quid pro quo tendency (one good turn...).
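To see what these parameters do, one can tabulate the nine signed dyad states and normalize. A small sketch (the ±1/0 numeric encoding of y_ij in the θ_k and θ_l terms is an assumption made here for illustration):

```python
import numpy as np
from itertools import product

def dyad_probs(theta_minus, theta_mm, theta_pp, trust_k, trust_l):
    """Probabilities of the 9 signed dyad states (y_ij, y_ji) in {-1, 0, +1}^2
    under the parsimonious conditional model sketched above."""
    states = list(product([-1, 0, 1], repeat=2))
    weights = []
    for y_ij, y_ji in states:
        ym_ij, ym_ji = int(y_ij == -1), int(y_ji == -1)
        yp_ij, yp_ji = int(y_ij == 1), int(y_ji == 1)
        lin = (theta_minus * (ym_ij + ym_ji)       # overall tendency toward distrust
               + trust_k * y_ji + trust_l * y_ij   # trustedness of i's and j's clusters
               + theta_mm * ym_ij * ym_ji          # lex talionis (mutual distrust)
               + theta_pp * yp_ij * yp_ji)         # quid pro quo (mutual trust)
        weights.append(np.exp(lin))
    w = np.array(weights)
    return dict(zip(states, w / w.sum()))

p = dyad_probs(theta_minus=-2.0, theta_mm=1.0, theta_pp=1.5, trust_k=0.5, trust_l=-0.3)
print(p[(1, 1)])   # probability of a mutually trusting dyad for this (k, l) pair
```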

47 Parameter estimates themselves are of interest.
Parameters reported (estimate with 95% confidence interval): negative edges (θ^−); positive edges (θ^+), omitted as the reference; negative reciprocity (θ^{−−}); positive reciprocity (θ^{++}); and cluster 1-5 trustworthiness (θ_1, ..., θ_5).
[Table of estimates and confidence intervals.]
95% confidence intervals are based on a parametric bootstrap using simulated networks. NB: There are some strange aspects of the bootstrap we cannot explain yet.

48 Multiple starting points converge to the same solution. Trace plots from randomly selected starting parameter values show the loglikelihood values and the trustedness parameters converging to the same solution across runs. [Figure: trace plots over iterations.] The full model's results look nothing like this.

49 We may use average ratings of reviews by other users as a way to ground-truth the clustering solutions. Articles are categorized by their author's highest-probability component. [Figure: average article rating (vertical axis) versus cluster, by cluster size, for the parsimonious model and for the full model.]

50 ERGMs may not be a great way to model large networks with dependencies, but... The ERGM framework is useful because it forces researchers to think about which network statistics are important. Alternative models can exploit similar ways of thinking about networks, or even exploit ERGMs themselves.

51 Cited References: ERGMs
Erdős, P. and Rényi, A. On Random Graphs I. Publicationes Mathematicae (Debrecen), 1959.
Gilbert, E. N. Random Graphs. Annals of Mathematical Statistics, 1959.
Hunter, D. R., Goodreau, S. M., and Handcock, M. S. Goodness of Fit of Social Network Models. Journal of the American Statistical Association, 2008.

52 Cited References: Counting Processes for Networks
Brandes, U., Lerner, J., and Snijders, T. A. B. Networks Evolving Step by Step: Statistical Analysis of Dyadic Event Data. In Advances in Social Network Analysis and Mining, IEEE, 2009.
Butts, C. T. A Relational Event Framework for Social Action. Sociological Methodology, 38(1):155-200, 2008.
Cox, D. R. Regression Models and Life-Tables. Journal of the Royal Statistical Society, Series B, 34:187-220, 1972.
Perry, P. O. and Wolfe, P. J. Point Process Modeling for Directed Interaction Networks. Journal of the Royal Statistical Society, Series B, to appear.
Salathé, M., Vu, D. Q., Khandelwal, S., and Hunter, D. R. The Dynamics of Health Behavior Sentiments on a Large Social Network. EPJ Data Science, 2:4, 2013.
Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. Dynamic Egocentric Models for Citation Networks. Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 2011.
Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. Continuous-Time Regression Models for Longitudinal Networks. Advances in Neural Information Processing Systems 24 (NIPS 2011), to appear.

53 Cited References: Variational EM for Large Networks
Daudin, J.-J., Picard, F., and Robin, S. A Mixture Model for Random Graphs. Statistics and Computing, 2008.
Nowicki, K. and Snijders, T. A. B. Estimation and Prediction for Stochastic Blockstructures. Journal of the American Statistical Association, 2001.
Vu, D. Q., Hunter, D. R., and Schweinberger, M. Model-Based Clustering of Large Networks. Annals of Applied Statistics, to appear.

54 A FEW EXTRA SLIDES

55 Maximum Pseudolikelihood: Intuition. What if we assume that there is no dependence (or very weak dependence) among the Y_ij? In other words, what if we approximate the marginal $P(Y_{ij} = 1)$ by the conditional $P(Y_{ij} = 1 \mid Y_{ij}^c = y_{ij}^c)$? Then the Y_ij are independent with
$\log \frac{P(Y_{ij} = 1)}{P(Y_{ij} = 0)} = \theta^\top \delta(y_{\text{obs}})_{ij}$,
so we obtain an estimate of θ using straightforward logistic regression. Result: the maximum pseudolikelihood estimate. For independence models, MPLE = MLE!
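A minimal sketch of MPLE via logistic regression on change statistics, for an edges-plus-triangles ERGM on an undirected network (the statistic choice and function names are illustrative; this is not the statnet/ergm implementation):

```python
import numpy as np
from itertools import combinations

def mple_edges_triangles(y, n_iter=25):
    """Maximum pseudolikelihood estimate for an ERGM with edge and triangle
    statistics, via logistic regression on change statistics delta(y)_ij.
    y: symmetric 0/1 adjacency matrix with zero diagonal."""
    n = y.shape[0]
    rows, resp = [], []
    for i, j in combinations(range(n), 2):
        d_edge = 1.0                         # toggling (i, j) adds one edge...
        d_tri = float(np.sum(y[i] * y[j]))   # ...and one triangle per common neighbour
        rows.append([d_edge, d_tri])
        resp.append(y[i, j])
    X, z = np.array(rows), np.array(resp, dtype=float)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):                  # Newton-Raphson for the logistic likelihood
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (z - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        theta += np.linalg.solve(hess + 1e-8 * np.eye(len(theta)), grad)
    return theta                             # (theta_edges, theta_triangle)

# usage: theta_hat = mple_edges_triangles(y) for a symmetric 0/1 adjacency matrix y
```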

56 MLE vs. MPLE. "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." — John W. Tukey
MLE (maximum likelihood estimation): a well-established method, but very hard here because the normalizing constant κ(θ) is difficult to evaluate, so we approximate it instead.
MPLE (maximum pseudolikelihood estimation): easy to do using logistic regression, but based on an independence assumption that is often not justified. Several authors, notably van Duijn et al., argue forcefully against the use of MPLE (except when MLE = MPLE!).

57 Model construction and testing. Dataset: arXiv hep-th, high-energy physics theory articles, Jan. 1993 - Apr. 2003. Timestamps are in continuous time; abstract text is included.
1. Statistics-building phase: construct the network history and build up the network statistics.
2. Training phase: construct the partial likelihood and estimate the model coefficients.
3. Test phase: evaluate the predictive capability of the learned model.
Statistics-building is ongoing even through the training and test phases. The phases are split along citation event times.

58 Recall Performance. Recall: proportion of true citations among the K largest predicted likelihoods.
[Figure: recall versus cut point K for the models PA, PPT, PPTR, LDA, and LDA+PPTR.]
PA: preferential attachment only (s_1); PPT: s_1, ..., s_8 except s_3; PPTR: s_1, ..., s_8; LDA: LDA statistics only.
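A small sketch of how that recall-at-K curve can be computed, assuming that for each test citation event we have the model's intensity for every at-risk paper and the identity of the paper actually cited (names and toy values here are illustrative):

```python
import numpy as np

def recall_at_k(event_scores, true_targets, k):
    """Fraction of test events whose truly cited paper is among the k papers
    with the largest predicted intensities at that event time."""
    hits = 0
    for scores, target in zip(event_scores, true_targets):
        top_k = np.argsort(scores)[::-1][:k]
        hits += int(target in top_k)
    return hits / len(true_targets)

# toy test set: 3 events, 5 candidate papers each
event_scores = [np.array([0.1, 2.0, 0.3, 0.2, 0.4]),
                np.array([1.5, 0.1, 0.2, 0.9, 0.3]),
                np.array([0.2, 0.2, 0.2, 3.0, 0.1])]
true_targets = [1, 3, 3]
print([recall_at_k(event_scores, true_targets, k) for k in (1, 2, 3)])
```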

59 Social networks may be modeled as relational. Irvine: an online social network of UC Irvine students, with directed contact edges among the users. Links are non-recurrent; i.e., N_ij(t) is either 0 or 1. At-risk indicator: $R_{ij}(t) = I\{\max(t_i^{\text{arr}}, t_j^{\text{arr}}) < t < t_{e_{ij}}\}$. [Table: example rows of (contacter, contactee, date) contact records.]

60 Relational Example: Modeling a network of contacts. Irvine: online social network of students with directed contact edges. Some of the statistics in the model:
Sender out-degree: $s(i, j, t) = \sum_{h \in V,\, h \ne i} N_{ih}(t-)$
Reciprocity: $s_5(i, j, t) = N_{ji}(t-)$
Transitivity: $s_6(i, j, t) = \sum_{h \in V,\, h \ne i, j} N_{ih}(t-)\, N_{hj}(t-)$
Shared contacters: $s(i, j, t) = \sum_{h \in V,\, h \ne i, j} N_{hi}(t-)\, N_{hj}(t-)$

61 Aalen model estimates for the Irvine data set. The Aalen coefficients suggest two distinct phases of network evolution, consistent with an independent analysis (Panzarasa et al.). On prediction experiments, the Aalen and Cox models outperform logistic regression.
[Figures: time-varying Aalen coefficients for (a) sender out-degree, (b) reciprocity, (c) transitivity, and (d) shared contacters; recall versus cut point K for adaptive logistic regression, adaptive Cox, and Aalen models.]
