Beyond ERGMs: Scalable methods for the statistical modeling of networks. David Hunter, Department of Statistics, Penn State University
1 Beyond ERGMs: Scalable methods for the statistical modeling of networks. David Hunter, Department of Statistics, Penn State University. Supported by ONR MURI Award Number N00014-08-1-1015. University of Texas at Austin, May 2013
2 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks
3 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks
4 A network model is a probability distribution (or family of distributions) on the set of all possible networks. Thus, we assign each possible network a probability. But we'd like to avoid explicit enumeration. (Think of Occam's Razor.) ERGMs are one way to allow the assignment to depend (explicitly) on a relatively small number of parameters. ERGM = Exponential-family Random Graph Model
5 ERGM: Exponential-Family Random Graph Model An ERGM (or p-star model) says P_theta(Y = y) = exp{theta' g(y)} / kappa(theta, Y), for y in Y, where Y is a random network taking values in the set Y of all possible networks; theta is a vector of parameters; g(y) is a known vector of network statistics on y; and kappa(theta, Y) makes all the probabilities sum to 1.
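As a concrete illustration of the definition above (my own sketch, not from the talk), here is a minimal computation of P_theta(Y = y) for a tiny graph, evaluating kappa(theta) by brute-force enumeration; the names `ergm_prob` and `edge_count` are hypothetical:

```python
import itertools
import math

def ergm_prob(theta, g, n, y_obs):
    """P_theta(Y = y_obs) for an ERGM with statistic vector g, computing
    kappa(theta) by brute-force enumeration of all graphs on n nodes."""
    pairs = list(itertools.combinations(range(n), 2))
    kappa = 0.0
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        y = dict(zip(pairs, bits))
        kappa += math.exp(sum(t * s for t, s in zip(theta, g(y))))
    return math.exp(sum(t * s for t, s in zip(theta, g(y_obs)))) / kappa

def edge_count(y):          # a one-dimensional statistic vector g(y)
    return [sum(y.values())]

# On 3 nodes with theta = 0, each of the 2^3 = 8 graphs gets probability 1/8.
empty = {pr: 0 for pr in itertools.combinations(range(3), 2)}
print(ergm_prob([0.0], edge_count, 3, empty))  # 0.125
```

Of course, the point of the next slides is that this enumeration is hopeless for realistic n.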
6 The Gilbert-Erdős-Rényi model: The simplest ERGM The function kappa(theta, Y) can be troublesome, but not always. Consider the following case (Gilbert, Ann. Math. Stat., 1959):
7 The Gilbert-Erdős-Rényi model: The simplest ERGM Let p be some fixed constant between 0 and 1, and set P(Y = y) = p^E(y) (1 - p)^Ebar(y), where E(y) is the number of edges in y and Ebar(y) is the number of non-edges in y. Rewrite using theta = log p - log(1 - p): P(Y = y) = (1 - p)^const (p / (1 - p))^(# of edges) = exp{theta * # of edges} / kappa(theta).
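The algebra above can be checked numerically. This sketch (mine, with illustrative values n = 4 and p = 0.3) verifies that the Bernoulli form p^E(y) (1 - p)^Ebar(y) matches the exponential-family form exp{theta * E(y)} / kappa(theta) with theta = logit(p), using the closed form kappa(theta) = (1 + e^theta)^(n choose 2):

```python
import itertools
import math

n, p = 4, 0.3
theta = math.log(p) - math.log(1 - p)       # theta = logit(p)
n_pairs = len(list(itertools.combinations(range(n), 2)))

# kappa(theta) = sum over graphs of exp(theta * #edges) = (1 + e^theta)^(n choose 2)
kappa = (1 + math.exp(theta)) ** n_pairs

# Any graph with E edges has the same probability under both formulas.
E = 2
bernoulli_form = p**E * (1 - p)**(n_pairs - E)
ergm_form = math.exp(theta * E) / kappa
print(abs(bernoulli_form - ergm_form) < 1e-12)  # True
```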
8 Dyadic independence ERGMs are generally tractable Gilbert-Erdős-Rényi is a special case of dyadic independence: P_theta(Y = y) = prod_{i<j} P_theta(D_ij = d_ij), where D_ij is the dyad for nodes i and j (four states in the directed case, two in the undirected case). Dyadic independence models have drawbacks but they facilitate estimation; facilitate simulation; avoid the degeneracy issue (cf. Schweinberger, 2011).
9 Statistical inference is probability in reverse The ERGM hypothesizes: P_theta(Y = y) = exp{theta' g(y)} / kappa(theta, Y). Probability runs forward from theta through the ERGM to data; statistics runs in reverse. Statistical goal: use observed data to select from the given ERGM class, i.e., to learn about theta. We might search for a best theta or a density p(theta | data).
10 The loglikelihood function is built from L(theta) = P_theta(Y = y_obs) The ERGM hypothesizes: P_theta(Y = y) = exp{theta' g(y)} / kappa(theta, Y). To choose a theta, we might search for a best theta by maximizing L(theta), or equivalently l(theta) = log L(theta) = theta' g(y_obs) - log kappa(theta, Y). Alternatively, a Bayesian approach tries to describe an entire distribution over theta values, the posterior: p(theta | Y = y_obs) proportional to L(theta) pi(theta).
11 Computing the likelihood is sometimes very difficult The likelihood is L(theta) = P_theta(Y = y_obs), viewed as a function of theta. For an undirected, 34-node network, computing l(theta) directly may require summation of 2^561 (roughly 7.5 x 10^168) terms, one for each possible network on 34 nodes.
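The count of terms is just the number of undirected graphs on 34 nodes, 2^(34 choose 2); a two-line check:

```python
from math import comb

n = 34
num_graphs = 2 ** comb(n, 2)   # one term in kappa per possible undirected graph
print(comb(n, 2))              # 561
print(len(str(num_graphs)))    # 169, i.e. the count is about 7.5e168
```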
12 The log-likelihood may be written as an expectation Recall: l(theta) = log L(theta) = theta' g(y_obs) - log kappa(theta, Y). Suppose we fix theta_0. A bit of algebra shows that l(theta) - l(theta_0) = (theta - theta_0)' g(y_obs) - log E_theta_0 [exp{(theta - theta_0)' g(Y)}] = (theta - theta_0)' g(y_obs) - log E_theta_0 [blah blah Y blah]. Thus, randomly sampling networks from P_theta_0 allows approximation of l(theta) - l(theta_0).
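A minimal Monte Carlo sketch of this identity (mine, not the talk's code), using the edges-only model so that kappa has a closed form and the approximation can be compared with the exact value; `g_obs` is a hypothetical observed edge count:

```python
import math
import random

random.seed(0)
n_dyads = 28            # N = (8 choose 2) dyads in an 8-node undirected graph
theta0, theta = -1.0, -0.5
g_obs = 10              # hypothetical observed edge count

# Exact difference, available here because kappa(theta) = (1 + e^theta)^N:
# l(theta) - l(theta0) = (theta - theta0) g_obs - log[kappa(theta)/kappa(theta0)]
exact = (theta - theta0) * g_obs - n_dyads * math.log(
    (1 + math.exp(theta)) / (1 + math.exp(theta0)))

# Monte Carlo: sample networks Y ~ P_theta0 (independent Bernoulli edges)
# and average exp{(theta - theta0) g(Y)}.
p0 = 1 / (1 + math.exp(-theta0))
m = 100_000
mean = sum(
    math.exp((theta - theta0) * sum(random.random() < p0 for _ in range(n_dyads)))
    for _ in range(m)) / m
approx = (theta - theta0) * g_obs - math.log(mean)
print(abs(exact - approx) < 0.05)  # the Monte Carlo estimate is close
```

For general ERGMs the samples come from MCMC rather than independent draws, but the averaging step is the same.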
13 Example Network: High School Friendship Data An edge indicates a mutual friendship. Colored labels give grade level. Circles = female, squares = male, triangles = unknown. N.B.: Missing data ignored here, though this could be altered.
14 Fitting an ERGM to the high school dataset ERGM parameter estimates from Hunter et al. (2008) include structural terms (edges, GWESP, GWD, GWDSP) and covariate terms for grade, race, and sex. NF stands for Node Factor. AD stands for Absolute Difference. DH stands for Differential Homophily. UH stands for Uniform Homophily. Estimates are reported with standard errors and flagged for significance at the .05, .01, and .001 levels.
15 But what about Large Networks? A network of this size does not really qualify as large in this context. The estimation techniques used previously do not scale well.
16 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks
17 Idea: Use counting process theory to model networks Goal: Model a dynamically evolving network using counting processes. Methods should be applicable to large network datasets (tens or hundreds of thousands of nodes). Two modeling frameworks (terminology of Butts, 2008): Egocentric: the counting process N_i(t) = cumulative number of events involving the ith node by time t. Relational: the counting process N_ij(t) = cumulative number of events involving the (i, j)th node pair by time t. NB: Events need not be edge additions.
18 Counting processes may be considered multivariate Combine the N_i(t) to give a multivariate counting process N(t) = (N_1(t), ..., N_n(t)). Genuinely multivariate; no assumption about the independence of the N_i(t).
19 Citation Networks may be modeled as egocentric Theoretical physics (hep-th) articles on arXiv: 29,557 articles and 352,807 citations. At arrival, a paper cites others that are already in the network. Main dynamic development: the number of citations received. N_i(t): number of citations to paper i by time t. At-risk indicator R_i(t) = I{t_i^arr < t}.
20 Twitter behavior provides another egocentric example Vaccination-related tweets collected from Aug. 2009 to Jan. 2010, with more than 4 million follower edges among the users. Of particular interest: H1N1 vaccination sentiment. Some users express - or + sentiments regarding H1N1 vaccination; N_i^-(t) and N_i^+(t) are the numbers of such tweets by time t. Question of interest: which predictors (e.g., past behavior of self / followers / followees, position in the directed following network) predict the propensity to tweet - or +? Is tweeting behavior about H1N1 vaccination contagious?
21 A multivariate counting process is a submartingale Each N_i(t) is nondecreasing in time, so N(t) may be considered a submartingale; i.e., it satisfies E[N(t) | past up to time s] >= N(s) for all t > s.
22 The so-called Doob-Meyer Decomposition uniquely decomposes any submartingale: N(t) = integral_0^t lambda(s) ds + M(t), where lambda(t) is the signal at time t, called the intensity function, and M(t) is the noise, a continuous-time martingale. We will model each lambda_i(t) or lambda_ij(t).
23 We use standard models for the intensity processes In the egocentric case, consider the Cox or Aalen model for the node-i process. Cox proportional hazards model, fixed coefficients: lambda_i(t | H_t) = R_i(t) alpha_0(t) exp(beta' s_i(t)). Aalen additive model, time-varying coefficients: lambda_i(t | H_t) = R_i(t) (beta_0(t) + beta(t)' s_i(t)), where R_i(t) = I(t > t_i^arr) is the at-risk indicator; H_t is the past of the network up to but not including time t; alpha_0(t) or beta_0(t) is the baseline hazard function; beta is the p-vector of coefficients to estimate; and s_i(t) = (s_i1(t), ..., s_ip(t)) is a vector of statistics for node i.
24 Relational case is similar (cf. Perry and Wolfe) Cox proportional hazards model, fixed coefficients: lambda_ij(t | H_t) = R_ij(t) alpha_0(t) exp(beta' s(i, j, t)). Aalen additive model, time-varying coefficients: lambda_ij(t | H_t) = R_ij(t) (beta_0(t) + beta(t)' s(i, j, t)), where R_ij(t) = I(max{t_i^arr, t_j^arr} < t < t_e_ij) is the at-risk indicator; H_t is the past of the network up to but not including time t; alpha_0(t) or beta_0(t) is the baseline hazard function; beta or beta(t) is the vector of coefficients to estimate; and s(i, j, t) is a p-vector of statistics for pair (i, j).
25 For large networks, maximizing the partial likelihood in the Cox model requires some computing tricks Recall: the intensity process for node i is lambda_i(t | H_t) = R_i(t) alpha_0(t) exp(beta' s_i(t)). Treat alpha_0 as a nuisance parameter and take a partial likelihood approach: maximize L(beta) = prod_{e=1}^m exp(beta' s_{i_e}(t_e)) / sum_{i=1}^n R_i(t_e) exp(beta' s_i(t_e)) = prod_{e=1}^m exp(beta' s_{i_e}(t_e)) / kappa(t_e). Computational trick: write kappa(t_e) = kappa(t_{e-1}) + Delta kappa(t_e), so the kappa(t_e) calculation is updated incrementally rather than recomputed from scratch.
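One way to realize the incremental-kappa trick is sketched below (my own simplified setup, in which each event increments only the cited node's scalar statistic): keep the per-node exp-terms cached and adjust the running sum by the delta for the one node that changed. A brute-force pass confirms the two computations agree.

```python
import math
import random

random.seed(1)
n, beta = 6, 0.7
events = [random.randrange(n) for _ in range(10)]   # hypothetical event sequence
s0 = {i: 0.0 for i in range(n)}                     # statistic: citations so far

# Incremental pass: kappa(t_e) = kappa(t_{e-1}) + delta, updating only the
# exp-term of the one node whose statistic changed at the event.
s = dict(s0)
terms = {i: math.exp(beta * s[i]) for i in range(n)}
kappa = sum(terms.values())
logpl_inc = 0.0
for node in events:
    logpl_inc += beta * s[node] - math.log(kappa)
    s[node] += 1.0
    new = math.exp(beta * s[node])
    kappa += new - terms[node]
    terms[node] = new

# Brute-force pass for comparison: recompute kappa from scratch at each event.
s = dict(s0)
logpl_full = 0.0
for node in events:
    kappa_full = sum(math.exp(beta * s[i]) for i in range(n))
    logpl_full += beta * s[node] - math.log(kappa_full)
    s[node] += 1.0

print(abs(logpl_inc - logpl_full) < 1e-9)  # True
```

The incremental pass does O(changed nodes) work per event instead of O(n), which is the difference between feasible and infeasible on networks with many thousands of nodes.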
26 Fitting the Aalen model uses weighted least squares Recall: the intensity process for node i is lambda_i(t | H_t) = R_i(t) (beta_0(t) + beta(t)' s_i(t)). We do inference not for the beta_k but rather for their time-integrals B_k(t) = integral_0^t beta_k(s) ds. Then (basically weighted least squares) Bhat(t) = sum_{t_e <= t} J(t_e) [W(t_e)' W(t_e)]^{-1} W(t_e)' Delta N(t_e), where W(t) is N(N-1) x p with (i, j)th row R_ij(t) s(i, j, t)', and J(t) is the indicator that W(t) has full column rank.
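A sketch of one increment of the estimator above (illustrative, with a random design matrix standing in for W(t_e) and `numpy.linalg.lstsq` performing the least-squares step):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_proc = 3, 40   # p covariates, n_proc = number of processes at risk

# One event time t_e: design matrix W(t_e) whose rows are R(t_e) * s(., t_e);
# an intercept column plays the role of the baseline beta_0.
W = np.hstack([np.ones((n_proc, 1)), rng.normal(size=(n_proc, p))])
dN = np.zeros(n_proc)
dN[rng.integers(n_proc)] = 1.0          # the process that jumps at t_e

# Increment of the integrated-coefficient estimator B-hat at t_e:
# J(t_e) [W'W]^{-1} W' dN(t_e), computed stably via least squares.
J = np.linalg.matrix_rank(W) == W.shape[1]   # full-column-rank indicator
dB = J * np.linalg.lstsq(W, dN, rcond=None)[0]
print(dB.shape)  # (4,)
```

Summing these increments over event times t_e <= t gives the step-function estimate Bhat(t).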
27 Example statistics: Preferential Attachment For each cited paper j already in the network... First-order PA: s_j1(t) = sum_{i=1}^N y_ij(t-). "Rich get richer" effect. Second-order PA: s_j2(t) = sum_i sum_k y_ki(t-) y_ij(t-). Effect due to being cited by well-cited papers. Statistics in red are time-dependent. Others are fixed once j joins the network. NB: y(t-) is the network just prior to time t.
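With the network stored as an adjacency matrix, both PA statistics reduce to simple matrix expressions; a small illustrative example on a hypothetical 4-paper citation network:

```python
import numpy as np

# Hypothetical citation adjacency: y[i, j] = 1 if paper i cites paper j.
y = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

first_order_pa = y.sum(axis=0)           # s_j1: citations received by paper j
second_order_pa = (y @ y).sum(axis=0)    # s_j2: sum over i, k of y_ki * y_ij

print(first_order_pa.tolist())   # [0, 1, 2, 1]
print(second_order_pa.tolist())  # [0, 0, 1, 2]
```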
28 Example statistics: Recency PA Statistic For each cited paper j already in the network... Recency-based first-order PA (with a fixed window of T_w days): s_j3(t) = sum_{i=1}^N y_ij(t-) I(t - t_i^arr < T_w). Temporary elevation of citation intensity after recent citations. Statistics in red are time-dependent. Others are fixed once j joins the network. NB: y(t-) is the network just prior to time t.
29 Example statistics: Triangle Statistics For each cited paper j already in the network... Seller statistic: s_j4(t) = sum_i sum_k y_ki(t-) y_ij(t-) y_kj(t-). Broker statistic: s_j5(t) = sum_i sum_k y_kj(t-) y_ji(t-) y_ki(t-). Buyer statistic: s_j6(t) = sum_i sum_k y_jk(t-) y_ki(t-) y_ji(t-). The original slide illustrates the seller, broker, and buyer roles with a citation triangle. Statistics in red are time-dependent. Others are fixed once j joins the network. NB: y(t-) is the network just prior to time t.
30 Example statistics: Out-Path Statistics For each cited paper j already in the network... First-order out-degree (OD): s_j7(t) = sum_{i=1}^N y_ji(t-). Second-order OD: s_j8(t) = sum_i sum_k y_jk(t-) y_ki(t-). Statistics in red are time-dependent. Others are fixed once j joins the network. NB: y(t-) is the network just prior to time t.
31 Example statistics: Topic Modeling Statistics Additional statistics use abstract text when available: a latent Dirichlet allocation (LDA) model (Blei et al., 2003) is learned on the training set. (Figure from the Wikipedia entry for latent Dirichlet allocation: N = words per document, M = documents, theta_i = topic distribution for paper i.) We construct a vector of similarity statistics, one per topic: s_j^LDA(t_i^arr) = theta_i o theta_j, where o denotes the element-wise product of two vectors.
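The similarity statistic is just an element-wise product of topic distributions; a tiny sketch with hypothetical 3-topic vectors:

```python
import numpy as np

# Hypothetical 3-topic distributions for a citing paper i and a cited paper j.
theta_i = np.array([0.7, 0.2, 0.1])
theta_j = np.array([0.1, 0.6, 0.3])

s_lda = theta_i * theta_j   # element-wise product: one statistic per topic
print(s_lda)                # approximately [0.07 0.12 0.03]
```

Each coordinate is large only when both papers put substantial weight on the same topic, so the vector measures topic-by-topic similarity rather than a single summary score.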
32 Coefficient Estimates for LDA + PPTR Model Fitted coefficients (beta) are reported for s_1 (first-order PA), s_2 (second-order PA), s_3 (recency PA), s_4 (Seller), s_5 (Broker), s_6 (Buyer), s_7 (first-order OD), and s_8 (second-order OD); the Seller, Broker, and Buyer coefficients are negative. All coefficient estimates are significant at the .001 level. Diverse seller effect: D is more likely cited than A. Diverse buyer effect: E is more likely cited than C. (The letters refer to the triangle diagrams on the original slide.)
33 Twitter and H1N1 Vaccination Sentiments Salathé and Khandelwal (2011) collected over 4 million tweets from Twitter users. For vaccination sentiments measured online to be meaningful, they need to be compared to empirical data for validation. Here, the counting process of interest is not the formation of ties; it is the expression of H1N1 vaccination sentiments: N_i^+(t) = # of positive tweets by i before time t; N_i^-(t) = # of negative tweets by i before time t. Figure (from Salathé and Khandelwal 2011, PLoS Computational Biology): (A) total numbers of negative (red), positive (green), and neutral (blue) tweets relating to influenza A(H1N1) vaccination during the fall wave of the pandemic; (B) daily and moving-average sentiment score, which turned positive in mid-October (as the vaccine became available) and remained positive for the rest of the year; (C) correlation between estimated vaccination rates and sentiment score per HHS region and per state.
34 Cox Model Coefficients for Twitter Dataset Coefficients are estimated separately for the intensity of positive tweeting and the intensity of negative tweeting, for four statistics: f_1^+ (# friends who tweet +), f_2^+ ((+ tweets) x (+ friends)), f_1^- (# friends who tweet -), and f_2^- ((- tweets) x (- friends)). The friends'-positive statistics are highly significant (p < 10^-3) for the intensity of negative tweeting, the friends'-negative statistics are highly significant for the intensity of positive tweeting, and f_2^- is highly significant for both. Here f_1^+(i, t) = sum_{j in F(i,t)} (# of + tweets by j up to time t) / (total # of tweets by j up to time t), f_2^+(i, t) is the corresponding interaction with i's own + tweeting, and f_1^-, f_2^- are defined analogously with - in place of +.
35 Sensitivity Analysis for Twitter Dataset The automatic sentiment-classification algorithm for tweets can err. Randomly re-classify all tweets using a small test dataset comparing human classification to automatic classification, and repeat the re-classification many times. The resulting distributions of re-estimated coefficients (positive-friends and positive-tweets statistics for the response lambda_i^- , negative-friends and negative-tweets statistics for lambda_i^+) show what fraction of the re-estimated confidence intervals are entirely positive or entirely negative.
36 Outline Estimation and the ERGM Framework Statistical Estimation for Large, Time-Varying Networks Model-Based Clustering of Large Networks
37 Epinions.com: Example of large network dataset "Unbiased Reviews by Real People." Members of Epinions.com can decide whether to trust each other. The Web of Trust is combined with review ratings to determine which reviews are shown to the user. Dataset of Massa and Avesani: n = 131,828 nodes; n(n - 1) is approximately 17.4 billion observations; 841,372 of these are nonzero (+ or -).
38 The Goal: Cluster 131,828 users Basis for clustering: patterns of trusts and distrusts in the network. If possible: understand the features of the clusters by examining parameter estimates. Notation: throughout, we let y_ij be the rating of j by i and y = (y_ij). We'd like to restrict attention to dyadic independence ERGMs in order to model the observed (y_ij) data.
39 To model dependence, add K-component mixture structure Suppose nodes have latent (unobserved) colors Z_1, ..., Z_n. Simplifying assumption: P(Y = y | Z) = prod_{i<j} P(D_ij = d_ij | Z_i, Z_j), where D_ij is the state in Y of the (i, j)th pair.
40 Consider two examples of conditional dyadic independence for the Epinions dataset. The full model of Nowicki and Snijders (2001): P_theta(D_ij = d | Z_i = k, Z_j = l) = theta_{d;kl}. A more parsimonious model: P_theta(D_ij = d_ij | Z_i = k, Z_j = l) proportional to exp{theta^- (y_ij^- + y_ji^-) + theta_k^- y_ji^- + theta_l^- y_ij^- + theta^-- y_ij^- y_ji^- + theta^++ y_ij^+ y_ji^+}, where y_ij^- = I{Y_ij = -} and y_ij^+ = I{Y_ij = +}. With K = 5 components, the full model has many more parameters than the parsimonious one.
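A sketch of the parsimonious dyad model (with my own illustrative parameter values, not the fitted Epinions estimates): enumerate the nine states of a signed directed dyad, form the unnormalized exponential-family weights, and normalize at the dyad level.

```python
import itertools
import math

# Hypothetical parameter values (not the fitted Epinions estimates).
th_neg, th_k, th_l, th_negneg, th_pospos = -4.0, -1.0, 0.5, 1.7, 1.2

def weight(y_ij, y_ji):
    """Unnormalized probability of one dyad state; each direction is
    -1 (distrust), 0 (no rating), or +1 (trust)."""
    neg_ij, neg_ji = int(y_ij == -1), int(y_ji == -1)
    pos_ij, pos_ji = int(y_ij == +1), int(y_ji == +1)
    return math.exp(th_neg * (neg_ij + neg_ji) + th_k * neg_ji + th_l * neg_ij
                    + th_negneg * neg_ij * neg_ji + th_pospos * pos_ij * pos_ji)

states = list(itertools.product([-1, 0, 1], repeat=2))
Z = sum(weight(a, b) for a, b in states)           # dyad-level normalizer
probs = {s: weight(*s) / Z for s in states}
print(abs(sum(probs.values()) - 1.0) < 1e-12)      # True
```

Because each dyad has only nine states, the normalizer is trivial here; the hard part, addressed in the next slides, is the sum over latent colorings Z.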
41 There is a problem with the simplifying assumption Our conditional independence model says Z_i iid Multinomial(1; gamma_1, ..., gamma_K) and P_theta(Y = y | Z) = prod_{i<j} P_theta(D_ij = d_ij | Z_i, Z_j). Not so simple when we do not observe Z: the full (unconditional) loglikelihood is rather complicated: l(gamma, theta) = log sum_z P_gamma(Z = z) P_theta(Y = y | Z = z).
42 Approximate maximum likelihood estimation uses a variational EM algorithm For MLE, the goal is to maximize the loglikelihood l(gamma, theta). Basic idea: establish a lower bound J(gamma, theta, alpha) <= l(gamma, theta) after augmenting the parameters by adding alpha, and create an EM-like algorithm guaranteed to increase J(gamma, theta, alpha) at each iteration. If we maximize the lower bound, then we are hoping that the inequality will be tight enough to put us close to a maximum of l(gamma, theta). We adapt the variational EM idea of Daudin, Picard, and Robin (2008).
43 We may derive a lower bound by simple algebra Clever variational idea: augment the parameter set, letting alpha_ik = P(Z_i = k) for all 1 <= i <= n and 1 <= k <= K. Let A_alpha(Z) = prod_i Mult(z_i; alpha_i) denote the joint distribution of Z. Direct calculation gives J(gamma, theta, alpha) := l(gamma, theta) - KL{A_alpha(Z), P_gamma,theta(Z | Y)} = ... = E_alpha[log P_gamma,theta(Y, Z)] + H[A_alpha(Z)]. Thus, an EM-like algorithm consists of alternately maximizing J(gamma, theta, alpha) with respect to alpha ("E-step") and maximizing E_alpha[log P_gamma,theta(Y, Z)] with respect to gamma, theta ("M-step").
44 The variational E-step may be modified using a (non-variational) MM algorithm Idea: use a generalized variational E-step in which J(gamma, theta, alpha) is increased but not necessarily maximized. To this end, we create a surrogate function Q(alpha; gamma^(t), theta^(t), alpha^(t)) of alpha, where t is the iteration counter. The surrogate function is a minorizer of J(gamma, theta, alpha): maximizing or merely increasing its value will guarantee an increase in the value of J(gamma, theta, alpha). (In the figure, the red curve minorizes the objective at the current iterate.)
45 Construction of the minorizer of J(gamma, theta, alpha) uses standard MM algorithm methods J(gamma, theta, alpha) = sum_{i<j} sum_{k=1}^K sum_{l=1}^K alpha_ik alpha_jl log pi_{d_ij;kl}(theta) + sum_{i=1}^n sum_{k=1}^K alpha_ik (log gamma_k - log alpha_ik) + C. We may define a minorizing function as follows: Q(alpha; gamma, theta, alpha^(t)) = sum_{i<j} sum_k sum_l [alpha_ik^2 alpha_jl^(t) / (2 alpha_ik^(t)) + alpha_jl^2 alpha_ik^(t) / (2 alpha_jl^(t))] log pi_{d_ij;kl}(theta) + sum_i sum_k alpha_ik (log gamma_k - log alpha_ik^(t) - alpha_ik / alpha_ik^(t) + 1). Can be maximized (in alpha) using quadratic programming.
46 The parsimonious model for the Epinions dataset P_theta(D_ij = d_ij | Z_i = k, Z_j = l) proportional to exp{theta^- (y_ij^- + y_ji^-) + theta_k^- y_ji^- + theta_l^- y_ij^- + theta^-- y_ij^- y_ji^- + theta^++ y_ij^+ y_ji^+}, where y_ij^- = I{Y_ij = -} and y_ij^+ = I{Y_ij = +}. NB: The term theta^+ (y_ij^+ + y_ji^+) is omitted to avoid perfect collinearity. theta^-: overall tendency toward distrust. theta_k^-: category-specific trustedness. theta^--: lex talionis tendency (an eye for an eye). theta^++: quid pro quo tendency (one good turn deserves another).
47 Parameter estimates themselves are of interest The fitted parameters, each with a 95% confidence interval: negative edges (theta^-), negative reciprocity (theta^--), positive reciprocity (theta^++), and the cluster trustworthiness parameters theta_1^- through theta_5^-; positive edges (theta^+) is omitted for identifiability. The trustworthiness estimates vary markedly across the five clusters. Confidence intervals are based on a parametric bootstrap using simulated networks. NB: There are some strange aspects of the bootstrap we cannot explain yet.
48 Multiple starting points converge to the same solution Trace plots from different randomly selected starting parameter values show the loglikelihood values and the trustedness parameters converging to the same solution within a few iterations. Results for the full model look nothing like this.
49 We may use average ratings of reviews by other users as a way to ground-truth the clustering solutions Articles are categorized by their author's highest-probability component, and average article rating is plotted against cluster size for both the parsimonious model and the full model.
50 ERGMs may not be a great way to model large networks with dependencies, but... The ERGM framework is useful because it forces researchers to think about which network statistics are important. Alternative models can exploit similar ways of thinking about networks, or even exploit ERGMs themselves.
51 Cited References: ERGMs Erdős, P., and Rényi, A. On Random Graphs I. Publicationes Mathematicae (Debrecen), 1959. Gilbert, E. N. Random Graphs. Annals of Mathematical Statistics, 1959. Hunter, D. R., Goodreau, S. M., and Handcock, M. S. Goodness of Fit of Social Network Models. Journal of the American Statistical Association, 2008.
52 Cited References: Counting Processes for Networks Brandes, U., Lerner, J., and Snijders, T. A. B. Networks evolving step by step: Statistical analysis of dyadic event data. In Advances in Social Network Analysis and Mining (ASONAM), IEEE, 2009. Butts, C. T. A relational event framework for social action. Sociological Methodology, 38(1):155-200, 2008. Cox, D. R. Regression models and life-tables. Journal of the Royal Statistical Society, Series B, 34, 1972. Perry, P. O., and Wolfe, P. J. Point process modeling for directed interaction networks. Journal of the Royal Statistical Society, Series B, to appear. Salathé, M., Vu, D. Q., Khandelwal, S., and Hunter, D. R. The Dynamics of Health Behavior Sentiments on a Large Social Network. EPJ Data Science, 2:4, 2013. Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. Dynamic Egocentric Models for Citation Networks. Proceedings of the 28th International Conference on Machine Learning (ICML 2011), 2011. Vu, D. Q., Asuncion, A. U., Hunter, D. R., and Smyth, P. Continuous-Time Regression Models for Longitudinal Networks. Advances in Neural Information Processing Systems 24 (NIPS 2011), to appear.
53 Cited References: Variational EM for Large Networks Daudin, J.-J., Picard, F., and Robin, S. A Mixture Model for Random Graphs. Statistics and Computing, 2008. Nowicki, K., and Snijders, T. A. B. Estimation and Prediction for Stochastic Blockstructures. Journal of the American Statistical Association, 2001. Vu, D. Q., Hunter, D. R., and Schweinberger, M. Model-Based Clustering of Large Networks. Annals of Applied Statistics, to appear.
54 A FEW EXTRA SLIDES
55 Maximum Pseudolikelihood: Intuition What if we assume that there is no dependence (or very weak dependence) among the Y_ij? In other words, what if we approximate the marginal P(Y_ij = 1) by the conditional P(Y_ij = 1 | Y_ij^c = y_ij^c)? Then the Y_ij are independent with log[P(Y_ij = 1) / P(Y_ij = 0)] = theta' delta(y_obs)_ij, so we obtain an estimate of theta using straightforward logistic regression. Result: the maximum pseudolikelihood estimate (MPLE). For independence models, MPLE = MLE!
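For the edges-only model the change statistic delta(y)_ij is identically 1, so the MPLE is just the logit of the observed edge density; this sketch (mine, on a hypothetical random network) checks that it coincides with the MLE obtained by directly maximizing the tractable loglikelihood:

```python
import itertools
import math
import random

random.seed(2)
n = 12
pairs = list(itertools.combinations(range(n), 2))
y = {pr: int(random.random() < 0.3) for pr in pairs}   # hypothetical observed network

# Edges-only model: the change statistic is identically 1, so logistic
# regression on an intercept alone gives theta-hat = logit(edge density).
density = sum(y.values()) / len(pairs)
theta_mple = math.log(density / (1 - density))

# The MLE, found by maximizing the (tractable) loglikelihood on a fine grid:
E = sum(y.values())
def loglik(theta):
    return theta * E - len(pairs) * math.log(1 + math.exp(theta))
theta_mle = max((t / 1000 for t in range(-3000, 3001)), key=loglik)

print(abs(theta_mple - theta_mle) < 1e-3)  # True: MPLE = MLE here
```

For models with dependence the two estimates generally differ, which is exactly the concern raised on the next slide.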
56 MLE vs. MPLE "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." John W. Tukey. MLE (maximum likelihood estimation): a well-established method, but very hard because the normalizing constant kappa(theta) is difficult to evaluate, so we approximate it instead. MPLE (maximum pseudolikelihood estimation): easy to do using logistic regression, but based on an independence assumption that is often not justified. Several authors, notably van Duijn et al. (2009), argue forcefully against the use of MPLE (except when MLE = MPLE!).
57 Model construction and Testing Dataset: arXiv-th, high-energy physics theory articles, Jan. 1993 to Apr. 2003. Timestamps are in continuous time; abstract text is included. (29,557 articles; 352,807 citations.) 1. Statistics-building phase: construct network history and build up network statistics. 2. Training phase: construct the partial likelihood and estimate model coefficients. 3. Test phase: evaluate the predictive capability of the learned model. Statistics-building is ongoing even through the training and test phases. The phases are split along citation event times.
58 Recall Performance Recall: proportion of true citations among the K pairs with largest estimated likelihoods, plotted against the cut point K. Models compared: PA (preferential attachment only, s_1); PPT (all statistics except the recency statistic s_3); PPTR (all statistics); LDA (LDA statistics only); LDA + PPTR.
59 Social networks may be modeled as relational Irvine: online social network of students at UC Irvine; 1,899 users; 20,296 directed contact edges. Links are non-recurrent; i.e., N_ij(t) is either 0 or 1. At-risk indicator R_ij(t) = I{max(t_i^arr, t_j^arr) < t < t_e_ij}. (The original slide shows sample rows of the contacter / contactee / date event log.)
60 Relational Example: Modeling a network of contacts Some of the statistics in the model: Sender out-degree: s(i, j, t) = sum_{h in V, h != i} N_ih(t-). Reciprocity: s_5(i, j, t) = N_ji(t-). Transitivity: s_6(i, j, t) = sum_{h in V, h != i, j} N_ih(t-) N_hj(t-). Shared contacters: s(i, j, t) = sum_{h in V, h != i, j} N_hi(t-) N_hj(t-).
61 Aalen model estimates for Irvine Data Set Aalen coefficient plots ((a) sender out-degree, (b) reciprocity, (c) transitivity, (d) shared contacters) suggest two distinct phases of network evolution, consistent with an independent analysis (Panzarasa et al., 2009). In prediction experiments (recall vs. cut point K), Aalen/Cox outperforms adaptive logistic regression.
[70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent
More informationAssessing the Goodness-of-Fit of Network Models
Assessing the Goodness-of-Fit of Network Models Mark S. Handcock Department of Statistics University of Washington Joint work with David Hunter Steve Goodreau Martina Morris and the U. Washington Network
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More informationBehavioral Data Mining. Lecture 2
Behavioral Data Mining Lecture 2 Autonomy Corp Bayes Theorem Bayes Theorem P(A B) = probability of A given that B is true. P(A B) = P(B A)P(A) P(B) In practice we are most interested in dealing with events
More informationBased on slides by Richard Zemel
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationStatistical NLP for the Web
Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg Announcements Please ask HW2 related questions in courseworks
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More information20: Gaussian Processes
10-708: Probabilistic Graphical Models 10-708, Spring 2016 20: Gaussian Processes Lecturer: Andrew Gordon Wilson Scribes: Sai Ganesh Bandiatmakuri 1 Discussion about ML Here we discuss an introduction
More informationi=1 h n (ˆθ n ) = 0. (2)
Stat 8112 Lecture Notes Unbiased Estimating Equations Charles J. Geyer April 29, 2012 1 Introduction In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating
More informationIntroduction to Gaussian Process
Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression
More informationStochastic blockmodeling of relational event dynamics
Christopher DuBois Carter T. Butts Padhraic Smyth Department of Statistics University of California, Irvine Department of Sociology Department of Statistics Institute for Mathematical and Behavioral Sciences
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationMining Triadic Closure Patterns in Social Networks
Mining Triadic Closure Patterns in Social Networks Hong Huang, University of Goettingen Jie Tang, Tsinghua University Sen Wu, Stanford University Lu Liu, Northwestern University Xiaoming Fu, University
More informationDynamic Approaches: The Hidden Markov Model
Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message
More informationDeep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationStatistical Model for Soical Network
Statistical Model for Soical Network Tom A.B. Snijders University of Washington May 29, 2014 Outline 1 Cross-sectional network 2 Dynamic s Outline Cross-sectional network 1 Cross-sectional network 2 Dynamic
More informationIV. Analyse de réseaux biologiques
IV. Analyse de réseaux biologiques Catherine Matias CNRS - Laboratoire de Probabilités et Modèles Aléatoires, Paris catherine.matias@math.cnrs.fr http://cmatias.perso.math.cnrs.fr/ ENSAE - 2014/2015 Sommaire
More informationCSci 8980: Advanced Topics in Graphical Models Gaussian Processes
CSci 8980: Advanced Topics in Graphical Models Gaussian Processes Instructor: Arindam Banerjee November 15, 2007 Gaussian Processes Outline Gaussian Processes Outline Parametric Bayesian Regression Gaussian
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21 Cosma Shalizi 3 April 2008 Models of Networks, with Origin Myths Erdős-Rényi Encore Erdős-Rényi with Node Types Watts-Strogatz Small World Graphs Exponential-Family
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationGenerative Models for Discrete Data
Generative Models for Discrete Data ddebarr@uw.edu 2016-04-21 Agenda Bayesian Concept Learning Beta-Binomial Model Dirichlet-Multinomial Model Naïve Bayes Classifiers Bayesian Concept Learning Numbers
More informationBasic math for biology
Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationDynamic modeling of organizational coordination over the course of the Katrina disaster
Dynamic modeling of organizational coordination over the course of the Katrina disaster Zack Almquist 1 Ryan Acton 1, Carter Butts 1 2 Presented at MURI Project All Hands Meeting, UCI April 24, 2009 1
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationModeling heterogeneity in random graphs
Modeling heterogeneity in random graphs Catherine MATIAS CNRS, Laboratoire Statistique & Génome, Évry (Soon: Laboratoire de Probabilités et Modèles Aléatoires, Paris) http://stat.genopole.cnrs.fr/ cmatias
More informationCPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017
CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationApplying Latent Dirichlet Allocation to Group Discovery in Large Graphs
Lawrence Livermore National Laboratory Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs Keith Henderson and Tina Eliassi-Rad keith@llnl.gov and eliassi@llnl.gov This work was performed
More informationChaos, Complexity, and Inference (36-462)
Chaos, Complexity, and Inference (36-462) Lecture 21: More Networks: Models and Origin Myths Cosma Shalizi 31 March 2009 New Assignment: Implement Butterfly Mode in R Real Agenda: Models of Networks, with
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationMaximum Smoothed Likelihood for Multivariate Nonparametric Mixtures
Maximum Smoothed Likelihood for Multivariate Nonparametric Mixtures David Hunter Pennsylvania State University, USA Joint work with: Tom Hettmansperger, Hoben Thomas, Didier Chauveau, Pierre Vandekerkhove,
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationCS Lecture 18. Topic Models and LDA
CS 6347 Lecture 18 Topic Models and LDA (some slides by David Blei) Generative vs. Discriminative Models Recall that, in Bayesian networks, there could be many different, but equivalent models of the same
More informationBiostat 2065 Analysis of Incomplete Data
Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationDesign of Text Mining Experiments. Matt Taddy, University of Chicago Booth School of Business faculty.chicagobooth.edu/matt.
Design of Text Mining Experiments Matt Taddy, University of Chicago Booth School of Business faculty.chicagobooth.edu/matt.taddy/research Active Learning: a flavor of design of experiments Optimal : consider
More informationOverview course module Stochastic Modelling
Overview course module Stochastic Modelling I. Introduction II. Actor-based models for network evolution III. Co-evolution models for networks and behaviour IV. Exponential Random Graph Models A. Definition
More informationDelayed Rejection Algorithm to Estimate Bayesian Social Networks
Dublin Institute of Technology ARROW@DIT Articles School of Mathematics 2014 Delayed Rejection Algorithm to Estimate Bayesian Social Networks Alberto Caimo Dublin Institute of Technology, alberto.caimo@dit.ie
More informationAlgorithmisches Lernen/Machine Learning
Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines
More informationHybrid Models for Text and Graphs. 10/23/2012 Analysis of Social Media
Hybrid Models for Text and Graphs 10/23/2012 Analysis of Social Media Newswire Text Formal Primary purpose: Inform typical reader about recent events Broad audience: Explicitly establish shared context
More informationDynamic Probabilistic Models for Latent Feature Propagation in Social Networks
Dynamic Probabilistic Models for Latent Feature Propagation in Social Networks Creighton Heaukulani and Zoubin Ghahramani University of Cambridge TU Denmark, June 2013 1 A Network Dynamic network data
More informationFast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data
Fast Maximum Likelihood estimation via Equilibrium Expectation for Large Network Data Maksym Byshkin 1, Alex Stivala 4,1, Antonietta Mira 1,3, Garry Robins 2, Alessandro Lomi 1,2 1 Università della Svizzera
More informationMassachusetts Institute of Technology
Massachusetts Institute of Technology 6.867 Machine Learning, Fall 2006 Problem Set 5 Due Date: Thursday, Nov 30, 12:00 noon You may submit your solutions in class or in the box. 1. Wilhelm and Klaus are
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University August 30, 2017 Today: Decision trees Overfitting The Big Picture Coming soon Probabilistic learning MLE,
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationChapter 16. Structured Probabilistic Models for Deep Learning
Peng et al.: Deep Learning and Practice 1 Chapter 16 Structured Probabilistic Models for Deep Learning Peng et al.: Deep Learning and Practice 2 Structured Probabilistic Models way of using graphs to describe
More informationCOMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017
COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University TOPIC MODELING MODELS FOR TEXT DATA
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationTopic Models. Brandon Malone. February 20, Latent Dirichlet Allocation Success Stories Wrap-up
Much of this material is adapted from Blei 2003. Many of the images were taken from the Internet February 20, 2014 Suppose we have a large number of books. Each is about several unknown topics. How can
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationICML Scalable Bayesian Inference on Point processes. with Gaussian Processes. Yves-Laurent Kom Samo & Stephen Roberts
ICML 2015 Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes Machine Learning Research Group and Oxford-Man Institute University of Oxford July 8, 2015 Point Processes
More informationProbabilistic modeling. The slides are closely adapted from Subhransu Maji s slides
Probabilistic modeling The slides are closely adapted from Subhransu Maji s slides Overview So far the models and algorithms you have learned about are relatively disconnected Probabilistic modeling framework
More informationTopic Modelling and Latent Dirichlet Allocation
Topic Modelling and Latent Dirichlet Allocation Stephen Clark (with thanks to Mark Gales for some of the slides) Lent 2013 Machine Learning for Language Processing: Lecture 7 MPhil in Advanced Computer
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationCollaborative topic models: motivations cont
Collaborative topic models: motivations cont Two topics: machine learning social network analysis Two people: " boy Two articles: article A! girl article B Preferences: The boy likes A and B --- no problem.
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationExpectation Maximization Algorithm
Expectation Maximization Algorithm Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein, Luke Zettlemoyer and Dan Weld The Evils of Hard Assignments? Clusters
More informationLecture 8: Graphical models for Text
Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/
More informationWeb Structure Mining Nodes, Links and Influence
Web Structure Mining Nodes, Links and Influence 1 Outline 1. Importance of nodes 1. Centrality 2. Prestige 3. Page Rank 4. Hubs and Authority 5. Metrics comparison 2. Link analysis 3. Influence model 1.
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More information