Modeling networks: regression with additive and multiplicative effects

Alexander Volfovsky, Department of Statistical Science, Duke. May 25, 2017. Health Networks

Why model networks?
Interested in understanding the formation of relationships.
Applied fields: sociology, economics, biology, epidemiology.
Fundamental theory questions: What assumptions are made for different network models? What models work when the assumptions fail? How can we develop fail-safes to overcome these problems?
Where to apply these? Causal inference; link prediction.

Some context: Facebook
Facebook wants to change its ad algorithm. It can't do this on the whole graph, but it needs the total network effect. Source: Wikimedia

How do they solve it?
Interested in estimating $\frac{1}{N}\sum_{i=1}^{N}\left[Y_i(\text{all treated}) - Y_i(\text{all controls})\right]$.
"At a high level, graph cluster randomization is a technique in which the graph is partitioned into a set of clusters, and then randomization between treatment and control is performed at the cluster level."
Where can we find clusters? Observable information (e.g. same school); unobservable information ("social space").
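A minimal base-R sketch of cluster-level assignment, assuming the clusters are already known (for example, from an observable label such as school). The cluster labels and the 50/50 split of clusters are hypothetical; the point is that every node in a cluster receives the same treatment.

```r
set.seed(12)
n <- 20
cluster <- sample(1:5, n, replace = TRUE)   # hypothetical cluster label for each node
ids <- sort(unique(cluster))

# Randomize treatment at the cluster level: half of the clusters are treated
treated <- ids[sample.int(length(ids), length(ids) %/% 2)]
z <- as.integer(cluster %in% treated)       # node-level treatment indicator

# Each cluster is either fully treated or fully control
table(cluster, z)
```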

Some context: (im)migration
We want to know how regime change affects the population. Politicians during election years care about direct effects. Source:

Some more context
Studying tram traffic in Vienna. Source: kurier.at

And one more
Studying taxi rides in Porto: 442 taxis; 1.7 million rides with (x, y) coordinates at 15-second intervals. Project into a 100-dimensional latent space and learn hidden, interpretable patterns. Source: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic Differentiation Variational Inference. Journal of Machine Learning Research, 18(14),

Relational data: common examples and goals
[Figure: "Changes in exports from year to year", with countries plotted by the first and second eigenvectors of $\hat{R}_{\text{row}}$ and of $\hat{R}_{\text{col}}$.]
Network regression problems $y_{ij} = x_{ij}^\top \beta + \epsilon_{ij}$ frequently assume independence of the $\epsilon_{ij}$.

Estimating β in network regression
For $Y = \langle X, \beta \rangle + E$ we have:
OLS (assumes no dependence among the $\epsilon_{ij}$): $\hat{\beta}^{(\text{ols})} = (\text{mat}(X)^\top \text{mat}(X))^{-1} \text{mat}(X)^\top \text{vec}(Y)$
Oracle GLS (allows dependence among the $\epsilon_{ij}$, with $\Sigma = \text{Cov}(\text{vec}(E))$ known): $\hat{\beta}^{(\text{gls})} = (\text{mat}(X)^\top \Sigma^{-1} \text{mat}(X))^{-1} \text{mat}(X)^\top \Sigma^{-1} \text{vec}(Y)$
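A small base-R sketch of both estimators, assuming Σ is known (the "oracle" case) and small enough to invert directly. The design matrix, coefficients and covariance below are hypothetical; `ols` and `gls` are illustrative helper names. With Σ = I the two estimators coincide.

```r
# matX: N x p design matrix mat(X), one row per ordered dyad (i, j)
# vecY: length-N response vec(Y); Sigma: N x N covariance of vec(E)
ols <- function(matX, vecY) {
  solve(crossprod(matX), crossprod(matX, vecY))
}
gls <- function(matX, vecY, Sigma) {
  Si <- solve(Sigma)   # explicit inverse: fine for small N only
  solve(t(matX) %*% Si %*% matX, t(matX) %*% Si %*% vecY)
}

set.seed(13)
n <- 5
N <- n * (n - 1)                      # number of ordered dyads
matX <- cbind(1, rnorm(N))            # intercept + one dyadic covariate
beta <- c(1, 2)
L <- matrix(rnorm(N * N), N) / sqrt(N)
Sigma <- crossprod(L) + diag(N)       # hypothetical positive-definite covariance
E <- t(chol(Sigma)) %*% rnorm(N)      # correlated errors with covariance Sigma
vecY <- matX %*% beta + E

cbind(ols = c(ols(matX, vecY)), gls = c(gls(matX, vecY, Sigma)))
```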

Network models
The data: there are n actors/nodes labeled 1, ..., n. Y is a sociomatrix: $y_{ij}$ is a dyadic relationship between node i and node j, and $y_{ii}$ is frequently undefined. Covariates: node-specific $x_i$; dyad-specific $x_{ij}$.

Social relations model
Goal: describe the variability in Y. Sender effects describe sociability; receiver effects describe popularity. Capture this in the Social Relations Model (SRM): $y_{ij} = a_i + b_j + \epsilon_{ij}$. This is almost a two-way ANOVA, but we want to relate $a_i$ to $b_i$, since the senders and receivers come from the same set of nodes.

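Social relations model
$y_{ij} = \mu + a_i + b_j + \epsilon_{ij}$
$(a_i, b_i) \overset{iid}{\sim} N(0, \Sigma_{ab})$, $\quad (\epsilon_{ij}, \epsilon_{ji}) \overset{iid}{\sim} N(0, \Sigma_e)$
$\Sigma_{ab} = \begin{pmatrix} \sigma_a^2 & \sigma_{ab} \\ \sigma_{ab} & \sigma_b^2 \end{pmatrix}$ describes sender/receiver variability and within-person similarity.
$\Sigma_e = \sigma_\epsilon^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ describes within-dyad correlation.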

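Variability
For distinct nodes i, j, k:
$\text{var}(y_{ij}) = \sigma_a^2 + \sigma_b^2 + \sigma_\epsilon^2$
$\text{cov}(y_{ij}, y_{ik}) = \sigma_a^2$
$\text{cov}(y_{ij}, y_{kj}) = \sigma_b^2$
$\text{cov}(y_{ij}, y_{jk}) = \sigma_{ab}$
$\text{cov}(y_{ij}, y_{ji}) = 2\sigma_{ab} + \rho\sigma_\epsilon^2$
How hard is it to fit this model?
    fit_srm <- ame(Y)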
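A Monte Carlo check of these (co)variances in base R, assuming hypothetical variance components: simulate many independent copies of a three-node SRM (with μ = 0) and compare sample moments with the formulas. Note that for i ≠ j, $a_i$ and $b_j$ belong to different people, so $\sigma_{ab}$ does not enter $\text{var}(y_{ij})$.

```r
set.seed(14)
N <- 2e5                            # Monte Carlo replicates of a 3-node network
n <- 3
sa2 <- 1; sb2 <- 0.5; sab <- 0.3    # hypothetical sigma_a^2, sigma_b^2, sigma_ab
se2 <- 1; rho <- 0.4                # hypothetical sigma_e^2 and rho
Sab <- matrix(c(sa2, sab, sab, sb2), 2)
Se  <- se2 * matrix(c(1, rho, rho, 1), 2)

# (a_i, b_i) ~ N(0, Sab), independently across nodes
ab <- lapply(1:n, function(i) matrix(rnorm(2 * N), N) %*% chol(Sab))
a <- sapply(ab, function(x) x[, 1])   # N x n sender effects
b <- sapply(ab, function(x) x[, 2])   # N x n receiver effects

# (e_ij, e_ji) ~ N(0, Se), independently across unordered pairs
e <- array(0, c(N, n, n))
for (i in 1:(n - 1)) for (j in (i + 1):n) {
  eps <- matrix(rnorm(2 * N), N) %*% chol(Se)
  e[, i, j] <- eps[, 1]
  e[, j, i] <- eps[, 2]
}
y <- function(i, j) a[, i] + b[, j] + e[, i, j]   # SRM relation y_ij with mu = 0

empirical <- c(var   = var(y(1, 2)),
               row   = cov(y(1, 2), y(1, 3)),
               col   = cov(y(1, 2), y(3, 2)),
               chain = cov(y(1, 2), y(2, 3)),
               recip = cov(y(1, 2), y(2, 1)))
theory <- c(sa2 + sb2 + se2, sa2, sb2, sab, 2 * sab + rho * se2)
round(cbind(empirical, theory), 2)
```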

Pictures that pop up
These help capture how well the Markov chain is mixing, as well as goodness-of-fit information. Source: Hoff (2015). arXiv:

Goodness of fit
Posterior predictive distributions of:
sd.rowmean: standard deviation of the row means of Y.
sd.colmean: standard deviation of the column means of Y.
dyad.dep: correlation between vec(Y) and vec(Y^T).
triad.dep: $\dfrac{\sum_{i,j,k} e_{ij} e_{jk} e_{ki}}{\text{Var}(\text{vec}(Y))^{3/2} \cdot \#\{\text{triangles on } n \text{ nodes}\}}$, with $e_{ij} = y_{ij} - \bar{y}$.
Source: Hoff (2015). arXiv:
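A base-R sketch of these four statistics, written from the definitions above rather than taken from amen. It assumes $e_{ij} = y_{ij} - \bar{y}$, a missing (NA) diagonal, and counts the ordered triples (i, j, k) with all three relations observed as the number of triangles; `gof_stats` is an illustrative name.

```r
gof_stats <- function(Y) {
  sd.rowmean <- sd(rowMeans(Y, na.rm = TRUE), na.rm = TRUE)
  sd.colmean <- sd(colMeans(Y, na.rm = TRUE), na.rm = TRUE)
  dyad.dep <- cor(c(Y), c(t(Y)), use = "complete.obs")   # corr(vec(Y), vec(t(Y)))
  E <- Y - mean(Y, na.rm = TRUE)   # centered relations e_ij
  D <- 1 * (!is.na(E))             # 1 where y_ij is observed
  E[is.na(E)] <- 0
  # sum_{i,j,k} e_ij e_jk e_ki is the trace of E^3; sd^3 = Var^(3/2)
  triad.dep <- sum(diag(E %*% E %*% E)) /
    (sum(diag(D %*% D %*% D)) * sd(c(Y), na.rm = TRUE)^3)
  c(sd.rowmean = sd.rowmean, sd.colmean = sd.colmean,
    dyad.dep = dyad.dep, triad.dep = triad.dep)
}

set.seed(15)
Y <- matrix(rnorm(100), 10); diag(Y) <- NA
gof_stats(Y)
```

For a symmetric sociomatrix, vec(Y) and vec(Y^T) coincide, so dyad.dep is 1.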

Incorporating covariates
Imagine you have some covariates and want to fit
$y_{ij} = \beta_d^\top x_{d,ij} + \beta_r^\top x_{r,i} + \beta_c^\top x_{c,j} + a_i + b_j + \epsilon_{ij}$,
where $x_{d,ij}$ are dyad-specific covariates, $x_{r,i}$ are row (sender) covariates and $x_{c,j}$ are column (receiver) covariates. Frequently $x_{r,i} = x_{c,i} = x_i$. When does this not make sense? (Example: popularity is affected by athletic success, but sociability is not.)
How hard is it to fit this model?
    fit_srrm <- ame(Y, Xdyad=Xd, Xrow=Xr, Xcol=Xc)

Parsing the input
    fit_srrm <- ame(Y,
                    Xdyad=Xd, # n x n x pd array of covariates
                    Xrow=Xr,  # n x pr matrix of nodal row covariates
                    Xcol=Xc   # n x pc matrix of nodal column covariates
                   )
Xr[i,p] is the value of the pth row covariate for node i. Xd[i,j,p] is the value of the pth dyadic covariate in the direction from i to j.
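A base-R sketch of building inputs with these shapes. The nodal covariates (`smoke`, `gpa`) and the two dyadic covariates derived from them are hypothetical; the `ame` call is left as a comment because it needs the amen package and an observed sociomatrix Y.

```r
set.seed(16)
n <- 5
# Hypothetical nodal covariates (one row per node)
Xn <- cbind(smoke = rbinom(n, 1, 0.3), gpa = round(runif(n, 2, 4), 1))
Xr <- Xn   # n x pr row (sender) covariates
Xc <- Xn   # n x pc column (receiver) covariates

# n x n x pd array: Xd[i, j, p] is the p-th dyadic covariate for the direction i -> j
Xd <- array(NA_real_, dim = c(n, n, 2),
            dimnames = list(NULL, NULL, c("both.smoke", "gpa.diff")))
Xd[, , "both.smoke"] <- outer(Xn[, "smoke"], Xn[, "smoke"])
Xd[, , "gpa.diff"]   <- abs(outer(Xn[, "gpa"], Xn[, "gpa"], "-"))

dim(Xd)   # 5 5 2
# With a sociomatrix Y and the amen package loaded, these would be passed as
# ame(Y, Xdyad = Xd, Xrow = Xr, Xcol = Xc)
```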

Back to basics
Can you get rid of the dependencies in the model?
    fit_rm <- ame(Y, Xdyad=Xd, Xrow=Xn, Xcol=Xn,
                  rvar=FALSE, # should you fit row random effects?
                  cvar=FALSE, # should you fit column random effects?
                  dcor=FALSE  # should you fit a dyadic correlation?
                 )
Note that summary(fit_rm) will still output the variance parameters table: posterior mean (pmean) and standard deviation (psd) for va, cab, vb, rho and ve ($\sigma_a^2$, $\sigma_{ab}$, $\sigma_b^2$, $\rho$, $\sigma_\epsilon^2$).

So what's missing here?
We have a lot of leftover variability. Common themes in network analysis:
Homophily: similar people connect to each other.
Stochastic equivalence: similar people act similarly.

Which is which?
[Figure: two example networks.] Left: homophily; right: stochastic equivalence. What are good models for this? Source: Hoff (2008). NIPS

Introducing multiplicative effects
SR(R)M can represent second-order dependencies very well, but it has a hard time capturing triadic behavior.
Homophily: create dyadic covariates $x_{d,ij} = x_i x_j$. More generally this can be represented by $x_{r,i}^\top B x_{c,j} = \sum_k \sum_l b_{kl} x_{r,ik} x_{c,jl}$. This is linear in the coefficients $b_{kl}$ (each product $x_{r,ik} x_{c,jl}$ is just another dyadic covariate), and so it can be baked into the amen framework.
Sometimes there is excess correlation left to account for. This suggests a multiplicative effects model:
$y_{ij} = \beta_d^\top x_{d,ij} + \beta_r^\top x_{r,i} + \beta_c^\top x_{c,j} + a_i + b_j + u_i^\top v_j + \epsilon_{ij}$
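A base-R check, with hypothetical random covariates and coefficients, that the bilinear term $x_{r,i}^\top B x_{c,j}$ equals an ordinary linear predictor in the $p_r p_c$ dyadic covariates $x_{r,ik} x_{c,jl}$ with coefficients $b_{kl}$:

```r
set.seed(17)
n <- 4; pr <- 2; pc <- 3
Xr <- matrix(rnorm(n * pr), n, pr)    # rows are x_{r,i}
Xc <- matrix(rnorm(n * pc), n, pc)    # rows are x_{c,j}
B  <- matrix(rnorm(pr * pc), pr, pc)  # coefficients b_kl

# Bilinear form for every ordered pair (i, j)
bilinear <- Xr %*% B %*% t(Xc)

# The same quantity as a linear predictor: one dyadic covariate per (k, l),
# stacked in the same column-major order as c(B)
Xd <- array(NA_real_, c(n, n, pr * pc))
p <- 0
for (l in 1:pc) for (k in 1:pr) {
  p <- p + 1
  Xd[, , p] <- outer(Xr[, k], Xc[, l])   # x_{r,ik} * x_{c,jl}
}
linear <- apply(Xd, c(1, 2), function(x) sum(x * c(B)))

max(abs(bilinear - linear))   # zero up to rounding error
```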

Fitting these models and beyond
    fit_ame2 <- ame(Y, Xd, Xn, Xn,
                    R=2  # dimension of the multiplicative effect
                   )
Source: Hoff (2015). arXiv:

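What happened here?
Why do multiplicative effects help triadic behavior? The triadic measure is related to transitivity (at least for binary data), and it turns out homophily can capture transitivity:
$y_{ij} = \beta_d^\top x_{d,ij} + \beta_r^\top x_{r,i} + \beta_c^\top x_{c,j} + a_i + b_j + u_i^\top v_j + \epsilon_{ij}$
$u_i$ is information about the sender and $v_j$ is information about the receiver. If $u_i \approx v_j$ then $u_i^\top v_j > 0$, and if $u_i \approx u_j$ then there is some stochastic equivalence. For example, when $v_j = u_j$: $u_i \approx u_j$ and $u_j \approx u_k$ imply $u_i \approx u_k$, so ties i-j and j-k make a tie i-k more likely.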
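A quick base-R simulation of this effect, assuming the symmetric special case $v_j = u_j$ (pure homophily), hypothetical sizes n = 60 and R = 2, and unit-variance noise. It compares the triad dependence statistic (computed with centered $e_{ij}$, as in the goodness-of-fit definition) for noise alone and for the same noise plus $u_i^\top u_j$.

```r
triad_dep <- function(Y) {
  E <- Y - mean(Y, na.rm = TRUE)   # centered relations
  D <- 1 * (!is.na(E))
  E[is.na(E)] <- 0
  sum(diag(E %*% E %*% E)) /
    (sum(diag(D %*% D %*% D)) * sd(c(Y), na.rm = TRUE)^3)
}

set.seed(18)
n <- 60; R <- 2
U <- matrix(rnorm(n * R), n, R)          # latent factors u_i
noise <- matrix(rnorm(n * n), n, n)

Y_add <- noise                           # no multiplicative effect
diag(Y_add) <- NA
Y_mult <- U %*% t(U) + noise             # adds u_i^T u_j
diag(Y_mult) <- NA

c(additive = triad_dep(Y_add), multiplicative = triad_dep(Y_mult))
```

The additive-only matrix gives a statistic near 0, while the homophily term pushes it clearly above 0.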

Let's generalize: ordinal models
Imagine a binary (probit) model:
$y_{ij} = 1(z_{ij} > 0)$, $\quad z_{ij} = \mu + a_i + b_j + \epsilon_{ij}$
This looks like the SRM on the latent scale.
    fit_srm <- ame(Y, model="bin")  # lots of model options here
If we go to the iid set-up, this is just an Erdős-Rényi model:
    fit_srg <- ame(Y, model="bin", rvar=FALSE, cvar=FALSE, dcor=FALSE)
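A base-R simulation of this probit model with hypothetical values (μ = -0.5, unit-variance sender and receiver effects, no sender/receiver covariance). In the iid case the tie density should match $\Phi(\mu)$, as for an Erdős-Rényi graph with $p = \Phi(\mu)$; adding $a_i + b_j$ spreads out the outdegrees.

```r
set.seed(19)
n <- 200; mu <- -0.5

# iid latent model: z_ij = mu + eps_ij, y_ij = 1(z_ij > 0)
Z_iid <- mu + matrix(rnorm(n * n), n, n)
Y_iid <- 1 * (Z_iid > 0); diag(Y_iid) <- NA

# probit SRM: add sender effects a_i (rows) and receiver effects b_j (columns)
a <- rnorm(n); b <- rnorm(n)               # hypothetical unit variances, sigma_ab = 0
Z_srm <- Z_iid + matrix(a, n, n) + matrix(b, n, n, byrow = TRUE)
Y_srm <- 1 * (Z_srm > 0); diag(Y_srm) <- NA

c(density = mean(Y_iid, na.rm = TRUE), implied = pnorm(mu))
# spread of the outdegree proportions under each model
c(iid = sd(rowMeans(Y_iid, na.rm = TRUE)), srm = sd(rowMeans(Y_srm, na.rm = TRUE)))
```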

Even more general
Consider the following generative model:
$z_{ij} = u_i^\top D v_j + \epsilon_{ij}$, $\quad y_{ij} = g(z_{ij})$
$u_i$ are latent factors describing i as a sender; $v_j$ are latent factors describing j as a receiver; D is a matrix of factor weights; g is an increasing function mapping the latent space to the observed space. (Some g's: normal: $g(z) = z$; binomial: $g(z) = 1(z > 0)$.)

This works for symmetric matrices too
If $y_{ij} = y_{ji}$, the model looks like:
$z_{ij} = u_i^\top \Lambda u_j + \epsilon_{ij}$, $\quad y_{ij} = g(z_{ij})$
$u_i \approx u_j$ represents stochastic equivalence. Λ is a diagonal matrix of eigenvalues: positive $\lambda_k$ imply homophily, negative ones imply heterophily.
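A base-R illustration, assuming orthonormal latent positions and a hypothetical Λ = diag(3, -2). The eigendecomposition of the matrix with entries $u_i^\top \Lambda u_j$ returns the signed eigenvalues, and each dimension k contributes $\lambda_k u_{ik} u_{jk}$ to $z_{ij}$: for two nodes on the same side of dimension k this is positive when $\lambda_k > 0$ (homophily) and negative when $\lambda_k < 0$ (heterophily).

```r
set.seed(20)
n <- 30
U <- qr.Q(qr(matrix(rnorm(n * 2), n, 2)))  # n x 2 orthonormal latent positions (rows u_i)
Lambda <- diag(c(3, -2))                    # one homophilous, one heterophilous dimension
M <- U %*% Lambda %*% t(U)                  # M[i, j] = u_i^T Lambda u_j

ev <- eigen(M, symmetric = TRUE)$values     # sorted in decreasing order
round(c(largest = ev[1], smallest = ev[n]), 6)

# Contribution of each dimension to the pair (1, 2): lambda_k * u_1k * u_2k
diag(Lambda) * U[1, ] * U[2, ]
```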

What is this latent space?
Problem 1: we need to select a dimension R. This is hard; sometimes there is some intuition.
Problem 2: should the latent positions be interpreted? Unclear; maybe think of the distances in this space.
Problem 3: what about my favorite other models, like stochastic blockmodels? These are just a subclass of these models. For example, the stochastic blockmodel has discrete support for the latent positions.
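A base-R sketch of that special case, assuming hypothetical block memberships and block-level means: when each $u_i$ is restricted to the K indicator vectors, $u_i^\top B u_j$ simply looks up the entry of B for the pair of blocks, which is the mean structure of a stochastic blockmodel.

```r
set.seed(21)
n <- 9; K <- 3
g <- sample(1:K, n, replace = TRUE)   # block memberships
U <- diag(K)[g, , drop = FALSE]       # u_i = indicator vector of node i's block
B <- matrix(c( 2,  -1,  -1,
              -1, 1.5,  -1,
              -1,  -1,   1), K, K)    # hypothetical symmetric block-level means
M <- U %*% B %*% t(U)                 # M[i, j] = u_i^T B u_j = B[g_i, g_j]
M[1:4, 1:4]
```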

What is this latent space?
All quotes from Hoff et al. (2002):
"A subset of individuals in the population with a large number of social ties between them may be indicative of a group of individuals who have nearby positions in this space of characteristics, or social space. Various concepts of social space have been discussed by McFarland and Brown (1973) and Faust (1988). In the context of this article, social space refers to a space of unobserved latent characteristics that represent potential transitive tendencies in network relations. A probability measure over these unobserved characteristics induces a model in which the presence of a tie between two individuals is dependent on the presence of other ties."

(Tiny portion of the) literature
Nowicki, Krzysztof, and Tom A. B. Snijders. "Estimation and prediction for stochastic blockstructures." Journal of the American Statistical Association 96, no. 455 (2001).
Hoff, Peter D., Adrian E. Raftery, and Mark S. Handcock. "Latent space approaches to social network analysis." Journal of the American Statistical Association 97, no. 460 (2002).
Hoff, Peter. "Modeling homophily and stochastic equivalence in symmetric relational data." In Advances in Neural Information Processing Systems (2008).
Airoldi, Edoardo M., David M. Blei, Stephen E. Fienberg, and Eric P. Xing. "Mixed membership stochastic blockmodels." Journal of Machine Learning Research 9, no. Sep (2008).
Hoff, Peter, Bailey Fosdick, Alex Volfovsky, and Katherine Stovel. "Likelihoods for fixed rank nomination networks." Network Science 1, no. 03 (2013).
Hoff, Peter D. "Dyadic data analysis with amen." arXiv preprint arXiv: (2015).

    ame(Y, Xdyad=NULL, Xrow=NULL, Xcol=NULL,
        rvar = !(model=="rrl"), cvar = TRUE, dcor = !symmetric, nvar = TRUE,
        R = 0, model="nrm", intercept=!is.element(model,c("rrl","ord")),
        symmetric=FALSE,
        odmax=rep(max(apply(Y>0,1,sum,na.rm=TRUE)),nrow(Y)), ...)
Y: an n x n square relational matrix of relations.
Xdyad: an n x n x pd array of covariates.
Xrow: an n x pr matrix of nodal row covariates.
Xcol: an n x pc matrix of nodal column covariates.
rvar: logical: fit row random effects (asymmetric case)?
cvar: logical: fit column random effects (asymmetric case)?
dcor: logical: fit a dyadic correlation (asymmetric case)?
nvar: logical: fit nodal random effects (symmetric case)?
R: integer: dimension of the multiplicative effects (can be 0).
model: character: one of "nrm", "bin", "ord", "cbin", "frn", "rrl" (normal, binary probit, ordinal, censored binary, fixed rank nomination, row rank likelihood).
intercept: logical: fit the model with an intercept?
symmetric: logical: is the sociomatrix symmetric?
odmax: a scalar integer or vector of length n giving the maximum number of nominations that each node may make.

What's in the ...?
    seed = 1, nscan = 10000, burn = 500, odens = 25,
    plot = TRUE, print = TRUE, gof = TRUE
seed: random seed.
nscan: number of iterations of the Markov chain (beyond burn-in).
burn: burn-in for the Markov chain.
odens: output density for the Markov chain (number of scans between saved samples).
plot: logical: plot results while running?
print: logical: print results while running?
gof: logical: calculate goodness-of-fit statistics?
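A generic sketch of how burn, nscan and odens interact, assuming (as the description of odens suggests) that one state is saved every odens scans after burn-in. This is not amen's internal code: `run_chain` is a hypothetical helper and the toy AR(1) step stands in for one Gibbs scan. With the defaults, 10000 / 25 = 400 draws are kept.

```r
# Hypothetical burn-in / thinning loop
run_chain <- function(step, init, burn = 500, nscan = 10000, odens = 25) {
  x <- init
  for (s in seq_len(burn)) x <- step(x)          # burn-in scans, discarded
  keep <- numeric(nscan %/% odens)
  for (s in seq_len(nscan)) {
    x <- step(x)
    if (s %% odens == 0) keep[s %/% odens] <- x  # save every odens-th scan
  }
  keep
}

set.seed(22)
ar1 <- function(x) 0.9 * x + rnorm(1)            # toy stationary AR(1) "sampler"
draws <- run_chain(ar1, init = 50)
length(draws)   # 400
```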

An AddHealth Example

Social network data
Datasets: PROSPER, NSCR, AddHealth. Goal: relate network characteristics to individual-level behavior.
Literature: ERGMs, latent variable models.
Assumptions: the data are fully observed; the support is the set of all sociomatrices.
In practice: ranked data; censored observations.
We want a type of likelihood that accommodates the ranked and censored nature of data from Fixed Rank Nomination (FRN) surveys and allows for estimation of regression effects.

Data collection examples
PROmoting School Community-University Partnerships to Enhance Resilience (PROSPER): "Who are your best and closest friends in your grade?"
National Longitudinal Study of Adolescent to Adult Health (AddHealth): "Your male friends. List your closest male friends. List your best male friend first, then your next best friend, and so on."

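Notation
$Z = \{z_{ij} : i \neq j\}$ is a sociomatrix of ordinal relationships; $z_{ij} > z_{ik}$ denotes person i preferring person j to person k:
$Z = \begin{pmatrix} - & z_{12} & \cdots & z_{1n} \\ z_{21} & - & \cdots & z_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & - \end{pmatrix}$
Instead of Z we observe a sociomatrix $Y = \{y_{ij} : i \neq j\}$. Different sampling schemes define different maps between Y and Z (set relations between $y_{ij}$ and $z_{ij}$). A statistical model $\{p(Z \mid \theta) : \theta \in \Theta\}$ assists in the analysis.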

Fixed rank nominations
F(Y) is the set of Z satisfying:
$y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$
$y_{ij} = 0 \text{ and } d_i < m \Rightarrow z_{ij} \le 0$
where m is the maximal number of nominations and $d_i$ is the individual outdegree.
F(Y) differentiates between different ranks and captures the censoring in the data.
[Figure: a row $y_i$ and the constraints it places on $z_i$. With four nominations ($d_i < m$): $z_{i1} > z_{i2} > z_{i3} > z_{i4}$, and the unnominated entries are at most 0. With five nominations ($d_i = m$): $z_{i1} > z_{i2} > z_{i3} > z_{i4} > z_{i5}$, and the unnominated entries (?????) are only known to lie below $z_{i5}$.]

Rank R(Y)
R(Y) is the set of Z satisfying only $y_{ij} > y_{ik} \Rightarrow z_{ij} > z_{ik}$.
Valid but not fully informative: $F(Y) \subset R(Y)$.
Cannot estimate row ("sender") specific effects.
[Figure: the same two rows. Under R(Y) only the ordering is used: $z_{i1} > z_{i2} > z_{i3} > z_{i4}$ with four nominations and $z_{i1} > \cdots > z_{i5}$ with five; the unnominated entries (?????) are only known to lie below the lowest-ranked nominee.]

Binary B(Y)
B(Y) is the set of Z satisfying $y_{ij} > 0 \Rightarrow z_{ij} > 0$ and $y_{ij} = 0 \Rightarrow z_{ij} < 0$.
Neither fully informative nor valid: it discards the information in the ranks and ignores the censoring on the outdegrees. In particular, $F(Y) \not\subset B(Y)$: when $d_i = m$, an unnominated $z_{ij}$ may be positive.
[Figure: under B(Y), nominated entries are $> 0$ and unnominated entries are $< 0$, both for the row with four nominations and for the censored row with five.]

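Bayesian estimation for fixed rank nominations
Model: $Z \sim p(Z \mid \theta)$, $\theta \in \Theta$. Data: $Z \in F(Y)$.
Likelihood: $L_F(\theta : Y) = \Pr(Z \in F(Y) \mid \theta) = \int_{F(Y)} dp(Z \mid \theta)$.
Estimation: given a prior $p(\theta)$, the posterior $p(\theta \mid Z \in F(Y))$ can be approximated by a Gibbs sampler. Simulate $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij}, Z \in F(Y))$:
1. $y_{ij} > 0$: $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij})\, 1(z_{ij} \in (a, b))$, where $a = \max(z_{ik} : y_{ik} < y_{ij})$ and $b = \min(z_{ik} : y_{ik} > y_{ij})$.
2. $y_{ij} = 0$ and $d_i < m$: $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij})\, 1(z_{ij} \le 0)$.
3. $y_{ij} = 0$ and $d_i = m$: $z_{ij} \sim p(z_{ij} \mid \theta, Z_{-ij})\, 1(z_{ij} < \min(z_{ik} : y_{ik} > 0))$.
This allows for imputation of missing $y_{ij}$.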
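A base-R sketch of one Gibbs sweep over a single row, covering the three cases above. For simplicity it assumes the untruncated full conditional of each $z_{ij}$ is $N(\mu_{ij}, 1)$ (in the model it depends on θ and the rest of Z), draws truncated normals by inverse-CDF sampling, and codes y so that larger scores mean more preferred. `rtnorm1` and `frn_update_row` are hypothetical helper names, not part of amen.

```r
# Draw one value from N(mean, sd^2) truncated to (lower, upper), by inverse-CDF sampling
rtnorm1 <- function(mean, sd, lower, upper) {
  u <- runif(1, pnorm(lower, mean, sd), pnorm(upper, mean, sd))
  qnorm(u, mean, sd)
}

# One Gibbs sweep over row i under the FRN constraints.
# z: current z_ij (j != i); y: scores (0 = not nominated, larger = preferred);
# mu: conditional means; m: maximum number of nominations.
frn_update_row <- function(z, y, mu, m) {
  d <- sum(y > 0)                                # outdegree d_i
  for (j in seq_along(z)) {
    if (y[j] > 0) {                              # case 1: nominated
      lo <- suppressWarnings(max(z[y < y[j]]))   # a (-Inf if nothing ranked lower)
      hi <- suppressWarnings(min(z[y > y[j]]))   # b (Inf if nothing ranked higher)
    } else if (d < m) {                          # case 2: not nominated, not censored
      lo <- -Inf; hi <- 0
    } else {                                     # case 3: not nominated, censored
      lo <- -Inf; hi <- min(z[y > 0])
    }
    z[j] <- rtnorm1(mu[j], 1, lo, hi)
  }
  z
}

set.seed(23)
y <- c(3, 2, 1, 0, 0, 0)            # three nominations with m = 5, so d_i < m
z <- ifelse(y > 0, y, -1)           # a starting point inside F(Y)
for (it in 1:200) z <- frn_update_row(z, y, mu = rep(0, 6), m = 5)
round(z, 2)
```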

Simulations
We generated Z from the following Social Relations Model (Warner, Kenny and Stoto (1979)):
$z_{ij} = \beta^\top x_{ij} + a_i + b_j + \epsilon_{ij}$, with $(a_i, b_i)$ iid bivariate normal with mean 0 and $(\epsilon_{ij}, \epsilon_{ji})$ iid bivariate normal with mean 0.
Mean model: $\beta^\top x_{ij} = \beta_0 + \beta_r x_{ir} + \beta_c x_{jc} + \beta_{d1} x_{ij1} + \beta_{d2} x_{ij2}$, where $x_{ir}, x_{jc}$ are individual-level variables, $x_{ij1}$ is a pair-specific variable and $x_{ij2}$ is co-membership in a group.
$\beta_r = \beta_c = \beta_{d1} = \beta_{d2} = 1$ and $\beta_0 = -3.26$; $x_{ir}, x_{ic}, x_{ij1} \overset{iid}{\sim} N(0, 1)$; $x_{ij2} = s_i s_j / .42$ for $s_i \overset{iid}{\sim} \text{binary}(1/2)$.
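A base-R sketch of this data-generating process followed by fixed-rank-nomination censoring. The covariance matrices are not given on the slide, so unit variances with no within-person or within-dyad correlation are assumed here; the intercept is taken as -3.26, which gives a sparse network. `frn_observe` is a hypothetical helper in which each node nominates, in order, its top m positive $z_{ij}$, with the best friend receiving the largest score.

```r
set.seed(24)
n <- 100; m <- 5
xr <- rnorm(n); xc <- rnorm(n)          # individual-level variables x_ir, x_jc
xd1 <- matrix(rnorm(n * n), n, n)       # pair-specific variable x_ij1
s <- rbinom(n, 1, 1/2)
xd2 <- outer(s, s) / 0.42               # co-membership x_ij2 = s_i s_j / .42
a <- rnorm(n); b <- rnorm(n)            # assumed: unit variances, sigma_ab = 0
eps <- matrix(rnorm(n * n), n, n)       # assumed: iid unit-variance errors, rho = 0

row_part <- matrix(xr + a, n, n)                   # [i, j] = x_ir + a_i
col_part <- matrix(xc + b, n, n, byrow = TRUE)     # [i, j] = x_jc + b_j
Z <- -3.26 + row_part + col_part + xd1 + xd2 + eps # all slopes equal to 1
diag(Z) <- NA

frn_observe <- function(Z, m) {
  Y <- matrix(0, nrow(Z), ncol(Z))
  for (i in seq_len(nrow(Z))) {
    pos <- which(Z[i, ] > 0)                        # candidates with z_ij > 0
    top <- pos[order(Z[i, pos], decreasing = TRUE)][seq_len(min(m, length(pos)))]
    Y[i, top] <- m - seq_along(top) + 1             # best friend gets score m
  }
  diag(Y) <- NA
  Y
}

Y <- frn_observe(Z, m)
table(outdegree = rowSums(Y > 0, na.rm = TRUE))
```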

Simulations - Censoring
Simulations for each $m \in \{5, 15\}$, with 100 nodes each.
[Figure: confidence intervals under the three different likelihoods for $\beta_r$ (row), $\beta_c$ (column), $\beta_{d1}$ (an iid dyadic variable) and $\beta_{d2}$, at m = 5 and m = 15. The groups of three CIs are based on the binary, FRN and rank likelihoods, from left to right.]

Simulations - Censoring
$Z \in R(Y) \iff Z + c\mathbf{1}^\top \in R(Y)$ for any $c \in \mathbb{R}^n$, since adding $c_i$ to every entry of row i preserves the within-row ordering.
The rank likelihood cannot estimate row effects.
The binary likelihood poorly estimates row effects: there is a large amount of censoring, and the heterogeneity of censored outdegrees is low.
Regression coefficients are estimated too low.
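A base-R check of this invariance with hypothetical values: adding an arbitrary constant $c_i$ to every entry of row i (that is, forming $Z + c\mathbf{1}^\top$) leaves every within-row ordering, and hence membership in R(Y), unchanged, so the rank likelihood carries no information about the row effects.

```r
set.seed(25)
n <- 6
Z <- matrix(rnorm(n * n), n, n); diag(Z) <- NA
cvec <- rnorm(n, sd = 5)          # arbitrary row-specific constants c_i
Z_shift <- Z + cvec               # Z + c 1^T: adds c_i to every entry in row i

row_ranks <- function(M) t(apply(M, 1, rank, na.last = "keep"))
identical(row_ranks(Z), row_ranks(Z_shift))   # TRUE: within-row orderings unchanged
```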

Simulations - Censoring
[Figure: CIs for the dyadic coefficients in each simulation, m = 5 and m = 15.]
Recall: $x_{ij2} \propto s_i s_j$, an indicator of co-membership in a group.
Ignoring the censoring, the binary likelihood underestimates row variability and underestimates the variability in $x_{ij2}$.

Simulations - information in the ranks
Let C(Y) be the set of Z for which the following are true:
$y_{ij} > 0 \Rightarrow z_{ij} > 0$
$y_{ij} = 0 \text{ and } d_i < m \Rightarrow z_{ij} \le 0$
$\min\{z_{ij} : y_{ij} > 0\} > \max\{z_{ij} : y_{ij} = 0\}$
We refer to $L_C(\theta : Y) = \Pr(Z \in C(Y) \mid \theta)$ as the censored binary likelihood. It recognizes censoring but ignores the information in the ranks. It performs similarly to FRN in the previous study, but is less precise than FRN when m is big.
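A base-R sketch of the four constraint sets for a single row, coded from the conditions on these slides, with hypothetical values (scores with larger = more preferred, 0 = not nominated). The censored row below illustrates two points: a Z consistent with the FRN data can fall outside B(Y), and C(Y) accepts a Z that contradicts the observed ranks.

```r
# Membership checks for one row (diagonal excluded).
# y: observed scores; z: latent values; m: maximum number of nominations.
in_R <- function(z, y, m) all(outer(y, y, ">") <= outer(z, z, ">"))  # y_ij > y_ik => z_ij > z_ik
in_F <- function(z, y, m) {
  d <- sum(y > 0)
  in_R(z, y, m) && (d >= m || all(z[y == 0] <= 0))
}
in_B <- function(z, y, m) all(z[y > 0] > 0) && all(z[y == 0] < 0)
in_C <- function(z, y, m) {
  d <- sum(y > 0)
  all(z[y > 0] > 0) &&
    (d >= m || all(z[y == 0] <= 0)) &&
    (d == 0 || d == length(y) || min(z[y > 0]) > max(z[y == 0]))
}

y <- c(3, 2, 1, 0, 0); m <- 3              # d_i = m: the row is censored
z_ok  <- c(2.0, 1.5, 0.8, 0.6, -1)         # respects the ranks; one unnominated z > 0
z_bad <- c(1.0, 2.0, 0.5, 0.3, -1)         # violates y_i1 > y_i2 => z_i1 > z_i2
sapply(list(ok = z_ok, bad = z_bad), function(z)
  c(F = in_F(z, y, m), R = in_R(z, y, m), B = in_B(z, y, m), C = in_C(z, y, m)))
```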

114 Simulations - information in the ranks
Same setup as before, but the average uncensored outdegree is m.
[Figure: relative concentration around the true value versus m, for $\beta_r$ (row), $\beta_c$ (column), $\beta_{d1}$ (continuous dyad) and $\beta_{d2}$ (co-membership)]
Relative concentration around the true value of each parameter is measured, for each $\beta$, by $E[(\beta - 1)^2 \mid F(Y)] \,/\, E[(\beta - 1)^2 \mid C(Y)]$. The ratio is averaged across eight simulated datasets for each $m \in \{5, 15, 30, 50\}$.
When m is small relative to n, most of the information comes from treating ranked and unranked individuals as groups, rather than from the relative ordering of the ranked individuals.
Censored binary likelihood: because it recognizes the censoring in the data, we expect it to give parameter estimates that do not have the biases of the binary likelihood estimators. On the other hand, $L_C$ ignores the information in the ranks of the scored individuals, so we might expect it to give less precise estimates than the FRN likelihood.
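The relative-concentration measure can be estimated from posterior draws. This is a minimal sketch. It assumes posterior samples of a coefficient are available under both the FRN likelihood and the censored binary likelihood, and that the true value is 1, as the formula implies. The function names are hypothetical. The draws below are made-up numbers for illustration, not simulation results.

```python
from statistics import mean


def posterior_msd(draws, truth=1.0):
    """Monte Carlo estimate of E[(beta - truth)^2 | data] from posterior draws."""
    return mean((b - truth) ** 2 for b in draws)


def relative_concentration(frn_draws, cens_draws, truth=1.0):
    """E[(beta - 1)^2 | F(Y)] / E[(beta - 1)^2 | C(Y)] for one dataset."""
    return posterior_msd(frn_draws, truth) / posterior_msd(cens_draws, truth)


def average_relative_concentration(datasets, truth=1.0):
    """Average of the per-dataset ratios; datasets is a list of
    (frn_draws, cens_draws) pairs, one per simulated dataset."""
    return mean(relative_concentration(f, c, truth) for f, c in datasets)


# Made-up posterior draws for a single simulated dataset.
frn = [0.9, 1.1, 1.0, 1.0]
cens = [0.8, 1.2, 1.0, 1.0]
print(relative_concentration(frn, cens))  # approximately 0.25
```

A ratio below 1 means the FRN posterior is more tightly concentrated around the truth than the censored binary posterior. Following the slide, the sketch averages the ratios across datasets, rather than taking a ratio of averages.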

115 AddHealth Data - Results
[Figure: interval estimates for the coefficients intercept, rsmoke, rdrink, rgpa, csmoke, cdrink, cgpa; dsmoke, ddrink, dgpa; dacad, darts, dsport, dcivic; dgrade, drace]
646 females were asked to rank up to 5 female friends.
Mean model with row, column and dyadic effects for smoking, drinking and GPA, as well as dyadic effects for co-membership in activities and grade, and a similarity-in-race measure.
The CIs are based on the binary, FRN and rank likelihoods.
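The mean model above combines sender (row), receiver (column) and dyadic regression terms. This is a minimal sketch of that linear predictor. It assumes a single intercept, with covariate vectors supplied per node and per dyad. The slide does not say how dyadic covariates such as dsmoke or drace are constructed, so the inputs and names here are hypothetical. The binary, FRN and rank likelihoods differ in how this mean is linked to the observed nominations, and the sketch does not model that link.

```python
def mean_matrix(beta0, beta_r, beta_c, beta_d, Xr, Xc, Xd):
    """Regression part of the mean for each ordered pair (i, j), i != j:
        mu_ij = beta0 + beta_r . Xr[i] + beta_c . Xc[j] + beta_d . Xd[i][j]
    Xr[i]: row (sender) covariates of i; Xc[j]: column (receiver) covariates
    of j; Xd[i][j]: dyadic covariates of the pair. Diagonal left as None."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(Xr)
    mu = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                mu[i][j] = (beta0 + dot(beta_r, Xr[i]) + dot(beta_c, Xc[j])
                            + dot(beta_d, Xd[i][j]))
    return mu


# Hypothetical values, not estimates from the Add Health fit.
Xr = [[1], [0]]                 # e.g. a nodal indicator used as a row effect
Xc = [[1], [0]]                 # the same indicator used as a column effect
Xd = [[[0], [1]], [[1], [0]]]   # one dyadic covariate per ordered pair
mu = mean_matrix(-1.0, [0.5], [0.25], [1.0], Xr, Xc, Xd)
print(mu[0][1], mu[1][0])  # 0.5 0.25
```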


More information

Measuring Social Influence Without Bias

Measuring Social Influence Without Bias Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from

More information

MULTILEVEL IMPUTATION 1

MULTILEVEL IMPUTATION 1 MULTILEVEL IMPUTATION 1 Supplement B: MCMC Sampling Steps and Distributions for Two-Level Imputation This document gives technical details of the full conditional distributions used to draw regression

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Model-Based Clustering for Social Networks

Model-Based Clustering for Social Networks Model-Based Clustering for Social Networks Mark S. Handcock, Adrian E. Raftery and Jeremy M. Tantrum University of Washington Technical Report no. 42 Department of Statistics University of Washington April

More information

Unified Modeling of User Activities on Social Networking Sites

Unified Modeling of User Activities on Social Networking Sites Unified Modeling of User Activities on Social Networking Sites Himabindu Lakkaraju IBM Research - India Manyata Embassy Business Park Bangalore, Karnataka - 5645 klakkara@in.ibm.com Angshu Rai IBM Research

More information

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1

Statistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1 Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of

More information