Sampling and incomplete network data

Size: px
Start display at page:

Download "Sampling and incomplete network data"

Transcription

1 1/58 Sampling and incomplete network data 567 Statistical analysis of social networks Peter Hoff Statistics, University of Washington

2 2/58 Network sampling methods It is sometimes difficult to obtain a complete network dataset: the population nodeset is too large; gathering all relational information is too costly; population nodes are hard to reach. In such cases, we need to think carefully how to gather the data (i.e. design the survey); make inference (i.e. estimate and evaluate parameters).

3 /58 Common sampling methods 1. node-induced subgraph sampling 2. edge-induced subgraph sampling 3. egocentric sampling 4. link tracing designs 5. censored nomination schemes

4 4/58 Node-induced subgraph sampling Procedure: 1. Uniformly sample a set s = {s 1,..., s ns } of nodes s {1,..., n}. 2. Observe relations y s between sampled nodes Y s = {y i,j : i s, j s}.

5 5/58 Node-induced subgraph sampling

6 6/58 Node-induced subgraph sampling

7 7/58 Node-induced subgraph sampling

8 8/58 Node-induced subgraph sampling

9 9/58 Node-induced subgraph sampling

10 10/58 Node-induced subgraph sampling

11 11/58 Estimation from sampled data In what ways does Y s resemble Y? For what functions g() will g(y s) estimate g(y)? Consider the following setup: n n sociomatrix Y n n dyadic covariate X d n 1 nodal covariate X n Can we estimate the following from a sample? ȳ = 1 y n(n 1) i,j, x d = 1 x n(n 1) d,i,j, x n = 1 xn,i n i j yx d = 1 n(n 1) i j y i,j x d,i,j, yx n = i j 1 n(n 1) x n,i ȳ i i

12 12/58 Node-induced subgraph sampling y x d x n yx d yx n

13 13/58 Node-induced subgraph sampling For some functions g, the sample value g(y s) is an unbiased estimator of the population value g(y): g(y) = an average of subgraphs of size k, for k n s g(y) = ( 1 h(y n i,j, y j,i ) 2) i<j g(y) = ( 1 h(y n i,j, y j,i, y i,k, y k,i, y j,k, y k,j ) if n s 3 3) i<j<k Why does it work?: Each subgraph of size k appears in the sample with equal probability (although the subgraphs that appear are dependent). Some functions of interest are not of this type: in and outdegree distributions; geodesics, distances, number of paths, etc.

14 14/58 Edge-induced subgraph sampling Procedure: 1. Uniformly sample a set e = {e 1,..., e ne } of edges e {(i, j) : y i,j = 1} 2. Let Y s be the edge-generated subgraph of e.

15 15/58 Edge-induced subgraph sampling

16 16/58 Edge-induced subgraph sampling

17 17/58 Edge-induced subgraph sampling

18 18/58 Edge-induced subgraph sampling

19 19/58 Edge-induced subgraph sampling

20 Edge-induced subgraph sampling 20/58

21 21/58 Edge-induced subgraph sampling How well do these subgraphs represent Y? Can you infer anything about Y from these data?

22 22/58 Edge-induced subgraph sampling y x d x n yx d yx n

23 23/58 Egocentric sampling Procedure: 1. Uniformly sample a set s 1 = {s 1,1,..., s 1,ns } of nodes s 1 {1,..., n}. 2. Observe the relations for each i s 1, i.e. observe {y i,1,..., y i,n }. 3. Let s 2 be the set of nodes having a link from anyone in s 1. Observe the relations of anyone in s 2 to anyone in s 1 s 2. Y s = {y i,j : i, j s 1 s 2} For large graphs, these data can be obtained (with high probability) by asking each i s 1 the following: 1. Who are your friends? 2. Among your friends, which are friends with each other?

24 24/58 Link-tracing designs Snowball sampling: Iteratively repeat the egocentric sampler, obtaining the stage-k nodes s k from the links of s k 1. This is a type of link-tracing design. The links of the current nodes determine who is next to be included in the sample. How will such subgraphs Y s be similar to Y? How will they differ?

25 25/58 Egocentric sampling y x d x n yx d yx n

26 26/58 Egocentric sampling

27 27/58 Egocentric sampling

28 28/58 Egocentric sampling

29 Egocentric sampling 29/58

30 Egocentric sampling 30/58

31 31/58 Egocentric sampling

32 32/58 Egocentric sampling

33 33/58 Egocentric sampling

34 34/58 Egocentric sampling

35 35/58 Egocentric sampling

36 6/58 Inference with egocentric samples Y s is not generally representative of Y. For some statistics, weighted averages based on Y s can be unbiased (Horwitz-Thompson estimator). For many statistics, part of Y s can be used to obtain good estimates: degree distributions can be estimated from degrees of egos; covariate distributions can be estimated from those of the egos; However, use of data from s 2 generally requires a reweighting scheme.

37 37/58 References Snijders (1992), Estimation on the basis of snowball samples: How to Weight? Kolaczyk (2009) Sampling and Estimation in Network Graphs, chapter 5 of Statistical Analysis of Network Data.

38 38/58 Parameter estimation with incomplete sampled data Model: Pr(Y = y θ), θ Θ. Complete data: Y Observed data: Y[O], where O is a set of pairs of indices i 1 j 1 i 2 j 2 O = i 3 j 3.. How can we make inference for θ based on Y[O]? i s j s

39 9/58 Study design and missing data Node-induced subgraph sampling NA NA NA NA NA NA

40 40/58 Study design and missing data Node-induced subgraph sampling NA NA NA NA NA NA

41 41/58 Study design and missing data Node-induced subgraph sampling NA NA NA NA NA NA

42 42/58 Study design and missing data Node-induced subgraph sampling: Observed data NA NA NA NA NA NA 2 NA NA 1 NA 0 NA 3 NA 0 NA NA 0 NA 4 NA NA NA NA NA NA 5 NA 0 1 NA NA NA 6 NA NA NA NA NA NA

43 3/58 Study design and missing data Edge-induced subgraph sampling NA NA NA NA NA NA

44 44/58 Study design and missing data Edge-induced subgraph sampling NA NA NA NA NA NA

45 45/58 Study design and missing data Edge-induced subgraph sampling: Observed data NA NA NA NA NA 1 2 NA NA 1 NA NA NA 3 1 NA NA NA NA NA 4 NA NA NA NA NA NA 5 NA NA 1 NA NA NA 6 NA NA 1 NA NA NA

46 6/58 Study design and missing data Egocentric sampling NA NA NA NA NA NA

47 47/58 Study design and missing data Egocentric sampling NA NA NA NA NA NA

48 48/58 Study design and missing data Egocentric sampling NA NA NA NA NA NA

49 49/58 Study design and missing data Egocentric sampling NA NA NA NA NA NA

50 50/58 Study design and missing data Egocentric sampling NA NA NA NA NA NA 2 0 NA NA 0 NA NA NA 0 4 NA NA NA NA NA NA 5 NA NA NA NA NA NA 6 NA 0 1 NA NA NA

51 1/58 Parameter estimation with missing data If the data are missing at random, i.e. the value of o, what you get to observe, doesn t depend on θ doesn t depend on values of Y, then valid likelihood and Bayesian inference can be obtained from the observed-data likelihood: l MAR (θ : y[o]) = Pr(Y[o] = y[o] : θ) = y[o c ] Pr(Y = y : θ) Inference based on l(θ : y[o]) is provided in amen: put NA s in place of any non-observed relations.

52 52/58 Missing at random designs Which designs we ve discussed correspond to MAR relations? Node-induced subgraph sampling? Edge-induced subgraph sampling? Egocentric sampling?

53 53/58 Ignorable designs While egocentric and other link-tracing designs are not MAR, they still can be analyzed as if they were. The argument is as follows: The data include O = o, the determination of which relations you get to see; Y[O] = y[o], the relationship values for the observable relations. The likelihood is then l(θ : o, y[o]) = Pr(Y[o] = y[o], O = o θ) = Pr(Y[o] = y[o] θ) Pr(O = o θ, Y[o] = y[o]) = l MAR (θ : y[o]) Pr(O = o θ, Y[o] = y[o]) If the design part doesn t depend on θ, then the observed likelihood is proportional to the MAR likelihood, and the design can be ignored.

54 54/58 Ignorable designs l(θ : o, y[o]) = l MAR (θ : y[o]) Pr(O = o θ, Y[o] = y[o]) When is the design ignorable? (MAR) If the probability that O equals o doesn t depend on θ or Y (e.g., node-induced subgraph sampling), the design is ignorable. ID If the probability that O equals o doesn t depend on θ only depends on Y through Y[o]. then the design is ignorable. The latter conditions are often met for link tracing designs, like egocentric and snowball sampling.

55 55/58 References Thompson and Frank (2000) Model-based estimation with link-tracing sampling designs Heitjan and Basu (1996) Distinguishing Missing at Random and Missing Completely at Random

56 56/58 Simulation study - ID likelihoods y i,j = β 0 + β r x n,i + β cx n,j + β d,i,j + a i + b j + ɛ i,j fit.pop = fitted model based on complete network data fit.samp = fitted model based on sampled network data How do the parameter estimates of fit.samp compare to those of fit.pop?

57 57/58 Node-induced subgraph sample n p = 32, n s = β β r β c β d

58 58/58 Egocentric sample n p = 32, n s1 = β β r β c β d

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Sampling and Estimation in Network Graphs

Sampling and Estimation in Network Graphs Sampling and Estimation in Network Graphs Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ March

More information

Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff

Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff David Gerard Department of Statistics University of Washington gerard2@uw.edu May 2, 2013 David Gerard (UW)

More information

Modeling networks: regression with additive and multiplicative effects

Modeling networks: regression with additive and multiplicative effects Modeling networks: regression with additive and multiplicative effects Alexander Volfovsky Department of Statistical Science, Duke May 25 2017 May 25, 2017 Health Networks 1 Why model networks? Interested

More information

Dyadic data analysis with amen

Dyadic data analysis with amen Dyadic data analysis with amen Peter D. Hoff May 23, 2017 Abstract Dyadic data on pairs of objects, such as relational or social network data, often exhibit strong statistical dependencies. Certain types

More information

Random Effects Models for Network Data

Random Effects Models for Network Data Random Effects Models for Network Data Peter D. Hoff 1 Working Paper no. 28 Center for Statistics and the Social Sciences University of Washington Seattle, WA 98195-4320 January 14, 2003 1 Department of

More information

Statistics 360/601 Modern Bayesian Theory

Statistics 360/601 Modern Bayesian Theory Statistics 360/601 Modern Bayesian Theory Alexander Volfovsky Lecture 8 - Sept 25, 2018 Monte Carlo 1 2 Monte Carlo approximation Want to compute Z E[ y] = p( y)d without worrying about actual integration...

More information

Assessing the Goodness-of-Fit of Network Models

Assessing the Goodness-of-Fit of Network Models Assessing the Goodness-of-Fit of Network Models Mark S. Handcock Department of Statistics University of Washington Joint work with David Hunter Steve Goodreau Martina Morris and the U. Washington Network

More information

Specification and estimation of exponential random graph models for social (and other) networks

Specification and estimation of exponential random graph models for social (and other) networks Specification and estimation of exponential random graph models for social (and other) networks Tom A.B. Snijders University of Oxford March 23, 2009 c Tom A.B. Snijders (University of Oxford) Models for

More information

A function is a special kind of relation. More precisely... A function f from A to B is a relation on A B such that. f (x) = y

A function is a special kind of relation. More precisely... A function f from A to B is a relation on A B such that. f (x) = y Functions A function is a special kind of relation. More precisely... A function f from A to B is a relation on A B such that for all x A, there is exactly one y B s.t. (x, y) f. The set A is called the

More information

Nonparametric Bayesian Matrix Factorization for Assortative Networks

Nonparametric Bayesian Matrix Factorization for Assortative Networks Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin

More information

Latent Stochastic Actor Oriented Models for Relational Event Data

Latent Stochastic Actor Oriented Models for Relational Event Data Latent Stochastic Actor Oriented Models for Relational Event Data J.A. Lospinoso 12 J.H. Koskinen 2 T.A.B. Snijders 2 1 Network Science Center United States Military Academy 2 Department of Statistics

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Consistency Under Sampling of Exponential Random Graph Models

Consistency Under Sampling of Exponential Random Graph Models Consistency Under Sampling of Exponential Random Graph Models Cosma Shalizi and Alessandro Rinaldo Summary by: Elly Kaizar Remember ERGMs (Exponential Random Graph Models) Exponential family models Sufficient

More information

Group Representations

Group Representations Group Representations Alex Alemi November 5, 2012 Group Theory You ve been using it this whole time. Things I hope to cover And Introduction to Groups Representation theory Crystallagraphic Groups Continuous

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations

Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations Catalogue no. 11-522-XIE Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations 2004 Proceedings of Statistics Canada

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Weighted Least Squares I

Weighted Least Squares I Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of

More information

Assignment 10. Arfken Show that Stirling s formula is an asymptotic expansion. The remainder term is. B 2n 2n(2n 1) x1 2n.

Assignment 10. Arfken Show that Stirling s formula is an asymptotic expansion. The remainder term is. B 2n 2n(2n 1) x1 2n. Assignment Arfken 5.. Show that Stirling s formula is an asymptotic expansion. The remainder term is R N (x nn+ for some N. The condition for an asymptotic series, lim x xn R N lim x nn+ B n n(n x n B

More information

Statistical Model for Soical Network

Statistical Model for Soical Network Statistical Model for Soical Network Tom A.B. Snijders University of Washington May 29, 2014 Outline 1 Cross-sectional network 2 Dynamic s Outline Cross-sectional network 1 Cross-sectional network 2 Dynamic

More information

Formalism of the Tersoff potential

Formalism of the Tersoff potential Originally written in December 000 Translated to English in June 014 Formalism of the Tersoff potential 1 The original version (PRB 38 p.990, PRB 37 p.6991) Potential energy Φ = 1 u ij i (1) u ij = f ij

More information

Modeling homophily and stochastic equivalence in symmetric relational data

Modeling homophily and stochastic equivalence in symmetric relational data Modeling homophily and stochastic equivalence in symmetric relational data Peter D. Hoff Departments of Statistics and Biostatistics University of Washington Seattle, WA 98195-4322. hoff@stat.washington.edu

More information

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices Lecture 3: Simple Linear Regression in Matrix Format To move beyond simple regression we need to use matrix algebra We ll start by re-expressing simple linear regression in matrix form Linear algebra is

More information

Analyzing Pilot Studies with Missing Observations

Analyzing Pilot Studies with Missing Observations Analyzing Pilot Studies with Missing Observations Monnie McGee mmcgee@smu.edu. Department of Statistical Science Southern Methodist University, Dallas, Texas Co-authored with N. Bergasa (SUNY Downstate

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

Generalized Exponential Random Graph Models: Inference for Weighted Graphs

Generalized Exponential Random Graph Models: Inference for Weighted Graphs Generalized Exponential Random Graph Models: Inference for Weighted Graphs James D. Wilson University of North Carolina at Chapel Hill June 18th, 2015 Political Networks, 2015 James D. Wilson GERGMs for

More information

TEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET

TEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET 1 TEMPORAL EXPONENTIAL- FAMILY RANDOM GRAPH MODELING (TERGMS) WITH STATNET Prof. Steven Goodreau Prof. Martina Morris Prof. Michal Bojanowski Prof. Mark S. Handcock Source for all things STERGM Pavel N.

More information

Model comparison and selection

Model comparison and selection BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)

More information

ebay/google short course: Problem set 2

ebay/google short course: Problem set 2 18 Jan 013 ebay/google short course: Problem set 1. (the Echange Parado) You are playing the following game against an opponent, with a referee also taking part. The referee has two envelopes (numbered

More information

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Conditional Marginalization for Exponential Random Graph Models

Conditional Marginalization for Exponential Random Graph Models Conditional Marginalization for Exponential Random Graph Models Tom A.B. Snijders January 21, 2010 To be published, Journal of Mathematical Sociology University of Oxford and University of Groningen; this

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

c Copyright 2015 Ke Li

c Copyright 2015 Ke Li c Copyright 2015 Ke Li Degeneracy, Duration, and Co-evolution: Extending Exponential Random Graph Models (ERGM) for Social Network Analysis Ke Li A dissertation submitted in partial fulfillment of the

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

MULTILEVEL LONGITUDINAL NETWORK ANALYSIS

MULTILEVEL LONGITUDINAL NETWORK ANALYSIS MULTILEVEL LONGITUDINAL NETWORK ANALYSIS Tom A.B. Snijders University of Oxford, Nuffield College ICS, University of Groningen Version December, 2013 Tom A.B. Snijders Multilevel Longitudinal Network Analysis

More information

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data

Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data Efficient Algorithms for Bayesian Network Parameter Learning from Incomplete Data SUPPLEMENTARY MATERIAL A FACTORED DELETION FOR MAR Table 5: alarm network with MCAR data We now give a more detailed derivation

More information

Statistical Models for Social Networks with Application to HIV Epidemiology

Statistical Models for Social Networks with Application to HIV Epidemiology Statistical Models for Social Networks with Application to HIV Epidemiology Mark S. Handcock Department of Statistics University of Washington Joint work with Pavel Krivitsky Martina Morris and the U.

More information

Math 104: Homework 7 solutions

Math 104: Homework 7 solutions Math 04: Homework 7 solutions. (a) The derivative of f () = is f () = 2 which is unbounded as 0. Since f () is continuous on [0, ], it is uniformly continous on this interval by Theorem 9.2. Hence for

More information

Bayesian Asymptotics

Bayesian Asymptotics BS2 Statistical Inference, Lecture 8, Hilary Term 2008 May 7, 2008 The univariate case The multivariate case For large λ we have the approximation I = b a e λg(y) h(y) dy = e λg(y ) h(y ) 2π λg (y ) {

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

Bargaining, Information Networks and Interstate

Bargaining, Information Networks and Interstate Bargaining, Information Networks and Interstate Conflict Erik Gartzke Oliver Westerwinter UC, San Diego Department of Political Sciene egartzke@ucsd.edu European University Institute Department of Political

More information

Statistical Methods for Social Network Dynamics

Statistical Methods for Social Network Dynamics Statistical Methods for Social Network Dynamics Tom A.B. Snijders University of Oxford University of Groningen June, 2016 c Tom A.B. Snijders Oxford & Groningen Methods for Network Dynamics June, 2016

More information

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,

More information

Lecture 14 More on structural estimation

Lecture 14 More on structural estimation Lecture 14 More on structural estimation Economics 8379 George Washington University Instructor: Prof. Ben Williams traditional MLE and GMM MLE requires a full specification of a model for the distribution

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Reading Group on Deep Learning Session 1

Reading Group on Deep Learning Session 1 Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Chapter 4. Parametric Approach. 4.1 Introduction

Chapter 4. Parametric Approach. 4.1 Introduction Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent

More information

6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks

6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks 6.207/14.15: Networks Lecture 16: Cooperation and Trust in Networks Daron Acemoglu and Asu Ozdaglar MIT November 4, 2009 1 Introduction Outline The role of networks in cooperation A model of social norms

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Bayesian Model Comparison

Bayesian Model Comparison BS2 Statistical Inference, Lecture 11, Hilary Term 2009 February 26, 2009 Basic result An accurate approximation Asymptotic posterior distribution An integral of form I = b a e λg(y) h(y) dy where h(y)

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Introduction to statistical analysis of Social Networks

Introduction to statistical analysis of Social Networks The Social Statistics Discipline Area, School of Social Sciences Introduction to statistical analysis of Social Networks Mitchell Centre for Network Analysis Johan Koskinen http://www.ccsr.ac.uk/staff/jk.htm!

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

Complex Hadamard matrices and 3-class association schemes

Complex Hadamard matrices and 3-class association schemes Complex Hadamard matrices and 3-class association schemes Akihiro Munemasa 1 1 Graduate School of Information Sciences Tohoku University (joint work with Takuya Ikuta) June 26, 2013 The 30th Algebraic

More information

Newton s method in eigenvalue optimization for incomplete pairwise comparison matrices

Newton s method in eigenvalue optimization for incomplete pairwise comparison matrices Newton s method in eigenvalue optimization for incomplete pairwise comparison matrices Kristóf Ábele-Nagy Eötvös Loránd University (ELTE); Corvinus University of Budapest (BCE) Sándor Bozóki Computer and

More information

Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis

Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis Bayesian Additive Regression Tree (BART) with application to controlled trail data analysis Weilan Yang wyang@stat.wisc.edu May. 2015 1 / 20 Background CATE i = E(Y i (Z 1 ) Y i (Z 0 ) X i ) 2 / 20 Background

More information

Adaptive Web Sampling

Adaptive Web Sampling Biometrics 62, 1224 1234 December 26 DOI: 1.1111/j.1541-42.26.576.x Adaptive Web Sampling Steven K. Thompson Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia

More information

Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling

Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling Harry Crane Based on Chapter 3 of Probabilistic Foundations of Statistical Network Analysis Book website: http://wwwharrycranecom/networkshtml

More information

Bayesian Analysis of Network Data. Model Selection and Evaluation of the Exponential Random Graph Model. Dissertation

Bayesian Analysis of Network Data. Model Selection and Evaluation of the Exponential Random Graph Model. Dissertation Bayesian Analysis of Network Data Model Selection and Evaluation of the Exponential Random Graph Model Dissertation Presented to the Faculty for Social Sciences, Economics, and Business Administration

More information

Department of Statistics. Bayesian Modeling for a Generalized Social Relations Model. Tyler McCormick. Introduction.

Department of Statistics. Bayesian Modeling for a Generalized Social Relations Model. Tyler McCormick. Introduction. A University of Connecticut and Columbia University A models for dyadic data are extensions of the (). y i,j = a i + b j + γ i,j (1) Here, y i,j is a measure of the tie from actor i to actor j. The random

More information

Parameterizing Exponential Family Models for Random Graphs: Current Methods and New Directions

Parameterizing Exponential Family Models for Random Graphs: Current Methods and New Directions Carter T. Butts p. 1/2 Parameterizing Exponential Family Models for Random Graphs: Current Methods and New Directions Carter T. Butts Department of Sociology and Institute for Mathematical Behavioral Sciences

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Graph Detection and Estimation Theory

Graph Detection and Estimation Theory Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and

More information

1 One-way analysis of variance

1 One-way analysis of variance LIST OF FORMULAS (Version from 21. November 2014) STK2120 1 One-way analysis of variance Assume X ij = µ+α i +ɛ ij ; j = 1, 2,..., J i ; i = 1, 2,..., I ; where ɛ ij -s are independent and N(0, σ 2 ) distributed.

More information

Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

Using Potential Games to Parameterize ERG Models

Using Potential Games to Parameterize ERG Models Carter T. Butts p. 1/2 Using Potential Games to Parameterize ERG Models Carter T. Butts Department of Sociology and Institute for Mathematical Behavioral Sciences University of California, Irvine buttsc@uci.edu

More information

Prediction of double gene knockout measurements

Prediction of double gene knockout measurements Prediction of double gene knockout measurements Sofia Kyriazopoulou-Panagiotopoulou sofiakp@stanford.edu December 12, 2008 Abstract One way to get an insight into the potential interaction between a pair

More information

Statistical models for dynamics of social networks: inference and applications

Statistical models for dynamics of social networks: inference and applications Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin Session IPS068) p.1231 Statistical models for dynamics of social networks: inference and applications Snijders, Tom A.B. 1 University

More information

Computer Model Calibration: Bayesian Methods for Combining Simulations and Experiments for Inference and Prediction

Computer Model Calibration: Bayesian Methods for Combining Simulations and Experiments for Inference and Prediction Computer Model Calibration: Bayesian Methods for Combining Simulations and Experiments for Inference and Prediction Example The Lyon-Fedder-Mobary (LFM) model simulates the interaction of solar wind plasma

More information

ECEN 689 Special Topics in Data Science for Communications Networks

ECEN 689 Special Topics in Data Science for Communications Networks ECEN 689 Special Topics in Data Science for Communications Networks Nick Duffield Department of Electrical & Computer Engineering Texas A&M University Lecture 13 Measuring and Inferring Traffic Matrices

More information

Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009

Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009 Notes on Blackwell s Comparison of Experiments Tilman Börgers, June 29, 2009 These notes are based on Chapter 12 of David Blackwell and M. A.Girshick, Theory of Games and Statistical Decisions, John Wiley

More information

Goodness of Fit of Social Network Models 1

Goodness of Fit of Social Network Models 1 Goodness of Fit of Social Network Models David R. Hunter Pennsylvania State University, University Park Steven M. Goodreau University of Washington, Seattle Mark S. Handcock University of Washington, Seattle

More information

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003

Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Statistics Ph.D. Qualifying Exam: Part I October 18, 2003 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. 1 2 3 4 5 6 7 8 9 10 11 12 2. Write your answer

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Longitudinal analysis of ordinal data

Longitudinal analysis of ordinal data Longitudinal analysis of ordinal data A report on the external research project with ULg Anne-Françoise Donneau, Murielle Mauer June 30 th 2009 Generalized Estimating Equations (Liang and Zeger, 1986)

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Estimating network degree distributions from sampled networks: An inverse problem

Estimating network degree distributions from sampled networks: An inverse problem Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari MS&E 226: Small Data Lecture 11: Maximum likelihood (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 18 The likelihood function 2 / 18 Estimating the parameter This lecture develops the methodology behind

More information

Sample Geometry. Edps/Soc 584, Psych 594. Carolyn J. Anderson

Sample Geometry. Edps/Soc 584, Psych 594. Carolyn J. Anderson Sample Geometry Edps/Soc 584, Psych 594 Carolyn J. Anderson Department of Educational Psychology I L L I N O I S university of illinois at urbana-champaign c Board of Trustees, University of Illinois Spring

More information

Can a Pseudo Panel be a Substitute for a Genuine Panel?

Can a Pseudo Panel be a Substitute for a Genuine Panel? Can a Pseudo Panel be a Substitute for a Genuine Panel? Min Hee Seo Washington University in St. Louis minheeseo@wustl.edu February 16th 1 / 20 Outline Motivation: gauging mechanism of changes Introduce

More information

Nearest Neighbor Gaussian Processes for Large Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns

More information

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008

Ages of stellar populations from color-magnitude diagrams. Paul Baines. September 30, 2008 Ages of stellar populations from color-magnitude diagrams Paul Baines Department of Statistics Harvard University September 30, 2008 Context & Example Welcome! Today we will look at using hierarchical

More information

Chaos, Complexity, and Inference (36-462)

Chaos, Complexity, and Inference (36-462) Chaos, Complexity, and Inference (36-462) Lecture 21 Cosma Shalizi 3 April 2008 Models of Networks, with Origin Myths Erdős-Rényi Encore Erdős-Rényi with Node Types Watts-Strogatz Small World Graphs Exponential-Family

More information

Jacobi s formula for the derivative of a determinant

Jacobi s formula for the derivative of a determinant Jacobi s formula for the derivative of a determinant Peter Haggstrom www.gotohaggstrom.com mathsatbondibeach@gmail.com January 4, 208 Introduction Suppose the elements of an n n matrix A depend on a parameter

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Complex Numbers and Quaternions for Calc III

Complex Numbers and Quaternions for Calc III Complex Numbers and Quaternions for Calc III Taylor Dupuy September, 009 Contents 1 Introduction 1 Two Ways of Looking at Complex Numbers 1 3 Geometry of Complex Numbers 4 Quaternions 5 4.1 Connection

More information

An Introduction to Exponential-Family Random Graph Models

An Introduction to Exponential-Family Random Graph Models An Introduction to Exponential-Family Random Graph Models Luo Lu Feb.8, 2011 1 / 11 Types of complications in social network Single relationship data A single relationship observed on a set of nodes at

More information

Probability Models of Information Exchange on Networks Lecture 3

Probability Models of Information Exchange on Networks Lecture 3 Probability Models of Information Exchange on Networks Lecture 3 Elchanan Mossel UC Berkeley All Right Reserved The Bayesian View of the Jury Theorem Recall: we assume 0/1 with prior probability (0.5,0.5).

More information