Sampling and Estimation in Network Graphs

Size: px
Start display at page:

Download "Sampling and Estimation in Network Graphs"

Transcription

1 Sampling and Estimation in Network Graphs Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester March 23, 2017 Network Science Analytics Sampling and Estimation in Network Graphs 1

2 Network sampling Network sampling and challenges Background on statistical sampling theory Network graph sampling designs Estimation of network totals and group size Estimation of degree distributions Network Science Analytics Sampling and Estimation in Network Graphs 2

3 Sampling network graphs Measurements often gathered only from a portion of a complex system Ex: social study of high-school class vs. large corporation, Internet Network graph sample from a larger underlying network Goal: use sampled network data to infer properties of the whole system Approach using principles of statistical sampling theory Sampling in network contexts introduces various potential challenges System under study G(V, E) Population graph Random Procedure Available measurements G (V, E ) Sampled graph G often a subgraph of G (i.e., V V, E E), but may not be Network Science Analytics Sampling and Estimation in Network Graphs 3

4 The fundamental problem Suppose a given graph characteristic or summary η(g) is of interest Ex: order Nv, size N e, degree d v, clustering coefficient cl(g),... Typically impossible to recover η(g) exactly from G Q: Can we still form a useful estimate ˆη = ˆη(G ) of η(g)? Plug-in estimator ˆη := η(g ) Boils down to computing the characteristic of interest in G Many familiar estimators in statistical practice are of this type Ex: sample means, standard deviations, covariances, quantiles... Oftentimes η(g ) is a poor representation of η(g) Network Science Analytics Sampling and Estimation in Network Graphs 4

5 Example: Estimating average degreee Let G(V, E) be a network of protein interactions in yeast Characteristic of interest is average degree η(g) = 1 N v i V Here N v = 5, 151, N e = 31, 201 η(g) = Consider two sampling designs to obtain G First sample n vertices V = {i 1,..., i n } without replacement Design 1: For each i V, observe incident edges (i, j) E Design 2: Observe edge (i, j) only when both i, j V Estimate η(g) by averaging the observed degree sequence {d i } i V η(g ) = 1 n d i di i V Network Science Analytics Sampling and Estimation in Network Graphs 5

6 Example: Estimating average degreee (cont.) Random sample of n = 1, 500 vertices, Designs 1 and 2 for edges Process repeated for 10,000 trials histogram of η(g ) Density Design 2 Design Estimate of of average Average Degree degree Under-estimate η(g) for Design 2, but Design 1 on target. Why? Design 1: sample vertex degree explicitly, i.e., d i = d i Design 2: (implicitly) sample vertex degree with bias, i.e., d i n N v d i Network Science Analytics Sampling and Estimation in Network Graphs 6

7 Improving estimation accuracy In order to do better we need to incorporate the effects of Random sampling; and/or Measurement error Sampling design, topology of G, nature of η( ) all critical Model-based inference Likelihood-based and Bayesian paradigms Design-based methods Statistical sampling theory Assume observations made without measurement error Only source of randomness sampling procedure Ex: Estimating average degree Under Design 2 the estimate is biased, with mean of only Adjusting η(g ) upward by a factor Nv = yields 12,115 n Will see how statistical sampling theory justifies this correction Network Science Analytics Sampling and Estimation in Network Graphs 7

8 Background Network sampling and challenges Background on statistical sampling theory Network graph sampling designs Estimation of network totals and group size Estimation of degree distributions Network Science Analytics Sampling and Estimation in Network Graphs 8

9 Statistical sampling theory Suppose we have a population U = {1,..., N u } of N u units Ex: People, animals, objects, vertices,... A value y i is associated with each unit i U Ex: Height, age, gender, infected, membership,... Typical interest in the population totals τ and averages µ τ := y i and µ := 1 y i = 1 τ N u N u i U i U Basic sampling theory paradigm oriented around these steps: S1: Randomly sample n units S = {i 1,..., i n } from U S2: Observe the value y ik for k = 1,..., n S3: Form an unbiased estimator ˆµ of µ, i.e., E [ˆµ] = µ S4: Evaluate or estimate the variance var [ˆµ] Network Science Analytics Sampling and Estimation in Network Graphs 9

10 Inclusion probabilities Def: For given sampling design, the inclusion probability π i of unit i is π i := P (unit i belongs in the sample S) Simple random sampling (SRS): n units sampled uniformly form U Without replacement: i 1 chosen from U, i 2 from U \ {i 1 }, and so on There are ( N u ) n such possible samples of size n There are ( N u 1) n 1 samples which include a given unit i The inclusion probability is π i = ( Nu 1 n 1 ( Nu n ) ) = n N u Network Science Analytics Sampling and Estimation in Network Graphs 10

11 Sample mean estimator Definition of sample mean estimator ˆµ = 1 n Using indicator RVs I {i S} for i U, where E [I {i S}] = π i [ ] [ ] 1 1 N u E [ˆµ] = E y i = E y i I {i S} n n = 1 n N u i=1 i S i S y i i=1 y i E [I {i S}] = 1 n N u i=1 y i π i SRS without replacement unbiased because π i = n N u Unequal probability sampling More common than SRS, especially with networks. (More soon) Sample mean can be a poor (i.e., biased) estimator for µ Network Science Analytics Sampling and Estimation in Network Graphs 11

12 Horvitz-Thompson estimation for totals Idea: weighted average using inclusion probabilities as weights Horvitz-Thompson (HT) estimator ˆµ π = 1 N u i S y i π i and ˆτ π = N u ˆµ π Remedies the bias problem E [ˆµ π ] = 1 N u N u i=1 y i π i E [I {i S}] = 1 N u Size of the population N u assumed known N u i=1 y i = µ Broad applicability, but π i may be difficult to compute Network Science Analytics Sampling and Estimation in Network Graphs 12

13 Horvitz-Thompson estimator variance Def: Joint inclusion probability π ij of units i and j is π ij := P (units i and j belong in the sample S) If inclusion of units i and j are independent events π ij = π i π j Ex: Simple random sampling without replacement yields π ij = Variance of the HT estimator: var [ˆτ π ] = i U j U n(n 1) N u (N u 1) y i y j ( πij π i π j 1 ), var [ˆµ π ] = var [ˆτ π] N 2 u Typically estimated in an unbiased fashion from the sample S Network Science Analytics Sampling and Estimation in Network Graphs 13

14 Probability proportional to size sampling Unequal probability sampling n units selected w.r.t. a distribution {p 1,..., p Nu } on U Uniform sampling: special case with p i = 1 N u Probability proportional to size (PPS) sampling for all i U Probabilities p i proportional to a characteristic c i Ex: households chosen by drawing names from a database If sampling with replacement, PPS inclusion probabilities are π i = 1 (1 p i ) n, where p i = c i k c k Joint inclusion probabilities for variance calculations π ij = π i + π j [1 (1 p i p j ) n ] Network Science Analytics Sampling and Estimation in Network Graphs 14

15 Estimation of group size So far implicitly assumed N u known Often not the case! Ex: endangered animal species, people at risk of rare disease Special population total often of interest is the group size N u = 1 i U Suggests the following HT estimator of N u ˆN u = i S π 1 i Infeasible, since knowledge of N u needed to compute π i Network Science Analytics Sampling and Estimation in Network Graphs 15

16 Capture-recapture estimator Capture-recapture estimators overcome HT limitations in this setting Two rounds of SRS without replacement Two samples S 1, S 2 Round 1: Mark all units in sample S 1 of size n 1 from U Ex: tagging a fish, noting the ID number... All units in S1 are returned to the population Round 2: Obtain a sample S 2 of size n 2 from U Capture-recapture estimator of N u ˆN u := n 2 m n 1, where m := S 1 S 2 Factor m/n 2 indicative of marked fraction of the overall population Can derive using model-based arguments as an ML estimator Network Science Analytics Sampling and Estimation in Network Graphs 16

17 Common network graph sampling designs Network sampling and challenges Background on statistical sampling theory Network graph sampling designs Estimation of network totals and group size Estimation of degree distributions Network Science Analytics Sampling and Estimation in Network Graphs 17

18 Graph sampling designs Q: What are common designs for sampling a network graph G? A: Will see a few examples, along with their inclusion probabilities π i Graph-based sampling designs Two inter-related classes of units, vertices i and edges (i, j) Often two stages Selection among one class of units (e.g., vertices) Observation of units from the other class (e.g., edges) Inclusion probabilities offer insight into the nature of the designs Central to HT estimators of network graph characteristics η(g) Network Science Analytics Sampling and Estimation in Network Graphs 18

19 Induced subgraph sampling S) Sample n vertices V = {i 1,..., i n } without replacement (SRS) O) Observe edges (i, j) E only when both i, j V (induced by V ) Ex: construction of contact networks in social network research Vertex and edge inclusion probabilities are uniformly equal to π i = n N v and π {i,j} = n(n 1) N v (N v 1) Network Science Analytics Sampling and Estimation in Network Graphs 19

20 Incident subgraph sampling Consider a complementary design to induced subgraph sampling S) Sample n edges E without replacement (SRS) O) Observe vertices i V incident to those selected edges in E Ex: construction of sampled telephone call graphs Network Science Analytics Sampling and Estimation in Network Graphs 20

21 Inclusion probabilities For incident subgraph sampling, edge inclusion probabilities are π {i,j} = n N e Vertex in V if any one or more of its incident edges are sampled π i = P (vertex i is sampled) = 1 P (no edge incident to i is sampled) { ( Ne di ) n 1 ( = Ne ), if n N e d i n 1, if n > N e d i Vertices included with unequal probs. that depend on their degrees Probability proportional to size (degree) sampling of vertices Requires knowledge of N e and degree sequence {d i } i V Network Science Analytics Sampling and Estimation in Network Graphs 21

22 Snowball sampling S) Sample n vertices V 0 = {i 1,..., i n } without replacement (SRS) O1) Observe edges E0 incident to each i V 0, forming the initial wave O2) Observe neighbors N (V 0 ) of i V 0, i.e., V 1 = N (V 0 ) (V 0 )c Iterate to a desired number of e.g., k waves, or until Vk empty G has V = V0 V 1... V k, and their incident edges Ex: spiders or crawlers to discover the WWW s structure Network Science Analytics Sampling and Estimation in Network Graphs 22

23 Star sampling Difficult to compute inclusion probabilities beyond a single wave Single-wave snowball sampling reduces to star sampling Unlabeled: V = V0 and E = E0 their incident edges Ex: Count all co-authors of n sampled authors Vertex inclusion probabilities are simply πi = n/n v Labeled: V = V0 (N (V 0 ) (V 0 )c ) and E = E0 Ex: Count and identify all co-authors of n sampled authors Vertex inclusion probabilities can be shown to look like π i = L N i ( 1) L +1 P (L), where P (L) = ( Nv L n L ( Nv ) n ) Denoted by Ni the neighborhood of vertex i (including i itself) Network Science Analytics Sampling and Estimation in Network Graphs 23

24 Link tracing Link-tracing designs Select an initial sample of vertices VS Trace edges (links) from Vs to another set of vertices VT Snowball sampling: special case where all incident edges are traced May be infeasible to follow all incident edges to a given vertex Ex: lack of recollection/deception in social contact networks Path sampling designs Source nodes V S = {s 1,..., s ns } V Target nodes V T = {t 1,..., t nt } V \ V S Traverse and measure the path between each pair (s i, t j ) Ex: Traceroute Internet studies, Milgram s Six Degrees experiment Network Science Analytics Sampling and Estimation in Network Graphs 24

25 Traceroute sampling Trace shortest paths from each source to all targets s1 s2 t1 t2 Vertex and edge inclusion probabilities roughly [Dall Asta et al 06]: π i 1 (1 ρ S ρ T )e ρ S ρ T c Be (i) and π {i,j} 1 e ρ S ρ T c Be ({i,j}) Source and target sampling fractions ρ S := n S /N v and ρ T := n T /N v Induces PPS sampling, size given by betweenness centralities Network Science Analytics Sampling and Estimation in Network Graphs 25

26 Estimation of totals in network graphs Network sampling and challenges Background on statistical sampling theory Network graph sampling designs Estimation of network totals and group size Estimation of degree distributions Network Science Analytics Sampling and Estimation in Network Graphs 26

27 Network summaries as totals Various graph summaries η(g) are expressible in terms of totals τ Average degree: Let U = V and y i = d i, then η(g) = d i V d i Graph size: Let U = E and y ij = 1, then η(g) = N e = (i,j) E 1 Betweenness centrality: Let U = V (2) (unordered vertex pairs) and y ij = I { k P (i,j) }. For unique shortest i j paths P(i,j), then η(g) = c Be (k) = (i,j) V (2) I { k P(i,j) } Clustering coefficient: Let U = V (3) (unordered vertex triples), then η(g) = cl(g) = 3 total number of triangles total number of connected triples Often such totals can be obtained from sampled G via HT estimation Network Science Analytics Sampling and Estimation in Network Graphs 27

28 Vertex totals Vertex totals are of the form τ = i V y i, averages are τ/n v Ex: average degree where yi = d i Ex: nodes with characteristic C, where yi = I {i C} Given a sample V V, the HT estimator for vertex totals is ˆτ π = y i π i i V Variance expressions carry over, let U = V and V for estimates Inclusion probabilities π i depend on network sampling design Sampling also influences whether y i is observable, e.g., y i = d i Network Science Analytics Sampling and Estimation in Network Graphs 28

29 Totals on vertex pairs Quantity y ij corresponding to vertex pairs (i, j) V (2) of interest Totals τ = (i,j) V (2) y ij become relevant Ex: graph size Ne and betweenness c Be (k) where y ij = I { k P (i,j) } Ex: shared gender in friendship network, average dissimilarity The HT estimator in this context is ˆτ π = y ij π ij (i,j) V (2) Edge totals a special case, when y ij 0 only for (i, j) E Variance expression increasingly complicated, namely var [ˆτ π ] = ( ) πijkl y ik y kl 1 π ij π kl (i,j) V (2) (k,l) V (2) Depends on inclusion probabilities π ijkl of vertex quadruples Network Science Analytics Sampling and Estimation in Network Graphs 29

30 Example: Estimating network size Consider estimating N e as an edge total, i.e., N e = 1 = (i,j) E (i,j) V (2) A ij Bernoulli sampling (BS): I {i V } Ber(p) i.i.d. for all i V Edges E obtained via induced subgraph sampling π ij = p 2 The HT estimator of N e is ˆN e = (i,j) V (2) A ij π ij = p 2 N e Scales up the empirically observed edge total N e by p 2 > 1 Variance can be shown to take the form [Frank 77] [ ] var ˆN e = (p 1 1) di 2 + (p 2 2p 1 + 1)N e i V Network Science Analytics Sampling and Estimation in Network Graphs 30

31 Example: Estimating network size (cont.) Protein network: N v = 5, 151, N e = 31, 201 BS of vertices with p = 0.1 and p = 0.3 Process repeated for 10,000 trials histogram of ˆN e Density 0e+00 6e Density 0e+00 3e N^ e, for p=0.10 se^ (N^ e), for p=0.10 Density Density N^ e, for p=0.30 se^ (N^ e), for p=0.30 Average of ˆN e was 31, 116 and 31, 203 Unbiasedness supported Mean and variability of ŝe shrinks with p (larger sample) Network Science Analytics Sampling and Estimation in Network Graphs 31

32 Example: Estimating clustering coefficient Average clustering coefficient cl(g) can be expressed as cl(g) = 3 τ (G) τ 3 (G) Involves the quotient of two totals on vertex triples τ = (i,j,k) V (3) y ijk ˆτ π = Total number of triangles τ (G), where (i,j,k) V (3) y ijk π ijk y ijk = A ij A jk A ki Total number of connected triples τ 3 (G), where y ijk = A ij A jk + A ij A ki + A jk A ki Network Science Analytics Sampling and Estimation in Network Graphs 32

33 Example: Estimating clustering coefficient (cont.) Protein network: τ (G) = 44, 858, τ 3 (G) 1M, and cl(g) = BS of vertices with p = 0.2 Induced subgraph sampling of edges Process repeated for 10,000 trials histogram of ĉl(g) Density Estimated Clustering Coefficient Unbiased HT estimators ˆτ = p 3 τ (G ) and ˆτ 3 = p 3 τ 3 (G ) Plug-in estimator ĉl(g) = 3ˆτ /ˆτ 3 results in ĉl(g) = cl(g ) Quite accurate with mean and ŝe of Network Science Analytics Sampling and Estimation in Network Graphs 33

34 Caveat emptor Horvitz-Thompson framework fairly straightforward in its essence Success in network sampling and estimation rests on interaction among a) Sampling design; b) Measurements taken; and c) Total to be estimated Three basic elements must be present in the problem 1) Network summary statistic η(g) expressible as total; 2) Values y either observed, or obtainable from measurements; and 3) Inclusion probabilities π computable for the sampling design Unfortunately, often not all three are present at the same time... Network Science Analytics Sampling and Estimation in Network Graphs 34

35 Example: Estimating average degreee Recall our first example on estimation of average degree 1 N v i V d i Design 1: Unlabeled star sampling, observes degrees di, i V Design 2: Induced subgraph sampling, does not observe degrees Average degree is a scaling of a vertex total (N v known) HT estimation applicable so long as y i = d i observed True for unlabeled star sampling, and since π i = n/n v we have ˆµ St = ˆτ St N v, where ˆτ St = i V St d i n/n v We do not observe d i under induced subgraph sampling Not amenable to HT estimation as vertex total for this design Network Science Analytics Sampling and Estimation in Network Graphs 35

36 Example: Estimating average degreee (cont.) Identity µ = 2Ne N v For induced subgraph sampling π ij = ˆN e,is = Tackle instead as estimation of network size N e n(n 1) N v (N v 1), so HT estimator is A ij n(n 1)/[N v (N v 1)] = N v (N v 1) n(n 1) N e,is (i,j) V (2) Desired unbiased estimator for the average degree is ˆµ IS = 2 ˆN e,is N v Estimators under both designs can be compared by writing them as ˆµ St = 2N e,st n and ˆµ IS = 2N e,is. N v 1 n n 1 Design 1: uses the identity µ = 2Ne Design 2: same but inflated by N v on G St Nv 1 n 1, compensates d i,is < d i Network Science Analytics Sampling and Estimation in Network Graphs 36

37 Estimation of network group size Assuming that N v is known may not be on safe grounds Human or animal groups too mobile or elusive to count accurately All Web pages or Internet routers are too massive and dispersed Often estimating N v may well be the prime objective If vertex SRS or BS feasible, could sample twice marking in between Facilitates usage of capture-recapture estimators off-the-shelf If sampling infeasible, or capture-recapture performs poorly Develop estimators of N v tailored to the graph sampling at hand Network Science Analytics Sampling and Estimation in Network Graphs 37

38 Estimating the size of a hidden population Hidden population: individuals do not wish to expose themselves Ex: humans of socially sensitive status, such as homeless Ex: involved in socially sensitive activities, e.g., drugs, prostitution Such groups are often small Estimating their size is challenging Snowball sampling used to estimate the size of hidden populations O. Frank and T. Snijders, Estimating the size of hidden populations using snowball sampling, J. Official Stats., vol. 10, pp , 1994 Network Science Analytics Sampling and Estimation in Network Graphs 38

39 Sampling a hidden population Directed graph G(V, E), V the members of the hidden population Graph describing willingness to identify other members Arc (i, j) when ask individual i, mentions j as a member Graph G obtained via one-wave snowball sampling, i.e., V = V0 V 1 Initial sample V0 obtained via BS from V with probability p 0 Consider the following random variables (RVs) of interest N = V 0 : size of the initial sample M1: number of arcs among individuals in V0 M2: number of arcs from individuals in V0 to individuals in V1 Snowball sampling yields measurements n, m 1, and m 2 of these RVs Network Science Analytics Sampling and Estimation in Network Graphs 39

40 Method of moments estimator Method of moments: equate moments to sample counterparts [ ] E [N] = E I {i V0 } = N v p 0 = n E [M 1 ] = E j E [M 2 ] = E j i I {i V0 }I {j V0 }A ij = N e p0= 2 m 1 i j I {i V0 }I {j / V0 }A ij = N e p 0 (1 p 0 )= m 2 i j Expectation w.r.t. randomness in selecting the sample V 0. Solution: ( ) m1 + m 2 ˆN v = n m 1 Size of initial sample inflated by estimate of the sampling rate Network Science Analytics Sampling and Estimation in Network Graphs 40

41 Estimation of degree distributions Network sampling and challenges Background on statistical sampling theory Network graph sampling designs Estimation of network totals and group size Estimation of degree distributions Network Science Analytics Sampling and Estimation in Network Graphs 41

42 Estimation of other network characteristics Classical sampling theory rests heavily on Horvitz-Thompson theory Scope limited to network totals Q: Other network summaries, e.g., degree distributions? Findings on the effect of sampling on observed degree distributions: Highly unrepresentative of actual degree distributions; and Unhelpful to characterizing heterogeneous distributions Ex: Internet traceroute sampling [Lakhina et al 03] Broad degree distribution in G, while concentrated in G Ex: Sampling protein-protein interaction networks [Han et al 05] Power-law exponent estimate from G underestimates α in G Network Science Analytics Sampling and Estimation in Network Graphs 42

43 Impact of sampling on degree distribution Let N(d) denote the number of vertices with degree d in G Let N (d) be the counterpart in a sampled graph G Introduce vectors n = [N(0),..., N(d max )] and likewise n Under a variety of sampling designs, it holds that E [n ] = Pn Matrix P depends fully on the sampling, not G itself Expectation w.r.t. randomness in selecting the sample G O. Frank, Estimation of the number of vertices of different degrees in a graph, J. Stat. Planning and Inference, vol. 4, pp , 1980 Network Science Analytics Sampling and Estimation in Network Graphs 43

44 An inverse problem Recall the identity E [n ] = Pn Face a linear inverse problem Unbiased estimator of the degree distribution n ˆn naive = P 1 n While natural, two problems with this simple solution Matrix P typically not invertible in practice; and Non-negativity of the solution is not guaranteed We actually have an ill-posed linear inverse problem Network Science Analytics Sampling and Estimation in Network Graphs 44

45 Performance of naive estimator Erdös-Renyi graph with N v = 100 and N e = 500 BS of vertices with p = 0.6 Induced subgraph sampling of edges Network Science Analytics Sampling and Estimation in Network Graphs 45

46 Penalized least-squares formulation Constrained, penalized, weighted least-squares [Zhang et al 14] min (Pn n ) C 1 (Pn n ) + λpen(n) n s. to N(d) 0, d = 0, 1,..., d max, d max N(d) = N v d=1 Matrix C denotes the covariance of n Functional pen(n) penalizes complexity in n, tuned by λ Constraints Non-negativity of degree counts Total degree counts equal the number of vertices Smoothness: pen(n) = Dn 2, D differentiating operator Network Science Analytics Sampling and Estimation in Network Graphs 46

47 Application to online social networks Communities from online social networks Orkut and LiveJournal BS of vertices with p = 0.3 Induced subgraph sampling of edges True, sampled, and estimated degree distribution Network Science Analytics Sampling and Estimation in Network Graphs 47

48 Glossary Enumeration and samping Population graph Sampled graph Plug-in estimator Sampling design Sample with(out) replacement Design-based methods Averages and totals Inclusion probability Simple random sampling Bernoulli sampling Unequal probability sampling Horvitz-Thompson estimator Probability proportional to size sampling Capture-recapture estimator Induced subgraph sampling Incident subgraph sampling Snowball and star sampling Traceroute sampling Hidden population Ill-posed inverse problem Penalized least squares Network Science Analytics Sampling and Estimation in Network Graphs 48

Estimating network degree distributions from sampled networks: An inverse problem

Estimating network degree distributions from sampled networks: An inverse problem Estimating network degree distributions from sampled networks: An inverse problem Eric D. Kolaczyk Dept of Mathematics and Statistics, Boston University kolaczyk@bu.edu Introduction: Networks and Degree

More information

Sampling and incomplete network data

Sampling and incomplete network data 1/58 Sampling and incomplete network data 567 Statistical analysis of social networks Peter Hoff Statistics, University of Washington 2/58 Network sampling methods It is sometimes difficult to obtain a

More information

Indirect Sampling in Case of Asymmetrical Link Structures

Indirect Sampling in Case of Asymmetrical Link Structures Indirect Sampling in Case of Asymmetrical Link Structures Torsten Harms Abstract Estimation in case of indirect sampling as introduced by Huang (1984) and Ernst (1989) and developed by Lavalle (1995) Deville

More information

Sampling. Jian Pei School of Computing Science Simon Fraser University

Sampling. Jian Pei School of Computing Science Simon Fraser University Sampling Jian Pei School of Computing Science Simon Fraser University jpei@cs.sfu.ca INTRODUCTION J. Pei: Sampling 2 What Is Sampling? Select some part of a population to observe estimate something about

More information

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy.

of being selected and varying such probability across strata under optimal allocation leads to increased accuracy. 5 Sampling with Unequal Probabilities Simple random sampling and systematic sampling are schemes where every unit in the population has the same chance of being selected We will now consider unequal probability

More information

Networks: Lectures 9 & 10 Random graphs

Networks: Lectures 9 & 10 Random graphs Networks: Lectures 9 & 10 Random graphs Heather A Harrington Mathematical Institute University of Oxford HT 2017 What you re in for Week 1: Introduction and basic concepts Week 2: Small worlds Week 3:

More information

Continuous-time Markov Chains

Continuous-time Markov Chains Continuous-time Markov Chains Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ October 23, 2017

More information

Specification and estimation of exponential random graph models for social (and other) networks

Specification and estimation of exponential random graph models for social (and other) networks Specification and estimation of exponential random graph models for social (and other) networks Tom A.B. Snijders University of Oxford March 23, 2009 c Tom A.B. Snijders (University of Oxford) Models for

More information

Gaussian, Markov and stationary processes

Gaussian, Markov and stationary processes Gaussian, Markov and stationary processes Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ November

More information

Supporting Statistical Hypothesis Testing Over Graphs

Supporting Statistical Hypothesis Testing Over Graphs Supporting Statistical Hypothesis Testing Over Graphs Jennifer Neville Departments of Computer Science and Statistics Purdue University (joint work with Tina Eliassi-Rad, Brian Gallagher, Sergey Kirshner,

More information

6.207/14.15: Networks Lecture 12: Generalized Random Graphs

6.207/14.15: Networks Lecture 12: Generalized Random Graphs 6.207/14.15: Networks Lecture 12: Generalized Random Graphs 1 Outline Small-world model Growing random networks Power-law degree distributions: Rich-Get-Richer effects Models: Uniform attachment model

More information

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance

More information

1 Complex Networks - A Brief Overview

1 Complex Networks - A Brief Overview Power-law Degree Distributions 1 Complex Networks - A Brief Overview Complex networks occur in many social, technological and scientific settings. Examples of complex networks include World Wide Web, Internet,

More information

MIXED MODELS THE GENERAL MIXED MODEL

MIXED MODELS THE GENERAL MIXED MODEL MIXED MODELS This chapter introduces best linear unbiased prediction (BLUP), a general method for predicting random effects, while Chapter 27 is concerned with the estimation of variances by restricted

More information

Tutorial on Machine Learning for Advanced Electronics

Tutorial on Machine Learning for Advanced Electronics Tutorial on Machine Learning for Advanced Electronics Maxim Raginsky March 2017 Part I (Some) Theory and Principles Machine Learning: estimation of dependencies from empirical data (V. Vapnik) enabling

More information

Cluster Sampling 2. Chapter Introduction

Cluster Sampling 2. Chapter Introduction Chapter 7 Cluster Sampling 7.1 Introduction In this chapter, we consider two-stage cluster sampling where the sample clusters are selected in the first stage and the sample elements are selected in the

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Uncovering structure in biological networks: A model-based approach

Uncovering structure in biological networks: A model-based approach Uncovering structure in biological networks: A model-based approach J-J Daudin, F. Picard, S. Robin, M. Mariadassou UMR INA-PG / ENGREF / INRA, Paris Mathématique et Informatique Appliquées Statistics

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 Figures are taken from: M.E.J. Newman, Networks: An Introduction 2

More information

Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling

Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling Probabilistic Foundations of Statistical Network Analysis Chapter 3: Network sampling Harry Crane Based on Chapter 3 of Probabilistic Foundations of Statistical Network Analysis Book website: http://wwwharrycranecom/networkshtml

More information

1 Mechanistic and generative models of network structure

1 Mechanistic and generative models of network structure 1 Mechanistic and generative models of network structure There are many models of network structure, and these largely can be divided into two classes: mechanistic models and generative or probabilistic

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Complex networks: an introduction

Complex networks: an introduction Alain Barrat Complex networks: an introduction CPT, Marseille, France ISI, Turin, Italy http://www.cpt.univ-mrs.fr/~barrat http://cxnets.googlepages.com Plan of the lecture I. INTRODUCTION II. I. Networks:

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Blind Identification of Invertible Graph Filters with Multiple Sparse Inputs 1

Blind Identification of Invertible Graph Filters with Multiple Sparse Inputs 1 Blind Identification of Invertible Graph Filters with Multiple Sparse Inputs Chang Ye Dept. of ECE and Goergen Institute for Data Science University of Rochester cye7@ur.rochester.edu http://www.ece.rochester.edu/~cye7/

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations

Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations Catalogue no. 11-522-XIE Statistics Canada International Symposium Series - Proceedings Symposium 2004: Innovative Methods for Surveying Difficult-to-reach Populations 2004 Proceedings of Statistics Canada

More information

Lecture 3: Simulating social networks

Lecture 3: Simulating social networks Lecture 3: Simulating social networks Short Course on Social Interactions and Networks, CEMFI, May 26-28th, 2014 Bryan S. Graham 28 May 2014 Consider a network with adjacency matrix D = d and corresponding

More information

Graph Detection and Estimation Theory

Graph Detection and Estimation Theory Introduction Detection Estimation Graph Detection and Estimation Theory (and algorithms, and applications) Patrick J. Wolfe Statistics and Information Sciences Laboratory (SISL) School of Engineering and

More information

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH Awanis Ku Ishak, PhD SBM Sampling The process of selecting a number of individuals for a study in such a way that the individuals represent the larger

More information

Unsupervised Learning with Permuted Data

Unsupervised Learning with Permuted Data Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Mixed-Models. version 30 October 2011

Mixed-Models. version 30 October 2011 Mixed-Models version 30 October 2011 Mixed models Mixed models estimate a vector! of fixed effects and one (or more) vectors u of random effects Both fixed and random effects models always include a vector

More information

Szemerédi s regularity lemma revisited. Lewis Memorial Lecture March 14, Terence Tao (UCLA)

Szemerédi s regularity lemma revisited. Lewis Memorial Lecture March 14, Terence Tao (UCLA) Szemerédi s regularity lemma revisited Lewis Memorial Lecture March 14, 2008 Terence Tao (UCLA) 1 Finding models of large dense graphs Suppose we are given a large dense graph G = (V, E), where V is a

More information

Adventures in random graphs: Models, structures and algorithms

Adventures in random graphs: Models, structures and algorithms BCAM January 2011 1 Adventures in random graphs: Models, structures and algorithms Armand M. Makowski ECE & ISR/HyNet University of Maryland at College Park armand@isr.umd.edu BCAM January 2011 2 Complex

More information

3.2 Configuration model

3.2 Configuration model 3.2 Configuration model 3.2.1 Definition. Basic properties Assume that the vector d = (d 1,..., d n ) is graphical, i.e., there exits a graph on n vertices such that vertex 1 has degree d 1, vertex 2 has

More information

Network models: random graphs

Network models: random graphs Network models: random graphs Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis

More information

Efficient MCMC Samplers for Network Tomography

Efficient MCMC Samplers for Network Tomography Efficient MCMC Samplers for Network Tomography Martin Hazelton 1 Institute of Fundamental Sciences Massey University 7 December 2015 1 Email: m.hazelton@massey.ac.nz AUT Mathematical Sciences Symposium

More information

arxiv: v2 [stat.ap] 9 Jun 2018

arxiv: v2 [stat.ap] 9 Jun 2018 Estimating Shortest Path Length Distributions via Random Walk Sampling Minhui Zheng and Bruce D. Spencer arxiv:1806.01757v2 [stat.ap] 9 Jun 2018 Department of Statistics, Northwestern University June 12,

More information

Network Degree Distribution Inference Under Sampling

Network Degree Distribution Inference Under Sampling Network Degree Distribution Inference Under Sampling Aleksandrina Goeva 1 Eric Kolaczyk 1 Rich Lehoucq 2 1 Department of Mathematics and Statistics, Boston University 2 Sandia National Labs August 18,

More information

PageRank: Ranking of nodes in graphs

PageRank: Ranking of nodes in graphs PageRank: Ranking of nodes in graphs Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ October

More information

A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks

A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks A Tunable Mechanism for Identifying Trusted Nodes in Large Scale Distributed Networks Joydeep Chandra 1, Ingo Scholtes 2, Niloy Ganguly 1, Frank Schweitzer 2 1 - Dept. of Computer Science and Engineering,

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Part 7: Hierarchical Modeling

Part 7: Hierarchical Modeling Part 7: Hierarchical Modeling!1 Nested data It is common for data to be nested: i.e., observations on subjects are organized by a hierarchy Such data are often called hierarchical or multilevel For example,

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

CRISP: Capture-Recapture Interactive Simulation Package

CRISP: Capture-Recapture Interactive Simulation Package CRISP: Capture-Recapture Interactive Simulation Package George Volichenko Carnegie Mellon University Pittsburgh, PA gvoliche@andrew.cmu.edu December 17, 2012 Contents 1 Executive Summary 1 2 Introduction

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Shortest Paths & Link Weight Structure in Networks

Shortest Paths & Link Weight Structure in Networks Shortest Paths & Link Weight Structure in etworks Piet Van Mieghem CAIDA WIT (May 2006) P. Van Mieghem 1 Outline Introduction The Art of Modeling Conclusions P. Van Mieghem 2 Telecommunication: e2e A ETWORK

More information

You are allowed 3? sheets of notes and a calculator.

You are allowed 3? sheets of notes and a calculator. Exam 1 is Wed Sept You are allowed 3? sheets of notes and a calculator The exam covers survey sampling umbers refer to types of problems on exam A population is the entire set of (potential) measurements

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

A = A U. U [n] P(A U ). n 1. 2 k(n k). k. k=1

A = A U. U [n] P(A U ). n 1. 2 k(n k). k. k=1 Lecture I jacques@ucsd.edu Notation: Throughout, P denotes probability and E denotes expectation. Denote (X) (r) = X(X 1)... (X r + 1) and let G n,p denote the Erdős-Rényi model of random graphs. 10 Random

More information

Nonparametric Bayesian Matrix Factorization for Assortative Networks

Nonparametric Bayesian Matrix Factorization for Assortative Networks Nonparametric Bayesian Matrix Factorization for Assortative Networks Mingyuan Zhou IROM Department, McCombs School of Business Department of Statistics and Data Sciences The University of Texas at Austin

More information

Adaptive Web Sampling

Adaptive Web Sampling Biometrics 62, 1224 1234 December 26 DOI: 1.1111/j.1541-42.26.576.x Adaptive Web Sampling Steven K. Thompson Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia

More information

Selection on Observables: Propensity Score Matching.

Selection on Observables: Propensity Score Matching. Selection on Observables: Propensity Score Matching. Department of Economics and Management Irene Brunetti ireneb@ec.unipi.it 24/10/2017 I. Brunetti Labour Economics in an European Perspective 24/10/2017

More information

Lecture 10: Dimension Reduction Techniques

Lecture 10: Dimension Reduction Techniques Lecture 10: Dimension Reduction Techniques Radu Balan Department of Mathematics, AMSC, CSCAMM and NWC University of Maryland, College Park, MD April 17, 2018 Input Data It is assumed that there is a set

More information

Reducing contextual bandits to supervised learning

Reducing contextual bandits to supervised learning Reducing contextual bandits to supervised learning Daniel Hsu Columbia University Based on joint work with A. Agarwal, S. Kale, J. Langford, L. Li, and R. Schapire 1 Learning to interact: example #1 Practicing

More information

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E

FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E FCE 3900 EDUCATIONAL RESEARCH LECTURE 8 P O P U L A T I O N A N D S A M P L I N G T E C H N I Q U E OBJECTIVE COURSE Understand the concept of population and sampling in the research. Identify the type

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

JIGSAW PERCOLATION ON ERDÖS-RÉNYI RANDOM GRAPHS

JIGSAW PERCOLATION ON ERDÖS-RÉNYI RANDOM GRAPHS JIGSAW PERCOLATION ON ERDÖS-RÉNYI RANDOM GRAPHS ERIK SLIVKEN Abstract. We extend the jigsaw percolation model to analyze graphs where both underlying people and puzzle graphs are Erdös-Rényi random graphs.

More information

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu Intro sessions to SNAP C++ and SNAP.PY: SNAP.PY: Friday 9/27, 4:5 5:30pm in Gates B03 SNAP

More information

MN 400: Research Methods. CHAPTER 7 Sample Design

MN 400: Research Methods. CHAPTER 7 Sample Design MN 400: Research Methods CHAPTER 7 Sample Design 1 Some fundamental terminology Population the entire group of objects about which information is wanted Unit, object any individual member of the population

More information

45 6 = ( )/6! = 20 6 = selected subsets: 6 = and 10

45 6 = ( )/6! = 20 6 = selected subsets: 6 = and 10 Homework 1. Selected Solutions, 9/12/03. Ch. 2, # 34, p. 74. The most natural sample space S is the set of unordered -tuples, i.e., the set of -element subsets of the list of 45 workers (labelled 1,...,

More information

Overall Plan of Simulation and Modeling I. Chapters

Overall Plan of Simulation and Modeling I. Chapters Overall Plan of Simulation and Modeling I Chapters Introduction to Simulation Discrete Simulation Analytical Modeling Modeling Paradigms Input Modeling Random Number Generation Output Analysis Continuous

More information

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee

Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee Mining Newsgroups Using Networks Arising From Social Behavior by Rakesh Agrawal et al. Presented by Will Lee wwlee1@uiuc.edu September 28, 2004 Motivation IR on newsgroups is challenging due to lack of

More information

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects

Contents. 1 Review of Residuals. 2 Detecting Outliers. 3 Influential Observations. 4 Multicollinearity and its Effects Contents 1 Review of Residuals 2 Detecting Outliers 3 Influential Observations 4 Multicollinearity and its Effects W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 32 Model Diagnostics:

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2 STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised

More information

Bayesian Hierarchical Models

Bayesian Hierarchical Models Bayesian Hierarchical Models Gavin Shaddick, Millie Green, Matthew Thomas University of Bath 6 th - 9 th December 2016 1/ 34 APPLICATIONS OF BAYESIAN HIERARCHICAL MODELS 2/ 34 OUTLINE Spatial epidemiology

More information

Statistical analysis of biological networks.

Statistical analysis of biological networks. Statistical analysis of biological networks. Assessing the exceptionality of network motifs S. Schbath Jouy-en-Josas/Evry/Paris, France http://genome.jouy.inra.fr/ssb/ Colloquium interactions math/info,

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Multivariate Distributions

Multivariate Distributions IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing 1 in particular on multivariate

More information

arxiv: v4 [stat.me] 28 May 2015

arxiv: v4 [stat.me] 28 May 2015 The Annals of Applied Statistics 2015, Vol. 9, No. 1, 166 199 DOI: 10.1214/14-AOAS800 c Institute of Mathematical Statistics, 2015 arxiv:1305.4977v4 [stat.me] 28 May 2015 ESTIMATING NETWORK DEGREE DISTRIBUTIONS

More information

Non-uniform coverage estimators for distance sampling

Non-uniform coverage estimators for distance sampling Abstract Non-uniform coverage estimators for distance sampling CREEM Technical report 2007-01 Eric Rexstad Centre for Research into Ecological and Environmental Modelling Research Unit for Wildlife Population

More information

Basic Sampling Methods

Basic Sampling Methods Basic Sampling Methods Sargur Srihari srihari@cedar.buffalo.edu 1 1. Motivation Topics Intractability in ML How sampling can help 2. Ancestral Sampling Using BNs 3. Transforming a Uniform Distribution

More information

Adaptive Web Sampling

Adaptive Web Sampling Adaptive Web Sampling To appear in Biometrics, 26 Steven K. Thompson Department of Statistics and Actuarial Science Simon Fraser University, Burnaby, BC V5A 1S6 CANADA email: thompson@stat.sfu.ca Abstract

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Unequal Probability Designs

Unequal Probability Designs Unequal Probability Designs Department of Statistics University of British Columbia This is prepares for Stat 344, 2014 Section 7.11 and 7.12 Probability Sampling Designs: A quick review A probability

More information

Double-Sampling Designs for Dropouts

Double-Sampling Designs for Dropouts Double-Sampling Designs for Dropouts Dave Glidden Division of Biostatistics, UCSF 26 February 2010 http://www.epibiostat.ucsf.edu/dave/talks.html Outline Mbarra s Dropouts Future Directions Inverse Prob.

More information

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan

Link Prediction. Eman Badr Mohammed Saquib Akmal Khan Link Prediction Eman Badr Mohammed Saquib Akmal Khan 11-06-2013 Link Prediction Which pair of nodes should be connected? Applications Facebook friend suggestion Recommendation systems Monitoring and controlling

More information

Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss;

Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss; BFF4, May 2, 2017 Empirical Bayes Quantile-Prediction aka E-B Prediction under Check-loss; Lawrence D. Brown Wharton School, Univ. of Pennsylvania Joint work with Gourab Mukherjee and Paat Rusmevichientong

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Small Area Confidence Bounds on Small Cell Proportions in Survey Populations

Small Area Confidence Bounds on Small Cell Proportions in Survey Populations Small Area Confidence Bounds on Small Cell Proportions in Survey Populations Aaron Gilary, Jerry Maples, U.S. Census Bureau U.S. Census Bureau Eric V. Slud, U.S. Census Bureau Univ. Maryland College Park

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore

Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore AN EMPIRICAL LIKELIHOOD BASED ESTIMATOR FOR RESPONDENT DRIVEN SAMPLED DATA Sanjay Chaudhuri Department of Statistics and Applied Probability, National University of Singapore Mark Handcock, Department

More information

Probability and Information Theory. Sargur N. Srihari

Probability and Information Theory. Sargur N. Srihari Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal

More information

Survey Sample Methods

Survey Sample Methods Survey Sample Methods p. 1/54 Survey Sample Methods Evaluators Toolbox Refreshment Abhik Roy & Kristin Hobson abhik.r.roy@wmich.edu & kristin.a.hobson@wmich.edu Western Michigan University AEA Evaluation

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Chapter 3. Erdős Rényi random graphs

Chapter 3. Erdős Rényi random graphs Chapter Erdős Rényi random graphs 2 .1 Definitions Fix n, considerthe set V = {1,2,...,n} =: [n], andput N := ( n 2 be the numberofedgesonthe full graph K n, the edges are {e 1,e 2,...,e N }. Fix also

More information

Optimal Blocking by Minimizing the Maximum Within-Block Distance

Optimal Blocking by Minimizing the Maximum Within-Block Distance Optimal Blocking by Minimizing the Maximum Within-Block Distance Michael J. Higgins Jasjeet Sekhon Princeton University University of California at Berkeley November 14, 2013 For the Kansas State University

More information

TGDR: An Introduction

TGDR: An Introduction TGDR: An Introduction Julian Wolfson Student Seminar March 28, 2007 1 Variable Selection 2 Penalization, Solution Paths and TGDR 3 Applying TGDR 4 Extensions 5 Final Thoughts Some motivating examples We

More information

Consistency Under Sampling of Exponential Random Graph Models

Consistency Under Sampling of Exponential Random Graph Models Consistency Under Sampling of Exponential Random Graph Models Cosma Shalizi and Alessandro Rinaldo Summary by: Elly Kaizar Remember ERGMs (Exponential Random Graph Models) Exponential family models Sufficient

More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten

More information

High dimensional Ising model selection

High dimensional Ising model selection High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse

More information

What s New in Econometrics. Lecture 1

What s New in Econometrics. Lecture 1 What s New in Econometrics Lecture 1 Estimation of Average Treatment Effects Under Unconfoundedness Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Potential Outcomes 3. Estimands and

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information