arxiv: v2 [stat.me] 28 Aug 2016


Central limit theorems for network driven sampling

Xiao Li, School of Mathematical Sciences, Peking University
Karl Rohe, Department of Statistics, University of Wisconsin-Madison

Abstract
Respondent-Driven Sampling is a popular technique for sampling hidden populations. This paper models Respondent-Driven Sampling as a Markov process indexed by a tree. Our main results show that the Volz-Heckathorn estimator is asymptotically normal below a critical threshold. The key technical difficulties stem from (i) the dependence between samples and (ii) the tree structure which characterizes the dependence. The theorems allow the growth rate of the tree to exceed one and suggest that this growth rate should not be too large. To illustrate the usefulness of these results beyond their obvious use, an example shows that in certain cases the sample average is preferable to inverse probability weighting. We provide a test statistic to distinguish between these two cases.

1 Introduction
Classical sampling requires a sampling frame, a list of individuals in the target population with a method to contact each individual (e.g. a phone number). For many populations, constructing a sampling frame is infeasible. Network driven sampling enables researchers to access populations of people, webpages, and proteins that are otherwise difficult to reach. These techniques go by many names: web crawling, Respondent-Driven Sampling, breadth-first search, snowball sampling, co-immunoprecipitation, and chromatin immunoprecipitation. In each application, the only way to reach the population of interest is by asking participants to refer friends.

Respondent-Driven Sampling (RDS) serves as a motivating example for this paper. The Centers for Disease Control, the World Health Organization, and the Joint United Nations Programme on HIV/AIDS have invested in RDS to reach marginalized and hard-to-reach populations [Heckathorn, 1997, WHO, 2013]. Each individual $i$ in the population has a corresponding feature $y_i$ (e.g. $y_i \in \{0, 1\}$ and $y_i = 1$ if $i$ is HIV+). Using only the sampled individuals, we wish to make inferences about the average value of $y_i$ across the entire population, denoted as $\mu$ (e.g. the proportion of the population that is HIV+). Extensive previous statistical research has proposed various estimators of $\mu$ which are approximately unbiased based upon various types of models for an RDS sample [Salganik and Heckathorn, 2004, Volz and Heckathorn, 2008, Gile, 2011]. We note that in the papers cited above (except [Gile, 2011]), RDS is assumed to sample with replacement. Previous research has also explored the variance of these estimators [Goel and Salganik, 2009, Rohe, 2015]. This paper studies the asymptotic distribution of statistics related to these estimators.

Results on asymptotic distributions for RDS are useful for two obvious reasons. First, they allow us to construct asymptotic confidence intervals for $\mu$. Second, they provide essential tools to test various statistical hypotheses. The only central limit theorem considered in the RDS literature studied the case when the tree indexed process reduces to a Markov chain [Goel and Salganik, 2009]; this presumes that each individual refers exactly one person. Previous research suggests that the number of referrals from each individual is fundamental in determining the variance of common estimators [Rohe, 2015]. This paper establishes two central limit theorems in settings which allow for multiple referrals. The main results apply to both the sample average and the Volz-Heckathorn estimator, which is an approximation of the inverse probability weighted estimator (cf. Remark 1).

Because the inverse probability weighted (IPW) estimator and its extensions are asymptotically unbiased, these estimators are often preferred to the sample average. However, sometimes survey weights are not needed and they only introduce additional variance to the estimator [Bollen et al., 2016]. This issue is particularly salient when sampling weights are highly heterogeneous, as is often the case in RDS. Proposition 3 shows that if the outcomes $y_i$ are uncorrelated with the sampling weights, then the sample average is unbiased. Theorem 3 extends this result to RDS to show that the IPW estimator can have a larger variance than the sample average.
Taken together, these results imply that the sample average can have a lower mean squared error (MSE) than the IPW estimator. Section 4 introduces an estimator of the bias of the sample average. The main results provide a path to test the null hypothesis that the bias is zero. This can be used to select between the sample average and the IPW estimator. Section 6.2 studies this routine with the AddHealth social network.

2 Notation
Following [Goel and Salganik, 2009] and [Rohe, 2015], the results below model the network sampling mechanism as a tree indexed Markov process on a graph. There are many assumptions in this model which are incorrect in practice. However, like the i.i.d. assumption, it allows for tractable calculations. In the simulations, we show that the theory derived from this model provides a good approximation for a more realistic sampling model. [Lu et al., 2012] studies the sensitivity of the estimators to this model.

Let $G = (V, E)$ be a finite, undirected, and simple graph with vertex set $V = \{1, \dots, N\}$ and edge set $E$. $V$ contains the individuals in the population and $E$ describes how they are related to one another. As discussed in the introduction, $y: V \to \mathbb{R}$ is a fixed real-valued function on the state space $V$; these are the node features that are measured on the sampled nodes. The target of RDS is to estimate $\mu = \frac{1}{N}\sum_{i=1}^N y(i)$.

If each sampled node referred exactly one friend, then the Markov sampling procedure would be a Markov chain. Several classical central limit theorems exist for this model; see [Jones et al., 2004] for a review. The results herein allow for each sampled node to refer more than one node. This is a Markov process indexed not by a chain, but rather by a tree. Denote the referral tree as $T$. Whereas the node set of $G$ indexes the population, the node set of $T$ indexes the samples. That is, we observe a subset of the individuals in $G$ with the sample $\{X_\tau\}_{\tau \in T} \subset V$. An edge $(\sigma, \tau)$ in the referral tree denotes that sampled individual $X_\sigma$ referred individual $X_\tau$ into the sample. Mathematically, $T$ is a rooted tree: a connected graph with $n$ nodes, no cycles, and a vertex $0$ which indexes the seed node. To simplify notation, $\sigma \in T$ is used synonymously with $\sigma$ belonging to the vertex set of $T$. For each non-root node $\tau \in T$, denote $p(\tau) \in T$ as the parent of $\tau$ (i.e. the node one step closer to the root).

This paper presumes that $\{X_\tau\}_{\tau \in T}$ is a tree-indexed random walk on $G$, a model introduced by [Benjamini and Peres, 1994]. This model generalizes a Markov chain on $G$; each transition $X_{p(\tau)} \to X_\tau$ is an independent and identically distributed Markov transition with transition matrix $P$. Following [Benjamini and Peres, 1994], we will call this process a $(T, P)$-walk on $G$. Unless stated otherwise, it will be presumed throughout that the root node of the random walk $X_0$ is initialized from the equilibrium distribution of $P$. It follows that $X_\sigma$ has distribution $\pi$ for all $\sigma \in T$.

Unless stated otherwise, the results in this paper allow for the transition matrix $P$ to be constructed from a weighted graph $G$. Let $w_{ij}$ be the weight of the edge $(i, j) \in E$; if $(i, j) \notin E$, define $w_{ij} = 0$. If the graph is unweighted, then let $w_{ij} = 1$ for all $(i, j) \in E$. Define the degree of node $i$ as $\deg(i) = \sum_j w_{ij}$. If the graph is unweighted, then $\deg(i)$ is the number of connections to node $i$. Throughout this paper, the graph is undirected.
So, $w_{ij} = w_{ji}$ for all pairs $i, j$. Given $\{X_{p(\tau)} = i\}$, the probability of $\{X_\tau = j\}$ is proportional to $w_{ij}$:
$$P(X_\tau = j \mid X_{p(\tau)} = i) = \frac{w_{ij}}{\deg(i)}.$$
We use the term simple random walk for the Markov chain constructed on the unweighted graph (i.e. $w_{ij} \in \{0, 1\}$ for all $i, j$). The simple random walk presumes that each participant selects a friend uniformly and independently at random from their list of friends. In order to estimate $\mu$, we observe $y(X_\tau)$ for all $\tau \in T$. Because $G$ is undirected, $P$ is reversible and has stationary distribution $\pi$ with $\pi_i \propto \deg(i)$ for all $i \in V$. This fact is helpful for creating an asymptotically unbiased estimator for $\mu$, particularly under the simple random walk assumption [Volz and Heckathorn, 2008].

Remark 1. In general, the quantity of interest $\mu = \frac{1}{N}\sum_{i=1}^N y(i)$ is not equal to $E_\pi(y)$. As such, the sample average of the $y(X_\tau)$'s is a biased estimator for $\mu$. With inverse probability weighting, define a new function $y^*(i) = y(i)/(N\pi_i)$ and the respective estimator
$$\hat\mu_{IPW} = \frac{1}{n}\sum_{\sigma \in T} y^*(X_\sigma) = \frac{1}{n}\sum_{\sigma \in T} \frac{y(X_\sigma)}{N\pi_{X_\sigma}},$$
where $n = |T|$ is the sample size. Then $E_\pi(\hat\mu_{IPW}) = E_\pi(y^*) = \mu$. As such, the sample average of the $y^*(X_\tau)$'s is an unbiased estimator of $\mu$. Unfortunately, the values $\pi_i$ are unknown. In practice, RDS participants are asked various questions to measure how many friends they have in $G$. Under the simple random walk assumption, $\pi_i$ is proportional to the number of friends of $i$. This result also requires that the edges in $G$ are undirected, something that will be presumed throughout the paper. Therefore the Volz-Heckathorn estimator
$$\hat\mu_{VH} = \frac{\sum_{\sigma \in T} y(X_\sigma)/\deg(X_\sigma)}{\sum_{\tau \in T} 1/\deg(X_\tau)}$$
is in essence a Hájek estimator based upon $\deg(i)$ [Volz and Heckathorn, 2008]. Under the simple random walk assumption, this estimator provides an asymptotically unbiased estimator of $\mu$.

For each node $\tau \in T$, let $|\tau|$ be the distance of the node from the root; this is also called the wave of $\tau$. For every pair of nodes $\sigma, \tau \in T$, define $d(\sigma, \tau)$ to be the distance between $\sigma$ and $\tau$ on $T$ (as a graph). For each non-leaf node $\sigma \in T$, let $\eta(\sigma)$ be the number of offspring of $\sigma$. A tree is said to be an $m$-tree of height $h$ if $\eta(\sigma) = m$ for all $\sigma \in T$ with $|\sigma| < h$ and $\eta(\sigma) = 0$ for all $|\sigma| = h$. Here, both $m$ and $h$ are natural numbers (i.e., $m, h \in \mathbb{N}$). $T$ is said to be Galton-Watson if the $\eta(\sigma)$ are i.i.d. random variables in $\mathbb{N}$. The theorems below only study $m$-trees. However, the computational experiments in Section 6.1 suggest that the conclusions of the analytical results are highly robust to replacing the $m$-tree with a Galton-Watson tree.

[Levin et al., 2009] serves as this paper's key reference for Markov processes. Following the notation in that text, define $E_\pi(y) = \sum_{i=1}^N \pi_i y(i)$ and $\mathrm{var}_\pi(y) = E_\pi(y - E_\pi(y))^2$ for the function $y$.

There are two primary concerns about the model and estimator used in the main results below. First, the Markov model allows for resampling. Second, the results below only apply to $m$-trees, not more general trees. The simulations in Section 6.1 suggest that the analytic results continue to hold under a more realistic setting that addresses both of these concerns.
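The sampling model and the three estimators of Remark 1 can be simulated directly. The following is a minimal Python/NumPy sketch under several assumptions: a small weighted adjacency matrix `W` held in memory, an $m$-tree (not Galton-Watson) referral tree, and $\pi$ known, so that the IPW estimator can be evaluated. The function names `transition_matrix`, `tree_walk` and `estimators` are illustrative and do not come from the paper.

```python
import numpy as np


def transition_matrix(W):
    """Return P with P[i, j] = w_ij / deg(i), plus the degree vector."""
    deg = W.sum(axis=1)
    return W / deg[:, None], deg


def tree_walk(P, pi, m, height, rng):
    """Sample a (T, P)-walk on an m-tree of the given height.

    The root state is drawn from pi. Every other node's state is one
    P-transition from its parent's state. Returns the sampled states in
    wave order and each node's parent index (-1 for the root).
    """
    n_states = P.shape[0]
    states = [rng.choice(n_states, p=pi)]
    parents = [-1]
    frontier = [0]
    for _ in range(height):
        next_frontier = []
        for parent in frontier:
            for _ in range(m):  # each node refers exactly m others
                states.append(rng.choice(n_states, p=P[states[parent]]))
                parents.append(parent)
                next_frontier.append(len(states) - 1)
        frontier = next_frontier
    return np.array(states), np.array(parents)


def estimators(y, deg, sample):
    """Sample average, IPW estimator (pi known) and Volz-Heckathorn estimator."""
    N = len(y)
    pi = deg / deg.sum()  # stationary distribution of P
    ys, ds = y[sample], deg[sample]
    mu_hat = ys.mean()
    mu_ipw = np.mean(ys / (N * pi[sample]))
    mu_vh = np.sum(ys / ds) / np.sum(1.0 / ds)
    return mu_hat, mu_ipw, mu_vh


if __name__ == "__main__":
    W = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
    P, deg = transition_matrix(W)
    states, parents = tree_walk(P, deg / deg.sum(), m=2, height=3,
                                rng=np.random.default_rng(0))
    print(estimators(np.array([1.0, 0.0, 0.0, 1.0]), deg, states))
```

The IPW estimator uses $\pi_i = \deg(i)/\sum_j \deg(j)$, which is known only in simulation. The Volz-Heckathorn estimator needs only the sampled degrees.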
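3 Main Results
Let $\lambda_2$ denote the second largest eigenvalue of $P$. The threshold $m < 1/\lambda_2^2$ was previously identified in [Rohe, 2015] as a critical threshold for the design effect of network driven sampling. Beyond this threshold, the variance of the standard estimator does not decay at the standard rate. In other words,
$$\mathrm{var}\Big(\frac{1}{\sqrt{|\{\sigma \in T : |\sigma| \le h\}|}} \sum_{\sigma \in T : |\sigma| \le h} y(X_\sigma)\Big) \to \infty \quad \text{as } h \to \infty.$$
As such, using the traditional scaling, no central limit theorem holds above the critical threshold. Because of this, the theorems focus on the case $m < 1/\lambda_2^2$. When $m > 1/\lambda_2^2$, the simulations in Section 6.1 suggest that the central limit theorem does not hold for any scaling.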
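Checking whether a design lies below the critical threshold requires only $\lambda_2$. Below is a minimal NumPy sketch. It assumes $P$ is small enough to store densely and is reversible with respect to a known $\pi$. The function names are illustrative.

```python
import numpy as np


def eigenvalues_reversible(P, pi):
    """Eigenvalues of a transition matrix P that is reversible w.r.t. pi,
    sorted in decreasing order.

    Reversibility makes S = D^{1/2} P D^{-1/2} (D = diag(pi)) symmetric and
    similar to P, so a symmetric eigensolver can be used.
    """
    r = np.sqrt(pi)
    S = r[:, None] * P / r[None, :]
    S = (S + S.T) / 2  # remove round-off asymmetry
    return np.sort(np.linalg.eigvalsh(S))[::-1]


def below_critical_threshold(m, P, pi):
    """True when m < 1 / lambda_2^2, i.e. m * lambda_2^2 < 1."""
    lam2 = eigenvalues_reversible(P, pi)[1]
    return m * lam2**2 < 1
```

Consider a symmetric two-state chain with diagonal entries $p$, so that $\lambda_2 = 2p - 1$. With $m = 2$, the check passes for $\lambda_2 = 0.5$ and $0.6$ and fails for $0.8$ and $0.9$. The last entry of the returned array is $\lambda_{min}$.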

Theorem 1 is a central limit theorem for an estimator constructed from the tree-indexed Markov chain. The theorem holds for any function $y$, any reversible transition matrix with second largest eigenvalue $\lambda_2$, and any $m < 1/\lambda_2^2$.

Theorem 1. Suppose that $P$ is a reversible transition matrix with respect to the equilibrium distribution $\pi$, and that the eigenvalues of $P$ are $1 = \lambda_1 > \lambda_2 \ge \dots \ge \lambda_N$. Without loss of generality, suppose that $E_\pi(y) = 0$. Define
$$Y_i = \frac{1}{m^{i/2}} \sum_{\tau \in T : |\tau| = i} y(X_\tau).$$
If $T$ is an $m$-tree with $m < 1/\lambda_2^2$, then
$$\frac{1}{\sqrt{h}} \sum_{i=1}^{h} Y_i \to N(0, \sigma_0^2)$$
in distribution, where
$$\sigma_0^2 = \mathrm{var}_\pi\big((\sqrt{m}P - I)^{-1} y\big) - \mathrm{var}_\pi\big(P(\sqrt{m}P - I)^{-1} y\big).$$

The sequence of random variables considered in Theorem 1 is not exactly a sample average, but a reweighted form of the sample average. Samples in the same wave are equally weighted, while samples from different waves are not. The following theorem provides a theoretical guarantee on the distribution of the sample average for a specific class of transition matrices and node features.

Theorem 2. Let $T$ be a 2-tree. Without loss of generality, suppose that $E_\pi(y) = 0$. Define $\hat\mu_h = 2^{-h/2} \sum_{\sigma \in T, |\sigma| \le h} y(X_\sigma)$. Suppose that
(c1) $E(\hat\mu_h^{2k+1}) = 0$ for all $h, k \in \mathbb{N}$;
(c2) for any function $f$ on $V$ satisfying $E_\pi f = 0$, $\|Pf\|_\infty \le \lambda_2 \|f\|_\infty$;
(c3) $\lambda_2 < 1/\sqrt{2}$;
then $\hat\mu_h \to N(0, \sigma_0^2)$ in distribution for some $\sigma_0^2$.

Remark 2. Condition (c1) is a technical condition on the symmetry of $\hat\mu_h$ that is necessary in the proof. The following proposition provides a sufficient condition for (c1).

Proposition 1. Suppose that $y$ is symmetric, i.e. for any $i \in V$ there exists $j$ such that $y(j) = -y(i)$. If $p(u, v) = P(y(X_\sigma) = v \mid y(X_{p(\sigma)}) = u)$ is well-defined and $p(u, v) = p(-u, -v)$ for all $u, v \in y(V)$, then condition (c1) is satisfied.

Proof. Under the conditions of the proposition, the distribution of $\hat\mu_h$ is symmetric with respect to 0. Thus $E(\hat\mu_h^{2k+1}) = 0$ for all $h, k \in \mathbb{N}$.

Conditions (c2)-(c3) can be replaced by the following condition (c'):
(c') There exists $c < 1/\sqrt{2}$ such that for any function $f$ on $V$ satisfying $E_\pi f = 0$, $\|Pf\|_\infty \le c \|f\|_\infty$.
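The limiting variance $\sigma_0^2$ in Theorem 1 is an explicit function of $P$, $\pi$, $y$ and $m$, so one linear solve evaluates it for a small state space. This sketch assumes $P$ is stored as a dense NumPy array. The name `sigma0_squared` is a hypothetical helper.

Two sanity checks are available. When every row of $P$ equals $\pi$ (independent sampling), the formula reduces to $\mathrm{var}_\pi(y)$. For the symmetric two-state chain with $y = (1, -1)$, it equals $(1 - \lambda_2^2)/(1 - \sqrt{m}\lambda_2)^2$, which grows without bound as $\sqrt{m}\lambda_2$ approaches 1.

```python
import numpy as np


def sigma0_squared(P, pi, y, m):
    """Limiting variance of Theorem 1,
        var_pi((sqrt(m) P - I)^{-1} y) - var_pi(P (sqrt(m) P - I)^{-1} y),
    after centering y so that E_pi(y) = 0 (the theorem's WLOG step).
    Requires sqrt(m) * lambda != 1 for every eigenvalue lambda of P.
    """
    y = y - pi @ y
    A = np.sqrt(m) * P - np.eye(len(pi))
    g = np.linalg.solve(A, y)  # g = (sqrt(m) P - I)^{-1} y

    def var_pi(f):
        return pi @ f**2 - (pi @ f) ** 2

    return var_pi(g) - var_pi(P @ g)
```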

Condition (c') is weaker than (c2) and (c3) combined, but is stronger than (c3) alone. To see this, let $f$ be the eigenfunction of the second eigenvalue; it follows that $\lambda_2 \le c < 1/\sqrt{2}$. One necessary condition for (c') is that $\sum_j |P_{ij} - \pi_j| < \sqrt{2}$ for all $i \in V$. In other words, all the rows of $P$ must be close to $\pi$. As previously discussed, condition (c3) is actually a necessary condition for the central limit theorem [Rohe, 2015], in the sense that the variance of $\hat\mu_h$ tends to infinity if $\lambda_2 \ge 1/\sqrt{2}$. For clarity in the exposition of the theorem and the proof, we have only proved the theorem for the 2-tree. We believe that similar results are likely to hold for more general $m$-trees.

3.1 Extension to the Volz-Heckathorn estimator
When $P$ is restricted to be the transition matrix of the simple random walk on $G$, the following corollary shows that Theorem 2 can be extended to the Volz-Heckathorn estimator [Volz and Heckathorn, 2008]. Denote $\bar d = \frac{1}{N}\sum_{i \in V} \deg(i)$ as the average node degree. Following Remark 1, the IPW estimator contains $1/(N\pi_i)$, which is equal to $\bar d/\deg(i)$. The Volz-Heckathorn estimator first estimates $\bar d$ with the harmonic mean of the observed degrees. Because this harmonic mean converges to $\bar d$ in probability, the following corollary applies Slutsky's Theorem to give a central limit theorem for the Volz-Heckathorn estimator.

Corollary 1. Let $T$ be a 2-tree. Suppose in particular that $P$ is the transition matrix of the simple random walk on $G$. Define a new node feature $y'(i) = y(i)/\deg(i)$. Without loss of generality, suppose that $E_\pi y' = 0$ (this is not equivalent to $E_\pi y = 0$). Define
$$\hat\mu_{h,VH} = \hat\mu'_h \hat d = \hat d \, 2^{-h/2} \sum_{\sigma \in T, |\sigma| \le h} y'(X_\sigma), \qquad \text{where} \qquad \hat d = \frac{2^{h+1} - 1}{\sum_{\sigma \in T, |\sigma| \le h} 1/\deg(X_\sigma)}.$$
If the new node feature $y'$ and the transition matrix $P$ satisfy conditions (c1)-(c3) in Theorem 2, then $\hat\mu_{h,VH} \to N(0, \sigma_{0,VH}^2)$ in distribution for some $\sigma_{0,VH}^2$.

3.2 Illustrating the conditions with a blockmodel
Consider $G$ as coming from a blockmodel with two blocks [Lorrain and White, 1971]. In this blockmodel, each node $i$ is given a label $z(i) \in \{1, 2\}$ and every edge weight $w_{i,j} = B_{z(i),z(j)}$ for some symmetric matrix $B$.
Suppose that $y_i = y_j$ if $z(i) = z(j)$. This model was previously studied in [Goel and Salganik, 2009] and it serves as an approximation to the Stochastic Blockmodel. Given the structural equivalence of nodes within the same block, it is sufficient to study the transition matrix between blocks, $\mathcal{P} \in \mathbb{R}^{2 \times 2}$. Suppose $\mathcal{P}$ is a symmetric matrix with entries $\mathcal{P}_{11} = \mathcal{P}_{22} = p$ and $\mathcal{P}_{12} = \mathcal{P}_{21} = 1 - p$ for some value $p$. Then condition (c1) is easily verified. Moreover, $\lambda_2 = 2p - 1$. So if $1/2 < p < (2 + \sqrt{2})/4$, conditions (c2) and (c3) are also satisfied. Our theorem asserts that the Volz-Heckathorn estimator converges to the normal distribution in this model. More generally, suppose that the nodes in the blockmodel for $G$ are equally balanced between $2K$ blocks with node features $\{y_1, -y_1, \dots, y_K, -y_K\}$ and that the transition matrix is
$$p(u, v) = p\, I(u = v) + \frac{1 - p}{2K - 1} I(u \ne v).$$
We can verify that all the conditions are satisfied as long as
$$\frac{1}{2K} < p < \frac{1}{2K} + \frac{2K - 1}{2\sqrt{2}K}.$$

4 Comparing the variance of inverse probability weighting to the bias of the sample average
An estimator with a small mean square error (i.e. $E(\hat\mu - \mu)^2$) has a small bias and a small variance. It is generally known that inverse probability weighting provides an unbiased estimator of $\mu$. However, survey weights can also drastically inflate the variance of the estimator. This matter has been heavily studied by survey statisticians, and a substantial literature has been devoted to the methodologies and issues regarding the use of sampling weights; see [Pfeffermann, 1996], [Biemer and Christ, 2007], [Valliant et al., 2013], and [Bollen et al., 2016] for a review.

To determine whether one should use sampling weights in RDS, this section gives a test statistic for the null hypothesis that the sample average is unbiased. The results in the previous section suggest a confidence region for this test statistic. Denote $n = |T|$ as the number of samples. The next results compare $\hat\mu_{IPW}$ to the sample average $\hat\mu = \frac{1}{n}\sum_{\tau \in T} y(X_\tau)$. Proposition 2 and Theorem 3 highlight the dangers of inverse probability weighting by showing that it can increase the variance. Proposition 2 studies the simplified case where the samples are i.i.d. from the stationary distribution.
Following the proposition, Theorem 3 studies the more relevant setting of the $(T, P)$-walk on $G$. To simplify the statements of Proposition 2 and Theorem 3 and their proofs, the node features $y(i)$ are presumed to be random variables. This condition could be removed with further technical conditions on the moments of $y: V \to \mathbb{R}$ and its relationship to $\pi$.

Proposition 2. Suppose that $X_1, \dots, X_n$ are sampled independently from the stationary distribution $\pi$ and that $y(1), \dots, y(N)$ are $N$ uncorrelated and identically distributed random variables with finite second moment $\mu_2$. Let $C = \max_{i \le N} N\pi_i$ and $\mathrm{var}(\pi) = \frac{1}{N}\sum_{i=1}^N (\pi_i - \frac{1}{N})^2$. Then
$$\mathrm{var}(\hat\mu_{IPW}) - \mathrm{var}(\hat\mu) \ge \mu_2 N \Big(\frac{N}{nC} - 1\Big) \mathrm{var}(\pi). \qquad (1)$$

Thus, as long as $N > Cn$, which can be easily satisfied in practice, $\mathrm{var}(\hat\mu_{IPW}) > \mathrm{var}(\hat\mu)$.

Theorem 3. Suppose that $\{X_\tau : \tau \in T\}$ is a sample from the $(T, P)$-walk on $G$, and that $y(1), \dots, y(N)$ are $N$ uncorrelated and identically distributed random variables with finite second moment $\mu_2$. Assume that there exist constants $C_1$, $C_2$ and $C_3$ (not the same constants as in Proposition 2) such that $C_1 N \le \deg(i) \le C_2 N$ for all $i$ and $N^2 \mathrm{var}(\pi) > C_3$. Then
$$\mathrm{var}(\hat\mu_{IPW}) - \mathrm{var}(\hat\mu) \ge \mu_2 \Big(\frac{N^2 C_1}{n C_2}\mathrm{var}(\pi) - \frac{C_2}{N C_1}\Big), \qquad (2)$$
and there exists $C$ independent of $n$ such that $\mathrm{var}(\hat\mu_{IPW}) > \mathrm{var}(\hat\mu)$ as long as $N > Cn$.

Proposition 2 shows that as $\mathrm{var}(\pi)$ increases, the difference between the variance of $\hat\mu_{IPW}$ and $\hat\mu$ becomes larger. Similarly, Theorem 3 shows that this difference increases as $\mathrm{var}(\pi)$ increases, provided the relative upper and lower bounds on the degrees (i.e. $C_1$ and $C_2$) remain fixed. Recall that when $P$ is the simple random walk, the probabilities $\pi_i$ are proportional to node degree. An extensive literature (e.g. [Strogatz, 2001, Clauset et al., 2009]) has found that empirical networks have highly heterogeneous node degrees. As such, Equation (1) shows that the variance of $\hat\mu_{IPW}$ can be dramatically greater than the variance of $\hat\mu$. Moreover, both Proposition 2 and Theorem 3 suggest that the variance difference $\mathrm{var}(\hat\mu_{IPW}) - \mathrm{var}(\hat\mu)$ can be considerable if we sample only a small proportion of the whole population. This problem is particularly salient when the population is large.

The bias-variance tradeoff presents a dilemma between inverse probability weighting and the sample average. This bias can be estimated. For every $i \in V$, define a new node feature
$$y'(i) = y(i)\Big(1 - \frac{1}{N\pi_i}\Big) = y(i)\Big(1 - \frac{\bar d}{\deg(i)}\Big), \qquad (3)$$
where the second equality holds under the simple random walk, for which $N\pi_i = \deg(i)/\bar d$.

Proposition 3. The mean of the new node feature satisfies $E_\pi(y') = E_\pi(y) - \mu$, which is the true bias of the sample average. Therefore, under the null hypothesis
$$H_0: E_\pi(y') = 0,$$
the sample average is an unbiased estimator. If $\pi$ and $y$ are uncorrelated (i.e. $\sum_{i \in V} \pi_i y(i) = \frac{1}{N}\sum_{i \in V} y(i)$), then $H_0$ is satisfied and the sample average is unbiased.
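Proposition 3 is easy to check numerically: averaging $y'$ under $\pi_i \propto \deg(i)$ recovers $E_\pi(y) - \mu$. The following small sketch assumes the simple random walk, so that $\pi_i = \deg(i)/\sum_j \deg(j)$. It also assumes the full degree vector is known, which is only possible in simulation. The helper names are illustrative.

```python
import numpy as np


def bias_feature(y, deg):
    """Equation (3) under the simple random walk:
    y'(i) = y(i) * (1 - dbar / deg(i)), where dbar is the mean degree."""
    return y * (1.0 - deg.mean() / deg)


def true_bias(y, deg):
    """E_pi(y') with pi_i = deg(i) / sum(deg); equals E_pi(y) - mu."""
    pi = deg / deg.sum()
    return pi @ bias_feature(y, deg)
```

A concrete example: take degrees $(1, 2, 3, 4)$ and $y = (1, 0, 1, 1)$. Then $E_\pi(y) = 0.8$ and $\mu = 0.75$, so `true_bias` returns $0.05$. A constant $y$ is uncorrelated with $\pi$ and gives $0$.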
Under the conditions of Proposition 2, $\pi$ and $y$ satisfy the condition of Proposition 3 in expectation (i.e. they are uncorrelated in expectation). Both of these conditions imply that the outcome is unrelated to the sampling weight (in some way). Under such conditions, both estimators are unbiased. If it is also true that $\mathrm{var}(\pi)$ is large, then $\hat\mu$ has a smaller variance. In this scenario, $\hat\mu$ is preferable to $\hat\mu_{IPW}$ in terms of MSE.

However, if the sample average is biased, then we must compare the bias and variance of the two estimators. Asymptotically, the variance of both estimators will vanish, while the bias stays constant. So, for sufficiently large sample size, one should use $\hat\mu_{IPW}$. For smaller sample sizes, the bias of the sample average could be small (relative to the difference in the variances). In such settings, there will be a crossover point: a sample size at which $\hat\mu_{IPW}$ becomes preferable to $\hat\mu$. To distinguish between these two cases, we want to test the null hypothesis that the sample average is unbiased, i.e. $E_\pi(y') = 0$. Or, more generally, we want to provide a confidence region for the bias of the sample average.

If $\bar d$ is unknown, as is generally the case, we can estimate $\bar d$ by the harmonic mean [Salganik and Heckathorn, 2004]
$$\hat d = \frac{n}{\sum_{\tau \in T} 1/\deg(X_\tau)}.$$
Substituting $\hat d$ for $\bar d$ in Equation (3) yields the new node feature based on the Volz-Heckathorn estimator [Volz and Heckathorn, 2008],
$$y'_{VH}(i) = y(i)\Big(1 - \frac{\hat d}{\deg(i)}\Big).$$
Similarly, define
$$\widehat{bias} = \frac{1}{n}\sum_{\sigma \in T} y'_{VH}(X_\sigma) = \frac{1}{n}\sum_{\sigma \in T} y(X_\sigma) - \frac{\sum_{\sigma \in T} y(X_\sigma)/\deg(X_\sigma)}{\sum_{\tau \in T} 1/\deg(X_\tau)} = \hat\mu - \hat\mu_{VH}. \qquad (4)$$
Then $\widehat{bias} = \hat\mu - \hat\mu_{VH}$ is an asymptotically unbiased estimator for the bias of $\hat\mu$. It serves as a test statistic for the null hypothesis $H_0: E_\pi(y') = 0$. The theorems above suggest that $\widehat{bias}$ converges to the normal distribution. The rejection region is then
$$W = \Big\{|\widehat{bias}| > 1.96\, \frac{\hat\sigma_0}{\sqrt{n}}\Big\},$$
where $\hat\sigma_0^2$ is an estimate of the variance.

5 Estimating the variance
For some node feature $\tilde y$ (e.g. HIV status $y$, or the $y'$ in Equation (3) that motivates $\widehat{bias}$), let $\tilde\mu$ denote the sample average. Denote $\sigma^2_{\tilde\mu}$ as $\mathrm{Var}_{T,P}(\tilde\mu)$, where the subscript $T, P$ denotes that the data is collected via a $(T, P)$-walk on $G$. This section studies a simple plug-in estimator for $\sigma^2_{\tilde\mu}$. The following function is essential to expressing $\sigma^2_{\tilde\mu}$ [Rohe, 2015].

Definition 1. Select two nodes $I, J$ uniformly at random from the tree $T$. Define the random variable $D = d(I, J)$ to be the graph distance in $T$ between $I$ and $J$. Define $G$ as the probability generating function for $D$, $G(z) = E(z^D)$.
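Because the referral tree is observed, $G$ can be computed exactly from the pairwise distances on $T$. This minimal sketch makes two assumptions. First, the tree is stored as a parent array: `parents[k]` is the index of $p(k)$, and `-1` marks the root. Second, $I$ and $J$ are drawn independently, so $I = J$ is possible and contributes $D = 0$. The helper names are illustrative. The cost is $O(n^2)$ in time and memory.

```python
from collections import deque

import numpy as np


def tree_distances(parents):
    """All pairwise distances on a tree given by a parent array
    (parents[root] == -1), via one breadth-first search per node."""
    n = len(parents)
    adj = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child].append(parent)
            adj[parent].append(child)
    dist = np.full((n, n), -1, dtype=int)
    for source in range(n):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source, v] < 0:
                    dist[source, v] = dist[source, u] + 1
                    queue.append(v)
    return dist


def G(z, parents):
    """Probability generating function of D = d(I, J) for I, J drawn
    independently and uniformly from the tree (I = J allowed)."""
    D = tree_distances(parents)
    return float(np.mean(float(z) ** D))
```

Example: take a root with two children, `parents = [-1, 0, 0]`. Of the nine ordered pairs, three have $D = 0$, four have $D = 1$ and two have $D = 2$. So $G(z) = (3 + 4z + 2z^2)/9$ and $G(0) = 1/3$.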

In practice, $T$ is observed, so the function $G$ can be computed. In many studies there are multiple seed nodes. In these cases, we suggest computing $d(I, J)$ on a tree which has an artificial root node that connects to all of the seeds. This root node could be imagined as an individual who is responsible for finding the seed nodes. In this tree, two different seed nodes would be distance 2 apart.

Denote the autocorrelation at lag 1 of $\tilde y(X_\tau)$ as
$$R = \frac{\mathrm{Cov}(\tilde y(X_{p(\tau)}), \tilde y(X_\tau))}{\mathrm{var}_\pi(\tilde y)}.$$
Both $\mathrm{Cov}(\tilde y(X_{p(\tau)}), \tilde y(X_\tau))$ and $\mathrm{var}_\pi(\tilde y)$ can be estimated with plug-in quantities. Because the data has been sampled proportional to $\pi$, the plug-in quantity for $\mathrm{var}_\pi$ should not explicitly adjust for $\pi$, and the same holds for the covariance:
$$\widehat{\mathrm{var}}_\pi(\tilde y) = \frac{1}{n}\sum_{\tau \in T} (\tilde y(X_\tau) - \hat\mu_{VH})^2, \qquad \widehat{\mathrm{Cov}}(\tilde y(X_{p(\tau)}), \tilde y(X_\tau)) = \frac{1}{n - 1}\sum_{\tau \in T \setminus 0} (\tilde y(X_{p(\tau)}) - \hat\mu_{VH})(\tilde y(X_\tau) - \hat\mu_{VH}),$$
where $T \setminus 0$ contains all nodes except the root node $0$ (because $p(0)$ does not exist). Using these plug-in quantities, define $\hat R$. Then the estimator is
$$\hat\sigma^2_{\tilde\mu} = G(\hat R)\, \widehat{\mathrm{var}}_\pi(\tilde y).$$

A popular bootstrap technique for estimating $\sigma^2_{\tilde\mu}$ resamples $y(X_\tau)$ as a Markov process [Salganik, 2006]. That is, in addition to $X_\tau$ being a Markov process, the bootstrap procedure also assumes that $y(X_\tau)$ is Markov. This model is akin to the blockmodel with two blocks in Section 3.2. The following assumption is weaker than this assumption.

Assumption 1: $\tilde y(i) = \mu + \sigma f(i)$, where $\mu, \sigma \in \mathbb{R}$ and $f: V \to \mathbb{R}$ is an eigenfunction of $P$ with $\mathrm{var}_\pi(f) = 1$.

Proposition 4. Under Assumption 1, $\sigma^2_{\tilde\mu} = G(R)\, \mathrm{var}_\pi(\tilde y)$.

Assumption 1 is weaker than the previous assumption in [Salganik, 2006]. Even so, the next proposition highlights the danger of this assumption. It uses a different assumption, which is rather weak.

Assumption 2: $G$ is convex on $[\lambda_{min}, 1]$, where $\lambda_{min}$ is the smallest eigenvalue of $P$.

Because $G$ is a probability generating function, it is always convex on $[0, 1]$. As such, we only need to be worried about negative values. Recall that the central limit theorems above only hold when $|\lambda_{min}| < 1/\sqrt{2} \approx .71$ (the smallest possible value for $\lambda_{min}$ is $-1$).
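The plug-in estimator $\hat\sigma^2_{\tilde\mu} = G(\hat R)\,\widehat{\mathrm{var}}_\pi(\tilde y)$ combines two pieces: the lag-1 autocorrelation along referral edges and the generating function $G$ of the observed tree. This sketch assumes the referral tree is stored as a parent array (`parents[k]` is the index of $p(k)$, and `-1` marks the root). It also takes the centering value as the argument `center`, since the text centers at $\hat\mu_{VH}$, which requires degrees. The function names are illustrative.

```python
from collections import deque

import numpy as np


def pairwise_tree_distances(parents):
    """BFS from every node of the referral tree (parents[root] == -1)."""
    n = len(parents)
    adj = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child].append(parent)
            adj[parent].append(child)
    dist = np.full((n, n), -1, dtype=int)
    for s in range(n):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s, v] < 0:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist


def plugin_variance(y_tilde, parents, center):
    """Plug-in estimate G(R_hat) * var_hat of the variance of the sample
    average of y_tilde over the referral tree; also returns R_hat."""
    y = np.asarray(y_tilde, dtype=float)
    parents = np.asarray(parents)
    var_hat = np.mean((y - center) ** 2)                   # (1/n) sum over T
    kids = np.flatnonzero(parents >= 0)                    # T \ {0}
    cov_hat = np.mean((y[parents[kids]] - center) * (y[kids] - center))
    r_hat = cov_hat / var_hat                              # lag-1 autocorrelation
    g_at_r = np.mean(r_hat ** pairwise_tree_distances(parents))  # G(R_hat)
    return g_at_r * var_hat, r_hat
```

When $\hat R = 0$, only the pairs with $I = J$ contribute, and the estimate reduces to the independent-sampling value $\widehat{\mathrm{var}}_\pi(\tilde y)/n$. For example, `plugin_variance([1, 1, -1], [-1, 0, 0], 0.0)` gives $1/3$.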
Some simulated trees given in the appendix suggest that if $G$ is not convex, convexity often fails in the neighborhood of $-1$. As such, the assumption that $|\lambda_{min}| < 1/\sqrt{2} \approx .71$ is likely to imply Assumption 2. In practice, one observes the referral tree $T$. Thus, one can compute the second derivative of $G$.

Eigenvalues of $P$ close to negative one arise in antithetic sampling, where adjacent samples are dissimilar. For example, if the population in $G$ was heterosexuals and edges in $G$ represent sexual contact, then men would only refer women and vice versa. In this case, $\lambda_{min}$ would be exactly $-1$. While easily imagined, such settings are not current practice for RDS. As such, large negative values are uncommon; $\lambda_{min}$ is likely close to zero. The following proposition follows from an application of Jensen's inequality.

Proposition 5. Under Assumption 2, $\sigma^2_{\tilde\mu} \ge G(R)\, \mathrm{var}_\pi(\tilde y)$.

Because Assumption 2 is not very restrictive, the inequality in Proposition 5 highlights the danger in breaking Assumption 1 (and thus the Markov model in [Salganik, 2006]): breaking Assumption 1 will lead to $\hat\sigma^2_{\tilde\mu}$ underestimating the variance.

5.1 A bias adjusted estimator for $\mu$
The test above also allows us to derive a more robust estimator of $\mu$. Using $y'$ from Equation (3), define $\widehat{bias} = \frac{1}{n}\sum_{i=1}^n y'(X_i)$. Then $\widehat{bias} = \hat\mu - \hat\mu_{IPW}$. Using the hypothesis test to choose between the sample average and inverse probability weighting is akin to hard thresholding the bias adjustment. Define
$$\widehat{bias}_t = \begin{cases} \widehat{bias} & \text{if } H_0 \text{ is rejected}, \\ 0 & \text{otherwise}. \end{cases}$$
The final estimator of $\mu$ is then $\hat\mu_{BA} = \hat\mu - \widehat{bias}_t$ (BA for bias adjustment). This estimator is explored in Section 6.2.

6 Numerical results
6.1 Simulation
In this section we illustrate the theoretical results on simulated data. The simulations are performed on networks simulated from the Stochastic Blockmodel [Holland et al., 1983]. The four colors in Figure 1 correspond to four different networks that are simulated from four different Stochastic Blockmodels. Each of the four networks has $N = 5{,}000$ nodes, equally balanced between group zero and group one. The probability of a connection between two nodes in different blocks is $r$ and the probability of connection between two nodes in the same block is $p$.
To control the eigenvalues of the transition matrix, consider the transition matrix between classes given by $\mathcal{P} = E(D)^{-1}E(A)$. Here $A$ is the adjacency matrix, $D$ is the diagonal matrix of node degrees, and expectations are under the Stochastic Blockmodel. The second eigenvalue of $\mathcal{P}$ is [Rohe et al., 2011]
$$\lambda_2(\mathcal{P}) = \frac{p - r}{p + r}.$$
In our simulation, the second eigenvalue of the actual transition matrix is typically very close to $\lambda_2(\mathcal{P})$. We take $p + r = 0.01$ in all four Stochastic Blockmodels so that the average degree is about 25. As such, $\lambda_2(\mathcal{P})$ is controlled by $p - r$.

For each of the four networks we carry out four different sampling designs. Let $T$ be either a 2-tree or a Galton-Watson tree with $E(\eta(\sigma)) = 2$. For the Galton-Watson tree, the distribution of $\eta(\sigma)$ is uniform on $\{1, 2, 3\}$. For each $T$, we consider both with-replacement sampling (i.e. the $(T, P)$-walk on $G$) and without-replacement sampling (i.e. referrals are sampled uniformly from the friends that have not yet been sampled). Note that the conditions of Theorem 2 may be violated when either the Galton-Watson tree or without-replacement sampling is used. We take the first 8 waves of $T$ as our sample. As such, the sample size is roughly $N/10$.

For each social network and sampling design, we repeat the sampling process 1000 times and compute $\hat\mu = \frac{1}{n}\sum_{i=1}^n y(X_i)$ for each sample. The quantile-quantile (Q-Q) plot of $\hat\mu$ is shown in the left panel of Figure 1. Note that the Q-Q plot centers and scales each distribution to have mean zero and standard deviation one. In addition, we repeat the above simulation for the Volz-Heckathorn estimator, and the Q-Q plot of $\hat\mu_{VH}$ is shown in the right panel of Figure 1.

Figure 1 shows two patterns of distribution. When $\lambda_2 < 1/\sqrt{2} \approx 0.71$, i.e. $\lambda_2 = 0.5$ or $0.6$, the Q-Q plots of $\hat\mu$ and $\hat\mu_{VH}$ approximately lie on the line $y = x$ for all sampling designs. When $\lambda_2 > 1/\sqrt{2} \approx 0.71$, i.e. $\lambda_2 = 0.8$ or $0.9$, the Q-Q plots of $\hat\mu$ and $\hat\mu_{VH}$ depart from the line $y = x$. Taken together, Figure 1 suggests that the distributions of $\hat\mu$ and $\hat\mu_{VH}$ converge to a Gaussian distribution if and only if $m < 1/\lambda_2^2$. In fact, the right panel of Figure 1 suggests that the asymptotic distribution of $\hat\mu$ and $\hat\mu_{VH}$ has two modes when $m > 1/\lambda_2^2$.
The relationship between the expectation of the offspring distribution and the second eigenvalue of the social network determines the asymptotic distribution of RDS estimators, regardless of the node feature, the particular structure of the tree, or the way we handle replacement.

Figure 1: Q-Q plots of the sample average (left panel) and the Volz-Heckathorn estimator (right panel) for different social networks and sampling designs. For each scenario we draw 1000 network driven samples of size 500 from a network containing 5,000 nodes. Here the threshold for $\lambda_2$ is $1/\sqrt{2}$. For the two settings with $\lambda_2 < 1/\sqrt{2}$, the distributions appear normal. However, for the two settings with $\lambda_2 > 1/\sqrt{2}$, the distributions do not appear normal. Across all values of $\lambda_2$, there is no apparent difference between the four different designs (i.e. with-replacement vs. without-replacement sampling, and 2-tree vs. Galton-Watson tree).

6.2 Analysis of Adolescent Health data
To illustrate Theorem 2 with the test statistic in Section 4, this section presents simulation results that use the Comm5 friendship network from the National Longitudinal Survey of Adolescent Health. This simulation compares the MSE of $\hat\mu$ and $\hat\mu_{IPW}$ for two different node features $y$. When $y$ is correlated with $\pi$, $\hat\mu_{IPW}$ has a smaller MSE. When $y$ is weakly correlated with $\pi$, $\hat\mu$ has a smaller MSE. For settings in which $\hat\mu_{IPW}$ clearly outperforms $\hat\mu$, the test statistic from Section 4 rejects the null hypothesis that the sample average is unbiased (i.e. $H_0: E_\pi \hat\mu = \mu$).

In the Comm5 network, $N = 089$ students from two sister schools were asked to list up to 10 friends; these friends can be inside or outside of the school. The students also supplied information including their gender, grade and race. The analysis below studies two node covariates: gender (0/1) and total nominations of friends (an integer between 0 and 10). Before the simulation, the network was symmetrized (i.e. we consider the new adjacency matrix $\tilde A = A + A^T$). Because the students were only allowed to list up to 10 friends, the standard deviation of the degrees is 4.7. This is drastically smaller than in typical social networks. However, even in this setting, the variance of $\pi$ is sufficiently large to illustrate the advantages of the sample average.

For both gender and the number of nominations, Table 1 displays (i) the correlation between $\pi$ and $y$, (ii) the bias of the sample average, and (iii) the crossover point. Recall that the crossover point is the sample size at which $\hat\mu_{IPW}$ has a smaller MSE than $\hat\mu$; this calculation is based upon the simulations described below. The table shows that gender is weakly correlated with $\pi$. As such, the sample average has a small bias and the crossover point is large. Contrast this with the number of total nominations, which is highly correlated with $\pi$. This makes the sample average clearly biased. Because of this, it has a small crossover point. These two examples illustrate a range of possibilities in terms of $\mathrm{cor}(\pi, y)$. Before estimating the crossover points shown in Table 1, we first study the hypothesis test $H_0: E\hat\mu = \mu$ for both gender and total nominations.
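The test of Section 4 and the hard-thresholded estimator of Section 5.1 reduce to a few lines once an estimate $\hat\sigma_0$ is available. For example, $\hat\sigma_0^2$ could be $n$ times the plug-in estimate of Section 5, because $\hat\sigma_0^2$ is the variance of $\sqrt{n}\,\widehat{bias}$. This is a sketch with illustrative names. When $\bar d$ is replaced by the harmonic mean $\hat d$, rejecting $H_0$ returns $\hat\mu_{VH}$ rather than $\hat\mu_{IPW}$.

```python
import numpy as np


def bias_test(y, deg, sigma0_hat, z=1.96):
    """Test H0: E_pi(y') = 0 using bias_hat = mu_hat - mu_VH (Equation (4)).

    sigma0_hat must estimate the standard deviation of sqrt(n) * bias_hat.
    Returns bias_hat, the reject decision, and the hard-thresholded
    (bias adjusted) estimate mu_hat - bias_t.
    """
    y = np.asarray(y, dtype=float)
    deg = np.asarray(deg, dtype=float)
    n = len(y)
    mu_hat = y.mean()
    d_hat = n / np.sum(1.0 / deg)      # harmonic mean of sampled degrees
    mu_vh = np.mean(y * d_hat / deg)   # Volz-Heckathorn estimate
    bias_hat = mu_hat - mu_vh
    reject = bool(abs(bias_hat) > z * sigma0_hat / np.sqrt(n))
    bias_t = bias_hat if reject else 0.0
    return bias_hat, reject, mu_hat - bias_t
```

Consider a sample with degrees $(4, 1, 1, 1)$ where only the high-degree node has $y = 1$. Then $\widehat{bias} = 0.25 - 0.25/3.25 \approx 0.173$. With $\hat\sigma_0 = 0.1$ this is rejected, and the bias adjusted estimate equals $\hat\mu_{VH}$.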
To provide a benchmark, this simulation compares RDS to independent sampling. Let $P$ be the transition matrix of the simple random walk on the network, and let $\lambda_2$ denote its second largest eigenvalue. Let $T$ be a sample from the Galton-Watson process with $E\eta(\sigma) = 2 < 1/\lambda_2^2$. For a node covariate $y$ (gender or nominations), let $y'$ be the node feature defined in Equation (3) in Section 4. Recall that $E_\pi(y')$ is the bias of the sample average.

Table 1. Columns: $\mathrm{cor}(\pi, y)$, bias of the sample average, and crossover point (sample size); rows: gender (crossover point $> 500$) and total nominations.
Caption: Gender has a weak correlation with the sample weights. As such, $\hat\mu$ has a small bias. The crossover point shows that if the samples were drawn independently from the distribution $\pi$, then $\hat\mu$ has a smaller MSE when $n < 500$. Total nominations has a larger correlation and thus a larger bias. When the sample size exceeds twenty, $\hat\mu_{IPW}$ outperforms the sample average.

We consider the following methods of generating samples $Y_1, \dots, Y_n$ and computing or estimating the variance $\sigma^2$.
1. $Y_1, \dots, Y_n$ is an independent and identically distributed sample from the normal distribution $N(E_\pi(y'), \mathrm{var}_\pi(y'))$, and $\hat\sigma^2 = \mathrm{var}_\pi(y')$. Here the variance is known.
2. $Y_i = y'(X_i)$ for all $i$, where $X_1, \dots, X_n$ is an independent and identically distributed sample from the equilibrium distribution of $P$, and $\hat\sigma^2 = \mathrm{var}_\pi(y')$. Here the variance is known.
3. $Y_i = y'(X_i)$ for all $i$, where $X_1, \dots, X_n$ is a sample from the $(T, P)$-walk on $G$, and $\hat\sigma^2 = \mathrm{var}(\frac{1}{\sqrt{n}}\sum_{i=1}^n Y_i)$ is the true variance, which is only computable in a simulation.
4. $Y_i = y'(X_i)$ for all $i$, where $X_1, \dots, X_n$ is a sample from the $(T, P)$-walk on $G$, and $\hat\sigma^2 = n\, \mathrm{var}_\pi(y')\, G(\hat R)$, i.e. $n$ times the estimator discussed in Section 5, so that it targets the same quantity as in scenario 3.

For a sample of size $n$, let the rejection region be
$$W = \Big\{\Big|\frac{1}{\sqrt{n}\,\hat\sigma}\sum_{i=1}^n Y_i\Big| > 1.96\Big\},$$
with $Y_i$ and $\hat\sigma$ as defined above. Here, the null hypothesis is rejected at level $\alpha = 0.05$. For each scenario, Figure 2 plots the power of our test $\Pr(W)$ as a function of sample size. The power under scenario 1 can be calculated exactly and serves as a benchmark (black line). The power under scenarios 2-4 is calculated by taking 1000 independent samples and counting the number of samples that fall in $W$ (red, blue and yellow lines, respectively). Because scenario 4 underestimates the true variance, this technique errs toward rejecting $H_0$ and adopting the unbiased estimator $\hat\mu_{IPW}$, which is the conservative choice with respect to bias. For gender, none of the scenarios quickly reject the null hypothesis.
By contrast, for total nominations, $H_0$ is rejected even for small sample sizes. The final figure plots the mean square error of $\hat\mu$, $\hat\mu_{IPW}$, and $\hat\mu_{BA}$; this last estimator is the bias adjusted estimator from Section 4. This simulation uses scenario (4), the most realistic of the previous scenarios. After drawing the sample, compute the following (for both gender and total nominations):

(1) The inverse probability weighted estimator
$$\hat\mu_{IPW} = \frac{1}{n}\sum_{i=1}^n \frac{y(X_i)}{N\pi_{X_i}} = \frac{1}{n}\sum_{i=1}^n y(X_i)\,\frac{\bar d}{\deg(X_i)}.$$
(2) The sample average
$$\hat\mu = \frac{1}{n}\sum_{i=1}^n y(X_i).$$
(3) The bias adjusted estimator
$$\hat\mu_{BA} = \begin{cases} \hat\mu_{IPW} & \text{if } \{X_i\}_{i \le n} \in W, \\ \hat\mu & \text{if } \{X_i\}_{i \le n} \notin W, \end{cases}$$
introduced in Section 3.

Figure 2: Power of the test as a function of sample size for two node features, (a) gender and (b) total nominations of friends. Each panel plots power against sample size $n$. The black, red, blue and yellow lines are the power under scenarios (1)-(4), respectively.

Figure 3 shows that for gender, the true bias of the sample average is small. As such, the MSE of $\hat\mu$ (solid) is always smaller than that of $\hat\mu_{IPW}$ (dotted) for sample sizes $n < 500$. For total nominations, the bias is much larger, so when $n > 20$, the MSE of $\hat\mu$ is larger than the MSE of $\hat\mu_{IPW}$. The MSE of the bias adjusted estimator $\hat\mu_{BA}$ (longdash) lies between those of $\hat\mu$ and $\hat\mu_{IPW}$. In particular, when $\hat\mu_{IPW}$ drastically outperforms $\hat\mu$ (i.e. on the right of panel (b)), the null hypothesis is typically rejected and $\hat\mu_{BA}$ performs similarly to $\hat\mu_{IPW}$.

Figure 3: The mean square error of $\hat\mu$ (solid), $\hat\mu_{IPW}$ (dotted), and $\hat\mu_{BA}$ (longdash) for (a) gender and (b) total nominations, plotted against sample size $n$.

7 Discussion

A recent review of the RDS literature counted over 460 studies which used RDS [White et al., 2015]. Many of these studies seek to estimate the prevalence of HIV or other infectious diseases; for these studies, a point estimate of the prevalence is insufficient. These studies have used confidence intervals constructed from bootstrap procedures and from estimates of the standard error [Handcock et al., 2016]. These standard error intervals implicitly rely on a central limit theorem, and this paper provides a partial justification for such techniques, so long as $m < 1/\lambda_2^2$. This paper makes a first step at studying the distributional properties of two simple estimators in this regime.

Figure 1 suggests that if $m$ is larger than $1/\lambda_2^2$, then the simple estimators ($\hat\mu$ and $\hat\mu_{VH}$) are no longer normally distributed. Interestingly, under the simulation settings where the estimators are no longer normally distributed, the Q-Q plots are flatter than the line $x = y$. This indicates that a confidence interval constructed from the standard errors would be conservative; a nominally 90% confidence interval would cover $\mu$ more than 90% of the time. As such, a properly constructed interval should be narrower than the interval constructed from the standard error. If one pursues this path, then care must be taken in estimating the standard errors. For example, a bootstrap procedure proposed in [Salganik, 2006] has become very popular. However, for reasons beyond the inequality in Proposition 5, this bootstrap procedure drastically underestimates the actual standard errors [Goel and Salganik, 2010, Rohe, 2015]. There are many reasons to question the construction of the sampling weights in RDS studies. At the most basic level, the justification for $\{\text{selection probability for node } i\} \propto \deg(i)$ comes from a Markov model which has several problematic pieces (e.g. replacement sampling, uniform selection of friends, a referral process that has reached equilibrium, and all network relationships being reciprocated).

While these assumptions are all troublesome, they are merely sufficient conditions. It is conceivable that $\deg(i)$ is still an adequate approximation of the selection probabilities (up to scaling) even when the assumptions do not hold. Perhaps the most difficult problem is that $\deg(X_i)$ is estimated via self-report. Taken together, there are many reasons to doubt the sampling weights.

In a related context, the third section of the paper discusses the bias-variance tradeoff between $\hat\mu$ and $\hat\mu_{IPW}$. The results in Proposition … and Theorem 3 presume that the sampling weights are known exactly. However, given that the presumed model has several deficiencies, the stationary distribution of the (presumed) Markov process does not necessarily reveal the actual sampling probabilities. As such, $\hat\mu_{IPW}$ (and by extension $\hat\mu_{VH}$) is constructed with incorrect and noisy measurements of the sampling probabilities. This will likely make the estimator biased and more variable. Because of this, in practice we should be less inclined towards the weighted estimator (i.e. $\hat\mu_{VH}$) than the proposed estimator $\hat\mu_{BA}$ suggests. Section 4 and the data analysis with the AddHealth network suggest that the sample average is perhaps less biased than was previously considered. While there are certainly situations where bias corrected estimators should be used, it also seems sensible to first estimate the bias; when the bias is large, this is a relatively easy task.

The theorems in this paper do not apply to general trees, only to $m$-trees. If $T$ is a Galton-Watson tree with $E(\eta(\sigma)) < 1/\lambda_2^2$, then the simulations support the following conjecture:
$$\frac{1}{\sqrt{n}} \sum_{\sigma \in T} \big(y(X_\sigma) - E_\pi(y)\big) \to N(0, \sigma^2),$$
where $n = |T|$ and $\sigma^2$ could be computed from the results in [Rohe, 2015]. Proving this result requires a more careful study of the structure of $\{X_\sigma\}_{\sigma \in T}$. We leave this problem to future investigation.

Acknowledgement

This material is based upon work supported in part by the U. S. Army Research Office under grant number W9NF5043 and the National Science Foundation under grant number DMS. We thank Zoe Russek for helpful comments.

A Proof of Theorem 1

In the appendix we give proofs of the theorems and propositions in the paper. First we give an outline of the proof of our main theorem.
Consider the martingale differences $E(Y'_h \mid \mathcal{F}_{h-1}) - Y'_h$, where $\{\mathcal{F}_h\}$ is a filtration to be defined later. Using the Markov property and the estimate of $\mathrm{var}(Y_h)$, we show that the martingale difference sequence satisfies the conditions of the martingale central limit theorem. In this section, $P$ will be a reversible transition matrix with eigenvalues $1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$ and corresponding eigenfunctions $f_1, \ldots, f_N$ satisfying $\sum_k f_i(k) f_j(k) \pi_k = \delta_{ij}$ for any $i, j$. We refer to [Levin et al., 2009] for the existence of such an eigen-decomposition. Unless stated otherwise, expectations are calculated with respect to the tree indexed random walk on the graph. We begin with some lemmas.

Lemma 1 (Lemma 12.2 in [Levin et al., 2009]). Let $P$ be a reversible Markov transition matrix on the nodes in $G$ with respect to the stationary distribution $\pi$. The eigenvectors of $P$, denoted $f_1, \ldots, f_N$, are real valued functions of the nodes $i \in G$ and are orthonormal with respect to the inner product
$$\langle f_a, f_b \rangle_\pi = \sum_{i \in G} f_a(i) f_b(i) \pi_i. \quad (5)$$
If $\lambda$ is an eigenvalue of $P$, then $|\lambda| \le 1$. The eigenfunction $f_1$ corresponding to the eigenvalue $1$ can be taken to be the constant vector $\mathbf{1}$. If $X(0), \ldots, X(t)$ represent $t$ steps of a Markov chain with transition matrix $P$, then the probability of a transition from $i \in G$ to $j \in G$ in $t$ steps can be written as
$$P(X(t) = j \mid X(0) = i) = P^t_{ij} = \pi_j + \pi_j \sum_{l=2}^N \lambda_l^t f_l(i) f_l(j). \quad (6)$$

Lemma 2. For any nodes $\sigma, \tau$ in $T$,
$$\mathrm{cov}\big(y(X_\sigma), y(X_\tau)\big) = \sum_{l=2}^N \lambda_l^{d(\sigma,\tau)} \langle y, f_l \rangle_\pi^2,$$
where $\langle y, f_l \rangle_\pi = \sum_{i=1}^N y(i) f_l(i) \pi_i$.

Proof. From the reversibility of the Markov chain and Lemma 1, we have
$$P(X_\sigma = j \mid X_\tau = i) = P^{d(\sigma,\tau)}_{ij} = \pi_j + \pi_j \sum_{l=2}^N \lambda_l^{d(\sigma,\tau)} f_l(i) f_l(j).$$
Therefore
$$\mathrm{cov}\big(y(X_\sigma), y(X_\tau)\big) = \sum_{i,j} y(i) y(j) \pi_i P(X_\sigma = j \mid X_\tau = i) - \Big(\sum_i \pi_i y(i)\Big)^2 = \sum_{i,j} y(i) y(j) \pi_i \pi_j \sum_{l=2}^N \lambda_l^{d(\sigma,\tau)} f_l(i) f_l(j) = \sum_{l=2}^N \lambda_l^{d(\sigma,\tau)} \langle y, f_l \rangle_\pi^2,$$
and the lemma is proved. The next lemma gives the expression of $\mathrm{var}(Y_h)$.
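Lemmas 1 and 2 are easy to check numerically. The sketch below assumes NumPy and a small hypothetical graph: it builds the simple random walk $P$, obtains $\pi$-orthonormal eigenfunctions by symmetrizing $P$ (the matrix $D^{1/2} P D^{-1/2}$ with $D = \mathrm{diag}(\pi)$ is symmetric for a reversible chain), and compares both sides of Equation 6 and of Lemma 2 for several tree distances $d(\sigma, \tau)$.

```python
import numpy as np

# Hypothetical small graph: connected and non-bipartite (it contains triangles).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]        # simple random walk transition matrix
pi = deg / deg.sum()        # stationary distribution; P is reversible w.r.t. pi

# S = D^{1/2} P D^{-1/2} with D = diag(pi) is symmetric, so eigh applies.
sq = np.sqrt(pi)
S = sq[:, None] * P / sq[None, :]
lam, U = np.linalg.eigh(S)           # ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]       # reorder so lam[0] = 1 >= lam[1] >= ...
F = U / sq[:, None]                  # f_l(i) = u_l(i) / sqrt(pi_i), orthonormal in <.,.>_pi


def spectral_power(t: int) -> np.ndarray:
    """Right-hand side of Equation 6: sum_l lam_l^t f_l(i) f_l(j) pi_j (l = 1 gives pi_j)."""
    return (F * lam**t) @ F.T * pi[None, :]


def cov_direct(y: np.ndarray, d: int) -> float:
    """cov(y(X_sigma), y(X_tau)) for a stationary pair at tree distance d."""
    return float((pi * y) @ np.linalg.matrix_power(P, d) @ y - (pi @ y) ** 2)


def cov_spectral(y: np.ndarray, d: int) -> float:
    """Lemma 2: sum over l >= 2 of lam_l^d <y, f_l>_pi^2."""
    coef = (y * pi) @ F              # <y, f_l>_pi for every l
    return float(np.sum(lam[1:] ** d * coef[1:] ** 2))


y = np.array([1.0, 0.0, 2.0, -1.0, 0.5])   # arbitrary node feature
for d in range(5):
    print(d, cov_direct(y, d), cov_spectral(y, d))
```

At $d = 0$ both sides reduce to $\mathrm{var}_\pi(y)$, which is Parseval's identity in the $\langle \cdot, \cdot \rangle_\pi$ basis.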

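Lemma 3 (Variance of $Y_h$). Suppose that $\lambda_2 > 0$. Then, as $h \to \infty$,
$$\mathrm{var}(Y_h) = \begin{cases} O(m^h) & \text{if } m < 1/\lambda_2^2, \\ O(h\, m^h) & \text{if } m = 1/\lambda_2^2, \\ O\big((m\lambda_2)^{2h}\big) & \text{if } m > 1/\lambda_2^2. \end{cases}$$

Proof. For $k = 0, 1, \ldots, h$, denote by $s_k$ the number of ordered pairs $(\sigma, \tau)$ such that $|\sigma| = |\tau| = h$ and $d(\sigma, \tau) = 2k$ (two nodes at the same height are always at even distance). Then $s_0 = m^h$, and for $k \ge 1$, counting the common ancestor at height $h - k$, an ordered pair of its distinct children, and the descendants of those children at height $h$,
$$s_k = m^{h-k} \cdot m(m-1) \cdot m^{2(k-1)} = m^{h+k} - m^{h+k-1}.$$
By Lemma 2,
$$\mathrm{var}(Y_h) = \sum_{k=0}^{h} \sum_{l=2}^N s_k \lambda_l^{2k} \langle y, f_l \rangle_\pi^2 = \sum_{l=2}^N \langle y, f_l \rangle_\pi^2 \Big( m^h + \sum_{k=1}^h (m^{h+k} - m^{h+k-1}) \lambda_l^{2k} \Big) = \sum_{l=2}^N \langle y, f_l \rangle_\pi^2\, m^h \Big( 1 + \Big(1 - \frac{1}{m}\Big) \sum_{k=1}^h (m \lambda_l^2)^k \Big).$$
Thus
$$\mathrm{var}(Y_h) = \begin{cases} O(m^h) & \text{if } m < 1/\lambda_2^2, \\ O(h\, m^h) & \text{if } m = 1/\lambda_2^2, \\ O\big((m\lambda_2)^{2h}\big) & \text{if } m > 1/\lambda_2^2. \end{cases}$$

Corollary. For any function $y$ on the state space, $Y_h/m^h - E_\pi(y) \to 0$ in probability.

Proof. Since $E(Y_h) = m^h E_\pi(y)$, it suffices to show that $\mathrm{var}(Y_h/m^h) \to 0$. By Lemma 3, $\mathrm{var}(Y_h/m^h) = \mathrm{var}(Y_h)/m^{2h}$ is $O(m^{-h})$, $O(h\, m^{-h})$ or $O(\lambda_2^{2h})$ in the three cases, and each of these tends to $0$ because $m > 1$ and $\lambda_2 < 1$.

The next lemma is a convergence argument which we will use in the proof of Theorem 1.

Lemma 4 (Slutsky's lemma). If $X_n \to X$ in distribution and $Y_n \to 0$ in probability, then $X_n + Y_n \to X$ in distribution.

The following theorem from [Durrett, 2010] is essential to the proof of our main theorem.

Theorem 4 (Martingale central limit theorem). Suppose that $\{Z_n\}$ is adapted to the filtration $\{\mathcal{F}_n\}$ and that $E(Z_{n+1} \mid \mathcal{F}_n) = 0$ for all $n$. Let $S_n = \sum_{i=1}^n Z_i$ and $V_n = \sum_{i=1}^n E(Z_i^2 \mid \mathcal{F}_{i-1})$. If
(1) $V_n / n \to \sigma^2 > 0$ in probability, and
(2) $\frac{1}{n} \sum_{i=1}^n E\big(Z_i^2\, \mathbf{1}\{|Z_i| > \epsilon \sqrt{n}\}\big) \to 0$ for every $\epsilon > 0$,
then $S_n / \sqrt{n} \to N(0, \sigma^2)$ in distribution.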
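The counting step in the proof of Lemma 3 and its three growth regimes can be checked directly. The sketch below enumerates the height-$h$ nodes of an $m$-tree as paths of child indices, tallies ordered pairs by tree distance, compares the tallies with $s_0 = m^h$ and $s_k = m^{h+k} - m^{h+k-1}$, and evaluates $\mathrm{var}(Y_h)/m^h$. The single nontrivial eigenvalue $\lambda_2 = 0.5$ with unit coefficient $\langle y, f_2\rangle_\pi^2 = 1$ is a hypothetical choice, which puts the threshold at $m = 1/\lambda_2^2 = 4$.

```python
from collections import Counter
from itertools import product

import numpy as np


def pair_counts_bruteforce(m: int, h: int) -> dict:
    """Ordered pairs of height-h nodes of an m-tree, keyed by tree distance."""
    nodes = list(product(range(m), repeat=h))   # a node = path of child indices
    counts = Counter()
    for a in nodes:
        for b in nodes:
            c = 0                                  # depth of the lowest common ancestor
            while c < h and a[c] == b[c]:
                c += 1
            counts[2 * (h - c)] += 1
    return dict(counts)


def pair_counts_formula(m: int, h: int) -> dict:
    """s_0 = m^h and s_k = m^{h+k} - m^{h+k-1} pairs at distance 2k (k >= 1)."""
    out = {0: m**h}
    for k in range(1, h + 1):
        out[2 * k] = m ** (h + k) - m ** (h + k - 1)
    return out


def var_Yh(m: int, h: int, lam_rest, coef_sq) -> float:
    """Lemma 2 summed over pairs: sum_d s_d * sum_{l>=2} lam_l^d <y, f_l>_pi^2."""
    lam_rest = np.asarray(lam_rest, dtype=float)
    coef_sq = np.asarray(coef_sq, dtype=float)
    return sum(cnt * float(np.sum(lam_rest**d * coef_sq))
               for d, cnt in pair_counts_formula(m, h).items())


def var_Yh_closed(m: int, h: int, lam_rest, coef_sq) -> float:
    """Closed form: m^h * sum_l <y,f_l>^2 (1 + (1 - 1/m) sum_{k=1}^h (m lam_l^2)^k)."""
    total = 0.0
    for lam, c2 in zip(lam_rest, coef_sq):
        geo = sum((m * lam**2) ** k for k in range(1, h + 1))
        total += c2 * m**h * (1 + (1 - 1 / m) * geo)
    return total


if __name__ == "__main__":
    lam2, c2 = [0.5], [1.0]   # hypothetical spectrum: one eigenvalue, unit coefficient
    for m in (2, 4, 8):       # below, at, and above m = 1/lam2^2 = 4
        ratios = [var_Yh(m, h, lam2, c2) / m**h for h in (5, 10, 15)]
        print(m, [round(r, 2) for r in ratios])
```

The printed ratios $\mathrm{var}(Y_h)/m^h$ stay bounded for $m = 2$, grow linearly in $h$ for $m = 4$, and grow geometrically for $m = 8$, matching the three cases of the lemma.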

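Now we are ready to prove our main theorems.

Proof of Theorem 1. Define $Y_h$ in the same way as in Theorem 1. Without loss of generality, suppose that $E_\pi(y) = 0$. Since $m < 1/\lambda_2^2$, the matrix $\sqrt{m} P - I$ is invertible. Let $y' = (\sqrt{m} P - I)^{-1} y$. Then $y'$ is also a function on the state space. We will first argue on the new node feature $y'$ and then convert back to $y$. Define $Y'_h = \sum_{\tau \in T: |\tau| = h} y'(X_\tau)$. Let $z_k^h = \sum_{\sigma: |\sigma| = h} \mathbf{1}\{X_\sigma = k\}$ for $h \ge 0$ and $k = 1, \ldots, N$. Define $z_h = (z_1^h, \ldots, z_N^h)^T$ and $\mathcal{F}_h = \sigma(X_\tau : |\tau| \le h)$ for $h \ge 0$. It is obvious that $\{Y'_h\}$ is adapted to the filtration $\{\mathcal{F}_h\}$. Let
$$Z_h = \frac{E(Y'_h \mid \mathcal{F}_{h-1}) - Y'_h}{m^{h/2}}.$$
Then $\{Z_h, \mathcal{F}_h\}$ is a martingale difference sequence. We will verify that $\{Z_h, \mathcal{F}_h\}$ satisfies conditions (1) and (2) in Theorem 4. Since every node at height $h-1$ has $m$ children, $E(Y'_h \mid \mathcal{F}_{h-1}) = m\, z_{h-1}^T P y'$, and so
$$Z_h = \frac{z_{h-1}^T (mP) y' - z_h^T y'}{m^{h/2}} = \frac{z_{h-1}^T}{m^{(h-1)/2}} (\sqrt{m} P) y' - \frac{z_h^T}{m^{h/2}} y'.$$
For any $\sigma \in T$, denote by $p(\sigma)$ the parent node of $\sigma$. $Z_h$ can also be expressed as
$$Z_h = \frac{1}{m^{h/2}} \sum_{\sigma: |\sigma| = h} \big( E(y'(X_\sigma) \mid \mathcal{F}_{h-1}) - y'(X_\sigma) \big) = \frac{1}{m^{h/2}} \sum_{\sigma: |\sigma| = h} \big( e_{p(\sigma)}^T P y' - y'(X_\sigma) \big) =: \frac{1}{m^{h/2}} \sum_{\sigma: |\sigma| = h} W_\sigma,$$
where $e_{p(\sigma)}$ is the $N$-vector with $e_{p(\sigma),i} = 1$ if $X_{p(\sigma)} = i$ and $0$ otherwise. We have $E(W_\sigma \mid \mathcal{F}_{h-1}) = 0$ and $E(W_\sigma^2 \mid \mathcal{F}_{h-1}) = \mathrm{Var}_{p_i}(y')$ for $i = X_{p(\sigma)}$, where $p_i$ is the $i$th row of the transition matrix $P$ and $\mathrm{Var}_{p_i}(y') = \sum_j p_{ij} \big( y'(j) - \sum_{j'} p_{ij'} y'(j') \big)^2$. From the definition of the tree indexed Markov process, if $|\sigma| = |\tau| = h$, then $W_\sigma$ and $W_\tau$ are independent given $\{X_\nu : |\nu| = h-1\}$. Using $E(W_\sigma \mid \mathcal{F}_{h-1}) = 0$, we have
$$E(Z_h^2 \mid \mathcal{F}_{h-1}) = \frac{1}{m^h} \sum_{\sigma: |\sigma| = h} E(W_\sigma^2 \mid \mathcal{F}_{h-1}) = \sum_{i=1}^N \frac{z_i^{h-1}}{m^{h-1}} \mathrm{Var}_{p_i}(y').$$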

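From the Corollary following Lemma 3, $\mathrm{var}(z_i^h / m^h) \to 0$ for every $i$; moreover $E(z_i^h / m^h) = \pi_i$. Thus $\mathrm{var}\big(E(Z_h^2 \mid \mathcal{F}_{h-1})\big) \to 0$ as $h \to \infty$, and hence so do the averages $\frac{1}{h} \sum_{i=1}^h \mathrm{var}\big(E(Z_i^2 \mid \mathcal{F}_{i-1})\big)$. It follows from the definition of $V_h$ and the Cauchy-Schwarz inequality that
$$\lim_{h \to \infty} \mathrm{var}(V_h / h) = \lim_{h \to \infty} \mathrm{var}\Big( \frac{1}{h} \sum_{i=1}^h E(Z_i^2 \mid \mathcal{F}_{i-1}) \Big) \le \lim_{h \to \infty} \frac{1}{h} \sum_{i=1}^h \mathrm{var}\big( E(Z_i^2 \mid \mathcal{F}_{i-1}) \big) = 0.$$
Therefore $V_h / h \to \sigma^2$ in probability, where
$$\sigma^2 = E\big(E(Z_h^2 \mid \mathcal{F}_{h-1})\big) = \sum_{i=1}^N \pi_i \mathrm{Var}_{p_i}(y') = \sum_{i=1}^N \pi_i \Big( \sum_{j=1}^N p_{ij} y'(j)^2 - \Big( \sum_{j=1}^N p_{ij} y'(j) \Big)^2 \Big) = \mathrm{var}_\pi(y') - \mathrm{var}_\pi(P y'),$$
and condition (1) in Theorem 4 is satisfied. Similarly, since $y'$ is bounded, we have
$$E(Z_h^4 \mid \mathcal{F}_{h-1}) = \frac{1}{m^{2h}} \Big( \sum_{\sigma: |\sigma| = h} E(W_\sigma^4 \mid \mathcal{F}_{h-1}) + 3 \sum_{\substack{\sigma \ne \tau \\ |\sigma| = |\tau| = h}} E(W_\sigma^2 W_\tau^2 \mid \mathcal{F}_{h-1}) \Big) \le C_0 m^{-h} + C_1 (1 - m^{-h}) \le C,$$
where $C_0, C_1, C$ are constants. Thus $E(Z_h^4) \le C$ for any $h$, and
$$\frac{1}{h} \sum_{i=1}^h E\big( Z_i^2\, \mathbf{1}\{|Z_i| > \epsilon \sqrt{h}\} \big) \le \frac{1}{h} \sum_{i=1}^h \frac{E(Z_i^4)}{\epsilon^2 h} \le \frac{C}{\epsilon^2 h} \to 0.$$
Condition (2) is also satisfied. From Theorem 4, we have
$$\frac{1}{\sqrt{h}} \sum_{i=1}^h Z_i = \frac{1}{\sqrt{h}} \sum_{i=1}^h \Big( \frac{z_{i-1}^T}{m^{(i-1)/2}} (\sqrt{m} P) y' - \frac{z_i^T}{m^{i/2}} y' \Big) \to N(0, \sigma^2)$$
in distribution. If $m < 1/\lambda_2^2$, then from Lemma 3 (applied to $P y'$, which has mean zero under $\pi$), $z_h^T (\sqrt{m} P) y' / (m^{h/2} \sqrt{h}) \to 0$ in probability, and $z_0^T y' / \sqrt{h} \to 0$ trivially. Rearranging the telescoping sum and using Lemma 4 and the definition of $y'$,
$$\frac{1}{\sqrt{h}} \sum_{i=0}^h \frac{z_i^T (\sqrt{m} P - I) y'}{m^{i/2}} = \frac{1}{\sqrt{h}} \sum_{i=0}^h \frac{Y_i}{m^{i/2}} \to N(0, \sigma^2)$$
in distribution, where
$$\sigma^2 = \mathrm{var}_\pi(y') - \mathrm{var}_\pi(P y') = \mathrm{var}_\pi\big( (\sqrt{m} P - I)^{-1} y \big) - \mathrm{var}_\pi\big( P (\sqrt{m} P - I)^{-1} y \big).$$
The proof is now complete.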
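The limiting variance in the proof above is straightforward to evaluate once $P$, $\pi$ and $y$ are fixed. A minimal NumPy sketch, assuming a small hypothetical graph and $m = 2$: it centers $y$ under $\pi$, solves $(\sqrt m P - I) y' = y$, and computes $\sigma^2$ both as $\sum_i \pi_i \mathrm{Var}_{p_i}(y')$ and as $\mathrm{var}_\pi(y') - \mathrm{var}_\pi(P y')$; the two must agree because $\pi$ is stationary. It also reports whether $m < 1/\lambda_2^2$ holds for the chosen graph, which is needed for the CLT itself but not for this identity.

```python
import numpy as np

# Hypothetical graph; any connected graph with sqrt(m)*P - I invertible works here.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]
pi = deg / deg.sum()
m = 2

y = np.array([1.0, 0.0, 2.0, -1.0, 0.5])
y = y - pi @ y                                         # WLOG E_pi(y) = 0
y_prime = np.linalg.solve(np.sqrt(m) * P - np.eye(len(y)), y)

# sigma^2 as the pi-average of the one-step conditional variances Var_{p_i}(y').
row_mean = P @ y_prime
row_var = P @ y_prime**2 - row_mean**2
sigma2_rows = float(pi @ row_var)


def var_pi(f: np.ndarray) -> float:
    """Variance of f(X) for X ~ pi."""
    return float(pi @ f**2 - (pi @ f) ** 2)


sigma2_diff = var_pi(y_prime) - var_pi(P @ y_prime)

lam = np.sort(np.linalg.eigvals(P).real)[::-1]
print("lambda_2 =", lam[1], " m < 1/lambda_2^2:", m * lam[1] ** 2 < 1)
print(sigma2_rows, sigma2_diff)
```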

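B Proof of Theorem 2 and Corollary 1

We provide a proof of the central limit theorem using the moment method. It involves a careful study of all the moments of $\hat\mu_h$. The following proposition is essential to our proof.

Proposition 6 (Moment continuity theorem). Let $X_n$ be a sequence of uniformly subgaussian real random variables, and let $X$ be another subgaussian random variable. Then the following statements are equivalent:
(1) $E X_n^k \to E X^k$ for all $k$;
(2) $X_n \to X$ in distribution.

Following this proposition, we can break our proof into two parts. We will first prove that all the moments of $\hat\mu_h$ converge to the moments of some normal distribution. Then we will verify that $\hat\mu_h$ is a uniformly subgaussian sequence.

B.1 Proof of moments convergence

Let $X_r$ be the root of the 2-tree. Define
$$\gamma_{k,h}(i) = E[\hat\mu_h^k \mid X_r = i] \quad \text{and} \quad \gamma_{k,h} = E[\hat\mu_h^k].$$
Let $\rho = \sqrt{2}\,\bar\lambda < 1$, with $\bar\lambda$ as defined in Section B.2. We will prove that there exist $\gamma_k$, $k \ge 1$, such that $|\gamma_{k,h}(i) - \gamma_k| = O(\rho^h)$ for all $k, i$, and that $\gamma_k = E(\xi^k)$ for $\xi \sim N(0, \gamma_2)$. Our key observation is that the left and right subtrees can be seen as i.i.d. copies of the whole tree given the left and right children of the seed, which makes it possible to build a relationship between $\gamma_{k,h}(i)$ and $\gamma_{k,h-1}(i)$. Only condition (c3) is needed throughout the proof. We need the following lemma.

Lemma 5. Let $\{a_h\}$ be a sequence satisfying $a_h = c_h (c\, a_{h-1} + C + d_h)$, where $c_h - 1 = O(\rho^h)$, $d_h = O(\rho^h)$, $C$ is a constant and $|c| < \rho < 1$. Then $a_h - C/(1-c) = O(\rho^h)$.

Proof. Without loss of generality, suppose that $c \ge 0$ and $C = 0$ (replacing $a_h$ by $a_h - C/(1-c)$ only changes $d_h$ by a term of order $c_h - 1 = O(\rho^h)$). Since $c_h - 1 = O(\rho^h)$ and $d_h = O(\rho^h)$, there exists $M \ge |a_0|$ such that $\prod_{i=k}^{h} |c_i| \le M$ for all $0 \le k \le h$ and $|d_h| \le M \rho^h$ for all $h$. Therefore, with the conventions $c_0 = 1$ and $d_0 = a_0$,
$$|a_h| = \Big| \sum_{k=0}^h c^{h-k} d_k \prod_{i=k}^h c_i \Big| \le M^2 \sum_{k=0}^h c^{h-k} \rho^k \le \frac{M^2 \rho^{h+1}}{\rho - c},$$
and the lemma is proved.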
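Lemma 5 can be illustrated numerically. The sketch below iterates $a_h = c_h(c\,a_{h-1} + C + d_h)$ with the hypothetical choices $c = 0.5$, $\rho = 0.8$, $C = 1$, $c_h = 1 + 0.3\rho^h$ and $d_h = 2\rho^h$, which satisfy the lemma's hypotheses, and records $|a_h - C/(1-c)|/\rho^h$; the lemma says this ratio stays bounded.

```python
def lemma5_ratios(c: float = 0.5, rho: float = 0.8, C: float = 1.0,
                  a0: float = 0.0, H: int = 100):
    """Iterate a_h = c_h (c a_{h-1} + C + d_h); return the scaled errors, a_H and the limit."""
    limit = C / (1 - c)
    a = a0
    ratios = []
    for h in range(1, H + 1):
        c_h = 1 + 0.3 * rho**h      # hypothetical: c_h - 1 = O(rho^h)
        d_h = 2 * rho**h            # hypothetical: d_h = O(rho^h)
        a = c_h * (c * a + C + d_h)
        ratios.append(abs(a - limit) / rho**h)
    return ratios, a, limit


if __name__ == "__main__":
    ratios, a_final, limit = lemma5_ratios()
    print(max(ratios), a_final, limit)
```

Keeping $H$ moderate matters in floating point: once $\rho^H$ approaches machine precision relative to $C/(1-c)$, rounding error rather than the recursion dominates the ratio.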

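We use an induction on $k$. First, we will prove that $\gamma_1 = 0$. In fact, from Lemma 1,
$$E(y(X_\sigma) \mid X_r = i) = \sum_{j=1}^N y(j) P^{|\sigma|}_{ij} = O(|\lambda_2|^{|\sigma|}).$$
Therefore
$$E(\hat\mu_h \mid X_r = i) = 2^{-h/2} \sum_{k=0}^h 2^k\, O(|\lambda_2|^k) = O(\rho^h)$$
for all $i$, and $|\gamma_{1,h}(i) - \gamma_1| = O(\rho^h)$ for $\gamma_1 = 0$.

Now we move from $k-1$ to $k$. Without loss of generality, suppose that $\gamma_{2,h}(i) > 1$ for all $h, i$ (otherwise we can multiply $y$ by a large constant). It follows that $\gamma_{2k,h}(i) \ge (\gamma_{2,h}(i))^k > 1$ for all $k$. Let $M = \|y\|_\infty$. Note that $2^{-h/2} \sum_{\sigma \in T,\, 0 < |\sigma| \le h} y(X_\sigma) = \hat\mu_h - 2^{-h/2} y(X_r)$. We can decompose $\gamma_{k,h}(i)$ into
$$\gamma_{k,h}(i) = \Big( E[\hat\mu_h^k \mid X_r = i] - E\Big[ \Big( 2^{-h/2} \sum_{\sigma \in T,\, 0 < |\sigma| \le h} y(X_\sigma) \Big)^k \,\Big|\, X_r = i \Big] \Big) + E\Big[ \Big( 2^{-h/2} \sum_{\sigma \in T,\, 0 < |\sigma| \le h} y(X_\sigma) \Big)^k \,\Big|\, X_r = i \Big] := I_1 + I_2. \quad (7)$$
If $k$ is even, then from Hölder's inequality (together with the normalization above, which makes the even moments exceed one) we know that
$$|I_1| = \Big| E[\hat\mu_h^k \mid X_r = i] - E[(\hat\mu_h - 2^{-h/2} y(i))^k \mid X_r = i] \Big| = \Big| \sum_{j=1}^k \binom{k}{j} \Big( \frac{y(i)}{2^{h/2}} \Big)^j E[(\hat\mu_h - 2^{-h/2} y(i))^{k-j} \mid X_r = i] \Big| \le \sum_{j=1}^k \binom{k}{j} \Big( \frac{M}{2^{h/2}} \Big)^j E[(\hat\mu_h - 2^{-h/2} y(i))^k \mid X_r = i] = \Big[ \Big( 1 + \frac{M}{2^{h/2}} \Big)^k - 1 \Big] I_2. \quad (8)$$
Likewise, if $k$ is odd, we have
$$|I_1| = \Big| E[\hat\mu_h^k \mid X_r = i] - E[(\hat\mu_h - 2^{-h/2} y(i))^k \mid X_r = i] \Big| \le \Big[ \Big( 1 + \frac{M}{2^{h/2}} \Big)^k - 1 \Big] E[(\hat\mu_h - 2^{-h/2} y(i))^{k-1} \mid X_r = i]. \quad (9)$$
Since $k$ is fixed, $E[(\hat\mu_h - 2^{-h/2} y(i))^{k-1} \mid X_r = i]$ is bounded by our assumption on $\gamma_{k-1,h}(i)$, and $(1 + M 2^{-h/2})^k - 1 = O(2^{-h/2}) = O(\rho^h)$. Hence, for odd $k$,
$$|I_1| \le \Big[ \Big( 1 + \frac{M}{2^{h/2}} \Big)^k - 1 \Big] E[(\hat\mu_h - 2^{-h/2} y(i))^{k-1} \mid X_r = i] = O(\rho^h).$$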

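Let $X_{lc}$ and $X_{rc}$ be the left and right children of the root, and $T_l$ and $T_r$ the left and right subtrees. Writing $\hat\mu^{(l)}_{h-1}$ and $\hat\mu^{(r)}_{h-1}$ for the estimators computed on $T_l$ and $T_r$ (each a 2-tree of height $h-1$), we have $2^{-h/2} \sum_{0 < |\sigma| \le h} y(X_\sigma) = (\hat\mu^{(l)}_{h-1} + \hat\mu^{(r)}_{h-1})/\sqrt{2}$, and therefore
$$I_2 = \sum_{u,v} p_{iu} p_{iv}\, E\Big[ \Big( \frac{\hat\mu^{(l)}_{h-1} + \hat\mu^{(r)}_{h-1}}{\sqrt{2}} \Big)^k \,\Big|\, X_r = i, X_{lc} = u, X_{rc} = v \Big] = \sum_{u,v} p_{iu} p_{iv}\, E\Big[ \Big( \frac{\hat\mu^{(l)}_{h-1} + \hat\mu^{(r)}_{h-1}}{\sqrt{2}} \Big)^k \,\Big|\, X_{lc} = u, X_{rc} = v \Big] = \sum_{u,v} p_{iu} p_{iv}\, 2^{-k/2} \big( \gamma_{k,h-1}(u) + \gamma_{k,h-1}(v) \big) + S = 2^{1-k/2} \sum_u p_{iu} \gamma_{k,h-1}(u) + S, \quad (10)$$
where
$$S = 2^{-k/2} \sum_{j=1}^{k-1} \binom{k}{j} \sum_{u,v} p_{iu} p_{iv} \gamma_{j,h-1}(u) \gamma_{k-j,h-1}(v).$$
If $k = 2$, Equations 8 and 10 reduce to
$$\gamma_{2,h}(i) = \sum_u p_{iu} \gamma_{2,h-1}(u) + \delta_h(i),$$
where
$$\delta_h(i) = -\frac{y(i)^2}{2^h} + \frac{2 y(i)}{2^{h/2}} \gamma_{1,h}(i) + \Big( \sum_u p_{iu} \gamma_{1,h-1}(u) \Big)^2 = O(\rho^h).$$
Write $\nu_h = (\gamma_{2,h}(1), \ldots, \gamma_{2,h}(N))^T$ and $\delta_h = (\delta_h(1), \ldots, \delta_h(N))^T$. For $h \ge 1$, $\nu_h = P \nu_{h-1} + \delta_h$. Thus, setting $\delta_0 = 0$, we have $\nu_h = P^h \nu_0 + \sum_{k=1}^h P^{h-k} \delta_k$, and it is not hard to verify that all the components of $\nu_h$ (i.e., every $\gamma_{2,h}(i)$) converge to $\gamma_2 = \pi^T \nu_0 + \sum_{l=1}^\infty \pi^T \delta_l$ with rate $\rho^h$.

Now suppose that $k > 2$. There are a fixed number of terms in $S$. Since $|\gamma_{l,h}(i) - \gamma_l| = O(\rho^h)$ for all $i$ and $l < k$, we have
$$S = 2^{-k/2} \sum_{j=1}^{k-1} \binom{k}{j} \gamma_j \gamma_{k-j} + O(\rho^h).$$
Thus,
$$I_2 = 2^{1-k/2} \sum_u p_{iu} \gamma_{k,h-1}(u) + 2^{-k/2} \sum_{j=1}^{k-1} \binom{k}{j} \gamma_j \gamma_{k-j} + O(\rho^h). \quad (11)$$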
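Letting $h \to \infty$ in Equations 10 and 11 gives the fixed-point relation that appears as Equation 13 later in this appendix. The short sketch below solves that relation for $\gamma_3, \gamma_4, \ldots$ starting from $\gamma_1 = 0$ and a chosen $\gamma_2$, and checks that the output matches the moments of $N(0, \gamma_2)$: zero for odd $k$ and $\gamma_2^{k/2}(k-1)!!$ for even $k$. The values of $\gamma_2$ used in the check are arbitrary.

```python
from math import comb, isclose


def limit_moments(gamma2: float, kmax: int) -> list:
    """Solve gamma_k = 2^{1-k/2} gamma_k + 2^{-k/2} sum_{j=1}^{k-1} C(k,j) gamma_j gamma_{k-j}."""
    g = [1.0, 0.0, gamma2]                  # gamma_0 = 1, gamma_1 = 0, gamma_2 given
    for k in range(3, kmax + 1):
        s = sum(comb(k, j) * g[j] * g[k - j] for j in range(1, k))
        g.append(2 ** (-k / 2) * s / (1 - 2 ** (1 - k / 2)))
    return g


def gaussian_moment(gamma2: float, k: int) -> float:
    """E[xi^k] for xi ~ N(0, gamma2): zero for odd k, gamma2^{k/2} (k-1)!! for even k."""
    if k % 2:
        return 0.0
    double_fact = 1
    for odd in range(1, k, 2):
        double_fact *= odd
    return gamma2 ** (k // 2) * double_fact


if __name__ == "__main__":
    print(limit_moments(2.0, 8))
```

Because $1 - 2^{1-k/2} \ne 0$ for $k \ge 3$, the relation determines each $\gamma_k$ from the lower-order ones, which is why matching $\gamma_1$ and $\gamma_2$ pins down the whole sequence.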

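Combining Equations 8, 9 and 11, we arrive at the final equation for $\gamma_{k,h}(i)$:
$$\gamma_{k,h}(i) = c_{k,h} I_2 = c_{k,h} \Big( 2^{1-k/2} \sum_u p_{iu} \gamma_{k,h-1}(u) + 2^{-k/2} \sum_{j=1}^{k-1} \binom{k}{j} \gamma_j \gamma_{k-j} + O(\rho^h) \Big), \quad (12)$$
where $c_{k,h} = 1$ if $k$ is odd (the term $I_1 = O(\rho^h)$ is absorbed into the remainder), and $c_{k,h} = 1 + O(\rho^h) \in \big[ 2 - (1 + M 2^{-h/2})^k,\ (1 + M 2^{-h/2})^k \big]$ if $k$ is even. Since $2^{1-k/2} < \rho$ for $k \ge 3$ and $c_{k,h}$ converges, we conclude from Lemma 5 that $|\gamma_{k,h}(i) - \gamma_k| = O(\rho^h)$, where
$$\gamma_k = \frac{2^{-k/2} \sum_{j=1}^{k-1} \binom{k}{j} \gamma_j \gamma_{k-j}}{1 - 2^{1-k/2}}.$$
We have proved that $|\gamma_{k,h}(i) - \gamma_k| = O(\rho^h)$ for all $k, i, h$. Letting $h$ tend to infinity in Equation 12, we have
$$\gamma_k = 2^{1-k/2} \gamma_k + 2^{-k/2} \sum_{j=1}^{k-1} \binom{k}{j} \gamma_j \gamma_{k-j}. \quad (13)$$
Now suppose that $\xi_1, \xi_2, \ldots$ is a sequence of i.i.d. $N(0, \gamma_2)$ variables. Let $\tilde\gamma_k = \lim_{h \to \infty} E\big[ \big( 2^{-h/2} (\xi_1 + \cdots + \xi_{2^h}) \big)^k \big]$; then $\{\tilde\gamma_k\}_{k \in \mathbb{N}}$ also follows Equation 13. Since $\tilde\gamma_1 = \gamma_1 = 0$ and $\tilde\gamma_2 = \gamma_2$, and Equation 13 determines $\gamma_k$ from $\gamma_1, \ldots, \gamma_{k-1}$ for $k \ge 3$, we have $\tilde\gamma_k = \gamma_k$ for every $k$, and the argument is proved.

B.2 Proof of uniform subgaussianity

To prove that the $\hat\mu_h$ are uniformly subgaussian, we need to show that there exists some $\theta$ such that $\gamma_{l,h}(i) \le \theta^l \gamma_l$ for all even $l$ and all $h, i$. Let $c_1$ be a large constant to be defined later and
$$c_{h+1} = \big( 1 + M (2\bar\lambda)^{-h} \big) \big( 1 + (\sqrt{2}\bar\lambda)^h \big) c_h,$$
where $\bar\lambda$ is the maximum of $|\lambda_2|$ and a fixed constant strictly between $1/2$ and $1/\sqrt{2}$, and $M = \|y\|_\infty$. Let $s_{l,h} = \max_i |\gamma_{l,h}(i)|$. Since $1/2 < \bar\lambda < 1/\sqrt{2}$, $c_{h+1} > c_h$ and $\theta = \lim_h c_h$ exists. Thus it suffices to prove that
$$s_{l,h} \le c_h^l \gamma_l \quad \text{for even } l \quad (14)$$
and
$$s_{l,h} \le c_h^l (\sqrt{2}\bar\lambda)^h \gamma_{l-1} \quad \text{for odd } l \quad (15)$$
for all $l$ and $h$. Again we use an induction on $l$. Since $\gamma_{1,h}(i) = O((\sqrt{2}\bar\lambda)^h)$, we can choose $c_1$ large enough that the inequalities in Equations 14 and 15 hold for all $(h, l)$ with $h = 1$ or $l = 1$. Suppose that Equations 14 and 15 are verified for all $l \le 2k$. We will prove that they are also true for $l = 2k+1$.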

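From conditions (c1) and (c2), we know that
$$\Big| \sum_u p_{iu} \gamma_{2k+1,h-1}(u) \Big| \le \bar\lambda \max_u |\gamma_{2k+1,h-1}(u)| = \bar\lambda\, s_{2k+1,h-1}.$$
Expanding $\gamma_{2k+1,h}(i)$ with Equations 7, 9 and 10, bounding each factor by the induction hypothesis, and simplifying the resulting binomial sums with Equation 13, we have
$$s_{2k+1,h} \le \Big[ \Big( 1 + \frac{M}{2^{h/2}} \Big)^{2k+1} - 1 \Big] c_{h-1}^{2k} \gamma_{2k} + c_{h-1}^{2k+1} (\sqrt{2}\bar\lambda)^h \gamma_{2k} \le \Big( \Big[ \Big( 1 + \frac{M}{2^{h/2}} \Big)^{2k+1} - 1 \Big] (\sqrt{2}\bar\lambda)^{-h} + 1 \Big) c_{h-1}^{2k+1} (\sqrt{2}\bar\lambda)^h \gamma_{2k} \le \big( 1 + M (2\bar\lambda)^{-h} \big)^{2k+1} c_{h-1}^{2k+1} (\sqrt{2}\bar\lambda)^h \gamma_{2k} \le c_h^{2k+1} (\sqrt{2}\bar\lambda)^h \gamma_{2k},$$
and Equation 15 is true for $2k+1$.

Now we move from $2k+1$ to $2k+2$. Recall that
$$\gamma_{2k+2,h}(i) = c_{2k+2,h}\, 2^{-(k+1)} \sum_{j=0}^{2k+2} \binom{2k+2}{j} \sum_{u,v} p_{iu} p_{iv} \gamma_{j,h-1}(u) \gamma_{2k+2-j,h-1}(v).$$
Thus,
$$s_{2k+2,h} \le \Big( 1 + \frac{M}{2^{h/2}} \Big)^{2k+2} 2^{-(k+1)} \sum_{j=0}^{2k+2} \binom{2k+2}{j} s_{j,h-1} s_{2k+2-j,h-1}.$$
Let
$$I_1 = \sum_{j=0}^{k+1} \binom{2k+2}{2j} s_{2j,h-1} s_{2k+2-2j,h-1} \quad \text{and} \quad I_2 = \sum_{j=0}^{k} \binom{2k+2}{2j+1} s_{2j+1,h-1} s_{2k+1-2j,h-1}.$$
Then
$$s_{2k+2,h} \le \Big( 1 + \frac{M}{2^{h/2}} \Big)^{2k+2} 2^{-(k+1)} (I_1 + I_2). \quad (16)$$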

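We have
$$I_1 \le \sum_{j=0}^{k+1} \binom{2k+2}{2j} c_{h-1}^{2j} \gamma_{2j}\, c_{h-1}^{2k+2-2j} \gamma_{2k+2-2j} = c_{h-1}^{2k+2} \sum_{j=0}^{k+1} \binom{2k+2}{2j} \gamma_{2j} \gamma_{2k+2-2j} = c_{h-1}^{2k+2}\, 2^{k+1} \gamma_{2k+2}, \quad (17)$$
where the last equality follows from Equation 13 (the odd moments vanish). On the other hand,
$$I_2 \le \sum_{j=0}^{k} \binom{2k+2}{2j+1} c_{h-1}^{2j+1} (\sqrt{2}\bar\lambda)^{h-1} \gamma_{2j}\, c_{h-1}^{2k+1-2j} (\sqrt{2}\bar\lambda)^{h-1} \gamma_{2k-2j} = c_{h-1}^{2k+2} (\sqrt{2}\bar\lambda)^{2h-2} \sum_{j=0}^{k} \binom{2k+2}{2j+1} \gamma_{2j} \gamma_{2k-2j}.$$
It can be directly verified that for all $j$,
$$\binom{2k+2}{2j+1} \gamma_{2j} \gamma_{2k-2j} \le (2k+2) \binom{2k}{2j} \gamma_{2k}. \quad (18)$$
Thus, for $k \ge 1$,
$$I_2 \le c_{h-1}^{2k+2} (\sqrt{2}\bar\lambda)^{2h-2} (2k+2) \sum_{j=0}^{k} \binom{2k}{2j} \gamma_{2k} = c_{h-1}^{2k+2} (\sqrt{2}\bar\lambda)^{2h-2} (k+1)\, 2^{2k} \gamma_{2k} \le c_{h-1}^{2k+2} (\sqrt{2}\bar\lambda)^{2h-2}\, 2^{2k} \gamma_{2k+2}, \quad (19)$$
using $\gamma_{2k+2} = (2k+1) \gamma_2 \gamma_{2k} \ge (k+1) \gamma_{2k}$ (recall that $\gamma_2 \ge 1$). Combining Equations 17 and 19, we have
$$I_1 + I_2 \le 2^{k+1} c_{h-1}^{2k+2} \gamma_{2k+2} \big( 1 + 2^{k-1} (\sqrt{2}\bar\lambda)^{2h-2} \big). \quad (20)$$
Therefore, by Equation 16 and the choice of $c_h$,
$$s_{2k+2,h} \le c_h^{2k+2} \gamma_{2k+2}, \quad (21)$$
and the theorem is proved.

B.3 Proof of Corollary 1

Proof. By Theorem 2 and Slutsky's lemma, it suffices to prove that $\hat d \to \bar d$ in probability. Let $D = \max_{1 \le i \le N} \deg(i)$. For any $\sigma \in T$, $E_\pi[1/\deg(X_\sigma)] = 1/\bar d$. Thus $E(1/\hat d) = 1/\bar d$,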
