arxiv: v1 [cs.si] 25 May 2016

Stop-and-Stare: Optimal Sampling Algorithms for Viral Marketing in Billion-scale Networks

Hung T. Nguyen, CS Department, Virginia Commonwealth Univ., Richmond, VA, USA
My T. Thai, CISE Department, University of Florida, Gainesville, Florida, USA
Thang N. Dinh, CS Department, Virginia Commonwealth Univ., Richmond, VA, USA

ABSTRACT
Influence Maximization (IM), which seeks a small set of key users who spread influence widely into the network, is a core problem in multiple domains. It finds applications in viral marketing, epidemic control, and assessing cascading failures within complex systems. Despite the huge amount of effort, IM in billion-scale networks such as Facebook, Twitter, and the World Wide Web has not been satisfactorily solved. Even state-of-the-art methods such as TIM+ and IMM may take days on those networks. In this paper, we propose SSA and D-SSA, two novel sampling frameworks for IM-based viral marketing problems. SSA and D-SSA are up to 100 times faster than the SIGMOD'15 best method, IMM, while providing the same (1 − 1/e − ε) approximation guarantee. Underlying our frameworks is an innovative Stop-and-Stare strategy in which they stop at exponential check points to verify (stare) if there is adequate statistical evidence on the solution quality. Theoretically, we prove that SSA and D-SSA are the first approximation algorithms that use (asymptotically) minimum numbers of samples, meeting strict theoretical thresholds characterized for IM. The absolute superiority of SSA and D-SSA is confirmed through extensive experiments on real network data for IM and another topic-aware viral marketing problem, named TVM.

Keywords: Influence Maximization; Stop-and-Stare; Sampling

1. INTRODUCTION
Viral Marketing, in which brand-awareness information is widely spread via the word-of-mouth effect, has emerged as one of the most effective marketing channels.
It is becoming even more attractive with the explosion of social networking services such as Facebook, with 1.5 billion monthly active users, or Instagram, with more than 3.5 billion daily like connections. To create a successful viral marketing campaign, one needs to seed the content with a set of individuals with high social networking influence. Finding such a set of users is known as the Influence Maximization problem. Given a network and a budget k, Influence Maximization (IM) asks for k influential users who can spread the influence widely into the network. Kempe et al. [1] were the first to formulate IM as a combinatorial optimization problem on the two pioneering diffusion models, namely, Independent Cascade (IC) and Linear Threshold (LT). They prove IM to be NP-hard and provide a natural greedy algorithm that yields (1 − 1/e − ε)-approximate solutions for any ε > 0. This celebrated work has motivated a vast amount of work on IM in the past decade [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. However, most of the existing methods are either too slow for billion-scale networks [1, 2, 4, 5, 6, 7] or ad-hoc heuristics without performance guarantees [13, 3, 14, 15]. The most scalable methods with performance guarantees for IM are TIM/TIM+ [8] and, later, IMM [16]. They utilize a novel RIS sampling technique introduced by Borgs et al. in [17]. All these methods attempt to generate a (1 − 1/e − ε)-approximate solution with minimal numbers of RIS samples. They use highly sophisticated estimating methods to make the number of RIS samples close to some theoretical thresholds θ [8, 16]. However, they all share two shortcomings: 1) the number of generated samples can be arbitrarily larger than θ, and 2) the thresholds θ are not shown to be the minimum among their kinds. In this paper, we 1) unify the approaches in [17, 8, 16] to characterize the necessary number of RIS samples to achieve the (1 − 1/e − ε)-approximation guarantee; and 2) design two novel sampling algorithms, SSA and D-SSA, aiming towards achieving the minimum number of RIS samples.
In the first part, we begin by defining the RIS framework, which consists of two necessary conditions to achieve the (1 − 1/e − ε) factor, and classes of RIS thresholds on the sufficient numbers of RIS samples, generalizing the θ thresholds in [8, 16]. The minimum threshold in each class is then termed the type-1 minimum threshold, and the minimum among all type-1 minimum thresholds is termed the type-2 minimum threshold. In the second part, we develop the Stop-and-Stare Algorithm (SSA) and its dynamic version D-SSA, which are guaranteed to achieve, within constant factors, the two minimum thresholds, respectively. Both SSA and D-SSA follow the stop-and-stare strategy, which can be efficiently applied to many optimization problems over samples and guarantees some constant times the minimum number of samples required. In short, the algorithms keep generating samples and stop at exponential check points to verify (stare) if there is adequate statistical evidence on the solution quality for termination. This strategy will be shown to address both of the shortcomings in [8, 16]: 1) the number of samples is guaranteed to be close to the theoretical thresholds, and 2) the thresholds are minimal by definition. The dynamic algorithm, D-SSA, improves over SSA by automatically and dynamically selecting the best parameters for the RIS framework. We note that the Stop-and-Stare strategy combined with the RIS framework enables SSA and D-SSA to meet the minimum thresholds without explicitly computing or looking for these thresholds. That is in contrast to previous approaches [17, 8, 16], which all find some explicit but unreachable thresholds and then probe for them with unbounded or huge gaps.

Our experiments show that both SSA and D-SSA outperform the best existing methods by up to several orders of magnitude w.r.t. running time while returning comparable seed set quality. More specifically, on the Friendster network with roughly 65.6 million nodes and 1.8 billion edges, SSA and D-SSA, taking 3.5 seconds when k = 500, are up to 100 times faster than IMM. We also run CELF++ (the fastest greedy algorithm for IM with guarantees) on the Twitter network with k = 1000 and observe that D-SSA is 10^9 times faster.

Our contributions are summarized as follows.
• We generalize the RIS sampling methods in [17, 8, 16] into a general framework which characterizes the necessary conditions to guarantee the (1 − 1/e − ε)-approximation factor. Based on the framework, we define classes of RIS thresholds and two types of minimum thresholds, namely, type-1 and type-2.
• We propose the Stop-and-Stare Algorithm (SSA) and its dynamic version, D-SSA, which both guarantee a (1 − 1/e − ε)-approximate solution and are the first algorithms to achieve, within constant factors, the type-1 and type-2 minimum thresholds, respectively. Our proposed methods are not limited to the influence maximization problem but can also be generalized to an important class of hard optimization problems over samples/sketches.
• Our framework and approaches are generic and can be applied in principle to sample-based optimization problems to design high-confidence approximation algorithms using an (asymptotically) minimum number of samples.
• We carry out extensive experiments on various real networks with up to several billion edges to show the superiority in performance and the comparable solution quality.
To test the applicability of the proposed algorithms, we apply our methods to an IM application, namely, Targeted Viral Marketing (TVM). The results show that our algorithms are up to 100 times faster than the current best method on the IM problem and, for TVM, the speedup is up to 500 times. Note that this paper does not focus on distributed/parallel computation; however, our algorithms are amenable to a distributed implementation, which is one of our future works.

Related works. Kempe et al. [1] formulated the influence maximization problem as an optimization problem. They show the problem to be NP-complete and devise a (1 − 1/e − ε) greedy algorithm. Later, computing the exact influence was shown to be #P-hard [3]. Leskovec et al. [2] study influence propagation from a different perspective, in which they aim to find a set of nodes in networks to detect the spread of a virus as soon as possible. They improve the simple greedy method with the lazy-forward heuristic (CELF), which was originally proposed to optimize submodular functions in [18], obtaining an (up to) 700-fold speedup.

Several heuristics have been developed to find solutions in large networks. While those heuristics are often faster in practice, they fail to retain the (1 − 1/e − ε)-approximation guarantee and produce lower-quality seed sets. Chen et al. [19] obtain a speedup by using an influence estimation for the IC model. For the LT model, Chen et al. [3] propose to use local directed acyclic graphs (LDAG) to approximate the influence regions of nodes. In a complementary direction, there are recent works on learning the parameters of influence propagation models [20, 21]. Influence maximization is also studied in other diffusion models, including the majority threshold model [22], when both positive and negative influences are considered [23], and when the propagation terminates after a predefined time [2, 24]. Recently, IM across multiple OSNs has been studied in [11], and [25] studies the IM problem on continuous-time diffusion models. Recently, Borgs et al.
[17] made a theoretical breakthrough and presented an O(kℓ²(m + n) log² n/ε³) time algorithm for IM under the IC model. Their algorithm (RIS) returns a (1 − 1/e − ε)-approximate solution with probability at least 1 − n^(−ℓ). In practice, however, the proposed algorithm is less than satisfactory due to the rather large hidden constants. In subsequent works, Tang et al. [8, 16] reduce the running time to O((k + ℓ)(m + n) log n/ε²) and show that their algorithms are also very efficient in large networks with billions of edges. Nevertheless, Tang et al.'s algorithms have two weaknesses: 1) intractable estimation of the maximum influence, and 2) taking union bounds over all possible seed sets in order to guarantee a single returned set.

Organization. The rest of the paper is organized as follows. In Section 2, we introduce two fundamental models, i.e., LT and IC, and the IM problem definition. We subsequently devise the unified RIS framework, RIS thresholds, and two types of RIS minimum thresholds in Section 3. Sections 4 and 5 present the SSA algorithm and prove its approximation factor as well as the achievement of the type-1 minimum threshold. In Section 6, we propose the dynamic algorithm, D-SSA, and prove its approximation guarantee together with the type-2 minimum threshold property. Finally, we show experimental results in Section 7 and draw some conclusions in Section 8.

2. MODELS AND PROBLEM DEFINITION
This section formally defines the two most essential propagation models, i.e., Linear Threshold (LT) and Independent Cascade (IC), that we consider in this work, followed by the problem statement of Influence Maximization (IM). We abstract a network using a weighted graph G = (V, E, w) with |V| = n nodes and |E| = m directed edges. Each edge (u, v) ∈ E is associated with a weight w(u, v) ∈ [0, 1], which indicates the probability that u influences v.

2.1 Propagation Models
In this paper, we study two fundamental diffusion models, namely, Linear Threshold (LT) and Independent Cascade (IC).
Assume that we have a set of seed nodes S; the propagation processes under these two models happen in rounds. At round 0, all nodes in S are activated and the others are not. In the subsequent rounds, the newly activated nodes try to activate their neighbors. Once a node v becomes active, it remains active until the end. The process stops when no more nodes get activated. The two models differ as follows.

Linear Threshold (LT) model. The edge weights in the LT model must satisfy the condition Σ_{u∈V} w(u, v) ≤ 1. At the beginning of the propagation process, each node v selects a threshold λ_v uniformly at random in the range [0, 1]. In round t ≥ 1, an inactive node v becomes activated if Σ_{activated neighbors u} w(u, v) ≥ λ_v. Let I(S) denote the expected number of activated nodes given the seed set S, where the expectation is taken over all λ_v values from their uniform distribution. We call I(S) the influence spread of S under the LT model.

Independent Cascade (IC) model. In IC, when a node u gets activated, initially or by another node, it has a single chance to activate each inactive neighbor v, succeeding with probability w(u, v). After that moment, the activated nodes remain in their active state, but they make no contribution to later activations.

Table 1: Table of Symbols
Notation            Description
n, m                #nodes, #edges of graph G = (V, E, w).
I(S)                Influence spread of seed set S ⊆ V.
OPT_k               The maximum I(S) for any size-k seed set S.
Ŝ_k                 The returned size-k seed set of SSA/D-SSA.
S*_k                An optimal size-k seed set, i.e., I(S*_k) = OPT_k.
R_j                 A random RR set.
R                   A set of random RR sets.
Cov_R(S), S ⊆ V     #RR sets in R incident at some node in S.
c                   c = 2(e − 2).
Υ(ε, δ)             Υ(ε, δ) = c ln(1/δ)/ε².
Λ_1                 Λ_1 = (1 + ε_1)(1 + ε_2)Υ(ε_3, δ/3).
Λ_2                 Λ_2 = (1 + ε_2)Υ(ε_2, δ_2).

2.2 Problem Definition
Given the propagation models defined previously, we formally state the Influence Maximization (IM) problem in the following definition.

Definition 1 (Influence Maximization (IM)). Given a graph G = (V, E, w), k ∈ Z+ and a propagation model, the Influence Maximization problem asks for a seed set Ŝ_k ⊆ V of k nodes that maximizes its influence spread, I(Ŝ_k).

3. UNIFIED RIS FRAMEWORK
This section presents the unified RIS framework, which generalizes the methods of using RIS sampling for the IM problem. The unified framework characterizes the sufficient conditions to guarantee a (1 − 1/e − ε)-approximation in the framework.
Subsequently, we introduce the concept of an RIS threshold in terms of the number of samples necessary to guarantee the solution quality, and two types of minimum RIS thresholds, i.e., type-1 and type-2.

3.1 Preliminaries
RIS sampling. The major bottleneck in the traditional methods for IM [1, 2, 4, 6] is the inefficiency in estimating the influence spread. To address that, Borgs et al. [17] introduced a novel sampling approach for IM, called Reverse Influence Sampling (RIS for short), which is the foundation for TIM/TIM+ [8] and IMM [16], the state-of-the-art methods.

[Figure 1: An example of generating random RR sets under the LT model: R_1 = {b, a}, R_2 = {d, c, a}, R_3 = {c, a}. Node a has the highest influence and is also the most frequent element across the RR sets.]

Given a graph G = (V, E, w), RIS captures the influence landscape of G by generating a set R of random Reverse Reachable (RR) sets. The term RR set is also used in TIM/TIM+ [8, 16] and is referred to as a hyperedge in [17]. Each RR set R_j is a subset of V constructed as follows.

Definition 2 (Reverse Reachable (RR) set). Given G = (V, E, w), a random RR set R_j is generated from G by 1) selecting a random node v ∈ V, 2) generating a sample graph g from G, and 3) returning R_j as the set of nodes that can reach v in g.

Node v in the above definition is called the source of R_j. Observe that R_j contains the nodes that can influence its source v. If we generate multiple random RR sets, influential nodes will likely appear frequently in the RR sets. Thus, a seed set S that covers most of the RR sets will likely maximize the influence spread I(S). Here, a seed set S covers an RR set R_j if S ∩ R_j ≠ ∅. For convenience, we denote the coverage of set S as Cov_R(S) = Σ_{R_j ∈ R} min{|S ∩ R_j|, 1}. An illustration of this intuition and of how to generate RR sets is given in Fig. 1. In the figure, three random RR sets are generated following the LT model with sources b, d and c, respectively.
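As an illustration of Definition 2 under the LT model (the model used in Fig. 1), the sketch below samples RR sets by a reverse random walk. It relies on the live-edge characterization of LT from Kempe et al. [1], in which every node keeps at most one incoming edge, edge (u, v) with probability w(u, v); applying it here is our assumption, not a step spelled out in this section. The adjacency format `in_edges` and the function names are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical format: in_edges[v] lists (u, w(u, v)) for every edge u -> v.
# Under LT, the weights entering each node v must sum to at most 1.
InEdges = Dict[Hashable, List[Tuple[Hashable, float]]]


def random_rr_set_lt(in_edges: InEdges, nodes: List[Hashable],
                     rng: random.Random) -> Set[Hashable]:
    """Sample one random RR set under LT via a reverse random walk."""
    v = rng.choice(nodes)  # 1) source chosen uniformly at random
    rr = {v}
    while True:
        # 2) In the live-edge view of LT, v keeps in-edge (u, v) with
        #    probability w(u, v), and keeps no in-edge with the remaining mass.
        r, acc, nxt = rng.random(), 0.0, None
        for u, w in in_edges.get(v, []):
            acc += w
            if r < acc:
                nxt = u
                break
        # 3) Stop when v keeps no edge or the walk revisits a node.
        if nxt is None or nxt in rr:
            return rr
        rr.add(nxt)
        v = nxt


def coverage(rr_sets: List[Set[Hashable]], seeds: Set[Hashable]) -> int:
    """Cov_R(S): the number of RR sets that intersect the seed set S."""
    return sum(1 for rr in rr_sets if rr & seeds)
```

By Lemma 1 below, len(nodes) * coverage(R, S) / len(R) is then an estimate of I(S).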
The influence of node a is the highest among all the nodes in the original graph, and a is also the most frequent node across the RR sets. This observation is rigorously captured in the following lemma from [17].

Lemma 1. Given G = (V, E, w) and a random RR set R_j generated from G, for each seed set S ⊆ V,
$$ I(S) = n \cdot \Pr[S \text{ covers } R_j]. \tag{1} $$

Lemma 1 says that the influence of a node set S is proportional to the probability that S intersects a random RR set. Thus, finding S that maximizes I(S) can be approximated by finding S that intersects as many R_j as possible. The critical issue is the minimum number of samples needed to provide bounded-error guarantees.

(ε, δ)-approximation. The bounded-error guarantee we seek is based on the concept of (ε, δ)-approximation.

Definition 3 ((ε, δ)-approximation). Let Z_1, Z_2, ... be independently and identically distributed samples according to Z in the interval [0, 1] with mean μ_Z and variance σ_Z². A Monte Carlo estimator of μ_Z,
$$ \hat\mu_Z = \frac{1}{T}\sum_{i=1}^{T} Z_i, \tag{2} $$
is said to be an (ε, δ)-approximation of μ_Z if
$$ \Pr[(1-\epsilon)\mu_Z \le \hat\mu_Z \le (1+\epsilon)\mu_Z] \ge 1-\delta. \tag{3} $$
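A quick way to see Lemma 1 at work is to compare a forward Monte Carlo estimate of I(S) under the IC model with the RIS estimate n · Cov_R(S)/|R|. In this sketch (IC model; the dict-of-lists graph format and all function names are hypothetical), an IC RR set is built by a reverse graph search that keeps each incoming edge (u, v) independently with probability w(u, v).

```python
import random
from typing import Dict, Hashable, List, Set, Tuple

Adj = Dict[Hashable, List[Tuple[Hashable, float]]]  # hypothetical format


def simulate_ic(out_edges: Adj, seeds: Set[Hashable], rng: random.Random) -> int:
    """One forward IC cascade; returns the number of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, w in out_edges.get(u, []):
                # u gets a single chance to activate each inactive neighbor v
                if v not in active and rng.random() < w:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def rr_set_ic(in_edges: Adj, nodes: List[Hashable], rng: random.Random) -> Set[Hashable]:
    """Random RR set under IC: reverse search keeping edge (u, v) w.p. w(u, v)."""
    src = rng.choice(nodes)
    rr, stack = {src}, [src]
    while stack:
        v = stack.pop()
        for u, w in in_edges.get(v, []):
            if u not in rr and rng.random() < w:
                rr.add(u)
                stack.append(u)
    return rr


def compare_estimates(out_edges: Adj, nodes: List[Hashable], seeds: Set[Hashable],
                      samples: int, seed: int = 1) -> Tuple[float, float]:
    """Return (forward Monte Carlo estimate, RIS estimate n * Cov_R(S) / |R|)."""
    rng = random.Random(seed)
    forward = sum(simulate_ic(out_edges, seeds, rng) for _ in range(samples)) / samples
    in_edges: Adj = {}
    for u, outs in out_edges.items():
        for v, w in outs:
            in_edges.setdefault(v, []).append((u, w))
    rr_sets = [rr_set_ic(in_edges, nodes, rng) for _ in range(samples)]
    ris = len(nodes) * sum(1 for rr in rr_sets if rr & seeds) / samples
    return forward, ris
```

For the diamond a → {b, c} → d with every weight 0.5, the exact value is I({a}) = 1 + 0.5 + 0.5 + (1 − 0.75²) = 2.4375, and with 20,000 samples both estimates returned by compare_estimates should land close to it.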

The number of samples needed to guarantee an (ε, δ)-approximation for the Monte Carlo method is well known. For example, we shall use the two lemmas below from [27].

Lemma 2 ([27]). Let θ be the optimal number of samples that guarantees an (ε, δ)-approximation of μ_Z, and define Υ(ε, δ/2) = c ln(2/δ)/ε² with c = 2(e − 2), and Cov(Z) = Σ_{i=1}^{T} Z_i. If T = Υ(ε, δ/2)/μ_Z, then μ̂_Z = Cov(Z)/T is an (ε, δ)-approximation of μ_Z and
$$ T \le c_1 \theta \tag{4} $$
for some constant c_1 > 0.

Note that calculating T using the above lemma requires knowledge of the unknown value μ_Z. To avoid coping with μ_Z directly, [27] also provides a simple stopping condition which depends on the number of observed Z_j = 1.

Lemma 3 ([27]). Let θ and μ_Z be defined as in Lem. 2, and let T be the number of samples at which Cov(Z) ≥ 1 + (1 + ε)Υ(ε, δ/2). Then μ̂_Z is an (ε, δ)-approximation of μ_Z and T ≤ (1 + ε)c_1θ, where c_1 is the constant in Lem. 2.

Note that if only one side of the event in Eq. 3 is required, then Υ(ε, δ/2) becomes
$$ \Upsilon(\epsilon,\delta) = c\,\ln\frac{1}{\delta}\,\frac{1}{\epsilon^2}. \tag{5} $$

3.2 RIS Framework and Thresholds
Based on Lem. 1, the IM problem can be solved by the following two-step algorithm:
• Generate a collection of RR sets, R, from G.
• Use the greedy algorithm for the Max-Coverage problem [28] to find a seed set Ŝ_k that covers the maximum number of RR sets, and return Ŝ_k as the solution.

[Figure 2: Overview of algorithms based on optimization over samples (ε is the error from approximating f(S) by f̂_T(S); OPT_f and OPT_{f̂_T} are optimal solutions of f(.) and f̂_T(.)).]

This two-step algorithm is actually an instance of a general class of methods illustrated in Fig. 2. The original problem is a maximization problem of f(.) over Ω_S, which is usually very hard to solve or approximate directly. Instead, we find an estimate f̂_T(.) of f(.). The estimation function f̂_T(.) is constructed by generating T = θ(ε, δ) samples, where θ(ε, δ) is an explicit threshold. The threshold θ(ε, δ) decides the estimation quality of f̂_T(.) compared to f(.) and is usually the most critical point in these methods.
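Lemma 3's stopping rule is simple to implement: keep drawing 0/1 samples until their sum reaches 1 + (1 + ε)Υ(ε, δ/2), then return the sample mean. The sketch below assumes a caller-supplied `sample()` function returning 0 or 1 (a hypothetical name) and a strictly positive mean μ_Z, since otherwise the target is never reached.

```python
import math
import random
from typing import Callable

C = 2 * (math.e - 2)  # the constant c = 2(e - 2) of Lemma 2


def upsilon(eps: float, delta: float) -> float:
    """Upsilon(eps, delta) = c * ln(1/delta) / eps^2 (Eq. 5)."""
    return C * math.log(1.0 / delta) / eps ** 2


def stopping_rule_mean(sample: Callable[[], int], eps: float, delta: float) -> float:
    """Draw 0/1 samples until their sum reaches 1 + (1 + eps) * Upsilon(eps, delta/2),
    then return the sample mean (two-sided rule of Lemma 3). Assumes mu_Z > 0."""
    target = 1 + (1 + eps) * upsilon(eps, delta / 2)
    total, t = 0, 0
    while total < target:
        total += sample()
        t += 1
    return total / t


if __name__ == "__main__":
    rng = random.Random(3)
    # Bernoulli(0.3) samples: the result lies within 10% of 0.3 w.p. >= 0.95.
    print(stopping_rule_mean(lambda: int(rng.random() < 0.3), eps=0.1, delta=0.05))
```

Note that the number of samples is not fixed in advance: it adapts to the unknown μ_Z, which is exactly why Lemma 3 avoids needing μ_Z as in Lemma 2.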
After obtaining the function f̂_T(.), an α-approximation algorithm A, which is easier and more efficient, is used to find the final solution S_A of f̂_T(.) as well as of f(.). The function f(.) that characterizes our maximization objective covers a wide range of important problems, e.g., targeted viral marketing [29] and densest subgraph [30], which have very high complexity to approximate and, hence, need to rely on sampling algorithms. Thus, our proposed stop-and-stare methods can be modified to address many other optimization problems in this category. We illustrate this ability by extending D-SSA to solve targeted viral marketing in our experiments section.

Similar to determining θ, the core question in applying the above algorithm to the influence maximization problem is: how many RR sets are sufficient to provide a good approximate solution? In this case, the function f(S) is the influence function I(S) and the samples are generated by RIS sampling. [8, 16] propose two such theoretical thresholds and two probing techniques to realistically estimate those thresholds. However, their thresholds are not known to be any kind of minimum, and the probing method is ad hoc in [8] or far from the proposed threshold in [16]. Thus, they cannot provide any guarantee on the optimality of the number of samples generated.

Since probing for an explicit threshold has certain limitations, we propose a new approach. Instead of explicitly expressing a theoretical threshold and trying to probe for it, we characterize the conditions that all RIS-based algorithms need to attain and state the sufficient number of samples to satisfy those conditions. Thus, we can define the minimum number of samples according to the necessary conditions and, further, propose SSA and D-SSA to achieve the minimum. We first define our RIS framework as enforcing the necessary conditions to guarantee the best known solution quality, and then propose two minimum thresholds based on the precision parameters in our framework.
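The second step of the two-step algorithm, greedy Max-Coverage over the RR sets, can be sketched as below. It is a straightforward O(k · Σ|R_j|) version written for readability, not the linear-time implementation used in [17, 8, 16]; the function name and the return convention (seed set plus the estimate n · Cov_R(S)/|R|) are our own choices.

```python
from typing import Dict, Hashable, List, Set, Tuple


def max_coverage(rr_sets: List[Set[Hashable]], k: int, n: int) -> Tuple[Set[Hashable], float]:
    """Greedy Max-Coverage: repeatedly add the node that covers the most
    still-uncovered RR sets; return (seed set, n * Cov_R(S) / |R|)."""
    if not rr_sets:
        return set(), 0.0
    # Inverted index: node -> indices of the RR sets containing it.
    index: Dict[Hashable, List[int]] = {}
    for j, rr in enumerate(rr_sets):
        for v in rr:
            index.setdefault(v, []).append(j)
    covered = [False] * len(rr_sets)
    seeds: Set[Hashable] = set()
    for _ in range(k):
        best, best_gain = None, -1
        for v, js in index.items():
            if v in seeds:
                continue
            gain = sum(1 for j in js if not covered[j])  # marginal coverage
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k distinct nodes appear in R
            break
        seeds.add(best)
        for j in index[best]:
            covered[j] = True
    return seeds, n * sum(covered) / len(rr_sets)
```

For R = [{a}, {a, b}, {c}, {c}, {c, d}] and n = 4, k = 1 picks c (coverage 3, estimate 2.4), and k = 2 adds a and covers all five sets.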
Suppose that there is an optimal seed set, S*_k, which has the maximum influence in the network. If there are multiple optimal sets with influence OPT_k, we choose the first one alphabetically to be S*_k. Given 0 ≤ ε, δ ≤ 1, our unified RIS framework enforces two conditions:
$$ \Pr[\hat I(\hat S_k) \le (1+\epsilon_a)\, I(\hat S_k)] \ge 1-\delta_a \tag{6} $$
$$ \Pr[\hat I(S^*_k) \ge (1-\epsilon_b)\, OPT_k] \ge 1-\delta_b \tag{7} $$
where δ_a + δ_b ≤ δ and ε_a + (1 − 1/e)ε_b ≤ ε. Based on the above conditions, we define the RIS threshold as follows.

Definition 4 (RIS Threshold). Given a graph G and 0 ≤ ε_a, ε_b, δ_a, δ_b ≤ 1, N(ε_a, ε_b, δ_a, δ_b) is called an RIS threshold in G w.r.t. ε_a, ε_b, δ_a, δ_b if any number |R| ≥ N(ε_a, ε_b, δ_a, δ_b) of random RR sets generated from G is sufficient to guarantee both Eqs. 6 and 7.

With the two conditions above, we now prove that any N ≥ N(ε_a, ε_b, δ_a, δ_b) RR sets are sufficient to guarantee the (1 − 1/e − ε)-approximation ratio.

Theorem 1. Given a graph G and 0 ≤ ε_a, ε_b, δ_a, δ_b ≤ 1, let ε ≥ ε_a + (1 − 1/e)ε_b and δ ≥ δ_a + δ_b. If the number of RR sets |R| ≥ N(ε_a, ε_b, δ_a, δ_b), then the two-step algorithm in our RIS framework returns Ŝ_k satisfying
$$ \Pr[I(\hat S_k) \ge (1-1/e-\epsilon)\, OPT_k] \ge 1-\delta, \tag{8} $$
which means Ŝ_k is a (1 − 1/e − ε)-approximate solution.

For brevity, the proof is presented in the appendix.

Existing RIS thresholds. For any ε, δ ∈ (0, 1), Tang et al. established in [8] an RIS threshold, for a particular setting of ε_a, ε_b, δ_a, δ_b, given by
$$ N = \frac{(8+2\epsilon)\, n \left(\ln\frac{2}{\delta} + \ln\binom{n}{k}\right)}{OPT_k\,\epsilon^2}. $$
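To get a feel for the size of this TIM threshold, N = (8 + 2ε) n (ln(2/δ) + ln C(n, k)) / (OPT_k ε²), the sketch below evaluates it, computing ln C(n, k) via `math.lgamma` so that it also works for billion-node graphs. OPT_k must be supplied by the caller (it is intractable in practice); with opt_k = 1 the formula gives the safety cap that SSA uses at Line 13 of Alg. 1. The function names are hypothetical.

```python
import math


def log_binom(n: int, k: int) -> float:
    """ln C(n, k) via log-gamma, avoiding huge integers for large n."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def tim_threshold(n: int, k: int, eps: float, delta: float, opt_k: float) -> float:
    """(8 + 2*eps) * n * (ln(2/delta) + ln C(n, k)) / (opt_k * eps^2)."""
    return (8 + 2 * eps) * n * (math.log(2 / delta) + log_binom(n, k)) / (opt_k * eps ** 2)
```

Because OPT_k sits in the denominator, any underestimate of OPT_k inflates the number of RR sets proportionally, which is the weakness discussed next.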

In a later study [16], they reduced this number to another RIS threshold, with ε_a = ε_1 and ε_b = ε − ε_1,
$$ N = \frac{2n\left((1-1/e)\alpha + \beta\right)^2}{OPT_k\,\epsilon^2}, $$
where α = √(ln(2/δ)), β = √((1 − 1/e)(ln(2/δ) + ln C(n, k))) and ε_1 = εα/((1 − 1/e)α + β). Unfortunately, computing OPT_k is intractable; thus, the proposed algorithms have to generate θ · OPT_k/KPT+ RR sets, where θ is the threshold above, KPT+ is the expected influence of a node set obtained by sampling k nodes with replacement from G, and the ratio OPT_k/KPT+ ≥ 1 is not upper-bounded. That is, they may generate OPT_k/KPT+ times more RR sets than needed.

3.3 Two Types of Minimum Thresholds
Based on the definition of RIS threshold, we now define two strong theoretical limits, i.e., the type-1 minimum and type-2 minimum thresholds. In Section 5, we will prove that our first proposed algorithm, SSA, achieves, within a constant factor, a type-1 minimum threshold and, later, in Section 6, our dynamic algorithm, D-SSA, is shown to attain, within a constant factor, the strongest type-2 minimum threshold.

Definition 5 (type-1 minimum threshold). Given 0 ≤ ε, δ ≤ 1 and 0 ≤ ε_a, ε_b, δ_a, δ_b ≤ 1 satisfying δ_a + δ_b ≤ δ and ε_a + (1 − 1/e)ε_b ≤ ε, N¹_min(ε_a, ε_b, δ_a, δ_b) is called a type-1 minimum threshold w.r.t. ε_a, ε_b, δ_a, δ_b if N¹_min(ε_a, ε_b, δ_a, δ_b) is the smallest number of RR sets that satisfies both Eq. 6 and Eq. 7.

If N(ε_a, ε_b, δ_a, δ_b) is an RIS threshold, then any N such that N ≥ N(ε_a, ε_b, δ_a, δ_b) is also an RIS threshold. We choose the smallest number over all the RIS thresholds to be the type-1 minimum, as defined in Def. 5. All the previous methods [17, 8, 16] try to approximate N¹_min(ε_a, ε_b, δ_a, δ_b) for some setting of ε_a, ε_b, δ_a, δ_b; however, they fail to provide any guarantee on how close their numbers are to that threshold. In contrast, we show in Sec. 5 that SSA achieves, within a constant factor, a type-1 minimum threshold. Next, we give the definition of the strongest type-2 minimum threshold, which is achieved by D-SSA as shown in Sec. 6.

Definition 6 (type-2 minimum threshold).
Given 0 ≤ ε, δ ≤ 1, N²_min(ε, δ) is called the type-2 minimum threshold if
$$ N^2_{min}(\epsilon,\delta) = \min_{\epsilon_a,\epsilon_b,\delta_a,\delta_b} N^1_{min}(\epsilon_a,\epsilon_b,\delta_a,\delta_b), \tag{9} $$
where ε_a + (1 − 1/e)ε_b = ε and δ_a + δ_b = δ.

From Def. 6, it follows that the type-2 minimum is the strongest possible threshold that we can achieve in the RIS framework.

3.4 Achieving the Minimum Thresholds
In the following sections, we propose two approximation algorithms, namely, Stop-and-Stare (SSA) and its dynamic version D-SSA, which respectively achieve the two theoretical minimum thresholds as well as the best known worst-case approximation ratio. In more detail, both SSA and D-SSA employ the stop-and-stare strategy that doubles the number of RIS samples and checks the quality of the current solution by an independent influence estimation step. This strategy guarantees that we do not oversample by much: at worst, we generate twice the necessary number. For a specific setting of the tuple {ε_a, ε_b, δ_a, δ_b}, SSA guarantees a type-1 minimum threshold corresponding to that configuration. Because its parameters are fixed in advance, SSA is simpler than the dynamic D-SSA. D-SSA achieves the type-2 minimum threshold by dynamically finding the best set of parameter values at each exponential check point. Thus, {ε_a, ε_b, δ_a, δ_b} are not specified in advance but are automatically determined by D-SSA. Further, D-SSA can reuse the RR sets generated in the independent influence estimation without changing the independence property of the RR sets in subsequent iterations.

4. STOP-AND-STARE ALGORITHM (SSA)
In this section, we describe our first Stop-and-Stare Algorithm (SSA) in detail. SSA keeps generating RR sets, doubling the current number each time, and stops to check the solution obtained from all the RR sets generated so far. It uses Max-Coverage (Alg. 2) to find the solution, Ŝ_k. Since the influence of Ŝ_k calculated from those RR sets is biased, SSA independently generates another collection of RR sets in the Estimate-Inf procedure (Alg. 3) to obtain an unbiased estimate of the influence of Ŝ_k.
Then it stares at the two influence estimates and, if they are close enough (satisfying SSA's stopping conditions), it halts and returns the found solution.

Algorithm 1 SSA Algorithm
Input: Graph G, 0 ≤ ε, δ ≤ 1, and a budget k.
Output: A (1 − 1/e − ε)-optimal solution, Ŝ_k, with probability at least (1 − δ).
1: Compute ε_1, ε_2, ε_3 according to Eq. 18
2: Λ_1 ← (1 + ε_1)(1 + ε_2) c ln(3/δ) / ε_3²
3: R ← Generate Λ_1 random RR sets by RIS
4: repeat
5:   <Ŝ_k, Î(Ŝ_k)> ← Max-Coverage(R, k, n)
6:   if Cov_R(Ŝ_k) ≥ Λ_1 then
7:     I_c(Ŝ_k) ← Estimate-Inf(G, Ŝ_k, ε_2, δ/3, 2|R| · (1 + ε_2)/(1 − ε_2) · ε_3²/ε_2²)
8:     if Î(Ŝ_k) ≤ (1 + ε_1) I_c(Ŝ_k) then
9:       return Ŝ_k
10:    end if
11:  end if
12:  R ← R ∪ {|R| new random RR sets generated by RIS}
13: until |R| ≥ (8 + 2ε) n (ln(2/δ) + ln C(n, k)) / ε²
14: return Ŝ_k

4.1 SSA Algorithm
This subsection details the main procedure of SSA, into which Max-Coverage (described in Alg. 2) and Estimate-Inf (described in Alg. 3) are incorporated. The precision parameters ε_1, ε_2, ε_3, δ_1, δ_2 are specified in Eq. 18 and discussed in Subsection 5.2.1. An illustration of those precision parameters in SSA (and also later for D-SSA) is provided in Fig. 3. This figure also demonstrates the bounding technique of our framework (Fig. 2): the overall error accumulates from the sampling errors in estimating I(S*_k) and I(Ŝ_k) and from the approximation error of Max-Coverage in finding the set Ŝ_k.

The algorithm is presented in Alg. 1. SSA starts by initializing three variables ε_1, ε_2 and ε_3, which are critical in deciding the stopping conditions and will be specified in our analysis (Eq. 18), where we prove the type-1 minimum threshold of SSA.

[Figure 3: Illustration of precision parameters ε_1, ε_2, ε_3, δ_1, δ_2 (I(.) denotes the true influence function; Î(.) and I_c(.) are the estimates of I(.) by the collection R of RR sets in the main algorithm and by another independent collection R' of Estimate-Inf, respectively; S*_k is an optimal solution). Overall error: ε = ε_1 + ε_2 + ε_1ε_2 + (1 − 1/e)ε_3; confidence: 1 − δ_1 − δ_2.]

Algorithm 2 Max-Coverage procedure
Input: RR sets (R), k, and the number of nodes (n).
Output: A (1 − 1/e)-optimal solution, Ŝ_k, and its estimated influence Î(Ŝ_k).
1: Ŝ_k = ∅
2: for i = 1 to k do
3:   v̂ ← argmax_{v ∈ V} (Cov_R(Ŝ_k ∪ {v}) − Cov_R(Ŝ_k))
4:   Add v̂ to Ŝ_k
5: end for
6: return <Ŝ_k, Cov_R(Ŝ_k) · n/|R|>

Having obtained those values, the algorithm then computes Λ_1 (Line 2), which determines the lower bound for the coverage of the selected seed set, Ŝ_k. The central part of our algorithm iterates round by round: at round 0, SSA generates Λ_1 random RR sets (Line 3) simply because that is the lowest number with which we can expect to meet the first condition at Line 6; at round i ≥ 1, it doubles the number of random RR sets by introducing |R| more sets (Line 12). In each round, a candidate seed set Ŝ_k is selected by Max-Coverage together with its estimated influence. SSA obtains another estimate of the influence, returned by Estimate-Inf (Line 7) and denoted by I_c(Ŝ_k). The two estimates are used in the second condition (Line 8). Thus, SSA contains two stopping conditions:

(C1) The first condition, Cov_R(Ŝ_k) ≥ Λ_1 (Line 6), ensures that the coverage of the returned solution is at least Λ_1, which is important to guarantee the approximability of Ŝ_k and S*_k, as shown in Lem. 5 and 6. This condition remains true after the first time it is established.

(C2) The second condition, Î(Ŝ_k) ≤ (1 + ε_1)I_c(Ŝ_k) (Line 8), compares the estimates of I(Ŝ_k) obtained by Max-Coverage and by the Estimate-Inf procedure.
This comparison is only active after the first stopping condition is met and the number of RR sets is large enough to trigger Estimate-Inf. If these two estimates are close enough (within a multiplicative factor of 1 + ε_1), we confirm the approximability of Ŝ_k (Lem. 5).

As we will prove in Sec. 5, the two stopping conditions are sufficient to guarantee the (1 − 1/e − ε)-approximation of Ŝ_k. Note that, to stop in cases of bad events, we use the threshold (8 + 2ε)n(ln(2/δ) + ln C(n, k))/ε² (Line 13), which is taken from TIM+ [8] with OPT_k = 1.

4.2 Finding Max-Coverage
We describe the Max-Coverage procedure to find a (1 − 1/e)-coverage set. This algorithm plays the role of the α-approximation algorithm in Fig. 2, where α = 1 − 1/e. Alg. 2 illustrates the greedy Max-Coverage algorithm to select a size-k seed set. The whole procedure goes through k iterations in which, at each step, a node with maximum marginal coverage with respect to the previously chosen ones is added to the seed set Ŝ_k. As a well-known result [31], we have the following lemma.

Lemma 4. The greedy Max-Coverage returns a (1 − 1/e)-approximate seed set Ŝ_k.

This algorithm can be implemented in linear time in terms of the total size of all the RR sets, as in [17, 8, 16]; as a result, its complexity is upper-bounded by that of generating the RR sets. Thus, the complexity of the whole algorithm actually depends only on that of generating RR sets.

4.3 Influence Estimation
Estimating the influence of a given seed set S is a key component of our method and is used wherever we have a candidate seed set and need to check whether its estimate in the main algorithm is good enough. Our goal is to obtain a good approximation (within a certain multiplicative error) of the influence of the given seed set.

Algorithm 3 Estimate-Inf procedure
Input: Graph G, a seed set S, 0 ≤ ε', δ' ≤ 1, and the maximum number of samples, T_max.
Output: I_c(S) such that I_c(S) ≤ (1 + ε')I(S) with probability at least (1 − δ'), or −1 if T_max is exceeded.
1: Λ_2 = 1 + c(1 + ε') ln(1/δ') / ε'²
2: Cov = 0
3: for T from 1 to T_max do
4:   Generate R_j ← RIS(G)
5:   Cov = Cov + min{|R_j ∩ S|, 1}
6:   if Cov ≥ Λ_2 then
7:     return n · Cov/T   {n: number of nodes}
8:   end if
9: end for
10: return −1   {exceeding T_max RR sets}

The estimating procedure, called Estimate-Inf, is detailed in Alg. 3. Given a node set S and two parameters ε', δ', Estimate-Inf returns an estimate I_c(S) of I(S) such that I_c(S) ≤ (1 + ε')I(S) with probability at least (1 − δ'). The most crucial point in the procedure is determining Λ_2 (Line 1) and the related condition (Line 6), which compares the coverage of S with Λ_2. In our analysis, we show that this condition is sufficient to guarantee I_c(S) ≤ (1 + ε')I(S) with probability at least (1 − δ'). The main step in this procedure is the for loop, in which random RR sets are drawn one at a time until the condition is satisfied. The loop runs for at most T_max iterations where, as described in Alg. 1, T_max is a constant multiple of the total number of RR sets generated in the main algorithm at that point. Choosing T_max is a crucial point of this algorithm since we run it at every iteration of SSA. At the beginning, when the solution Ŝ is not …

We define a random variable X = min{|R_j ∩ S|, 1}, where R_j is a random RR set, and μ_X = I(S)/n (Lem. 1). From Lem. 3 on the (ε, δ)-approximation criteria, we obtain the following direct corollary.

Corollary 1. The Estimate-Inf procedure of SSA returns an estimate, I_c(S), of I(S) such that
$$ \Pr[I_c(S) \le (1+\epsilon')\, I(S)] \ge 1-\delta' = 1-\delta/3. \tag{10} $$

5. GUARANTEE AND PERFORMANCE ANALYSIS
In this section, we prove in Subsec. 5.1 that SSA returns a (1 − 1/e − ε)-approximate solution with probability at least (1 − δ). Subsequently, in Subsec. 5.2, SSA is shown to require no more than a constant factor of a type-1 minimum threshold of RR sets with the same probability.

5.1 Approximation Guarantee
In this subsection, we prove that SSA achieves the approximation factor of (1 − 1/e − ε) with probability at least (1 − δ). The proof essentially contains two core components, which are the two conditions in our RIS framework (Subsection 3.2): 1) at termination, SSA achieves a good approximation of the selected solution, Ŝ_k (Lem. 5); and 2) the hidden optimal solution, S*_k, is also well estimated (Lem. 6). Combining these with Theo. 1 (Subsection 3.2) gives the approximation factor (1 − 1/e − ε) stated in Theo. 2.

The first component, the quality of the estimated influence Î(Ŝ_k) of the solution that SSA returns at termination, is shown in Lem. 5.

Lemma 5. SSA returns a seed set, Ŝ_k, with
$$ \Pr[\hat I(\hat S_k) \le (1+\epsilon'_1)\, I(\hat S_k)] \ge 1-\delta/3, \tag{11} $$
where ε'_1 = ε_1 + ε_2 + ε_1ε_2.

The proof is presented in the appendix. Based on Lem. 5, we prove the second component, which concerns the influence estimate of the optimal solution, S*_k.

Lemma 6. SSA terminates with
$$ \Pr[\hat I(S^*_k) < (1-\epsilon_3)\, OPT_k \ \text{or}\ \hat I(\hat S_k) > (1+\epsilon'_1)\, I(\hat S_k)] \le 2\delta/3. $$
Thus,
$$ \Pr[\hat I(S^*_k) \ge (1-\epsilon_3)\, OPT_k] \ge 1-2\delta/3. \tag{12} $$

Lem. 5 and 6 are sufficient to prove the approximation quality of SSA, as stated by the following theorem.

Theorem 2. Given 0 ≤ ε, δ ≤ 1 and ε_1, ε_2, ε_3 satisfying ε_1 + ε_2 + ε_1ε_2 + (1 − 1/e)ε_3 ≤ ε, SSA returns a seed set, Ŝ_k, such that
$$ \Pr[I(\hat S_k) \ge (1-1/e-\epsilon)\, OPT_k] \ge 1-\delta. \tag{13} $$

Proof.
To prove the theorem, we show that $|R|$ is an RIS threshold and then apply Theo. 1 to obtain the $(1-1/e-\epsilon)$-approximation property of SSA. In fact, we will later prove in Subsec. 5.2 that $|R|$ is not just an RIS threshold but also, to within a constant factor, a type-1 minimum threshold. The first condition for being an RIS threshold follows from Lem. 5:
$$\Pr[\hat{I}(\hat{S}_k) \le (1+\epsilon_1)(1+\epsilon_2)I(\hat{S}_k)] \ge 1-\delta/3. \qquad (14)$$
The second condition follows from Lem. 6:
$$\Pr[\hat{I}(S^*_k) \ge (1-\epsilon_3)OPT_k] \ge 1-2\delta/3. \qquad (15)$$
From Eq. 14 and Eq. 15, we conclude that $|R|$ is an RIS threshold with $\epsilon_a = (1+\epsilon_1)(1+\epsilon_2)-1$, $\epsilon_b = \epsilon_3$, $\delta_a = \delta/3$ and $\delta_b = 2\delta/3$. Notice that $\epsilon_a + (1-1/e)\epsilon_b \le \epsilon$ and $\delta_a + \delta_b = \delta$. By Theo. 1, we have
$$\Pr[I(\hat{S}_k) \ge (1-1/e-\epsilon)OPT_k] \ge 1-\delta, \qquad (16)$$
which completes the proof of Theo. 2.

5.2 Achieving Type-1 Minimum Threshold

We analyze the number of RR sets generated in both the main algorithm and the Estimate-Inf procedure. We show that SSA requires no more than a constant times a type-1 minimum number of RR sets, which makes SSA the first method to achieve a type-1 minimum threshold.

5.2.1 Parameter Settings

In Theo. 2, we rely on the assumption that $\epsilon_1, \epsilon_2, \epsilon_3$ are given such that
$$\epsilon_1 + \epsilon_2 + \epsilon_1\epsilon_2 + (1-1/e)\epsilon_3 \le \epsilon. \qquad (17)$$
Determining their values plays an important role in the algorithm. Ideally, we want to generate just enough RR sets in the main algorithm to obtain a good estimate of $\hat{S}_k$, and then check its influence with the Estimate-Inf procedure. All the RR sets generated in Estimate-Inf are discarded after it runs, so if we start checking too early, we waste a wealth of RR sets. On the other hand, because of the doubling scheme in the main algorithm, if we start checking too late, we generate many unnecessary RR sets in the main procedure. In SSA, we determine the values of $\epsilon_1, \epsilon_2, \epsilon_3$ experimentally. In the next subsection, we prove that with these settings SSA generates only a constant times the type-1 threshold. In Sect. 6, we propose a dynamic algorithm, D-SSA, that tunes these values during execution and requires, within a constant factor, the strongest (type-2) minimum threshold of RR sets.
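For concreteness, here is a minimal Python sketch of Alg. 3 (Estimate-Inf), the check whose RR sets are discarded after each call as discussed above. It assumes a caller-supplied zero-argument function that draws one random RR set as a set of integer node ids; the RIS sampler itself is not shown. The constant $c$ and the cap $T_{max}$ are passed in rather than fixed, because this section does not restate them. The names `estimate_inf` and `sample_rr_set` are hypothetical, and this is an illustration of the listed steps, not the authors' code.

```python
import math
from typing import AbstractSet, Callable


def estimate_inf(sample_rr_set: Callable[[], AbstractSet[int]],
                 seeds: AbstractSet[int],
                 n: int,
                 eps2: float,
                 delta2: float,
                 c: float,
                 t_max: int) -> float:
    """Sketch of Estimate-Inf (Alg. 3).

    Draws RR sets one at a time until the seed set has covered Lambda_2 of
    them, then returns n * Cov / T; returns -1.0 if T_max draws are not enough.
    """
    # Line 1: Lambda_2 = 1 + c (1 + eps2) ln(1/delta2) / eps2^2
    lam2 = 1 + c * (1 + eps2) * math.log(1 / delta2) / eps2 ** 2
    cov = 0
    for t in range(1, t_max + 1):
        rr = sample_rr_set()              # Line 4: R_j <- RIS(G)
        cov += min(len(rr & seeds), 1)    # Line 5: 1 if S intersects R_j, else 0
        if cov >= lam2:                   # Line 6: enough covered RR sets
            return n * cov / t            # Line 7: I_c(S) = n * Cov / T
    return -1.0                           # Line 10: exceeded T_max RR sets
```

Each term added in Line 5 is the variable $X$ from Lem. 1, whose mean is $I(S)/n$, so $n \cdot Cov/T$ rescales the empirical coverage frequency into an influence estimate. The stopping rule fixes the number of successes, not the number of draws.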
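We carried out experiments on various values of $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ satisfying Eq. 17 and found that the following values robustly give a low number of RR sets:
$$\epsilon_1 = \epsilon/6, \qquad \epsilon_2 = \epsilon/2, \qquad \epsilon_3 = \frac{\epsilon}{4(1-1/e)}. \qquad (18)$$

5.2.2 Number of RR sets in the main algorithm

Here, we analyze the number of RR sets generated in the main algorithm. We show that this number is at most a constant times $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$, where $\epsilon_a = \epsilon'_1 = \epsilon_1 + \epsilon_2 + \epsilon_1\epsilon_2$, $\epsilon_b = \epsilon_3$, $\delta_a = \delta/3$ and $\delta_b = 2\delta/3$. From Subsec. 5.1,
$$\Pr[\hat{I}(\hat{S}_k) \le (1+\epsilon_a)I(\hat{S}_k)] \ge 1-\delta_a \qquad (19)$$
$$\Pr[\hat{I}(S^*_k) \ge (1-\epsilon_b)OPT_k] \ge 1-\delta_b \qquad (20)$$
are obtained by enforcing two stopping conditions:
$$Cov_R(\hat{S}_k) \ge \Lambda_1 \qquad (C1)$$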

$$\hat{I}(\hat{S}_k) \le (1+\epsilon_1)I_c(\hat{S}_k) \qquad (C2)$$
Now, to determine the number of RR sets generated in SSA, we start from $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets, where Eq. 19 and Eq. 20 are satisfied. We then determine how many more RR sets are needed to meet (C1) and (C2). More specifically, suppose the inequalities in Eq. 19 and Eq. 20 hold with $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets. We prove that SSA then needs at most $\frac{c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets, where $c_1$ is a constant defined in Lem. 2. Thus, SSA needs no more than $\frac{c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets with the same probability that the two inequalities in Eq. 19 and Eq. 20 hold, which is at least $1-\delta$. The satisfaction of (C1) is stated as follows.

Lemma 7. Suppose that with $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets, we have
$$\hat{I}(\hat{S}_k) \le (1+\epsilon_a)I(\hat{S}_k) \qquad (21)$$
$$\hat{I}(S^*_k) \ge (1-\epsilon_b)OPT_k. \qquad (22)$$
Then SSA needs at most $\frac{c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets to satisfy condition (C1).

The proof is presented in the appendix. The following lemma states that condition (C2) is also satisfied with the same number of RR sets.

Lemma 8. Under the assumptions of Lem. 7, SSA needs at most $\frac{c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets to satisfy condition (C2) with probability at least $1-\delta/3$.

From Lem. 7, Lem. 8 and the fact that SSA stops when both stopping conditions (C1) and (C2) are satisfied, we have the following lemma.

Lemma 9. The main algorithm of SSA generates, within a constant factor, $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets with probability at least $1-\delta$.

Proof. By Lems. 7 and 8, $|R| \ge \frac{c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ suffices to satisfy both (C1) and (C2). The probability accumulated from Eq. 19, Eq. 20 and Lem. 8 is $1-2\delta/3-\delta/3 = 1-\delta$, because by Lem. 6 the inequalities in Eqs. 19 and 20 both hold with probability at least $1-2\delta/3$. Therefore, due to the doubling scheme, SSA generates at most $\frac{2c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets, which is still a constant times $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$.

5.2.3 Number of RR sets in Estimate-Inf procedure

As presented in Alg. 3, each call of Estimate-Inf generates fewer than $\frac{1+\epsilon_2}{1-\epsilon_2}\cdot\frac{\epsilon_3^2}{\epsilon_2^2}$ times the number of RR sets currently in the main algorithm. Therefore, during the run of SSA, Estimate-Inf generates in total fewer than $\frac{1+\epsilon_2}{1-\epsilon_2}\cdot\frac{\epsilon_3^2}{\epsilon_2^2}$ times the sum, over all iterations of the main algorithm, of the RR sets present at that iteration. Due to the doubling behavior, this sum is in turn smaller than twice the number of RR sets at the last iteration. Thus, based on Lem. 9, we have the following lemma.

Lemma 10. The Estimate-Inf procedure of SSA generates, to within a constant factor, $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets with probability at least $1-\delta$.

The constant in the lemma is $2\cdot\frac{1+\epsilon_2}{1-\epsilon_2}\cdot\frac{\epsilon_3^2}{\epsilon_2^2}$ times that in Lem. 9, which gives the bound
$$2\cdot\frac{1+\epsilon_2}{1-\epsilon_2}\cdot\frac{\epsilon_3^2}{\epsilon_2^2}\cdot\frac{2c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b). \qquad (23)$$
Combining Lems. 9 and 10, we bound the overall number of RR sets in the following theorem.

Theorem 3. SSA generates, to within a constant factor, $N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$ RR sets with probability at least $1-\delta$.

The total number in this theorem is the sum of those in Lems. 9 and 10:
$$\left(1 + 2\cdot\frac{1+\epsilon_2}{1-\epsilon_2}\cdot\frac{\epsilon_3^2}{\epsilon_2^2}\right)\frac{2c_1}{1-1/e-\epsilon}N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b). \qquad (24)$$

6. DYNAMIC ALGORITHM

In this section, we propose a dynamic algorithm, named D-SSA, that automatically selects $\epsilon_1, \epsilon_2, \epsilon_3, \delta_1, \delta_2$ during its execution. D-SSA keeps the $(1-1/e-\epsilon)$-approximation guarantee of SSA and requires, to within a constant factor, only the type-2 minimum threshold. This is the strongest result among all IM methods that follow the RIS framework.

6.1 D-SSA Algorithm

As described in Section 4, SSA can be seen as generating two independent collections of RR sets. One is used in the main algorithm to find the seed set with maximum coverage. The other is used to estimate the influence of the seed set found in the main procedure. Recall from Section 5 that we want to start checking the influence of $\hat{S}_k$ at the moment when just enough RR sets have been generated, so that the RR sets used for checking are not wasted.
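However, detecting that moment is challenging, since it depends not only on the network but also on the particular run of the RR-set generation.

Algorithm 4 D-SSA Algorithm
Input: Graph $G$, $0 \le \epsilon, \delta \le 1$, $k$.
Output: A $(1-1/e-\epsilon)$-optimal solution $\hat{S}_k$.
1: $\Lambda \leftarrow c(1+\epsilon)\ln(1/\delta)/\epsilon^2$
2: $R \leftarrow$ Generate $\Lambda$ random RR sets by RIS
3: $\langle \hat{S}_k, \hat{I}(\hat{S}_k) \rangle \leftarrow$ Max-Coverage$(R, k)$
4: repeat
5:   $R' \leftarrow$ Generate $|R|$ random RR sets by RIS
6:   $I_c(\hat{S}_k) \leftarrow Cov_{R'}(\hat{S}_k) \cdot n/|R'|$
7:   $\epsilon_1 \leftarrow \hat{I}(\hat{S}_k)/I_c(\hat{S}_k) - 1$
8:   if $\epsilon_1 \le \epsilon$ then
9:     $\epsilon_2 \leftarrow \frac{\epsilon-\epsilon_1}{2(1+\epsilon_1)}$, $\epsilon_3 \leftarrow \frac{\epsilon-\epsilon_1}{2(1-1/e)}$
10:    $\delta_1 \leftarrow e^{-\frac{Cov_R(\hat{S}_k)\,\epsilon_3^2}{c(1+\epsilon_1)(1+\epsilon_2)}}$
11:    $\delta_2 \leftarrow e^{-\frac{(Cov_{R'}(\hat{S}_k)-1)\,\epsilon_2^2}{c(1+\epsilon_2)}}$
12:    if $\delta_1 + \delta_2 \le \delta$ then
13:      return $\hat{S}_k$
14:    end if
15:  end if
16:  $R \leftarrow R \cup R'$
17:  $\langle \hat{S}_k, \hat{I}(\hat{S}_k) \rangle \leftarrow$ Max-Coverage$(R, k)$
18: until $|R| \ge (8+2\epsilon)n\frac{\ln(2/\delta) + \ln\binom{n}{k}}{\epsilon^2}$
19: return $\hat{S}_k$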
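To make the control flow of Alg. 4 concrete, here is a minimal Python sketch. It rests on several assumptions:
- a caller-supplied function returns one random RR set as a frozenset of integer node ids (the RIS sampler is not shown);
- $k \le n$;
- the constant $c$, which is defined outside this section, is passed in.

Max-Coverage is implemented as the plain, non-lazy greedy algorithm. The names `d_ssa`, `dssa_params`, `max_coverage` and `coverage` are hypothetical. This is an illustration of the numbered steps, not the authors' implementation.

```python
import math
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

RRSet = FrozenSet[int]


def coverage(rr_sets: Sequence[RRSet], seeds: Sequence[int]) -> int:
    """Cov_R(S): number of RR sets that intersect the seed set."""
    return sum(1 for rr in rr_sets if not rr.isdisjoint(seeds))


def max_coverage(rr_sets: Sequence[RRSet], k: int, n: int) -> Tuple[List[int], float]:
    """Plain greedy Max-Coverage; returns (seeds, I_hat = n * Cov_R(seeds) / |R|)."""
    index: Dict[int, List[int]] = {}  # node -> ids of the RR sets containing it
    for j, rr in enumerate(rr_sets):
        for v in rr:
            index.setdefault(v, []).append(j)
    covered = [False] * len(rr_sets)
    seeds: List[int] = []
    total = 0
    for _ in range(k):
        best, best_gain = None, -1
        for v, ids in index.items():
            if v in seeds:
                continue
            gain = sum(1 for j in ids if not covered[j])
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # fewer than k distinct nodes appear in R
            break
        seeds.append(best)
        total += best_gain
        for j in index[best]:
            covered[j] = True
    return seeds, n * total / len(rr_sets)


def dssa_params(eps: float, eps1: float, cov_r: int, cov_r_prime: int,
                c: float) -> Tuple[float, float, float, float]:
    """Lines 9-11: split eps - eps1 evenly, then derive delta1 and delta2."""
    eps2 = (eps - eps1) / (2 * (1 + eps1))
    eps3 = (eps - eps1) / (2 * (1 - 1 / math.e))
    delta1 = math.exp(-cov_r * eps3 ** 2 / (c * (1 + eps1) * (1 + eps2)))
    delta2 = math.exp(-(cov_r_prime - 1) * eps2 ** 2 / (c * (1 + eps2)))
    return eps2, eps3, delta1, delta2


def d_ssa(sample_rr_set: Callable[[], RRSet], n: int, k: int,
          eps: float, delta: float, c: float) -> List[int]:
    lam = c * (1 + eps) * math.log(1 / delta) / eps ** 2              # Line 1
    cap = ((8 + 2 * eps) * n
           * (math.log(2 / delta) + math.log(math.comb(n, k))) / eps ** 2)
    rr = [sample_rr_set() for _ in range(max(1, math.ceil(lam)))]     # Line 2
    seeds, i_hat = max_coverage(rr, k, n)                             # Line 3
    while True:                                                       # Line 4
        fresh = [sample_rr_set() for _ in range(len(rr))]             # Line 5
        cov_fresh = coverage(fresh, seeds)
        i_c = cov_fresh * n / len(fresh)                              # Line 6
        if i_c > 0 and i_hat > 0:
            eps1 = i_hat / i_c - 1                                    # Line 7
            if eps1 <= eps:                                           # Line 8
                _, _, d1, d2 = dssa_params(eps, eps1, coverage(rr, seeds),
                                           cov_fresh, c)              # Lines 9-11
                if d1 + d2 <= delta:                                  # Line 12
                    return seeds
        rr.extend(fresh)                                              # Line 16
        seeds, i_hat = max_coverage(rr, k, n)                         # Line 17
        if len(rr) >= cap:                                            # Line 18
            return seeds                                              # Line 19
```

The helper `dssa_params` makes the even split of Line 9 explicit: with these $\epsilon_2$ and $\epsilon_3$, the sum $\epsilon_1 + \epsilon_2 + \epsilon_1\epsilon_2 + (1-1/e)\epsilon_3$ equals $\epsilon$ exactly. Merging $R'$ into $R$ (Line 16) is what lets D-SSA keep the checking samples instead of discarding them.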

The dynamic algorithm D-SSA, described in Alg. 4, addresses these issues by computing the values of $\epsilon_1, \epsilon_2, \epsilon_3, \delta_1, \delta_2$ dynamically during its execution. It stops as soon as the success probability meets the requirement (Line 12). D-SSA also reuses the checking RR sets for finding the seed set, without affecting the independence of the RR sets in subsequent iterations (Line 16). More specifically, D-SSA uses the newly generated RR sets to estimate the influence of the seed set found in the previous iteration, which gives the current value of $\epsilon_1$ (Line 7). From $\epsilon_1$, the values of $\epsilon_2$ and $\epsilon_3$ are computed accordingly (Line 9). The formulas for $\epsilon_2$ and $\epsilon_3$ are based on the condition
$$\epsilon_1 + \epsilon_2 + \epsilon_1\epsilon_2 + \epsilon_3(1-1/e) \le \epsilon, \qquad (25)$$
treating $\epsilon_2 + \epsilon_1\epsilon_2$ and $\epsilon_3(1-1/e)$ as having similar roles, so each receives half of $\epsilon - \epsilon_1$. D-SSA then uses the two stopping conditions of SSA to compute $\delta_1$ and $\delta_2$. It stops when $\delta_1 + \delta_2 \le \delta$, which signifies that the success probability meets the requirement.

6.2 D-SSA Analysis

We show in turn that D-SSA achieves the $(1-1/e-\epsilon)$-approximation factor (Subsec. 6.2.1) and requires, to within a constant factor, only the strongest type-2 minimum threshold of RR sets (Subsec. 6.2.2).

6.2.1 Approximation Guarantee

We show that D-SSA preserves the $(1-1/e-\epsilon)$-approximation factor of SSA via the following theorem.

Theorem 4. Given a graph $G$, $0 \le \epsilon \le 1-1/e$ and $0 \le \delta \le 1$ as the inputs, D-SSA returns a $(1-1/e-\epsilon)$-approximate solution.

Proof. We prove that the two stopping conditions of SSA still hold in D-SSA; thus D-SSA has the same approximation factor as SSA. Directly from Alg. 4, when D-SSA terminates, the following conditions are satisfied:
$$Cov_R(\hat{S}_k) = c(1+\epsilon_1)(1+\epsilon_2)\frac{\ln(1/\delta_1)}{\epsilon_3^2} \qquad (26)$$
$$Cov_{R'}(\hat{S}_k) = 1 + c(1+\epsilon_2)\frac{\ln(1/\delta_2)}{\epsilon_2^2} \qquad (27)$$
with $\epsilon_1 + \epsilon_2 + \epsilon_1\epsilon_2 + (1-1/e)\epsilon_3 = \epsilon$ and $\delta_1 + \delta_2 \le \delta$. Eq. 26 is the first stopping condition of SSA. Eq. 27 is the checking condition of the Estimate-Inf procedure in SSA; together with the setting of $\epsilon_1$, it yields the second stopping condition of SSA:
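$$\hat{I}(\hat{S}_k) \le (1+\epsilon_1)(1+\epsilon_2)I(\hat{S}_k). \qquad (28)$$
Thus, the $(1-1/e-\epsilon)$-approximation factor follows as in SSA (Theo. 2), which completes the proof.

6.2.2 Achieving the Type-2 Minimum Threshold

Here, we prove a much stronger result than that for SSA: D-SSA requires, to within a constant factor, only the type-2 minimum number of RR sets $N^2_{min}(\epsilon, \delta)$. Since $\epsilon_a \le \epsilon$ and $\delta_b \le \delta$, let $M$ denote the constant
$$M = \frac{\ln(2/\delta)}{\ln(1/\delta_b)}\cdot\frac{4(1+\epsilon_a)}{(1-1/e-\epsilon)^3} \le \frac{\ln(2/\delta)}{\ln(1/\delta)}\cdot\frac{4(1+\epsilon)}{(1-1/e-\epsilon)^3}. \qquad (29)$$
In other words, for a given graph $G$, we show that, with probability at least $1-\delta$, D-SSA terminates once the number of RR sets in $R$ is at least $Mc_1N^2_{min}(\epsilon, \delta)$, where $c_1$ is the constant defined in Lem. 2. Due to the doubling scheme, D-SSA generates no more than twice that number. More specifically, we prove that the stopping condition (Line 12) is then satisfied. Recall that $N^2_{min}(\epsilon, \delta)$ RR sets imply
$$\Pr[\hat{I}(\hat{S}_k) \le (1+\epsilon_a)I(\hat{S}_k)] \ge 1-\delta_a \qquad (30)$$
$$\Pr[\hat{I}(S^*_k) \ge (1-\epsilon_b)OPT_k] \ge 1-\delta_b \qquad (31)$$
where $N^2_{min}(\epsilon, \delta) = \min_{\epsilon_a, \epsilon_b, \delta_a, \delta_b} N^1_{min}(\epsilon_a, \epsilon_b, \delta_a, \delta_b)$, the minimum taken over $\epsilon_a + (1-1/e)\epsilon_b \le \epsilon$ and $\delta_a + \delta_b \le \delta$. Applying Theo. 1, we have
$$\Pr[I(\hat{S}_k) \ge (1-1/e-\epsilon)OPT_k] \ge 1-\delta. \qquad (32)$$
First, we upper-bound the value of $\epsilon_1$ with the following lemma.

Lemma 11. Given $0 \le \epsilon \le 1-1/e$ and $0 \le \delta \le 1$, if $|R| \ge Mc_1N^2_{min}(\epsilon, \delta)$, then
$$\Pr\left[\epsilon_1 \le \frac{\epsilon_a + \epsilon_b/2}{1-\epsilon_b/2}\right] \ge 1-\delta, \qquad (33)$$
where $c_1$ is the constant defined in Lem. 2.

The proof is presented in the appendix. Based on Lem. 11, we next prove that, with probability at least $1-\delta$, D-SSA terminates when $|R| \ge Mc_1N^2_{min}(\epsilon, \delta)$.

Lemma 12. If $|R| \ge Mc_1N^2_{min}(\epsilon, \delta)$ and $\epsilon_1 \le \frac{\epsilon_a + \epsilon_b/2}{1-\epsilon_b/2}$, then
$$\delta_1 + \delta_2 \le \delta. \qquad (34)$$

Therefore, D-SSA needs at most a constant times the type-2 minimum with probability $\Pr\left[\epsilon_1 \le \frac{\epsilon_a + \epsilon_b/2}{1-\epsilon_b/2}\right]$, which is at least $1-\delta$. Since we double the RR sets every round, the actual number of RR sets is at most twice the necessary one. Thus, we conclude that, with probability at least $1-\delta$, D-SSA generates at most a constant times the type-2 minimum.

Theorem 5. Given a graph $G$, $0 \le \epsilon \le 1-1/e$ and $0 \le \delta \le 1$, let $N^2_{min}(\epsilon, \delta)$ be the type-2 minimum threshold of RR sets.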
With probability at least $1-\delta$, D-SSA generates, to within a constant factor, no more than $N^2_{min}(\epsilon, \delta)$ RR sets.

Theo. 5 follows directly from Lems. 11 and 12.

7. EXPERIMENTS

Backed by these theoretical results, we show experimentally that SSA and D-SSA outperform the existing state-of-the-art IM methods by a large margin. Specifically, SSA and D-SSA are several orders of magnitude faster than IMM and TIM+, the best existing IM methods with approximation guarantees, while delivering the same level of solution quality. SSA and D-SSA also require several times less memory than the other algorithms. To demonstrate the applicability of the proposed algorithms, we apply them to a critical application of IM, Targeted Viral Marketing (TVM), introduced in [9]. There, too, they show significant performance improvements over the existing methods.

Figure 4: Expected influence under the LT model. Panels: (a) NetHEPT, (b) NetPHY, (c) DBLP, (d) Twitter; compared methods: SSA, D-SSA, IMM, TIM+, TIM and CELF++.

Figure 5: Expected influence under the IC model. Panels: (a) NetHEPT, (b) NetPHY, (c) DBLP, (d) Twitter.

Table 2: Datasets statistics
Dataset       #Nodes   #Edges   Avg. degree
NetHEPT       15K      59K      4.1
NetPHY        37K      181K     13.4
Enron         37K      184K     5.0
Epinions      132K     841K     13.4
DBLP          655K     2M       6.1
Orkut         3M       117M     78
Twitter [3]   41.7M    1.5G     70.5
Friendster    …M       1.8G     …

7.1 Experimental Settings

All the experiments are run on a Linux machine with a 2.2GHz 8-core Xeon processor and 100GB of RAM. We carry out experiments under both the LT and IC models on the following algorithms and datasets.

Algorithms compared. For the IM experiments, we compare SSA and D-SSA with the top algorithms that provide the same $(1-1/e-\epsilon)$-approximation guarantee. Specifically, we select CELF++ [33], one of the fastest greedy algorithms, as well as IMM [16] and TIM/TIM+ [8], the best current RIS-based algorithms. For the TVM problem, we apply our Stop-and-Stare algorithms in that context and compare them with KB-TIM [9], the most efficient method for the problem.

Datasets. We choose 8 datasets from various disciplines:
- citation networks: NetHEPT, NetPHY and DBLP;
- communication network: Email-Enron;
- online social networks: Epinions, Orkut, Twitter and Friendster.
A summary of these datasets is given in Table 2. For the Twitter network, we also have the actual tweet/retweet data. We use these data to extract the target users whose tweets/retweets are relevant to a certain set of keywords. The TVM experiments are run on the Twitter network with the extracted target groups of users.

Parameter Settings. To compute the edge weights, we follow the conventional approach of [8, 13, 4, 6]: the weight of edge $(u, v)$ is $w(u, v) = 1/d_{in}(v)$, where $d_{in}(v)$ denotes the in-degree of node $v$. In all the experiments, we use $\epsilon = 0.1$ and $\delta = 1/n$ unless explicitly stated otherwise. For parameters specific to particular algorithms, we take the values recommended in the corresponding papers where available. We also limit the running time of each algorithm in a run to 4 hours.

7.2 Experiments with IM problem

To show the superior performance of the proposed algorithms on the IM task, we ran the first set of experiments on four real-world networks: NetHEPT, NetPHY, DBLP and Twitter. We also test a wide range of values of $k$, typically from 1 to 20000. The exception is NetHEPT, which has only 15233 nodes. The solution quality, running time and memory usage are reported in turn below. We also report the actual number of RR sets generated by SSA, D-SSA and IMM on four other datasets: Enron, Epinions, Orkut and Friendster.

7.2.1 Solution Quality

We first compare the quality of the solutions returned by all the algorithms under the LT and IC models. The results are presented in Fig. 4 and Fig. 5, respectively. Because of the time limit, CELF++ is only able to run on NetHEPT. These figures show that all the methods return seed sets of comparable quality, with no significant difference. The results also give a better view of a basic network property: a small fraction of nodes can influence a very large portion of the network. Most previous studies find only up to 50 seed nodes and therefore provide a limited view of this phenomenon. Here, we see that after around 2000 nodes have been selected, the influence gain from selecting more seeds becomes very slim.

Figure 6: Running time under the LT model. Panels: (a) NetHEPT, (b) NetPHY, (c) DBLP, (d) Twitter.

Figure 7: Running time under the IC model. Panels: (a) NetHEPT, (b) NetPHY, (c) DBLP, (d) Twitter.

Figure 8: Memory usage (MB) under the LT model. Panels: (a) NetHEPT, (b) NetPHY, (c) DBLP, (d) Twitter.

Figure 9: Memory usage (MB) under the IC model. Panels: (a) NetHEPT, (b) NetPHY, (c) DBLP, (d) Twitter.

Table 3: Cross-dataset view of the performance of SSA, D-SSA and IMM under the LT model: running time (s) and number of RR sets (thousands) for k = 1, 500 and 1000 on Enron, Epinions, Orkut and Friendster.

7.2.2 Running time

We next examine the running time of the tested algorithms. The results are shown in Fig. 6 and Fig. 7. Both SSA and D-SSA outperform the other competitors by a huge margin. Compared with IMM, the best previously known algorithm, SSA and D-SSA run up to several orders of magnitude faster. TIM+ and IMM show similar running times because they follow the same two-step approach. They first estimate the optimal influence and then calculate the number of samples needed to guarantee the approximation for all possible seed sets. However, each of the two steps has its own weakness. In contrast, SSA and D-SSA follow the Stop-and-Stare mechanism, which addresses those weaknesses and thus yields remarkable improvements. Between the two, D-SSA achieves the type-2 minimum threshold, whereas SSA achieves only the weaker type-1 threshold under the same precision settings $\epsilon, \delta$. As a result, D-SSA performs considerably better than SSA, with up to an order of magnitude speedup.
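The parameter settings in Sec. 7.1 use the weighted-cascade weights $w(u, v) = 1/d_{in}(v)$. A minimal sketch of how such weights could feed the $RIS(G)$ sampler that Alg. 3 and Alg. 4 call is given below. It assumes the IC model, nodes labelled $0, \dots, n-1$, and a graph stored as an in-neighbor adjacency dict. The function names are hypothetical, and the paper's own RIS procedure, including its LT-model variant, is defined elsewhere and may differ in detail. The construction shown is the standard reverse-BFS RR-set sampler for IC: pick a uniformly random root, then walk in-edges backwards, keeping each edge independently with probability $w(u, v)$.

```python
import random
from collections import deque
from typing import Dict, FrozenSet, List


def weighted_cascade(in_nbrs: Dict[int, List[int]]) -> Dict[int, Dict[int, float]]:
    """Edge weights w(u, v) = 1 / d_in(v), stored as weights[v][u]."""
    return {v: {u: 1.0 / len(us) for u in us}
            for v, us in in_nbrs.items() if us}


def random_rr_set_ic(in_nbrs: Dict[int, List[int]],
                     weights: Dict[int, Dict[int, float]],
                     n: int,
                     rng: random.Random) -> FrozenSet[int]:
    """One random RR set under the IC model (nodes are 0..n-1).

    Picks a uniformly random root, then runs a reverse BFS in which each
    in-edge (u, v) is kept independently with probability w(u, v).
    """
    root = rng.randrange(n)
    reached = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u in in_nbrs.get(v, []):
            if u not in reached and rng.random() < weights[v][u]:
                reached.add(u)
                queue.append(u)
    return frozenset(reached)
```

Any routine that expects a zero-argument RR-set generator can be given `lambda: random_rr_set_ic(g, w, n, rng)`. With $w = 1$ on every edge of a chain $0 \to 1 \to 2$, every RR set contains node 0, so $n \cdot Cov/T$ for $S = \{0\}$ equals 3, the exact influence.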

Revisiting of Revisiting the Stop-and-Stare Algorithms for Influence Maximization

Revisiting of Revisiting the Stop-and-Stare Algorithms for Influence Maximization Revisiting of Revisiting the Stop-and-Stare Algorithms for Influence Maximization Hung T. Nguyen 1,2, Thang N. Dinh 1, and My T. Thai 3 1 Virginia Commonwealth University, Richmond VA 23284, USA {hungnt,

More information

Performance Evaluation. Analyzing competitive influence maximization problems with partial information: An approximation algorithmic framework

Performance Evaluation. Analyzing competitive influence maximization problems with partial information: An approximation algorithmic framework Performance Evaluation ( ) Contents lists available at ScienceDirect Performance Evaluation journal homepage: www.elsevier.com/locate/peva Analyzing competitive influence maximization problems with partial

More information

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model Luis Manuel Santana Gallego 100 Appendix 3 Clock Skew Model Xiaohong Jiang and Susumu Horiguchi [JIA-01] 1. Introduction The evolution of VLSI chips toward larger die sizes and faster clock speeds makes

More information

Outward Influence and Cascade Size Estimation in Billion-scale Networks

Outward Influence and Cascade Size Estimation in Billion-scale Networks Outward Influence and Cascade Size Estimation in Billion-scale Networks H. T. Nguyen, T. P. Nguyen Virginia Commonwealth Univ. Richmond, VA 2322 {hungnt,trinpm}@vcu.edu T. N. Vu Univ. of Colorado, Boulder

More information

A Billion-scale Approximation Algorithm for Maximizing Benefit in Viral Marketing

A Billion-scale Approximation Algorithm for Maximizing Benefit in Viral Marketing 1 A Billion-scale Approximation Algorithm for Maximizing Benefit in Viral Marketing Hung T. Nguyen, My T. Thai, Member, IEEE, and Thang N. Dinh, Member, IEEE Abstract Online social networks have been one

More information

Fast inverse for big numbers: Picarte s iteration

Fast inverse for big numbers: Picarte s iteration Fast inverse for ig numers: Picarte s iteration Claudio Gutierrez and Mauricio Monsalve Computer Science Department, Universidad de Chile cgutierr,mnmonsal@dcc.uchile.cl Astract. This paper presents an

More information

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES #A50 INTEGERS 14 (014) ON RATS SEQUENCES IN GENERAL BASES Johann Thiel Dept. of Mathematics, New York City College of Technology, Brooklyn, New York jthiel@citytech.cuny.edu Received: 6/11/13, Revised:

More information

arxiv: v1 [cs.ds] 16 Apr 2017

arxiv: v1 [cs.ds] 16 Apr 2017 Outward Influence and Cascade Size Estimation in Billion-scale Networks H. T. Nguyen, T. P. Nguyen Virginia Commonwealth Univ. Richmond, VA 23220 {hungnt,trinpm}@vcu.edu T. N. Vu Univ. of Colorado, Boulder

More information

Influence Spreading Path and its Application to the Time Constrained Social Influence Maximization Problem and Beyond

Influence Spreading Path and its Application to the Time Constrained Social Influence Maximization Problem and Beyond 1 ing Path and its Application to the Time Constrained Social Influence Maximization Problem and Beyond Bo Liu, Gao Cong, Yifeng Zeng, Dong Xu,and Yeow Meng Chee Abstract Influence maximization is a fundamental

More information

Structuring Unreliable Radio Networks

Structuring Unreliable Radio Networks Structuring Unreliale Radio Networks Keren Censor-Hillel Seth Gilert Faian Kuhn Nancy Lynch Calvin Newport March 29, 2011 Astract In this paper we study the prolem of uilding a connected dominating set

More information

Network Clocks: Detecting the Temporal Scale of Information Diffusion

Network Clocks: Detecting the Temporal Scale of Information Diffusion Network Clocks: Detecting the Temporal Scale of Information Diffusion Daniel J. DiTursi Gregorios A. Katsios Petko Bogdanov Department of Computer Science State University of New York at Alany Alany, NY

More information

Generalized Sampling and Variance in Counterfactual Regret Minimization

Generalized Sampling and Variance in Counterfactual Regret Minimization Generalized Sampling and Variance in Counterfactual Regret Minimization Richard Gison and Marc Lanctot and Neil Burch and Duane Szafron and Michael Bowling Department of Computing Science, University of

More information

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users?

Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users? Continuous Influence Maximization: What Discounts Should We Offer to Social Network Users? ABSTRACT Yu Yang Simon Fraser University Burnaby, Canada yya119@sfu.ca Jian Pei Simon Fraser University Burnaby,

More information

1 Hoeffding s Inequality

1 Hoeffding s Inequality Proailistic Method: Hoeffding s Inequality and Differential Privacy Lecturer: Huert Chan Date: 27 May 22 Hoeffding s Inequality. Approximate Counting y Random Sampling Suppose there is a ag containing

More information

Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model

Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model Influence Blocking Maximization in Social Networks under the Competitive Linear Threshold Model Xinran He Guojie Song Wei Chen Qingye Jiang Ministry of Education Key Laboratory of Machine Perception, Peking

More information

Robot Position from Wheel Odometry

Robot Position from Wheel Odometry Root Position from Wheel Odometry Christopher Marshall 26 Fe 2008 Astract This document develops equations of motion for root position as a function of the distance traveled y each wheel as a function

More information

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression

More information

Optimal Routing in Chord

Optimal Routing in Chord Optimal Routing in Chord Prasanna Ganesan Gurmeet Singh Manku Astract We propose optimal routing algorithms for Chord [1], a popular topology for routing in peer-to-peer networks. Chord is an undirected

More information

Ahmed Nazeem and Spyros Reveliotis

Ahmed Nazeem and Spyros Reveliotis Designing compact and maximally permissive deadlock avoidance policies for complex resource allocation systems through classification theory: the non-linear case Ahmed Nazeem and Spyros Reveliotis Astract

More information

arxiv: v1 [cs.fl] 24 Nov 2017

arxiv: v1 [cs.fl] 24 Nov 2017 (Biased) Majority Rule Cellular Automata Bernd Gärtner and Ahad N. Zehmakan Department of Computer Science, ETH Zurich arxiv:1711.1090v1 [cs.fl] 4 Nov 017 Astract Consider a graph G = (V, E) and a random

More information

Lecture 1 & 2: Integer and Modular Arithmetic

Lecture 1 & 2: Integer and Modular Arithmetic CS681: Computational Numer Theory and Algera (Fall 009) Lecture 1 & : Integer and Modular Arithmetic July 30, 009 Lecturer: Manindra Agrawal Scrie: Purushottam Kar 1 Integer Arithmetic Efficient recipes

More information

Multi-Round Influence Maximization

Multi-Round Influence Maximization Multi-Round Influence Maximization Lichao Sun 1, Weiran Huang 2, Philip S. Yu 1,2, Wei Chen 3, 1 University of Illinois at Chicago, 2 Tsinghua University, 3 Microsoft Research lsun29@uic.edu, huang.inbox@outlook.com,

More information

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg, Éva Tardos SIGKDD 03

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg, Éva Tardos SIGKDD 03 Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg, Éva Tardos SIGKDD 03 Influence and Social Networks Economics, sociology, political science, etc. all have studied

More information

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION Jiangping Wang and Dapeng Wu Department of Electrical and Computer Engineering University of Florida, Gainesville, FL 3611 Correspondence author: Prof.

More information

Content Delivery in Erasure Broadcast. Channels with Cache and Feedback

Content Delivery in Erasure Broadcast. Channels with Cache and Feedback Content Delivery in Erasure Broadcast Channels with Cache and Feedack Asma Ghorel, Mari Koayashi, and Sheng Yang LSS, CentraleSupélec Gif-sur-Yvette, France arxiv:602.04630v [cs.t] 5 Fe 206 {asma.ghorel,

More information

Depth versus Breadth in Convolutional Polar Codes

Depth versus Breadth in Convolutional Polar Codes Depth versus Breadth in Convolutional Polar Codes Maxime Tremlay, Benjamin Bourassa and David Poulin,2 Département de physique & Institut quantique, Université de Sherrooke, Sherrooke, Quéec, Canada JK

More information

WITH the recent advancements of information technologies,

WITH the recent advancements of information technologies, i Distributed Rumor Blocking with Multiple Positive Cascades Guangmo (Amo) Tong, Student Member, IEEE, Weili Wu, Member, IEEE, and Ding-Zhu Du, arxiv:1711.07412 [cs.si] 1 Dec 2017 Abstract Misinformation

More information

IN this paper, we consider the estimation of the frequency

IN this paper, we consider the estimation of the frequency Iterative Frequency Estimation y Interpolation on Fourier Coefficients Elias Aoutanios, MIEEE, Bernard Mulgrew, MIEEE Astract The estimation of the frequency of a complex exponential is a prolem that is

More information

arxiv: v2 [math.oc] 29 Jul 2016

arxiv: v2 [math.oc] 29 Jul 2016 Stochastic Frank-Wolfe Methods for Nonconvex Optimization arxiv:607.0854v [math.oc] 9 Jul 06 Sashank J. Reddi sjakkamr@cs.cmu.edu Carnegie Mellon University Barnaás Póczós apoczos@cs.cmu.edu Carnegie Mellon

More information

Essential Maths 1. Macquarie University MAFC_Essential_Maths Page 1 of These notes were prepared by Anne Cooper and Catriona March.

Essential Maths 1. Macquarie University MAFC_Essential_Maths Page 1 of These notes were prepared by Anne Cooper and Catriona March. Essential Maths 1 The information in this document is the minimum assumed knowledge for students undertaking the Macquarie University Masters of Applied Finance, Graduate Diploma of Applied Finance, and

More information

Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Problem with Stochastic Demand and Duration Limits

Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Problem with Stochastic Demand and Duration Limits Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Prolem with Stochastic Demand and Duration Limits Justin C. Goodson Jeffrey W. Ohlmann Barrett W. Thomas Octoer 2, 2012 Astract We develop

More information

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line Chapter 27: Inferences for Regression And so, there is one more thing which might vary one more thing aout which we might want to make some inference: the slope of the least squares regression line. The

More information

WITH the popularity of online social networks, viral. Boosting Information Spread: An Algorithmic Approach. arxiv: v3 [cs.

WITH the popularity of online social networks, viral. Boosting Information Spread: An Algorithmic Approach. arxiv: v3 [cs. 1 Boosting Information Spread: An Algorithmic Approach Yishi Lin, Wei Chen Member, IEEE, John C.S. Lui Fellow, IEEE arxiv:1602.03111v3 [cs.si] 26 Jun 2017 Abstract The majority of influence maximization

More information

Module 9: Further Numbers and Equations. Numbers and Indices. The aim of this lesson is to enable you to: work with rational and irrational numbers

Module 9: Further Numbers and Equations. Numbers and Indices. The aim of this lesson is to enable you to: work with rational and irrational numbers Module 9: Further Numers and Equations Lesson Aims The aim of this lesson is to enale you to: wor with rational and irrational numers wor with surds to rationalise the denominator when calculating interest,

More information

Travel Grouping of Evaporating Polydisperse Droplets in Oscillating Flow- Theoretical Analysis

Travel Grouping of Evaporating Polydisperse Droplets in Oscillating Flow- Theoretical Analysis Travel Grouping of Evaporating Polydisperse Droplets in Oscillating Flow- Theoretical Analysis DAVID KATOSHEVSKI Department of Biotechnology and Environmental Engineering Ben-Gurion niversity of the Negev

More information

Greedy Maximization Framework for Graph-based Influence Functions

Greedy Maximization Framework for Graph-based Influence Functions Greedy Maximization Framework for Graph-based Influence Functions Edith Cohen Google Research Tel Aviv University HotWeb '16 1 Large Graphs Model relations/interactions (edges) between entities (nodes)

More information

CS 4120 Lecture 3 Automating lexical analysis 29 August 2011 Lecturer: Andrew Myers. 1 DFAs

CS 4120 Lecture 3 Automating lexical analysis 29 August 2011 Lecturer: Andrew Myers. 1 DFAs CS 42 Lecture 3 Automating lexical analysis 29 August 2 Lecturer: Andrew Myers A lexer generator converts a lexical specification consisting of a list of regular expressions and corresponding actions into

More information

Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process

Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process Wei Chen Microsoft Research Asia Beijing, China weic@microsoft.com Wei Lu University of British Columbia Vancouver,

More information

Modifying Shor s algorithm to compute short discrete logarithms

Modifying Shor s algorithm to compute short discrete logarithms Modifying Shor s algorithm to compute short discrete logarithms Martin Ekerå Decemer 7, 06 Astract We revisit Shor s algorithm for computing discrete logarithms in F p on a quantum computer and modify

More information

Santa Claus Schedules Jobs on Unrelated Machines

Santa Claus Schedules Jobs on Unrelated Machines Santa Claus Schedules Jobs on Unrelated Machines Ola Svensson (osven@kth.se) Royal Institute of Technology - KTH Stockholm, Sweden March 22, 2011 arxiv:1011.1168v2 [cs.ds] 21 Mar 2011 Abstract One of the

More information

Determinants of generalized binary band matrices

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

EVALUATIONS OF EXPECTED GENERALIZED ORDER STATISTICS IN VARIOUS SCALE UNITS

In Search of Influential Event Organizers in Online Social Networks

Bringing Order to Special Cases of Klee's Measure Problem

Online Influence Maximization

No Time to Observe: Adaptive Influence Maximization with Partial Feedback (arXiv:1609.00427v4 [cs.SI], 21 May 2017)

On Universality of Blow-up Profile for L²-critical nonlinear Schrödinger Equation

Number systems: real and complex

Scheduling Two Agents on a Single Machine: A Parameterized Analysis of NP-hard Problems

Analytic models for flash-based SSD performance when subject to trimming

Distributed Optimization. Song Chong EE, KAIST

Divide-and-Conquer. Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, 33.4. CSE 6331 Algorithms, Steve Lai

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables

Branching Bisimilarity with Explicit Divergence

CS264: Beyond Worst-Case Analysis Lecture #15: Smoothed Complexity and Pseudopolynomial-Time Algorithms

Weak bidders prefer first-price (sealed-bid) auctions. (This holds both ex-ante, and once the bidders have learned their types)

Lecture 6 January 15, 2014

Online Supplementary Appendix B

CS264: Beyond Worst-Case Analysis Lecture #18: Smoothed Complexity and Pseudopolynomial-Time Algorithms

Computational Tasks and Models

QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS

The Capacity Region of 2-Receiver Multiple-Input Broadcast Packet Erasure Channels with Channel Output Feedback

Non-Linear Regression Samuel L. Baker

Upper Bounds for Stern's Diatomic Sequence and Related Sequences

Estimating a Finite Population Mean under Random Non-Response in Two Stage Cluster Sampling with Replacement

SVETLANA KATOK AND ILIE UGARCOVICI (Communicated by Jens Marklof)

Pseudo-automata for generalized regular expressions

CS 6820 Fall 2014 Lectures, October 3-20, 2014

Minimizing Seed Set Selection with Probabilistic Coverage Guarantee in a Social Network

ECS 253 / MAE 253, Lecture 15, May 17, 2016: I. Probability generating function recap

A Sufficient Condition for Optimality of Digital versus Analog Relaying in a Sensor Network

Chaos and Dynamical Systems

On Two Class-Constrained Versions of the Multiple Knapsack Problem

Characterization of the Burst Aggregation Process in Optical Burst Switching.

A General Overview of Parametric Estimation and Inference Techniques.

A732: Exercise #7 Maximum Likelihood

Notes on MapReduce Algorithms

Simple Examples. Let's look at a few simple examples of OI analysis.

arXiv:1710.10404v1 [stat.ML], 28 Oct 2017 (Jinglin Chen, Jian Peng, Qiang Liu)

Boosting Information Spread: An Algorithmic Approach

Caveats of Parallel Algorithms

Boosting

The Fast Optimal Voltage Partitioning Algorithm For Peak Power Density Minimization

Constrained Genetic Algorithms and Their Applications in Nonlinear Constrained Optimization

Bayesian inference with reliability methods without knowing the maximum of the likelihood function

Introduction to Reinforcement Learning

Polynomial Degree and Finite Differences

Superiorized Inversion of the Radon Transform

Beating the Random Ordering is Hard: Inapproximability of Maximum Acyclic Subgraph

MobiHoc 2014 MINIMUM-SIZED INFLUENTIAL NODE SET SELECTION FOR SOCIAL NETWORKS UNDER THE INDEPENDENT CASCADE MODEL

School of Business. Blank Page

Motivation: Can the equations of physics be derived from information-theoretic principles?

Reversal of regular languages and state complexity

Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach

Heat Kernel Based Community Detection

Weak Keys of the Full MISTY1 Block Cipher for Related-Key Cryptanalysis

CMPUT651: Differential Privacy

ON THE COMPARISON OF BOUNDARY AND INTERIOR SUPPORT POINTS OF A RESPONSE SURFACE UNDER OPTIMALITY CRITERIA. Cross River State, Nigeria

Ordinal Optimization and Multi Armed Bandit Techniques
