Robust Lifetime Measurement in Large- Scale P2P Systems with Non-Stationary Arrivals

Similar documents
Robust Lifetime Measurement in Large-Scale P2P Systems with Non-Stationary Arrivals

On Superposition of Heterogeneous Edge Processes in Dynamic Random Graphs

144 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY A PDF f (x) is completely monotone if derivatives f of all orders exist

Min Congestion Control for High- Speed Heterogeneous Networks. JetMax: Scalable Max-Min

Modeling Residual-Geometric Flow Sampling

On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks

Probabilistic Near-Duplicate. Detection Using Simhash

On Static and Dynamic Partitioning Behavior of Large-Scale Networks

On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks

Modeling Heterogeneous User Churn and Local Resilience of Unstructured P2P Networks

On Delay-Independent Diagonal Stability of Max-Min Min Congestion Control

Unstructured P2P Link Lifetimes Redux

Estimation of DNS Source and Cache Dynamics under Interval-Censored Age Sampling

End-to-end Estimation of the Available Bandwidth Variation Range

Statistics: Learning models from data

Performance Analysis of Priority Queueing Schemes in Internet Routers

Queueing Theory I Summary! Little s Law! Queueing System Notation! Stationary Analysis of Elementary Queueing Systems " M/M/1 " M/M/m " M/M/1/K "

RESILIENCE of distributed peer-to-peer (P2P) networks

6.435, System Identification

Quantitative Estimation of the Performance Delay with Propagation Effects in Disk Power Savings

Statistical Analysis and Distortion Modeling of MPEG-4 FGS

Clustering non-stationary data streams and its applications

Lecture 3: Policy Evaluation Without Knowing How the World Works / Model Free Policy Evaluation

Understanding Disconnection and Stabilization of Chord

Variance reduction techniques

Unstructured P2P Link Lifetimes Redux

Modeling Randomized Data Streams in Caching, Data Processing, and Crawling Applications

Math 180A. Lecture 16 Friday May 7 th. Expectation. Recall the three main probability density functions so far (1) Uniform (2) Exponential.

Doing It Now or Later

How to deal with uncertainties and dynamicity?

Process Scheduling for RTS. RTS Scheduling Approach. Cyclic Executive Approach

CS 700: Quantitative Methods & Experimental Design in Computer Science

Embedded Systems Design: Optimization Challenges. Paul Pop Embedded Systems Lab (ESLAB) Linköping University, Sweden

PIQI-RCP: Design and Analysis of Rate-Based Explicit Congestion Control

arxiv: v1 [math.ho] 25 Feb 2008

Intro to Queueing Theory

Online Scheduling Switch for Maintaining Data Freshness in Flexible Real-Time Systems

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal

Overall Plan of Simulation and Modeling I. Chapters

Experimental Investigation of Inertial Force Control for Substructure Shake Table Tests

Chapter 6: CPU Scheduling

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974

Discrete-event simulations

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming

Optimal Utilization Bounds for the Fixed-priority Scheduling of Periodic Task Systems on Identical Multiprocessors. Sanjoy K.

Section 7.5: Nonstationary Poisson Processes

Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning

A Task-Based Model for the Lifespan of Peer-to-Peer Swarms

Simulation & Modeling Event-Oriented Simulations

Statistics 100A Homework 5 Solutions

Multi-Hop Probing Asymptotics in Available Bandwidth Estimation: Stochastic Analysis

Power Controlled FCFS Splitting Algorithm for Wireless Networks

Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data

Load Balancing in Distributed Service System: A Survey

Uniform random numbers generators

Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks

DURING the last decade, peer-to-peer (P2P) networks

An Operational Evaluation of the Ground Delay Program (GDP) Parameters Selection Model (GPSM)

Dynamic Bin Packing for On-Demand Cloud Resource Allocation

Stochastic Optimization for Undergraduate Computer Science Students

Wireless Internet Exercises

Non-Monotonic Transformations of Random Variables

Lecture 14 : Online Learning, Stochastic Gradient Descent, Perceptron

Dynamic resource sharing

Nonparametric Density Estimation

MAT SYS 5120 (Winter 2012) Assignment 5 (not to be submitted) There are 4 questions.

Performance Evaluation of Queuing Systems

variability in size, see e.g., [10]. 2 BTD is similar to slow-down used for scheduling problems, e.g., [11], [12].

Real-Time Estimation of GPS Satellite Clocks Based on Global NTRIP-Streams. André Hauschild

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1>

Module 5: CPU Scheduling

What s for today. Continue to discuss about nonstationary models Moving windows Convolution model Weighted stationary model

NICTA Short Course. Network Analysis. Vijay Sivaraman. Day 1 Queueing Systems and Markov Chains. Network Analysis, 2008s2 1-1

Load Regulating Algorithm for Static-Priority Task Scheduling on Multiprocessors

Time in Distributed Systems: Clocks and Ordering of Events

STABLE AND SCALABLE CONGESTION CONTROL FOR HIGH-SPEED HETEROGENEOUS NETWORKS

Semi-asynchronous. Fault Diagnosis of Discrete Event Systems ALEJANDRO WHITE DR. ALI KARIMODDINI OCTOBER

Figure A1: Treatment and control locations. West. Mid. 14 th Street. 18 th Street. 16 th Street. 17 th Street. 20 th Street

Schedulability analysis of global Deadline-Monotonic scheduling

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

On the Tradeoff Between Resilience and Degree Overload in Dynamic P2P Graphs

Testing Problems with Sub-Learning Sample Complexity

Indirect Sampling in Case of Asymmetrical Link Structures

Optimal Measurement-based Pricing for an M/M/1 Queue

Using the Empirical Probability Integral Transformation to Construct a Nonparametric CUSUM Algorithm

Impact of Cross Traffic Burstiness on the Packet-scale Paradigm An Extended Analysis

Queueing Theory II. Summary. ! M/M/1 Output process. ! Networks of Queue! Method of Stages. ! General Distributions

Fair Scheduling in Input-Queued Switches under Inadmissible Traffic

Local and Global Stability of Symmetric Heterogeneously-Delayed Control Systems

An Online Algorithm for Maximizing Submodular Functions

f (1 0.5)/n Z =

Discrete Random Variables

Simulation of Process Scheduling Algorithms

Capacity and Scheduling in Small-Cell HetNets

SOLUTIONS IEOR 3106: Second Midterm Exam, Chapters 5-6, November 8, 2012

Capturing Network Traffic Dynamics Small Scales. Rolf Riedi

Efficient Network-wide Available Bandwidth Estimation through Active Learning and Belief Propagation

Analysis of the Durability of Replicated Distributed Storage Systems

Task Reweighting under Global Scheduling on Multiprocessors

Simulation. Where real stuff starts

Transcription:

Robust Lifetime Measurement in Large- Scale P2P Systems with Non-Stationary Arrivals Xiaoming Wang Joint work with Zhongmei Yao, Yueping Zhang, and Dmitri Loguinov Internet Research Lab Computer Science and Engineering Department Texas A&M UniversityMay Sept. 10, 2009 1

Agenda Introduction Background and motivation Previous approaches CBM and RIDE Proposal of new method U-RIDE Experimental Evaluation Conclusion 2

Introduction Peer-to-peer networks are popular platforms for many Internet applications Characterizing these systems is important for theoretic modeling of resilience, throughput, etc. However, many existing P2P systems are fully distributed, large-scale, and highly dynamic Therefore, measuring these systems is challenging Limit of bandwidth and lack of infrastructural support The goal of this work is to address one of such challenging tasks - measuring the distribution of user lifetimes 3

Measuring Lifetime Distribution An instance of the lifetime is the duration of a user s appearance in the system Our target metric Let L be the lifetime of a random user Define F L (x) = P (L x) to be the CDF of the lifetime One straightforward solution is to collect lifetime instances by periodically probing users And then compute empirical distribution E(x) to estimate F L (x) Due to hardware constraint and security concern, we cannot probe all users with infinitely small intervals 4

Measuring Lifetime Distribution 2 In large-scale distributed systems, it is non-trivial to measure the exact lengths of lifetime instances We need a definition for accuracy Let be the probing interval and define discrete point x i = i Estimator E(x) is unbiased if it can correctly reproduce the distribution of lifetime L at all discrete points {x i } for any >0: Our target for accuracy:e(x i ) = F L (x i ) Probing traffic could be significant for large systems Our target for overhead: small amount of probing traffic 5

Agenda Introduction Background and motivation Previous approaches CBM and RIDE Proposal of new method U-RIDE Experimental Evaluation Conclusion 6

Previous Methods CBM Create-Based Method (CBM) uses an observation window of 2T Within the window, it takes a snapshot of the system every time units To avoid sampling bias, CBM divides the window into two halves and only includes lifetime samples that satisfy the following conditions Appear during the first half of the window Disappear within the window Have a lifetime no longer than T 7

Example of CBM Sampling P 3 P 1 P 2 P 5 P 4 start t 0 t 0 +2T Observation window 2T valid samples {3, 4 } appears before t 0 disappears after t 0 +2T longer than T 8

Previous Methods RIDE Wang et. al. [INFOCOM 07] proved that CBM can be arbitrarily biased in estimating the lifetime distribution ResIDual-based Estimator (RIDE) was proposed to address the issue in CBM Take one snapshot of the system at time t 0 and record alive users in S 0 Probe these alive users to obtain their residuals and compute distribution H(x) Wang et. al. also proved that under stationary systems, RIDE can produce an unbiased estimator Moreover, RIDE can be configured to incur much less traffic overhead than CBM 9

Example of RIDE Sampling interval P 4 P 5 P 3 t 0 P 1 P 2 R 3 R 4 R 5 t 0 +2 t 0 +4 H(x j ) P(j R(t 0 )<(j+1) ) exact invalid samples S 0 R(t 0 ) P 3 P 4 P 5 [4,5 ) [2,3 ) [5,6 ) 10

Agenda Introduction Background and motivation Previous approaches CBM and RIDE Proposal of new method U-RIDE Experimental Evaluation Conclusion 11

Motivation While RIDE can achieve both high accuracy and low overhead RIDE relies on one critical assumption stationarity of the user arrival process However, many systems exhibit diurnal arrival/departure patterns or any other nonstationary dynamics Therefore, we need to investigate whether RIDE can achieve the same benefit mentioned earlier To do so, we first propose a model for nonstationary arrivals 12

Non-Stationary User Churn Our proposed model models each user with an alternating process ON (online) and OFF (offline) states Moreover, the time is partitioned into bins of size τ For example, a day is a bin OFF states are split into two sub-states REST: the delay between the user s departure and midnight of the day when he/she joins the system again WAIT: the delay from midnight until the user s arrival into the system within a given day Non-Stationary-Periodic Churn Model (NS-PCM) 13

Example of User Process midnight A L WAIT ON REST WAIT For any time t, define to be the bin offset (remainder) of t For x in [0, τ], define F A (x) = P (A x) to be the CDF of the arrival time A τ 14

Gnutella and NS-PCM NS-PCM mimics the arrival process of Gnutella Actual data measured from Gnutella Simulations generated by NS-PCM 15

Analysis of RIDE in NS-PCM Theorem 1: Under NS-PCM, residual lifetime distribution H(x,t 0 ) is a periodic function where Recall in a stationary model, RIDE depends on Therefore, RIDE is biased in NS-PCM Differentiating H(x,t 0 ) gives no simple formula of F L (x) 16

RIDE under NS-PCM Power-law non-monotonic! Periodic Simulations indicate that RIDE deviates from F L (x) Its estimation does not even represent a valid CDF function 17

Proposed Sampling Algorithm - U-RIDE Instead of just one snapshot at time t 0, we crawl the system at multiple time points t m (m=1 M) Sampling schedule Offset schedule For the m-th snapshot, U-RIDE keeps track of captured alive users N R (t m ): the number of alive users in this snapshot N R (x, t m ): the number of them with residual x Then, compute the ratio for each x j 18

Example of U-RIDE Sampling t 1 t 2 t 3 valid samples invalid samples interval 19

Proposed Sampling Algorithm - U-RIDE We call a schedule uniform if offset schedule O M forms a uniform distribution in [0, τ] Theorem 2: Under a uniform schedule, the ratio r An unbiased estimator is given by where h * (x) is the numerical derivative of ratio function E H* (x) 20

U-RIDE under NS-PCM Simulations show an exact match between U-RIDE estimation and F L (x) Power-law Periodic 21

Overhead and Subsampling Residual sampling supports ²-subsampling uniformly select ² percent of valid samples We also prove that CBM does not support subsampling Wang et. al. [INFOCOM 07] have proved that with ²-subsampling, RIDE can reduce overhead by a factor of over 100 compared to CBM The question is whether U-RIDE can save the same amount of bandwidth 22

Overhead and Subsampling 2 Theorem 3: Overhead ratio of U-RIDE and RIDE q UR is where and p is a scheduling parameter This result shows that U-RIDE incurs more traffic overhead than RIDE in the original form However, by using a smaller ², U-RIDE s overhead can always be upper bounded within that of RIDE In fact, we can choose proper ² based on the size of the initial set S 0 so that ² S 0 is fixed at some predetermined value 23

Agenda Introduction Background and motivation Previous approaches CBM and RIDE Proposal of new method U-RIDE Experimental Evaluation Conclusion 24

Gnutella Measurement RIDE actual 25

Gnutella Measurement U-RIDE actual 26

Conclusion We studied the tradeoff between accuracy and scalability in P2P systems with non-stationary arrivals We proposed a novel non-stationary churn model NS- PCM We introduced a simple algorithm U-RIDE that can achieve both accuracy and scalability Future work includes Applying NS-PCM to understand how it affects existing results in P2P Extending U-RIDE for measuring the arrival process of P2P systems 27

The End Thank you! 28