Estimation of DNS Source and Cache Dynamics under Interval-Censored Age Sampling

Similar documents
Estimation of DNS Source and Cache Dynamics under Interval-Censored Age Sampling

Min Congestion Control for High- Speed Heterogeneous Networks. JetMax: Scalable Max-Min

On Superposition of Heterogeneous Edge Processes in Dynamic Random Graphs

Large-Scale Behavioral Targeting

Robust Lifetime Measurement in Large- Scale P2P Systems with Non-Stationary Arrivals

Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington

Socket Programming. Daniel Zappala. CS 360 Internet Programming Brigham Young University

Modeling Residual-Geometric Flow Sampling

Introduction to ArcGIS Server Development

Measurements made for web data, media (IP Radio and TV, BBC Iplayer: Port 80 TCP) and VoIP (Skype: Port UDP) traffic.

Age-Optimal Constrained Cache Updating

15-451/651: Design & Analysis of Algorithms September 13, 2018 Lecture #6: Streaming Algorithms last changed: August 30, 2018

Linear classifiers: Overfitting and regularization

Temporal Update Dynamics Under Blind Sampling Xiaoyong Li, Student Member, IEEE, Daren B. H. Cline, and Dmitri Loguinov, Senior Member, IEEE

Impact of traffic mix on caching performance

Impression Store: Compressive Sensing-based Storage for. Big Data Analytics

Application-Level Scheduling with Deadline Constraints

Case Study 1: Estimating Click Probabilities. Kakade Announcements: Project Proposals: due this Friday!

ARMA based Popularity Prediction for Caching in Content Delivery Networks

Communication constraints and latency in Networked Control Systems

Load Balancing in Distributed Service System: A Survey

Stochastic resource allocation Boyd, Akuiyibo

Probabilistic Near-Duplicate. Detection Using Simhash

Analysis of Rate-distortion Functions and Congestion Control in Scalable Internet Video Streaming

Dynamic Power Allocation and Routing for Time Varying Wireless Networks

Efficient Nonlinear Optimizations of Queuing Systems

Queueing Theory I Summary! Little s Law! Queueing System Notation! Stationary Analysis of Elementary Queueing Systems " M/M/1 " M/M/m " M/M/1/K "

Gibbsian On-Line Distributed Content Caching Strategy for Cellular Networks

On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks

Progress in Learning 3 vs. 2 Keepaway

Game Theoretic Approach to Power Control in Cellular CDMA

BS-assisted Task Offloading for D2D Networks with Presence of User Mobility

Cyber-Awareness and Games of Incomplete Information

EP2200 Course Project 2017 Project II - Mobile Computation Offloading

Sensor Tasking and Control

144 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 1, FEBRUARY A PDF f (x) is completely monotone if derivatives f of all orders exist

A Different Kind of Flow Analysis. David M Nicol University of Illinois at Urbana-Champaign

Sharing LRU Cache Resources among Content Providers: A Utility-Based Approach

Introduction to Queueing Theory

Lecture 2: Convergence of Random Variables

On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks

The Multi-Path Utility Maximization Problem

Asymptotically Exact TTL-Approximations of the Cache Replacement Algorithms LRU(m) and h-lru

Energy-Efficient Broadcast Scheduling. Speed-Controlled Transmission Channels

Linear Discrimination Functions

Simulation. Where real stuff starts

Troubleshooting Replication and Geodata Service Issues

Agreement. Today. l Coordination and agreement in group communication. l Consensus

High Performance Centralized Content Delivery Infrastructure: Models and Asymptotics

AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis

Fairness and Optimal Stochastic Control for Heterogeneous Networks

Simulation. Where real stuff starts

These are special traffic patterns that create more stress on a switch

Caching Policy Toward Maximal Success Probability and Area Spectral Efficiency of Cache-enabled HetNets

Mining Data Streams. The Stream Model. The Stream Model Sliding Windows Counting 1 s

Expectation maximization tutorial

ArcGIS Deployment Pattern. Azlina Mahad

On Allocating Cache Resources to Content Providers

Queuing Theory and Stochas St t ochas ic Service Syste y ms Li Xia

Exact Analysis of TTL Cache Networks

django in the real world

A STAFFING ALGORITHM FOR CALL CENTERS WITH SKILL-BASED ROUTING: SUPPLEMENTARY MATERIAL

Ad Placement Strategies

Time in Distributed Systems: Clocks and Ordering of Events

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

CS246 Final Exam. March 16, :30AM - 11:30AM

Logistic Regression Logistic

Analysis of Scalable TCP in the presence of Markovian Losses

Robust Network Codes for Unicast Connections: A Case Study

Computational Intelligence in Product-line Optimization

Troubleshooting Replication and Geodata Services. Liz Parrish & Ben Lin

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling

Prioritized Random MAC Optimization via Graph-based Analysis

IMBALANCED DATA. Phishing. Admin 9/30/13. Assignment 3: - how did it go? - do the experiments help? Assignment 4. Course feedback

Adaptive Contact Probing Mechanisms for Delay Tolerant Applications. Wang Wei, Vikram Srinivasan, Mehul Motani

Learning Algorithms for Minimizing Queue Length Regret

ANALYTICAL MODEL OF A VIRTUAL BACKBONE STABILITY IN MOBILE ENVIRONMENT

COMP 355 Advanced Algorithms

Theoretical study of cache systems

On Sample-Path Staleness in Lazy Data Replication

Discrete-event simulations

EE5585 Data Compression April 18, Lecture 23

Machine Learning Lecture Notes

Do we have a quorum?

Online Scheduling Switch for Maintaining Data Freshness in Flexible Real-Time Systems

Administration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.

Statistical Ranking Problem

COMP 355 Advanced Algorithms Algorithm Design Review: Mathematical Background

Mining Data Streams. The Stream Model Sliding Windows Counting 1 s

Kernels. Machine Learning CSE446 Carlos Guestrin University of Washington. October 28, Carlos Guestrin

COMP9334 Capacity Planning for Computer Systems and Networks

Elementary queueing system

NEC PerforCache. Influence on M-Series Disk Array Behavior and Performance. Version 1.0

Input-queued switches: Scheduling algorithms for a crossbar switch. EE 384X Packet Switch Architectures 1

Energy Cooperation and Traffic Management in Cellular Networks with Renewable Energy

NICTA Short Course. Network Analysis. Vijay Sivaraman. Day 1 Queueing Systems and Markov Chains. Network Analysis, 2008s2 1-1

Proactive Resource Allocation: Turning Predictable Behavior into Spectral Gain

Lecture and notes by: Alessio Guerrieri and Wei Jin Bloom filters and Hashing

SPLITTING AND MERGING OF PACKET TRAFFIC: MEASUREMENT AND MODELLING

AstroPortal: A Science Gateway for Large-scale Astronomy Data Analysis

Transcription:

Estimation of DNS Source and Cache Dynamics under Interval-Censored Age Sampling Di Xiao, Xiaoyong Li, Daren B.H. Cline, Dmitri Loguinov Internet Research Lab Department of Computer Science and Engineering Texas A&M University April 18, 2018 1

Agenda Introduction Notation Models Passive Measurement (Chameleon) Active Measurement (Shark) Experiments 2

Introduction Caches are important parts of many distributed systems in today s Internet Search engines Wireless mobile networks P2P structures CDNs Two common eviction policies Capacity-based: unpopular elements get removed TTL-based: elements are evicted only upon expiration TTL-based policy is more suitable when object staleness is of primary concern 3

Introduction DNS has long used TTL-based policy However, staleness of records was never considered Instead, hit rate h was the sole metric of performance for many years With wide adoption of dynamic DNS and CDNs, many authoritative domains frequently change IP addresses Simply maximizing h, essentially setting TTL to infinity, is not a meaning pursuit The tradeoff between h and freshness f may be of interest in certain applications Unfortunately, this interplay has not been studied 4

Introduction Besides modeling f, it is important to develop methods for estimating this value Applications may aim to minimize staleness for a given hit rate, or maximize h for a given f Researchers and network admins might be interested in measuring performance of existing systems to which they have no direct access Approaches to sampling f Passive: observer is internal to the cache, but external to the source Active: observer is external to both 5

Agenda Introduction Notation Models Passive Measurement (Chameleon) Active Measurement (Shark) Experiments 6

Notation Source experiences random updates via process N U Clients query resolver via process N Q TTLs are iid and follow distribution F T (x) Interplay between N Q and F T (x) defines download process N D inter-query delays with CDF F Q (x) inter-update delays with CDF F U (x) inter-download delays 7

Agenda Introduction Notation Models Passive Measurement (Chameleon) Active Measurement (Shark) Experiments 8

Models Define update age A U (t) and update residual R U (t) For t, they have the residual distribution of F U (x) We call this function G U (x) Neither A U (t) nor R U (t) is directly measurable! Assuming T F T (x) is a random TTL, hit rate was already shown in previous work as Theorem: Assuming R U G U (x) is a random update residual, cache replies are fresh with probability 9

Models Theorem: Increasing E[T] produces h 1 and f 0. Making Q stochastically smaller yields h 1 and f E[min(R U,T)]/E[T] from above Tradeoff between h and f Simulation with Pareto U, T, Q with E[U] = 20 sec 10

Models It is common to assume queries arrive from a large number of users and N Q is Poisson, where Let G T (x) be the residual distribution of F T (x) and R T G T (x) is a random residual TTL Theorem: Under Poisson queries, freshness is given by f = 1 h(1 p), where p := P(R T <R U ) Note that f requires h and p, the latter of which is significantly more difficult to estimate as it needs the distribution of both R U and R T Since G T (x) is relatively easy to obtain, the main focus of the paper is on sampling G U (x) and F U (x) 11

Roadmap Passive measurement is the cache estimating f with access to all information except G U (x) E.g., an adaptive cache that trades spare bandwidth to maintain f at some desired threshold Active measurement is an observer estimating f only knowing residual TTLs R T (t) E.g., network administrators / researchers study replication efficiency and diagnose potential problems with bandwidth consumption or staleness Either way, it is useful to also recover F U (x) to characterize the source 12

Agenda Introduction Notation Models Passive Measurement (Chameleon) Active Measurement (Shark) Experiments 13

Passive Measurement The major challenge is to remotely estimate G U (x) assuming random inter-download delay Best previous method is M 6 from [Li 2015] Drawbacks of M 6 Slow convergence speed Quadratic computation time Non-concave estimator Our method Chameleon improves in all aspects 14

Passive Measurement It produces upper/lower bounds on A U (d k ) Based on detecting modification of records by comparing them at download points d k and d k 1 Generates one bound at each download point d k Example: Suppose d (k) is the last point before d k that detected a modification to the record Upper bound Lower bound 15

EM (Expectation Maximization) Estimator Problem maps to interval-censored observation Naïve method inspired by [Turnbull 1976], which is asymptotically accurate as n However, complexity is O(n 2 ) per iteration We propose an O(n) time/space algorithm Further notice that G U (x) is a concave function Which prior methods in this field cannot guarantee We thus propose a new concave EM estimator Requires little additional CPU cost Allows recovery of F U (x) from G U (x) 16

Agenda Introduction Notation Models Passive Measurement (Chameleon) Active Measurement (Shark) Experiments 17

Active Measurement Without access to the cache, the observer needs to send probes to local resolver and remotely estimate all metrics No previous work has studied this problem Probes are iterative queries that don t pollute the cache Sampling points A process N S at points {s k } with iid sample delays S k that follow distribution F S (x) Two scenarios for TTL Constant T : problem reduces to passive measurement Random T : our focus next 18

Active Measurement Main challenge is we cannot see download points d k, which makes passive techniques inapplicable We thus propose a new algorithm Shark For each sample point s k, we derive probabilistic bounds on d k, which allows interval-censoring of update age A U (d k ) We then modify our concave EM to operate with probabilistic intervals This usually requires a huge number of samples to be processed; however, our EM is fast enough to handle billions of points on commodity machines 19

Agenda Introduction Notation Models Passive Measurement (Chameleon) Active Measurement (Shark) Experiments 20

Experiment Setup Passive scenario An authoritative server A A local DNS server that resolves at A A background traffic generator from process N Q Active scenario We randomly selected a recursive server R from our port-53 UDP scan of the Internet, making sure it was stable and compliant with source-provided TTLs Added an observer that sends probes to R 21

Experiments Performance of our fast EM (Algorithm 1) Scales no worse than linearly in n Handles 1 billion observations in 30 sec Estimation of simple parameters Both Chameleon and Shark are asymptotically accurate 22

Experiments Passive recovery of G U (x) with 10K samples Pareto F U (x) with α = 3 and E[U] = 20 sec M 6 is poorly suited for such small sample size Noisy M 6 Chameleon 23

Experiments Passive recovery of F U (x) with 10K samples Non-concave EM produce a bunch of random points that cannot be a CDF function 24

Experiments Active recovery of G U (x) with 10K samples Pareto and uniform F U (x) with E[U] = 20 sec 25

Experiments Estimation of complex parameters Although Chameleon operates with more information, Shark comes pretty close to matching its accuracy 26

Thank you! Questions? 27