Design of Probing Experiments for Loss Tomography

Similar documents
Fisher Information-based Experiment Design for Network Tomography

certain class of distributions, any SFQ can be expressed as a set of thresholds on the sufficient statistic. For distributions

Node Failure Localization: Theorem Proof

ECE 275A Homework 6 Solutions

Approximating the Covariance Matrix with Low-rank Perturbations

Node Failure Localization via Network Tomography

5682 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 12, DECEMBER /$ IEEE

Brief Review on Estimation Theory

Node Failure Localization in Communication Networks via Network Tomography

QUANTIZATION FOR DISTRIBUTED ESTIMATION IN LARGE SCALE SENSOR NETWORKS

A Polynomial-Time Algorithm for Pliable Index Coding

Node Failure Localization via Network Tomography

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Network Performance Tomography

Layer 0. α 1. Path P(0,2) Layer 1. α 2 α 3. Layer 2. α 4. α 5. Layer 3 = Receiver Set

A General Overview of Parametric Estimation and Inference Techniques.

Performance Evaluation. On optimal monitor placement for localizing node failures via network tomography

Long-Run Covariability

Efficient Nonlinear Optimizations of Queuing Systems

ISyE 691 Data mining and analytics

Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks

SPARSE signal representations have gained popularity in recent

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

Submitted to the Brazilian Journal of Probability and Statistics

Lecture Notes 1: Vector spaces

Chapter 3. Point Estimation. 3.1 Introduction

Linear Regression and Its Applications

10. Smooth Varieties. 82 Andreas Gathmann

The properties of L p -GMM estimators

Uniqueness of Generalized Equilibrium for Box Constrained Problems and Applications

High-Rank Matrix Completion and Subspace Clustering with Missing Data

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

14 Singular Value Decomposition

Part V. 17 Introduction: What are measures and why measurable sets. Lebesgue Integration Theory

ASIGNIFICANT research effort has been devoted to the. Optimal State Estimation for Stochastic Systems: An Information Theoretic Approach

Auxiliary signal design for failure detection in uncertain systems

Active Measurement for Multiple Link Failures Diagnosis in IP Networks

Certifying the Global Optimality of Graph Cuts via Semidefinite Programming: A Theoretic Guarantee for Spectral Clustering

ECE521 week 3: 23/26 January 2017

Lecture 3 September 1

Maximum Likelihood Estimation of the Flow Size Distribution Tail Index from Sampled Packet Data

Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Classical Estimation Topics

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs

Robust Network Codes for Unicast Connections: A Case Study

Does Better Inference mean Better Learning?

Distributed Optimization over Random Networks

Lecture 7 Introduction to Statistical Decision Theory

Estimating Internal Link Loss Rates Using Active Network Tomography

Least Sparsity of p-norm based Optimization Problems with p > 1

Efficient Monte Carlo computation of Fisher information matrix using prior information

PERFORMANCE COMPARISON OF DATA-SHARING AND COMPRESSION STRATEGIES FOR CLOUD RADIO ACCESS NETWORKS. Pratik Patil, Binbin Dai, and Wei Yu

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

Routing. Topics: 6.976/ESD.937 1

Linear Algebra March 16, 2019

Zero controllability in discrete-time structured systems

Inference in Networking Systems with Designed Measurements

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

CS281 Section 4: Factor Analysis and PCA

Convergence Rate for Consensus with Delays

Understanding the Capacity Region of the Greedy Maximal Scheduling Algorithm in Multi-hop Wireless Networks

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient

10. Linear Models and Maximum Likelihood Estimation

Rank Determination for Low-Rank Data Completion

On Two Class-Constrained Versions of the Multiple Knapsack Problem

SIGNAL STRENGTH LOCALIZATION BOUNDS IN AD HOC & SENSOR NETWORKS WHEN TRANSMIT POWERS ARE RANDOM. Neal Patwari and Alfred O.

MS&E 226: Small Data

ON SPACE-FILLING CURVES AND THE HAHN-MAZURKIEWICZ THEOREM

An Interpretation of the Moore-Penrose. Generalized Inverse of a Singular Fisher Information Matrix

1 Lyapunov theory of stability

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 18

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

10-704: Information Processing and Learning Fall Lecture 24: Dec 7

A Few Notes on Fisher Information (WIP)

CS 246 Review of Linear Algebra 01/17/19

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Introduction to Machine Learning CMU-10701

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Transductive Experiment Design

Topological properties

COMP 558 lecture 18 Nov. 15, 2010

SIGNAL STRENGTH LOCALIZATION BOUNDS IN AD HOC & SENSOR NETWORKS WHEN TRANSMIT POWERS ARE RANDOM. Neal Patwari and Alfred O.

Efficient MCMC Samplers for Network Tomography

EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing

Frames and Phase Retrieval

Notes on Tabular Methods

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

Interactive Interference Alignment

The Skorokhod reflection problem for functions with discontinuities (contractive case)

NP Completeness and Approximation Algorithms

Dissertation Defense

SGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

Gaussian Estimation under Attack Uncertainty

The weighted spectral distribution; A graph metric with applications. Dr. Damien Fay. SRG group, Computer Lab, University of Cambridge.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Spring 2017 Econ 574 Roger Koenker. Lecture 14 GEE-GMM

Transcription:

Design of Probing Experiments for Loss Tomography Ting He, Chang Liu, Ananthram Swami, Don Towsley, Theodoros Salonidis, Andrei Iu. Bejan, and Paul Yu IBM T. J. Watson Research Center, Yorktown Heights, NY, USA. {the, tsaloni}@us.ibm.com University of Massachusetts, Amherst, MA, USA. {cliu, towsley}@cs.umass.edu Army Research Laboratory, Adelphi, MD, USA. {ananthram.swami, paul.l.yu}.civ@mail.mil University of Cambridge, Cambridge, UK. aib29@cam.ac.uk Abstract Network tomography aims to infer the individual performance of networked elements (e.g., links) using aggregate measurements on end-to-end paths. Previous work on network tomography focuses primarily on developing estimators using the given measurements, while the design of measurements is often neglected. We fill this gap by proposing a framework to design probing experiments with focus on probe allocation, and applying it to packet loss tomography. Based on the Fisher Information Matrix (FIM), we design the distribution of probes across paths to maximize the best accuracy of unbiased estimators, asymptotically achievable by the maximum likelihood estimator. We consider two widely-adopted objective functions: determinant of the inverse FIM (D-optimality) and trace of the inverse FIM (A-optimality). We also extend the A-optimal criterion to incorporate heterogeneity in link weights. Under certain conditions on the FIM, satisfied by loss tomography, we derive explicit expressions for both objective functions. When the number of probing paths equals the number of links, these lead to closed-form solutions for the optimal design; when there are more paths, we develop a heuristic to select a subset of paths and optimally allocate probes within the subset. Observing the dependency of the optimal design on unknown parameters, we further propose an algorithm that iteratively updates the design based on parameter estimates, which converges to the design based on true parameters as the number of probes increases. Using packet-level simulations on real datasets, we verify that the proposed design effectively reduces estimation error compared with the common approach of uniformly distributing probes. Keywords: Network Tomography; Experiment Design; Fisher Information Matrix I. INTRODUCTION Network management in complex networks (e.g., MANETcellular hybrid networks, coalition networks) often suffers from inefficiencies imposed by protocol/policy barriers between different administration domains, where one notable example is the lack of common monitoring services that provide global state of all the networked components (e.g., links). This limitation motivates the need for external approaches that allow one domain to infer the internal state (e.g., link loss rates) of another domain by measuring its external performance Research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-6-3-1. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. An extended version of this work has been presented at SIGMETRICS 15 [9]. (e.g., end-to-end losses between a set of vantage points). The methodology of such inference is called network tomography [5]. Network tomography has been an active research area in the recent past. Compared with the approach of directly measuring the performance at individual network components, it provides an alternative approach that does not require privileged access to the components. The challenge is that since the measurements are generally functions of the states of multiple components, one has to invert these functions. Moreover, the states of interest are usually persistent performance indicators such as mean delays and packet loss rates, while the measurements are functions of the delay/loss instances, and thus the inversion has to be robust against randomness in the measurements. By modeling the performance of each link as a random variable with a (partially) unknown probability distribution, one can apply statistical techniques to estimate the parameter of this distribution from path measurements [4], [7], [11]. While most existing work focuses on designing estimators, the accuracy of estimation is fundamentally bounded by the amount of information contained in measurements. It is crucial that the probing experiments generate measurements that are most informative for estimating link parameters. It is not straightforward to quantify the information in measurements. On the one hand, measurements from different paths provide different amounts of information as they traverse different sets of links, each exhibiting a different level of uncertainty; on the other hand, measurements from a single (multi-hop) path alone do not provide sufficient information for uniquely determining link performances. To address this issue, we apply a notion from estimation theory called the Fisher Information Matrix (FIM) [14]. The FIM combines knowledge of paths and link parameters into a single measure of how much information a measurement provides for the parameter of interest. By the Cramér-Rao bound (CRB) [14], the inverse of the FIM establishes a lower bound on the error covariance matrix of any unbiased estimator. Based on the FIM, an intuitive formulation of experiment design is to allocate probes on paths to maximize the total information. Turning this intuition into a precise formulation requires an objective function that maps a matrix (FIM) to a scalar that can be uniquely optimized. The theory of optimal experiment design [2] has established a set of such objective

functions. In particular, maximizing the determinant of FIM (aka D-optimality) leads to a design that minimizes (a bound on) the volume of the error ellipsoid, and minimizing the trace of the inverse FIM (aka A-optimality) leads to a design that minimizes (a bound on) the average mean squared error (MSE). Both objective functions lead to convex optimization problems that can, in theory, be solved to obtain the optimal experiment design [3]. Solving these optimizations for practical networks, however, is highly nontrivial, as its solution space (i.e., all possible probe allocations) has a dimension that is at least the size of the network. In this work, we develop efficient solutions for tomographic experiment design using the above objective functions, and apply our results to two concrete tomography problems with multiplicative/additive link metrics. A. Related Work Based on the model of link metrics, existing work can be classified into algebraic tomography and statistical tomography. Algebraic tomography models each link metric as an unknown constant (e.g., fixed link delay) and each path measurement as a deterministic function of link metrics (e.g., fixed path delay). The goal of experiment design for algebraic tomography is to construct paths whose measurements can uniquely determine link metrics, e.g., n linearly independent paths for an n-link network [13]. Statistical tomography models each link metric as a random variable with a (partially) unknown probability distribution, and applies various estimation techniques to infer the distribution from path measurements. When multicast is supported, techniques have been proposed to estimate link loss rates and delays from multicast losses and delays [4], [7], [11]. Similar results have been obtained for multi-source measurements [1]. Variants based on subtree, unicast, and striped unicast have also been developed to improve the flexibility of probing [16], [6]. Most existing work in statistical tomography has focused on developing estimators, while the problem of experiment design is often ignored. Unlike algebraic tomography where it suffices to find paths that result in a unique solution of link metrics, statistical tomography also needs to deal with randomness in link metrics and possibly measurement noise. There has been a rich theory of experiment design for general statistical inference, which casts the problem as an optimization of a set of design objectives that capture various aspects of estimation accuracy [2]. The approach has recently been applied to design experiments for network monitoring, e.g., to design link sampling rates for flow size estimation [17], or design probing experiments for estimating link delays/loss rates from delays/losses of multicast or correlated unicast probes [8], [18]. In particular, [8], [18] measure the quality of an experiment design by the FIM and try to design probing experiments such that certain optimality criteria based on the FIM can be optimized (A-optimality in [8], D-optimality in [18]). Their solutions, however, either rely on numerical solvers [18] or a coarse approximation that ignores off-diagonal elements of the FIM [8]. We have a similar goal of designing the optimal allocation of probes based on certain functions of the FIM that capture the overall estimation accuracy, but we identify special structures of the objective functions that allow for exact, closed-form solutions. B. Summary of Contributions Given a fixed number of probes and a set of measurement paths, we want to allocate the probes to the paths so that the measurements can provide the most accurate estimate of the link parameters of interest. Our contributions are: 1) a general experiment design framework for network tomography that uses the FIM to allocate probes across paths; 2) explicit formulas for the design objectives for D- optimality and A-optimality, with closed-form solutions for loss tomography; 3) closed-form solutions for optimal probe allocation when measurement paths form a basis of the link space, and a two-step heuristic for general cases; 4) an iterative design algorithm that resolves dependency on unknown link parameters, which converges to the design based on the true parameters with high probability; 5) packet-level simulations based on parameters extracted from a real network, which show that the proposed design can effectively reduce the MSE compared with uniform probing. The rest of the paper is organized as follows. Section II formulates the problem of experiment design for network tomography. Section III introduces the FIM and its basic properties. Section IV defines objectives of experiment design, and Section V presents algorithms to optimize the objectives. Section VI evaluates the proposed design algorithms via simulations based on real data. Then Section VII concludes the paper. See the appendix of the full version [9] for proofs. II. A. Network Model PROBLEM FORMULATION Let G = (V, L) denote a network with nodes V and links L. Each link l L is associated with a performance metric (e.g., delay, loss) that varies stochastically according to a distribution with unknown parameter θ l. Let P be a given set of candidate probing paths in G. Each path p y P consists of one or more pairwise adjacent links in 1 G. We assume that the monitoring system can inject probes on all paths in P and observe their end-to-end performance. We also introduce a P matrix A := [A y,l ] defined by P, called the measurement matrix, where A y,l = 1 if link l is on path p y and A y,l = otherwise. Without loss of generality, we assume that each link is on at least one path in P. Additional assumptions on A (and hence P ) will be introduced later as needed. At run time, probes are injected on paths selected according to our experiment design. We consider a probabilistic design model, where each probe is sent over a path randomly selected from P, with probability φ y of selecting path p y. Here φ := (φ y ) P y=1, satisfying φ y and P y=1 φ y = 1, is a design parameter. B. Packet Loss Tomography Given a family of link metric distributions with unknown parameters θ := (θ l ) l L, the goal of (parametric) stochastic 1 We do not impose constraints on paths except that a path traverses each link at most once.

link metric tomography is to infer θ from observations of the corresponding performance metrics over probed paths. Let f y (x; θ) denote the conditional probability of observing path metric x, given that the probe is sent on path p y and the link parameters are θ. Then the problem of stochastic link metric tomography is to infer the parameter θ from the observations (x, y) := (x t, y t ) N t=1, where x t is the outcome of the t-th probe and y t the corresponding path index. Each observation satisfies the following distribution: f(x, y; θ, φ) = φ y f y (x; θ). (1) As a concrete example, consider packet loss tomography. Let the parameter of interest θ be the vector of link success rates (i.e., complements of loss rates), and each probe outcome x be an indicator that the probe successfully reaches its destination. Assume that losses of the same probe on different links and of different probes on the same link are both independent. Then the observation model becomes: f(x, y; θ, φ) = φ y ( θ l ) x (1 θ l ) 1 x. (2) l p y l p y C. Main Problem: Experiment Design Our goal is to develop a systematic framework to optimally allocate probes over measurement paths such that the overall error in estimating θ is minimized. Specifically, given an error measure C(ˆθ, θ) (e.g., L 2 -norm) and a total number of probes N, we want to design the probe distribution φ, such that in conjunction with an appropriate estimator ˆθ, the expected error E[C(ˆθ, θ)] after making N probes is minimized. The specific form of C(ˆθ, θ) will determine the design criterion, and will be specified later (Section IV). III. PRELIMINARIES In preparation for FIM-based experiment design, we first review FIM and its important properties. A. FIM and CRB Given the observation model (1), the (per-measurement) FIM wrt θ is an matrix, whose (i, j)-th entry is defined by [ ( )( ) ] θ,φ E logf(x, y; θ, φ) logf(x, y; θ, φ). θ i θ j We denote this matrix by I(θ; φ) to highlight its dependence on both the (unknown) parameter θ and the design parameter φ. All subsequent references to FIM mean this permeasurement FIM. The significance of the FIM is that it provides a fundamental bound on the error of unbiased estimators. Specifically, if ˆθ is an unbiased estimator ofθ usingn i.i.d. measurements, then the covariance matrix of ˆθ satisfies 2 cov(ˆθ) 1 N I 1 (θ; φ), known as the Cramér-Rao bound (CRB) [14]. In particular, the MSE in estimating θ l, given by cov(ˆθ) l,l, is lower bounded by I 1 l,l (θ; φ)/n. 2 For matrices A and B, A B means that A B is positive semi-definite. (3) B. Identifiability and Invertibility of FIM The CRB has an implicit assumption that the FIM is invertible. In our problem, we will show that this assumption follows from the identifiability of link parameters. We say that an unknown parameterθ is identifiable from observationsxif and only if the observation model satisfies that f(x; θ) f(x; θ ) at some x for any θ θ. In network tomography, the identifiability of link parameters θ has direct implication on the measurement matrix and the FIM. Specifically, suppose that a stochastic link metric tomography problem can be cast as a linear system Az(θ) = w, where A is the measurement matrix, z(θ) is a bijection of θ, and w is a vector of path performance parameters such that the probe outcomes depend on θ only through w. Suppose that w can be estimated consistently from probes 3. Then the following statements hold: θ is identifiable if and only if A has full column rank; if θ is identifiable, then I(θ; φ) is invertible. The first statement can be easily proved by an argument of contradiction, and the second statement is a direct implication of the equivalence between the invertibility of the FIM and the local identifiability 4 of θ [15]. Loss tomography admits such a linear system model, where z l = logθ l, and w y = log( l p y θ l ) according to (2). Discussion: The aspect of experiment design focusing on path construction has been extensively studied in the literature. In this work, we focus on a different aspect of designing probe allocation. Therefore, we will assume in the sequel that the link parameters of interest are identifiable using the given paths. By the above statements, this implies that the measurement matrix A has full column rank and the FIM I(θ; φ) is invertible. C. Link Parameter Estimation Fundamental to experiment design is how the collected measurements will be used to estimate the parameters of interest. Using the FIM in experiment design implies an implicit assumption that the adopted estimator is efficient (i.e., unbiased and achieves the CRB), and thus the CRB characterizes estimation error. In this regard, the Maximum Likelihood Estimator (MLE) has a superior property that it is the only candidate for efficient estimator. Moreover, although efficient estimators may not exist for finite sample sizes, the MLE is asymptotically efficient under regularity conditions. Therefore, using the MLE to estimate link parameters guarantees that our FIM-based experiment design will optimize the error decay rate as the number of probes becomes large. For loss tomography, we can derive the MLE using a unique property that MLE is invariant under one-to-one parameter transformations. That is, if ˆθ is an MLE of θ and η = g(θ) is a one-to-one transformation, then ˆη = g(ˆθ) is an MLE of η. Let α y (θ) := l p y θ l denote the success 3 That is, there exists an estimator ŵ that converges to w in probability as the number of probes goes to infinity. 4 That is, there exists an open neighborhood of θ such that no θ (θ θ) in this neighborhood leads to the same observation model.

probability of path p y, n 1,y the number of successfully received probes, and n,y the number of lost probes. It is known that the MLE of α y (θ) is simply the empirical path success probability ˆα y := n 1,y /(n 1,y + n,y ). Moreover, when A has full column rank, the link success rates θ and the path success rates α := (α y (θ)) P y=1 form a one-to-one mapping logθ = (A T A) 1 A T logα (assume α > ). Using the invariance property of MLE, we can obtain the MLE of θ from the MLE of α as follows. Without loss of generality, we assume that n 1,y +n,y > for y = 1,..., P. Proposition 1. If the measurement matrix A has full column rank and there is at least one successful probe per path (i.e., n 1,y > for y = 1,..., P ), then the MLE for loss tomography equals 5 : ˆθ = exp ( (A T A) 1 A T log ˆα ), (4) where ˆα is the vector of empirical path success rates. Remark: The MLE for loss tomography is only asymptotically unbiased (verified in Section VI-B) because of the non-linear operators (log, exp). IV. OBJECTIVE OF EXPERIMENT DESIGN The essence of FIM-based experiment design is to treat the CRB as an approximation of the estimation error matrix and select the design parameter φ to optimize a given objective function based on the CRB. Given an estimate of θ, the FIM (and hence the CRB) only depends on φ, which in theory allows us to optimize φ. Solving this optimization, however, is highly nontrivial as it is an optimization of a P -dimensional vector, making numerical solutions infeasible for larger P (i.e., larger networks). In this section, we will show that under certain conditions, satisfied by loss tomography, the objective function has a special structure that allows for closed-form solutions. A. D-Optimal Design In D-optimal experiment design, we seek to minimize the determinant of the inverse FIM, det(i 1 (θ;φ)), or equivalently maximize det(i(θ; φ)). The CRB implies that this design minimizes the volume of the error ellipsoid. We begin by establishing a special structure of det(i(θ; φ)) that holds for any network topology and any set of probing paths, under certain conditions on the observation model. We first show a general property of the FIM as follows. Lemma 2. The FIM for the observation model (1) is a convex combination of per-path FIMs: P I(θ;φ) = φ y I (y) (θ), (5) y=1 where I (y) (θ) is the FIM for path p y based on the observation model f y (x; θ). Note that I (y) (θ) is independent of φ and is only a function of θ. 5 For ease of presentation, we use g(z) to denote the vector obtained by applying a scalar function g( ) to each element of a vector z. Based on this decomposition, we can show that the determinant of the FIM has a particular structure as follows. Theorem 3. Let S n be the collection of all size-n subsets of {1,..., P }. If the per-path FIM satisfies I (y) i,k (θ)i(y) j,l (θ) = I(y) i,l (θ)i(y) j,k (θ) (6) for any y {1,..., P } and any i,j,k,l {1,...,}, then there exist functions B C (θ) (C S ) such that det(i(θ;φ)) = B C (θ) φ i. (7) C S i C Functions B C (θ) (C S ) do not depend on φ. Remark: This theorem describes a generic structure of det(i(θ; φ)) that applies to any tomography problem where condition (6) holds. In words, condition (6) states that any2 2 submatrix of the per-path FIM formed by removing 2 rows and 2 columns has a determinant of zero, i.e., any 2 2 minor of the per-path FIM (and the overall FIM) is zero 6 (note that the condition holds trivially if i = j or k = l). We will see in Section IV-D that this condition holds for loss tomography. The essence of this theorem is that under condition (6), the determinant of the FIM, when viewed as a function of φ, can be written as a weighted sum of order- terms, where each term is a product of (out of P ) distinct φ i s. We will show later that this property helps to simplify our FIM-based experiment design. Theorem 3 implies, in particular, that when P =, the determinant of the FIM equals: det(i(θ; φ)) = B(θ) k=1 φ k. (8) Analogous arguments can be used to show a formula for any minor of the FIM as follows. Corollary 4. Let M be an n-dimensional submatrix of I(θ;φ) after removing n rows and columns, and S n be defined as in Theorem 3. Then under condition (6), there exist functions B C (θ) (C S n ) such that the determinant of M (i.e., a minor of I(θ; φ)) equals: det(m(θ;φ)) = B C (θ) φ i. (9) C S n i C Functions B C (θ) (C S n ) do not depend on φ. B. A-Optimal Design In A-optimal experiment design, we seek to minimize the trace of the inverse FIM, Tr(I 1 (θ;φ)). The CRB states that this design minimizes the average mean squared error (MSE) for estimating θ. We observe a special structure oftr(i 1 (θ;φ)) as follows. When P =, the result in (8) together with Corollary 4 can be used to prove the following structure of Tr(I 1 (θ;φ)). 6 The k k minor of an m n matrix is the determinant of a submatrix obtained by removing m k rows and n k columns.

Theorem 5. Suppose P = and the FIM is invertible. If condition (6) holds, then the trace of the inverse FIM Tr(I 1 (θ;φ)) admits the following representation: Tr(I 1 (θ;φ)) = i=1 1 φ i A i (θ), (1) where A 1 (θ),...,a (θ) are only functions of θ. Remark: The proof actually gives a more general structure of Tr(I 1 (θ;φ)) for any P : C Tr(I 1 (θ;φ))= S 1 i C φ [ i k=1 B C,k(θ) ] C S B C (θ) i C φ, (11) i where B C,k(θ) and B C (θ) are only functions of θ. We only highlight the special case of P = because it allows for efficient optimization of φ; see Section V-A. C. Weighted A-Optimal Design In practice, applications may place different weights on the links. We extend the A-optimal design to account for this by introducing a weight vector ω := (ω k ) k=1, where ω k denotes the importance of link l k. Introducing weights changes the objective from minimizing Tr(I 1 (θ;φ)) to minimizing a weighted trace of I 1 (θ;φ), i.e., the weighted sum of the diagonal elements of I 1 (θ;φ): k=1 ω ki 1 k,k (θ;φ). By the CRB, this design minimizes the weighted average MSE for estimating {θ l } l L. We refer to this design as the weighted A-optimal design. Using analogous arguments, we can easily extend Theorem 5 to the following result. Corollary 6. Under the conditions in Theorem 5, the weighted trace of the inverse FIM admits the following representation: k=1 ω k I 1 k,k (θ;φ) = 1 à i (θ), (12) φ i i=1 where Ã1(θ),...,Ã(θ) are only functions of θ. Remark: Since the weighted A-optimal design contains the A-optimal design as a special case, we only consider the weighted A-optimal design in the sequel, simply referred to as A-optimal. D. Application to Loss Tomography We now apply our generic results to concrete tomography problems. To apply these results, we need to answer two questions: (i) Does condition (6) hold for the problem at hand? (ii) Can we evaluate the coefficient functions in the derived formulas (i.e., B C (θ), A i (θ), and Ãi(θ)) for a given value of θ? In this subsection, we give positive answers to both questions for loss tomography. Based on the observation model (2), we can obtain the perpath FIM I (y) (θ) for loss tomography, whose (i, j)-th entry equals I (y) i,j (θ) = α y (θ) θ i θ j (1 α y (θ)) {i,j p y}, (13) where { } is the indicator function. It is easy to verify that this FIM satisfies condition (6), and thus the formulas in Theorem 3, Theorem 5, and Corollary 6 apply. To derive explicit expressions for their coefficients, we take a detailed look at the FIM, which leads to a decomposition into a product of matrices with special structures. Substituting (13) into (5) gives the (i,j)-th entry of the FIM: P φ y I i,j (θ;φ) = y=1 α y (θ) θ i θ j (1 α y (θ)) {i,j p y}. (14) ( ) We introduce two auxiliary matrics 7 : D = diag (d y ) P y=1 for d y := φ y α y (θ)/(1 α y (θ)), and Θ = diag(θ). Then the above FIM can be written in matrix form as I(θ;φ) = Θ 1 A T DAΘ 1, (15) where A is the measurement matrix. Based on this decomposition, we can evaluate its determinant and trace of the inverse as functions of Θ, A, and D, leading to the following results. Lemma 7. Let A C denote a submatrix of the measurement matrix A formed by rows with indices in C (C S ). Then det(i(θ;φ)) for loss tomography can be expressed as (7) with coefficients for each C S. B C (θ) = det(a C) 2 l L θ2 l i C α i (θ) 1 α i (θ) (16) Moreover, if P = and I(θ; φ) is invertible, then Tr(I 1 (θ;φ)) can be expressed as (1) with coefficients A i (θ) = 1 α i(θ) α i (θ) θkb 2 2 k,i (17) k=1 for i = 1,...,, where b k,i is the (k,i)-th entry of 8 A 1. Similarly, the weighted sum of the diagonal elements of I 1 (θ;φ) can be expressed as (12) with coefficients à i (θ) = 1 α i(θ) α i (θ) where ω k is the weight of link l k. ω k θkb 2 2 k,i, (18) k=1 V. EXPERIMENT DESIGN ALGORITHMS The special structures of the design objectives established in Section IV enable us to compute the design parameter φ efficiently. In the sequel, we will first derive closed-form solutions for the case of P =, i.e., all probing paths are linearly independent, and then address the case of P >. A. Closed-form Solution for P = For the D-optimal design, Theorem 3 implies that when P =, the determinant of the FIM is proportional to the 7 Here diag(d) denotes a diagonal matrix with the main diagonal d. 8 Given identifiability of θ, A must be invertible in this case; see Section III-B.

Algorithm 1 Two-step Experiment Design for Given θ 1: P B P 2: for iteration i = 1,..., P do 3: for path p P B do 4: if P B \p has rank then 5: evaluate design objective when only using paths in P B \p 6: record path p that yields the optimal objective 7: P B P B \p 8: compute optimal φ y for p y P B ; set φ y to for p y P B product of φ i s as shown in (8). Since i=1 φ i = 1, by the inequality of arithmetic and geometric means, we see that (8) is maximized by setting φ i = 1/ for all i = 1,...,. Claim 8. Uniform probing (i.e., φ i = 1/ P ) is D-optimal when P =. For the A-optimal design, it is easy to show, using the Lagrange Multiplier method, that a closed-form solution for minimizing (1) wrt φ is the following: Ai (θ) φ i = j=1 Aj (θ), (19) for i = 1,2,...,. The solution for the weighted A-optimal design is analogous, except that A i (θ) is replaced by Ãi(θ). B. Heuristic Solution for P > When P >, the computation of the optimal design becomes more complicated. In general, we have verified that uniform probing is no longer D-optimal. To develop a scalable solution, we leverage the closed-form solution in the case of P = through a two-step heuristic: we first pick a basis of paths that gives the optimal objective value among all bases, and then optimize φ using solutions in Section V-A for paths in the basis, while setting φ y = for paths not in the basis. However, optimizing the basis is itself a combinatorial optimization that is hard to solve exactly. To select a basis, we propose a backward greedy algorithm, given in Algorithm 1. Starting with all P paths, it iteratively deselects one path at a time to optimize the design objective (determinant, trace, or weighted trace of I 1 (θ;φ)), and the iteration continues until the remaining paths form a basis (lines 2 7). To evaluate the design objective (line 5) before calculating φ, we assume uniform φ for the selected paths. C. Iterative Design Algorithm In general, the optimal design depends on the unknown parameterθ, which can only be estimated after collecting some measurements. This motivates an iterative design algorithm, presented in Algorithm 2. Specifically, we conduct probing in N/k iterations of k probes each. In each iteration, we send k probes on paths selected according to the current φ (line 3), update the estimate ˆθ based on the probing results (line 4), and then compute a new design parameter ˆφ using the updated estimate (line 5). During first few iterations, we may not have sufficient measurements to accurately estimate θ, which can mislead our design. To improve robustness, we Algorithm 2 Iterative Experiment Design 1: φ y 1/ P for y = 1,..., P 2: for iteration i = 1,...,N/k do 3: send k probes according to φ 4: update ˆθ based on probing results 5: compute a new design parameter ˆφ by Algorithm 1 using the updated ˆθ 6: update design parameter φ (1 ik/n)φ+(ik/n)ˆφ use a combination of the current φ and the new ˆφ (computed by line 5), and give increasing weight to ˆφ as we obtain more measurements (line 6). Intuitively, as we obtain more measurements, the estimated ˆθ will converge to θ, and thus the iteratively designed ˆφ will converge to the φ optimized for θ. We now formalize this intuition. We only consider the A-optimal design, as results are analogous for the other design objectives. Theorem 9. For loss tomography, as the number of probes per path increases, the estimated objective of the A-optimal design (i.e., trace of the inverse FIM based on ˆθ) converges to the true objective with high probability. Moreover, for a fixed basis P B, the A-optimal design on P B based on ˆθ converges to the true A-optimal design on P B based on θ with high probability. VI. PERFORMANCE EVALUATION We evaluate different experiment designs by packet-level simulations on real network topologies and link parameters. Our goal in the evaluation is two-fold: (i) evaluating the performance of (iterative) A-optimal design compared with uniformly allocating probes (uniform probing), and (ii) evaluating the impact of system parameters such as link weights, number of monitors, and number of paths. To guarantee identifiability of all the link parameters, we first place a minimum set of monitors by the Minimum Monitor Placement (MMP) algorithm in [12] and then place the remaining monitors, if any, randomly. Given the monitors, we first construct linearly independent paths by the Spanning Tree-based Path Construction (STPC) algorithm in [13] and then construct additional paths if needed by a random walk 9. We consider two types of link weights: homogeneous link weights, where all links have unit weight, and heterogeneous link weights, where a randomly selected subset of K links have a larger weight Ω (Ω > 1), and the rest of the links have unit weight. In the case of heterogeneous link weights, we set K = 1 and Ω = 5. We measure the performance of an experiment design by the (weighted) average MSE and bias over all estimated link parameters when applying the MLE (Section III-C) to measurements collected using this design. Furthermore, we evaluate the CRB and the design parameter φ to gain insights on the internal behaviors of various designs. All results are averaged over 5 instances of monitor locations, measurement paths, and link weights, and 1 Monte Carlo runs of probing simulations per instance. In each Monte Carlo run, we simulate 9 We remove cycles from the random-walk paths so that all paths are cyclefree, although this is not required for the design of probe allocation.

2 4 6 8 1 1 CRB: uniform MSE: uniform iterative.16.14.12 uniform iterative.9.8.7 uniform iterative.1.6 MSE 1 1 bias.8.6 φ.5.3.1 Fig. 1. 2 4 6 8 1 #probes x 1 #probes 4 (a) MSE and CRB (b) Bias Loss tomography, homogeneous link weights (2 monitors, 219 paths) x 1 4 path index (c) φ 2 4 6 8 1 12 14 16 18 2 22 1 CRB: uniform MSE: uniform iterative.1.8 uniform iterative 5.35 uniform iterative 1 1 MSE bias.6 φ.3 5.15.1.5 Fig. 2. 2 4 6 8 1 4 #probes #probes 2 6 8 1 x 1 4 (a) MSE and CRB (b) Bias Loss tomography, heterogeneous link weights (2 monitors, 219 paths) x 1 4 path index (c) φ 2 4 6 8 1 12 14 16 18 2 22 CRB: uniform 1 1 MSE: uniform iterative 1 1 uniform iterative.18.16.14 15 monitors 2 monitors 25 monitors 3 monitors Fig. 3. 1 MSE 1 1 1 3 absolute bias 1 3 1 4 2 4 6 8 1 18 2 22 24 #monitors 18 2 22 24 #monitors #iterations 14 16 26 28 3 32 14 16 26 28 3 32 (a) MSE and CRB (b) Absolute bias (c) Convergence of φ I Loss tomography, varying number of monitors (219 paths, 1 5 probes, homogeneous link weights).12 distance to φ A.1.8.6 Fig. 4. 1 1 MSE CRB: uniform MSE: uniform iterative absolute bias 1 1 uniform iterative 1 3 1 4 1 2 3 #extra paths 1 2 3 #extra paths #iterations 1 4 5 6 1 4 5 6 (a) MSE and CRB (b) Absolute bias (c) Convergence of φ I Loss tomography, varying number of paths (2 monitors, 1 5 probes, homogeneous link weights).14.12 distance to φ A.1.8.6 extra paths 1 extra paths 3 extra paths 5 extra paths 5 1 15 2 1 5 probes, which is divided into 1 iterations of 1 probes each for the iterative design. A. Dataset for Evaluation To evaluate our experiment design in a realistic scenario, we use the Roofnet dataset [1], which contains topologies and link measurements from a 38-node wireless mesh network. The dataset contains four subsets of data, corresponding to data rates 1, 2, 5.5, and 11 Mbps. We only present results based on the 1-Mbps data, as the results are similar to those for the other data rates. The raw dataset contains sent/received packet sequence numbers and timestamps between all pairs of nodes within communication range. We extract link success rates by computing the fraction of packets sent by a first node that are received by a second node. We then take the average of both directions of transmission as the parameter of a link 1. We also filter out links with success rates below.1 to only focus on useful links. After filtering, we obtain a topology with 38 nodes and 219 (undirected) links, with a roughly uniform distribution of link success rates. B. Evaluation of Loss Tomography We first evaluate the performance of different designs as the number of probes increases; see Fig. 1 for results under homogeneous link weights, and Fig. 2 for results under heterogeneous link weights. From Fig. 1 (a) and 2 (a), we see that the A-optimal design and its iterative version achieve lower MSE 1 Note that the realizations of link losses for each probe are generated according to our model, using parameters extracted from the dataset.

than uniform probing, and the improvement is greater under heterogeneous link weights. Examining the design parameter φ under each design (Fig. 1 (c) and 2 (c)) verifies that this improvement is achieved through nonuniformly allocating probes to better measure the paths that provide more information for estimating link success rates (paths are sorted in the order of increasing probing probabilities under the A-optimal design). The same figure also verifies that the iterative design is able to converge to the true A-optimal design (their curves essentially overlap); we will evaluate the rate of convergence later. Interestingly, for loss tomography, the MLE (Eq. (4)) is biased at finite sample sizes as shown in Fig. 1 (b) and 2 (b), and thus the CRB does not provide a true lower bound on the MSE as shown in Fig. 1 (a) and 2 (a). Nevertheless, the CRB captures trends of the MSE so that minimizing the CRB provides a design (i.e., A-optimal design) with low MSE. Next, we study the impact of system parameters on estimation performance. As we vary the number of monitors, Figs. 3 (a) (b) show that all probing methods benefit as we place more monitors. Intuitively, this is because with more monitors, paths become shorter, making the measurements less aggregated and more informative for inferring parameters of individual links. We also evaluate the impact on convergence rate of the iterative design by measuring the L 2 -distance of the iterative design (φ I ) to the A-optimal design (φ A ) across iterations; see Fig. 3 (c). The result verifies that the iterative design algorithm is able to quickly converge to the true A- optimal design. We then add extra paths constructed by a random walk (#extra paths = P ). Since when P >, the A- optimal design can no longer be computed in closed form, we only compute a constrained A-optimal design on a basis selected by Algorithm 1 (based on the true value of θ), simply referred to as A-optimal. Fig. 4 shows that the constrained A-optimal design still performs notably better than uniform probing which probes all the paths (Fig. 4 (a) (b)). In contrast to adding monitors, adding paths does not significantly impact the MSE; we do not observe a clear trend in bias. Meanwhile, we see from Fig. 4 (c) that having extra paths significantly slows down convergence of the iterative design 11. Detailed examination shows that this is due to near-tie between some bases in terms of the objective value (trace of the inverse FIM). Nevertheless, Fig. 4 (a) shows that the iterative design outperforms uniform probing in terms of MSE. VII. CONCLUSION We propose a general framework of optimal experiment design for inferring parameters of stochastic link metrics using path measurements, with a case study on loss tomography. Using the FIM to measure the amount of information contained in each measurement, we formulate the problem as an optimization of probe distribution across paths, with two widelyadopted objectives known as D-optimality and A-optimality. We are particularly interested in A-optimal design since it is directly linked to MSE and can be easily extended to incorporate different link weights. Under certain conditions on the FIM, satisfied by loss tomography, we derive explicit expressions for both objective functions, which enable closedform solutions of the optimal design when the probing paths are linearly independent. Using this solution as a building block, we develop a two-step heuristic and an iterative algorithm to address the issues of linearly dependent paths and dependency on unknown parameters. Our evaluations on real datasets verify the effectiveness of the proposed solution in reducing MSE, even if the FIM-based bound can be loose. REFERENCES [1] D. Aguayo, J. Bicket, S. Biswas, G. Judd, and R. Morris. Link-level measurements from an 82.11b mesh network. In SIGCOMM, 24. [2] A.C. Atkinson and A.N. Donev. Optimum Experimental Designs. Clarendon Press, 1992. [3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 24. [4] R. Caceres, N.G. Duffield, J. Horowitz, D. Towsley, and T. Bu. Multicast-based inference of network-internal characteristics: Accuracy of packet loss estimation. IEEE Trans. Info. Thy, 45:2462 248, 1998. [5] M. Coates, A. O. Hero, R. Nowak, and B. Yu. Internet tomography. IEEE Signal Processing Magzine, 19:47 65, 22. [6] N. Duffield, F. Lo Presti, V. Paxson, and D. Towsley. Network loss tomography using striped unicast probes. IEEE/ACM Trans. Networking, 14(4):697 71, Aug 26. [7] N.G. Duffield and F. Lo Presti. Multicast inference of packet delay variance at interior network links. In IEEE INFOCOM, 2. [8] T. Gu, G. Jiang, V. Singh, and Y. Zhang. Optimal probing for unicast network delay tomography. In IEEE INFOCOM, 21. [9] T. He, C. Liu, A. Swami, D. Towsley, T. Salonidis, A. I. Bejan, and P. Yu. Fisher information-based experiment design for nework tomography. In SIGMETRICS, 215. [1] A. Krishnamurthy and A. Singh. Robust multi-source network tomography using selective probes. In IEEE INFOCOM, 212. [11] F. Lo Presti, N.G. Duffield, J. Horowitz, and D. Towsley. Multicastbased inference of network-internal delay distributions. IEEE/ACM Trans. Networking, 1(6):761 775, Dec. 22. [12] L. Ma, T. He, K.K. Leung, A. Swami, and D. Towsley. Identifiability of link metrics based on end-to-end path measurements. In ACM IMC, 213. [13] L. Ma, T. He, K.K. Leung, D. Towsley, and A. Swami. Efficient identification of additive link metrics via network tomography. In ICDCS, 213. [14] H.V. Poor. An Introduction to Signal Detection and Estimation. Springer, 1994. [15] T. J. Rothenberg. Identification in parametric models. Econometrica, 39:577 591, May 1971. [16] M. Shih and A. Hero. Unicast inference of network link delay distributions from edge measurements. In IEEE ICASSP, 21. [17] H. Singhal and G. Michailidis. Optimal sampling in state space models with applications to network monitoring. In ACM SIGMETRICS, 28. [18] B. Xi, G. Michailidis, and V. N. Nair. Estimating network loss rates using active tomography. Journal of the American Statistical Association, 11:143 1448, December 26. 11 Due to the higher complexity of Algorithm 1 in this case, we reduce the number of iterations to 2, each with 5 probes.