Sensitivity Analysis in Markov Networks


Hei Chan and Adnan Darwiche
Computer Science Department
University of California, Los Angeles
Los Angeles, CA 90095
{hei,darwiche}@cs.ucla.edu

Abstract

This paper explores the topic of sensitivity analysis in Markov networks, by tackling questions similar to those arising in the context of Bayesian networks: the tuning of parameters to satisfy query constraints, and the bounding of query changes when perturbing network parameters. Even though the distribution induced by a Markov network corresponds to ratios of multi-linear functions, whereas the distribution induced by a Bayesian network corresponds to multi-linear functions, the results we obtain for Markov networks are as effective computationally as those obtained for Bayesian networks. This similarity is due to the fact that conditional probabilities have the same functional form in both Bayesian and Markov networks, which turns out to be the more influential factor. The major difference we found, however, is in how changes in parameter values should be quantified, as such parameters are interpreted differently in Bayesian networks and Markov networks.

1 Introduction

In the domain of uncertainty reasoning, graphical models are used to embed and represent probabilistic dependencies between random variables. There are two major types of graphical models: Bayesian networks [Pearl, 1988; Jensen, 2001] and Markov networks [Pearl, 1988]. Both models have two components, a qualitative part and a quantitative part. In the qualitative part, a graph is used to represent the interactions between variables, such that variables with direct interactions are connected by edges, and information about independence relationships between variables can be inferred from the network structure. In the quantitative part, a set of parameters is specified to quantify the strengths of influences between related variables. The joint probability distribution can then be induced from these two components.
Bayesian networks and Markov networks differ in both the qualitative and quantitative parts of the models. For the qualitative part, directed edges are used in a Bayesian network to represent influences from one variable to another, and no cycles are allowed in the graph, while undirected edges are used in a Markov network to represent symmetrical probabilistic dependencies between variables. For the quantitative part, the parameters in a Bayesian network are specified in conditional probability tables (CPTs) of the variables, where each CPT consists of the conditional probability distributions of a variable given instantiations of its parents, which are the variables that directly influence this variable. Because these parameters are conditional probabilities, they are more intuitive, but have to obey certain properties. For example, the probabilities in each conditional distribution must be normalized and sum to 1. On the other hand, the parameters in a Markov network are given in potentials, defined over instantiations of cliques, which are maximal sets of variables such that every pair of variables in a clique is connected by an edge. These parameters do not need to be normalized and can be changed more freely.

The topic of sensitivity analysis is concerned with understanding and characterizing the relationship between the local parameters quantifying a network and the global queries computed based on the network. Sensitivity analysis has been investigated quite comprehensively in Bayesian networks [Laskey, 1995; Castillo et al., 1997; Jensen, 1999; Kjærulff and van der Gaag, 2000; Chan and Darwiche, 2002; 2004; 2005]. Earlier work on the subject has observed that the distribution induced by a Bayesian network corresponds to multi-linear functions, providing effective computational methods for computing ∂Pr(e)/∂θ_{x|u}: the partial derivative of a joint probability with respect to a network parameter.
These earlier results were recently pursued further, providing effective methods for solving the following problems:

- What is the necessary parameter change we need to apply such that a given query constraint is satisfied? For example, we may want some query value Pr(z | e) to be at least p by changing network parameters. Efficient solutions were given to this problem for computing minimum changes for single parameters [Chan and Darwiche, 2002], and multiple parameters in a single CPT [Chan and Darwiche, 2004].

- What is the bound on the change in some query value if we apply an arbitrary parameter change? Here, we are interested in finding a general bound for any query and any parameter of any network. It has been shown that the log-odds change in any conditional query in a Bayesian network is bounded by the log-odds change in any single parameter [Chan and Darwiche, 2002].

- What is the bound on the difference between some query value induced by two networks that have the same structure but different parameter values? The solution to this problem is based on a newly proposed distance measure [Chan and Darwiche, 2005], which allows one to bound the difference between the query values under two distributions. Based on this distance measure, and given two Bayesian networks that differ by only a single CPT, the global distance measure between the distributions induced by the networks can be computed from the local distance measure between the CPTs, thereby allowing one to provide a bound on the difference between the query values. Moreover, if we are given multiple CPTs that do not intersect with each other, the global distance measure can still be computed as the sum of the local distance measures between the individual CPTs.

In this paper, we address these key questions of sensitivity analysis but with respect to Markov networks. The main question of interest is the extent to which these promising results hold in Markov networks as well. There is indeed a key difference between Bayesian networks and Markov networks which appears to suggest a lack of parallels in this case: whereas the joint distribution induced by a Bayesian network corresponds to multi-linear functions, the joint distribution induced by a Markov network corresponds to ratios of multi-linear functions. As it turns out, however, the conditional probability function has the same functional form in both Bayesian and Markov networks, as ratios of multi-linear functions. This similarity turns out to be the key factor here, allowing us to derive similarly effective results for Markov networks. This is greatly beneficial because we can answer the previous three questions in the context of Markov networks as well, with the same computational complexity.
For example, we can go through each parameter in a Markov network, compute the minimum single-parameter changes necessary to enforce a query constraint, and find the one that perturbs the network the least. Alternatively, we can change all parameters in a single clique table, and find the change that minimizes network perturbation. Afterwards, we can compute a bound on any query change using the distance measure incurred by the parameter change.

Our results, however, point to a main semantical difference between Bayesian networks and Markov networks, relating to how one should quantify and measure parameter changes. That is, how should one quantify a parameter change from .3 to .4? In a Bayesian network, parameters are interpreted as conditional probabilities, and the measure which quantifies the change is the relative odds change in the parameter value. This means query values are much more sensitive to changes in extreme probability values, whether close to 0 or 1. On the other hand, in a Markov network, parameters are interpreted as compatibility ratios, and the measure which quantifies the change is the relative change in the parameter value. This difference stems from how the parameters in the two models are interpreted and will be explained in more depth later.

This paper is structured as follows. Section 2 is dedicated to the definition and parameterization of Markov networks. Section 3 presents a procedure which tunes parameters in a Markov network in order to satisfy a query constraint. Section 4 introduces a distance measure that can be used to bound query change between distributions induced by two Markov networks. Section 5 concludes the paper.

[Figure 1: A sample Markov network structure in terms of an undirected graph G, with nodes A1, A2, B1, B2.]

2 Markov networks

A Markov network M = (G, Θ) consists of two parts: the network structure in terms of an undirected graph G, and the parameterization Θ.
Each node in the undirected graph G represents a random variable, and two variables are connected by an edge if there exists a direct interaction between them. The absence of an edge between two variables means any potential interaction between them is indirect and conditional upon other variables. As an example, consider four individuals, A1, A2, B1, B2, and suppose we wish to represent the possible transmission of a contagious disease among them. A network structure which consists of these variables is shown in Figure 1. Each pair of variables {Ai, Bj} is connected by an edge, meaning these pairs of individuals are in direct contact with each other. However, A1 and A2 are not connected by an edge, meaning they do not interact directly with each other, although diseases can still be transmitted between them indirectly through B1 or B2.

To quantify the edges in a Markov network, we first define cliques in the network structure. A clique C is a maximal set of variables where each pair of variables in the set C is connected by an edge. In the network structure in Figure 1, there are four cliques: {A1, B1}, {A1, B2}, {A2, B1}, {A2, B2}. For each clique C, we define a table Θ_C that assigns a non-negative number to each instantiation c of the variables C, which measures the relative degree of compatibility associated with each instantiation. The numbers given indicate the level of compatibility between the values of the variables in the clique, where direct interactions exist between every pair of variables. For example, in our example network, we define the same clique table Θ_{Ai,Bj} shown in Table 1 for each clique {Ai, Bj}. For example, it can be viewed that both Ai and Bj not having the disease is four times more compatible than both Ai and Bj having the disease, since θ_{āi,b̄j} = 4 θ_{ai,bj}.¹

¹ Parameters may be estimated on an ad hoc basis [Geman and Geman, 1984] or by statistical models. It can be difficult to assign meanings and numbers to Markov network parameters intuitively [Pearl, 1988, pages 107-108].

Table 1: The clique table Θ_{Ai,Bj} of {Ai, Bj} of the Markov network whose structure is shown in Figure 1.

    Ai   Bj   Θ_{Ai,Bj}
    ai   bj       1
    ai   b̄j       2
    āi   bj       3
    āi   b̄j       4

2.1 Joint and conditional probabilities induced by Markov networks

We now proceed to find the distribution induced by a Markov network. If X is the set of all variables in the Markov network M = (G, Θ), the joint potential ψ over X induced by M is defined as:

    ψ(x) ≝ ∏_{θ_c : c ∼ x} θ_c.    (1)

That is, ψ(x) is the product of the parameters θ_c, one from each clique C, such that the instantiation c is consistent with x. The joint distribution Pr^M induced by Markov network M is then defined as its normalized joint potential:

    Pr^M(x) ≝ ψ(x) / Σ_x ψ(x) = K ψ(x),

where K = 1 / Σ_x ψ(x) is the normalizing constant. From this equation, we can easily verify that the specific parameter values in a clique are not important, but their ratios are. This is a major departure from Bayesian networks, where specific parameter values in CPTs are important.

Given the network structure in Figure 1 with the parameterization in Table 1, we can compute the joint probability of any full instantiation of variables induced by our example network. For example, the joint probability of a1, ā2, b1, b̄2 is equal to K θ_{a1,b1} θ_{a1,b̄2} θ_{ā2,b1} θ_{ā2,b̄2} = K · 1 · 2 · 3 · 4, where K is the normalizing constant.

For an instantiation e of variables E ⊆ X, we can express the joint probability Pr^M(e) as:

    Pr^M(e) = Σ_{x ∼ e} Pr^M(x) = Σ_{x ∼ e} K ψ(x) = K φ(e),

where the function φ(e) is defined as the sum of the potentials of instantiations x which are consistent with e:

    φ(e) ≝ Σ_{x ∼ e} ψ(x) = Σ_{x ∼ e} ∏_{θ_c : c ∼ x} θ_c,    (2)

and any conditional probability Pr^M(z | e) as:

    Pr^M(z | e) = Pr^M(z, e) / Pr^M(e) = K φ(z, e) / (K φ(e)) = φ(z, e) / φ(e).    (3)

From Equation 2, we can see that both φ(z, e) and φ(e) are sums of products of the network parameters. Therefore, the conditional probability Pr^M(z | e) induced by a Markov network can be expressed as the ratio of two multi-linear functions of the network parameters.
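Equations 1-3 can be sketched in code. The following is a minimal brute-force illustration (all names are ours, not the paper's; variables are boolean, with True meaning the individual has the disease) that builds ψ, φ and Pr for the example network of Figure 1 with the clique table of Table 1:

```python
from itertools import product

CLIQUE_TABLE = {(True, True): 1.0, (True, False): 2.0,
                (False, True): 3.0, (False, False): 4.0}
CLIQUES = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
VARS = ["A1", "A2", "B1", "B2"]

def psi(x):
    # Equation 1: product of the clique parameters consistent with x.
    p = 1.0
    for (u, v) in CLIQUES:
        p *= CLIQUE_TABLE[(x[u], x[v])]
    return p

def phi(evidence):
    # Equation 2: sum of potentials of instantiations consistent with evidence.
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        x = dict(zip(VARS, values))
        if all(x[var] == val for var, val in evidence.items()):
            total += psi(x)
    return total

def pr(query, evidence):
    # Equation 3: Pr(z | e) = phi(z, e) / phi(e).
    return phi({**query, **evidence}) / phi(evidence)

K = 1.0 / phi({})  # normalizing constant over all 16 instantiations
print(pr({"A2": False}, {"A1": True}))  # Pr(~a2 | a1) under this parameterization
```

Since every φ here is a sum over at most 2^4 = 16 terms, brute force suffices; larger networks would use the jointree-style procedure cited in Section 3.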
As a comparison, the parameterization Θ of a directed acyclic graph N for a Bayesian network consists of a set of CPTs, one for each variable X, which are conditional probabilities of the form θ_{x|u} = Pr(x | u), where U are the parents of X. The joint distribution Pr^B induced by Bayesian network B = (N, Θ) is:

    Pr^B(x) ≝ ∏_{θ_{x|u} : xu ∼ x} θ_{x|u}.

The joint probability Pr^B(e) can be expressed as:

    Pr^B(e) = Σ_{x ∼ e} ∏_{θ_{x|u} : xu ∼ x} θ_{x|u},

and any conditional probability Pr^B(z | e) as:

    Pr^B(z | e) = Pr^B(z, e) / Pr^B(e) = (Σ_{x ∼ z,e} ∏_{θ_{x|u} : xu ∼ x} θ_{x|u}) / (Σ_{x ∼ e} ∏_{θ_{x|u} : xu ∼ x} θ_{x|u}).    (4)

We can easily see that the similarities between the expressions of Pr^M(z | e) and Pr^B(z | e) in Equations 3 and 4 lie in the fact that both can be expressed as ratios of two multi-linear functions of the network parameters. This is different for joint probabilities, where in a Bayesian network they are multi-linear functions of the network parameters, while in a Markov network they are ratios of multi-linear functions of the network parameters, since the normalizing constant can be expressed as K = 1 / Σ_x ψ(x) = 1 / φ(true). This similarity in terms of conditional probabilities will be the basis of our results in the following section.

3 Tuning of parameters in Markov networks

We now answer the following question in the context of Markov networks: what is the necessary change we can apply to certain parameter(s) such that a query constraint is satisfied? For example, if Pr is the probability distribution induced by a Markov network, we may want to satisfy the query constraint Pr(z | e) ≥ k, for some k ∈ [0, 1].²

Notice that given all the parameters θ_c in the clique C, φ(e) can be expressed as a multi-linear function of these parameters, as shown in Equation 2. Since φ(e) is linear in each parameter, and no two parameters in the same clique are multiplied together, if we apply a change of δ_c to each parameter θ_c in the clique C, the change in φ(e) is:

    Δφ(e) = Σ_{θ_c ∈ Θ_C} (∂φ(e) / ∂θ_c) · δ_c.

To ensure the query constraint Pr(z | e) ≥ k, from Equation 3, this is equivalent to ensuring the inequality φ(z, e) ≥ k φ(e).
If the current φ values of z, e and e are ϕ(z, e) and ϕ(e) respectively, this inequality becomes:

    ϕ(z, e) + Δφ(z, e) ≥ k (ϕ(e) + Δφ(e)),

or:

    Δφ(z, e) − k Δφ(e) ≥ −(ϕ(z, e) − k ϕ(e)).

Rearranging the terms, we get the following theorem and corollary.

² We can easily expand our results in this section into other forms of constraints, such as Pr(z | e) ≤ k, Pr(z1 | e) − Pr(z2 | e) ≥ k, or Pr(z1 | e) / Pr(z2 | e) ≥ k.

Theorem 1 To ensure that the probability distribution Pr induced by a Markov network satisfies the query constraint Pr(z | e) ≥ k, we must change each parameter θ_c in the clique C by δ_c such that:

    Σ_{θ_c ∈ Θ_C} α(θ_c) · δ_c ≥ −(ϕ(z, e) − k ϕ(e)),    (5)

where ϕ(z, e) and ϕ(e) are the current φ values of z, e and e respectively, and:

    α(θ_c) = ∂φ(z, e) / ∂θ_c − k · ∂φ(e) / ∂θ_c.    (6)

Corollary 1 If instead of changing all parameters in the clique C, we are only allowed to change a single parameter θ_c by δ_c, the solution of Theorem 1 becomes:

    α(θ_c) · δ_c ≥ −(ϕ(z, e) − k ϕ(e)),    (7)

which returns a solution interval of possible δ_c.

Therefore, to solve for δ_c in Inequality 7 for all parameters in the Markov network, we need to compute the original values ϕ(z, e) and ϕ(e), which should already be known from computing the original probability of z | e, and the partial derivatives ∂φ(z, e)/∂θ_c and ∂φ(e)/∂θ_c, to find α(θ_c) in Equation 6 for all parameters. To do this, we can use a procedure with time complexity O(n exp(w)), where n is the number of variables and w is the treewidth of the Markov network, similar to the one proposed to compute partial derivatives in a Bayesian network [Darwiche, 2003]. This can greatly help users debug their network when they are faced with query results that do not match their expectations.

We now use our example network to illustrate this procedure. The probability distribution Pr induced by the current parameterization gives us the conditional query value Pr(ā2 | a1) = .789. Assume that we would like to change a single parameter in the clique {A2, B1} to ensure the constraint Pr(ā2 | a1) ≥ .9. We need to compute the α values for each parameter in the clique using Equation 6, and then use Inequality 7 to solve for the minimal δ_c. The solutions are: δ ≤ −2.93 for θ_{a2,b1}; δ ≤ −1.467 for θ_{a2,b̄1}; δ ≥ 12 for θ_{ā2,b1}; δ ≥ 8.8 for θ_{ā2,b̄1}. However, notice that since parameter values have to be non-negative, the solution for θ_{a2,b1} is impossible to achieve. Therefore, no possible single-parameter change on θ_{a2,b1} can ensure our query constraint.
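Corollary 1 can be applied numerically. Below is a hedged sketch (our own brute-force code, not the paper's O(n exp(w)) procedure) that computes α(θ_c) for one parameter of the example network by exploiting the linearity of φ in θ_c, takes δ at the boundary of Inequality 7, and checks the resulting constraint. The clique entry chosen and the threshold k = 0.9 are illustrative choices:

```python
from itertools import product

def make_phi(tables):
    """Return phi(evidence) for the 4-variable example with the given clique tables."""
    cliques = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
    vars_ = ["A1", "A2", "B1", "B2"]
    def phi(evidence):
        total = 0.0
        for values in product([True, False], repeat=4):
            x = dict(zip(vars_, values))
            if all(x[v] == val for v, val in evidence.items()):
                p = 1.0
                for (u, w) in cliques:
                    p *= tables[(u, w)][(x[u], x[w])]
                total += p
        return total
    return phi

base = {(True, True): 1.0, (True, False): 2.0, (False, True): 3.0, (False, False): 4.0}
tables = {c: dict(base) for c in [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]}

z, e, k = {"A2": False}, {"A1": True}, 0.9
clique, entry = ("A2", "B1"), (False, True)   # tune theta_{~a2, b1} (illustrative choice)

phi = make_phi(tables)
phi_ze, phi_e = phi({**z, **e}), phi(e)

# phi is linear in each parameter, so a unit bump gives the exact partial derivative.
bumped = {c: dict(t) for c, t in tables.items()}
bumped[clique][entry] += 1.0
phi_b = make_phi(bumped)
d_ze = phi_b({**z, **e}) - phi_ze   # d phi(z,e) / d theta
d_e = phi_b(e) - phi_e              # d phi(e)   / d theta
alpha = d_ze - k * d_e              # Equation 6

# Corollary 1: alpha * delta >= -(phi(z,e) - k * phi(e)); alpha > 0 for this
# parameter, so the minimal admissible change is the boundary value below.
delta = -(phi_ze - k * phi_e) / alpha

tables[clique][entry] += delta
new_pr = make_phi(tables)({**z, **e}) / make_phi(tables)(e)
assert new_pr >= k - 1e-9
```

With a negative α the inequality flips and δ must instead lie at or below the boundary value, which is how the impossible (negative-parameter) solutions of the example arise.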
However, we can decrease the parameter θ_{a2,b̄1} from 2 to .533 to ensure our query constraint. If we are interested in changing all the parameters in the clique {A2, B1} to satisfy our query constraint, we need to find a solution that satisfies Inequality 5. As a consequence, we are now faced with multiple solutions for changing the parameters, and we want to commit to one which disturbs the network the least. We will discuss this in the next section using a distance measure which measures network perturbation.

4 Bounding belief change between Markov networks using a distance measure

We now answer the following question in the context of Markov networks: what is the bound on the difference between some query value induced by two networks that have the same structure but different parameter values? We will answer it by using a previously proposed distance measure. Let Pr and Pr' be two probability distributions over the same set of variables X. We use a distance measure D(Pr, Pr') defined as follows [Chan and Darwiche, 2005]:

    D(Pr, Pr') ≝ ln max_x (Pr'(x) / Pr(x)) − ln min_x (Pr'(x) / Pr(x)),

where we define 0/0 ≝ 1 and ∞/∞ ≝ 1.³ The significance of this distance measure is that if the distance measure is D(Pr, Pr') = d, the change in the probability of any conditional query z | e is bounded by:

    e^{−d} ≤ O'(z | e) / O(z | e) ≤ e^d,

where O'(z | e) and O(z | e) are the odds of z | e under distributions Pr' and Pr respectively.⁴ Given p = Pr(z | e), this result can also be expressed in terms of probabilities:

    p e^{−d} / (p (e^{−d} − 1) + 1) ≤ Pr'(z | e) ≤ p e^d / (p (e^d − 1) + 1).    (8)

The advantage of this distance measure is that it can be computed using local information for Bayesian networks. Given distributions Pr^B and Pr^{B'} induced by two Bayesian networks B and B' that differ by only the CPT of a single variable X, if Pr(u) > 0 for every instantiation u of the parents U, the distance measure between Pr^B and Pr^{B'} can be computed as:

    D(Pr^B, Pr^{B'}) = D(Θ_{X|U}, Θ'_{X|U}) ≝ ln max_{x,u} (θ'_{x|u} / θ_{x|u}) − ln min_{x,u} (θ'_{x|u} / θ_{x|u}).    (9)

This is useful because we can compute the bound on any query change using Inequality 8 by computing the local distance between the CPTs Θ_{X|U} and Θ'_{X|U} in B and B'. For example, if X is binary, and we only change a single parameter θ_{x|u} while applying a complementary change on θ_{x̄|u}, the bound on the change in query z | e is given by:

    min(O(θ'_{x|u}) / O(θ_{x|u}), O(θ_{x|u}) / O(θ'_{x|u})) ≤ O^{B'}(z | e) / O^B(z | e) ≤ max(O(θ'_{x|u}) / O(θ_{x|u}), O(θ_{x|u}) / O(θ'_{x|u})),    (10)

where O(θ_{x|u}) and O(θ'_{x|u}) are the odds of parameters θ_{x|u} and θ'_{x|u} respectively. Intuitively, this means the relative change in the query odds is bounded by the relative change in the parameter odds.

³ This measure is a distance measure because it satisfies positiveness, symmetry and the triangle inequality.
⁴ The odds of z | e under distribution Pr is defined as: O(z | e) ≝ Pr(z | e) / (1 − Pr(z | e)).
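The distance measure and the probability bound of Inequality 8 can be checked directly on any pair of distributions. A small self-contained sketch (the two distributions below are arbitrary examples of ours; the "query" is simply one state under trivial evidence, while the paper's bound holds for arbitrary z | e):

```python
import math

def distance(pr, pr2):
    # D(Pr, Pr') = ln max_x Pr'(x)/Pr(x) - ln min_x Pr'(x)/Pr(x)
    ratios = [q / p for p, q in zip(pr, pr2)]
    return math.log(max(ratios)) - math.log(min(ratios))

pr  = [0.5, 0.3, 0.2]     # arbitrary example distribution over three states
pr2 = [0.4, 0.35, 0.25]   # a perturbed version of it
d = distance(pr, pr2)

# Inequality 8, applied to the event "state 0":
p, p_new = pr[0], pr2[0]
lower = p * math.exp(-d) / (p * (math.exp(-d) - 1) + 1)
upper = p * math.exp(d) / (p * (math.exp(d) - 1) + 1)
assert lower <= p_new <= upper
```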

We can get a similar result for Markov networks, where the distance measure between distributions induced by two Markov networks that differ by only a single clique table can be computed as the distance measure between the tables.

Theorem 2 Given distributions Pr^M and Pr^{M'} induced by two Markov networks M and M' which differ by only the parameters in a single clique C, such that the clique tables are Θ_C and Θ'_C respectively, the distance measure between Pr^M and Pr^{M'} is given by:

    D(Pr^M, Pr^{M'}) = D(Θ_C, Θ'_C) ≝ ln max_c (θ'_c / θ_c) − ln min_c (θ'_c / θ_c),    (11)

if ∂ψ(x)/∂θ_c ≠ 0 for all x.⁵

Proof Given an instantiation x such that c ∼ x, the joint potential ψ of x is linear in the parameter θ_c, and the ratio of ψ'(x) and ψ(x) induced by M' and M respectively is:

    ψ'(x) / ψ(x) = θ'_c / θ_c,

if ∂ψ(x)/∂θ_c ≠ 0. We have:

    Pr^{M'}(x) / Pr^M(x) = K' ψ'(x) / (K ψ(x)) = K' θ'_c / (K θ_c).

Note that since the parameters have changed, the normalizing constants K and K' for networks M and M' respectively are different. Therefore, the distance measure between Pr^M and Pr^{M'} is given by:

    D(Pr^M, Pr^{M'}) = ln max_x (Pr^{M'}(x) / Pr^M(x)) − ln min_x (Pr^{M'}(x) / Pr^M(x))
                     = ln max_c (K' θ'_c / (K θ_c)) − ln min_c (K' θ'_c / (K θ_c))
                     = ln max_c (θ'_c / θ_c) − ln min_c (θ'_c / θ_c)
                     = D(Θ_C, Θ'_C),

if ∂ψ(x)/∂θ_c ≠ 0 for all x.

Therefore, the global distance measure between the induced distributions is equal to the local distance measure between the individual clique tables. This is useful for computing the bound on query change after changing the parameters in a clique table. For example, what is the bound on the change in some query value if we apply an arbitrary change on a single parameter θ_c? The distance measure in this case is:

    D(Pr^M, Pr^{M'}) = |ln (θ'_c / θ_c)|,

and the change in query z | e is bounded by:

    min(θ'_c / θ_c, θ_c / θ'_c) ≤ O^{M'}(z | e) / O^M(z | e) ≤ max(θ'_c / θ_c, θ_c / θ'_c).    (12)

This means that for Markov networks, the relative change in query odds is bounded by the relative change in the parameter itself, not the relative change in the parameter odds as for Bayesian networks. This is an important distinction between Markov networks and Bayesian networks.

As an example, suppose we have a network and we want to ensure the robustness of the query Pr(z | e) after we apply a parameter change. Assume that we define robustness as the relative change in any query odds being less than 1.5. For example, if currently Pr(z | e) = .75, the new query value must stay in the interval [.667, .818] after the parameter change. We may ask, how much change can we apply to a parameter if we want to ensure robustness? The answers for Bayesian networks and Markov networks are different due to our previous results, as we will show next.

For a Bayesian network, the amount of permissible change is determined by Inequality 10, and is plotted in Figure 2. We can see that the amount of permissible change is small when the parameters have extreme values close to 0 or 1, because the relative odds change is large when even a very small absolute change is applied. On the other hand, for a Markov network, the amount of permissible change is determined by Inequality 12, and is plotted in Figure 3. We can see that the amount of permissible change is proportional to the parameter values, because relative change is used instead of relative odds change.

[Figure 2: The amount of parameter change δ that would guarantee the query Pr(z | e) = .75 to stay within the interval [.667, .818], as a function of the current Bayesian network parameter value p.]

⁵ This condition is satisfied when the network parameters are all strictly positive. However, this is a sufficient condition, not a necessary one. The necessary condition is that there exists some x such that ∂ψ(x)/∂θ_c ≠ 0 for all θ_c. This means that changing any parameter will have some impact on the joint potential ψ.
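Theorem 2 can be checked numerically on the example network: perturb one clique table, compute the global distance between the two induced joints by enumeration, and compare it with the local clique-table distance. The perturbed values below are arbitrary illustrative choices of ours; all parameters are strictly positive, so the theorem's condition holds:

```python
import math
from itertools import product

CLIQUES = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
VARS = ["A1", "A2", "B1", "B2"]

def joint(tables):
    """Normalized joint distribution over all 16 instantiations."""
    ps = []
    for values in product([True, False], repeat=4):
        x = dict(zip(VARS, values))
        p = 1.0
        for (u, v) in CLIQUES:
            p *= tables[(u, v)][(x[u], x[v])]
        ps.append(p)
    z = sum(ps)
    return [p / z for p in ps]

def dist(pr, pr2):
    ratios = [q / p for p, q in zip(pr, pr2)]
    return math.log(max(ratios) / min(ratios))

base = {(True, True): 1.0, (True, False): 2.0, (False, True): 3.0, (False, False): 4.0}
t1 = {c: dict(base) for c in CLIQUES}
t2 = {c: dict(t) for c, t in t1.items()}
# perturb a single clique table (values chosen arbitrarily, all > 0)
t2[("A2", "B1")] = {(True, True): 1.5, (True, False): 2.0,
                    (False, True): 3.0, (False, False): 5.0}

# local distance between the two clique tables (right-hand side of Equation 11)
ratios = [t2[("A2", "B1")][key] / t1[("A2", "B1")][key] for key in base]
local = math.log(max(ratios) / min(ratios))

global_ = dist(joint(t1), joint(t2))
assert abs(global_ - local) < 1e-9   # Theorem 2: global distance = local distance
```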
Therefore, for a Bayesian network, the sensitivity of the network with respect to a parameter is largest for extreme values close to 0 or 1, and becomes smaller as the parameter approaches .5, while for a Markov network, the sensitivity of the network with respect to a parameter is proportional to its value, and increases as it grows larger.

[Figure 3: The amount of parameter change δ that would guarantee the query Pr(z | e) = .75 to stay within the interval [.667, .818], as a function of the current Markov network parameter value p.]

The distance measure is useful in many aspects of sensitivity analysis. For example, given the possible single-parameter changes in our example in the previous section, we can choose the one which perturbs the network the least according to the distance measure. In this case, the most preferred single-parameter change is to decrease the parameter θ_{a2,b̄1} from 2 to .533, incurring a distance measure of 1.322.

Moreover, we can also use the distance measure to find the optimal solution for changing all parameters in a clique table, which is the solution that satisfies Inequality 5 and minimizes the distance measure. As the distance measure D(Θ_C, Θ'_C) is computed from the maximum relative change in the parameters, the relative changes in all parameters must be the same for the optimal solution, because to obtain another solution that satisfies the constraint, we must increase the relative change in one parameter while decreasing the relative change in another, thereby incurring a larger distance measure. For example, in our example from the previous section, to ensure our query constraint, we would like to decrease the parameters θ_{a2,b1} and θ_{a2,b̄1} and increase the parameters θ_{ā2,b1} and θ_{ā2,b̄1}, such that the relative changes in all parameters are the same. However, since only the ratios between the parameters are important, we can keep the first two parameters constant and only increase the last two parameters. The optimal solution of Inequality 5 is: δ = 0 for θ_{a2,b1}; δ = 0 for θ_{a2,b̄1}; δ = 6.257 for θ_{ā2,b1}; δ = 8.676 for θ_{ā2,b̄1}. This parameter change incurs a distance measure of .350. It involves all parameters in the clique table and incurs a smaller distance measure than any of the single-parameter changes computed earlier.

Finally, if we change the parameters in different clique tables which do not share any variables, the distance measure can be computed as the sum of the local distance measures between the clique tables. This result is also similar to the case of Bayesian networks [Chan and Darwiche, 2004].
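The equal-relative-change argument can be sketched as a uniform scaling. Multiplying the two ā2 entries of clique {A2, B1} by a common factor s multiplies φ(ā2, a1) by s and leaves φ(a2, a1) untouched, so the constraint Pr(ā2 | a1) ≥ k fixes the minimal s, at distance measure ln s. This is our own illustrative reading of the argument, using a brute-force φ; the numbers it prints are for this sketch, not the paper's reported values:

```python
import math
from itertools import product

CLIQUES = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]
VARS = ["A1", "A2", "B1", "B2"]

def phi(tables, evidence):
    total = 0.0
    for values in product([True, False], repeat=4):
        x = dict(zip(VARS, values))
        if all(x[v] == val for v, val in evidence.items()):
            p = 1.0
            for (u, w) in CLIQUES:
                p *= tables[(u, w)][(x[u], x[w])]
            total += p
    return total

base = {(True, True): 1.0, (True, False): 2.0, (False, True): 3.0, (False, False): 4.0}
tables = {c: dict(base) for c in CLIQUES}

k = 0.9
phi_z = phi(tables, {"A1": True, "A2": False})   # phi(~a2, a1)
phi_rest = phi(tables, {"A1": True}) - phi_z     # phi(a2, a1)

# constraint: s*phi_z / (phi_rest + s*phi_z) >= k  =>  s >= k*phi_rest / ((1-k)*phi_z)
s = k * phi_rest / ((1 - k) * phi_z)
for b in (True, False):
    tables[("A2", "B1")][(False, b)] *= s        # scale the two ~a2 entries

new_pr = phi(tables, {"A1": True, "A2": False}) / phi(tables, {"A1": True})
assert new_pr >= k - 1e-9
print("distance measure:", math.log(s))          # D(Theta_C, Theta_C') = ln s
```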
5 Conclusion

In this paper, we extended some of the main results in the topic of sensitivity analysis from Bayesian networks to Markov networks. We were able to find many parallels between the results in both models even given the differences in how their parameters are interpreted. Our results allow for the effective debugging of Markov networks by changing parameters in a single potential to ensure query constraints. We also showed how to compute a bound on any query change between two Markov networks that have the same structure but differ in the parameters of a single potential, and how to choose parameter changes which minimize network disturbance. Finally, we identified a key semantical difference between Bayesian networks and Markov networks, relating to how parameter changes should be quantified.

References

[Castillo et al., 1997] Enrique Castillo, José Manuel Gutiérrez, and Ali S. Hadi. Sensitivity analysis in discrete Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, Part A (Systems and Humans), 27:412-423, 1997.

[Chan and Darwiche, 2002] Hei Chan and Adnan Darwiche. When do numbers really matter? Journal of Artificial Intelligence Research, 17:265-287, 2002.

[Chan and Darwiche, 2004] Hei Chan and Adnan Darwiche. Sensitivity analysis in Bayesian networks: From single to multiple parameters. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI), pages 67-75, Arlington, Virginia, 2004. AUAI Press.

[Chan and Darwiche, 2005] Hei Chan and Adnan Darwiche. A distance measure for bounding probabilistic belief change. International Journal of Approximate Reasoning, 38:149-174, 2005.

[Darwiche, 2003] Adnan Darwiche. A differential approach to inference in Bayesian networks. Journal of the ACM, 50:280-305, 2003.

[Geman and Geman, 1984] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.

[Jensen, 1999] Finn V. Jensen. Gradient descent training of Bayesian networks. In Proceedings of the Fifth European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU), pages 190-200, Berlin, Germany, 1999. Springer-Verlag.

[Jensen, 2001] Finn V. Jensen. Bayesian Networks and Decision Graphs. Springer-Verlag, New York, 2001.

[Kjærulff and van der Gaag, 2000] Uffe Kjærulff and Linda C. van der Gaag. Making sensitivity analysis computationally efficient. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 317-325, San Francisco, California, 2000. Morgan Kaufmann Publishers.

[Laskey, 1995] Kathryn B. Laskey. Sensitivity analysis for probability assessments in Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics, 25:901-909, 1995.

[Pearl, 1988] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, San Francisco, California, 1988.