
Combining Linear Equation Models via Dempster's Rule

Liping Liu
The University of Akron

In: T. Denœux and M.-H. Masson (eds.), Belief Functions: Theory and Applications, AISC 164, pp. 255-265. Springer-Verlag, Berlin Heidelberg (2012).

Abstract. This paper proposes a concept of imaginary extreme numbers, which are like traditional complex numbers $a + bi$ but with $i = \sqrt{-1}$ replaced by $e = 1/0$, and defines the usual operations such as addition, subtraction, and division on these numbers. It applies the concept to representing linear equations in knowledge-based systems. It proves that the combination of linear equations via Dempster's rule is equivalent to solving a system of simultaneous equations, or to finding a least-squares estimate when the equations are overdetermined.

1 Introduction

The concept of linear belief functions unifies the representation of a diverse range of linear models in expert systems [Liu et al., 2006]. These linear models include linear equations, which characterize deterministic linear relationships among continuous or discrete variables, and stochastic models such as linear regressions, linear time series, and Kalman filters, in which some variables are deterministic while others are stochastic. They also include normal distributions that describe probabilistic knowledge about a set of variables, a lack of knowledge such as ignorance and partial ignorance, and direct observations or observations with missing values. Despite their variety, the concept of linear belief functions unifies them as manifestations of a single concept, represents them as matrices with the same semantics, and combines them by a single mechanism, the matrix addition rule, which is consistent with Dempster's rule of combination [Shafer, 1976].

What makes the unification possible is the sweeping operator. Nevertheless, when the operator is applied to knowledge representation, a division-by-zero enigma often arises. For example, when two linear models are combined, their matrix representations must be fully swept via the old matrix addition rule [Dempster, 2001] or partially swept via the new matrix addition rule [Liu, 2011b]. This poses no issue for linear models with a positive definite covariance matrix [Liu, 2011a]. However, for deterministic linear models such as linear equations, the sweeping points are often zero, and a sweeping, if it needs to be done, will have to divide regular numerical values by zero, an operation that is not mathematically defined. The division-by-zero issue has been a challenge that hinders the development of intelligent systems that implement linear belief functions.

In this paper, I propose a notion of imaginary extreme numbers to deal with the division-by-zero problem. An imaginary extreme number is a complex number like $3 + 4e$ with the extreme number $e = \frac{1}{0}$. The usual operations can be defined on these imaginary numbers. The notion of imaginary extreme numbers makes it possible to represent linear equations as knowledge in intelligent systems. As we will illustrate, a linear equation is transformed into an equivalent one by a sweeping from a zero variance and a reverse sweeping from an extreme inverse variance. The notion also makes it possible to combine linear equations as independent pieces of knowledge via Dempster's rule of combination. We will show that the combination of linear equations corresponds to solving the equations, or to finding the least-squares estimate when the equations are overdetermined.

2 Matrix Sweepings

Sweeping is a matrix transformation that starts from a sweeping point, a square submatrix, and iteratively spreads the change across the entire matrix:

Definition 1. Assume a real matrix $A$ is made of submatrices, $A = (A_{ij})$, and assume $A_{ij}$ is a square submatrix. Then a forward (reverse) sweeping of $A$ from $A_{ij}$ replaces the submatrix $A_{ij}$ by its negative inverse $-(A_{ij})^{-1}$; replaces any other submatrix $A_{ik}$ in row $i$ and any submatrix $A_{kj}$ in column $j$ respectively by $(-)(A_{ij})^{-1}A_{ik}$ and $(-)A_{kj}(A_{ij})^{-1}$; and replaces each remaining submatrix $A_{kl}$ that is not in the same row or column as $A_{ij}$, i.e., $k \neq i$ and $l \neq j$, by $A_{kl} - A_{kj}(A_{ij})^{-1}A_{il}$.

Note that the forward and reverse sweepings defined above differ operationally only in the sign of the elements in the same row or column as the sweeping point. Yet the difference is significant in that forward and reverse sweepings cancel each other's effects, and thus the modifiers "forward" and "reverse" are justified. Both forward and reverse sweeping operations may also be defined to sweep from a square submatrix as a sweeping point. If a sweeping point is positive definite, such as a covariance matrix, then a sweeping from the submatrix is equivalent to a series of successive sweepings from each of the leading diagonal elements of the submatrix [Liu, 2011a].
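To make Definition 1 concrete, here is a minimal Python sketch of forward and reverse sweepings of a moment matrix from a single variable (a 1x1 sweeping point). It is an illustration only: the function name, the moment-matrix layout (mean vector stacked on top of the covariance matrix), and the sign convention (plain row and column multipliers for the forward sweep, negated ones for the reverse sweep) are assumptions chosen so that the two operations cancel and so that the worked example below is reproduced.

```python
import numpy as np

def sweep(M, k, reverse=False):
    """Forward (or reverse) sweeping of a moment matrix M from variable k.

    M stacks the mean vector (first row) on top of the covariance matrix, so the
    variance of variable k sits at M[k + 1, k].  The two directions differ only
    in the sign applied to entries sharing a row or column with the pivot."""
    M = np.asarray(M, dtype=float)
    n = M.shape[1]                       # number of variables
    piv = M[k + 1, k]
    sign = -1.0 if reverse else 1.0
    out = np.empty_like(M)
    for r in range(n + 1):
        for c in range(n):
            if r == k + 1 and c == k:
                out[r, c] = -1.0 / piv                            # pivot -> negative inverse
            elif r == k + 1 or c == k:
                out[r, c] = sign * M[r, c] / piv                  # pivot row or column
            else:
                out[r, c] = M[r, c] - M[r, k] * M[k + 1, c] / piv # Schur-type update
    return out

# Mean (3, 4) and covariance [[4, 2], [2, 5]]: the (X, Y) block of the example below.
M = np.array([[3.0, 4.0],
              [4.0, 2.0],
              [2.0, 5.0]])
fully_swept = sweep(sweep(M, 0), 1)
print(fully_swept)   # [[ 0.4375  0.625 ], [-0.3125  0.125 ], [ 0.125  -0.25  ]]
print(sweep(sweep(fully_swept, 1, reverse=True), 0, reverse=True))   # recovers M
```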

When applied to a moment matrix that consists of a mean vector and a covariance matrix, sweeping operations can transform a normal distribution into its various forms, each with interesting semantics. Assume $X$ has mean vector $\mu$ and covariance matrix $\Sigma$. Then in general the moment matrix is

$$M(X) = \begin{bmatrix} \mu \\ \Sigma \end{bmatrix}$$

and its fully swept form

$$M(\overline{X}) = \begin{bmatrix} \mu\Sigma^{-1} \\ -\Sigma^{-1} \end{bmatrix}$$

represents the density function of $X$. Note that $M(\overline{X})$ symbolizes that $M(X)$ has been swept from the covariance matrix of $X$ or, to be brief, that $M(X)$ has been swept from $X$. It is interesting to imagine that, if the variances of $X$ are so huge that the inverse covariance matrix $\Sigma^{-1} \to 0$, then $M(\overline{X}) = 0$. Thus, a zero fully swept matrix is the representation of ignorance; intuitively, we are ignorant about $X$ if its variances are infinite.

A partial sweeping has more interesting semantics. For example, for the normal distribution of $X$, $Y$, and $Z$ with moment matrix

$$M(X, Y, Z) = \begin{bmatrix} 3 & 4 & 2 \\ 4 & 2 & 0 \\ 2 & 5 & 2 \\ 0 & 2 & 6 \end{bmatrix},$$

its sweeping from the variance terms for $X$ and $Y$ is a partially swept matrix

$$M(\overline{X}, \overline{Y}, Z) = \begin{bmatrix} 0.4375 & 0.625 & 0.75 \\ -0.3125 & 0.125 & -0.25 \\ 0.125 & -0.25 & 0.5 \\ -0.25 & 0.5 & 5 \end{bmatrix}.$$

This contains two distinct pieces of information about the variables [Liu, 2011a]. First, the submatrix corresponding to variables $X$ and $Y$,

$$M(\overline{X}, \overline{Y}) = \begin{bmatrix} 0.4375 & 0.625 \\ -0.3125 & 0.125 \\ 0.125 & -0.25 \end{bmatrix},$$

represents the density function of $X$ and $Y$. Second, the remaining partial matrix

$$\begin{bmatrix} & & 0.75 \\ & & -0.25 \\ & & 0.5 \\ -0.25 & 0.5 & 5 \end{bmatrix}$$

represents the regression model $Z = 0.75 - 0.25X + 0.5Y + \varepsilon$ with $\varepsilon \sim N(0, 5)$. Since this regression model alone casts no information on the independent variables $X$ and $Y$, the missing elements in the above partial matrix shall be zero.
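The entries of the partially swept matrix above can also be checked against the standard block formulas for the marginal and conditional distributions of a multivariate normal. The following numpy snippet is only a sanity check of that example; the variable names are ad hoc.

```python
import numpy as np

mu = np.array([3.0, 4.0, 2.0])                    # means of X, Y, Z
Sigma = np.array([[4.0, 2.0, 0.0],
                  [2.0, 5.0, 2.0],
                  [0.0, 2.0, 6.0]])               # covariance of (X, Y, Z)

S11 = Sigma[:2, :2]                               # covariance of (X, Y)
S12 = Sigma[:2, 2]                                # covariances of (X, Y) with Z
S11_inv = np.linalg.inv(S11)

# Marginal of (X, Y) in fully swept form: first row mu * Sigma^{-1}, body -Sigma^{-1}.
print(mu[:2] @ S11_inv)                           # [0.4375 0.625 ]
print(-S11_inv)                                   # [[-0.3125  0.125 ], [ 0.125  -0.25 ]]

# Conditional of Z given (X, Y): regression coefficients, intercept, residual variance.
coef = S11_inv @ S12                              # [-0.25  0.5 ]
intercept = mu[2] - mu[:2] @ coef                 # 0.75
resid_var = Sigma[2, 2] - S12 @ coef              # 5.0
print(coef, intercept, resid_var)
```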

Furthermore, when the conditional variance of $Z$ vanishes, the conditional distribution reduces to a regular linear equation model $Z = 0.75 - 0.25X + 0.5Y$, as represented by the matrix

$$M(\overline{X}, \overline{Y}, Z) = \begin{bmatrix} 0 & 0 & 0.75 \\ 0 & 0 & -0.25 \\ 0 & 0 & 0.5 \\ -0.25 & 0.5 & 0 \end{bmatrix}.$$

Here $M(\overline{X}, \overline{Y}, Z)$ denotes a generic moment matrix of $X$, $Y$, and $Z$ with $X$ and $Y$ being swept. Note that it has long been realized that a linear model, such as a regression model or a linear equation, is a special case of a multivariate normal distribution [Khatri, 1968]. What is new, however, is that with sweeping operations it can be uniformly represented as a moment matrix or its partially swept form.

3 Imaginary Numbers

In this section I propose a new type of imaginary numbers, called extreme numbers, and use it to resolve the division-by-zero issue. Just as a usual imaginary number uses $i$ for the non-existent $\sqrt{-1}$, we use $e$ for $\frac{1}{0}$, which also does not exist. Also, just as a usual imaginary number consists of two parts, a real part and an imaginary part, an imaginary extreme number consists of the same two parts. For example, $3 - 2e$ is an extreme number with real part $3$ and imaginary part $-2$. When the imaginary part vanishes, an extreme number reduces to a real one. When its imaginary part is nonzero, we call an extreme number a true extreme number. When its real part is zero, we call the extreme number pure extreme. When both the real and imaginary parts are zero, the extreme number is zero, i.e., $a + be = 0$ if and only if $a = 0$ and $b = 0$. Thus, the system of extreme numbers includes the real numbers as a subset.

Extreme numbers may be added, subtracted, or scaled like usual imaginary numbers. For any extreme number $a + be$ and real number $c$, their multiplication, or the scaling of $a + be$ by the scale $c$, is defined as $c(a + be) = (a + be)c = ac + bce$. For any two extreme numbers $a_1 + b_1e$ and $a_2 + b_2e$, their addition is defined as $(a_1 + b_1e) + (a_2 + b_2e) = (a_1 + a_2) + (b_1 + b_2)e$. Clearly, the system of extreme numbers is closed under the operations of scaling, addition, and subtraction. Unlike usual imaginary numbers, the multiplication of two extreme numbers is not defined because the operation is not closed. However, division can be defined here: for any two extreme numbers $a_1 + b_1e$ and $a_2 + b_2e$, their division is defined as follows:

$$\frac{a_1 + b_1e}{a_2 + b_2e} = \frac{b_1}{b_2} \quad \text{if } b_2 \neq 0.$$

If the denominator is a nonzero real number, then division reduces to scaling. If the denominator is zero and the numerator is one, i.e., $b_1 = 0$ and $a_1 = 1$, the division is $e = 1/0$ by definition. Also, $0/0$ is defined to be $0$, to be consistent with scaling, i.e., $0(0 + 1e) = 0 + 0e = 0$.

Because division generally cancels out the imaginary parts, the operation of multiplication followed by division, called crossing, can be defined. For any three extreme numbers $a_1 + b_1e$, $a_2 + b_2e$, and $a_3 + b_3e$, their crossing is defined as follows:

$$\frac{(a_1 + b_1e)(a_2 + b_2e)}{a_3 + b_3e} = \frac{a_1b_2 + a_2b_1}{b_3} + \frac{b_1b_2}{b_3}e \quad \text{if } b_3 \neq 0.$$

Crossing reduces to division if one of the multiplicands $a_1 + b_1e$ and $a_2 + b_2e$ is real, i.e., $b_1b_2 = 0$. If at the same time the denominator is a nonzero real number, i.e., $b_3 = 0$ and $a_3 \neq 0$, it reduces to scaling. The operation is also consistent with the definition of extreme numbers when the divisor $a_3 + b_3e = 0$ and $b_1 = 0$, $b_2 = 0$.
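A minimal sketch of this arithmetic in Python, assuming exactly the rules stated above (componentwise addition and subtraction, scaling by reals, division returning $b_1/b_2$ when $b_2 \neq 0$, division by a nonzero real as scaling, $0/0 = 0$, $1/0 = e$, and the crossing formula), might look as follows. The class and function names are illustrative, not from the paper; handling $c/0 = ce$ for a general real $c$ extends the stated rule $1/0 = e$ by scaling, which is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extreme:
    """An imaginary extreme number a + b*e, where e = 1/0."""
    a: float        # real part
    b: float = 0.0  # imaginary (extreme) part

    def __add__(self, other):
        return Extreme(self.a + other.a, self.b + other.b)

    def __sub__(self, other):
        return Extreme(self.a - other.a, self.b - other.b)

    def scale(self, c):
        """Multiplication by a real number c: c(a + be) = ac + bce."""
        return Extreme(self.a * c, self.b * c)

    def __truediv__(self, other):
        if other.b != 0:                         # (a1 + b1 e)/(a2 + b2 e) = b1/b2
            return Extreme(self.b / other.b)
        if other.a != 0:                         # dividing by a nonzero real is scaling
            return self.scale(1.0 / other.a)
        if self.a == 0 and self.b == 0:          # 0/0 is defined to be 0
            return Extreme(0.0)
        if self.b == 0:                          # 1/0 = e; other reals c/0 = c*e by scaling
            return Extreme(0.0, self.a)
        raise ValueError("division is not defined for this pair")


def cross(x, y, z):
    """Crossing (x*y)/z with extreme divisor: (a1*b2 + a2*b1)/b3 + (b1*b2/b3) e."""
    if z.b == 0:
        raise ValueError("the crossing formula assumes an extreme divisor (b3 != 0)")
    return Extreme((x.a * y.b + y.a * x.b) / z.b, x.b * y.b / z.b)


print(Extreme(1) / Extreme(0))                            # Extreme(a=0.0, b=1.0), i.e. e
print(Extreme(3, 4) / Extreme(0, 2))                      # Extreme(a=2.0, b=0.0), i.e. 4/2
print(cross(Extreme(0, 1), Extreme(5), Extreme(0, 2)))    # Extreme(a=2.5, b=0.0)
```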

Extreme numbers may be extended to extreme matrices, with the inverse of a zero matrix defined as $0^{-1} = Ie$, where $I$ is an identity matrix. In general, an extreme matrix is $A + Be$ with real part $A$ and imaginary part $B$, where $A$ and $B$ have the same dimensions. Operations on extreme matrices can be adopted from those for extreme numbers with slight modifications to division and crossing. For any two extreme matrices $A_1 + B_1e$ and $A_2 + B_2e$, if $B_2$ is nonsingular, then

$$(A_1 + B_1e)(A_2 + B_2e)^{-1} = B_1(B_2)^{-1}, \qquad (A_2 + B_2e)^{-1}(A_1 + B_1e) = (B_2)^{-1}B_1.$$

For any three extreme matrices $A_1 + B_1e$, $A_2 + B_2e$, and $A_3 + B_3e$, if $B_3$ is nonsingular, then their crossing $(A_1 + B_1e)(A_3 + B_3e)^{-1}(A_2 + B_2e)$ is defined as

$$A_1(B_3)^{-1}B_2 + B_1(B_3)^{-1}A_2 + B_1(B_3)^{-1}B_2\,e.$$

4 Equation Combination

Intuitively, a linear equation carries partial knowledge about the values of some variables through a linear relationship with other variables. If each such equation is considered an independent piece of knowledge, its combination with other similar knowledge will render the values more certain. When there is a sufficient number of linear equations, their combination may jointly determine a specific value of the variables with complete certainty. Therefore, the combination of linear equations should correspond to solving a system of simultaneous equations. In this section, we will prove this statement.

In general, a linear equation may be expressed explicitly as

$$X_n = b + a_1X_1 + a_2X_2 + \dots + a_{n-1}X_{n-1} \qquad (1)$$

or implicitly as

$$a_1X_1 + a_2X_2 + \dots + a_{n-1}X_{n-1} + a_nX_n = b. \qquad (2)$$

The matrix representation for the explicit expression is straightforward:

$$M(\overline{X}_1, \dots, \overline{X}_{n-1}, X_n) = \begin{bmatrix} 0 & \cdots & 0 & b \\ 0 & \cdots & 0 & a_1 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & a_{n-1} \\ a_1 & \cdots & a_{n-1} & 0 \end{bmatrix}.$$

This partially swept matrix indicates that we have ignorance about the values of $X_1$, $X_2$, ..., and $X_{n-1}$; thus they correspond to a zero submatrix in the fully swept form. Meanwhile, given $X_1$, $X_2$, ..., and $X_{n-1}$, the value of $X_n$ is known for sure; thus its conditional mean and variance are respectively $b$ and zero.

Of course, in algebra, a variable on the right-hand side can be moved to the left-hand side through a linear transformation. For example, if $a_1 \neq 0$, Equation 1 can be equivalently turned into

$$X_1 = -\frac{b}{a_1} - \frac{a_2}{a_1}X_2 - \dots - \frac{a_{n-1}}{a_1}X_{n-1} + \frac{1}{a_1}X_n.$$

This transformation can also be done through sweepings of the matrix representation, first by a forward sweeping from $X_n$ and then a backward sweeping from $X_1$.

An implicit expression like Equation 2 may be represented as two separate linear equations in explicit forms: $a_1X_1 + a_2X_2 + \dots + a_{n-1}X_{n-1} + a_nX_n = U$ and $U = b$. Their matrices are respectively

$$M_1(\overline{X}_1, \dots, \overline{X}_n, U) = \begin{bmatrix} 0 & \cdots & 0 & 0 \\ 0 & \cdots & 0 & a_1 \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & a_n \\ a_1 & \cdots & a_n & 0 \end{bmatrix}
\qquad \text{and} \qquad
M_2(U) = \begin{bmatrix} b \\ 0 \end{bmatrix}.$$

To combine them via Dempster's rule, we first sweep both matrices from $U$, obtaining $M_1(\overline{X}_1, \dots, \overline{X}_n, \overline{U})$ as

$$\begin{bmatrix} 0 & \cdots & 0 & 0 \\ -(a_1)^2 e & \cdots & -a_1 a_n e & a_1 e \\ \vdots & & \vdots & \vdots \\ -a_n a_1 e & \cdots & -(a_n)^2 e & a_n e \\ a_1 e & \cdots & a_n e & -e \end{bmatrix}
\qquad \text{and} \qquad
M_2(\overline{U}) = \begin{bmatrix} be \\ -e \end{bmatrix}.$$

We then add the results position-wise into $M(\overline{X}_1, \dots, \overline{X}_n, \overline{U})$:

$$\begin{bmatrix} 0 & \cdots & 0 & be \\ -(a_1)^2 e & \cdots & -a_1 a_n e & a_1 e \\ \vdots & & \vdots & \vdots \\ -a_n a_1 e & \cdots & -(a_n)^2 e & a_n e \\ a_1 e & \cdots & a_n e & -2e \end{bmatrix}.$$

To remove the auxiliary variable $U$, we unsweep $M(\overline{X}_1, \dots, \overline{X}_n, \overline{U})$ from $U$ into $M(\overline{X}_1, \dots, \overline{X}_n, U)$ and then remove $U$ by projecting the result onto the variables $X_1$, $X_2$, ..., and $X_n$. We obtain a fully swept matrix representation $M(\overline{X}_1, \dots, \overline{X}_n)$ for the implicit linear equation 2 as

$$\begin{bmatrix} \frac{1}{2}\,b\,(a_1 \;\cdots\; a_n)\,e \\[2pt] -\frac{1}{2}\,(a_1 \;\cdots\; a_n)^T(a_1 \;\cdots\; a_n)\,e \end{bmatrix}. \qquad (3)$$

Assuming the coefficient $a_n \neq 0$, we can then unsweep it from $X_n$ and obtain $M(\overline{X}_1, \dots, \overline{X}_{n-1}, X_n)$ as

$$\begin{bmatrix} 0 & \cdots & 0 & b/a_n \\ 0 & \cdots & 0 & -a_1/a_n \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & -a_{n-1}/a_n \\ -a_1/a_n & \cdots & -a_{n-1}/a_n & 0 \end{bmatrix},$$

which is the matrix representation of an explicit form of Equation 2:

$$X_n = \frac{b}{a_n} - \frac{a_1}{a_n}X_1 - \dots - \frac{a_{n-1}}{a_n}X_{n-1}.$$

Now let us study the representation and combination of multiple linear equations. For explicit expressions, without loss of generality, assume two linear equations are respectively $Y = b_1 + XA_1$ and $Y = b_2 + XA_2$, where $Y$ is a single variable, $X$ is an $n$-dimensional row vector, $b_1$ and $b_2$ are constants, and $A_1$ and $A_2$ are $n$-dimensional column vectors. Their matrix representations are

$$M_1(\overline{X}, Y) = \begin{bmatrix} 0 & b_1 \\ 0 & A_1 \\ A_1^T & 0 \end{bmatrix}, \qquad M_2(\overline{X}, Y) = \begin{bmatrix} 0 & b_2 \\ 0 & A_2 \\ A_2^T & 0 \end{bmatrix}.$$

To combine them, we sweep both matrices from $Y$ and then add them position-wise into $M(\overline{X}, \overline{Y})$:

$$\begin{bmatrix} -b_1 A_1^T e - b_2 A_2^T e & (b_1 + b_2)e \\ -A_1 A_1^T e - A_2 A_2^T e & (A_1 + A_2)e \\ (A_1 + A_2)^T e & -2e \end{bmatrix}.$$

Now, unsweeping $M(\overline{X}, \overline{Y})$ from $Y$, we obtain $M(\overline{X}, Y)$ as

$$\begin{bmatrix} \frac{1}{2}(b_2 - b_1)(A_1 - A_2)^T e & (b_1 + b_2)/2 \\ -\frac{1}{2}(A_1 - A_2)(A_1 - A_2)^T e & (A_1 + A_2)/2 \\ (A_1 + A_2)^T/2 & 0 \end{bmatrix}.$$

Compared with Equation 3, the above matrix represents the implicit linear equation

$$X(A_1 - A_2) = b_2 - b_1$$

for $X$, along with the conditional knowledge of $Y$ given $X$. It is trivial to note that the combination is equivalent to solving the linear equations $Y = b_1 + XA_1$ and $Y = b_2 + XA_2$ by substitution: $b_1 + XA_1 = b_2 + XA_2$.

When linear equations are expressed implicitly, their combination is equivalent to forming a larger system of linear equations. Assume $XA = U$ and $XB = V$ are two systems of linear equations on a vector of variables $X$, $U$, and $V$, where $U$ and $V$ are distinct vectors of auxiliary variables, and $A$ and $B$ are coefficient matrices of appropriate dimensions. Their matrix representations are

$$M(\overline{X}, U) = \begin{bmatrix} 0 & 0 \\ 0 & A \\ A^T & 0 \end{bmatrix}, \qquad M(\overline{X}, V) = \begin{bmatrix} 0 & 0 \\ 0 & B \\ B^T & 0 \end{bmatrix}.$$

Since both matrices have been swept from the common variables $X$, they can be directly summed according to the new generalized rule of combination [Liu, 2011b]:

$$M(\overline{X}, U, V) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & A & B \\ A^T & 0 & 0 \\ B^T & 0 & 0 \end{bmatrix},$$

which corresponds to $X\,[A \;\; B] = [U \;\; V]$. In words, the combination of $XA = U$ and $XB = V$ is identical to the system of linear equations joining both $XA = U$ and $XB = V$.
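As a small numerical illustration (not from the paper), the snippet below takes two explicit equations $Y = b_1 + XA_1$ and $Y = b_2 + XA_2$ in two unknowns, forms the implicit constraint $X(A_1 - A_2) = b_2 - b_1$ that the combination encodes, and checks that a point satisfying both original equations also satisfies it, which is exactly elimination of $Y$ by substitution. The numbers are arbitrary.

```python
import numpy as np

# Two explicit linear equations Y = b1 + X @ A1 and Y = b2 + X @ A2 with X = (X1, X2).
A1, b1 = np.array([1.0, 2.0]), 3.0
A2, b2 = np.array([2.0, -1.0]), 1.0

# Their combination encodes the implicit constraint X @ (A1 - A2) = b2 - b1,
# i.e., what eliminating Y by substitution (b1 + X @ A1 = b2 + X @ A2) gives.
D, d = A1 - A2, b2 - b1

X = np.array([2.0, 0.0])                  # chosen so that both original equations hold
print(b1 + X @ A1, b2 + X @ A2)           # 5.0 5.0 -> the same Y
print(np.isclose(X @ D, d))               # True -> X satisfies the combined implicit equation
```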

To understand what it really means to combine linear equations, let us perform sweepings on the matrix representation of a system of $m$ equations, $XA = U$, where $X$ is a vector of $n$ variables, $U$ is a vector of $m$ variables, and $A$ is an $n \times m$ coefficient matrix.

First, assume $n \geq m$ and all linear equations are independent, i.e., none is a linear combination of the others, so that there is a subvector of $X$ that can be solved in terms of the other variables. Without loss of generality, assume $X = (X_1, X_2)$ with $X_1$ being a subvector of $m$ variables that can be solved, and split $A$ vertically into two submatrices $A_1$ and $A_2$ with $A_1$ a nonsingular $m \times m$ matrix. Then we have $X_1A_1 + X_2A_2 = U$, which is represented as

$$M(\overline{X}_1, \overline{X}_2, U) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & A_1 \\ 0 & 0 & A_2 \\ A_1^T & A_2^T & 0 \end{bmatrix}.$$

Apply a forward sweep to $M(\overline{X}_1, \overline{X}_2, U)$ from $U$:

$$M(\overline{X}_1, \overline{X}_2, \overline{U}) = \begin{bmatrix} 0 & 0 & 0 \\ -e A_1 A_1^T & -e A_1 A_2^T & e A_1 \\ -e A_2 A_1^T & -e A_2 A_2^T & e A_2 \\ e A_1^T & e A_2^T & -e I \end{bmatrix}$$

and unsweep $M(\overline{X}_1, \overline{X}_2, \overline{U})$ from $X_1$. Noting that $A_1$ is nonsingular and $(A_1 A_1^T)^{-1} = (A_1^T)^{-1}(A_1)^{-1}$, we can easily verify that $M(X_1, \overline{X}_2, \overline{U})$ is

$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & -(A_1^T)^{-1} A_2^T & (A_1^T)^{-1} \\ -A_2 (A_1)^{-1} & 0 & 0 \\ (A_1)^{-1} & 0 & 0 \end{bmatrix},$$

which is the matrix representation of $X_1 = -X_2 A_2 (A_1)^{-1} + U (A_1)^{-1}$. Therefore, sweeping from $U$ and unsweeping from $X_1$ is the same as solving for $X_1$ in terms of $U$.
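This claim for the underdetermined case is easy to spot-check numerically: pick an $A$ whose leading $m \times m$ block $A_1$ is nonsingular, choose arbitrary $X_2$ and $U$, compute $X_1$ from the solved form, and confirm that $X_1A_1 + X_2A_2 = U$. The snippet below does that with random test data; nothing in it is specific to the paper beyond the formula being checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                                  # n variables, m (< n) independent equations

A = rng.normal(size=(n, m))                  # coefficient matrix of X A = U
A1, A2 = A[:m, :], A[m:, :]                  # A1 assumed (and here verified) nonsingular
assert abs(np.linalg.det(A1)) > 1e-9

X2 = rng.normal(size=n - m)                  # free variables
U = rng.normal(size=m)                       # right-hand side

A1_inv = np.linalg.inv(A1)
X1 = -X2 @ A2 @ A1_inv + U @ A1_inv          # solved form obtained by sweeping/unsweeping

print(np.allclose(X1 @ A1 + X2 @ A2, U))     # True: the original system is satisfied
```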

Second, assume the system $XA = C$ contains $m$ equations and $n$ variables with $n \leq m$, where $C$ is an $m$-dimensional vector and $A$ has rank $n$. Using an auxiliary variable $U$, the system is equivalent to the combination of

$$M(\overline{X}, U) = \begin{bmatrix} 0 & 0 \\ 0 & A \\ A^T & 0 \end{bmatrix} \qquad \text{with} \qquad M(U) = \begin{bmatrix} C \\ 0 \end{bmatrix},$$

or, via extreme numbers,

$$M(\overline{X}, \overline{U}) = \begin{bmatrix} 0 & Ce \\ -AA^Te & Ae \\ A^Te & -2Ie \end{bmatrix}.$$

Unsweeping $M(\overline{X}, \overline{U})$ from the inverse covariance matrix of $U$, we obtain

$$M(\overline{X}, U) = \begin{bmatrix} \frac{1}{2}CA^Te & C/2 \\ -\frac{1}{2}AA^Te & A/2 \\ A^T/2 & 0 \end{bmatrix}.$$

Since $A$ has rank $n$, $AA^T$ is positive definite. Thus, we can unsweep $M(\overline{X}, U)$ from the inverse covariance matrix of $X$ and obtain $M(X, U)$ as

$$\begin{bmatrix} CA^T(AA^T)^{-1} & \frac{1}{2}C\left[I + A^T(AA^T)^{-1}A\right] \\ 0 & 0 \\ 0 & 0 \end{bmatrix},$$

implying that, after combination, the variable $X$ takes on the value $X = CA^T(AA^T)^{-1}$ with certainty. Note that this solution is the least-squares estimate of $X$ from the regression model $XA = C$, with $A$ being the observation matrix for the independent variables and $C$ being the observations of the dependent variable. In addition, the auxiliary variable $U$ takes on the value

$$U = \frac{1}{2}C\left[I + A^T(AA^T)^{-1}A\right] \qquad (4)$$

with certainty. This seems to conflict with the initial component model $U = C$. However, one should realize that, when $m > n$, there exist only $n$ independent linear equations. Thus, only $n$ variables of $U$ can take independent observations, and the remaining $m - n$ variables take on values derived from those observations. Otherwise, $U = C$ would impose conflicting observations on some or all of the variables. Equation 4 represents values that are closest to the observations if there is any conflict.
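The least-squares claim can likewise be verified numerically. The sketch below builds a random overdetermined system $XA = C$ (row-vector convention as in the text, so the usual column-vector least-squares problem is its transpose), compares $CA^T(AA^T)^{-1}$ with numpy's least-squares solver, and also evaluates Equation 4. The data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 8                                   # n unknowns, m (> n) equations

A = rng.normal(size=(n, m))                   # rank n with probability one
C = rng.normal(size=m)                        # observed right-hand side (may be inconsistent)

# Estimate from the text: X = C A^T (A A^T)^{-1}.
X_hat = C @ A.T @ np.linalg.inv(A @ A.T)

# Standard least squares on the transposed (column-vector) problem A^T x = C.
X_ls, *_ = np.linalg.lstsq(A.T, C, rcond=None)
print(np.allclose(X_hat, X_ls))               # True

# Equation (4): the combined value of the auxiliary variable U.
U = 0.5 * C @ (np.eye(m) + A.T @ np.linalg.inv(A @ A.T) @ A)
print(np.allclose(U, 0.5 * (C + X_hat @ A)))  # True: U averages C and the fitted values X_hat A
```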

In fact, in the special case when $m = n$, we have $(AA^T)^{-1} = (A^T)^{-1}A^{-1}$. Thus

$$M(X, U) = \begin{bmatrix} CA^{-1} & C \\ 0 & 0 \\ 0 & 0 \end{bmatrix},$$

implying that $X = CA^{-1}$ and $U = C$ with certainty. This is simply the solution to $XA = C$.

5 Conclusion

In knowledge-based systems, extreme numbers arise whenever a deterministic linear model, such as a linear equation, exists in the knowledge base. A linear model is represented as a marginal or conditional normal distribution. For a linear equation, the conditional variance is zero, and a matrix sweeping from such a zero variance turns the matrix into an extreme one. This paper studied the application of extreme numbers to representing and transforming linear equations and to combining them as belief functions via Dempster's rule. When a number of linear equations is underdetermined, their combination corresponds to solving the equations for some variables in terms of others. When they are just determined, their combination corresponds to solving the equations for all the variables. When they are overdetermined, their combination corresponds to finding the least-squares estimate of all the variables. The meaning of the combination in this last case should be studied by future research.

References

[Dempster, 2001] Dempster, A.P.: Normal belief functions and the Kalman filter. In: Saleh, A.K.M.E. (ed.) Data Analysis from Statistical Foundations, pp. 65-84. Nova Science Publishers, Hauppauge (2001)

[Khatri, 1968] Khatri, C.G.: Some results for the singular normal multivariate regression models. Sankhya A 30, 267-280 (1968)

[Liu, 2011a] Liu, L.: Dempster's rule for combining linear models. Technical report, Department of Management, The University of Akron, Akron, Ohio (2011a)

[Liu, 2011b] Liu, L.: A new rule for combining linear belief functions. Technical report, Department of Management, The University of Akron, Akron, Ohio (2011b)

[Liu et al., 2006] Liu, L., Shenoy, C., Shenoy, P.P.: Knowledge representation and integration for portfolio evaluation using linear belief functions. IEEE Transactions on Systems, Man, and Cybernetics, Part A 36(4), 774-785 (2006)

[Shafer, 1976] Shafer, G.: A Mathematical Theory of Evidence. Princeton University Press, Princeton (1976)