Subset selection for matrices


Linear Algebra and its Applications 422 (2007) 349–359
www.elsevier.com/locate/laa

F.R. de Hoog a, R.M.M. Mattheij b,*

a CSIRO Mathematical and Information Sciences, P.O. Box 664, Canberra, ACT 2601, Australia
b Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

Received 22 March 2005; accepted 8 August 2006
Available online 14 February 2007
Submitted by M. Neumann

Abstract

We consider the question of how to delete $m - k$ rows from a matrix $X \in \mathbb{R}^{m \times n}$ so that the resulting matrix $A \in \mathbb{R}^{k \times n}$ is as non-singular as possible. Bounds for the singular values of $A$ are derived which decrease only algebraically with $m$ and $n$. In addition, a number of applications where subset selection is necessary are examined.
© 2006 Published by Elsevier Inc.

Keywords: Selection of submatrices; Optimally conditioned submatrices

1. Introduction

In some applications it is necessary to select $k$ rows from an $m \times n$ matrix such that the resulting matrix is as non-singular as possible. That is, for $X \in \mathbb{R}^{m \times n}$ find a permutation matrix $P \in \mathbb{R}^{m \times m}$ so that

$$PX = \begin{bmatrix} A \\ B \end{bmatrix}, \quad A \in \mathbb{R}^{k \times n}, \tag{1}$$

where $A$ is the matrix in question. The above problem was originally brought to the authors' attention during an investigation of the conditioning of multipoint boundary value problems [3], but it also arises in a number of other applications, some of which we address in the present note.

* Corresponding author.
E-mail addresses: Frank.deHoog@csiro.au (F.R. de Hoog), r.m.m.mattheij@tue.nl (R.M.M. Mattheij).

0024-3795/$ - see front matter © 2006 Published by Elsevier Inc.
doi:10.1016/j.laa.2006.08.034

Of course the phrase "non-singular as possible" is vague and, in part, our aim is to present some natural criteria of optimality for various applications. These criteria will be discussed later, but for the moment it is useful to focus on the case $n = k = \mathrm{rank}(X) \le m$, where $P$ is to be chosen to maximize the smallest nonzero singular value of $A$ (an intuitively attractive notion of "non-singular as possible"). While the statement of the problem is straightforward, it is quite difficult to find a sharp lower bound on this singular value.

Row selection is often implemented using the Businger and Golub algorithm [1] (see also [4, Section 12.2]), which is based on a QR decomposition of $X^T$ with column interchange to maximize the size of the pivots. While this algorithm usually works well in practice, examples are known [5, p. 31] where the pivot size does not accurately reflect the size of the singular values. Consequently, the analysis of such algorithms leads to poor lower bounds for the smallest singular value of $A$.

A more promising approach is to relate the problem of row selection when $k = n$ to that of finding an interpolatory projection from a Banach space ($\mathbb{R}^m$ in our case) to an $n$-dimensional subspace (the range of $X$ in our case). A review of such projections is given in [2]. Taking the approach of [6] (see also [2]) to our problem, we choose $P$ to maximize $|\det(A)|$, the magnitude of the determinant of $A$. Then

$$PX = \begin{bmatrix} I \\ H \end{bmatrix} A, \quad |h_{ij}| \le 1, \quad i = 1, \ldots, m-n, \ j = 1, \ldots, n,$$

since interchanging the $j$th and $(n+i)$th rows of $PX$ cannot result in an increase in the magnitude of the determinant. It now follows from the usual variational formulation for singular values that

$$\frac{\sigma_n(X)}{\sqrt{1 + n(m-n)}} \le \sigma_n(A) \le \sigma_n(X),$$

where $\sigma_n(X)$ and $\sigma_n(A)$ are the smallest singular values of $X$ and $A$ respectively.

Although the bound derived above is unlikely to be sharp, it is quite useful, as the lower bound decreases only algebraically with $m$ and $n$. It indicates that row selection to maximize $|\det(A)|$ will give reasonable results even though the optimum in this case is to maximize the smallest singular value. This appears to be true more generally, in that a permutation $P$, chosen optimally for a specific application, will usually be a reasonable choice for other applications. In the sequel we shall expand on this and, by generalizing the notion that $|\det(A)|$ is maximal, derive useful bounds for other optimality criteria. An interesting paper where such bounds play a role is [7], where they appear for example in the context of integral equations.

Although maximizing $|\det(A)|$ does provide a useful approach to determining bounds, it does not provide insight into how the permutation $P$ might be determined. We therefore also examine an alternative approach, based on deleting rows one at a time. Specifically, we delete each row to minimize the Frobenius norm of the pseudo-inverse of the resulting matrix at each step. It turns out that the bounds obtained are comparable to those obtained by maximizing $|\det(A)|$.

The paper is organised as follows. In Section 2 we introduce some notation and present the basic results. Then, in Section 3 we consider a number of applications and indicate how the results of Section 2 can be applied to give useful bounds.
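Businger–Golub row selection is available off the shelf: a pivoted QR factorization of $X^T$ interchanges columns of $X^T$ (i.e., rows of $X$) to maximize pivot sizes. The following is a minimal sketch, assuming NumPy and SciPy are available; the function name `select_rows_pivoted_qr` is ours, not from the paper.

```python
import numpy as np
from scipy.linalg import qr

def select_rows_pivoted_qr(X, k):
    """Pick k rows of X via QR with column pivoting applied to X^T.

    Returns the selected row indices and the k-by-n submatrix A.
    This mirrors the Businger-Golub heuristic discussed above; it works
    well in practice but carries no sharp guarantee on sigma_min(A).
    """
    # Column pivoting on X^T chooses, at each step, the column (= row of X)
    # of largest remaining norm; piv lists columns in pivot order.
    _, _, piv = qr(X.T, pivoting=True)
    rows = np.sort(piv[:k])
    return rows, X[rows, :]

# Small usage example on a random tall matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
rows, A = select_rows_pivoted_qr(X, 3)
print(rows, np.linalg.svd(A, compute_uv=False).min())
```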

2. The main results

First we introduce some notation. For $x \in \mathbb{R}^p$ we use the usual Euclidean norm $\|x\| = \sqrt{x^T x}$. Given a matrix $X \in \mathbb{R}^{m \times n}$ we denote the non-zero singular values by

$$\sigma_1(X) \ge \sigma_2(X) \ge \cdots \ge \sigma_r(X), \quad r = \mathrm{rank}(X),$$

the spectral norm by $\|X\| = \sigma_1(X)$ and the Frobenius norm by $\|X\|_F = \sqrt{\mathrm{trace}(X^T X)}$. From the singular value decomposition of $X$, it follows that

$$X = UDV^T, \quad D = \mathrm{diag}(\sigma_1(X), \sigma_2(X), \ldots, \sigma_r(X)),$$

where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ satisfy $U^T U = V^T V = I$. Using the singular value decomposition, we define the Moore–Penrose generalised inverse $X^+ \in \mathbb{R}^{n \times m}$ by $X^+ := VD^{-1}U^T$.

On post-multiplying (1) by $A^+ z$, it follows that

$$\sigma_r^2(X)\|V^T A^+ z\|^2 \le \|XA^+ z\|^2 = \|AA^+ z\|^2 + \|BA^+ z\|^2.$$

On noting $A = AVV^T$, it is straightforward to verify that $A^+ = V(AV)^+$ and hence

$$\sigma_r^2(X)\|A^+ z\|^2 \le \|XA^+ z\|^2 = \|AA^+ z\|^2 + \|BA^+ z\|^2. \tag{2}$$

Thus,

$$\sigma_r^2(X)\|A^+\|^2 \le \|XA^+\|^2 \le 1 + \|BA^+\|^2 \tag{3}$$

and

$$\sigma_r^2(X)\|A^+\|_F^2 \le \|XA^+\|_F^2 = \mathrm{rank}(A) + \|BA^+\|_F^2. \tag{4}$$

Eqs. (3) and (4) demonstrate that $\|A^+\|$ will not be too large if $\|BA^+\|$ is not large. Further, on applying the usual variational formulation for singular values, we find from (2) that

$$\sigma_r^2(X)\,\sigma_l^2(A^+) \le 1 + \sigma_l^2(BA^+) \le 1 + \frac{1}{l}\|BA^+\|_F^2, \quad l \le \mathrm{rank}(A). \tag{5}$$

Additional bounds can be established when $k \ge \mathrm{rank}(X)$. It is then clearly possible to find a permutation matrix such that $\mathrm{rank}(A) = \mathrm{rank}(X)$, and (1) can then be rewritten as

$$PX = \begin{bmatrix} I \\ H \end{bmatrix} A, \quad H = BA^+, \quad \mathrm{rank}(X) \le k, \tag{6}$$

from which it follows that

$$\|Az\|^2 \le \|Xz\|^2 \le (1 + \|H\|^2)\|Az\|^2, \quad H = BA^+, \ \mathrm{rank}(X) = \mathrm{rank}(A) \le k.$$

Then, again applying the usual variational formulation for singular values, we obtain

$$\sigma_l^2(A) \le \sigma_l^2(X) \le (1 + \|H\|^2)\,\sigma_l^2(A), \quad H = BA^+, \ \mathrm{rank}(X) = \mathrm{rank}(A) \le k. \tag{7}$$

Again, this indicates that it is desirable to choose the permutation so that $\|H\|$ is not large. We now show that a permutation exists so that (6) holds with a matrix $H$ that is not large.

Theorem 1. Let $r = \mathrm{rank}(X) \le k$. Then, there is a permutation matrix $P$ so that (6) holds with

$$\|H\|_F^2 \le \frac{(m-k)r}{k-r+1}.$$

Proof. Clearly, it is possible to find a permutation matrix so that (1) holds with $\mathrm{rank}(A) = r$. Then $HA = BA^+A = B$, which establishes (6). To obtain the bound on the Frobenius norm of $H$, we can without loss of generality take $r = n$ because, if $r < n$, we can effectively delete columns of $X$ by post-multiplication by an orthogonal matrix. We therefore take $r = n$, choose $P$ to maximize $\det(A^T A)$ and define $Q = A(A^T A)^{-1/2}$. Then

$$PX = \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} Q \\ Y \end{bmatrix}(A^T A)^{1/2}, \quad Y = B(A^T A)^{-1/2}.$$

Let $q_i^T$ and $y_j^T$ denote the $i$th and $j$th rows of $Q$ and $Y$ respectively. Then

$$\det(\tilde{A}_{ij}^T \tilde{A}_{ij}) = \det\big((A^T A)^{1/2}(I - q_i q_i^T + y_j y_j^T)(A^T A)^{1/2}\big),$$

where $\tilde{A}_{ij}$ is the matrix obtained from $A$ by replacing the $i$th row of $A$ with the $j$th row of $B$. Since interchanging the $i$th row of $A$ and the $j$th row of $B$ cannot lead to an increase in the determinant, it follows that

$$\det(A^T A) \ge \det\big((A^T A)^{1/2}(I - q_i q_i^T + y_j y_j^T)(A^T A)^{1/2}\big) = \det(A^T A)\big(1 + \|y_j\|^2 - \|q_i\|^2 - \|q_i\|^2\|y_j\|^2 + (q_i^T y_j)^2\big).$$

Consequently,

$$\|y_j\|^2 - \|q_i\|^2 - \|q_i\|^2\|y_j\|^2 + (q_i^T y_j)^2 \le 0,$$

and since

$$\sum_{i=1}^k \|q_i\|^2 = n \quad \text{and} \quad \sum_{i=1}^k (q_i^T y_j)^2 = \|Qy_j\|^2 = \|y_j\|^2,$$

we obtain

$$k\|y_j\|^2 - n(1 + \|y_j\|^2) + \|y_j\|^2 \le 0$$

or, equivalently,

$$\|y_j\|^2 \le \frac{n}{k - n + 1}.$$

It is straightforward to show that $H = BA^+ = B(A^T A)^{-1}A^T = YQ^T$ and the result now follows from

$$\|H\|_F^2 = \|Y\|_F^2 \le \frac{n(m-k)}{k-n+1}. \qquad \square$$

It follows immediately from Theorem 1, (4), (7) and (5) that:

Corollary 1. Let $r = \mathrm{rank}(X) \le k$. Then there is a permutation matrix $P$ so that (6) holds with

$$\|A^+\|_F \le \sqrt{\frac{r(m-r+1)}{k-r+1}}\,\|X^+\|,$$

$$\sqrt{\frac{k-r+1}{k-r+1+r(m-k)}}\,\sigma_i(X) \le \sigma_i(A) \le \sigma_i(X), \quad i = 1, \ldots, r,$$

and

$$\sigma_i(A) \ge \sqrt{\frac{(k-r+1)(r-i+1)}{(k-r+1)(r-i+1)+r(m-k)}}\,\sigma_r(X), \quad i = 1, \ldots, r.$$

Theorem 1 and Corollary 1 give a generalization of the results obtained in the introduction. Again, the lower bound decreases only algebraically with $m$ and $r$. Whilst Corollary 1 demonstrates that there is a permutation matrix such that the smallest singular value of $A$ does not become too small, it does not provide guidance on how such a permutation might be calculated.
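The proof of Theorem 1 only uses the fact that, at a maximizer of $\det(A^T A)$, no single swap between a row of $A$ and a row of $B$ increases the determinant. This suggests a greedy exchange heuristic: start from any rank-$n$ set of $k$ rows and accept swaps while $\det(A^T A)$ increases; at termination the exchange condition, and hence the bound of Theorem 1 (with $r = n$), holds. The sketch below is our own construction under these assumptions, not an algorithm from the paper; it assumes NumPy.

```python
import numpy as np

def local_det_exchange(X, k, max_sweeps=20):
    """Greedy row exchange: swap rows in/out while det(A^T A) increases.

    Terminates at a swap-local maximum, where the exchange argument in
    the proof of Theorem 1 guarantees the bound on ||H||_F^2.
    """
    m, n = X.shape
    # Start from the k rows of largest norm (any rank-n start would do).
    sel = list(np.argsort(-np.linalg.norm(X, axis=1))[:k])
    rest = [i for i in range(m) if i not in sel]

    def logdet(rows):
        # log det(A^T A) = 2 * sum(log sigma_i(A)); -inf if rank deficient.
        s = np.linalg.svd(X[rows, :], compute_uv=False)
        return 2.0 * np.sum(np.log(s)) if s.min() > 0 else -np.inf

    best = logdet(sel)
    for _ in range(max_sweeps):
        improved = False
        for i in range(k):
            for j in range(len(rest)):
                trial = sel.copy()
                trial[i] = rest[j]
                val = logdet(trial)
                if val > best:
                    sel[i], rest[j] = rest[j], sel[i]
                    best, improved = val, True
        if not improved:
            break
    return np.sort(sel)
```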

We now examine an alternative approach which is based on deleting rows, one at a time. Specifically, we delete each row to minimize the Frobenius norm of the pseudo-inverse of the resulting matrix at each step. An advantage of this approach is that it provides a constructive way of calculating the permutation $P$.

Theorem 2. For $X \in \mathbb{R}^{m \times n}$ with $r = \mathrm{rank}(X)$, there is a permutation matrix $P \in \mathbb{R}^{m \times m}$ such that (1) holds for $k \ge r$ with $\mathrm{rank}(A) = r$ and

$$\|A^+\|_F^2 \le \frac{m-r+1}{k-r+1}\,\|X^+\|_F^2.$$

Proof. Without loss of generality we take $r = n$, since otherwise we can effectively delete columns of $X$ by post-multiplication by an orthogonal matrix. We proceed by induction, and first consider the case when just one row is deleted (that is, $k = m-1$). Let us write

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_m^T \end{bmatrix}, \quad x_j \in \mathbb{R}^n, \ j = 1, \ldots, m.$$

Then

$$X^T X - x_k x_k^T = (X^T X)^{1/2}(I - q_k q_k^T)(X^T X)^{1/2}, \quad q_k = (X^T X)^{-1/2} x_k.$$

Thus,

$$(1 - \|q_k\|^2)(X^T X - x_k x_k^T)^{-1} = (1 - \|q_k\|^2)(X^T X)^{-1} + (X^T X)^{-1} x_k x_k^T (X^T X)^{-1}.$$

Now, let

$$p = \arg\min_k \ \mathrm{trace}\big((X^T X - x_k x_k^T)^{-1}\big).$$

Then,

$$(1 - \|q_k\|^2)\,\mathrm{trace}\big((X^T X - x_p x_p^T)^{-1}\big) \le (1 - \|q_k\|^2)\,\mathrm{trace}\big((X^T X - x_k x_k^T)^{-1}\big) = (1 - \|q_k\|^2)\|X^+\|_F^2 + \|(X^T X)^{-1} x_k\|^2.$$

On noting that

$$\sum_{k=1}^m \|q_k\|^2 = \mathrm{trace}\left(\sum_{k=1}^m q_k q_k^T\right) = \mathrm{trace}\left((X^T X)^{-1/2}\left(\sum_{k=1}^m x_k x_k^T\right)(X^T X)^{-1/2}\right) = \mathrm{trace}\big((X^T X)^{-1/2}\, X^T X\, (X^T X)^{-1/2}\big) = n$$

and

$$\sum_{k=1}^m \|(X^T X)^{-1} x_k\|^2 = \mathrm{trace}\left((X^T X)^{-1}\left(\sum_{k=1}^m x_k x_k^T\right)(X^T X)^{-1}\right) = \mathrm{trace}\big((X^T X)^{-1}\, X^T X\, (X^T X)^{-1}\big) = \mathrm{trace}\big((X^T X)^{-1}\big) = \|X^+\|_F^2,$$

we obtain

$$(m-n)\,\mathrm{trace}\big((X^T X - x_p x_p^T)^{-1}\big) \le \sum_{k=1}^m (1 - \|q_k\|^2)\,\mathrm{trace}\big((X^T X - x_k x_k^T)^{-1}\big) = (m-n+1)\|X^+\|_F^2.$$

Thus, the result follows for $k = m-1$. Proceeding by eliminating a row at a time, the result follows by induction. $\square$

Note however that the permutation matrices in Theorems 1 and 2 will not generally be the same. Thus it is not necessarily the case that the results in Theorems 1 and 2 will hold for the same permutation matrix. We now show that Theorem 1 can be established using Theorem 2, which has the advantage of providing a constructive way to determine $P$.

Corollary 2. Let $r = \mathrm{rank}(X) \le k$. Then, there is a permutation matrix $P$ so that (6) holds with

$$\|H\|_F^2 = \|BA^+\|_F^2 \le \frac{r(m-k)}{k-r+1}.$$

Proof. Consider the singular value decomposition $X = UDV^T$, which is described in more detail in Section 2, and apply Theorem 2 to $U$. Then, there is a permutation matrix $P$ so that

$$PU = PXVD^{-1} = \begin{bmatrix} AVD^{-1} \\ BVD^{-1} \end{bmatrix},$$

with

$$\|(AVD^{-1})^+\|_F^2 \le \frac{m-r+1}{k-r+1}\,\|(PU)^+\|_F^2 = \frac{r(m-r+1)}{k-r+1}.$$

The result now follows from (4) and the identity $XA^+ = UDV^T A^+ = U(AVD^{-1})^+$. $\square$
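The proof of Theorem 2 is constructive: at each step, delete the row whose removal minimizes $\mathrm{trace}((A^T A)^{-1}) = \|A^+\|_F^2$, which the Sherman–Morrison identity used in the proof lets us evaluate cheaply for every candidate row. Below is a minimal sketch under the assumptions that NumPy is available, $X$ has full column rank $n$, and $k \ge n$; the function name `greedy_row_deletion` is ours.

```python
import numpy as np

def greedy_row_deletion(X, k):
    """Delete rows of X one at a time, each time removing the row whose
    deletion minimizes ||A^+||_F^2 = trace((A^T A)^{-1}), as in Theorem 2.

    Returns the indices (into the original X) of the k rows that are kept.
    """
    keep = list(range(X.shape[0]))
    A = X.copy()
    while len(keep) > k:
        G = np.linalg.inv(A.T @ A)              # (A^T A)^{-1}
        best_val, best_i = np.inf, None
        for i in range(A.shape[0]):
            x = A[i]
            q2 = x @ G @ x                      # leverage ||q_i||^2
            if q2 >= 1.0:                       # deleting row i drops the rank
                continue
            # Sherman-Morrison: trace((A^T A - x x^T)^{-1})
            Gx = G @ x
            val = np.trace(G) + (Gx @ Gx) / (1.0 - q2)
            if val < best_val:
                best_val, best_i = val, i
        keep.pop(best_i)
        A = np.delete(A, best_i, axis=0)
    return np.array(keep)

# Usage: the greedy output satisfies the bound of Theorem 2 (r = n).
rng = np.random.default_rng(0)
X = rng.standard_normal((12, 4))
rows = greedy_row_deletion(X, 6)
A = X[rows, :]
print(np.linalg.norm(np.linalg.pinv(A), 'fro')**2,
      (12 - 4 + 1) / (6 - 4 + 1) * np.linalg.norm(np.linalg.pinv(X), 'fro')**2)
```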

We now extend the results to the case when $k$, the number of rows in $A$, is smaller than $r$, the rank of $X$.

Theorem 3. Let $k < \mathrm{rank}(X)$. Then, there is a permutation matrix $P$ so that (1) holds with

$$\|A^+\|_F^2 \le (m-k+1)\sum_{l=1}^k \sigma_l^{-2}(X).$$

Proof. Consider the singular value decomposition $X = UDV^T$ with the following partitioning:

$$U = [U_1 \ U_2], \quad U_1 \in \mathbb{R}^{m \times k}, \qquad D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \quad D_1 \in \mathbb{R}^{k \times k}, \qquad V = [V_1 \ V_2], \quad V_1 \in \mathbb{R}^{n \times k}.$$

We can write

$$X = X_1 + X_2, \quad X_1 = U_1 D_1 V_1^T, \quad X_2 = U_2 D_2 V_2^T.$$

Note that $X_1$ is the truncated singular value expansion of $X$, for which $\mathrm{rank}(X_1) = k$. Therefore, from Theorem 2, there is a permutation matrix $P$ such that

$$PX_1 = \begin{bmatrix} A_1 \\ B_1 \end{bmatrix}, \quad A_1 \in \mathbb{R}^{k \times n}, \ \mathrm{rank}(A_1) = k, \quad \text{and} \quad \|A_1^+\|_F^2 \le (m-k+1)\|X_1^+\|_F^2.$$

Now, with

$$PX =: \begin{bmatrix} A \\ B \end{bmatrix}, \quad PX_2 =: \begin{bmatrix} A_2 \\ B_2 \end{bmatrix}, \quad A, A_2 \in \mathbb{R}^{k \times n},$$

we have $AA^T = A_1 A_1^T + A_2 A_2^T$ and hence $\|A^+\|_F^2 \le \|A_1^+\|_F^2$. The result now follows on applying the above inequality for $\|A_1^+\|_F^2$. $\square$

3. Some applications

As our first example, consider the least squares problem

$$\min_x \|Xx - f\|, \quad X \in \mathbb{R}^{m \times n}, \ \mathrm{rank}(X) = r,$$

and suppose we wish to delete some points from the design matrix. Thus, we require a permutation matrix $P$ for which

$$PX = \begin{bmatrix} A \\ B \end{bmatrix}, \quad A \in \mathbb{R}^{k \times n}, \ k \ge n,$$

and then solve the modified problem

$$\min_a \|Aa - \hat{f}\|,$$

where $\hat{f}$ consists of the first $k$ elements of $Pf$. Let us assume that the underlying linear model is correct, although the data $f$ is contaminated by noise. That is,

$$f = s + \varepsilon, \quad s \in \mathrm{range}(X),$$

where the $\varepsilon_i$, $i = 1, \ldots, m$, are independent realizations of a random variable with mean zero and variance $\beta^2$. Then it is sensible to minimize

$$\mathrm{E}\big(\|A^+\hat{f} - A^+\hat{s}\|^2\big) = \beta^2\|A^+\|_F^2,$$

where $\mathrm{E}(\cdot)$ is the expectation operator and $\hat{s}$ consists of the first $k$ elements of $Ps$. From Corollary 1 we immediately obtain a bound for this quantity.

Lemma 1. For $X \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(X) = n$, there is a permutation $P$ such that

$$PX = \begin{bmatrix} A \\ B \end{bmatrix}, \quad A \in \mathbb{R}^{k \times n}, \ k \ge n,$$

and

$$\|X^+\|_F^2 \le \|A^+\|_F^2 \le \left(1 + \frac{n(m-k)}{k-n+1}\right)\|X^+\|_F^2.$$
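Under this noise model the criterion can be sanity-checked by simulation: for a fixed $A$, the sample mean of $\|A^+(\hat{f} - \hat{s})\|^2$ over many noise draws should approach $\beta^2\|A^+\|_F^2$. A small check, assuming NumPy and Gaussian noise (all names here are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, beta = 8, 3, 0.1
A = rng.standard_normal((k, n))
Ap = np.linalg.pinv(A)                         # A^+
s_hat = A @ rng.standard_normal(n)             # noise-free data, in range(A)

# Monte Carlo estimate of E ||A^+ f_hat - A^+ s_hat||^2.
trials = 20000
err = 0.0
for _ in range(trials):
    f_hat = s_hat + beta * rng.standard_normal(k)
    err += np.sum((Ap @ (f_hat - s_hat)) ** 2)
print(err / trials)                            # Monte Carlo estimate
print(beta**2 * np.linalg.norm(Ap, 'fro')**2)  # beta^2 ||A^+||_F^2
```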

Thus, while the natural criterion is to minimize $\|A^+\|_F^2$ in this application, the bound obtained by maximizing $\det(A^T A)$ grows only algebraically with $n$ and $m$. A somewhat sharper bound is given by Theorem 2, which is based on deleting rows by minimizing $\|A^+\|_F^2$ at each step.

For our second application, consider the least squares problem

$$\min_x \|X^T x - f\|, \tag{8}$$

where $X \in \mathbb{R}^{m \times n}$ and $n > m = \mathrm{rank}(X)$. If the problem is very badly conditioned, some modification to the usual approach is necessary to provide some stabilization. As is discussed in [4, Section 12.2], it is sometimes advantageous to impose on (8) the constraint that $x$ has at most $k < m$ non-zero components. That is, replace (8) by

$$\min_a \|A^T a - f\|,$$

where

$$PX = \begin{bmatrix} A \\ B \end{bmatrix}, \quad A \in \mathbb{R}^{k \times n},$$

and $P \in \mathbb{R}^{m \times m}$ is a permutation matrix. To analyse the constrained problem it is useful to introduce the truncated singular value decomposition. As previously, let

$$X = UDV^T = U_1 D_1 V_1^T + U_2 D_2 V_2^T = X_1 + X_2$$

and

$$PX_1 = \begin{bmatrix} A_1 \\ B_1 \end{bmatrix}, \quad PX_2 = \begin{bmatrix} A_2 \\ B_2 \end{bmatrix}.$$

As discussed in [4], a useful measure of how well the constrained problem performs is the quantity $\|V_1 V_1^T - A^+ A\|$, and a bound for this is now derived.

Lemma 2.

$$\|V_1 V_1^T - A^+ A\| \le \frac{\sigma_{k+1}(X)}{\sigma_k(A)}.$$

Proof. As in [4, pp. 417–418],

$$\|V_1 V_1^T - A^+ A\| = \|V_2^T A^T (AA^T)^{-1/2}\| \le \|AV_2\|\,\|A^+\| \le \frac{\sigma_{k+1}(X)}{\sigma_k(A)}. \qquad \square$$
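Lemma 2 applies to any row selection, so it is easy to test numerically. A sketch (assuming NumPy; the particular matrix and row choice are arbitrary illustrations, not from the paper) that forms the projectors $V_1V_1^T$ and $A^+A$ and compares $\|V_1V_1^T - A^+A\|$ with $\sigma_{k+1}(X)/\sigma_k(A)$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 6, 12, 3                 # n > m = rank(X), select k < m rows
X = rng.standard_normal((m, n))
rows = [0, 2, 5]                   # some choice of k rows: A = X[rows, :]
A = X[rows, :]

U, svals, Vt = np.linalg.svd(X, full_matrices=False)
V1 = Vt[:k, :].T                   # dominant right singular vectors of X
P_V1 = V1 @ V1.T                   # projector onto span(V1)
P_A = np.linalg.pinv(A) @ A        # projector onto the row space of A

gap = np.linalg.norm(P_V1 - P_A, 2)
bound = svals[k] / np.linalg.svd(A, compute_uv=False)[k - 1]
print(gap, bound)                  # gap <= bound, per Lemma 2
```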

Lemma 3. There exists a permutation $P$ such that

$$\sigma_k(A) \ge \frac{\sigma_k(X)}{\sqrt{1 + k(m-k)}}$$

and

$$\|V_1 V_1^T - A^+ A\| \le \frac{\sigma_{k+1}(X)}{\sigma_k(X)}\sqrt{1 + k(m-k)}.$$

Proof. Clearly, $\mathrm{rank}(X_1) = k$ and it follows from Corollary 1 that there is a permutation matrix $P$ such that

$$PX_1 = \begin{bmatrix} A_1 \\ B_1 \end{bmatrix}, \quad \sigma_k(A_1) \ge \frac{\sigma_k(X_1)}{\sqrt{1 + k(m-k)}}.$$

Since $\sigma_k(A) \ge \sigma_k(A_1)$ and $\sigma_k(X) = \sigma_k(X_1)$, this establishes the required bound for $\sigma_k(A)$. The second part of the lemma now follows immediately from Lemma 2. $\square$

As our final example, consider the underdetermined system

$$X^T x = f, \quad X \in \mathbb{R}^{m \times n}, \ m > n = \mathrm{rank}(X),$$

for which we wish to determine a solution with at most $k \ge n$ non-zero entries. A solution to this problem is

$$x = P^T \begin{bmatrix} (A^+)^T f \\ 0 \end{bmatrix}, \quad \text{where } PX = \begin{bmatrix} A \\ B \end{bmatrix}, \ A \in \mathbb{R}^{k \times n},$$

and $P$ is a permutation matrix. It is reasonable in this application to choose $P$ to minimize

$$\max_f \frac{\|f^T A^+\|}{\|f^T X^+\|} = \sigma_1(XA^+),$$

since $(X^+)^T f$ is the minimum norm solution to $X^T x = f$.
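The construction above is a one-liner once the rows are chosen. A sketch (assuming NumPy; `sparse_solution` and the example data are ours) that builds $x = P^T[(A^+)^T f;\ 0]$ and reports the conditioning measure $\sigma_1(XA^+)$:

```python
import numpy as np

def sparse_solution(X, rows, f):
    """Solve X^T x = f using only the selected rows of X.

    x has nonzeros only in positions `rows`; requires A = X[rows] to have
    full column rank n (so k = len(rows) >= n).
    """
    m, n = X.shape
    A = X[rows, :]
    x = np.zeros(m)
    x[rows] = np.linalg.pinv(A).T @ f          # (A^+)^T f in the kept positions
    return x

rng = np.random.default_rng(3)
m, n, k = 10, 4, 6
X = rng.standard_normal((m, n))
f = rng.standard_normal(n)
rows = np.sort(rng.choice(m, size=k, replace=False))
x = sparse_solution(X, rows, f)
print(np.allclose(X.T @ x, f))                 # True: X^T x = f
A = X[rows, :]
print(np.linalg.norm(X @ np.linalg.pinv(A), 2))  # sigma_1(X A^+)
```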

The quantity $\sigma_1(XA^+)$ also plays a key role in the conditioning of the problem. If $\hat{A} \in \mathbb{R}^{k \times n}$ and $\|A - \hat{A}\|$ is sufficiently small, a standard perturbation argument shows that, to leading order,

$$\frac{\|a - \hat{a}\|}{\|x_L\|} \le \kappa(X)\left\{\sigma_1^2(XA^+)\frac{\|A - \hat{A}\|}{\|A\|} + \sigma_1(XA^+)\frac{\|f - \hat{f}\|}{\|f\|}\right\},$$

where $a = (A^+)^T f$, $\hat{a} = (\hat{A}^+)^T \hat{f}$, $x_L = (X^+)^T f$ and $\kappa(X) = \sigma_1(X)/\sigma_n(X)$. Thus, in this problem it seems natural to take $\sigma_1(XA^+)$ as a measure of the conditioning of the submatrix $A$.

Lemma 4. There exists a permutation matrix $P$ for which

$$\sigma_1(XA^+) \le \sqrt{1 + \frac{n(m-k)}{k-n+1}}.$$

Proof. This follows immediately from Theorem 1. $\square$

References

[1] P.A. Businger, G.H. Golub, Linear least squares solutions by Householder transformations, Numer. Math. 7 (1965) 269–276.
[2] E.W. Cheney, K.H. Price, Minimal projections, in: Approximation Theory (Proceedings of a Symposium, Lancaster, 1969), New York, 1970, pp. 261–289.
[3] F.R. de Hoog, R.M.M. Mattheij, On the conditioning of multipoint and integral boundary value problems, SIAM J. Math. Anal. 20 (1989) 200–214.
[4] G.H. Golub, C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1983.
[5] C.L. Lawson, R.J. Hanson, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, 1974.
[6] A.F. Ruston, Auerbach's theorem, Proc. Camb. Phil. Soc. 58 (1964) 476–480.
[7] E.E. Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method, Computing 64 (2000) 367–380.