October/November 2009

These notes give the detailed workings of Fisher's reduced rank Kalman filter (RRKF), which was developed for use in a variational environment (Fisher, 1998). These notes are my interpretation of the method and an outline of how I would implement it. Fisher's approach uses flow-dependent information from a Hessian singular vector (HSV) calculation, and I propose an alternative that uses information from an available ensemble, made up of ensemble members (EMs).

1. What is a reduced rank Kalman filter?

In VAR, the B-matrix used is a static representation of forecast error covariances, and as such VAR would be expected to perform poorly in an environment where the actual forecast error covariance statistics change significantly from case to case (e.g. Fisher, 2007; Bannister, 2008). The RRKF is an attempt to blend flow-dependent aspects into the VAR problem in a mathematically formal manner. Two sources of flow-dependent information are considered here: firstly HSV information (as in Fisher's original formulation (Fisher, 1998)), and secondly EM information. A RRKF in this context may be regarded as a modification to the existing B-matrix in VAR that allows the dynamical evolution of a subspace of the state vector (where the subspace is defined as that spanned by either the HSVs or by the EMs). Each approach is outlined below.

2. Source A of flow-dependent information to blend with the B-matrix (Hessian singular vectors)

One way of identifying the subspace that will be treated with explicit flow dependence is to use a HSV calculation. Let the size of the subspace be $K$, which can be chosen arbitrarily, but is restricted in practice by cost. $K$ represents the number of singular vectors. The size of the full model space is $N$, and in these notes it is recognised that $K \ll N$. Fisher defines the subspace by the $K$ most rapidly growing HSVs. The reason why they are chosen to be singular vectors of the Hessian will become evident.
In order to specify the problem that must be solved to compute the HSVs, we introduce two norms as follows. Let the covariance matrix $P^a$ be the error covariance of the analysis of the previous cycle. In order for Fisher's method to work, it must be possible to act with the matrix $P^{a-1}$ (or an approximation of it). Let the matrix $W$ be the norm used to measure the size of a perturbation. It must be possible to act with the matrix $W^{-1}$. Let the time of the previous data assimilation cycle be $-t$ and the time of the present analysis be $0$. States that have no time label are valid at time $0$ by default. Let the tangent linear model, $M_{0\leftarrow-t}$, act on perturbations at time $-t$ and give a perturbation at time $0$:

$\delta x = M_{0\leftarrow-t}\,\delta x(-t)$.   (1)
If $\delta x(-t)$ were known, then the size of $\delta x$ according to the W-norm would be $J_1$:

$J_1 = \delta x^{\rm T} W^{-1} \delta x = \delta x^{\rm T}(-t)\, M_{0\leftarrow-t}^{\rm T} W^{-1} M_{0\leftarrow-t}\, \delta x(-t)$.   (2)

The HSVs are defined as those $\delta x(-t)$ that maximise $J_1$ subject to the constraint that $\delta x(-t)$ is distributed according to $P^a$, i.e.

$\delta x^{\rm T}(-t)\, P^{a-1}\, \delta x(-t) - {\rm const} = 0$,   (3)

for an arbitrary constant, 'const'. The constrained optimisation problem may therefore be posed as

$\nabla_{\delta x(-t)} \left[ J_1 - \lambda \left( \delta x^{\rm T}(-t)\, P^{a-1}\, \delta x(-t) - {\rm const} \right) \right] = 0$,   (4)

where $\lambda$ is the Lagrange multiplier. Differentiating and setting the solutions to $\delta x_k(-t)$ (with associated Lagrange multiplier $\lambda_k$) gives

$M_{0\leftarrow-t}^{\rm T} W^{-1} M_{0\leftarrow-t}\, \delta x_k(-t) = \lambda_k P^{a-1}\, \delta x_k(-t)$,   (5)

which is a generalised eigenvalue problem. The $\delta x_k(-t)$ are the HSVs. The set of vectors $P^{a-1/2}\, \delta x_k(-t)$ are eigenvectors of $(W^{-1/2} M_{0\leftarrow-t} P^{a\,1/2})^{\rm T} (W^{-1/2} M_{0\leftarrow-t} P^{a\,1/2})$ and are thus the right singular vectors of the matrix $W^{-1/2} M_{0\leftarrow-t} P^{a\,1/2}$. Let $s_k = M_{0\leftarrow-t}\, \delta x_k(-t)$. Those $s_k$ with the largest $\lambda_k$ define the subspace whose background errors are treated explicitly by the RRKF. A general perturbation at time $0$, $\delta x$, has a part $\delta x_s$ that lies in this subspace, which can be found as a linear combination of the $s_k$. Identification of this subspace can be simplified by first constructing an orthogonalized and normalized set of vectors, $\hat{s}_k$ (e.g. by the Gram-Schmidt procedure, see App. A). Then

$\delta x_s = \hat{S} a$,   (6)

where $\hat{S}$ is the $N \times K$ matrix of $\hat{s}_k$ vectors and $a$ is the $K$-element vector of (as yet unknown) coefficients. Orthogonalization should be done with respect to an inner product that nondimensionalizes the components (the $W^{-1}$ inner product achieves this, so this matrix shall be used) such that

$\hat{S}^{\rm T} W^{-1} \hat{S} = I$,   (7)

(the $B^{-1}$ norm could, in principle, be used instead). The benefit of first orthogonalizing the vectors is to allow $a$ to be found easily from $\delta x$:

$a = \hat{S}^{\rm T} W^{-1} \delta x$.   (8)

The part of $\delta x$ that is not within the chosen subspace is the residual

$\delta x_{\bar{s}} = \delta x - \delta x_s$,   (9)

which is orthogonal to $\delta x_s$ under the $W^{-1}$ norm (see App. B). The way that the HSVs may be used in the RRKF is covered in Sec.
6.

3. Source B of flow-dependent information to blend with the B-matrix (ensemble members)

Another way of identifying the subspace that will be treated with explicit flow dependence is to use information extracted from an ensemble. Let the number of EMs be $L$, which can be chosen arbitrarily, but is restricted in practice by cost. From the $L$ ensemble members, a $K$-dimensional subspace can be determined ($K \le L$) that will be used to describe the covariances explicitly.
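As a preview of the construction that Sec. 7 later formalises (Eqs. (32)-(41)), the extraction of a $K$-dimensional subspace from $L$ ensemble members can be sketched numerically. The following is a minimal numpy illustration, assuming for simplicity that $W = I$ (so the $W^{-1}$ inner product reduces to the ordinary dot product); the variable names and toy dimensions are mine, not part of the original formulation.

```python
import numpy as np

rng = np.random.default_rng(42)
N, L, K = 12, 5, 3          # model-space size, no. of EMs, subspace size

Winv = np.eye(N)            # assume W = I; in general W^{-1} defines the inner product
S = rng.standard_normal((N, L))   # columns are the L raw ensemble perturbations

# Gram-Schmidt under the W^{-1} inner product (App. A): S_hat^T W^{-1} S_hat = I
S_hat = np.zeros((N, L))
for i in range(L):
    t = S[:, i].copy()
    for j in range(i):
        t -= (S_hat[:, j] @ Winv @ S[:, i]) * S_hat[:, j]
    S_hat[:, i] = t / np.sqrt(t @ Winv @ t)

S_sub = S_hat.T @ Winv @ S      # EMs in the orthonormal basis, cf. (33)
PfL = S_sub @ S_sub.T / L       # ensemble covariance in EM subspace, cf. (34)

evals, UL = np.linalg.eigh(PfL) # eigenpairs, in ascending order
UK = UL[:, ::-1][:, :K]         # K leading eigenvectors, cf. (38)
U = S_hat @ UK                  # leading directions back in model space, cf. (41)

print(np.allclose(S_hat.T @ Winv @ S_hat, np.eye(L)))   # orthonormality, cf. (7)
print(np.allclose(S_hat @ PfL @ S_hat.T, S @ S.T / L))  # consistency, cf. (35)
```

The $L \times L$ eigenproblem is cheap even when $N$ is very large, which is the point of working in the EM subspace.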
The size of the full model space is $N$ and in these notes it is assumed that $K, L \ll N$. In a similar way to the procedure for the HSVs in Sec. 2, the EMs may be orthogonalized by the Gram-Schmidt procedure (see App. A) and placed in the columns of the matrix $\hat{S}$. The relationship between a vector in the $L$-dimensional space spanned by the EMs, $a$, and the model space, $\delta x_s$, is given by (6), but where now $a$ is an $L$-element vector and $\hat{S}$ is the $N \times L$ matrix of orthogonalized EMs. Orthogonalization is performed with respect to the $W^{-1}$ inner product described by (7) (Eqs. (8) and (9) then follow). The procedure to use the EMs in the RRKF is a little more involved than the HSV procedure. The way that the EMs may be used in the RRKF is covered in Sec. 7.

4. The background cost function in the new subspace

The next task is to outline the way that the flow-dependent information, whether from HSVs or from EMs, can be combined with static error covariance information in the VAR formulation. The usual background cost function in VAR is $J_b$:

$J_b = \frac{1}{2} (\delta x - \delta x^b)^{\rm T} B^{-1} (\delta x - \delta x^b)$,   (10)

where $\delta x = x - x^g$, $\delta x^b = x^b - x^g$, $x^b$ is the background state and $x^g$ is a reference (or guess) state. The B-matrix in (10) is static. Equation (10) may be written in terms of the components $\delta x_s$ and $\delta x_{\bar{s}}$ by substituting (9) into (10). This gives three parts: (i) a part that involves only the special $K$-dimensional subspace introduced in Secs. 2 and 3, (ii) the part that couples this subspace with the rest of the state, and (iii) the part that involves only the rest of the state:

$J_b = \frac{1}{2} (\delta x_s - \delta x^b_s)^{\rm T} B^{-1} (\delta x_s - \delta x^b_s) + (\delta x_{\bar{s}} - \delta x^b_{\bar{s}})^{\rm T} B^{-1} (\delta x_s - \delta x^b_s) + \frac{1}{2} (\delta x_{\bar{s}} - \delta x^b_{\bar{s}})^{\rm T} B^{-1} (\delta x_{\bar{s}} - \delta x^b_{\bar{s}})$.   (11)

This cost function is identical to (10). The RRKF is constructed by imposing a flow-dependent error covariance matrix for the first two terms ($B \to P^f$) but keeping the static B-matrix in the last term:

$J_b \to \frac{1}{2} (\delta x_s - \delta x^b_s)^{\rm T} P^{f-1} (\delta x_s - \delta x^b_s) + \alpha (\delta x_{\bar{s}} - \delta x^b_{\bar{s}})^{\rm T} P^{f-1} (\delta x_s - \delta x^b_s) + \frac{1}{2} (\delta x_{\bar{s}} - \delta x^b_{\bar{s}})^{\rm T} B^{-1} (\delta x_{\bar{s}} - \delta x^b_{\bar{s}})$.
(12)

The factor $\alpha$, added by Fisher (1998), is there to help ensure that $J_b$ is convex. The flow-dependent information provided by the $K$ special vectors will be used to determine $P^f$. It will be introduced into the problem by a modification to the standard control variable transform used in VAR.

5. Control variable transform stage

It is usual in VAR to make a change of variable from model variables ($\delta x$) to control variables (often named $\chi$). The control variable transform used in standard VAR is here denoted $L$, and in the RRKF there is an additional transform denoted $X$, as follows:
$\delta x = L X \chi$,   (13)

where, by design, $X$ is an orthogonal matrix,

$X X^{\rm T} = I$,   (14)

(see below) and $L$ is the usual control variable transform used in VAR:

$L X (L X)^{\rm T} = L L^{\rm T} = B$.   (15)

Substituting (13) into (12) gives

$J_b = \frac{1}{2} (\chi_s - \chi^b_s)^{\rm T} X^{\rm T} L^{\rm T} P^{f-1} L X (\chi_s - \chi^b_s) + \alpha (\chi_{\bar{s}} - \chi^b_{\bar{s}})^{\rm T} X^{\rm T} L^{\rm T} P^{f-1} L X (\chi_s - \chi^b_s) + \frac{1}{2} (\chi_{\bar{s}} - \chi^b_{\bar{s}})^{\rm T} X^{\rm T} L^{\rm T} B^{-1} L X (\chi_{\bar{s}} - \chi^b_{\bar{s}})$,   (16)

where $\chi_s = X^{\rm T} L^{-1} \delta x_s$, $\chi^b_s = X^{\rm T} L^{-1} \delta x^b_s$, $\chi_{\bar{s}} = X^{\rm T} L^{-1} \delta x_{\bar{s}}$ and $\chi^b_{\bar{s}} = X^{\rm T} L^{-1} \delta x^b_{\bar{s}}$. The matrix $X$ is not present in standard VAR, but is introduced in (13) to isolate the special subspace identified in Secs. 2 and 3 from the remainder of the state space. As it stands, (16) looks complicated to treat. Two substitutions are made as follows:

$X^{\rm T} L^{\rm T} B^{-1} L X = I$,   (17)

$X^{\rm T} L^{\rm T} P^{f-1} L X = P^{f-1}_\chi$,   (18)

where (17) follows from (14) and (15), and (18) is a definition. With these substitutions, (16) is

$J_b = \frac{1}{2} (\chi_s - \chi^b_s)^{\rm T} P^{f-1}_\chi (\chi_s - \chi^b_s) + \alpha (\chi_{\bar{s}} - \chi^b_{\bar{s}})^{\rm T} P^{f-1}_\chi (\chi_s - \chi^b_s) + \frac{1}{2} (\chi_{\bar{s}} - \chi^b_{\bar{s}})^{\rm T} (\chi_{\bar{s}} - \chi^b_{\bar{s}})$.   (19)

The part of $P^{f-1}_\chi$ that is important is derived in Secs. 6 and 7 from the $K$-dimensional subspaces identified from the HSV or EM calculations respectively. The key to simplifying (19) is in the design of $X$. Let $X$ have the following properties:

- $X^{\rm T}$ acting on any vector in the subspace $L^{-1} \delta x_s$ (where $\delta x_s$ is a vector that exists entirely in the special $K$-dimensional subspace) gives a vector that is non-zero only in the first $K$ elements:

$X^{\rm T} L^{-1} \delta x_s = (\alpha_1, \ldots, \alpha_K, 0, \ldots, 0)^{\rm T}$;   (20)

- $X^{\rm T}$ acting on a vector in the remaining space, $L^{-1} \delta x_{\bar{s}}$, gives a vector that is non-zero only in the remaining $N - K$ elements.

It is possible to define a suitable $X$ that satisfies these conditions by using a sequence of Householder transformations (see App. C). The important observation is that in (19), only the first $K$ columns of $P^{f-1}_\chi$ need to be known.

6. Determination of $P^{f-1}_\chi$ for the Hessian singular vectors

The procedure to calculate the first $K$ columns of $P^{f-1}_\chi$ using the HSVs is now described.
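The HSV inputs required by this procedure come from the generalised eigenvalue problem (5), which Sec. 2 notes is equivalent to an SVD of $W^{-1/2} M_{0\leftarrow-t} P^{a\,1/2}$. That equivalence can be checked with a minimal numpy sketch; the random SPD stand-ins for $P^a$ and $W$ and the random matrix standing in for the tangent linear model are toy choices of mine, not part of the original formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3

def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)   # well-conditioned SPD matrix

def sqrtm_spd(A):
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T

Pa = random_spd(N)                    # stand-in for P^a
W = random_spd(N)                     # stand-in for the norm matrix W
M = rng.standard_normal((N, N))       # stand-in for M_{0<- -t}

Pa_half = sqrtm_spd(Pa)
Winv_half = sqrtm_spd(np.linalg.inv(W))

# Right singular vectors of W^{-1/2} M P^{a 1/2} give the HSVs
_, sing, Vt = np.linalg.svd(Winv_half @ M @ Pa_half)
dx_minus_t = Pa_half @ Vt[:K].T       # leading-K HSVs at time -t
lam = sing[:K] ** 2                   # growth factors lambda_k

# Verify the generalised eigenvalue relation (5) for each HSV
lhs = M.T @ np.linalg.inv(W) @ M @ dx_minus_t
rhs = np.linalg.inv(Pa) @ dx_minus_t * lam   # scales column k by lambda_k
print(np.allclose(lhs, rhs))
```

In an operational setting the explicit inverses and square roots above would not be formed; they stand in for the operator applications that the text requires ($P^{a-1}$, $W^{-1}$).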
Following Fisher (1998) let

$Z = P^{f-1} S$,   (21)

where $S$ is the $N \times K$ matrix whose columns are the $s_k$ and $Z$ is the $N \times K$ result after acting with the inverse of the flow-dependent error covariance matrix. Equation (21) is now developed, using the definition (18) along the way:

$Z = P^{f-1} L X X^{\rm T} L^{-1} S$,

$X^{\rm T} L^{\rm T} Z = X^{\rm T} L^{\rm T} P^{f-1} L X X^{\rm T} L^{-1} S = P^{f-1}_\chi \hat{I} X^{\rm T} L^{-1} S$,   (22)

$X^{\rm T}_{(N \times N)} L^{\rm T}_{(N \times N)} Z_{(N \times K)} = P^{f-1}_{\chi\,(N \times K)} \hat{I}_{(K \times N)} X^{\rm T}_{(N \times N)} L^{-1}_{(N \times N)} S_{(N \times K)}$,   (23)

where $\hat{I}$ is the following non-square quasi-identity matrix:

$\hat{I}_{(K \times N)} = (\,I_{(K \times K)} \;\; 0_{(K \times N-K)}\,)$.   (24)

This matrix is included to remove the superfluous zero elements for rows $i > K$ of $X^{\rm T} L^{-1} S$ (by the design of $X$). In (23) and (24), labels have been added to the matrices to indicate their dimensions. Equation (22) leads to

$P^{f-1}_\chi = X^{\rm T} L^{\rm T} Z \, (\hat{I} X^{\rm T} L^{-1} S)^{-1}$,   (25)

where the operator inverted is a calculable $K \times K$ matrix, which we assume is non-singular. Note that (25) is for only part of the inverse covariance matrix and so is not symmetric. The matrix not yet known is $X^{\rm T} L^{\rm T} Z$, which is now found from the HSVs (Sec. 2). By the definition of $Z$ (21), $X^{\rm T} L^{\rm T} Z$ is

$X^{\rm T} L^{\rm T} Z = X^{\rm T} L^{\rm T} P^{f-1} S$,   (26)

whose right hand side can be found from (5). Let the columns of a new matrix, $S_{-t}$, be those $K$ vectors at time $-t$ that evolve into the columns of $S$ (the columns of $S_{-t}$ are the states $\delta x_k(-t)$ in (5)):

$S = M_{0\leftarrow-t} S_{-t}$.   (27)

This is useful in the derivation to follow. Write (5) in complete matrix form:

$M_{0\leftarrow-t}^{\rm T} W^{-1} M_{0\leftarrow-t} S_{-t} = P^{a-1} S_{-t} \Lambda$,   (28)

where $\Lambda$ is the diagonal matrix of $\lambda_k$. Also important is the propagation of the error covariances (ignoring the model error contribution):

$P^f = M_{0\leftarrow-t} P^a M_{0\leftarrow-t}^{\rm T}$.   (29)

These equations can be manipulated to give the matrix in (26) required to complete (25). Starting from (28):

$P^a M_{0\leftarrow-t}^{\rm T} W^{-1} S \Lambda^{-1} = S_{-t}$,

$M_{0\leftarrow-t} P^a M_{0\leftarrow-t}^{\rm T} W^{-1} S \Lambda^{-1} = M_{0\leftarrow-t} S_{-t}$,

$P^f W^{-1} S \Lambda^{-1} = S$,

$W^{-1} S \Lambda^{-1} = P^{f-1} S$,

$X^{\rm T} L^{\rm T} W^{-1} S \Lambda^{-1} = X^{\rm T} L^{\rm T} P^{f-1} S$,
$= X^{\rm T} L^{\rm T} Z$ by (26),

$P^{f-1}_\chi = X^{\rm T} L^{\rm T} W^{-1} S \Lambda^{-1} (\hat{I} X^{\rm T} L^{-1} S)^{-1}$ by (25).   (30)

The right hand side of (30) is known, and thus all relevant elements of the background cost function (19) are now calculable given the HSV results.

7. Completing the calculation with information from the ensemble

The EMs are now used to determine the first $K$ columns of $P^{f-1}_\chi$. Let the columns of $S$ contain the $L$ raw EMs ($L \ge K$). The forecast error covariance matrix in state space is then

$P^f = L^{-1} S S^{\rm T}$,   (31)

which is too large to compute explicitly. The transform between the ensemble in the orthogonalized ensemble subspace and the state space is

$S = \hat{S} S_{\rm sub}$,   (32)

where $\hat{S}$ is the $N \times L$ orthogonalized matrix of EMs, as used in (6), and $S_{\rm sub}$ is the $L \times L$ matrix of EMs in the orthogonalized ensemble representation. The orthogonality of $\hat{S}$ is specified in (7). The inverse of (32) is

$S_{\rm sub} = \hat{S}^{\rm T} W^{-1} S$.   (33)

The ensemble forecast error covariance matrix in EM subspace is

$P^f_L = L^{-1} S_{\rm sub} S_{\rm sub}^{\rm T}$,   (34)

which is easily calculable, as are its eigenvectors and eigenvalues. For reference, the relationship between $P^f$ in (31) and $P^f_L$ in (34) is, using (32),

$P^f = \hat{S} P^f_L \hat{S}^{\rm T}$.   (35)

Interest here is in the $K$ eigenvectors of $P^f_L$ with the largest eigenvalues, where $0 < K \le L$. These are used to define the special $K$-dimensional subspace. First let $U_L$ be the $L \times L$ matrix of all $L$ eigenvectors, where

$U_L^{\rm T} U_L = I$ and $U_L U_L^{\rm T} = I$,   (36)

and let $\Lambda_L$ be the diagonal $L \times L$ matrix of eigenvalues. Equation (34) may be decomposed as

$P^f_L = U_L \Lambda_L U_L^{\rm T}$.   (37)

Similarly, let $U_K$ be the $L \times K$ matrix of the $K$ eigenvectors with the largest eigenvalues, where

$U_K^{\rm T} U_K = I$.   (38)

The following also holds, as long as it acts only on vectors entirely in the column space of $U_K$:

$U_K U_K^{\rm T} = I$.   (39)

The $K$ largest eigenvalues are assembled into the diagonal matrix $\Lambda_K$. The $L \times L$ covariance matrix, but of rank $K$, is then (by analogy to (37))

$P^f_K = U_K \Lambda_K U_K^{\rm T}$.   (40)

These $K$ eigenvectors may be projected back to model space using (6):

$U = \hat{S} U_K$,   (41)

where $\hat{S}$ (as explained in Sec.
3) is the $N \times L$ matrix of orthogonalized ensemble members. The inverse of (41) makes use of (7):
$U_K = \hat{S}^{\rm T} W^{-1} U$.   (42)

The projections of these $K$ eigenvectors to model space span the special subspace, and so the $K$ columns of $U$ can be used to define the Householder matrix $X$. Recall the two bullet points at the end of Sec. 5, where $\delta x_s$ is a vector that lies in this special space. Appendix C shows how $X$ can be constructed from $U$. The aim of the following procedure is to use information from the EMs to define the first $K$ columns of $P^{f-1}_\chi$, as defined in (18) and used in (19). In a similar fashion to the HSV approach in Sec. 6, let

$Z = P^{f-1} U = P^{f-1} L X X^{\rm T} L^{-1} U$,   (43)

$X^{\rm T} L^{\rm T} Z = X^{\rm T} L^{\rm T} P^{f-1} L X X^{\rm T} L^{-1} U = P^{f-1}_\chi \hat{I} X^{\rm T} L^{-1} U$,   (44)

where $\hat{I}$ is the $K \times N$ non-square quasi-identity matrix defined by (24). This has to be included because $P^{f-1}_\chi$ is defined only with the first $K$ columns. Equation (44) leads to

$P^{f-1}_\chi = X^{\rm T} L^{\rm T} Z \, (\hat{I} X^{\rm T} L^{-1} U)^{-1}$,   (45)

where the operator inverted is a calculable $K \times K$ matrix, which we assume is non-singular. Note that (45) is for only part of the inverse covariance matrix and so is not symmetric. The matrix not yet known is $X^{\rm T} L^{\rm T} Z$, which is now found from the EMs (Sec. 3). The $P^f$ that is of interest exists only in the $K$-dimensional subspace. The $P^f$ that exists in the $L$-dimensional subspace is (35). This can be modified to exist in the $K$-dimensional subspace by replacing $P^f_L$ in (35) with $P^f_K$ found from (40):

$P^f = \hat{S} P^f_L \hat{S}^{\rm T} \approx \hat{S} P^f_K \hat{S}^{\rm T} = \hat{S} U_K \Lambda_K U_K^{\rm T} \hat{S}^{\rm T}$.   (46)

Equation (46) is developed as follows (steps marked with a * need special note - see below):

$I = \hat{S} U_K \Lambda_K U_K^{\rm T} \hat{S}^{\rm T} P^{f-1}$,

$\hat{S}^{\rm T} W^{-1} = U_K \Lambda_K U_K^{\rm T} \hat{S}^{\rm T} P^{f-1}$,

$U_K^{\rm T} \hat{S}^{\rm T} W^{-1} = \Lambda_K U_K^{\rm T} \hat{S}^{\rm T} P^{f-1}$,

$\Lambda_K^{-1} U_K^{\rm T} \hat{S}^{\rm T} W^{-1} = U_K^{\rm T} \hat{S}^{\rm T} P^{f-1}$,

$U_K \Lambda_K^{-1} U_K^{\rm T} \hat{S}^{\rm T} W^{-1} = \hat{S}^{\rm T} P^{f-1}$.   (47*)

The following, derived from (7) and (39), holds as long as it acts only on vectors entirely in the column space of $\hat{S}$:

$\hat{S} \hat{S}^{\rm T} W^{-1} = I$.   (48*)

When used with (47) this gives

$U_K \Lambda_K^{-1} U_K^{\rm T} \hat{S}^{\rm T} W^{-1} = \hat{S}^{\rm T} W^{-1} W P^{f-1}$,

$\hat{S} U_K \Lambda_K^{-1} U_K^{\rm T} \hat{S}^{\rm T} W^{-1} = W P^{f-1}$,

$W^{-1} \hat{S} U_K \Lambda_K^{-1} U_K^{\rm T} \hat{S}^{\rm T} W^{-1} = P^{f-1}$.   (49)
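The sense in which (49) inverts the rank-$K$ covariance (46) can be illustrated numerically. The following is a minimal numpy sketch, again assuming $W = I$ and using a QR factorization in place of the Gram-Schmidt procedure of App. A (an equivalent orthogonalization); the variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, K = 12, 6, 3      # here L is the ensemble size (not the transform L)

S = rng.standard_normal((N, L))     # raw ensemble perturbations
S_hat, _ = np.linalg.qr(S)          # orthonormal columns (W = I), cf. (7)

S_sub = S_hat.T @ S                 # cf. (33)
PfL = S_sub @ S_sub.T / L           # cf. (34)
evals, UL = np.linalg.eigh(PfL)     # ascending eigenvalues
UK = UL[:, ::-1][:, :K]             # K leading eigenvectors
LamK = evals[::-1][:K]

Pf = S_hat @ UK @ np.diag(LamK) @ UK.T @ S_hat.T        # rank-K P^f, cf. (46)
G = S_hat @ UK @ np.diag(1.0 / LamK) @ UK.T @ S_hat.T   # candidate inverse, cf. (49)

# G P^f acts as the identity only on the special K-dimensional subspace
v = Pf @ rng.standard_normal(N)        # a vector in the range of P^f
print(np.allclose(G @ Pf @ v, v))      # inverted on the subspace
print(np.allclose(G @ Pf, np.eye(N)))  # but not a full-space inverse
```

Here $G P^f$ is a rank-$K$ projector rather than the full identity, which is exactly the restriction that the starred steps encode.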
The steps marked with a (*) require special attention, as these statements are appropriate only under special circumstances. This may require some checking. Equation (49) is used in the definition of $Z$ (43), which is itself used for the definition of $P^{f-1}_\chi$ in (45). This gives

$P^{f-1}_\chi = X^{\rm T} L^{\rm T} W^{-1} \hat{S} U_K \Lambda_K^{-1} U_K^{\rm T} \hat{S}^{\rm T} W^{-1} U \, (\hat{I} X^{\rm T} L^{-1} U)^{-1}$,   (50)

which is an $N \times K$ matrix, as required.

8. Differentiating the cost function J

The background part of the cost function, (19), is now defined, where $P^{f-1}_\chi$ is found depending upon whether the HSV or the EM source of information is used. For the HSV calculation, $P^{f-1}_\chi$ is found from (30), and for the EM calculation, $P^{f-1}_\chi$ is found from (50). For VAR, the derivatives of $J_b$ and $J_o$ (the observation part of the cost function) are required with respect to each component of the control vector $\chi$. Recall that the control vector comprises two parts: one that describes the special subspace, $\chi_s$ (non-zero only in the first $K$ components), and another that describes the remainder, $\chi_{\bar{s}}$ (non-zero only in the last $N - K$ components):

$\chi = \chi_s + \chi_{\bar{s}}$,   (51)

and the gradient vector, $\partial J / \partial \chi$, has a similar structure (i.e. $\partial J / \partial \chi_s$ is non-zero only in the first $K$ components and $\partial J / \partial \chi_{\bar{s}}$ is non-zero only in the last $N - K$ components). $J$ is the total cost function, which is the sum of the background part, $J_b$ in (19), and an observation part, $J_o$. Let the three terms defining $J_b$ in (19) be written separately:

$J = J_{\rm I} + J_{\rm II} + J_{\rm III} + J_o$,   (52)

where

$J_{\rm I} = \frac{1}{2} (\chi_s - \chi^b_s)^{\rm T} P^{f-1}_\chi (\chi_s - \chi^b_s)$,   (53)

$J_{\rm II} = \alpha (\chi_{\bar{s}} - \chi^b_{\bar{s}})^{\rm T} P^{f-1}_\chi (\chi_s - \chi^b_s)$,   (54)

and

$J_{\rm III} = \frac{1}{2} (\chi_{\bar{s}} - \chi^b_{\bar{s}})^{\rm T} (\chi_{\bar{s}} - \chi^b_{\bar{s}})$.   (55)

It is assumed that $\partial J_o / \partial \delta x$ has already been found, e.g. by the conventional adjoint method (e.g. Rodgers, 200x; Bannister, 200x). There are eight contributions to the gradient vector, viz.
$\partial J_{\rm I} / \partial \chi_s$, $\partial J_{\rm I} / \partial \chi_{\bar{s}}$, $\partial J_{\rm II} / \partial \chi_s$, $\partial J_{\rm II} / \partial \chi_{\bar{s}}$, $\partial J_{\rm III} / \partial \chi_s$, $\partial J_{\rm III} / \partial \chi_{\bar{s}}$, $\partial J_o / \partial \chi_s$, and $\partial J_o / \partial \chi_{\bar{s}}$, where each contributes to the total gradient vector in the following way:

$\frac{\partial J}{\partial \chi} = \frac{\partial J_{\rm I}}{\partial \chi_s} + \frac{\partial J_{\rm I}}{\partial \chi_{\bar{s}}} + \frac{\partial J_{\rm II}}{\partial \chi_s} + \frac{\partial J_{\rm II}}{\partial \chi_{\bar{s}}} + \frac{\partial J_{\rm III}}{\partial \chi_s} + \frac{\partial J_{\rm III}}{\partial \chi_{\bar{s}}} + \frac{\partial J_o}{\partial \chi_s} + \frac{\partial J_o}{\partial \chi_{\bar{s}}}$.   (56)

Note that derivatives with respect to $\chi_s$ give a vector that is non-zero only in the first $K$ components, and so imply differentiating with respect to the first $K$ components of $\chi$ only. Similarly, derivatives with respect to $\chi_{\bar{s}}$ give a vector that is non-zero only in the last $N - K$ components, and so imply differentiating with respect to the last $N - K$ components of $\chi$ only. It will be useful to expand out the matrix notation in (53)-(55) as follows (in the following, ignore the $\chi^b_s$ and $\chi^b_{\bar{s}}$ terms for simplicity; they can be put back in later with no loss of generality). Whether we are interested in $\chi_s$ or $\chi_{\bar{s}}$ will be controlled by the range of the index ($1$ to $K$ or $K+1$ to $N$ respectively):

$J_{\rm I} = \frac{1}{2} \sum_{i=1}^{K} (\chi_s)_i \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} (\chi_s)_j = \frac{1}{2} \sum_{i=1}^{K} \chi_i \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} \chi_j$,   (57)
$J_{\rm II} = \alpha \sum_{i=K+1}^{N} (\chi_{\bar{s}})_i \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} (\chi_s)_j = \alpha \sum_{i=K+1}^{N} \chi_i \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} \chi_j$,   (58)

$J_{\rm III} = \frac{1}{2} \sum_{i=K+1}^{N} \chi_i^2$.   (59)

Each contribution in (56) is now addressed in turn.

Contribution (i): $\partial J_{\rm I} / \partial \chi_s$. Differentiating (57) with respect to $\chi_k$ (components $1 \le k \le K$) gives

$\frac{\partial J_{\rm I}}{\partial (\chi_s)_k} = \frac{1}{2} \sum_{i=1}^{K} \delta_{ik} \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} \chi_j + \frac{1}{2} \sum_{i=1}^{K} \chi_i \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} \delta_{kj} = \frac{1}{2} \sum_{j=1}^{K} (P^{f-1}_\chi)_{kj} \chi_j + \frac{1}{2} \sum_{i=1}^{K} \chi_i (P^{f-1}_\chi)_{ik} = \sum_{j=1}^{K} (P^{f-1}_\chi)_{kj} \chi_j$.   (60)

Contribution (ii): $\partial J_{\rm I} / \partial \chi_{\bar{s}}$. Differentiating (57) with respect to $\chi_k$ (components $K+1 \le k \le N$) gives

$\frac{\partial J_{\rm I}}{\partial (\chi_{\bar{s}})_k} = 0$.   (61)

Contribution (iii): $\partial J_{\rm II} / \partial \chi_s$. Differentiating (58) with respect to $\chi_k$ (components $1 \le k \le K$) gives

$\frac{\partial J_{\rm II}}{\partial (\chi_s)_k} = \alpha \sum_{i=K+1}^{N} \chi_i \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} \delta_{jk} = \alpha \sum_{i=K+1}^{N} \chi_i (P^{f-1}_\chi)_{ik}$.   (62)

Contribution (iv): $\partial J_{\rm II} / \partial \chi_{\bar{s}}$. Differentiating (58) with respect to $\chi_k$ (components $K+1 \le k \le N$) gives

$\frac{\partial J_{\rm II}}{\partial (\chi_{\bar{s}})_k} = \alpha \sum_{i=K+1}^{N} \delta_{ik} \sum_{j=1}^{K} (P^{f-1}_\chi)_{ij} \chi_j = \alpha \sum_{j=1}^{K} (P^{f-1}_\chi)_{kj} \chi_j$.   (63)

Contribution (v): $\partial J_{\rm III} / \partial \chi_s$. Differentiating (59) with respect to $\chi_k$ (components $1 \le k \le K$) gives

$\frac{\partial J_{\rm III}}{\partial (\chi_s)_k} = 0$.   (64)

Contribution (vi): $\partial J_{\rm III} / \partial \chi_{\bar{s}}$. Differentiating (59) with respect to $\chi_k$ (components $K+1 \le k \le N$) gives

$\frac{\partial J_{\rm III}}{\partial (\chi_{\bar{s}})_k} = \sum_{i=K+1}^{N} \chi_i \delta_{ik} = \chi_k$.   (65)

Contributions (vii) and (viii): $\partial J_o / \partial \chi_s$ and $\partial J_o / \partial \chi_{\bar{s}}$.
For this term it is not necessary to distinguish between parts of the state vector. Using the definition of the control variable transform in (13), it can be expanded as

$\delta x_m = \sum_{n=1}^{N} L_{mn} \sum_{p=1}^{N} X_{np} \chi_p$.   (66)

Therefore

$\frac{\partial (\delta x_m)}{\partial \chi_q} = \sum_{n=1}^{N} L_{mn} \sum_{p=1}^{N} X_{np} \delta_{pq} = \sum_{n=1}^{N} L_{mn} X_{nq}$.   (67)

Using the chain rule, and then feeding in (67), gives

$\frac{\partial J_o}{\partial \chi_q} = \sum_{m=1}^{N} \frac{\partial J_o}{\partial (\delta x_m)} \frac{\partial (\delta x_m)}{\partial \chi_q} = \sum_{m=1}^{N} \frac{\partial J_o}{\partial (\delta x_m)} \sum_{n=1}^{N} L_{mn} X_{nq} = \sum_{n=1}^{N} X^{\rm T}_{qn} \sum_{m=1}^{N} L^{\rm T}_{nm} \frac{\partial J_o}{\partial (\delta x_m)}$.   (68)

Equation (68) is equivalent to the matrix operation

$\frac{\partial J_o}{\partial \chi} = X^{\rm T} L^{\rm T} \frac{\partial J_o}{\partial (\delta x)}$,   (69)

which is the standard result of the adjoint of the control variable transform.

References

Bannister R.N., 2001, Elementary 4d-Var, DARC Technical Report No. 2, 16 pp.

Bannister R.N., 2008, A review of forecast error covariance statistics in atmospheric variational data assimilation. I: Characteristics and measurements of forecast error covariances, Q.J.R. Meteorol. Soc., 134, 1951-1970.

Fisher M., 1998, Development of a simplified Kalman filter, ECMWF Tech. Memo. 260 [available from ECMWF, Shinfield Park, Reading, Berkshire, RG2 9AX, UK].

Fisher M., 2007, The sensitivity of analysis errors to the specification of background error covariances, pp. 27-36 of ECMWF Workshop Proceedings on Flow-dependent Aspects of Data Assimilation, 11-13 June 2007, ECMWF, Reading, UK.

Press W.H., Teukolsky S.A., Vetterling W.T., Flannery B.P., 1986, Numerical Recipes in Fortran, Cambridge University Press.

Rodgers C.D., Inverse Methods for Atmospheric Sounding: Theory and Practice (Sec. 8.2.3), Series on Atmospheric, Oceanic and Planetary Physics, Vol. 2, World Scientific.
Appendix A: the Gram-Schmidt procedure

The creation of an orthogonalized set of vectors, $\hat{s}_i$, from a set that is made up of non-orthogonal vectors, $s_i$, may be performed by the Gram-Schmidt procedure. The space spanned by each set of vectors is the same, but the orthogonalized set is more convenient to work with. Let orthogonalization be performed with respect to the $W^{-1}$ inner product. Let the first vector of the orthogonalized set, $\hat{s}_1$, be the first vector of the non-orthogonal set, $s_1$, but normalized to have unit length (under the $W^{-1}$ norm):

$\hat{s}_1 = \frac{1}{N_1} s_1$,   (A.1)

$\hat{s}_1^{\rm T} W^{-1} \hat{s}_1 = 1$,   (A.2)

$N_1 = \sqrt{s_1^{\rm T} W^{-1} s_1}$.   (A.3)

For $i \ge 1$, the $(i+1)$th orthogonalized vector is defined as the $(i+1)$th non-orthogonal vector minus a linear combination of all previously defined vectors (and normalized):

$\hat{s}_{i+1} = \frac{1}{N_{i+1}} \left( s_{i+1} - \sum_{j=1}^{i} \alpha_{j,i+1} \hat{s}_j \right)$.   (A.4)

The coefficients, $\alpha_{j,i+1}$, are chosen for orthogonality as follows:

$\hat{s}_j^{\rm T} W^{-1} \hat{s}_{i+1} = \delta_{j,i+1}$.   (A.5)

Performing an inner product of (A.4) with $\hat{s}_k$ (where $1 \le k \le i$) gives

$\hat{s}_k^{\rm T} W^{-1} \hat{s}_{i+1} = \frac{1}{N_{i+1}} \left( \hat{s}_k^{\rm T} W^{-1} s_{i+1} - \sum_{j=1}^{i} \alpha_{j,i+1} \hat{s}_k^{\rm T} W^{-1} \hat{s}_j \right) = 0$.   (A.6)

This is set equal to zero by (A.5) and by the fact that $k \ne i+1$. Further use of (A.5) leads to

$\hat{s}_k^{\rm T} W^{-1} s_{i+1} - \alpha_{k,i+1} = 0$, i.e. $\alpha_{j,i+1} = \hat{s}_j^{\rm T} W^{-1} s_{i+1}$.   (A.7)

This determines the coefficients. Let $t_{i+1}$ be the part of (A.4) inside the brackets (i.e. the unnormalized vector):

$\hat{s}_{i+1} = \frac{1}{N_{i+1}} t_{i+1}$,   (A.8)

where

$t_{i+1} = s_{i+1} - \sum_{j=1}^{i} \alpha_{j,i+1} \hat{s}_j$,   (A.9)

which can be calculated now that the coefficients are known (A.7). $N_{i+1}$ then follows in a similar way to (A.2) and (A.3):

$\hat{s}_{i+1}^{\rm T} W^{-1} \hat{s}_{i+1} = 1$,   (A.10)

$N_{i+1} = \sqrt{t_{i+1}^{\rm T} W^{-1} t_{i+1}}$.   (A.11)

Appendix B: Proof of the orthogonality of the residual in (9)

In these notes a vector in model space, $\delta x$, is divided into a part that is spanned by the columns of $\hat{S}$, $\delta x_s$, and a residual, $\delta x_{\bar{s}}$, given by (9). Here it is proved that any vector that exists only in the
spanned space is '$W^{-1}$-orthogonal' to one that exists only in the residual. Mathematically, it is to be shown that

$\delta x_{\bar{s}}^{\rm T} W^{-1} \delta x_s = 0$.   (B.1)

First eliminate $\delta x_{\bar{s}}$ using (9):

$\delta x_{\bar{s}}^{\rm T} W^{-1} \delta x_s = (\delta x - \delta x_s)^{\rm T} W^{-1} \delta x_s$.   (B.2)

$\delta x_s$ can be written in terms of $\delta x$ by combining (6) and (8):

$\delta x_s = \hat{S} \hat{S}^{\rm T} W^{-1} \delta x$.   (B.3)

Substituting (B.3) into (B.2) gives

$\delta x_{\bar{s}}^{\rm T} W^{-1} \delta x_s = (\delta x - \hat{S} \hat{S}^{\rm T} W^{-1} \delta x)^{\rm T} W^{-1} \hat{S} \hat{S}^{\rm T} W^{-1} \delta x = \delta x^{\rm T} W^{-1} \hat{S} \hat{S}^{\rm T} W^{-1} \delta x - \delta x^{\rm T} W^{-1} \hat{S} \hat{S}^{\rm T} W^{-1} \hat{S} \hat{S}^{\rm T} W^{-1} \delta x$.   (B.4)

Next use the orthogonality of the subspace (7), which shows that the two terms in (B.4) are equal and opposite, thus giving zero and proving (B.1):

$\delta x_{\bar{s}}^{\rm T} W^{-1} \delta x_s = \delta x^{\rm T} W^{-1} \hat{S} \hat{S}^{\rm T} W^{-1} \delta x - \delta x^{\rm T} W^{-1} \hat{S} \hat{S}^{\rm T} W^{-1} \delta x = 0$.   (B.5)

Appendix C: Design of the sequence of Householder transforms

It is now shown how $X$ can be formulated to achieve property (20). Fisher (1998) states that this can be achieved with a sequence of Householder transformations. A single Householder transformation, $P$ (e.g. Press et al., 1986), may be written as follows:

$P = I - 2 \frac{u u^{\rm T}}{u^{\rm T} u}$,   (C.1)

where

$u = x - |x| e_1$.   (C.2)

The vector $x$ is arbitrary and $e_1$ is a vector which is zero valued except for the first element, which has unit value (the properties of the Householder transformation hold for a general single element being chosen instead, although here we always choose the first element). $P$ is useful because it has the following properties. The Householder transformation is orthogonal:

$P P^{\rm T} = \left( I - 2 \frac{u u^{\rm T}}{u^{\rm T} u} \right) \left( I - 2 \frac{u u^{\rm T}}{u^{\rm T} u} \right) = I - 4 \frac{u u^{\rm T}}{u^{\rm T} u} + 4 \frac{u (u^{\rm T} u) u^{\rm T}}{(u^{\rm T} u)(u^{\rm T} u)} = I - 4 \frac{u u^{\rm T}}{u^{\rm T} u} + 4 \frac{u u^{\rm T}}{u^{\rm T} u} = I$.   (C.3)

When acting on the state $x$, which is used to define $P$ in (C.1) and (C.2), the result is a vector with all but the first element zero:

$P x = \left( I - 2 \frac{u u^{\rm T}}{u^{\rm T} u} \right) x = \left( I - 2 \frac{u (x - |x| e_1)^{\rm T}}{2 x^{\rm T} x - 2 |x| x_1} \right) x$
$= x - \frac{2 u (x^{\rm T} x - |x| x_1)}{2 x^{\rm T} x - 2 |x| x_1} = x - u = |x| e_1$.   (C.4)

When acting on a state $x'$, which is orthogonal to the state $x$ used to define $P$ in (C.1) and (C.2), the result is a vector with zero in the first element. Note that

$u^{\rm T} x' = x^{\rm T} x' - |x| e_1^{\rm T} x' = -|x| x'_1$,   (C.5)

then

$P x' = \left( I - 2 \frac{u u^{\rm T}}{u^{\rm T} u} \right) x' = x' + \frac{2 |x| x'_1 (x - |x| e_1)}{2 x^{\rm T} x - 2 |x| x_1} = x' + \frac{x'_1 (x - |x| e_1)}{|x| - x_1}$.   (C.6)

Equation (C.6) does not have weight in element 1. To show this, do a scalar product with $e_1$:

$e_1^{\rm T} P x' = x'_1 + \frac{x'_1 (x_1 - |x|)}{|x| - x_1} = x'_1 - x'_1 = 0$.   (C.7)

In these equations, $x_1$ and $x'_1$ are the first components of $x$ and $x'$ respectively. The first property gives $P^{\rm T} = P = P^{-1}$. These properties can be combined to give $X$ in the following way. Defining $R^{(0)} = L^{-1} S$ for the HSV calculation or $R^{(0)} = L^{-1} U$ for the EM calculation, let $X^{\rm T} R^{(0)}$ be a matrix of two parts:

$X^{\rm T} R^{(0)} = \begin{pmatrix} A \\ 0 \end{pmatrix}$,   (C.8)

where $A$ is a $K \times K$ matrix consistent with the required property (20). In fact, by the way that $X$ is to be formed, $A$ will turn out to be upper triangular. Let

$X^{\rm T} R^{(0)} = P_K \cdots P_k \cdots P_2 P_1 R^{(0)}$.   (C.9)

Each $P_l$ transformation is Householder-like according to the following, e.g. for $P_1$:

$P_1 R^{(0)} = \left( I - 2 \frac{u u^{\rm T}}{u^{\rm T} u} \right) R^{(0)}$, with $u = r^{(0)}_1 - |r^{(0)}_1| e^{(N)}_1$,   (C.10)

where $r^{(0)}_1$ is the first column of $R^{(0)}$ and $e^{(N)}_1$ is the $N$-element vector with all but the first element zero (which is unity). This generates a new matrix $R^{(1)} = P_1 R^{(0)}$, which has the form

$R^{(1)} = \begin{pmatrix} r^{(1)}_{11} & r^{(1)}_{12} & r^{(1)}_{13} & \cdots & r^{(1)}_{1K} \\ 0 & r^{(1)}_{22} & r^{(1)}_{23} & \cdots & r^{(1)}_{2K} \\ 0 & r^{(1)}_{32} & r^{(1)}_{33} & \cdots & r^{(1)}_{3K} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & r^{(1)}_{k2} & r^{(1)}_{k3} & \cdots & r^{(1)}_{kK} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & r^{(1)}_{N2} & r^{(1)}_{N3} & \cdots & r^{(1)}_{NK} \end{pmatrix}$,   (C.11)
having only the first element of the first column non-zero (since $P_1$ is designed in terms of the first column of $R^{(0)}$). The aim now is to act with an $(N-1) \times (N-1)$ element Householder operator on $R^{(1)}$, excluding the first row and first column:

$P_2 R^{(1)} = \begin{pmatrix} 1 & 0 \\ 0 & I - 2 \dfrac{u u^{\rm T}}{u^{\rm T} u} \end{pmatrix} R^{(1)}$, with $u = r^{(1)}_2 - |r^{(1)}_2| e^{(N-1)}_1$,   (C.12)

where the partitioned-off (lower right) part of $P_2$ is an $(N-1) \times (N-1)$ matrix, $r^{(1)}_2$ is the $(N-1)$-element second column of $R^{(1)}$ (excluding the first component), and $e^{(N-1)}_1$ is the $(N-1)$-element vector with all but the first element zero (which is unity). This generates a new matrix $R^{(2)} = P_2 R^{(1)}$, which has the form

$R^{(2)} = \begin{pmatrix} r^{(1)}_{11} & r^{(1)}_{12} & r^{(1)}_{13} & \cdots & r^{(1)}_{1K} \\ 0 & r^{(2)}_{22} & r^{(2)}_{23} & \cdots & r^{(2)}_{2K} \\ 0 & 0 & r^{(2)}_{33} & \cdots & r^{(2)}_{3K} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & r^{(2)}_{k3} & \cdots & r^{(2)}_{kK} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & r^{(2)}_{N3} & \cdots & r^{(2)}_{NK} \end{pmatrix}$.   (C.13)

The $k$th operator, $P_k$, has the form

$P_k R^{(k-1)} = \begin{pmatrix} I & 0 \\ 0 & I - 2 \dfrac{u u^{\rm T}}{u^{\rm T} u} \end{pmatrix} R^{(k-1)}$, with $u = r^{(k-1)}_k - |r^{(k-1)}_k| e^{(N-k+1)}_1$,   (C.14)

where the partitioned-off part of $P_k$ is an $(N-k+1) \times (N-k+1)$ matrix, $r^{(k-1)}_k$ is the $(N-k+1)$-element $k$th column of $R^{(k-1)}$ (excluding the first $k-1$ components) and $e^{(N-k+1)}_1$ is the $(N-k+1)$-element vector with all but the first element zero (which is unity). After all $K$ operators have acted, the result is

$R^{(K)} = X^{\rm T} R^{(0)} = \begin{pmatrix} r^{(1)}_{11} & r^{(1)}_{12} & r^{(1)}_{13} & \cdots & r^{(1)}_{1K} \\ 0 & r^{(2)}_{22} & r^{(2)}_{23} & \cdots & r^{(2)}_{2K} \\ 0 & 0 & r^{(3)}_{33} & \cdots & r^{(3)}_{3K} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & r^{(K)}_{KK} \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \end{pmatrix}$,   (C.15)

where the top section is the matrix $A$ in (C.8), and the bottom section comprises zeros. It should also be shown that $X X^{\rm T} = I$. From (C.9) this is easy to show, given that each pair has the property that $P_k P_k = I$:

$P_K \cdots P_k \cdots P_2 P_1 P_1 P_2 \cdots P_k \cdots P_K = I$.   (C.16)

It remains to be shown that the string of Householder operators $P_K \cdots P_k \cdots P_2 P_1$ acting on a state, $r^{(0)}$ (which is orthogonal to all columns of $R^{(0)}$), gives a state that is zero in the first $K$ elements. All $P_k$ are formed in the same way as shown above (i.e. with respect to the $R^{(k)}$ matrices). First let $r^{(1)} = P_1 r^{(0)}$.
Since $r^{(0)}$ is orthogonal to $r^{(0)}_1$ (the latter is the vector used to define $P_1$ in (C.10)), then by property (C.7) the vector $r^{(1)}$ has zero in the first element. Next, let
$r^{(2)} = P_2 r^{(1)}$. By a similar argument, if the vector formed from the last $N-1$ components of $r^{(1)}$ is orthogonal to the vector formed from the last $N-1$ components of $r^{(1)}_2$ (the latter is the vector used to define $P_2$ in (C.12)), then by property (C.7) the vector $r^{(2)}$ will have zero in the first two elements. Because the first element of $P_1 r^{(0)}$ is always zero, the remaining $(N-1)$-component inner product in question is equal to the full $N$-component inner product, as the first element contributes zero. The orthogonality test is therefore satisfied if the following $N$-component inner product is zero:

$(P_1 r^{(0)})^{\rm T} P_1 r^{(0)}_2 = r^{(0){\rm T}} P_1^{\rm T} P_1 r^{(0)}_2 = r^{(0){\rm T}} r^{(0)}_2 = 0$.   (C.17)

This is satisfied because the vector $r^{(0)}$ is orthogonal to $r^{(0)}_2$ by definition. These arguments continue for all $K$ operators. The final result is a vector of zeros in the first $K$ components.
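The construction above amounts to accumulating the Householder reflections of a (partial) QR factorization of $R^{(0)}$. A minimal numpy sketch, using ordinary Euclidean orthogonality and my own variable names, checks the three claimed properties: the upper-triangular-over-zeros form (C.15), the orthogonality (C.16), and the zero-in-first-$K$-elements result for a vector orthogonal to all columns of $R^{(0)}$.

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 10, 3
R0 = rng.standard_normal((N, K))   # stand-in for R^(0) = L^{-1} S (or L^{-1} U)

Xt = np.eye(N)                     # accumulates X^T = P_K ... P_2 P_1, cf. (C.9)
R = R0.copy()
for k in range(K):
    x = R[k:, k]
    u = x.copy()
    u[0] -= np.linalg.norm(x)      # u = r_k - |r_k| e_1, cf. (C.14)
    if np.linalg.norm(u) > 1e-12:  # skip if the column is already in the right form
        P = np.eye(N)
        P[k:, k:] -= 2.0 * np.outer(u, u) / (u @ u)
        R = P @ R
        Xt = P @ Xt

print(np.allclose((Xt @ R0)[K:], 0.0))    # zeros below the first K rows, cf. (C.15)
print(np.allclose(Xt @ Xt.T, np.eye(N)))  # orthogonality, cf. (C.16)

# A vector orthogonal to every column of R^(0) is mapped to a vector with
# zeros in its first K elements (the final result of App. C)
q, _ = np.linalg.qr(np.hstack([R0, rng.standard_normal((N, 1))]))
r_perp = q[:, K]                   # orthogonal to the span of R0's columns
print(np.allclose((Xt @ r_perp)[:K], 0.0))
```

In practice $X^{\rm T}$ would never be stored as a dense $N \times N$ matrix; the $K$ reflector vectors $u$ are kept and applied in sequence, which is how library Householder-QR routines also work.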