VECTORS AND MATRICES:


Matrix: a rectangular array of elements.
Dimension: $r \times c$ means $r$ rows by $c$ columns.
Example: $A = [a_{ij}]$, $i = 1, 2, 3$; $j = 1, 2$.
In general: $A = [a_{ij}]$, $i = 1, \dots, r$; $j = 1, \dots, c$.

Multivariate observation = vector: $x = (x_1, x_2, \dots, x_p)^T$ is a multivariate observation, and $x_1, \dots, x_n$ is a sample of multivariate observations. A sample can be represented by an $n \times p$ matrix:

$$X = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ x_{21} & \dots & x_{2p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix}$$

Vector addition, transpose and multiplication.
Norm of a vector: $\|v\| = (v^T v)^{1/2}$.
Unit vector: $\|v\| = 1$.
Diagonal matrix: a square matrix with all off-diagonal elements equal to 0.
Identity matrix $I$: a diagonal matrix with all diagonal elements equal to 1. Note that $AI = IA = A$.
Scalar matrix: a diagonal matrix with all diagonal elements equal to the same scalar, i.e. $cI$.
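A quick R illustration of these definitions (the object names here are our own, chosen for demonstration):

A <- matrix(1:6, nrow = 3, ncol = 2)   # a 3 x 2 matrix A = [a_ij]
dim(A)                      # 3 2: r rows, c columns
t(A)                        # transpose, 2 x 3

v <- c(3, 4)
sqrt(sum(v * v))            # norm ||v|| = (v'v)^(1/2) = 5
u <- v / sqrt(sum(v * v))   # unit vector: ||u|| = 1

I2 <- diag(2)               # identity matrix
D  <- diag(c(2, 5))         # diagonal matrix
all.equal(D %*% I2, D)      # A I = I A = A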

Examples: $3I = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$ is a scalar matrix; $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is the identity.

Canonical basis: $\{e_1, e_2, \dots, e_p\} = \{(1,0,\dots,0)^T,\; (0,1,0,\dots,0)^T,\; \dots,\; (0,\dots,0,1)^T\}$.

Orthogonal basis, coordinate system, Gram-Schmidt: given a linearly independent vector system, convert it to an orthogonal system (details later).

Unary vector: $\mathbf{1}_n = (1, 1, \dots, 1)^T$.

Matrix multiplication. Unary matrix: $J_{n \times n} = \mathbf{1}_n \mathbf{1}_n^T$, the $n \times n$ matrix of ones; note $\mathbf{1}_n^T \mathbf{1}_n = n$.

For addition and subtraction the number of rows of the matrices must be the same and the number of columns must be the same. For multiplication the number of columns of the first matrix (left multiplier) must be the same as the number of rows of the second matrix (right multiplier). Note that $A + B = B + A$, but $AB \neq BA$ in general, as the example below illustrates.
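A small R check of these rules (the matrices are arbitrary, chosen only for illustration):

A <- matrix(c(1, 2, 3, 4), 2, 2)
B <- matrix(c(0, 1, 1, 0), 2, 2)
A + B                      # addition: dimensions must match
A %*% B                    # product: cols of left factor = rows of right factor
B %*% A                    # generally different from A %*% B

one <- rep(1, 3)           # unary vector
J <- one %*% t(one)        # unary matrix J = 1 1', a 3 x 3 matrix of ones
crossprod(one)             # 1'1 = n = 3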

Determinant of a square matrix: for a $2 \times 2$ matrix, $\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$. For larger matrices the determinant can be computed by cofactor expansion along a row or column, e.g. $\det A = \sum_j (-1)^{1+j} a_{1j} \det A_{(1j)}$, where $A_{(1j)}$ is $A$ with row 1 and column $j$ deleted.

Inverse matrix: for a square matrix $A$, $A^{-1}$ is such that $A A^{-1} = A^{-1} A = I$. $A^{-1}$ is unique and has the same rank as $A$. To have an inverse a matrix needs to be full rank (or nonsingular). To check whether a matrix is of full rank we check whether its determinant is nonzero.

Example: $2 \times 2$ matrices.

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A^{-1} = \frac{1}{D} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \qquad D = \det A = ad - bc$$

For $A = \begin{pmatrix} 2 & 4 \\ 3 & 1 \end{pmatrix}$: $D = (2)(1) - (4)(3) = -10$, so $A^{-1} = -\frac{1}{10} \begin{pmatrix} 1 & -4 \\ -3 & 2 \end{pmatrix}$.

Covariance matrix $\Sigma$: symmetric, positive definite.

Eigenvalues and eigenvectors of a covariance or correlation matrix: the multivariate normal distribution has constant density on ellipsoids. If $\Sigma$ is a $p \times p$ covariance matrix, these ellipsoids are

$$\{z : z^t \Sigma^{-1} z = c^2\}$$

Ellipsoids are also characterized by the directions and lengths of their semi-axes.
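The same computations in R, using the example matrix above:

A <- matrix(c(2, 3, 4, 1), 2, 2)   # columns (2,3) and (4,1), i.e. A = [2 4; 3 1]
det(A)                             # D = (2)(1) - (4)(3) = -10
solve(A)                           # the inverse (1/D) [d -b; -c a]
round(A %*% solve(A), 10)          # recovers the identity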

Spectral decomposition of a covariance matrix:

$$\Sigma = \sum_{i=1}^{p} \lambda_i v_i v_i^T$$

where $\lambda_i$, $i = 1, \dots, p$ is called the $i$-th eigenvalue and $v_i$, $i = 1, \dots, p$ is called the $i$-th eigenvector.

Synonyms: eigenvalue == semi-axis length == variance == characteristic root == latent root; eigenvector == semi-axis direction == principal component == characteristic vector == latent vector.

In order to find the eigenvalues and eigenvectors of $\Sigma$ we use the equations

$$\Sigma v = \lambda v, \qquad (\Sigma - \lambda I)v = 0, \qquad \det(\Sigma - \lambda I) = 0$$

PROPERTIES

(i) $\Sigma = V \Lambda V^t$, where $V$ is the matrix of eigenvectors and $\Lambda$ the diagonal matrix of eigenvalues of $\Sigma$.
(ii) The eigenvectors describe the orthogonal rotation onto the directions of maximum variance. This type of rotation is often referred to as principal-axes rotation.
(iii) The trace of the matrix is the sum of its eigenvalues $\{\lambda_i\}$.
(iv) The determinant of the matrix is the product of the $\lambda_i$.
(v) Two eigenvectors $v_i$ and $v_j$ associated with two distinct eigenvalues $\lambda_i$ and $\lambda_j$ of a symmetric matrix are mutually orthogonal: $v_i^t v_j = 0$.
(vi) Square root matrix: there are several matrices that are the square root of $\Sigma$, e.g. $\Sigma^{1/2} = V \Lambda^{1/2} V^t$, or the Cholesky factorization $\Sigma = R^T R$.
(vii) Given a set of variables $X_1, X_2, \dots, X_p$ with nonsingular covariance matrix $\Sigma$, we can always derive a set of uncorrelated variables $Y_1, Y_2, \dots, Y_p$ by a set of linear transformations corresponding to the principal-axes rotation. The covariance matrix of this new set of variables is the diagonal matrix $\Lambda = V^t \Sigma V$.
(viii) Given a set of variables $X_1, X_2, \dots, X_p$ with nonsingular covariance matrix $\Sigma_x$, a new set of variables $Y_1, Y_2, \dots, Y_p$ is defined by the transformation $Y' = X'V$, where $V$ is an orthogonal matrix. If the covariance matrix of the $Y$'s is $\Sigma_y$, then $y^T \Sigma_y^{-1} y = x^T \Sigma_x^{-1} x$: the quadratic form is invariant under rigid rotation.

Singular value decomposition: $X = UDV^T$, where $U$ ($n \times p$) is orthogonal, $D$ is diagonal, $V$ ($p \times p$) is orthogonal, and $X$ is centered.
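These properties are easy to verify numerically in R (the matrix below is an arbitrary positive definite example):

S <- matrix(c(4, 2, 2, 3), 2, 2)           # a symmetric positive definite matrix
e <- eigen(S)
V <- e$vectors; lambda <- e$values
all.equal(V %*% diag(lambda) %*% t(V), S)  # (i) spectral decomposition
c(sum(lambda), sum(diag(S)))               # (iii) trace = sum of eigenvalues
c(prod(lambda), det(S))                    # (iv) det = product of eigenvalues
Shalf <- V %*% diag(sqrt(lambda)) %*% t(V) # (vi) symmetric square root
all.equal(Shalf %*% Shalf, S)
R <- chol(S)                               # (vi) Cholesky factor: S = R'R
all.equal(t(R) %*% R, S)
svd(scale(matrix(rnorm(20), 10, 2)))       # SVD of a centered data matrix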

Linear regression problem (and solution):

$$Y_{n \times 1} = X_{n \times p}\, \beta_{p \times 1} + \varepsilon_{n \times 1} \quad \text{(Model)}$$

Find $b$ such that $\min_b \|Y - Xb\|^2$.

$$X'X\, b = X'Y \quad \text{(Normal Equations, a } p \times p \text{ system)}$$
$$b = (X'X)^{-1} X'Y \quad \text{(Least Squares Estimate)}$$

$Q$ is orthonormal if $Q'Q = I$. Rotations are orthonormal transformations:

$$R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$

Reflections are orthonormal transformations: $I - 2uu'$ (for a unit vector $u$). Sometimes people use "orthogonal" to mean orthonormal, but technically orthogonal means $Q'Q = \mathrm{diag}\{d\}$.

Suppose we apply an orthogonal transformation $Q$ to $X$ so that $X^* = QX$ is upper triangular. This is $QY = QX\beta + Q\varepsilon$, or $Y^* = X^*\beta + \varepsilon^*$, where $X^*$ has all zeroes below the diagonal and below the $p$-th row:

$$X^* = \begin{pmatrix} x^*_{11} & \cdots & x^*_{1p} \\ 0 & x^*_{22} & \cdots & x^*_{2p} \\ & & \ddots & \\ 0 & \cdots & 0 & x^*_{pp} \\ 0 & \cdots & & 0 \\ & \vdots & & \\ 0 & \cdots & & 0 \end{pmatrix} = \begin{pmatrix} X^*_1 \\ 0 \end{pmatrix}, \qquad Y^* = QY = \begin{pmatrix} Y^*_1 \\ Y^*_2 \end{pmatrix}, \qquad Q = \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix}$$

Homework 2:
1. Show that $b = (X'X)^{-1}X'Y = X^{*-1}_1 Y^*_1$, i.e. $X^*_1 b = Y^*_1$. Since $X^*_1$ is upper triangular this equation can be easily solved by back substitution.
2. Show that the transformation $HX = H_p \cdots H_2 H_1 X$ is upper triangular by showing that each individual component makes zeros below the diagonal of $X$.
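A quick numerical check of the normal-equations solution (simulated data, our own names; this is also the $b$ that the back substitution in Homework 2.1 must reproduce):

set.seed(1)
X <- matrix(rnorm(40), 10, 4)
Y <- X %*% c(1, -2, 0.5, 3) + rnorm(10, sd = 0.1)
b <- solve(t(X) %*% X, t(X) %*% Y)   # solve X'X b = X'Y
cbind(b, coef(lm(Y ~ X - 1)))        # matches R's built-in least squares

Solving the normal equations directly squares the condition number of X, which is why the QR approach of the next section is numerically preferable.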

QR decomposition: notice also that $X^*_1 = Q_1 X$, so $X = Q_1^T X^*_1$. This is called the QR decomposition of $X$.

R Code (the numeric output was lost in transcription and is omitted):

> A <- matrix(runif(12), 4)
> A
> (Q <- qr.Q(qr(A)))           # 4 x 3 matrix with orthonormal columns
> (R <- qr.R(qr(A)))           # 3 x 3 upper triangular matrix
> Q %*% R                      # reproduces A
> y = 1:4
> lsfit(A, y, intercept = FALSE)$coef
> c(solve(R, t(Q) %*% y))      # same coefficients via the QR decomposition

Householder Transformations: to construct $Q$ we use Householder transformations. Suppose we have an $n \times 1$ vector $x$; we define an orthonormal transformation $H_i$ that transforms $x$ into a vector $x^*$ with zeroes at positions $(i+1), \dots, n$:

$$H_i = I - 2\, u u^T / \|u\|^2, \qquad u = (0, \dots, 0,\; x_i + s,\; x_{i+1}, \dots, x_n)^T, \qquad s^2 = \sum_{j=i}^{n} x_j^2$$

(taking the sign of $s$ equal to the sign of $x_i$ avoids cancellation).
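A small R sketch of one Householder step (house is our own toy function, not the internal routine that qr() uses):

house <- function(x, i) {
  # builds H_i = I - 2 u u'/||u||^2 that zeroes x below position i
  n <- length(x)
  u <- rep(0, n)
  s <- sign(x[i]) * sqrt(sum(x[i:n]^2))   # sign chosen to avoid cancellation
  u[i] <- x[i] + s
  if (i < n) u[(i + 1):n] <- x[(i + 1):n]
  diag(n) - 2 * (u %*% t(u)) / sum(u^2)
}

x <- c(3, 1, 4, 1)
round(house(x, 1) %*% x, 10)   # (-s, 0, 0, 0): zeros below position 1

Applying such steps column by column, $H_p \cdots H_2 H_1 X$, is exactly the construction asked about in the homework above.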

Gram-Schmidt: given a linearly independent vector system, convert it to an orthogonal system. Let $U = \{u_1, u_2, \dots, u_p\}$ be a vector system that is independent but not orthogonal.

Step 1: Let $v_1 = u_1$.
Step 2: Let $v_2 = u_2 - \mathrm{Proj}_{W_1} u_2$ ($W_1$ is the subspace spanned by $v_1$) $= u_2 - \dfrac{\langle u_2, v_1 \rangle}{\|v_1\|^2} v_1$.
Step 3: $v_3 = u_3 - \mathrm{Proj}_{W_2} u_3$ ($W_2$ is the subspace spanned by $v_1, v_2$) $= u_3 - \dfrac{\langle u_3, v_1 \rangle}{\|v_1\|^2} v_1 - \dfrac{\langle u_3, v_2 \rangle}{\|v_2\|^2} v_2$,
and so on.

This process can also be adapted to produce the QR decomposition. The problem with Gram-Schmidt is that it becomes numerically unstable when $U$ is nearly singular. An R sketch of these steps appears at the end of this page.

Givens Rotations: notice that a simple linear transformation knocks out the second component of the vector $(a, b)^T$:

$$r = \sqrt{a^2 + b^2}, \qquad c = \cos\theta = a/r, \qquad s = \sin\theta = b/r, \qquad R = \begin{pmatrix} c & s \\ -s & c \end{pmatrix}, \qquad R \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}$$

Givens rotations are matrices $G(i, j, \theta)$ of this form: an identity matrix except for rows and columns $i$ and $j$, which hold the $2 \times 2$ rotation. As in the simple case, $G(i, j, \theta)$ can be used to knock out the $j$-th component of the $i$-th column of the $X$ matrix ($X_{ji}$). Notice that in order to make $X$ upper triangular we need as many Givens rotations as there are elements below the diagonal to be made zero.

Cholesky and SVD: alternative decompositions, discussed above.
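A direct R implementation of the steps above (classical Gram-Schmidt, our own function; it orthogonalizes but does not normalize):

gram_schmidt <- function(U) {
  # columns of U: independent vectors u_1, ..., u_p
  V <- U
  for (k in seq_len(ncol(U))[-1]) {
    for (j in 1:(k - 1)) {
      V[, k] <- V[, k] - sum(U[, k] * V[, j]) / sum(V[, j]^2) * V[, j]
    }
  }
  V   # mutually orthogonal columns spanning the same space as U
}

U <- cbind(c(1, 1, 0), c(1, 0, 1), c(0, 1, 1))
V <- gram_schmidt(U)
round(t(V) %*% V, 10)   # diagonal matrix: off-diagonal inner products are zero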

Generalization of Least Squares

GLM (General Linear Models)
- X is generated from one or more factors by using dummy variables.
- Each factor is represented by k - 1 dummies.
- The column of 1's may be dropped.
- Some columns of X must be dropped in order to avoid singularity.
- X is likely to be very sparse.
- X may be high dimensional.

The computations with GLMs need to deal with these issues. SAS and R both do a good job with these problems.

Weighted Least Squares (WLS)

$$Y_{n \times 1} = X_{n \times p}\,\beta_{p \times 1} + \varepsilon_{n \times 1}, \qquad \varepsilon \sim F(0, \sigma^2 V), \qquad V = \mathrm{diag}\{1/w_i\}$$

Setting $Y^* = V^{-1/2} Y$ and $X^* = V^{-1/2} X$ gives $Y^* = X^*\beta + \varepsilon^*$ with $\varepsilon^* \sim F(0, \sigma^2 I)$. The computations change very little since we just need to replace $X$ and $Y$ by $X^*$ and $Y^*$.

Generalized Linear Interactive Models (GLIM)

$E(Y) = \mu$; link function: $f(\mu)$ is linear in $X$, i.e. $f(\mu) = X\beta$. Variance function: $\mathrm{Var}(Y) = v(\mu)$.

Family     Link                     Mean                          Variance
Binomial   logit: log(mu/(1-mu))    mu = exp(Xb)/(1 + exp(Xb))    mu(1 - mu)
Poisson    log: log(mu)             mu = exp(Xb)                  mu
Gaussian   identity: mu             mu = Xb                       constant

Weights are inverse to the variance: $w_i = 1/v(\mu_i)$.

Iteratively Reweighted Least Squares (IRLS) algorithm:

$$\hat\mu^{(1)} = X\hat\beta^{(1)}, \qquad w_i^{(1)} = 1/v(\hat\mu_i^{(1)})$$

Use these weights in WLS to calculate $\hat\beta^{(2)}$. Iterate

$$\hat\mu^{(k)} = X\hat\beta^{(k)}, \qquad w_i^{(k)} = 1/v(\hat\mu_i^{(k)}) \;\longrightarrow\; \hat\beta^{(k+1)}$$

until it converges.
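A bare-bones IRLS sketch for a Poisson model with log link (simulated data and our own loop; the full algorithm, as implemented by R's glm(), uses the working response z and weights (dmu/deta)^2 / v(mu), which for the log link reduce to w = mu):

set.seed(2)
X <- cbind(1, runif(50))
y <- rpois(50, exp(X %*% c(0.5, 1.2)))

beta <- c(0, 0)
for (it in 1:25) {
  eta <- X %*% beta
  mu  <- as.vector(exp(eta))           # inverse link
  w   <- mu                            # IRLS weights for Poisson/log
  z   <- eta + (y - mu) / mu           # working response
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * z))   # one WLS step
}
cbind(beta, coef(glm(y ~ X[, 2], family = poisson)))  # the two columns agree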

This algorithm is very general and can be used for many problems. If $\hat V$ is the variance function estimate from the last iteration, then $\hat\sigma^2 (X' \hat V^{-1} X)^{-1}$ is an estimate of the covariance of $\hat\beta$.

Robust regression M-estimators

$$\min_\beta\; S(\beta) = \sum_i \rho\!\left(\frac{r_i}{s}\right), \qquad r_i = y_i - x_i'\beta, \quad s \text{ a scale estimate}$$

Huber's $\rho$:

$$\rho(z) = \begin{cases} z^2/2 & \text{if } |z| \le c \\ c|z| - c^2/2 & \text{if } |z| > c \end{cases}$$

Taking derivatives with respect to $\beta$:

$$\frac{\partial S}{\partial \beta_j} = -\frac{1}{s}\sum_i x_{ij}\, \psi\!\left(\frac{r_i}{s}\right) = 0, \qquad j = 1, \dots, p$$

Huber's $\psi$:

$$\psi(z) = \begin{cases} z & \text{if } |z| \le c \\ c\,\mathrm{sign}(z) & \text{if } |z| > c \end{cases}$$

The idea is to rewrite the system of equations as follows:

$$\sum_i x_{ij}\, \psi\!\left(\frac{r_i}{s}\right) = \sum_i x_{ij} \left[\frac{\psi(r_i/s)}{r_i/s}\right] \frac{r_i}{s} = \sum_i x_{ij}\, w_i\, r_i = 0, \qquad w_i = \frac{\psi(r_i/s)}{r_i/s}$$

In matrix terms this gives $X'Wr = 0$, or $X'WY = X'WX\beta$. The solution is $\hat\beta = (X'WX)^{-1} X'WY$. But of course the matrix $W$ depends on $\beta$, so we apply the same iteratively reweighted least squares algorithm of the previous section. At convergence we obtain our estimate $\hat\beta$ of $\beta$.

Nonlinear methods. Optimization methods. Maximum likelihood estimation.

$L_x(\theta) = P(X \mid \theta)$, $\; l_x(\theta) = \ln P(X \mid \theta)$. $\hat\theta$ maximizes $L_x(\theta)$, or equivalently $l_x(\theta)$. The information matrix is defined as

$$I(\theta) = -E[\,\ddot l_x(\theta)\,]$$

and asymptotically the covariance matrix of $\hat\theta$ converges to its inverse:

$$\mathrm{asy\,var}(\hat\theta) = I(\theta_0)^{-1}$$

where $\theta_0$ is the true value of the parameter.
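Huber M-estimation by IRLS in a few lines of R (a toy sketch with made-up data; in practice MASS::rlm implements this properly):

huber_psi <- function(z, c = 1.345) pmin(pmax(z, -c), c)

set.seed(3)
x <- runif(60); y <- 2 + 3 * x + rnorm(60, sd = 0.3)
y[1:3] <- y[1:3] + 10                  # three gross outliers
X <- cbind(1, x)

beta <- coef(lm(y ~ x))                # least squares start
for (it in 1:50) {
  r <- as.vector(y - X %*% beta)
  s <- median(abs(r)) / 0.6745         # robust scale estimate (MAD)
  z <- r / s
  w <- ifelse(z == 0, 1, huber_psi(z) / z)   # w_i = psi(r_i/s)/(r_i/s)
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * y))
}
beta                                   # near (2, 3) despite the outliers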

Optimization methods

Newton-Raphson: solve $f(x) = 0$ starting from an initial value close to the solution:

$$x_{i+1} = x_i - f(x_i)/f'(x_i)$$

Newton-Raphson usually converges very fast. Multivariate version, $f(x) = 0$ with $x$ ($p \times 1$) and $f'(x)$ ($p \times p$): from the expansion $0 = f(x_k) + f'(x_k)(s - x_k) + \dots$ we get

$$x_{k+1} \approx x_k - [f'(x_k)]^{-1} f(x_k)$$

For minimizing a function $F$: choose a direction $d$, do a search along that direction and find a minimum.

Steepest descent: $d_S = -F'(x)$
Newton step: $d_N = -[F''(x)]^{-1} F'(x)$
Levenberg-Marquardt: $d_{LM} = -[F''(x) + \lambda I]^{-1} F'(x)$ for $\lambda > 0$
Simplex Method (Nelder-Mead)

Use the functions optimize and optim in R. Example:

f <- function(x) ifelse(x > -1, ifelse(x < 4, exp(-1/abs(x - 1)), 10), 10)
fp <- function(x) { print(x); f(x) }
plot(f, -2, 5, ylim = 0:1, col = 2)
optimize(fp, c(-4, 20))   # doesn't see the minimum
optimize(fp, c(-7, 20))   # ok

fr <- function(x) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}
grr <- function(x) {  ## Gradient of 'fr'
  x1 <- x[1]
  x2 <- x[2]
  c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
    200 * (x2 - x1 * x1))
}
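Before continuing the Rosenbrock example with optim below, here is the scalar Newton-Raphson update coded directly (a minimal sketch, our own function):

newton <- function(f, fprime, x0, tol = 1e-10, maxit = 50) {
  x <- x0
  for (i in 1:maxit) {
    step <- f(x) / fprime(x)   # x_{i+1} = x_i - f(x_i)/f'(x_i)
    x <- x - step
    if (abs(step) < tol) break
  }
  x
}

newton(function(x) cos(x) - x, function(x) -sin(x) - 1, 1)  # root of cos(x) = x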

optim(c(-1.2, 1), fr)
optim(c(-1.2, 1), fr, grr, method = "BFGS")
optim(c(-1.2, 1), fr, NULL, method = "BFGS", hessian = TRUE)
optim(c(-1.2, 1), fr, grr, method = "CG")
optim(c(-1.2, 1), fr, grr, method = "CG", control = list(type = 2))
optim(c(-1.2, 1), fr, grr, method = "L-BFGS-B")

Non-linear least squares: remember that when the error term is additive and iid normal, NLS is the same as maximum likelihood. These are the steps of the Gauss-Newton algorithm:

$$y_i = g(x_i, \theta) + \varepsilon_i, \qquad \varepsilon \sim F(0, \sigma^2)$$

Linearizing around a current value $\theta_0$,

$$y - g(\theta_0) \approx [\nabla g(\theta_0)]\,(\theta - \theta_0) + \varepsilon,$$

which has the form $\tilde Y = \tilde X (\theta - \theta_0) + \varepsilon$ with $\tilde Y = y - g(\theta_0)$ and $\tilde X = \nabla g(\theta_0)$, the $n \times p$ matrix of derivatives $\partial g(x_i, \theta)/\partial \theta_j$.

Gauss-Newton Method:

$$\theta_1 = \theta_0 + [\,(\nabla g)^t (\nabla g)\,]^{-1} (\nabla g)^t\, (y - g(\theta_0))$$

Iterate this equality until convergence; at each iteration replace $\theta_0$ with the new $\theta_1$. (A numerical sketch for the Emax model is given at the end of this page.)

Example: Emax model

$$g(D) = E(Y \mid D) = E_0 + \frac{E_{max}\, D}{ED_{50} + D}$$

where $E_0$ is the response $Y$ at baseline (absence of dose), $E_{max}$ is the asymptotic maximum dose effect, and $ED_{50}$ is the dose which produces 50% of the maximal effect. A generalization of the equation above is the 4-parameter EMAX model

$$g(D) = E(Y \mid D) = E_0 + \frac{E_{max}\, D^\lambda}{ED_{50}^\lambda + D^\lambda}$$

where $\lambda$ is the 4th parameter, sometimes called the Hill parameter (Holford and Sheiner). The Hill parameter affects the shape of the curve and is in some cases very difficult to estimate. In the figure below we show curves for a range of Hill parameter values.

In fitting the models above the error distribution is assumed to be iid normal with zero mean and equal variance. This assumption is not in any way a restriction of the methods that we discuss here, but a convenient assumption that simplifies the issues that we address. Our model becomes

$$Y_i = g_\theta(D_i) + \varepsilon_i$$

where $\theta = (E_0, ED_{50}, E_{max}, \lambda)$ and $\varepsilon_i \sim \text{iid } N(0, \sigma^2)$.
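A hand-rolled Gauss-Newton iteration for the 3-parameter Emax model (simulated data and our own parameter names; nls(), used on the real trial data below, is the practical tool):

set.seed(4)
D <- rep(c(0, 5, 25, 50, 100), each = 20)
g <- function(th, D) th[1] + th[2] * D / (th[3] + D)  # th = (E0, Emax, ED50)
y <- g(c(2, 15, 20), D) + rnorm(length(D))

th <- c(1, 10, 30)                        # starting value theta_0
for (it in 1:20) {
  r  <- y - g(th, D)                      # working response y - g(theta_0)
  Xg <- cbind(1,                          # dg/dE0
              D / (th[3] + D),            # dg/dEmax
              -th[2] * D / (th[3] + D)^2) # dg/dED50
  th <- th + solve(t(Xg) %*% Xg, t(Xg) %*% r)   # Gauss-Newton update
}
th   # close to the true (2, 15, 20)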

Figure: Curve Shapes for Different Values of the Hill Parameter. [Plot omitted: dose-response curves for a range of Hill parameter values.]

The figure illustrates the rich variety of shapes generated by changing the Hill parameter. For $\lambda \le 1$ the curves have concave downward shapes, whereas for $\lambda > 1$ the curves have a sigmoidal shape. For very small $\lambda$ the curve represents a flat response for any dose greater than zero, whereas for very large $\lambda$ the curve approaches a step function at $ED_{50}$.

Maximum Likelihood Estimation for the EMAX model.

Under the Emax model the MLE of $\theta$ is the least squares estimator $\hat\theta$. One way to calculate $\hat\theta$ is by non-linear least squares minimization (NLS). Given the data $\{(D_i, y_i),\; i = 1, \dots, N\}$, the NLS estimator minimizes (in $\theta$) the following quantity:

$$SSE(\theta) = \sum_{i=1}^{N} \left( y_i - g_\theta(D_i) \right)^2$$

The algorithm to minimize this quantity is a special case of Newton-Raphson specialized for minimizing sums of squares. It is implemented in many standard software packages such as R, S-plus and others. The NLS implementation of the MLEs is convenient because it allows the computation of confidence intervals for the model parameters and p-values for testing the significance of the parameters.

Example: dose-response estimation in a clinical trial. In order to illustrate the issues that we address in this paper we use data from an unpublished phase II clinical trial. The study had 5 doses (0, 5, 25, 50 and 100 mg) and corresponding group sizes (n = 78, 81, 81, 81, 77). Unlike most dose-response trials, this study has about 80 patients per arm and a high signal-to-noise ratio. The figure below plots the response against the doses with 90% confidence intervals around the mean. The graph shows that it is a potentially good candidate for the EMAX model. We first fit the 3-parameter EMAX model to the data. The maximum likelihood estimates (MLEs) converge quickly, with $\hat E_{max} = 15.13$, $\hat E_0 = 2.47$, and $\widehat{ED}_{50} = \dots$

Figure: Clinical trial example with 5 doses. Red stars represent the response at a given dose and 90% CI. Solid line is the 3-parameter fit. Dashed line refers to the 4-parameter fit (using the estimates after 100 iterations).

[Figure: fitted dose-response models (3-parameter EMAX, 4-parameter EMAX, Bayesian approach, power law) plotted against dose (mg); y-axis: change in EDD.]

One concern about the fit in the figure is that the 3-parameter EMAX fit lacks the curvature necessary to fit the observed relationship well, and the residuals show an inverted-U pattern. To address this lack of fit we next fit a 4-parameter EMAX model, but in this case we encounter non-convergence of the MLEs. The estimates of $ED_{50}$ and $E_{max}$ do not converge, and the Hill parameter $\lambda$ trends toward a value less than 1. If we trace the values of the MLEs after a few hundred iterations of the NLS algorithm we observe $\hat E_{max} = 2787$, $\hat E_0 = 2.0$, $\hat\lambda = 0.52$ and $\widehat{ED}_{50} = \dots$ with large standard errors. The sequences of $ED_{50}$'s and $E_{max}$'s are monotonically increasing, faster at each iteration, until the computation becomes unstable and the algorithm fails. At the same time the mean squared error (MSE) decreases very slowly towards an asymptote. This suggests that the likelihood, although bounded, converges to its maximum as both $ED_{50}$ and $E_{max}$ go to $\infty$.

R CODE:

m = nls(ch8 ~ e0 + (emax * dose) / (dose + ed50),
        start = c(ed50 = 25, e0 = 5, emax = 20),
        control = nls.control(maxiter = 100),
        trace = TRUE, na.action = na.omit, data = edd)
f1 = fitted(m)
sig = summary(m)$sigma
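The 4-parameter fit can be attempted the same way (a sketch on the same data frame edd; "hill" is our name for the extra parameter, and warnOnly = TRUE just keeps nls from stopping when it hits the convergence trouble described above):

m4 = nls(ch8 ~ e0 + (emax * dose^hill) / (dose^hill + ed50^hill),
         start = c(ed50 = 25, e0 = 5, emax = 20, hill = 1),
         control = nls.control(maxiter = 100, warnOnly = TRUE),
         trace = TRUE, na.action = na.omit, data = edd)
summary(m4)$coefficients   # watch ed50 and emax grow as hill drops below 1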

Stochastic Approximation. Solve an equation $f(x) = 0$, but instead of observing $f(x)$ we observe $g(x) = f(x) + \text{error}$. The algorithm is due to Robbins and Monro (1951):

$$x_{i+1} = x_i - c_i\, y_i, \qquad y_i = g(x_i), \qquad \text{e.g. } c_i = c/i$$

Example: logistic regression. We look for the value of $\beta$ at which the expected value of the estimator matches a target, i.e. we solve $E_\beta(\hat\beta) - \hat\beta_0 = 0$ using the iteration

$$\beta_{i+1} = \beta_i - \frac{1}{i}\left( \hat E_{\beta_i}(\hat\beta) - \hat\beta_0 \right)$$

Homework:
1. Given the following data

x = c(-10, (-3):5, 10)
y = c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)

calculate the stochastic approximation estimator of beta.

Expected value function (by simulation):

ef = function(b, z = x, ns = 100) {
  # simulate ns data sets with true slope b; return the median fitted slope
  n <- length(z)
  px <- 1 / (1 + exp(-b * z))
  y <- array((runif(n * ns) <= px) * 1, c(n, ns))
  bn <- numeric(ns)
  for (i in 1:ns) bn[i] <- glm(y[, i] ~ z - 1, family = binomial)$coef
  median(bn)
}

(A driver for the Robbins-Monro iteration is sketched below.)

Random number generation. Sampling from known CDFs. Simulations.
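The driver itself, for the homework above (our own sketch; the step size $c_i = 1/i$ is one conventional choice, and ef is the simulation-based expected value function defined above):

x <- c(-10, (-3):5, 10)
y <- c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1)
bhat <- glm(y ~ x - 1, family = binomial)$coef   # target: the observed MLE

b <- 1                                  # starting value
for (i in 1:50) {
  b <- b - (1 / i) * (ef(b) - bhat)     # Robbins-Monro step
}
b                                       # stochastic approximation estimate of beta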


More information