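Stat 648: Assignment Solutions (115 points)

(3.5) (5 pts.) Writing the ridge expression as
$$
\begin{aligned}
\sum_{i=1}^{N}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\beta_j^2
&=\sum_{i=1}^{N}\Big(y_i-\Big(\beta_0+\sum_{j=1}^{p}\bar{x}_j\beta_j\Big)-\sum_{j=1}^{p}(x_{ij}-\bar{x}_j)\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\beta_j^2\\
&=\sum_{i=1}^{N}\Big(y_i-\beta_0^c-\sum_{j=1}^{p}(x_{ij}-\bar{x}_j)\beta_j^c\Big)^2+\lambda\sum_{j=1}^{p}(\beta_j^c)^2
\end{aligned}
$$
for $\beta_j^c=\beta_j$ and $\beta_0^c=\beta_0+\sum_{j=1}^{p}\bar{x}_j\beta_j$, shifting the $x_i$'s to have zero mean only modifies the intercept and not the slopes. A similar argument holds for the lasso by again letting $\beta_j^c=\beta_j$ and $\beta_0^c=\beta_0+\sum_{j=1}^{p}\bar{x}_j\beta_j$.

(3.12) (5 pts.) With
$$
X_{aug}=\begin{pmatrix}X\\ \sqrt{\lambda}\,I\end{pmatrix}\quad\text{and}\quad y_{aug}=\begin{pmatrix}y\\ 0\end{pmatrix},
$$
we have
$$
y_{aug}^Ty_{aug}=\begin{pmatrix}y^T&0^T\end{pmatrix}\begin{pmatrix}y\\0\end{pmatrix}=y^Ty,\qquad
X_{aug}^TX_{aug}=\begin{pmatrix}X^T&\sqrt{\lambda}\,I\end{pmatrix}\begin{pmatrix}X\\ \sqrt{\lambda}\,I\end{pmatrix}=X^TX+\lambda I,
$$
and
$$
y_{aug}^TX_{aug}=\begin{pmatrix}y^T&0^T\end{pmatrix}\begin{pmatrix}X\\ \sqrt{\lambda}\,I\end{pmatrix}=y^TX.
$$
So, $\hat\beta=(X_{aug}^TX_{aug})^{-1}X_{aug}^Ty_{aug}=(X^TX+\lambda I)^{-1}X^Ty=\hat\beta^{ridge}$.

(3.16) (15 pts.) First, note that $\hat\beta=(X^TX)^{-1}X^Ty=X^Ty$ because of the orthonormality of $X$.

Best subset: Since the inputs are orthogonal, dropping terms will not change the estimates of the other terms. Letting $y_M$, $X_M$, and $\beta_M$ represent a reduced version of $y$, $X$, and $\beta$ of size $M$, we wish to minimize
$$
(y_M-X_M\beta_M)^T(y_M-X_M\beta_M)=y_M^Ty_M-2y_M^TX_M\beta_M+\beta_M^TX_M^TX_M\beta_M=y_M^Ty_M-\beta_M^T\beta_M
$$
since $X_M$ is orthonormal (the last equality holds at the least squares solution $\beta_M=X_M^Ty_M$). This is minimized by choosing the $M$ largest $\hat\beta_j$'s in magnitude, giving estimates $\hat\beta_j\cdot I[\mathrm{rank}(|\hat\beta_j|)\le M]$.

Ridge: By Eqn. (3.44),
$$
\hat\beta^{ridge}=(X^TX+\lambda I)^{-1}X^Ty=(I+\lambda I)^{-1}X^Ty=((1+\lambda)I)^{-1}X^Ty=\frac{1}{1+\lambda}X^Ty=\frac{1}{1+\lambda}\hat\beta,
$$
where the second equality holds since $X$ is orthogonal.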
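The ridge result above can be checked numerically. The following is only a minimal sketch in base R under stated assumptions: the sample size, number of columns, penalty, and seed are hypothetical, and the orthonormal design is manufactured with a QR decomposition rather than taken from any data set in the assignment.

```r
# Numerical check: with orthonormal columns, ridge = OLS / (1 + lambda).
set.seed(1)                                  # arbitrary seed (assumption)
N <- 50; p <- 4; lambda <- 2.5               # hypothetical sizes and penalty
X <- qr.Q(qr(matrix(rnorm(N * p), N, p)))    # N x p matrix with t(X) %*% X = I
y <- rnorm(N)

beta_ols   <- drop(crossprod(X, y))          # X'y, the OLS estimate here
beta_ridge <- drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))

# Largest absolute discrepancy between the two forms of the ridge estimate
max_diff <- max(abs(beta_ridge - beta_ols / (1 + lambda)))
print(max_diff)
```

The discrepancy should be at the level of floating point rounding; a non-orthonormal `X` would not be expected to satisfy the identity.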
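Lasso:
$$
\hat\beta^{lasso}=\arg\min_\beta\Big\{(y-X\beta)^T(y-X\beta)+\lambda^*\sum_{j=1}^{p}|\beta_j|\Big\}
$$
where $\lambda^*=2\lambda$ and
$$
(y-X\beta)^T(y-X\beta)=y^Ty-2y^TX\beta+\beta^TX^TX\beta=y^Ty-2\hat\beta^T\beta+\beta^T\beta.
$$
Thus, we wish to minimize
$$
f(\beta)=y^Ty-2\sum_{j=1}^{p}\hat\beta_j\beta_j+\sum_{j=1}^{p}\beta_j^2+\lambda^*\sum_{j=1}^{p}|\beta_j|.
$$
Setting
$$
\frac{df(\beta)}{d\beta_j}=-2\hat\beta_j+2\beta_j+\lambda^*\,\mathrm{sgn}(\beta_j)=0,
$$
and since $\hat\beta_j$ and $\beta_j$ have the same signs,
$$
\beta_j=\hat\beta_j-\lambda\,\mathrm{sgn}(\hat\beta_j)=\mathrm{sgn}(\hat\beta_j)\,(|\hat\beta_j|-\lambda)_+,
$$
where the positive part sets the estimate to 0 when $|\hat\beta_j|\le\lambda$.

(3.30) (8 pts.) Let
$$
X_{aug}=\begin{pmatrix}X\\ \sqrt{\alpha\lambda}\,I\end{pmatrix}\quad\text{and}\quad y_{aug}=\begin{pmatrix}y\\ 0\end{pmatrix}.
$$
Doing the lasso with $X_{aug}$ and $y_{aug}$, we wish to find
$$
\min_\beta\,(y_{aug}-X_{aug}\beta)^T(y_{aug}-X_{aug}\beta)+\lambda^*\sum_{j=1}^{p}|\beta_j|
=\min_\beta\,y_{aug}^Ty_{aug}-2y_{aug}^TX_{aug}\beta+\beta^TX_{aug}^TX_{aug}\beta+\lambda^*\sum_{j=1}^{p}|\beta_j|,
$$
where $\lambda^*=\lambda(1-\alpha)$. Now, in a similar manner to 3.12,
$$
y_{aug}^Ty_{aug}=y^Ty,\qquad X_{aug}^TX_{aug}=X^TX+\alpha\lambda I,\qquad y_{aug}^TX_{aug}=y^TX.
$$
So, we have
$$
\begin{aligned}
&\min_\beta\,y^Ty-2y^TX\beta+\beta^T(X^TX+\alpha\lambda I)\beta+\lambda(1-\alpha)\sum_{j=1}^{p}|\beta_j|\\
&\quad=\min_\beta\,(y-X\beta)^T(y-X\beta)+\alpha\lambda\|\beta\|_2^2+\lambda(1-\alpha)\sum_{j=1}^{p}|\beta_j|\\
&\quad=\min_\beta\,\|y-X\beta\|^2+\lambda\big[\alpha\|\beta\|_2^2+(1-\alpha)\|\beta\|_1\big].
\end{aligned}
$$

(4.6) (10 pts.) (a) Separability implies that there exists $\beta_{sep}$ such that
$$
y_i\beta_{sep}^Tx_i>0\ \ \forall i\ \Rightarrow\ \frac{y_i\beta_{sep}^Tx_i}{\|x_i\|}>0\ \ \forall i\ \Rightarrow\ y_i\beta_{sep}^Tz_i>0\ \ \forall i,
$$
where $z_i=x_i/\|x_i\|$. Let $\epsilon=\min_i\{y_i\beta_{sep}^Tz_i\}>0$ and rescale $\beta_{sep}$ to $\beta_{sep}/\epsilon$. Then,
$$
y_i\Big(\frac{\beta_{sep}}{\epsilon}\Big)^Tz_i=\frac{y_i\beta_{sep}^Tz_i}{\min_i\{y_i\beta_{sep}^Tz_i\}}\ge 1\quad\forall i,
$$
so the rescaled vector can be taken as $\beta_{sep}$.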
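The rescaling step in 4.6(a) can be illustrated on a toy example. This is only a sketch in base R: the four points, their labels, and the separating direction `beta_sep` are hypothetical values chosen so that the data are separable, and no intercept term is included.

```r
# Hypothetical separable toy data: two points per class in the plane.
x <- rbind(c(2, 1), c(3, 2), c(-1, -2), c(-2, -1))
y <- c(1, 1, -1, -1)

z <- x / sqrt(rowSums(x^2))            # z_i = x_i / ||x_i|| (row-wise scaling)
beta_sep <- c(1, 1)                    # assumed separating direction

margins <- y * drop(z %*% beta_sep)    # y_i * beta_sep' z_i, all positive here
eps <- min(margins)                    # epsilon = smallest margin
beta_star <- beta_sep / eps            # rescaled separating vector

scaled_margins <- y * drop(z %*% beta_star)
print(min(scaled_margins))             # smallest rescaled margin
```

After rescaling, the smallest margin equals 1 up to rounding and all others are at least as large, which is the property used in part (b).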
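(b)
$$
\begin{aligned}
\|\beta_{new}-\beta_{sep}\|^2&=\|\beta_{old}+y_iz_i-\beta_{sep}\|^2\\
&=\|\beta_{old}-\beta_{sep}\|^2+\|y_iz_i\|^2+2y_i\beta_{old}^Tz_i-2y_i\beta_{sep}^Tz_i\\
&\le\|\beta_{old}-\beta_{sep}\|^2+1-2y_i\beta_{sep}^Tz_i\quad\text{since }y_i\beta_{old}^Tz_i\le 0\text{ by misclassification}\\
&\le\|\beta_{old}-\beta_{sep}\|^2-1\quad\text{by (a).}
\end{aligned}
$$

(5.2) (16 pts.) (a) Suppose $m=1$. Then, for all $i$, if $x\notin[\tau_i,\tau_{i+m}]=[\tau_i,\tau_{i+1}]$, $B_{i,m}(x)=B_{i,1}(x)=0$. So the claim holds for $m=1$. Suppose the claim holds for $m=j$ for all $i\in\{1,\ldots,k+2M-m\}$. Then $B_{i,j}(x)=0$ for $x\notin[\tau_i,\tau_{i+j}]$. Suppose $x\notin[\tau_i,\tau_{i+j+1}]$. Then $x\notin[\tau_i,\tau_{i+j}]$ and hence $B_{i,j}(x)=0$. Also, $x\notin[\tau_{i+1},\tau_{i+j+1}]$ and hence $B_{i+1,j}(x)=0$. Therefore,
$$
B_{i,j+1}(x)=\frac{x-\tau_i}{\tau_{i+j}-\tau_i}B_{i,j}(x)+\frac{\tau_{i+j+1}-x}{\tau_{i+j+1}-\tau_{i+1}}B_{i+1,j}(x)=0
$$
when $x\notin[\tau_i,\tau_{i+j+1}]$ and the claim holds for $m=j+1$. This completes the proof by induction.

(b) Suppose $m=1$. Then, for all $i$, if $x\in(\tau_i,\tau_{i+m})=(\tau_i,\tau_{i+1})$, $B_{i,1}(x)=1>0$. So the claim holds for $m=1$. Suppose the claim holds for $m=j$, for all $i\in\{1,\ldots,k+2M-m\}$. Then $B_{i,j}(x)>0$ for $x\in(\tau_i,\tau_{i+j})$. Suppose now that $x\in(\tau_i,\tau_{i+j+1})$. If $x\in(\tau_i,\tau_{i+1})\subset(\tau_i,\tau_{i+j})$, $B_{i,j}(x)>0$ and $x\notin[\tau_{i+1},\tau_{i+j+1}]$, so $B_{i+1,j}(x)=0$ by (a). If $x\in[\tau_{i+1},\tau_{i+j})$, $B_{i,j}(x)>0$ and $B_{i+1,j}(x)>0$. If $x\in[\tau_{i+j},\tau_{i+j+1})\subset(\tau_{i+1},\tau_{i+j+1})$, $B_{i+1,j}(x)>0$ and $x\notin[\tau_i,\tau_{i+j}]$, so $B_{i,j}(x)=0$ by (a). Thus,
$$
B_{i,j+1}(x)=\frac{x-\tau_i}{\tau_{i+j}-\tau_i}B_{i,j}(x)+\frac{\tau_{i+j+1}-x}{\tau_{i+j+1}-\tau_{i+1}}B_{i+1,j}(x)>0
$$
when $x\in(\tau_i,\tau_{i+j+1})$ and the claim holds for $m=j+1$. This completes the proof by induction.

(c) For $m=1$, let $x\in[\xi_0,\xi_{k+1}]$. Then $x\in[\tau_m,\tau_{k+m+1}]=[\tau_1,\tau_{k+2}]$ and
$$
\sum_{i=1}^{k+1}B_{i,1}(x)=\sum_{i=1}^{k+1}I[\tau_i\le x<\tau_{i+1}]=1.
$$
Suppose the claim holds for $m=j$. Then $\sum_{i=1}^{k+j}B_{i,j}(x)=1$ for all $x\in[\xi_0,\xi_{k+1}]=[\tau_j,\tau_{j+k+1}]$. Now let $x\in[\xi_0,\xi_{k+1}]=[\tau_{j+1},\tau_{j+k+2}]$. Then
$$
\begin{aligned}
\sum_{i=1}^{k+j+1}B_{i,j+1}(x)&=\sum_{i=1}^{k+j+1}\Big(\frac{x-\tau_i}{\tau_{i+j}-\tau_i}B_{i,j}(x)+\frac{\tau_{i+j+1}-x}{\tau_{i+j+1}-\tau_{i+1}}B_{i+1,j}(x)\Big)\\
&=\frac{x-\tau_1}{\tau_{j+1}-\tau_1}B_{1,j}(x)+\sum_{i=2}^{k+j+1}\Big(\frac{x-\tau_i}{\tau_{i+j}-\tau_i}+\frac{\tau_{i+j}-x}{\tau_{i+j}-\tau_i}\Big)B_{i,j}(x)+\frac{\tau_{k+2j+2}-x}{\tau_{k+2j+2}-\tau_{k+j+2}}B_{k+j+2,j}(x)\\
&=\frac{x-\tau_1}{\tau_{j+1}-\tau_1}B_{1,j}(x)+\sum_{i=2}^{k+j+1}B_{i,j}(x)+\frac{\tau_{k+2j+2}-x}{\tau_{k+2j+2}-\tau_{k+j+2}}B_{k+j+2,j}(x)\\
&=\sum_{i=2}^{k+j+1}B_{i,j}(x)\quad\text{by (a).}
\end{aligned}
$$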
Notice that when $m=j$ became $m=j+1$, the subscripts on the $\tau_i$'s inside $[\xi_0,\xi_{k+1}]$ increased by 1 and so did those on the $B_{i,m}$'s. So, $\sum_iB_{i,j}(x)=1$.

(d) For $m=1$, $B_{i,1}(x)=I[\tau_i\le x<\tau_{i+1}]$ is a piecewise polynomial of degree 0 with breaks at $\tau_i$ and $\tau_{i+1}$. Suppose the result holds for $m=j$. Since
$$
B_{i,j+1}(x)=\frac{x-\tau_i}{\tau_{i+j}-\tau_i}B_{i,j}(x)+\frac{\tau_{i+j+1}-x}{\tau_{i+j+1}-\tau_{i+1}}B_{i+1,j}(x)
$$
and $B_{i,j}(x)$ and $B_{i+1,j}(x)$ are piecewise polynomials of degree $j-1$, we have that $B_{i,j+1}(x)$ is a piecewise polynomial of degree $j$ with breaks only at the knots.

(e) not graded

(5.7) (10 pts.) (a) Since $g$ is a natural cubic spline interpolant for $\{x_i,z_i\}_1^N$, it is linear outside $[x_1,x_N]$ and $g''(a)=g''(b)=0$. Also, $g'''(x)$ is constant on the intervals $[x_i,x_{i+1})$, $i=1,2,\ldots,N-1$. Since $g$ and $\tilde g$ are both functions on $[a,b]$ that interpolate the $N$ pairs, $g(x_i)=\tilde g(x_i)=z_i$ and $h(x_i)=0$ for $i=1,\ldots,N$. Thus,
$$
\int_a^bg''(x)h''(x)\,dx=g''(x)h'(x)\Big|_a^b-\int_a^bg'''(x)h'(x)\,dx=-\int_a^bg'''(x)h'(x)\,dx=-\sum_{j=1}^{N-1}g'''(x_j^+)\{h(x_{j+1})-h(x_j)\}=0.
$$

(b)
$$
\begin{aligned}
\int_a^b\tilde g''(t)^2\,dt&=\int_a^b\big(h''(t)+g''(t)\big)^2\,dt\\
&=\int_a^bh''(t)^2\,dt+\int_a^bg''(t)^2\,dt+2\int_a^bg''(t)h''(t)\,dt\\
&=\int_a^bh''(t)^2\,dt+\int_a^bg''(t)^2\,dt\quad\text{by (a)}\\
&\ge\int_a^bg''(t)^2\,dt
\end{aligned}
$$
with equality holding only if $h$ is 0 in $[a,b]$.

(c) Since any interpolant $f$ evaluated at $x_i$ yields the same values as $g(x_i)$, the sum of squares $\sum_{i=1}^N(y_i-f(x_i))^2$ will be the same for any choice of $f$. Thus, since $\lambda$ is positive, we wish to minimize $\int_a^bf''(t)^2\,dt$. In (b) it was shown that this is accomplished by a cubic spline with knots at each of the $x_i$.

(5.11) (5 pts.) Since $h_1(x)=1$ and $h_2(x)=x$, we can write
$$
H=\begin{pmatrix}\mathbf{1}_{N\times1}&x_{N\times1}&T_{N\times(N-2)}\end{pmatrix}
$$
for an $N\times(N-2)$ matrix $T$. Since $h_1''(x)=0$, $h_2''(x)=0$, and $\Omega_{jk}=\int h_j''(t)h_k''(t)\,dt$, the first two
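rows and first two columns of $\Omega$ are 0. So, we can write
$$
\Omega=\begin{pmatrix}0_{2\times2}&0_{2\times(N-2)}\\0_{(N-2)\times2}&M_{(N-2)\times(N-2)}\end{pmatrix}
$$
for an $(N-2)\times(N-2)$ matrix $M$. Now, if we partition $H^{-1}$ into
$$
H^{-1}=\begin{pmatrix}u^T_{1\times N}\\v^T_{1\times N}\\W_{(N-2)\times N}\end{pmatrix},
$$
we have
$$
H^{-1}H=\begin{pmatrix}u^T\mathbf{1}&u^Tx&u^TT\\v^T\mathbf{1}&v^Tx&v^TT\\W\mathbf{1}&Wx&WT\end{pmatrix}=I.
$$
Thus, $W\mathbf{1}=0$ and $Wx=0$. Let $a$ and $b$ be any constants. Then
$$
H^{-1}(a\mathbf{1}+bx)=\begin{pmatrix}u^T(a\mathbf{1}+bx)\\v^T(a\mathbf{1}+bx)\\W(a\mathbf{1}+bx)\end{pmatrix}=\begin{pmatrix}u^T(a\mathbf{1}+bx)\\v^T(a\mathbf{1}+bx)\\0\end{pmatrix}.
$$
Thus,
$$
K(a\mathbf{1}+bx)=(H^{-1})^T\Omega H^{-1}(a\mathbf{1}+bx)=(H^{-1})^T\begin{pmatrix}0&0\\0&M\end{pmatrix}\begin{pmatrix}u^T(a\mathbf{1}+bx)\\v^T(a\mathbf{1}+bx)\\0\end{pmatrix}=(H^{-1})^T\,0=0,
$$
and $\mathbf{1}$ and $x$ are basis vectors for the null space of $K$.

(5.12) (5 pts.) As in Eqn. (5.10) we write $f(x)=\sum_{j=1}^Nh_j(x)\theta_j$, where the $h_j(x)$ are an $N$-dimensional set of basis functions for representing this family of natural splines. The criterion reduces to
$$
RSS(\theta,\lambda)=(y-H\theta)^TW(y-H\theta)+\lambda\theta^T\Omega\theta,
$$
where $W=\mathrm{diag}(w_1,\ldots,w_N)$, and is minimized by $\hat\theta=(H^TWH+\lambda\Omega)^{-1}H^TWy$. The fitted smoothing spline is given by $\hat f(x)=\sum_{j=1}^Nh_j(x)\hat\theta_j$. When the training data have ties in $X$ we remove all ties except one, replacing the $y$ with the average of all $y$'s for the ties. The technique above can then be used by letting the weight equal the number of ties.

(5.15) (16 pts.) (a)
$$
\langle K(\cdot,x_i),f\rangle_{H_K}=\Big\langle\sum_{j=1}^\infty(\gamma_j\phi_j(x_i))\phi_j(\cdot),\sum_{j=1}^\infty c_j\phi_j(\cdot)\Big\rangle_{H_K}=\sum_{j=1}^\infty\frac{\gamma_j\phi_j(x_i)c_j}{\gamma_j}=\sum_{j=1}^\infty\phi_j(x_i)c_j=f(x_i).
$$

(b)
$$
\langle K(\cdot,x_i),K(\cdot,x_j)\rangle_{H_K}=\Big\langle\sum_{k=1}^\infty(\gamma_k\phi_k(x_i))\phi_k(\cdot),\sum_{k=1}^\infty(\gamma_k\phi_k(x_j))\phi_k(\cdot)\Big\rangle_{H_K}=\sum_{k=1}^\infty\frac{\gamma_k\phi_k(x_i)\gamma_k\phi_k(x_j)}{\gamma_k}=\sum_{k=1}^\infty\gamma_k\phi_k(x_i)\phi_k(x_j)=K(x_i,x_j).
$$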
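(c) First,
$$
g(x)=\sum_{i=1}^N\alpha_iK(x,x_i)=\sum_{i=1}^N\alpha_i\sum_{j=1}^\infty\gamma_j\phi_j(x)\phi_j(x_i)=\sum_{j=1}^\infty\gamma_j\sum_{i=1}^N(\alpha_i\phi_j(x_i))\phi_j(x)=\sum_{j=1}^\infty c_j\phi_j(x)
$$
for $c_j=\gamma_j\sum_{i=1}^N\alpha_i\phi_j(x_i)$. Then,
$$
\begin{aligned}
J(g)=\|g\|^2_{H_K}&=\sum_{j=1}^\infty\frac{c_j^2}{\gamma_j}=\sum_{j=1}^\infty\frac{\gamma_j^2\big(\sum_{i=1}^N\alpha_i\phi_j(x_i)\big)^2}{\gamma_j}=\sum_{j=1}^\infty\gamma_j\Big(\sum_{i=1}^N\alpha_i\phi_j(x_i)\Big)^2\\
&=\sum_{j=1}^\infty\gamma_j\sum_{i=1}^N\sum_{k=1}^N\alpha_i\alpha_k\phi_j(x_i)\phi_j(x_k)=\sum_{i=1}^N\sum_{k=1}^N\alpha_i\alpha_k\sum_{j=1}^\infty\gamma_j\phi_j(x_i)\phi_j(x_k)\\
&=\sum_{i=1}^N\sum_{k=1}^N\alpha_i\alpha_kK(x_i,x_k).
\end{aligned}
$$

(d) First,
$$
\begin{aligned}
J(\tilde g)=\|g+\rho\|^2_{H_K}&=\langle g,g\rangle_{H_K}+2\langle g,\rho\rangle_{H_K}+\langle\rho,\rho\rangle_{H_K}\\
&=J(g)+2\sum_{i=1}^N\alpha_i\langle K(\cdot,x_i),\rho\rangle_{H_K}+J(\rho)\\
&=J(g)+J(\rho)\ge J(g)
\end{aligned}
$$
with equality holding iff $\rho(x)=0$. Now, by (a),
$$
\tilde g(x_i)=\langle K(\cdot,x_i),\tilde g\rangle_{H_K}=\langle K(\cdot,x_i),g\rangle_{H_K}+\langle K(\cdot,x_i),\rho\rangle_{H_K}=\langle K(\cdot,x_i),g\rangle_{H_K}=g(x_i)
$$
for $i=1,\ldots,N$. Thus, $\sum_{i=1}^NL(y_i,\tilde g(x_i))=\sum_{i=1}^NL(y_i,g(x_i))$, and
$$
\sum_{i=1}^NL(y_i,\tilde g(x_i))+\lambda J(\tilde g)\ge\sum_{i=1}^NL(y_i,g(x_i))+\lambda J(g),
$$
with equality holding iff $\rho(x)=0$.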
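The identity in 5.15(c), that the penalty computed from the coefficients $c_j$ equals the quadratic form in the kernel matrix, can be checked numerically. This is a minimal sketch in base R and rests on assumptions not in the solution: the eigen-expansion is truncated to a small hypothetical number of terms, and the eigenvalues, basis evaluations, and weights are made-up values.

```r
# Finite-rank stand-in for the kernel expansion (truncation is an assumption).
set.seed(2)                                   # arbitrary seed
N <- 6; n_eig <- 4                            # hypothetical sizes
gam   <- c(2, 1, 0.5, 0.25)                   # hypothetical eigenvalues gamma_j > 0
Phi   <- matrix(rnorm(N * n_eig), N, n_eig)   # Phi[i, j] plays the role of phi_j(x_i)
alpha <- rnorm(N)                             # weights alpha_i

K <- Phi %*% diag(gam) %*% t(Phi)             # K[i, k] = sum_j gamma_j phi_j(x_i) phi_j(x_k)
c_coef <- gam * drop(crossprod(Phi, alpha))   # c_j = gamma_j * sum_i alpha_i phi_j(x_i)

J_from_c <- sum(c_coef^2 / gam)               # sum_j c_j^2 / gamma_j
J_from_K <- drop(t(alpha) %*% K %*% alpha)    # sum_i sum_k alpha_i alpha_k K(x_i, x_k)
print(c(J_from_c, J_from_K))
```

The two printed values should agree up to rounding, mirroring the chain of equalities in part (c).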
Chapter 4 computing assignment code (20 pts.)

##Part 1, according to the outline
vowel<-read.table("http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/vowel.train",
  sep=",",row.names=1,header=TRUE)
# Pooled within-class covariance (Sig) and class means (muk)
Sig<-matrix(rep(0,100),nrow=10)
muk<-matrix(nrow=11,ncol=10)
for(i in 1:11){
  vowelk<-vowel[vowel[,1]==i,2:11]
  muk[i,]<-apply(vowelk,2,mean)
  Sig<-Sig+(1/11)*cov(vowelk)}
# Symmetric square root of the inverse covariance, used to sphere the data
Siginv<-solve(Sig)
e<-eigen(Siginv)
V<-e$vectors
Sig.half<-V%*%diag(sqrt(e$values))%*%t(V)
mubar<-apply(muk,2,mean)
mukstar<-Sig.half%*%(t(muk)-mubar)
X<-vowel[,2:11]
Xstar<-Sig.half%*%(t(X)-mubar)
# Principal components of the sphered class means give the discriminant coordinates
W<-cov(t(mukstar))
ew<-eigen(W)
v<-ew$vectors
ok<-t(v)
Ox<-t(ok%*%Xstar)
Omu<-t(ok%*%mukstar)
color<-c("black","orange","green","brown","cyan","deeppink","yellow","gray",
  "red","darkviolet","blue")
plot(-Ox[,1],Ox[,2],col=rep(color,48),xlab="Coordinate 1 for Training Data",
  ylab="Coordinate 2 for Training Data",main="Linear Discriminant Analysis")
points(-Omu[,1],Omu[,2],col=color,pch=19,cex=1.8)

##Parts 2 and 3
library(MASS)
LDA<-lda(y~x.1+x.2+x.3+x.4+x.5+x.6+x.7+x.8+x.9+x.10,vowel)
Oxl<-predict(LDA,vowel)$x
Omul<-predict(LDA,newdata=as.data.frame(LDA$means))$x
par(mfrow=c(2,2))
plot(Oxl[,1],Oxl[,3],col=rep(color,48),xlab="Coordinate 1",ylab="Coordinate 3")
points(Omul[,1],Omul[,3],col=color,pch=19,cex=1.8)
plot(Oxl[,2],Oxl[,3],col=rep(color,48),xlab="Coordinate 2",ylab="Coordinate 3")
points(Omul[,2],Omul[,3],col=color,pch=19,cex=1.8)
plot(Oxl[,1],Oxl[,7],col=rep(color,48),xlab="Coordinate 1",ylab="Coordinate 7")
points(Omul[,1],Omul[,7],col=color,pch=19,cex=1.8)
plot(Oxl[,9],Oxl[,10],col=rep(color,48),xlab="Coordinate 9",ylab="Coordinate 10")
points(Omul[,9],Omul[,10],col=color,pch=19,cex=1.8)
##Part 4
D<-1000
Cols<-matrix(nrow=D,ncol=D)
C<-matrix(c(-Omu[,1],Omu[,2]),nrow=11)
t1<-seq(-5,5,length.out=D)
t2<-seq(-7,4,length.out=D)
# Classify each grid point to the nearest class mean (squared Euclidean distance)
for(i in 1:D){
  for(j in 1:D){
    dist2<-((C[,1]-t1[i])^2+(C[,2]-t2[j])^2)
    Cols[i,j]<-which.min(dist2)}}
contour(t1,t2,Cols,drawlabels=FALSE,xlab="Coordinate 1",ylab="Coordinate 2",
  main="Classified to nearest mean")
points(-Ox[,1],Ox[,2],col=rep(color,48))
points(-Omu[,1],Omu[,2],col=color,pch=19,cex=1.8)

##Part 5, Logistic Regression
library(VGAM)
logregdat<-cbind(rep(1:11,48),-Ox[,1],Ox[,2])
colnames(logregdat)<-c("y","c.1","c.2")
logregdat<-as.data.frame(logregdat)
lr<-vglm(y~c.1+c.2,family=multinomial(),data=logregdat)
D<-500
t1<-seq(-5,5,length.out=D)
t2<-seq(-7,4,length.out=D)
Col<-matrix(nrow=D,ncol=D)
# For each t1 value, predict class probabilities along the whole t2 axis
for(i in 1:D){
  Tmat<-matrix(c(rep(t1[i],D),t2),ncol=2,nrow=D)
  colnames(Tmat)<-c("c.1","c.2")
  Tmat<-as.data.frame(Tmat)
  Plr<-predict(lr,newdata=Tmat,type="response")
  for(m in 1:D){
    Col[i,m]<-which.max(Plr[m,])}}
contour(t1,t2,Col,drawlabels=FALSE,xlab="Coordinate 1",ylab="Coordinate 2",
  main="Classified by Logistic Regression")
points(-Ox[,1],Ox[,2],col=rep(color,48))
points(-Omu[,1],Omu[,2],col=color,pch=19,cex=1.8)
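
The nearest-mean classification in Part 4 loops over every grid cell. One possible alternative, sketched here in base R under stated assumptions, computes all squared distances at once. The three centroids and the coarse grid sizes are hypothetical stand-ins for the projected class means and the grid used in the assignment, so this is an illustration of the idea rather than a replacement for the submitted code.

```r
# Hypothetical centroids in the plane (stand-ins for the projected class means).
C  <- rbind(c(-2, 0), c(0, 3), c(2, -1))
t1 <- seq(-5, 5, length.out = 50)             # coarse grid, sizes are assumptions
t2 <- seq(-7, 4, length.out = 40)

# All grid points as rows; expand.grid varies t1 fastest.
grid <- as.matrix(expand.grid(t1 = t1, t2 = t2))

# Squared distance from every grid point (rows) to every centroid (columns).
d2 <- sapply(seq_len(nrow(C)), function(k)
  (grid[, 1] - C[k, 1])^2 + (grid[, 2] - C[k, 2])^2)

# Index of the nearest centroid per grid point, reshaped so that
# cls[i, j] corresponds to the point (t1[i], t2[j]).
cls <- matrix(apply(d2, 1, which.min), nrow = length(t1), ncol = length(t2))
print(table(cls))
```

The resulting matrix has the same layout as `Cols` in Part 4, so it could be passed to `contour(t1, t2, cls, drawlabels = FALSE)` in the same way.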