Solution 8 Advanced Numerical Analysis


Prof. D. Kressner, M. Steinlechner

1 Nonlinear CG

function ex9problem2
close all;
clear all

% Poisson matrix
%
n = …;
I = eye(n^2);
A = gallery('poisson',n);
% Choose a starting value
x0 = [1; zeros(n^2-1,1)];
%x0 = ones(n^2,1);

% Prolate matrix
%
% n = …;
% I = eye(n);
% A = gallery('prolate',n);
% Choose a starting value
%x0 = [1; zeros(n-1,1)];
% x0 = ones(n,1);

% Calculate exact solution: the minimum of the Rayleigh quotient is the
% smallest eigenvalue of A (full() because gallery('poisson',n) is sparse)
Xexact = min(eig(full(A)));

% Define function handles here: Rayleigh quotient and its gradient
f = @(x) x'*(A*x)/(x'*x);
df = @(x) 2*(I - x*x'/(x'*x))*(A*x)/(x'*x);

% Line search parameters
alpha0 = 1;
beta = .5;
c = 1e-4;
tol = 1e-8;
maxiter = …;

%Run Fletcher-Reeves
[X,fX,dfX] = ncg(f,df,x0,c,alpha0,beta,tol,maxiter,'fr');

%Run Polak-Ribiere
[X2,fX2,dfX2] = ncg(f,df,x0,c,alpha0,beta,tol,maxiter,'pr+');

%Run steepest descent
[X3,fX3,dfX3] = steepdesc(f,df,x0,c,alpha0,beta,tol,maxiter);

%Plot function value and gradient
subplot(1,2,1);
semilogy([0:1:numel(fX)-1],diag(dfX'*dfX).^(1/2),'ro'); hold on
title('Gradient norm')
subplot(1,2,2);
semilogy([0:1:numel(fX)-1],abs(fX-Xexact),'ro'); hold on
title('Error on cost function')
subplot(1,2,1);
semilogy([0:1:numel(fX2)-1],diag(dfX2'*dfX2).^(1/2),'ob');
subplot(1,2,2);
semilogy([0:1:numel(fX2)-1],abs(fX2-Xexact),'ob');
subplot(1,2,1);
semilogy([0:1:numel(fX3)-1],diag(dfX3'*dfX3).^(1/2),'ok');
legend('FR','PR+','SD')
subplot(1,2,2);
semilogy([0:1:numel(fX3)-1],abs(fX3-Xexact),'ok');
legend('FR','PR+','SD')
end

%Required functions

function [X,fX,dfX] = ncg(f,df,x0,c,alpha0,beta,tol,maxiter,opt)
X(:,1) = x0;
fX(:,1) = f(x0);
dfX(:,1) = df(x0);
k = 1;
xk = x0;
gk = df(xk);
pk = -gk;

while norm(gk) > tol && k <= maxiter
    % start backtracking (Armijo condition)
    alpha = alpha0;
    while f(xk + alpha*pk) > f(xk) + c*alpha*gk'*pk
        alpha = alpha*beta;
    end

    %Perform step
    xk = xk + alpha*pk;
    %new gradient
    gknew = df(xk);

    %new search directions
    if strcmp(opt,'fr')
        %Fletcher-Reeves
        betak = norm(gknew)^2 / norm(gk)^2;
    else
        %Polak-Ribiere +
        betak = max(0, gknew'*(gknew - gk) / norm(gk)^2);
    end
    pk = -gknew + betak*pk;

    %continue with
    gk = gknew;
    X(:,k+1) = xk;
    fX(:,k+1) = f(xk);
    dfX(:,k+1) = gk;

    k = k+1;
end
end

function [X,fX,dfX] = steepdesc(f,df,x0,c,alpha0,beta,tol,maxiter)
X(:,1) = x0;
fX(:,1) = f(x0);
dfX(:,1) = df(x0);
k = 1;
xk = x0;
gk = dfX(:,1);

while norm(gk) > tol && k <= maxiter
    %search directions
    pk = -gk;

    % start backtracking (Armijo condition)
    alpha = alpha0;
    while f(xk + alpha*pk) > f(xk) + c*alpha*gk'*pk
        alpha = alpha*beta;
    end

    %Perform step
    xk = xk + alpha*pk;
    gk = df(xk);
    X(:,k+1) = xk;
    fX(:,k+1) = f(xk);
    dfX(:,k+1) = gk;

    k = k+1;
end
end

2 Convex functions

Prove the following simple statements for $\mu$-strongly convex functions.

a) (Third relation in Lemma 4.22) Let $f$ be twice differentiable and let $H(x)$ denote the Hessian of $f$. Then $H(x) \succeq \mu I$ for all $x$ if and only if $f$ is $\mu$-strongly convex.

b) Show that for a differentiable $\mu$-strongly convex function, the distance $\|x - x^*\|_2$ from the point $x$ to the minimizer $x^*$ can be bounded solely by the norm of the gradient, $\|\nabla f(x)\|_2$:
$$\forall x: \quad \|x - x^*\|_2 \le \frac{2}{\mu}\,\|\nabla f(x)\|_2.$$
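A standard argument for both statements is sketched below. It assumes the first-order definition of $\mu$-strong convexity, $f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{\mu}{2}\|y - x\|_2^2$ for all $x, y$ (Lemma 4.22 may phrase it in an equivalent form), and, for part a), the standard fact that a twice differentiable function is convex if and only if its Hessian is positive semidefinite everywhere. The auxiliary function $g$ is introduced only for this sketch.

```latex
% a) Shift by the quadratic: g(x) := f(x) - (mu/2)||x||_2^2,
%    so that \nabla g(x) = \nabla f(x) - \mu x and \nabla^2 g(x) = H(x) - \mu I.
\begin{align*}
g(y) - g(x) - \nabla g(x)^T (y - x)
  &= f(y) - f(x) - \nabla f(x)^T (y - x)
     - \tfrac{\mu}{2}\bigl(\|y\|_2^2 - \|x\|_2^2 - 2x^T (y - x)\bigr) \\
  &= f(y) - f(x) - \nabla f(x)^T (y - x) - \tfrac{\mu}{2}\|y - x\|_2^2 .
\end{align*}
% Hence: f is mu-strongly convex <=> g is convex
%        <=> \nabla^2 g(x) = H(x) - \mu I \succeq 0 for all x <=> H(x) \succeq \mu I for all x.

% b) Strong convexity with y = x^*, then Cauchy-Schwarz, together with f(x^*) \le f(x):
\begin{align*}
0 \ge f(x^*) - f(x)
  &\ge \nabla f(x)^T (x^* - x) + \tfrac{\mu}{2}\|x^* - x\|_2^2 \\
  &\ge -\|\nabla f(x)\|_2 \, \|x - x^*\|_2 + \tfrac{\mu}{2}\|x - x^*\|_2^2 .
\end{align*}
% Dividing by \|x - x^*\|_2 > 0 (the case x = x^* is trivial) gives
\[
  \|x - x^*\|_2 \le \frac{2}{\mu}\,\|\nabla f(x)\|_2 .
\]
```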

3 Binary logistic regression

Logistic regression is an important tool in statistics and has various applications in machine learning and data mining for the classification of data. The binary logistic model with parameter $\hat{x} \in \mathbb{R}^p$ yields the probability of the class $b \in \{-1, 1\}$ given a certain sample $a \in \mathbb{R}^p$:
$$P(b \mid a) = \frac{1}{1 + \exp(-b\, a^T \hat{x})}.$$
Unfortunately, the parameter $\hat{x}$ is usually unknown and we have to estimate it from data samples. Let $a_i \in \mathbb{R}^p$, $i = 1, \dots, n$, be sampling points and $b_i \in \{-1, 1\}$ the associated binary class labels. Then an approximation of the true parameter $\hat{x}$ is given by the maximum log-likelihood estimator
$$x^* = \operatorname*{argmin}_{x \in \mathbb{R}^p} f(x), \qquad f(x) = -\sum_{i=1}^n \log\bigl(h(b_i a_i^T x)\bigr),$$
where $h(t) = 1/(1 + \exp(-t))$ is the sigmoid function. Binary classification can hence be cast into an unconstrained optimization problem for the model parameters $x$. (Note that we have introduced a minus sign to go from a maximization problem to a minimization problem.)

a) Show that for a given data set $\{(a_1, b_1), (a_2, b_2), \dots, (a_n, b_n)\}$, the objective function $f$ is convex.

To show that the function $f(x) = -\sum_{i=1}^n \log\bigl(h(b_i a_i^T x)\bigr)$ is convex, we will prove that the Hessian of $f$ is positive semidefinite. The Hessian is given on the exercise sheet,
$$H(x) = A^T D_x A,$$
with the data matrix $A = [a_1\ a_2\ \dots\ a_n]^T \in \mathbb{R}^{n \times p}$ and the diagonal matrix
$$D_x = \operatorname{diag}\Bigl(h(b_1 a_1^T x)\bigl(1 - h(b_1 a_1^T x)\bigr), \dots, h(b_n a_n^T x)\bigl(1 - h(b_n a_n^T x)\bigr)\Bigr).$$

As $h(t) = 1/(1 + \exp(-t))$, we have that
$$g(t) := h(t)\bigl(1 - h(t)\bigr) = \frac{\exp(-t)}{\bigl(1 + \exp(-t)\bigr)^2},$$
and it is easy to show that $0 < g(t) \le 1/4$ for all $t \in \mathbb{R}$, with $g(0) = 1/4$. With this definition of $g$, we have
$$D_x = \operatorname{diag}\bigl(g(b_1 a_1^T x), \dots, g(b_n a_n^T x)\bigr).$$
As $g$ is strictly positive, $D_x$ is a positive definite diagonal matrix. Thus its square root exists, with
$$D_x^{1/2} = \operatorname{diag}\bigl\{g(b_i a_i^T x)^{1/2}\bigr\}_{i=1}^n.$$
To show that the Hessian $H(x)$ is positive semidefinite, we need to show that for any vector $y \in \mathbb{R}^p$ with $y \neq 0$ it holds that
$$y^T H(x) y \ge 0 \iff y^T A^T D_x A y \ge 0 \iff y^T A^T D_x^{1/2} D_x^{1/2} A y \ge 0 \iff (D_x^{1/2} A y)^T (D_x^{1/2} A y) \ge 0 \iff \|D_x^{1/2} A y\|_2^2 \ge 0,$$
which is clearly true as a norm is always nonnegative.

b) Is $f$ strongly convex?

For $f$ to be strongly convex, it is necessary (but not sufficient!) that the strict inequality
$$y^T H(x) y > 0 \iff \|D_x^{1/2} A y\|_2^2 > 0$$
holds for all vectors $y \in \mathbb{R}^p$ with $y \neq 0$. In general, however, the data matrix $A$ need not have full column rank, as its columns are not necessarily linearly independent; that is, it can happen that $\operatorname{rank}(A) < p$, which is always the case if there are fewer samples than parameters ($n < p$). Hence, $A$ may have a nontrivial kernel, that is, we can find a $z \neq 0$ with
$$z \in \ker(A) \implies Az = 0 \implies \|D_x^{1/2} A z\|_2^2 = 0.$$
Since $f$ depends on $x$ only through $Ax$, $f$ is even constant along the direction $z$. Hence, $f$ is in general not strictly convex and thus of course also not strongly convex.

c) Show that the Hessian of $f$ is bounded for all $x \in \mathbb{R}^p$: $\|H(x)\|_2 < C$.

To show that $H(x)$ is bounded, we use the decomposition of $H(x)$ introduced in part a) above and the fact that induced matrix norms are submultiplicative; for the 2-norm we additionally have $\|A^T\|_2 = \|A\|_2$:
$$\|H(x)\|_2 = \|A^T D_x A\|_2 \le \|A^T\|_2 \, \|D_x\|_2 \, \|A\|_2 = \|D_x\|_2 \, \|A\|_2^2.$$
The diagonal matrix $D_x$ has the entries $g(b_i a_i^T x)$ on the diagonal. As $0 < g(t) \le 1/4$ for all $t \in \mathbb{R}$, we have
$$\|D_x\|_2 = \lambda_{\max}(D_x) = \max_i g(b_i a_i^T x) \le \tfrac{1}{4}.$$
Hence, the Hessian is bounded by $\|H(x)\|_2 \le \frac{1}{4}\|A\|_2^2$, independent of $x$, as $A$ contains the (fixed) training data samples.

d) What is the smallest Lipschitz constant $L > 0$ you can find such that the gradient $\nabla f$ is Lipschitz continuous,
$$\|\nabla f(x) - \nabla f(y)\|_2 \le L\,\|x - y\|_2 \quad \forall x, y \in \mathbb{R}^p?$$

A differentiable function on $\mathbb{R}^p$ is Lipschitz continuous if and only if its derivative is bounded. In this case, the gradient $\nabla f$ is Lipschitz continuous if the second derivative, the Hessian $H(x)$, is bounded.
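This was shown in c), with $L = \frac{1}{4}\|A\|_2^2$ a possible Lipschitz constant. Moreover, any Lipschitz constant of $\nabla f$ must be at least $\|H(x)\|_2$ at every $x$. Since $g(0) = 1/4$, every diagonal entry of $D_x$ at $x = 0$ equals $1/4$, so $H(0) = \frac{1}{4}A^T A$ and $\|H(0)\|_2 = \frac{1}{4}\|A\|_2^2$. The bound from c) is therefore attained, and $L = \frac{1}{4}\|A\|_2^2$ is the smallest possible Lipschitz constant.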
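The bound in c) and the constant in d) can be checked numerically. The following is a minimal, stdlib-only Python sketch on a small, randomly generated (hypothetical) data set. The helper names `sigmoid`, `g`, `hessian`, `gram` and `lambda_max` are illustrative. `lambda_max` uses plain power iteration, which assumes the matrix is symmetric positive semidefinite and that the start vector is not orthogonal to the dominant eigenvector.

```python
import math
import random


def sigmoid(t):
    """h(t) = 1 / (1 + exp(-t)), written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def g(t):
    """g(t) = h(t) * (1 - h(t)), the diagonal weight in D_x; 0 < g(t) <= 1/4."""
    h = sigmoid(t)
    return h * (1.0 - h)


def hessian(A, b, x):
    """H(x) = A^T D_x A = sum_i g(b_i a_i^T x) a_i a_i^T, with A given as n rows of length p."""
    p = len(x)
    H = [[0.0] * p for _ in range(p)]
    for a_i, b_i in zip(A, b):
        w = g(b_i * sum(a_ij * x_j for a_ij, x_j in zip(a_i, x)))
        for r in range(p):
            for c in range(p):
                H[r][c] += w * a_i[r] * a_i[c]
    return H


def gram(A):
    """A^T A, whose largest eigenvalue equals ||A||_2^2."""
    p = len(A[0])
    return [[sum(a_i[r] * a_i[c] for a_i in A) for c in range(p)] for r in range(p)]


def lambda_max(S, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix S (list of rows) by power iteration."""
    m = len(S)
    v = [1.0] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(S[r][c] * v[c] for c in range(m)) for r in range(m)]
        lam = math.sqrt(sum(w_r * w_r for w_r in w))
        if lam == 0.0:
            return 0.0
        v = [w_r / lam for w_r in w]
    return lam


def main():
    random.seed(0)
    n, p = 20, 3  # hypothetical data: n samples a_i in R^p, labels b_i in {-1, 1}
    A = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    b = [random.choice((-1, 1)) for _ in range(n)]

    bound = lambda_max(gram(A)) / 4.0  # ||A||_2^2 / 4, the Lipschitz constant L from d)
    for x in ([0.0, 0.0, 0.0], [1.0, -2.0, 0.5], [5.0, 5.0, 5.0]):
        norm_H = lambda_max(hessian(A, b, x))
        print(f"x = {x}: ||H(x)||_2 = {norm_H:.4f}, ||A||_2^2 / 4 = {bound:.4f}")


if __name__ == "__main__":
    main()
```

At $x = 0$ every weight equals $g(0) = 1/4$, so the printed $\|H(0)\|_2$ coincides with $\|A\|_2^2/4$ up to rounding, while the other points give smaller values.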
