Gradient descents and inner products


Outline:
- Gradient (definition)
- Usual inner product (L2)
- Natural inner products
- Other inner products
- Examples

Gradient (definition)

For an energy E depending on a variable f (a vector or a function), what is the gradient ∇E(f)?

Directional derivative: consider a variation v of f,

  δE(f)(v) := lim_{ε→0} [ E(f + εv) − E(f) ] / ε.

Hopefully δE(f) is linear and continuous with respect to v. The Riesz representation theorem then defines the gradient: the variation ∇E(f) such that, for all v,

  δE(f)(v) = ⟨ ∇E(f) | v ⟩.

Usual inner product: L2 ⟹ L2 gradient.
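As a quick sanity check of this definition, here is a small numerical sketch (not part of the slides): for a discretized f and the usual dot product, the finite-difference directional derivative matches ⟨∇E(f) | v⟩. The energy chosen below is an arbitrary illustrative assumption.

```python
import numpy as np

# Illustration only: for a discretized f, the directional derivative
# dE(f)(v) = lim (E(f + eps*v) - E(f)) / eps matches the L2 pairing
# <grad E(f) | v> when the gradient is taken for the usual dot product.
def E(f):
    return np.sum(f ** 2) + np.sum(np.cos(f))

def grad_L2(f):
    return 2 * f - np.sin(f)

rng = np.random.default_rng(1)
f = rng.normal(size=5)
v = rng.normal(size=5)
eps = 1e-6

dE_numeric = (E(f + eps * v) - E(f)) / eps
dE_riesz = grad_L2(f) @ v          # <grad E(f) | v> with the usual dot product
print(dE_numeric, dE_riesz)        # agree up to O(eps)
```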

Gradient Descent Scheme

Build a minimizing path:
  f(t=0) = f_0
  ∂f/∂t = − ∇_P E(f)

Changing the inner product P changes the minimizing path:
  − ∇_P E(f) = arg min_v { δE(f)(v) + (1/2) ‖v‖_P^2 }

P acts as a prior on the minimizing flow.
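To make the role of P concrete, here is a minimal sketch (illustrative, not from the slides) in finite dimension, where ⟨u | v⟩_P = u^T P v and the minimizer of δE(f)(v) + (1/2)‖v‖_P^2 is v = −P^{-1} ∇E(f). The quadratic energy and the diagonal choice of P are assumptions made up for the example.

```python
import numpy as np

# Sketch: gradient descent where the step direction minimizes
#   dE(f)(v) + 0.5 * <v, v>_P   with   <u, v>_P = u^T P v  (P symmetric positive definite),
# which in finite dimension is v = -P^{-1} grad E(f).
def grad_E(f):
    A = np.diag([100.0, 1.0])          # badly scaled quadratic E(f) = 0.5 f^T A f
    return A @ f

P = np.diag([100.0, 1.0])              # metric adapted to the scaling of the parameters
f = np.array([1.0, 1.0])
for _ in range(20):
    v = -np.linalg.solve(P, grad_E(f)) # descent direction under the P inner product
    f = f + 0.5 * v                    # time step dt = 0.5
print(f)                               # converges fast; with P = Id this step size blows up on the stiff coordinate
```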

Example 1: vectors, lists of parameters

E(α) = E(α_1, α_2, α_3, ..., α_n)

Usual L2 gradient:
  ∇E(α) = Σ_i ∂_{α_i} E(α_1, ..., α_n) e_i
(this supposes the α_i independent)

Should all parameters α_i have the same weight? Consider a different parameterization β = P(α):
  β_1 = 10 α_1,   β_i = α_i for all i ≠ 1,
  F(β) = E(P^{-1}(β)) = E(0.1 β_1, β_2, ..., β_n).

Then, for all i ≠ 1, ∂_{β_i} F(β) = ∂_{α_i} E(α), while ∂_{β_1} F(β) = 0.1 ∂_{α_1} E(α), both evaluated at α = P^{-1}(β).

Gradient ascent with respect to β: δβ_1 = ∂_{β_1} F(β) ⟹ δβ_1 = 0.1 ∂_{α_1} E(α).
Gradient ascent with respect to α: δα_1 = ∂_{α_1} E(α) ⟹ δβ_1 = 10 δα_1 = 10 ∂_{α_1} E(α).

Difference between the two approaches: a factor 100 on the first parameter!
Conclusion: the L2 inner product is bad (not intrinsic).
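The factor of 100 can be checked numerically. The sketch below (with a made-up energy E, not from the slides) performs one gradient step in the α coordinates and one in the β coordinates, then compares the resulting update of the first parameter expressed in the same units.

```python
import numpy as np

# Toy energy; any smooth E would do (hypothetical choice, not from the slides).
def E(alpha):
    return np.sum(alpha ** 2) + alpha[0] * alpha[1]

def grad_E(alpha):
    g = 2 * alpha
    g[0] += alpha[1]
    g[1] += alpha[0]
    return g

alpha = np.array([1.0, 2.0, 3.0])
step = 0.01

# Gradient step directly on alpha (L2 inner product on alpha).
d_alpha = step * grad_E(alpha)

# Same energy in beta coordinates: beta_1 = 10*alpha_1, beta_i = alpha_i otherwise.
beta = alpha.copy(); beta[0] *= 10.0
def grad_F(beta):
    a = beta.copy(); a[0] *= 0.1           # alpha = P^{-1}(beta)
    g = grad_E(a)
    g[0] *= 0.1                            # chain rule: dF/dbeta_1 = 0.1 * dE/dalpha_1
    return g

d_beta = step * grad_F(beta)

# Compare the update of the first coordinate, expressed in beta units.
print("beta_1 update from alpha-step:", 10.0 * d_alpha[0])   # = 10  * dE/dalpha_1 * step
print("beta_1 update from beta-step: ", d_beta[0])           # = 0.1 * dE/dalpha_1 * step
print("ratio:", (10.0 * d_alpha[0]) / d_beta[0])             # factor 100
```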

Example 2: (not specific to) kernels and intrinsic gradients

f = Σ_i α_i k(x_i, ·),   E(f) = Σ_i |f(x_i) − y_i|^2 + ‖f‖_H^2

The L2 gradient with respect to α: not the best idea. Choose a natural, intrinsic inner product instead:

  ⟨δα_a | δα_b⟩_P = ⟨δf_a | δf_b⟩_H = ⟨Df(δα_a) | Df(δα_b)⟩_H = δα_a^T (Df^T Df) δα_b

In the case of kernels this gives:
  ⟨δα_a | δα_b⟩_P = δα_a^T K δα_b

Corresponding gradient:
  ∇_P^α E = (Df^T Df)^{-1} ( P_adm( ∇_H^f E ) )
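For a concrete (and hypothetical) instance, take kernel ridge regression, E(α) = ‖Kα − y‖^2 + λ α^T K α. With the inner product ⟨δα_a | δα_b⟩_P = δα_a^T K δα_b, the corresponding gradient is K^{-1} times the plain gradient in α, which here happens to be available in closed form. The data, kernel and step size below are illustrative assumptions.

```python
import numpy as np

# A minimal sketch (not the slides' code): kernel ridge energy
#   E(alpha) = ||K alpha - y||^2 + lam * alpha^T K alpha,
# descended with the "natural" gradient for <u, v>_P = u^T K v.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=30)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.1 ** 2))  # Gaussian kernel matrix
lam = 1e-2

def grad_L2(alpha):
    # d/dalpha of E: the gradient for the Euclidean inner product on alpha
    return 2 * K @ (K @ alpha - y) + 2 * lam * K @ alpha

def grad_natural(alpha):
    # For <u,v>_P = u^T K v the P-gradient solves K g = grad_L2(alpha);
    # here that solve has the closed form g = 2*(K@alpha - y) + 2*lam*alpha.
    return 2 * (K @ alpha - y) + 2 * lam * alpha

alpha = np.zeros(30)
for _ in range(500):
    alpha -= 0.05 * grad_natural(alpha)   # the L2 gradient on alpha diverges at this step size
print("data misfit:", np.linalg.norm(K @ alpha - y))
```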

Example 3: parameterized shapes

The same as for kernels. Choose any representation for shapes (splines, polygons, level sets...). The intrinsic gradient gives an evolution as close as possible to the real one, and as independent as possible of the representation/parameterization.

Example 4: shapes, contours

Curve f. A deformation field v lives in the tangent space of f: the deformed curve is f + v. Usual tangent space: L2(f), with

  ⟨u | v⟩_{L2(f)} = ∫_f u(x) · v(x) df(x).

Examples of inner products for shapes (or functions)

L2 inner product:
  ⟨u | v⟩_{L2(f)} = ∫_f u(x) · v(x) df(x)

H1 inner product:
  ⟨u | v⟩_{H1} = ⟨u | v⟩_{L2(f)} + ⟨∂_x u | ∂_x v⟩_{L2(f)}
Interesting property of the H1 gradient:
  ∇_{H1} E = arg inf_u { ‖u − ∇_{L2} E‖_{L2}^2 + ‖∂_x u‖_{L2}^2 }

Inner product built from a set S of preferred transformations (e.g. rigid motions). Let Q be the projection onto S and R the projection orthogonal to S (Q + R = Id):
  ⟨u | v⟩_S = ⟨Q(u) | Q(v)⟩_{L2} + α ⟨R(u) | R(v)⟩_{L2}

Example: minimizing the Hausdorff distance between two curves,
  ∂f/∂t = − ∇_f d_H(f, f_2)
(figure: evolution with the usual gradient vs. the rigidified gradient).

Changing one inner product for another amounts to a linear, symmetric, positive definite transformation of the gradient. Gaussian smoothing of the L2 gradient is one such symmetric positive definite transformation.
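The sketch below illustrates the rigidified gradient on a polygonal curve. It is only an illustration under simplifying assumptions (uniform vertex weights instead of the measure df(x), a made-up attraction energy); the projections Q and R are computed from the three infinitesimal rigid motions of the plane.

```python
import numpy as np

# Sketch of the rigidified gradient on a 2D polygon (uniform vertex weights assumed).
# With Q = projection onto infinitesimal rigid motions and R = Id - Q, the gradient for
#   <u|v>_S = <Q(u)|Q(v)>_L2 + a <R(u)|R(v)>_L2
# is  Q(g) + (1/a) R(g), so large a favours rigid motion.

def rigid_basis(pts):
    """Orthonormal basis (columns) of infinitesimal rigid motions of a 2D polygon."""
    n = len(pts)
    c = pts.mean(axis=0)
    tx = np.tile([1.0, 0.0], n)                                             # translation in x
    ty = np.tile([0.0, 1.0], n)                                             # translation in y
    rot = np.column_stack([-(pts[:, 1] - c[1]), pts[:, 0] - c[0]]).ravel()  # rotation about c
    q, _ = np.linalg.qr(np.column_stack([tx, ty, rot]))
    return q

def rigidified_gradient(pts, g_l2, a=10.0):
    """Keep the rigid component of the L2 gradient, damp the non-rigid remainder by 1/a."""
    B = rigid_basis(pts)
    g = g_l2.ravel()
    g_rigid = B @ (B.T @ g)                                                 # Q(g)
    return (g_rigid + (g - g_rigid) / a).reshape(-1, 2)

# Toy usage: a unit square attracted to a scaled and shifted copy of itself
# (hypothetical energy E = sum_i ||p_i - t_i||^2, so the L2 gradient is 2 (p - t)).
pts = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
c = pts.mean(axis=0)
target = 1.5 * (pts - c) + c + np.array([0.5, 0.2])
g_l2 = 2.0 * (pts - target)
print(rigidified_gradient(pts, g_l2, a=10.0))  # mostly the translation; the scaling part is damped
```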

Extension to non-linear criteria

− ∇_P E(f) = arg min_v { δE(f)(v) + (1/2) ‖v‖_P^2 }

More generally, one can take the descent direction

  arg min_v { δE(f)(v) + R_P(v) }

for any suitable criterion R_P. Example: semi-local rigidification.
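Semi-local rigidification itself is beyond this sketch, but the effect of a non-quadratic criterion R_P can be illustrated with a simpler, made-up choice: R(v) = (1/2)‖v‖^2 + λ‖v‖_1 (not the slides' criterion). Its arg min has a closed form, a soft-thresholded negative gradient, so coordinates with a small directional derivative are simply not moved.

```python
import numpy as np

# Illustration of the generalized descent direction
#   argmin_v { dE(f)(v) + R(v) }
# for the non-quadratic stand-in R(v) = 0.5*||v||^2 + lam*||v||_1:
# the per-coordinate minimizer is -sign(g_i) * max(|g_i| - lam, 0).
def descent_direction(g, lam):
    return -np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

g = np.array([3.0, 0.05, -1.2, 0.01])   # directional-derivative coefficients dE(f)(e_i)
print(descent_direction(g, lam=0.1))    # [-2.9, 0., 1.1, 0.]: small components do not move
```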

Remark on Taylor series and Newton's method

− ∇_P E(f) = arg min_v { δE(f)(v) + (1/2) ‖v‖_P^2 }

Different approximations of E(f + v) ⟹ different gradient descents:

  arg min_v { E(f) }
  arg min_v { E(f) + δE(f)(v) }
  arg min_v { E(f) + δE(f)(v) + (1/2) ‖v‖_P^2 } = − ∇_P E(f)   (1st order + metric choice, ‖v‖_P^2 quadratic)
  arg min_v { E(f) + δE(f)(v) + (1/2) δ^2E(f)(v)(v) }   (quadratic, order 2 ⟹ Newton's method)
  arg min_v { E(f) + δE(f)(v) + R_P(v) }   (1st order + any suitable criterion)
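The sketch below (illustrative assumptions: a 2D quadratic energy, a diagonal metric P) compares these choices on one step: the plain L2 gradient, the metric gradient −P^{-1}∇E, and the Newton direction −H^{-1}∇E, which reaches the minimizer of a quadratic in a single step.

```python
import numpy as np

# Toy quadratic energy E(f) = 0.5 f^T H f - b^T f (H, b, P chosen for illustration):
#   1st order + L2 metric       : v = -grad E(f)
#   1st order + chosen metric P : v = -P^{-1} grad E(f)
#   2nd order (Newton)          : v = -H^{-1} grad E(f), exact in one step on a quadratic.
H = np.array([[10.0, 2.0], [2.0, 1.0]])
b = np.array([1.0, 1.0])
P = np.diag(np.diag(H))                    # a cheap metric: the diagonal of H

def grad(f):
    return H @ f - b

f = np.zeros(2)
v_l2 = -grad(f)
v_metric = -np.linalg.solve(P, grad(f))
v_newton = -np.linalg.solve(H, grad(f))

f_star = np.linalg.solve(H, b)             # the true minimizer
for name, v in [("L2", v_l2), ("metric P", v_metric), ("Newton", v_newton)]:
    print(name, "distance to optimum after one unit step:", np.linalg.norm(f + v - f_star))
```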

30 (End of the parenthesis)
