Gradient descents and inner products


Gradient descent and inner products
Outline: Gradient (definition); Usual inner product (L²); Natural inner products; Other inner products; Examples.

Gradient (definition)
Energy E depending on a variable f (a vector or a function); what is the gradient ∇E(f)?
Directional derivative: consider a variation v of f:
δE(f)(v) := lim_{ε→0} ( E(f + εv) − E(f) ) / ε
Hopefully δE(f) is linear and continuous with respect to v. The Riesz representation theorem then gives the gradient: the variation ∇E(f) such that, for all v,
δE(f)(v) = ⟨∇E(f) | v⟩_f
Usual choice of inner product: L²(f), which yields the usual L² gradient.
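
As a quick illustration in the finite-dimensional (Euclidean / L²) case, the sketch below checks numerically that the directional derivative matches ⟨∇E(f), v⟩; the quadratic energy E and the vectors f and v are arbitrary choices made only for this example.

```python
import numpy as np

# Quick check in the finite-dimensional (Euclidean / L2) case:
# the finite-difference directional derivative of E at f in direction v
# agrees with <grad E(f), v>.  E, f, v are arbitrary toy choices.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

def E(f):
    r = A @ f - b
    return 0.5 * r @ r              # E(f) = 0.5 ||A f - b||^2

def grad_E(f):
    return A.T @ (A @ f - b)        # Euclidean (L2) gradient of E

f = rng.standard_normal(3)
v = rng.standard_normal(3)
eps = 1e-6

dE_fd = (E(f + eps * v) - E(f)) / eps   # directional derivative, finite differences
dE_ip = grad_E(f) @ v                   # <grad E(f), v>
print(dE_fd, dE_ip)                     # the two values agree up to O(eps)
```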

Gradient Descent Scheme
Build a minimizing path:
f(t = 0) = f_0,  ∂f/∂t = −∇_P^f E(f)
Changing the inner product P changes the minimizing path:
−∇_P^f E(f) = arg min_v { δE(f)(v) + ½ ‖v‖²_P }
P acts as a prior on the minimizing flow.
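
A minimal finite-dimensional sketch of this statement: with ⟨u, v⟩_P = uᵀPv, the minimizer of δE(f)(v) + ½‖v‖²_P is −P⁻¹∇E(f), i.e. a preconditioned descent direction. The energy and metric below are toy choices, not taken from the slides.

```python
import numpy as np

# Finite-dimensional sketch: with the inner product <u, v>_P = u^T P v,
#   argmin_v { dE(f)(v) + 0.5 ||v||_P^2 }  =  -P^{-1} grad E(f),
# i.e. preconditioned gradient descent.  E and P are toy choices.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(6)
P = np.diag([100.0, 1.0, 1.0, 1.0])      # metric: moving the 1st coordinate is "expensive"

def E(f):
    r = A @ f - b
    return 0.5 * r @ r

def grad_E(f):
    return A.T @ (A @ f - b)

def model(f, v):
    # local model: directional derivative plus half the squared P-norm of the step
    return grad_E(f) @ v + 0.5 * v @ P @ v

f = rng.standard_normal(4)
v_star = -np.linalg.solve(P, grad_E(f))  # descent direction w.r.t. the P inner product

# v_star is the exact minimizer of the local model ...
for _ in range(5):
    assert model(f, v_star) <= model(f, rng.standard_normal(4))

# ... and a small step along it decreases the energy.
print(E(f), E(f + 0.01 * v_star))
```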

Example 1: vectors, lists of parameters
E(α) = E(α_1, α_2, α_3, ..., α_n)
Usual L² gradient: ∇E(α) = Σ_i ∂_i E(α_1, α_2, ..., α_n) e_i (supposes the α_i are independent)
Should all parameters α_i have the same weight? Consider a different parameterization β = P(α):
β_1 = 10 α_1 and β_i = α_i for i ≠ 1,  so F(β) = E(P⁻¹(β)) = E(0.1 β_1, β_2, ..., β_n)
For i ≠ 1: ∂_{β_i} F(β) = ∂_{α_i} E(α), while ∂_{β_1} F(β) = 0.1 ∂_{α_1} E(α), evaluated at α = P⁻¹(β).
Gradient ascent with respect to β: δβ_1 = ∂_{β_1} F(β), hence δβ_1 = 0.1 ∂_{α_1} E(α).
Gradient ascent with respect to α: δα_1 = ∂_{α_1} E(α), hence δβ_1 = 10 δα_1 = 10 ∂_{α_1} E(α).
Difference between the two approaches: a factor 100 on the first parameter!
Conclusion: the L² inner product is a bad choice (it is not intrinsic).
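
A small numerical illustration of this factor-100 discrepancy, using a toy energy chosen only for the example:

```python
import numpy as np

# Numerical illustration of the reparameterization effect (toy energy).
def E(alpha):
    return float(np.sum(alpha ** 2))    # E(alpha) = sum_i alpha_i^2

def grad_E(alpha):
    return 2.0 * alpha

def grad_F(beta):
    # F(beta) = E(0.1 * beta_1, beta_2, ..., beta_n); chain rule on the 1st coordinate
    alpha = beta.copy()
    alpha[0] = 0.1 * beta[0]
    g = grad_E(alpha)
    g[0] *= 0.1
    return g

alpha = np.array([1.0, 2.0, 3.0])
beta = alpha.copy()
beta[0] = 10.0 * alpha[0]               # same point, expressed in the beta parameterization

step_beta = grad_F(beta)[0]             # move of beta_1 when ascending F w.r.t. beta
step_alpha = 10.0 * grad_E(alpha)[0]    # move of beta_1 induced by ascending E w.r.t. alpha
print(step_beta, step_alpha, step_alpha / step_beta)   # ratio: 100
```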

Example 2: (not specific to) kernels and intrinsic gradients
f = Σ_i α_i k(x_i, ·),  E(f) = Σ_i |f(x_i) − y_i|² + ‖f‖²_H
The L² gradient with respect to α: not the best idea.
Choose a natural, intrinsic inner product instead:
⟨δα_a | δα_b⟩_P := ⟨δf_a | δf_b⟩_H = ⟨Df(δα_a) | Df(δα_b)⟩_H = δα_aᵗ (ᵗDf Df) δα_b
In the case of kernels this gives: ⟨δα_a | δα_b⟩_P = δα_aᵗ K δα_b
Corresponding gradient: ∇_P^α E = (ᵗDf Df)⁻¹ ( P_adm( ∇_H^f E ) )
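
A hedged sketch of the kernel case: with the metric ⟨δα_a, δα_b⟩ = δα_aᵀ K δα_b, the descent direction on the coefficients is K⁻¹ times the plain gradient of E with respect to α. The Gaussian kernel, the data, the regularization weight lam, and the step-size rule below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

# Sketch: kernel least squares with f = sum_i alpha_i k(x_i, .),
#   E(alpha) = ||K alpha - y||^2 + lam * alpha^T K alpha.
# With the metric <u, v> = u^T K v on the coefficients, the descent
# direction becomes K^{-1} grad_alpha E.  Kernel, data, lam and the
# step-size rule are illustrative assumptions.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)
lam = 1e-2

K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2)  # Gaussian kernel matrix
K += 1e-8 * np.eye(len(x))                                    # numerical jitter

def grad_alpha(alpha):
    # plain gradient of E with respect to the coefficients alpha
    return 2.0 * K @ (K @ alpha - y) + 2.0 * lam * K @ alpha

alpha = np.zeros(len(x))
step = 0.4 / (np.linalg.norm(K, 2) + lam)         # conservative step size
for _ in range(500):
    v = -np.linalg.solve(K, grad_alpha(alpha))    # descent direction in the K metric
    alpha = alpha + step * v

print(np.linalg.norm(K @ alpha - y))              # residual of the fitted values
```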

Example 3: parameterized shapes
The same as for kernels. Choose any representation for shapes (splines, polygons, level sets, ...).
Intrinsic gradient ⇒ as close as possible to the real evolution, as independent as possible of the representation/parameterization.

Example 4: shapes, contours
Curve f; deformation field v, in the tangent space of f: f + v.
Usual tangent space: L²(f), with ⟨u | v⟩_{L²(f)} = ∫_f u(x) · v(x) df(x)

Examples of inner products for shapes (or functions)
L² inner product: ⟨u | v⟩_{L²(f)} = ∫_f u(x) · v(x) df(x)
H¹ inner product: ⟨u | v⟩_{H¹} = ⟨u | v⟩_{L²(f)} + ⟨∂_x u | ∂_x v⟩_{L²(f)}
Interesting property of the H¹ gradient: ∇_{H¹}E = arg inf_u { ‖u − ∇_{L²}E‖²_{L²} + ‖∂_x u‖²_{L²} }
Set S of preferred transformations (e.g. rigid motions); projection onto S: Q; projection orthogonal to S: R (with Q + R = Id):
⟨u | v⟩_S = ⟨Q(u) | Q(v)⟩_{L²} + α ⟨R(u) | R(v)⟩_{L²}
Example: minimizing the Hausdorff distance between two curves, ∂f/∂t = −∇_f d_H(f, f_2). (Figures: the usual flow vs. the rigidified flow.)
Changing one inner product for another amounts to a linear, symmetric, positive definite transformation of the gradient.
Gaussian smoothing of the L² gradient is such a symmetric positive definite transformation.
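
A minimal sketch of the H¹ case on a discretized closed curve: the H¹ gradient is obtained from the L² gradient by solving (Id − ∂²_x) ∇_{H¹}E = ∇_{L²}E, which is the Euler-Lagrange equation of the arg inf above. The noisy "L² gradient" field below is made up purely for illustration.

```python
import numpy as np

# Sketch: H^1 (Sobolev) gradient on a uniformly sampled closed curve.
# The H^1 gradient solves (Id - d^2/dx^2) g_h1 = g_l2, the Euler-Lagrange
# equation of  arginf_u ||u - g_l2||^2 + ||du/dx||^2.
# The "L^2 gradient" g_l2 is a made-up noisy field, purely for illustration.
n = 200
s = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
h = s[1] - s[0]
rng = np.random.default_rng(3)
g_l2 = np.sin(3 * s) + 0.5 * rng.standard_normal(n)

# Periodic second-difference operator (the curve is closed).
I = np.eye(n)
D2 = (np.roll(I, 1, axis=1) - 2.0 * I + np.roll(I, -1, axis=1)) / h ** 2

g_h1 = np.linalg.solve(I - D2, g_l2)     # smoothed H^1 gradient

# The H^1 gradient is much smoother, and -g_h1 is still a descent direction
# for the same energy: <g_h1, g_l2> > 0 because (Id - D2) is positive definite.
print(float(g_h1 @ g_l2))
print(float(np.abs(np.diff(g_l2)).max()), float(np.abs(np.diff(g_h1)).max()))
```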

Extension to non-linear criteria
−∇_P^f E(f) = arg min_v { δE(f)(v) + ½ ‖v‖²_P }
can be generalized by replacing the quadratic term with any suitable penalty R_P:
−∇_P^f E(f) = arg min_v { δE(f)(v) + R_P(v) }
Example: semi-local rigidification.

Remark on Taylor series and Newton's method
−∇_P^f E(f) = arg min_v { δE(f)(v) + ½ ‖v‖²_P }
Different approximations of E(f + v) give different gradient descents:
arg min_v { E(f) + δE(f)(v) + ½ ‖v‖²_P } = −∇_P^f E(f)   (1st-order term + a chosen metric for the quadratic term)
arg min_v { E(f) + δE(f)(v) + ½ δ²E(f)(v)(v) }   (the quadratic term is the true 2nd-order term: Newton's method)
arg min_v { E(f) + δE(f)(v) + R_P(v) }   (1st-order term + any suitable criterion)
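
To make the comparison concrete, here is a toy sketch (an assumed 2-D quadratic energy, not from the slides) contrasting the step obtained with a fixed metric P with the Newton step, which is the special case where the metric is the second derivative δ²E:

```python
import numpy as np

# Toy comparison (assumed 2-D quadratic energy, not from the slides):
#  (a) argmin_v { dE(f)(v) + 0.5 v^T P v }     = -P^{-1}    grad E(f)   (fixed metric P)
#  (b) argmin_v { dE(f)(v) + 0.5 v^T H(f) v }  = -H(f)^{-1} grad E(f)   (Newton step)
# Newton's method is the special case where the metric is the second derivative.

def E(f):
    x, y = f
    return (x - 1.0) ** 2 + 10.0 * (y + 2.0) ** 2   # anisotropic quadratic bowl

def grad_E(f):
    x, y = f
    return np.array([2.0 * (x - 1.0), 20.0 * (y + 2.0)])

def hess_E(f):
    return np.diag([2.0, 20.0])

f = np.array([5.0, 5.0])
P = np.eye(2)                                       # plain L2 metric

v_l2 = -np.linalg.solve(P, grad_E(f))               # steepest descent direction
v_newton = -np.linalg.solve(hess_E(f), grad_E(f))   # Newton direction

print(E(f))                   # energy at the starting point
print(E(f + 0.05 * v_l2))     # small L2 step: decreases, but slowly on this bowl
print(E(f + v_newton))        # full Newton step reaches the minimum of the quadratic: 0.0
```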

(End of the parenthesis)