PART II: Basic Theory of Half-quadratic Minimization

1 PART II: Basic Theory of Half-quadratic Minimization Ran He, Wei-Shi Zheng and Liang Wang

2 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in robust M-estimation HQ in information theoretic learning

3 History of HQ minimization Prof. Donald Geman, Department of Applied Mathematics and Statistics, Johns Hopkins University; Cachan, France. [D. Geman and G. Reynolds. IEEE TPAMI, 1992] [D. Geman and C. Yang. IEEE TIP, 1995] Timeline: theory → signal and image processing (2010) → machine learning and sparse representation (2012)

4 History of HQ minimization Linear image restoration: Y = AX + E, where Y is the blurred and noisy observation, A is the point spread function, X is the restored image, and E is the measurement error. [Figure: original and blurred images with first-, second-, and third-order restorations] [D. Geman and G. Reynolds. IEEE TPAMI, 1992]

5 History of HQ minimization Linear image restoration φ(x) = min_p { p x² + ϕ(p) } min_X ||Y − AX||² + λ Σ_i φ(X_i − X_{i+1}) If a function satisfies the conditions of Theorem 1, it can be realized as the infimum of a family of quadratic functions. [D. Geman and G. Reynolds. IEEE TPAMI, 1992]

6 History of HQ minimization Nonlinear image restoration φ(x) = min_p { (x − p)² + ϕ(p) } min_X ||Y − KX||² + λ Σ_i φ((DX)_i), with the observation model Y = AX + E. First-order differences: (DX)_{ij} = X_{ij} − X_{i,j−1}; second-order: (DX)_{ij} = X_{i,j+1} − 2X_{ij} + X_{i,j−1}. [Figure: original image, first-order and combined-order restorations] [D. Geman and C. Yang. IEEE TIP, 1995]

7 History of HQ minimization Nonlinear image restoration min_X ||Y − KX||² + λ Σ_i φ((DX)_i), Y = AX + E, φ(x) = min_p { (x − p)² + ϕ(p) } [Figure: Hubble image and restored image] [D. Geman and C. Yang. IEEE TIP, 1995]

8 History of HQ minimization Deterministic Edge-Preserving Regularization Edge-Preserving potential functions [P. Charbonnier et al. IEEE ICIP, 1994] [P. Charbonnier et al. IEEE TIP, 1997]

9 History of HQ minimization Half-quadratic criteria Interacting auxiliary variables A family of convex Gibbsian energy functions, e.g. the Huber function φ_H^λ(v) = v²/2 if |v| ≤ λ, λ|v| − λ²/2 if |v| > λ; φ_p(v) = |v|^p; φ_1(v) = √(c + v²). [J. Idier, IEEE TIP, 2001]

10 History of HQ minimization Half-quadratic criteria and EM algorithms Iteratively Reweighted Least Squares (IRLS) and Residual Steepest Descent (RSD) algorithms of robust statistics arise as special cases of half-quadratic schemes. [F. Champagnat and J. Idier, IEEE SPL, 2004]

11 History of HQ minimization Two forms of HQ minimization Convergence of HQ algorithms A family of HQ functions [Figure: original image, noisy image, first-order and second-order results] [M. Nikolova and M. Ng, SIAM Journal on Scientific Computing, 2005] [M. Allain et al., IEEE TIP, 2006]

12 History of HQ minimization HQ analysis for mean shift HQ and the gradient linearization iteration [X. T. Yuan and S. Z. Li, ICCV, 2007] [M. Nikolova and R. H. Chan, IEEE TIP, 2007]

13 History of HQ minimization HQ analysis for linear inverse problems with compound regularizers Densely corrupted image recovery [Figure: original, noisy, and restored images] [Bioucas-Dias and Figueiredo, ICIP, 2008]

14 History of HQ minimization HQ analysis for information theoretic learning Agglomerative mean-shift clustering [Xiaotong Yuan and B.G. Hu, ICML, 2009] [Xiaotong Yuan et al., SDM, 2009]

15 History of HQ minimization HQ de-noising for high dynamic range image synthesis [Wei Yao et al., ICASSP, 2010]

16 History of HQ minimization Robust semi-supervised learning Laplacian regularization: min_f Σ_i (f_i − y_i)² + λ Σ_{ij} w_{ij} (f_i − f_j)² Robust version: min_f Σ_i φ(f_i − y_i) + λ Σ_{ij} w_{ij} φ(f_i − f_j) [Sun and Taylor. Sparse semi-supervised learning using conjugate functions. 2010] [Yang et al., Robust semi-supervised learning for biometrics, 2010] [Nie et al., Unsupervised and Semi-Supervised Learning via L1-Norm Graph, 2011]

17 History of HQ minimization Robust sparse representation Low-rank matrix recovery Feature selection and extraction min_x Σ_j φ((Ax − y)_j) + λ ||x||_1 [Ran He et al., IEEE TPAMI, 2011] [Ran He et al., IEEE CVPR, 2011] [Hui Yan et al., PR, 2011]

18 History of HQ minimization L2,1-norm minimization [Ran He et al., IEEE CVPR, 2012]

19 History of HQ minimization Robust nonnegative matrix factorization min_{F,G} ||X − FG||_F², s.t. F ≥ 0, G ≥ 0, where ||X − FG||_F² = Σ_i ||x_i − Fg_i||² Robust version: min_{F,G} Σ_i φ(||x_i − Fg_i||), s.t. F ≥ 0, G ≥ 0 [D. Kong et al. Robust nonnegative matrix factorization using L21-norm, CIKM, 2011] [L. Du et al. Robust Nonnegative Matrix Factorization via HQ, ICDM, 2012]

20 History of HQ minimization Robust subspace segmentation [Y. Zhang et al. Robust Subspace Clustering via HQ Minimization, ICCV, 2013] [C. Liu et al. Correntropy Induced L2 Graph for Robust Subspace Clustering, ICCV, 2013] [Y. Zhang et al. Robust Low-Rank Representation via Correntropy, ACPR, 2013]

21 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in robust M-estimation HQ in information theoretic learning

22 HQ minimization min_x ||Ax − y||² + Σ_k φ(g_k^T x) [g_1, ..., g_k] indicates a graph of {x_k}: g_k^T x = x_k with g_k = [0, ..., 0, 1, 0, ..., 0], or g_k^T x = x_k − x_{k+1} with g_k = [0, ..., 0, 1, −1, ..., 0] φ may be convex or non-convex. The additive form: φ(x_k) = min_{e_k} { (x_k − e_k)² + ϕ(e_k) } The multiplicative form: φ(x_k) = min_{p_k} { p_k x_k² + ϕ(p_k) } When the auxiliary variables e and p are given, the original problem becomes a quadratic problem. [Geman and Reynolds. TPAMI, 1992] [Geman and Yang, TIP, 1995]

23 HQ minimization Conjugate function (Legendre transform) The function f*(y) = sup_x ( y^T x − f(x) ) is called the conjugate of the function f(·). [Boyd and Vandenberghe 2004]

24 HQ minimization Conjugate function A HQ loss function: φ(x) = min_p { Q(x, p) + ϕ(p) }, where Q(x, p) is quadratic in x. The additive form: Q(x, p) = (x − p)² The multiplicative form: Q(x, p) = p x²

25 HQ minimization A HQ minimization problem Convex loss function Alternate minimization Minimizer function

26 HQ minimization The multiplicative form [Nikolova and Ng 2005] A function φ(v) satisfies (a) v → φ(v) is convex on R, (b) v → φ(√v) is concave on R+, (c) φ(v) = φ(−v), ∀v ∈ R, (d) φ is C¹ on R, (e) φ''(0⁺) > 0, (f) lim_{v→∞} φ(v)/v² = 0. [Geman and Reynolds. TPAMI, 1992] [Nikolova and Ng. 2005]

27 HQ minimization The multiplicative form φ(v) = min_p { ½ p v² + ϕ(p) } The auxiliary variable p is determined by a HQ minimizer function δ(v). ϕ(p) is the dual (conjugate) potential function of φ(v). [Geman and Reynolds. TPAMI, 1992] [Nikolova and Ng. 2005]

28 HQ minimization Example min_x ||Ax − y||² + λ Σ_i φ(g_i^T x), with g_i = [0, ..., 0, 1, 0, ..., 0]

29 HQ minimization Example min_x ||Ax − y||² + λ Σ_i φ(x_i) ⇒ min_{x,p} ||Ax − y||² + λ Σ_i ( p_i x_i² + ϕ(p_i) ) Alternate minimization: p_i^t = δ(x_i^{t−1}), x^t = arg min_x ||Ax − y||² + λ Σ_i p_i^t x_i²
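A minimal NumPy sketch of this multiplicative-form alternation (not from the slides; the function names, the Huber choice of φ, and the synthetic data are illustrative assumptions):

import numpy as np

def delta_mult_huber(v, lam):
    # Multiplicative-form minimizer function of the Huber loss:
    # p = 1 when |v| <= lam, p = lam/|v| otherwise.
    a = np.abs(v)
    return np.where(a <= lam, 1.0, lam / np.maximum(a, 1e-12))

def hq_multiplicative(A, y, lam=0.5, n_iter=30):
    # Alternation for min_x ||Ax - y||^2 + lam * sum_i phi_H(x_i):
    # the p-step uses the minimizer function, the x-step solves a quadratic problem.
    x = np.zeros(A.shape[1])
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(n_iter):
        p = delta_mult_huber(x, lam)                       # p_i^t = delta(x_i^{t-1})
        x = np.linalg.solve(AtA + lam * np.diag(p), Aty)   # (A^T A + lam*diag(p)) x = A^T y
    return x

A = np.random.randn(50, 10)
y = A @ np.random.randn(10) + 0.1 * np.random.randn(50)
x_hat = hq_multiplicative(A, y)

Each x-step is a reweighted ridge regression; the diagonal weights p_i penalize large coefficients less aggressively than small ones, which is the reweighting interpretation of the multiplicative form.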

30 HQ minimization The additive form A function φ(v) satisfies (a) v → φ(v) is convex, (b) there exists c > 0 such that v → { c v²/2 − φ(v) } is convex, (c) φ(v) = φ(−v), ∀v ∈ R, (d) φ is continuous on R, (e) lim_{v→∞} φ(v)/v² < c/2. [Geman and Yang, TIP, 1995] [Nikolova and Ng. 2005]

31 HQ minimization The additive form φ(v) = min_p { ½ (v√c − p/√c)² + ϕ(p) } The auxiliary variable p is determined by a HQ minimizer function δ(v). ϕ(p) is the dual potential (conjugate) function of φ(v). The parameter c is a constant. [Geman and Yang, TIP, 1995] [Nikolova and Ng. 2005]

32 HQ minimization Example min_x ||Ax − y||² + λ Σ_i φ(x_i) ⇒ min_{x,p} ||Ax − y||² + λ Σ_i ( (x_i − p_i)² + ϕ(p_i) ) Alternate minimization: p_i^t = δ(x_i^{t−1}), x^t = arg min_x ||Ax − y||² + λ Σ_i (x_i − p_i^t)²
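For comparison, a sketch of the additive-form alternation for the same problem (again an illustration under the assumption that φ is the Huber loss, so δ is the soft-thresholding function discussed on the following slides):

import numpy as np

def delta_add_huber(v, lam):
    # Additive-form minimizer function of the Huber loss (soft-thresholding):
    # 0 when |v| <= lam, v - lam*sign(v) otherwise.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def hq_additive(A, y, lam=0.5, n_iter=30):
    # Alternation for min_x ||Ax - y||^2 + lam * sum_i phi_H(x_i):
    # the p-step is elementwise, the x-step solves (A^T A + lam*I) x = A^T y + lam*p.
    n = A.shape[1]
    x = np.zeros(n)
    AtA, Aty = A.T @ A, A.T @ y
    lhs = AtA + lam * np.eye(n)        # fixed system matrix, could be factorized once
    for _ in range(n_iter):
        p = delta_add_huber(x, lam)    # p_i^t = delta(x_i^{t-1})
        x = np.linalg.solve(lhs, Aty + lam * p)
    return x

Unlike the multiplicative form, the system matrix of the x-step does not change across iterations, which is the usual computational argument for the additive form.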

33 HQ minimization The additive form and implicit regularizer φ(v_i) = min_{p_i} { ½ (v_i − p_i)² + ϕ(p_i) } The minimizer function p_i = σ(v_i) is determined by φ(·) and ϕ(·), and can be plugged into accelerated proximal gradient methods. Special case ϕ(·) = λ||·||_1: min_p ½ ||v − p||² + λ ||p||_1 [R. He et al. Recovery of Corrupted Low-rank Matrix by Implicit Regularizers. TPAMI, 2013]
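A plain (non-accelerated) proximal-gradient sketch of the special case ϕ(·) = λ||·||_1, where the minimizer function is exactly the soft-thresholding operator; the step size and iteration count are illustrative assumptions:

import numpy as np

def ista(A, y, lam=0.1, n_iter=200):
    # Proximal gradient for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L  # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # proximal (soft-thresholding) step
    return x

The accelerated variant mentioned on the slide only adds a momentum term between iterations; the proximal step itself is unchanged.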

34 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in robust M-estimation HQ in information theoretic learning

35 HQ loss functions Question 1: Are there functions that satisfy the conditions of the multiplicative and additive forms? Question 2: Can the L1-norm be optimized by HQ minimization? Can the Lp-norm be optimized by HQ minimization?

36 HQ loss functions Soft-thresholding function Sparse representation and low-rank matrix recovery p* = δ(v) = arg min_p { ½ (v − p)² + λ|p| } = 0 if |v| ≤ λ, v − λ sign(v) if |v| > λ
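In code, the soft-thresholding function is a one-liner applied elementwise; the following sketch and its test values are illustrative, not from the slides:

import numpy as np

def soft_threshold(v, lam):
    # argmin_p 0.5*(v - p)^2 + lam*|p|:
    # 0 when |v| <= lam, v - lam*sign(v) otherwise.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

print(soft_threshold(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))  # expected: [-2., -0., 0., 1.]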

37 HQ loss functions Soft-thresholding function Proposition: For a fixed v, the minimal value of f_H(p) = ½ (v − p)² + λ|p| is the Huber loss function, i.e. min_p f_H(p) = φ_H^λ(v), where φ_H^λ(v) = ½ v² if |v| ≤ λ, λ|v| − ½ λ² if |v| > λ

38 HQ loss functions Soft-thresholding function When |v| ≤ λ, substituting the soft-thresholding function (p* = 0) into f_H(p), we directly obtain min_p f_H(p) = ½ v²

39 HQ loss functions Soft-thresholding function When |v| > λ, substituting the soft-thresholding function p* = v − λ sign(v) into f_H(p), we have min_p f_H(p) = ½ (v − v + λ sign(v))² + λ|v − λ sign(v)| = ½ λ² + λ|v| − λ² = λ|v| − ½ λ² Combining the two cases, we get the Proposition.

40 HQ loss functions Soft-thresholding function The absolute function is the dual potential function of the Huber loss function. The soft-thresholding function is the minimization function of the Huber loss function. L1-norm: Σ_i φ_H^λ(x_i) = min_e { ½ ||x − e||² + λ ||e||_1 } Robust statistics: M-estimation Compressed sensing

41 HQ loss functions Huber loss function φ_H^λ(v) = ½ v² if |v| ≤ λ, λ|v| − ½ λ² if |v| > λ The multiplicative form: δ_H^M(v) = 1 if |v| ≤ λ, λ/|v| if |v| > λ The additive form (soft-thresholding function): δ_H^A(v) = 0 if |v| ≤ λ, v − λ sign(v) if |v| > λ
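A small sketch of the Huber loss and its two minimizer functions, including a numerical check of the additive decomposition stated in the Proposition above (the names and the test grid are illustrative):

import numpy as np

def huber(v, lam):
    # phi_H(v) = v^2/2 for |v| <= lam, lam*|v| - lam^2/2 otherwise
    a = np.abs(v)
    return np.where(a <= lam, 0.5 * v ** 2, lam * a - 0.5 * lam ** 2)

def delta_mult(v, lam):
    # multiplicative-form minimizer: 1 for |v| <= lam, lam/|v| otherwise
    a = np.abs(v)
    return np.where(a <= lam, 1.0, lam / np.maximum(a, 1e-12))

def delta_add(v, lam):
    # additive-form minimizer (soft-thresholding): 0 for |v| <= lam, v - lam*sign(v) otherwise
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Check: phi_H(v) = 0.5*(v - p)^2 + lam*|p| at p = delta_add(v, lam)
v, lam = np.linspace(-3.0, 3.0, 13), 1.0
p = delta_add(v, lam)
assert np.allclose(huber(v, lam), 0.5 * (v - p) ** 2 + lam * np.abs(p))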

42 HQ loss functions

43 HQ loss functions Lp-norm In compressed sensing, the Lp-norm is often minimized by an iteratively reweighted method. [Nikolova and Ng. 2005]

44 Robust M-estimation Huber M-estimator and L1-norm Σ_i φ_H^λ(x_i) = min_e { ½ ||x − e||² + λ ||e||_1 } M-estimation: min_θ Σ_i φ(x_i − θ) In robust statistics, M-estimators are defined as the minima of sums of functions of the data. The statistical procedure of evaluating an M-estimator is called M-estimation.

45 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in robust M-estimation HQ in information theoretic learning

46 HQ in robust M-estimation M-estimation min_θ Σ_j φ(e_j(θ)) [Z. Zhang, IVC, 1997] [S.Z. Li, Markov Random Field, 2010] [Zoubir et al. 2013]

47 HQ in robust M-estimation Outliers In robust statistics, outliers are data points that differ significantly from the majority of the data. [Figure: scatter plot with outliers marked]

48 HQ in robust M-estimation M-estimation: min_θ Σ_j φ(e_j(θ)) Iteratively reweighted form: min_θ Σ_j w_j e_j(θ)², with w_j = δ_φ(e_j) [Z. Zhang, IVC, 1997] [S.Z. Li, Markov Random Field, 2010] [Zoubir et al. 2013]
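The reweighting scheme above can be written as a short IRLS loop. This sketch for robust linear regression, min_θ Σ_j φ((Aθ − y)_j), is illustrative; the Huber weighting function and its tuning constant are assumed defaults, not values from the slides:

import numpy as np

def huber_weight(e, c=1.345):
    # Huber weighting function: 1 for |e| <= c, c/|e| otherwise
    a = np.abs(e)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def irls(A, y, weight_fn=huber_weight, n_iter=20):
    # Iteratively reweighted least squares: w_j = delta_phi(e_j), then weighted LS.
    theta = np.linalg.lstsq(A, y, rcond=None)[0]      # ordinary LS as initialization
    for _ in range(n_iter):
        w = weight_fn(A @ theta - y)                  # weights from the current residuals
        Aw = A * w[:, None]
        theta = np.linalg.solve(A.T @ Aw, Aw.T @ y)   # solves A^T W A theta = A^T W y
    return theta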

49 Some M-estimators
L1: |x|
L1-L2: 2(√(1 + x²/2) − 1)
Lp: |x|^p / p
Fair: c²(|x|/c − log(1 + |x|/c))
Huber: x²/2 if |x| ≤ c, c(|x| − c/2) if |x| > c
Cauchy: (c²/2) log(1 + (x/c)²)
Geman-McClure: (x²/2)/(1 + x²)
Welsch: (c²/2)(1 − exp(−(x/c)²))
Tukey: (c²/6)(1 − (1 − (x/c)²)³) if |x| ≤ c, c²/6 if |x| > c

50 HQ in robust M-estimation A few commonly used M-estimators [Z. Zhang, IVC, 1997]

51 HQ in robust M-estimation L1 (i.e. absolute value) estimators are not stable because the absolute function |x| is not strictly convex in x. Indeed, the second derivative at x = 0 is unbounded, and an indeterminate solution may result. L1 estimators reduce the influence of large errors, but the errors still have an influence because the influence function has no cut-off point. When using L1 error terms, these methods often oscillate around the true optimum and converge only slowly towards it due to the non-differentiability of the L1-norm at 0. [Z. Zhang, IVC, 1997] [R. Angst et al., ICCV, 2011]

52 Weighting functions
L1: 1/|x|
L1-L2: 1/√(1 + x²/2)
Lp: |x|^(p−2)
Fair: 1/(1 + |x|/c)
Huber: 1 if |x| ≤ c, c/|x| if |x| > c
Cauchy: 1/(1 + (x/c)²)
Geman-McClure: 1/(1 + x²)²
Welsch: exp(−(x/c)²)
Tukey: (1 − (x/c)²)² if |x| ≤ c, 0 if |x| > c
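A few of the weighting functions above written out in code, so they can be passed directly to an IRLS loop such as the sketch after slide 48; the tuning constants are illustrative defaults, not values taken from the slides:

import numpy as np

def cauchy_weight(e, c=2.385):
    return 1.0 / (1.0 + (e / c) ** 2)

def geman_mcclure_weight(e):
    return 1.0 / (1.0 + e ** 2) ** 2

def welsch_weight(e, c=2.985):
    return np.exp(-(e / c) ** 2)

def tukey_weight(e, c=4.685):
    # redescending: the weight is exactly 0 for |e| > c
    return np.where(np.abs(e) <= c, (1.0 - (e / c) ** 2) ** 2, 0.0)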

53 HQ in robust M-estimation Weighting functions [Z. Zhang, IVC, 1997]

54 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in robust M-estimation HQ in information theoretic learning

55 Information theory learning (ITL) History of ITL Information Theoretic Learning (ITL) was initiated in the late 1990s at CNEL (Prof. Jose Principe). It uses descriptors from information theory (entropy and divergences), estimated directly from the data, to substitute for the conventional statistical descriptors of variance and covariance. ITL can be used in the adaptation of linear or nonlinear filters, and also in unsupervised and supervised machine learning applications.

56 Information theory learning (ITL) Motivation of ITL Seeks to extend the ubiquitous mean-square error (MSE) criterion to cost functions that include more information about the training set. It has the probabilistic meaning of maximizing the error probability density at the origin.

57 Information theory learning (ITL) M-estimation: min_θ Σ_j φ(e_j(θ)) Welsch M-estimator Maximum correntropy criterion: max_θ (1/m) Σ_j exp(−e_j(θ)²/σ²) Renyi quadratic entropy: max_θ (1/m²) Σ_i Σ_j exp(−e_{ij}(θ)²/σ²)

58 Information theory learning (ITL) Renyi entropy and information potential (IP) Renyi entropy Parzen window Renyi quadratic entropy Information potential

59 Information theory learning (ITL) The multiplicative form Proposition: There exists a convex function ϕ: R → R such that g(x, σ) = exp(−x²/σ²) = sup_{p ∈ R₋} ( p x²/σ² − ϕ(p) ), and for a fixed x, the supremum is reached at p = −g(x, σ). [Yuan and Li, ICCV, 2007] [Yuan and Hu, ICML, 2009]
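The proposition can be checked numerically. The explicit dual potential used below, ϕ(p) = p − p log(−p) for p in [−1, 0), is one choice derived here from the convex conjugate of exp(−t); it is consistent with the statement above but is an assumption rather than a formula taken from the cited papers:

import numpy as np

def phi_dual(p):
    # One explicit convex dual potential for g(t) = exp(-t), t >= 0, with p in [-1, 0)
    return p - p * np.log(-p)

x, sigma = 1.7, 0.8
t = x ** 2 / sigma ** 2
p_grid = np.linspace(-1.0, -1e-6, 200000)
lhs = np.exp(-t)                                   # g(x, sigma)
rhs = np.max(p_grid * t - phi_dual(p_grid))        # sup_p (p*t - phi(p)) over a fine grid
p_star = -np.exp(-t)                               # supremum should be attained at p = -g(x, sigma)
print(lhs, rhs, p_star * t - phi_dual(p_star))     # all three approximately equal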

60 Summary HQ minimization min_x ||Ax − y||² + λ Σ_i φ(x_i) A HQ loss function can be realized as the infimum of a family of quadratic functions.

61 Summary HQ minimization min_x ||Ax − y||² + λ Σ_i φ(x_i) The multiplicative form: λ Σ_i ( p_i x_i² + ϕ(p_i) ); alternate minimization: p_i^t = δ(x_i^{t−1}), x^t = arg min_x ||Ax − y||² + Q_M(x, p) The additive form: λ Σ_i ( (x_i − p_i)² + ϕ(p_i) ); alternate minimization: p_i^t = δ(x_i^{t−1}), x^t = arg min_x ||Ax − y||² + Q_A(x, p)

62 HQ loss functions Huber and Welsch M-estimators Convex Huber M-estimator: it changes from an L2 metric to L1; its dual potential function is the absolute function. Nonconvex Welsch M-estimator: it changes from an L2 metric to L1 and finally to L0, depending upon the distance between samples (the correntropy induced metric).

63 Potential applications Image restoration HQ minimization for a structure prior Substituting the L2-norm (mean square error) with an M-estimator Coefficients --> sparsity Errors --> robustness min_x ||Ax − y||² + Σ_k φ(g_k^T x)

64 Discussion relationship to other methods A general concept for function minimization: φ(v) = min_p { ½ p v² + ϕ(p) } or φ(v) = min_p { ½ (v − p)² + ϕ(p) } HQ is to minimize Σ_i φ(v_i) = Σ_i min_{p_i} { ½ p_i v_i² + ϕ(p_i) }, or φ(v_i) = min_{p_i} { ½ (v_i − p_i)² + ϕ(p_i) } Any method that constructs a quadratic term to simplify optimization can be categorized as HQ. Many sparsity estimation problems are based on min_p { ½ ||v − p||² + ||p||_1 }

65 Discussion what is the most important Soft-thresholding function (proximal operator) Sparsity estimation Nuclear norm minimization

66 Some source codes Open Pattern Recognition Project is intended to be an open source platform for sharing algorithms of image processing, computer vision, natural language processing, pattern recognition, machine learning and the related fields. OpenPR is currently supported by the National Laboratory of Pattern Recognition, CASIA.

67 Thank You

68 Some references
D. Geman and G. Reynolds. Constrained restoration and recovery of discontinuities. IEEE TPAMI, 1992, 14.
D. Geman and C. Yang. Nonlinear image recovery with half-quadratic regularization. IEEE TIP, 1995, 4(7).
P. Charbonnier, L. Blanc-Feraud, G. Aubert, and M. Barlaud. Deterministic Edge-Preserving Regularization in Computed Imaging. IEEE TIP, 1997, 6(2).
P. Hellier, C. Barillot, E. Memin, and P. Perez. An energy-based framework for dense 3D registration of volumetric brain images. IEEE CVPR, 2000.
J. Idier. Convex half-quadratic criteria and interacting auxiliary variables for image restoration. IEEE TIP, 2001, 10(7).
M. Rivera and J. L. Marroquin. Efficient half-quadratic regularization with granularity control. Image and Vision Computing, 2003, 21(4).
F. Champagnat and J. Idier. A connection between half-quadratic criteria and EM algorithms. IEEE Signal Processing Letters, 2004, 11(9).
M. Nikolova and M. K. Ng. Analysis of half-quadratic minimization methods for signal and image recovery. SIAM Journal on Scientific Computing, 2005, 27(3).
M. Allain, J. Idier, and Y. Goussard. On global and local convergence of half-quadratic algorithms. IEEE TIP, 2006, 15(5).

69 Some references
G.Y. An, Q. Ruan and J.Y. Wu. Most Expressive Feature Extracted by Half-Quadratic Theory and Multiresolution Analysis in Face Recognition. ICSP, 2006.
Xiaotong Yuan, Stan Z. Li. Half Quadratic Analysis for Mean Shift: with Extension to A Sequential Data Mode-Seeking Method. ICCV, 2007, 1-8.
M. Nikolova and R. H. Chan. The equivalence of half-quadratic minimization and the gradient linearization iteration. IEEE TIP, 2007, 16(6).
J. P. Tarel, S. S. Ieng, and P. Charbonnier. A constrained-optimization based half-quadratic algorithm for robustly fitting sets of linearly parametrized curves. Advances in Data Analysis and Classification, 2008, 2(3).
Xiaotong Yuan, Bao-Gang Hu. Robust feature extraction via information theoretic learning. ICML, 2009.
Z. Li, S. Rahardja, S. Yao, J. Zheng and W. Yao. High Dynamic Range Compression by Half Quadratic Regularization. ICIP, 2009.
W. Yao, Z. Li, S. Rahardja, S. Yao, and J. Zheng. Half-quadratic regularization based denoising for high dynamic range image synthesis. ICASSP, 2010.
J. P. Tarel and P. Charbonnier. A Lagrangian Half-Quadratic Approach to Robust Estimation and its Applications to Road Scene Analysis. Pattern Recognition Letters, 2010, 31(14).

70 Some references (2010–)
Ran He, Bao-Gang Hu, Wei-Shi Zheng, YanQing Guo. Two-stage Sparse Representation for Robust Recognition on Large-scale Database. AAAI, 2010.
Ran He, Bao-Gang Hu, Wei-Shi Zheng, XiangWei Kong. Robust Principal Component Analysis Based on Maximum Correntropy Criterion. IEEE TIP, 2011, 20(6).
Ran He, Wei-Shi Zheng, Bao-Gang Hu. Maximum Correntropy Criterion for Robust Face Recognition. IEEE TPAMI, 2011, 33(8).
Ran He, Zhenan Sun, Tieniu Tan, Wei-Shi Zheng. Recovery of Corrupted Low-Rank Matrices via Half-Quadratic based Nonconvex Minimization. IEEE CVPR, 2011.
Ran He, Wei-Shi Zheng, Bao-Gang Hu, XiangWei Kong. A Regularized Correntropy Framework for Robust Pattern Recognition. Neural Computation, 2011, 23(8).
Hui Yan, Xiaotong Yuan, Shuicheng Yan, Jingyu Yang. Correntropy based feature selection using binary projection. Pattern Recognition, 2011, 44(12).
R. H. Chan and H.X. Liang. A Fast and Efficient Half-Quadratic Algorithm for TV-L1 Image Restoration. Technical report, The Chinese University of Hong Kong, 2011.
XiaoTong Yuan, Bao-Gang Hu, Ran He. Agglomerative Mean-Shift Clustering. IEEE TKDE, 2012, 24(2).
Ran He, Tieniu Tan, Liang Wang and Wei-Shi Zheng. L2,1 Regularized Correntropy for Robust Feature Selection. IEEE CVPR, 2012.
