
Denoising in digital radiography: A total variation approach
I. Frosio, M. Lucchese, N. A. Borghese
http://ais-lab.dsi.unimi.it

Images are corrupted by noise
1) When the measurement of some physical parameter is performed, noise corruption cannot be avoided.
2) Each pixel of a digital image measures a number of photons.
Therefore, from 1) and 2): images are corrupted by noise!

Gaussian noise (not so useful for digital radiographs, but a good model for learning)
Measurement noise is often modeled as Gaussian noise. Let x be the measured physical parameter, let \mu be the noise-free parameter and let \sigma^2 be the variance of the measured parameter (noise power); the probability density function for x is given by:
p(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

Gaussian noise and likelihood
Images are composed of a set of pixels, x (x is a vector!). How can we quantify the probability of measuring the image x, given the probability density function of each pixel?
Let us assume that the variance is equal for each pixel, and let x_i and \mu_i be the measured and noiseless values for the i-th pixel. The likelihood function L(x \mid \mu) is:
L(x \mid \mu) = \prod_i p(x_i \mid \mu_i) = \prod_i \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma^2}\right)
L(x \mid \mu) describes the probability of measuring the image x, given the noise-free value \mu_i of each pixel.

What about denoising???
What is denoising then? Denoising = estimate \mu from x. How can we estimate \mu?
Maximize p(\mu \mid x) => this usually leads to a hard, inverse problem.
It is easier to maximize p(x \mid \mu), that is => maximize the likelihood function (a simple, direct problem).
But... is the maximization of p(\mu \mid x) different from that of p(x \mid \mu)?

Bayes and likelihood
Bayes theorem:
p(\mu \mid x)\, p(x) = p(x \mid \mu)\, p(\mu) \quad\Rightarrow\quad p(\mu \mid x) = \frac{p(x \mid \mu)\, p(\mu)}{p(x)}
Here p(x \mid \mu) is the likelihood, p(\mu) is the a priori hypothesis on the estimated parameters \mu, and p(x), the probability density function of the data x, is just a normalization factor!!!
For the moment, let us suppose p(\mu) = const. In this case, maximizing p(\mu \mid x) or p(x \mid \mu) is the same!

So, let us maximize the likelihood L
Instead of maximizing L(x \mid \mu), it is easier to minimize f = -\ln[L(x \mid \mu)]. When the noise is Gaussian, we get:
f = -\ln \prod_i \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i-\mu_i)^2}{2\sigma^2}\right) = \sum_i \ln\left(\sigma\sqrt{2\pi}\right) + \sum_i \frac{(x_i-\mu_i)^2}{2\sigma^2}
The first term is constant; the second is a least squares term. Maximize L => least squares problem!
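As a quick illustration (ours, not part of the original slides), the following Python sketch checks numerically that the mu-dependent part of the Gaussian negative log-likelihood is exactly the least squares term; the image size and sigma are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.uniform(50, 200, size=(8, 8))            # hypothetical noise-free image
sigma = 5.0
x = mu + rng.normal(0.0, sigma, size=mu.shape)    # noisy observation

# -ln L(x|mu) = N*ln(sigma*sqrt(2*pi)) + sum((x-mu)^2) / (2*sigma^2)
nll = x.size * np.log(sigma * np.sqrt(2 * np.pi)) \
      + np.sum((x - mu) ** 2) / (2 * sigma ** 2)

# The mu-dependent part is the least squares term; the two differ by a constant.
least_squares = np.sum((x - mu) ** 2) / (2 * sigma ** 2)
print(nll - least_squares, x.size * np.log(sigma * np.sqrt(2 * np.pi)))  # equal
```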

However, what about noise in digital radiography?
Noise in digital radiography is Poisson (photon counting noise)! Let n_{p,i} be the noisy (measured) number of photons associated to pixel i, and \bar{n}_{p,i} the noise-free number of photons. Then:
p(n_{p,i} \mid \bar{n}_{p,i}) = \frac{\bar{n}_{p,i}^{\,n_{p,i}}\, e^{-\bar{n}_{p,i}}}{n_{p,i}!}

Gaussian noise: example
[Figure: noisy signal; the Gaussian noise has constant variance at every signal level.]

Poisson noise: example
[Figure: the same signal corrupted by Poisson noise; the variance is lower for low signal.]
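A minimal Python sketch (ours, not from the slides) reproducing the point of the two figures: with a fixed-sigma Gaussian model the sample variance is constant, while for Poisson noise it grows with the signal level; the signal values and sample counts are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
signal = np.linspace(10, 10000, 5)                       # hypothetical photon counts

gaussian = signal + rng.normal(0, 50, size=(10000, 5))   # fixed sigma = 50
poisson = rng.poisson(signal, size=(10000, 5))           # photon counting noise

print(gaussian.var(axis=0))   # ~2500 at every signal level (constant variance)
print(poisson.var(axis=0))    # ~10 ... ~10000: variance grows with the signal
```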

Likelihood for Poisson noise
Let us write the negative log-likelihood for the Poisson case:
f = -\ln[L(n_p \mid \bar{n}_p)] = -\ln \prod_i \frac{\bar{n}_{p,i}^{\,n_{p,i}}\, e^{-\bar{n}_{p,i}}}{n_{p,i}!} = \sum_i \left[\bar{n}_{p,i} - n_{p,i} \ln \bar{n}_{p,i} + \ln(n_{p,i}!)\right]
-\ln[L(n_p \mid \bar{n}_p)] is also known as the Kullback-Leibler divergence (apart from a constant term, which does not affect the minimization process), KL(n_p, \bar{n}_p).
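A possible numpy implementation of this data term, assuming strictly positive photon counts; the function names are ours.

```python
import numpy as np

def poisson_nll(n_bar, n_p):
    """Sum_i [ n_bar_i - n_p_i * ln(n_bar_i) ]; the constant ln(n_p!) is dropped."""
    return np.sum(n_bar - n_p * np.log(n_bar))

def gkl(n_p, n_bar):
    """Generalized KL divergence; differs from poisson_nll only by a constant."""
    return np.sum(n_p * np.log(n_p / n_bar) - n_p + n_bar)
```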

Maximize L!
L is maximized <=> f is minimized. Optimization (Gaussian noise) can be performed by posing:
\frac{\partial f(x \mid \mu)}{\partial \mu_i} = \frac{\partial}{\partial \mu_i} \sum_j \frac{(x_j-\mu_j)^2}{2\sigma^2} = -\frac{x_i-\mu_i}{\sigma^2} = 0 \quad\Rightarrow\quad \mu_i = x_i,\ \forall i
The noisy image gives the highest likelihood!!! This solution is not so interesting... The likelihood approach suffers from a severe overfitting problem.

Maximize L!
L is maximized <=> f is minimized. Optimization (Poisson noise) can be performed by posing:
\frac{\partial f(\bar{n}_p)}{\partial \bar{n}_{p,i}} = \frac{\partial}{\partial \bar{n}_{p,i}} \sum_j \left[\bar{n}_{p,j} - n_{p,j} \ln \bar{n}_{p,j}\right] = 1 - \frac{n_{p,i}}{\bar{n}_{p,i}} = 0 \quad\Rightarrow\quad \bar{n}_{p,i} = n_{p,i},\ \forall i
Again, the noisy image gives the highest likelihood!!! This solution is not so interesting... The likelihood approach suffers from a severe overfitting problem.

Back to Bayes
Bayes theorem:
p(\bar{n}_p \mid n_p) = \frac{p(n_p \mid \bar{n}_p)\, p(\bar{n}_p)}{p(n_p)}
Here p(n_p \mid \bar{n}_p) is the likelihood, p(\bar{n}_p) is the a priori hypothesis on the estimated parameters, and p(n_p), the probability density function of the data, is just a normalization factor!!!
If we introduce a priori knowledge about the solution \bar{n}_p, we get a Maximum A Posteriori (MAP) solution: p(\bar{n}_p \mid n_p) is maximized!

What do we have to minimize now?
We want to maximize p(\bar{n}_p \mid n_p) \propto p(n_p \mid \bar{n}_p)\, p(\bar{n}_p), that is, to minimize:
-\ln[p(n_p \mid \bar{n}_p)\, p(\bar{n}_p)] = -\ln \prod_i p(n_{p,i} \mid \bar{n}_{p,i}) - \ln p(\bar{n}_p) = -\ln[L(n_p \mid \bar{n}_p)] - \ln p(\bar{n}_p)
The first term is the negative log-likelihood; the second is the regularization term (a priori information).

A priori term
Let us call \nabla_x and \nabla_y the two components of the gradient of the image I. These are easily computed, for instance as:
\nabla_{x,i} = I(i,j) - I(i-1,j); \quad \nabla_{y,i} = I(i,j) - I(i,j-1)
The gradient (a vector!) will be indicated as \nabla I_i; \|\nabla I_i\| indicates the norm of the gradient.
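A minimal numpy sketch (ours) of these finite differences, assuming a 2-D array I; setting the differences to zero on the first row/column is our boundary assumption, not specified in the slides.

```python
import numpy as np

def gradients(I):
    """Backward differences: gx(i,j) = I(i,j)-I(i-1,j), gy(i,j) = I(i,j)-I(i,j-1)."""
    gx = np.zeros_like(I, dtype=float)
    gy = np.zeros_like(I, dtype=float)
    gx[1:, :] = I[1:, :] - I[:-1, :]
    gy[:, 1:] = I[:, 1:] - I[:, :-1]
    return gx, gy
```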

A priori term: image gradients (no noise)
[Figure: \nabla_x = I(i,j) - I(i-1,j) and \nabla_y = I(i,j) - I(i,j-1) computed on a noise-free image.]

A priori term: image gradients (noise)
[Figure: the same gradient components computed on the noisy image.]

A priori term: norm of the image gradient
[Figure: \|\nabla I\| for the noise-free and for the noisy image.]
In the real image, most of the areas are characterized by an (almost) null gradient norm; we can for instance suppose that \|\nabla I_i\| is a random variable with a Gaussian distribution, zero mean and variance equal to \beta^2. [Note that, in the noisy image, the norm of the gradient assumes higher values; a low \|\nabla I\| means low noise!]

MAP and regularization theory
Poisson noise, normal distribution for the norm of the gradient:
f(\bar{n}_p) = -\ln[L(n_p \mid \bar{n}_p)] - \ln p(\bar{n}_p) = \sum_i \left[\bar{n}_{p,i} - n_{p,i} \ln \bar{n}_{p,i}\right] - \sum_i \ln\left[\frac{1}{\beta\sqrt{2\pi}} \exp\left(-\frac{\|\nabla I_i\|^2}{2\beta^2}\right)\right]
= \sum_i \left[\bar{n}_{p,i} - n_{p,i} \ln \bar{n}_{p,i}\right] + \frac{1}{2\beta^2} \sum_i \|\nabla I_i\|^2 + \text{const}
Negative log-likelihood (data term) + regularization term (a priori information).

MAP and regularization theory
We look for the minimum of:
f(\bar{n}_p) = \sum_i \left[\bar{n}_{p,i} - n_{p,i} \ln \bar{n}_{p,i}\right] + \frac{1}{2\beta^2} \sum_i \|\nabla I_i\|^2
The likelihood is maximized (data fitting term); at the same time, the squared norm of the gradient is minimized (regularization term). The regularization parameter (1/\beta^2) balances between a perfect data fitting and a very regular image.

MAP and regularization theory
For (1/\beta^2) = 0 we get the maximum likelihood solution; increasing (1/\beta^2) we get a more regular (less noisy) solution; for (1/\beta^2) \to \infty, a completely smooth image is achieved.
[Figure: three reconstructions; (1/\beta^2) = 0; (1/\beta^2) = 0.005, noise reduction; (1/\beta^2) = 0.1, noise and edge reduction.]

Fix the ideas
A statistically based denoising filter is achieved by minimizing:
f = -\ln[L(n_p \mid \bar{n}_p)] - \lambda \ln[p(\bar{n}_p)]
The data fitting term is derived from the statistical distribution of the noise (likelihood of the data); generally, the choice of this term is unquestionable.
The regularization term is derived from a priori knowledge about some properties of the solution; this term is generally user defined.
Depending on the regularization parameter \lambda, the first or the second term assumes more or less importance. For \lambda \to 0, the maximum likelihood solution is obtained.

Gibbs prior
Up to now, we assumed a normal distribution for the norm of the gradient, i.e., Tikhonov regularization (quadratic penalization). A more general framework is obtained by considering:
p(I) = \exp[-R(I)] \quad \text{(Gibbs prior)}
R(I) is an energy function ~ regularization term (note that -\ln \exp[-R(I)] = R(I)!).
Tikhonov assumes R(I) = \frac{1}{2} \left(\|\nabla I\| / \beta\right)^2.

Edge preserving denoising?
The Tikhonov term penalizes the image edges (high gradient) more than the noise gradients. It is well known that Tikhonov regularization does not preserve edges. An edge preserving algorithm is obtained by considering R(I) = \|\nabla I\| [Total variation, TV].
[Figure: R as a function of \|\nabla I\|, from 0 to 4: quadratic growth for Tikhonov, linear growth for total variation.]

Tikhonov vs. TV (preview)
[Figure: original image, filtered image and difference image, for Tikhonov (top row) and TV (bottom row).]

TV in digital radiography: starting point and problems
n_p: noisy image affected by Poisson noise (negative log-likelihood => KL);
\bar{n}_p: noise-free image (unknown);
R(I) = \sum_i \|\nabla I_i\| (total variation);
Minimize f(\bar{n}_p) = KL(n_p, \bar{n}_p) + \lambda \sum_i \|\nabla I_i\|
How to compute \|\nabla I\|? => A compromise between computational efficiency and accuracy has to be achieved.
How to minimize f(\bar{n}_p)? => An iterative optimization technique is required.
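Putting the pieces together, a sketch (ours) of the objective to be minimized; it reuses the gradients() helper above, and the small constant delta that keeps the norm differentiable anticipates the delta introduced below for the L2 case.

```python
import numpy as np

def tv_objective(n_bar, n_p, lam, delta=1e-8):
    """f = KL(n_p, n_bar) + lam * sum_i ||grad I_i|| (constant terms dropped)."""
    gx, gy = gradients(n_bar)
    tv = np.sum(np.sqrt(gx ** 2 + gy ** 2 + delta))   # total variation term
    kl = np.sum(n_bar - n_p * np.log(n_bar))          # Poisson data term
    return kl + lam * tv
```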

How to compute \|\nabla I\|?
\nabla_x = I(u,v) - I(u-1,v); \quad \nabla_y = I(u,v) - I(u,v-1); with more neighbors, the diagonal differences \nabla_{xy} and \nabla_{yx} are added, and so on:
2 neighbors: L1 norm = |\nabla_x| + |\nabla_y|; L2 norm = [\nabla_x^2 + \nabla_y^2]^{1/2}
4 neighbors: L1 norm = |\nabla_x| + |\nabla_y| + |\nabla_{xy}| + |\nabla_{yx}|; L2 norm = [\nabla_x^2 + \nabla_y^2 + \nabla_{xy}^2 + \nabla_{yx}^2]^{1/2}
8 neighbors: L1 norm = |\nabla_x| + |\nabla_y| + |\nabla_{xy}| + |\nabla_{yx}| + \ldots; L2 norm = [\nabla_x^2 + \nabla_y^2 + \nabla_{xy}^2 + \nabla_{yx}^2 + \ldots]^{1/2}
The computational cost increases with the number of neighbours considered for computing the gradient, and it is higher for the L2 norm than for the L1 norm. What about accuracy? => See the experimental results!
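A small sketch (ours) of the two gradient-norm maps for the 2-neighbor case, reusing the gradients() helper above; the 4- and 8-neighbor variants would add the diagonal differences in the same way.

```python
import numpy as np

def grad_norms(I):
    gx, gy = gradients(I)
    l1 = np.abs(gx) + np.abs(gy)      # L1: no square root, cheaper
    l2 = np.sqrt(gx ** 2 + gy ** 2)   # L2: costlier
    return l1, l2
```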

How to minimize f(\bar{n}_p)?
f(\bar{n}_p) is strongly nonlinear; solving df(\bar{n}_p)/d\bar{n}_p = 0 directly is not possible => iterative optimization methods:
1) Steepest descent + line search (SD+LS)
2) Expectation Maximization, damped with line search (EM)
3) Scaled gradient (SG)

Steepest descent + line search (SD+LS)
\bar{n}_p^{k+1} = \bar{n}_p^{k} - \alpha \, \frac{df(\bar{n}_p)}{d\bar{n}_p}
The damping parameter \alpha is estimated at each iteration to assure convergence (f^{k+1} < f^{k}).
+: easy implementation;
-: slow convergence; the method has been damped (line search) to improve convergence (\alpha > 1).
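A minimal sketch (ours) of SD+LS with a backtracking line search; f and grad_f are assumed callables built from the formulas in the following slides, and the positivity clamp is a practical safeguard of ours, not part of the slides.

```python
import numpy as np

def sd_ls(n_p, f, grad_f, alpha0=1.0, iters=100):
    n_bar = n_p.astype(float).copy()        # start from the noisy image
    for _ in range(iters):
        g = grad_f(n_bar)
        alpha = alpha0
        # line search: shrink alpha until the objective decreases (f_{k+1} < f_k)
        while f(np.maximum(n_bar - alpha * g, 1e-6)) >= f(n_bar):
            alpha *= 0.5
            if alpha < 1e-12:
                return n_bar                # no further descent possible
        n_bar = np.maximum(n_bar - alpha * g, 1e-6)  # clamp keeps counts positive
    return n_bar
```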

EM + line search (EM)
Consider the pixel i; then:
\frac{df(\bar{n}_p)}{d\bar{n}_{p,i}} = 0 \;\Rightarrow\; \frac{dKL(n_p, \bar{n}_p)}{d\bar{n}_{p,i}} + \lambda \frac{dR}{d\bar{n}_{p,i}} = 0 \;\Rightarrow\; \lambda \frac{dR}{d\bar{n}_{p,i}} + 1 - \frac{n_{p,i}}{\bar{n}_{p,i}} = 0 \;\Rightarrow\; \bar{n}_{p,i} = \frac{n_{p,i}}{\lambda\, dR/d\bar{n}_{p,i} + 1} \quad \text{[fixed point iteration]}
Damped formula: \bar{n}_{p,i}^{k+1} = (1-\alpha)\, \bar{n}_{p,i}^{k} + \alpha \frac{n_{p,i}}{\lambda\, dR/d\bar{n}_{p,i}^{k} + 1}
The damping parameter \alpha is estimated at each iteration to assure convergence (f^{k+1} < f^{k}).
+: easy implementation, fast convergence;
-: the method has been damped to assure convergence (\alpha < 1; what happens when \lambda\, dR/d\bar{n}_{p,i} + 1 \to 0???).
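The damped fixed-point update as a short numpy sketch (ours); dR is an assumed callable returning dR/d\bar{n}_p at the current estimate, and a fixed alpha stands in for the per-iteration line search.

```python
def em_step(n_bar, n_p, lam, dR, alpha=0.8):
    """One damped EM update: (1-alpha)*n_bar + alpha * n_p / (lam*dR + 1)."""
    fixed_point = n_p / (lam * dR(n_bar) + 1.0)
    return (1.0 - alpha) * n_bar + alpha * fixed_point
```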

Scaled gradient (SG)
Consider the gradient method formula; each component of the gradient is scaled to improve convergence (S is a diagonal matrix containing the scaling parameters):
\bar{n}_p^{k+1} = \bar{n}_p^{k} - \alpha\, S\, \frac{df(\bar{n}_p)}{d\bar{n}_p}
The matrix S is computed from an opportune gradient decomposition and the KKT conditions.
+: easy implementation, fastest convergence; it can also be demonstrated that, for positive initial values, the estimated solution remains positive at each iteration!
-: ???

Problems wth dr/d Indeendentlyendently fom from the otmzaton method, the term dr/d has to be comuted at each teraton far any ; Wehave: dr/d = d[σ =.. (ll ll )]/d XOR dr/d = d[σ =.. (ll ll )]/d htt://as-lab.ds.unm.t 33 / 46 I. Froso, M. Lucchese,. A. Borghese

Problems wth dr/d L R/d Let us comute t for ll.ll (R/d = d[σ =.. (ll ll )]/d ).. ( ) ( ) [ ] ( ) ( ) [ ]...,,,,,, = + + = + = = d v u v u v u v u d d d d dr y x ( ) ( ) [ ] ( ) ( ) [ ] ( ) ( ) [ ] ( ) ( ) [ ] ( )...,...,,,,,,,,,, + + = + + + = v u v u v u v u v u v u v u v u v u d d d y x To avod dvson by zero: ( ) ( ) ( ) [ ] ( ) ( ) [ ]...,,,,...,,,,, + + + + + + = δ v u v u v u v u v u d dr y x y x htt://as-lab.ds.unm.t I. Froso, M. Lucchese,. A. Borghese 34 / 46

Problems wth dr/d Let us comute t for ll.ll (R/d = d[σ =.. (ll ll )]/d ) dr d = = d ( + ) d ( u, v) ( u, v) ( u, v) ( u, v ) + ( u, v ) ( u, v ) ( u, v ) ( u, v ) x, y, = = +... = d = d [ sgn( x, ) + sgn( y, )] = +... Here dvsons by zero are automatcally avoded only sgn s requred -> comutatonally effcent! htt://as-lab.ds.unm.t 35 / 46 I. Froso, M. Lucchese,. A. Borghese

Questions
How many neighbor pixels do we have to consider to achieve a satisfying accuracy at a low computational cost?
Best norm, \|\cdot\|_1 vs. \|\cdot\|_2?
Best optimization method (SD+LS, EM, SG)?

TV in digital radiography
Research in progress...

Results (answers)
75 simulated radiographs with different frequency content, corrupted by Poisson noise (max 5,000 photons). For any filtered image, measure:
MAE = \frac{1}{N} \sum_i |n_{p,i,\text{noisefree}} - n_{p,i,\text{filtered}}|
RMSE = \left[\frac{1}{N} \sum_i (n_{p,i,\text{noisefree}} - n_{p,i,\text{filtered}})^2\right]^{1/2}
KL = \sum_i \left[n_{p,i,\text{noisefree}} \ln\left(\frac{n_{p,i,\text{noisefree}}}{n_{p,i,\text{filtered}}}\right) - n_{p,i,\text{noisefree}} + n_{p,i,\text{filtered}}\right]
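The three measures as straightforward numpy helpers (our naming); a is the noise-free image, b the filtered one, both assumed strictly positive for the KL measure.

```python
import numpy as np

def mae(a, b):
    return np.mean(np.abs(a - b))

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def kl_measure(a, b):
    return np.sum(a * np.log(a / b) - a + b)   # generalized KL divergence
```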

2 neighbors vs. 4 neighbors
[Figure: accuracy comparison between the 2-neighbor and the 4-neighbor gradient masks.]

\|\cdot\|_1 vs. \|\cdot\|_2
[Figure: accuracy comparison between the two norms.]

EM vs. SD+LS
[Figure: accuracy comparison between the two optimization methods.]

EM vs. SG
[Figure: accuracy comparison between the two optimization methods.]

Convergence and iterations
[Figure: objective value as a function of the iteration number for the considered methods.]

Filter effect
[Figure: original vs. filtered radiograph.]

Filter effect: before filtering
[Figure: detail of the radiograph before filtering.]

Filter effect: after filtering
[Figure: the same detail after filtering.]

Conclusion
An effective edge preserving filter has been achieved; 2 neighbors, \|\cdot\|_1 and EM achieve the best compromise between accuracy and computational cost; SD achieves better results than EM when the regularization parameter is not correctly selected.
Future work: adaptive regularization parameter; GPU (CUDA) implementation; expanding the likelihood model (mixture of Poisson, Gaussian and impulsive noise); including the sensor point spread function.