Statistical Estimation: Least Squares, Maximum Likelihood and Maximum A Posteriori Estimators


1 CS5540: Computational Techniques for Analyzing Clinical Data. Lecture 7: Statistical Estimation: Least Squares, Maximum Likelihood and Maximum A Posteriori Estimators. Ashish Raj, PhD. Image Data Evaluation and Analytics Laboratory (IDEAL), Department of Radiology, Weill Cornell Medical College, New York.

2 Outline. Part I: Recap of Wavelet Transforms. Part II: Least Squares Estimation. Part III: Maximum Likelihood Estimation. Part IV: Maximum A Posteriori Estimation (next week). Note: you will not be tested on specific examples shown here, only on general principles. IDEA Lab, Radiology, Cornell.

3 Basis functions in the WT: Haar, Mexican Hat, Daubechies (orthonormal).

4 WT in images. Images are piecewise smooth or piecewise constant. Stationarity is even rarer than in 1D signals, so the FT is even less useful (and the WT more attractive). 2D wavelet transforms are simple extensions of the 1D WT, generally performing the 1D WT along rows, then columns, etc. Sometimes we use 2D wavelets directly, e.g. the orthonormal Daubechies 2D wavelet.

5 WT on images. A 2D generalization of the scale-time decomposition: successive application of dot products with wavelets of increasing width forms a natural pyramid structure. At each scale: H = dot product of image rows with the wavelet; V = dot product of image columns with the wavelet; H-V = dot product of image rows then columns with the wavelet. (Figure: pyramid over scales 0, 1, 2 with H, V and H-V subbands.)

6 Wavelet Applications. Many, many applications! Audio, image and video compression; the new JPEG standard includes wavelet compression; the FBI's fingerprint database is saved wavelet-compressed. Also signal denoising, interpolation, image zooming, texture analysis, and time-scale feature extraction. In our context, the WT will be used primarily as a feature-extraction tool. Remember, the WT is just a change of basis, in order to extract useful information which might otherwise not be easily seen.

7 WT in MATLAB. MATLAB has an extensive wavelet toolbox: type help wavelet in the MATLAB command window, look at their wavelet demo, and play with the Haar, Mexican hat and Daubechies wavelets.

8 Project Ideas. Idea 1: use the WT to extract features from ECG data, then use these features for classification. Idea 2: use the 2D WT to extract spatio-temporal features from 3D+time MRI data to detect tumors / classify benign vs malignant tumors. Idea 3: use the 2D WT to denoise a given image.

9 Idea 3: Voxel labeling from contrast-enhanced MRI. We can segment according to the time profile of 3D+time contrast-enhanced MR data of the liver / mammography. (Figure: typical plot of the time-resolved MR signal of various tissue classes; temporal models are used to extract features.) Instead of such a simple temporal model, a wavelet decomposition could provide spatio-temporal features that you can use for clustering.

10 Liver tumour quantification from DCE-MRI.

11 Further Reading on Wavelets.

12 Part II: Least Squares Estimation and Examples.

13 A simple Least Squares problem: Line fitting. Goal: to find the best-fit line representing a bunch of points. Here: y_i are observations at locations x_i, and the intercept and slope of the line are the unknown model parameters to be estimated. Which model parameters best fit the observed points? E(y_int, m) = Σ_i (y_i - h(x_i))², where h(x) = y_int + m x, and Best(y_int, m) = arg min E(y_int, m). This can be written in matrix notation as θ_LS = arg min ||y - Hθ||². What are H and θ?

14 Least Squares Estimator. Given the linear process y = Hθ + n, the Least Squares estimator is θ_LS = arg min ||y - Hθ||². It is a natural estimator: we want the solution to match the observations. It does not use any information about n. There is a simple solution (a.k.a. the pseudo-inverse): θ_LS = (H^T H)^-1 H^T y. In MATLAB, type theta = pinv(H)*y.
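As a sketch, the pseudo-inverse solution can be computed directly; the line-fitting values below (intercept 2, slope 0.5, noise level) are made up for illustration:

```python
import numpy as np

# Hypothetical line-fitting setup: y = H*theta + n with unknown intercept and slope
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
H = np.column_stack([np.ones_like(x), x])      # model matrix: row i is [1, x_i]
theta_true = np.array([2.0, 0.5])              # [intercept, slope]
y = H @ theta_true + 0.1 * rng.standard_normal(x.size)

# Least Squares estimate via the pseudo-inverse: theta = (H^T H)^-1 H^T y
theta_ls = np.linalg.pinv(H) @ y
```

In practice `np.linalg.lstsq(H, y)` solves the same problem more stably than forming the pseudo-inverse explicitly.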

15 Example: estimating the T_2 decay constant in repeated spin-echo MR data.

16 Example: estimating T_2 in repeated spin-echo data. s(t) = s_0 e^(-t/T_2). We need only 2 data points to estimate T_2: T_2est = [T_E2 - T_E1] / ln[s(T_E1)/s(T_E2)]. However, this is not good due to noise and timing issues; in practice we have many data samples from various echoes.

17 Example: estimating T_2. Taking logs turns the decay model into a linear system: y = [ln s(t_1); ln s(t_2); ...; ln s(t_n)] = [1 -t_1; 1 -t_2; ...; 1 -t_n][a; r] = Hθ, where a = ln s_0 and r = 1/T_2. Least Squares estimate: θ_LS = (H^T H)^-1 H^T y, and T_2 = 1/r_LS.
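The log-linearized T_2 fit can be sketched numerically; the echo times, T_2 value and noise level below are assumed for illustration, not taken from the lecture data:

```python
import numpy as np

# Simulated multi-echo data: s(t) = s0 * exp(-t/T2), with assumed values
rng = np.random.default_rng(1)
T2_true, s0 = 80.0, 1.0                         # T2 in ms (illustrative)
t = np.linspace(10.0, 200.0, 8)                 # echo times TE_i
s = s0 * np.exp(-t / T2_true) * (1 + 0.01 * rng.standard_normal(t.size))

# Log-linearize: ln s(t_i) = a - r*t_i, with a = ln s0 and r = 1/T2
y = np.log(s)
H = np.column_stack([np.ones_like(t), -t])
a_ls, r_ls = np.linalg.lstsq(H, y, rcond=None)[0]
T2_est = 1.0 / r_ls
```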

18 Estimation example: Denoising. Suppose we have a noisy MR image y, and wish to obtain the noiseless image x, where y = x + n. Can we use the LSE to find x? Try H = I, θ = x in the linear model. The LS estimator simply gives x = y! We need a more powerful model. Suppose the image x can be approximated by a polynomial, i.e. a mixture of the first p powers of r: x = Σ_{i=0}^p a_i r^i.

19 Example: denoising. In matrix form: y = [y_1; y_2; ...; y_n] = [1 r_1 ... r_1^p; 1 r_2 ... r_2^p; ...; 1 r_n ... r_n^p][a_0; a_1; ...; a_p] + [n_1; n_2; ...; n_n] = Hθ + n. Least Squares estimate: θ_LS = (H^T H)^-1 H^T y, and the denoised image is x = Σ_{i=0}^p a_i r^i.
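A minimal sketch of the polynomial-basis denoising idea; the cubic profile, noise level and order p below are assumed:

```python
import numpy as np

# Toy 1D "image" profile that happens to lie in the polynomial basis (assumed)
rng = np.random.default_rng(2)
r = np.linspace(-1.0, 1.0, 200)
x_true = 1.0 + 2.0 * r - 3.0 * r**2
y = x_true + 0.2 * rng.standard_normal(r.size)   # noisy observation y = x + n

p = 3
H = np.vander(r, p + 1, increasing=True)         # columns: 1, r, r^2, r^3
theta_ls = np.linalg.lstsq(H, y, rcond=None)[0]
x_hat = H @ theta_ls                             # denoised estimate
```

Projecting y onto the low-order polynomial subspace averages away most of the noise while keeping the smooth structure.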

20 Part III: Maximum Likelihood Estimation and Examples.

21 Estimation Theory. Consider a linear process y = Hθ + n, where y is the observed data, θ is the set of model parameters, and n is additive noise. Then estimation is the problem of finding the statistically optimal θ, given y, H and knowledge of the noise properties. Medicine is full of estimation problems.

22 Different approaches to estimation: Minimum variance unbiased estimators; Least Squares (has no statistical basis); Maximum likelihood (uses knowledge of the noise PDF); Maximum entropy; Maximum a posteriori (uses prior information about θ).

23 Probability vs. Statistics. Probability: mathematical models of uncertainty predict outcomes. This is the heart of probability: models, and their consequences. What is the probability of a model generating some particular data as an outcome? Statistics: given an outcome, analyze different models. Did this model generate the data? From among different models (or parameters), which one generated the data?

24 Maximum Likelihood Estimator for the Line Fitting Problem. What if we know something about the noise, i.e. Pr(n)? If the noise is not uniform across samples, LS might be incorrect. E(y_int, m) = Σ_i ((y_i - h(x_i))/σ_i)², where h(x) = y_int + m x, and Best(y_int, m) = arg min E(y_int, m). This can be written in matrix notation as θ_ML = arg min ||W(y - Hθ)||². What is W? (Figure: fits to data with noise σ = 10 and σ = 0.1, with the ML estimate.)

25 Definition of likelihood. Likelihood is a probability model of the uncertainty in the output given a known input. The likelihood of a hypothesis is the probability that it would have resulted in the data you saw. Think of the data as fixed, and try to choose among the possible PDFs. Often there is a parameterized family of PDFs: ML parameter estimation.

26 Gaussian Noise Models. In the linear model we discussed, the likelihood comes from the noise statistics. Simple idea: we want to incorporate knowledge of the noise statistics. If uniform white Gaussian noise: Pr(n) = (1/Z) exp(-Σ_i n_i²/(2σ²)) = (1/Z) exp(-||n||²/(2σ²)). If non-uniform white Gaussian noise: Pr(n) = (1/Z) exp(-Σ_i n_i²/(2σ_i²)).

27 Maximum Likelihood Estimator: Theory. With n = y - Hθ and Pr(n) = exp(-||n||²/(2σ²)), we have Pr(y for known θ) = Pr(n). Simple idea: maximize Pr(y|θ), called the likelihood function. Example 1: show that for uniform independent Gaussian noise, θ_ML = arg min ||y - Hθ||². Example 2: for non-uniform Gaussian noise, θ_ML = arg min ||W(y - Hθ)||².
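Example 2 can be sketched numerically: weight each sample by 1/σ_i and solve the resulting weighted least-squares problem. The σ pattern and line parameters below are invented for illustration:

```python
import numpy as np

# Line fit with non-uniform Gaussian noise (hypothetical per-sample sigma)
rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 40)
H = np.column_stack([np.ones_like(x), x])
theta_true = np.array([1.0, 2.0])
sigma = np.where(x < 5.0, 0.1, 2.0)            # first half far less noisy
y = H @ theta_true + sigma * rng.standard_normal(x.size)

# ML under non-uniform Gaussian noise: argmin ||W(y - H theta)||^2, W = diag(1/sigma_i)
W = np.diag(1.0 / sigma)
theta_ml = np.linalg.lstsq(W @ H, W @ y, rcond=None)[0]
```

The weighting lets the low-noise samples dominate the fit, which is exactly what the likelihood demands.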

28 MLE. Bottom line: use the noise properties to write Pr(y|θ); whichever θ maximizes it is the MLE.

29 Example: Estimating the main frequency of an ECG signal. Model: y(t_i) = a sin(f t_i) + n_i. What is the MLE of a, f? Pr(y|θ) = exp(-Σ_i (y(t_i) - a sin(f t_i))² / (2σ²)).
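A toy version of this estimator: since the noise is white Gaussian, ML reduces to least squares, so we can grid over f and use the closed-form amplitude for each candidate frequency. All values below (f = 7, a = 1.5, sampling, noise) are assumed:

```python
import numpy as np

# Simulated signal following the slide's model y(t_i) = a*sin(f*t_i) + n_i
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 200)
f_true, a_true = 7.0, 1.5
y = a_true * np.sin(f_true * t) + 0.2 * rng.standard_normal(t.size)

# Grid over f; for each f the LS-optimal amplitude is a = (s.y)/(s.s)
best = (np.inf, None, None)
for f in np.arange(1.0, 15.0, 0.05):
    s = np.sin(f * t)
    a = (s @ y) / (s @ s)
    rss = np.sum((y - a * s) ** 2)             # residual sum of squares
    if rss < best[0]:
        best = (rss, f, a)
_, f_ml, a_ml = best
```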

30 Maximum Likelihood Detection. ML is quite a general, powerful idea: the same ideas can be used for classification and detection of features hidden in data. Example 1: deciding whether a voxel is artery or vein. There are 3 hypotheses at each voxel: the voxel is artery, vein, or parenchyma.

31 Example: MRA segmentation. Artery and vein may have similar intensity at a given time point but different time profiles; we wish to segment according to the time profile, not a single intensity.

32 Expected Result.

33 Example: MRA segmentation. First, we need a time model of all segments: h(t|θ_a), h(t|θ_v). Let's use ML principles to see which voxel belongs to which model. Artery: y = h(t|θ_a) + n. Vein: y = h(t|θ_v) + n. Parenchyma: y = h(t|θ_p) + n.

34 Maximum Likelihood Classification. Artery: y = h(t|θ_a) + n; Vein: y = h(t|θ_v) + n; Parenchyma: y = h(t|θ_p) + n. Then Pr(y|θ_a) = exp(-(y - h(t|θ_a))²/(2σ²)), Pr(y|θ_v) = exp(-(y - h(t|θ_v))²/(2σ²)), Pr(y|θ_p) = exp(-(y - h(t|θ_p))²/(2σ²)). So at each voxel, the best model is the one that maximizes Pr(y|θ) = exp(-(y - h(t|θ))²/(2σ²)), or equivalently, minimizes (y - h(t|θ))²/(2σ²).
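The classification rule can be sketched as follows; the three time profiles are hypothetical stand-ins, not the real tissue models from the lecture:

```python
import numpy as np

# Hypothetical time profiles h(t|theta) for the three classes
t = np.linspace(0.0, 1.0, 20)
models = {
    "artery":     1.0 - np.exp(-10.0 * t),     # fast enhancement
    "vein":       1.0 - np.exp(-3.0 * t),      # slower enhancement
    "parenchyma": 0.3 * t,                     # weak, linear uptake
}

rng = np.random.default_rng(5)
y = models["vein"] + 0.05 * rng.standard_normal(t.size)   # noisy voxel time course

# With equal-variance Gaussian noise, the ML class minimizes the squared residual
label = min(models, key=lambda k: np.sum((y - models[k]) ** 2))
```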

35 Liver tumour quantification from Dynamic Contrast-Enhanced MRI. Data: rabbit DCE-MR data with a paramagnetic contrast agent and a pathology gold standard. Extract temporal features from the DCE-MRI, and use these features for accurate detection and quantification of the tumour. (Figure: tumor model.)

36 Liver Tumour Temporal models. Temporal models h(t|θ_i) are used to extract features. (Figure: typical plot of the time-resolved MR signal of various tissue classes.)

37 Liver tumour quantification from DCE-MRI.

38 ML Tutorials. Slides by Andrew Moore (CMU): available on the course webpage. Paper by Jae Myung, "Tutorial on Maximum Likelihood": available on the course webpage. Max Entropy Tutorial.

39 Next lecture (Friday): non-parametric density estimation; histograms + various fitting methods; nearest neighbor; Parzen estimation. Next lecture (Wednesday): Maximum A Posteriori estimators; several examples.

40 CS5540: Computational Techniques for Analyzing Clinical Data. Lecture 7: Statistical Estimation: Least Squares, Maximum Likelihood and Maximum A Posteriori Estimators. Ashish Raj, PhD. Image Data Evaluation and Analytics Laboratory (IDEAL), Department of Radiology, Weill Cornell Medical College, New York.

41 Failure modes of ML. Likelihood isn't the only criterion for selecting a model or parameter, though it is obviously an important one. Bizarre models may have high likelihood. Consider a speedometer reading 55 MPH: the likelihood of "true speed = 55" is 10%, while the likelihood of "speedometer stuck" is 100%. ML likes fairy tales. In practice we exclude such hypotheses, but there must be a principled solution.

42 Maximum a Posteriori Estimate. This is an example of using an image prior. Priors are generally expressed in the form of a PDF Pr(x). Once the likelihood L(x) and the prior are known, we have complete statistical knowledge. LS/ML are suboptimal in the presence of a prior; MAP (aka Bayesian) estimates are optimal. Bayes' Theorem: posterior Pr(x|y) = Pr(y|x) · Pr(x) / Pr(y), where Pr(y|x) is the likelihood and Pr(x) is the prior.

43 Maximum a Posteriori (Bayesian) Estimate. Consider the class of linear systems y = Hx + n. Bayesian methods maximize the posterior probability: Pr(x|y) ∝ Pr(y|x) · Pr(x), where Pr(y|x) (the likelihood function) = exp(-||y - Hx||²) and Pr(x) (the prior PDF) = exp(-G(x)). Non-Bayesian: maximize only the likelihood, x_est = arg min ||y - Hx||². Bayesian: x_est = arg min ||y - Hx||² + G(x), where G(x) is obtained from the prior distribution of x. If G(x) = ||Gx||², this is Tikhonov regularization.
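A minimal sketch of the Tikhonov-regularized (MAP) denoising estimate, using a finite-difference smoothness operator for G; the signal, noise level and regularization weight are all assumed:

```python
import numpy as np

# Denoising (H = I) with smoothness prior G(x) = ||Gx||^2, G = first differences
rng = np.random.default_rng(6)
n = 100
x_true = np.sin(np.linspace(0.0, np.pi, n))    # smooth underlying image row
H = np.eye(n)
y = x_true + 0.3 * rng.standard_normal(n)

lam = 4.0
G = np.diff(np.eye(n), axis=0)                 # (n-1) x n first-difference operator
# argmin ||y - Hx||^2 + lam*||Gx||^2  =>  (H^T H + lam G^T G) x = H^T y
x_map = np.linalg.solve(H.T @ H + lam * G.T @ G, H.T @ y)
```

Unlike the H = I least-squares attempt on the earlier slide, the prior term makes the estimate differ from y: it trades data fidelity for smoothness.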

44 Other examples of Estimation in MR. Image denoising: H = I. Image deblurring: H = convolution matrix in image space. Super-resolution: H = diagonal matrix in k-space. Metabolite quantification in MRSI.

45 What Is the Right Imaging Model? (1) y = Hx + n, with n Gaussian: SENSE. (2) y = Hx + n, with n and x Gaussian: MAP-SENSE. MAP-SENSE = the Bayesian (MAP) estimate of model (2).

46 Intro to Bayesian Estimation. Bayesian methods maximize the posterior probability: Pr(x|y) ∝ Pr(y|x) · Pr(x), where Pr(y|x) (the likelihood function) = exp(-||y - Hx||²) and Pr(x) (the prior PDF) = exp(-G(x)). Gaussian prior: Pr(x) = exp{-½ x^T R_x^-1 x}. MAP estimate: x_est = arg min ||y - Hx||² + G(x). The MAP estimate when everything is Gaussian is known as the Wiener estimate.

47 Regularization = Bayesian Estimation! For any regularization scheme, it is almost always possible to formulate the corresponding MAP problem, so MAP is a superset of regularization: prior model → MAP → regularization scheme. So why deal with regularization at all?

48 Let's talk about Prior Models. Temporal priors: smooth time-trajectory. Sparse priors: L0, L1, L2 (= Tikhonov). Spatial priors: most powerful for images; I recommend robust spatial priors using Markov Fields. We want priors to be general, not too specific, i.e. weak rather than strong priors.

49 How to do regularization: first model a physical property of the image, then create a prior which captures it, then formulate the MAP estimator, then find a good algorithm to solve it! How NOT to do regularization: DON'T use a regularization scheme with no bearing on a physical property of the image! Example: an L1 or L0 prior in k-space. Specifically: deblurring in k-space is handy because convolution becomes multiplication, BUT it is hard to impose smoothness priors in k-space, where they have no meaning.

50 MAP for the Line Fitting problem. If the model estimated by ML and the prior info do not agree, MAP is a compromise between the two. (Figure: LS estimate, MAP estimate, and the most probable prior model.)

51 Multi-variate FLASH. Acquire 6-10 accelerated FLASH data sets at different flip angles or TRs. Generate T_1 maps by fitting to: S = (1 - exp(-TR/T_1)) sin α exp(-TE/T_2*) / (1 - cos α exp(-TR/T_1)). Not enough info in a single voxel; noise causes incorrect estimates; and the error in flip angle varies spatially!

52 Spatially Coherent T_1, ρ estimation. First, stack the parameters from all voxels in one big vector x, and stack all observed flip-angle images in y. Then we can write y = M(x) + n, where M is the (nonlinear) functional obtained from S = (1 - exp(-TR/T_1)) sin α exp(-TE/T_2*) / (1 - cos α exp(-TR/T_1)). Solve for x by nonlinear least-squares fitting, PLUS a spatial prior: x_est = arg min_x E(x), with E(x) = ||y - M(x)||² + μ²||Dx||². The first term makes M(x) close to y; the second makes x smooth. Minimize via MATLAB's lsqnonlin function. How? Construct δ = [y - M(x); μDx]; then E(x) = ||δ||².
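The δ-stacking trick can be shown with a linear stand-in for M, where the augmented problem stays an ordinary least-squares solve (toy sizes, signal and μ are all assumed; the real M is the nonlinear FLASH model):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
x_true = np.sin(np.linspace(0.0, np.pi, n))    # smooth 1D parameter map (toy)
M = np.eye(n)                                  # linear stand-in for the forward model
y = M @ x_true + 0.3 * rng.standard_normal(n)

mu = 3.0
D = np.diff(np.eye(n), axis=0)                 # spatial finite-difference operator

# delta = [y - M x; mu D x]; minimizing ||delta||^2 is an augmented LS problem
A = np.vstack([M, mu * D])
b = np.concatenate([y, np.zeros(n - 1)])
x_est = np.linalg.lstsq(A, b, rcond=None)[0]
```

With the true nonlinear M, the same stacked residual vector is what one would hand to lsqnonlin (or an equivalent nonlinear least-squares routine).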

53 Multi-Flip Results: combined ρ, T_1 in pseudocolour.

54 Multi-Flip Results: combined ρ, T_1 in pseudocolour.

55 Spatial Priors For Images: Example. Frames are tightly distributed around the mean; after subtracting the mean, images are close to Gaussian. (Figure: time frames 1..N_f with mean image μ_x(i,j) and std.dev. envelope a(i,j).) Prior: the mean is μ_x, and the local std.dev. varies as a(i,j).

56 Spatial Priors for MR images. Stationary stochastic MR image model: x(i,j) = μ_x(i,j) + a(i,j)·(h ** p)(i,j) (1), where ** denotes 2D convolution, μ_x(i,j) is the mean image for the class, p(i,j) is a unit-variance i.i.d. stochastic process, a(i,j) is an envelope function, and h(i,j) simulates the correlation properties of the image: r_x(τ_1, τ_2) = (h ** h)(τ_1, τ_2). In matrix form, x = ACp + μ (2), where A = diag(a) and C is the Toeplitz matrix generated by h. This can model many important stationary and non-stationary cases.

57 MAP estimate for the Imaging Model. The Wiener estimate: x_MAP - μ_x = R_x H^H (H R_x H^H + R_n)^-1 (y - μ_y) (3), where R_x, R_n are the covariance matrices of x and n. Stationarity: R_x has Toeplitz structure, allowing fast processing. With R_x = ACC^H A^H this becomes x_MAP - μ_x = ACC^H A^H H^H (H ACC^H A^H H^H + σ_n² I)^-1 (y - μ_y) (4). Direct inversion is prohibitive, so use the CG iterative method; (4) is better than (3) since applying A and C are O(N log N) operations, enabling much faster processing.

58 How to obtain estimates of A, C? We need a training set of full-resolution images x_k, k = 1,...,K, but parallel imaging doesn't provide un-aliased full-res images. Approaches: 1. Use previous full-res scans: time consuming, and needs frequent updating. 2. Use SENSE-reconstructed images for the training set: very bad noise-amplification issues at high speedups. 3. Directly estimate the parameters from the available parallel data: aliasing may cause inaccuracies.

59 MAP for Dynamic Parallel Imaging. N_f images are available for parameter estimation! All images are well-registered and tightly distributed around the pixel mean, so the parameters can be estimated from aliased data. (Figure: time frames 1..N_f in image space and k-space, yielding N_f/R full-res images.)

60 MAP-SENSE Preliminary Results. Scans accelerated 5x. The angiogram was computed by: avg(post-contrast) - avg(pre-contrast). (Figure: unaccelerated; 5x faster SENSE; 5x faster MAP-SENSE.)

61 References. Steven M. Kay, Statistical Signal Processing, Part I: Estimation Theory, Prentice Hall, 2002. Steven M. Kay, Statistical Signal Processing, Part II: Detection Theory, Prentice Hall, 2002. Haacke et al., Fundamentals of MRI. Zhi-Pei Liang and Paul Lauterbur, Principles of MRI: A Signal Processing Perspective. Info on part IV: Ashish Raj, Improvements in MRI Using Information Redundancy, PhD thesis, Cornell University, May. Website:

62 Maximum Likelihood Estimator (continued). But what if the noise is jointly Gaussian with covariance matrix C? Recall C = E(nn^T). Then Pr(n) = e^(-½ n^T C^-1 n), so L(y|θ) = ½ (y - Hθ)^T C^-1 (y - Hθ) and θ_ML = arg min ½ (y - Hθ)^T C^-1 (y - Hθ). This also has a closed-form solution: θ_ML = (H^T C^-1 H)^-1 H^T C^-1 y. If n is not Gaussian at all, ML estimators become complicated and non-linear; fortunately, in MR the noise is usually Gaussian.
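The closed-form solution can be sketched with a diagonal C for simplicity; the line parameters and noise-level profile below are assumed:

```python
import numpy as np

# Jointly Gaussian noise with known covariance C (diagonal here, for illustration)
rng = np.random.default_rng(8)
x = np.linspace(0.0, 1.0, 30)
H = np.column_stack([np.ones_like(x), x])
theta_true = np.array([0.5, -1.0])
sigma = 0.1 * (1.0 + x)                        # assumed per-sample noise levels
C = np.diag(sigma ** 2)
y = H @ theta_true + sigma * rng.standard_normal(x.size)

# Closed form: theta_ML = (H^T C^-1 H)^-1 H^T C^-1 y
Ci = np.linalg.inv(C)
theta_ml = np.linalg.solve(H.T @ Ci @ H, H.T @ Ci @ y)
```

For diagonal C this reduces to the weighted least squares of the earlier slides; a full C additionally whitens correlated noise.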


More information

Stat 543 Exam 2 Spring 2016

Stat 543 Exam 2 Spring 2016 Stat 543 Exam 2 Sprng 206 I have nether gven nor receved unauthorzed assstance on ths exam. Name Sgned Date Name Prnted Ths Exam conssts of questons. Do at least 0 of the parts of the man exam. I wll score

More information

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran

More information

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results.

For now, let us focus on a specific model of neurons. These are simplified from reality but can achieve remarkable results. Neural Networks : Dervaton compled by Alvn Wan from Professor Jtendra Malk s lecture Ths type of computaton s called deep learnng and s the most popular method for many problems, such as computer vson

More information

SIO 224. m(r) =(ρ(r),k s (r),µ(r))

SIO 224. m(r) =(ρ(r),k s (r),µ(r)) SIO 224 1. A bref look at resoluton analyss Here s some background for the Masters and Gubbns resoluton paper. Global Earth models are usually found teratvely by assumng a startng model and fndng small

More information

Chat eld, C. and A.J.Collins, Introduction to multivariate analysis. Chapman & Hall, 1980

Chat eld, C. and A.J.Collins, Introduction to multivariate analysis. Chapman & Hall, 1980 MT07: Multvarate Statstcal Methods Mke Tso: emal mke.tso@manchester.ac.uk Webpage for notes: http://www.maths.manchester.ac.uk/~mkt/new_teachng.htm. Introducton to multvarate data. Books Chat eld, C. and

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

Limited Dependent Variables

Limited Dependent Variables Lmted Dependent Varables. What f the left-hand sde varable s not a contnuous thng spread from mnus nfnty to plus nfnty? That s, gven a model = f (, β, ε, where a. s bounded below at zero, such as wages

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics

Space of ML Problems. CSE 473: Artificial Intelligence. Parameter Estimation and Bayesian Networks. Learning Topics /7/7 CSE 73: Artfcal Intellgence Bayesan - Learnng Deter Fox Sldes adapted from Dan Weld, Jack Breese, Dan Klen, Daphne Koller, Stuart Russell, Andrew Moore & Luke Zettlemoyer What s Beng Learned? Space

More information

Machine learning: Density estimation

Machine learning: Density estimation CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of

More information

Stat 543 Exam 2 Spring 2016

Stat 543 Exam 2 Spring 2016 Stat 543 Exam 2 Sprng 2016 I have nether gven nor receved unauthorzed assstance on ths exam. Name Sgned Date Name Prnted Ths Exam conssts of 11 questons. Do at least 10 of the 11 parts of the man exam.

More information

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation

since [1-( 0+ 1x1i+ 2x2 i)] [ 0+ 1x1i+ assumed to be a reasonable approximation Econ 388 R. Butler 204 revsons Lecture 4 Dummy Dependent Varables I. Lnear Probablty Model: the Regresson model wth a dummy varables as the dependent varable assumpton, mplcaton regular multple regresson

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Exerments-I MODULE III LECTURE - 2 EXPERIMENTAL DESIGN MODELS Dr. Shalabh Deartment of Mathematcs and Statstcs Indan Insttute of Technology Kanur 2 We consder the models

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Instance-Based Learning (a.k.a. memory-based learning) Part I: Nearest Neighbor Classification

Instance-Based Learning (a.k.a. memory-based learning) Part I: Nearest Neighbor Classification Instance-Based earnng (a.k.a. memory-based learnng) Part I: Nearest Neghbor Classfcaton Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n

More information

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography

CSci 6974 and ECSE 6966 Math. Tech. for Vision, Graphics and Robotics Lecture 21, April 17, 2006 Estimating A Plane Homography CSc 6974 and ECSE 6966 Math. Tech. for Vson, Graphcs and Robotcs Lecture 21, Aprl 17, 2006 Estmatng A Plane Homography Overvew We contnue wth a dscusson of the major ssues, usng estmaton of plane projectve

More information

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one)

Why Bayesian? 3. Bayes and Normal Models. State of nature: class. Decision rule. Rev. Thomas Bayes ( ) Bayes Theorem (yes, the famous one) Why Bayesan? 3. Bayes and Normal Models Alex M. Martnez alex@ece.osu.edu Handouts Handoutsfor forece ECE874 874Sp Sp007 If all our research (n PR was to dsappear and you could only save one theory, whch

More information

Probability Density Function Estimation by different Methods

Probability Density Function Estimation by different Methods EEE 739Q SPRIG 00 COURSE ASSIGMET REPORT Probablty Densty Functon Estmaton by dfferent Methods Vas Chandraant Rayar Abstract The am of the assgnment was to estmate the probablty densty functon (PDF of

More information

Lecture 2: Prelude to the big shrink

Lecture 2: Prelude to the big shrink Lecture 2: Prelude to the bg shrnk Last tme A slght detour wth vsualzaton tools (hey, t was the frst day... why not start out wth somethng pretty to look at?) Then, we consdered a smple 120a-style regresson

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

MIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU

MIMA Group. Chapter 2 Bayesian Decision Theory. School of Computer Science and Technology, Shandong University. Xin-Shun SDU Group M D L M Chapter Bayesan Decson heory Xn-Shun Xu @ SDU School of Computer Scence and echnology, Shandong Unversty Bayesan Decson heory Bayesan decson theory s a statstcal approach to data mnng/pattern

More information

Explaining the Stein Paradox

Explaining the Stein Paradox Explanng the Sten Paradox Kwong Hu Yung 1999/06/10 Abstract Ths report offers several ratonale for the Sten paradox. Sectons 1 and defnes the multvarate normal mean estmaton problem and ntroduces Sten

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

Statistical analysis using matlab. HY 439 Presented by: George Fortetsanakis

Statistical analysis using matlab. HY 439 Presented by: George Fortetsanakis Statstcal analyss usng matlab HY 439 Presented by: George Fortetsanaks Roadmap Probablty dstrbutons Statstcal estmaton Fttng data to probablty dstrbutons Contnuous dstrbutons Contnuous random varable X

More information

How its computed. y outcome data λ parameters hyperparameters. where P denotes the Laplace approximation. k i k k. Andrew B Lawson 2013

How its computed. y outcome data λ parameters hyperparameters. where P denotes the Laplace approximation. k i k k. Andrew B Lawson 2013 Andrew Lawson MUSC INLA INLA s a relatvely new tool that can be used to approxmate posteror dstrbutons n Bayesan models INLA stands for ntegrated Nested Laplace Approxmaton The approxmaton has been known

More information

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition) Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes

More information

Feb 14: Spatial analysis of data fields

Feb 14: Spatial analysis of data fields Feb 4: Spatal analyss of data felds Mappng rregularly sampled data onto a regular grd Many analyss technques for geophyscal data requre the data be located at regular ntervals n space and/or tme. hs s

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

Fourier Transform. Additive noise. Fourier Tansform. I = S + N. Noise doesn t depend on signal. We ll consider:

Fourier Transform. Additive noise. Fourier Tansform. I = S + N. Noise doesn t depend on signal. We ll consider: Flterng Announcements HW2 wll be posted later today Constructng a mosac by warpng mages. CSE252A Lecture 10a Flterng Exampel: Smoothng by Averagng Kernel: (From Bll Freeman) m=2 I Kernel sze s m+1 by m+1

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models

Computation of Higher Order Moments from Two Multinomial Overdispersion Likelihood Models Computaton of Hgher Order Moments from Two Multnomal Overdsperson Lkelhood Models BY J. T. NEWCOMER, N. K. NEERCHAL Department of Mathematcs and Statstcs, Unversty of Maryland, Baltmore County, Baltmore,

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

CS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras

CS4495/6495 Introduction to Computer Vision. 3C-L3 Calibrating cameras CS4495/6495 Introducton to Computer Vson 3C-L3 Calbratng cameras Fnally (last tme): Camera parameters Projecton equaton the cumulatve effect of all parameters: M (3x4) f s x ' 1 0 0 0 c R 0 I T 3 3 3 x1

More information

OPTIMAL COMBINATION OF FOURTH ORDER STATISTICS FOR NON-CIRCULAR SOURCE SEPARATION. Christophe De Luigi and Eric Moreau

OPTIMAL COMBINATION OF FOURTH ORDER STATISTICS FOR NON-CIRCULAR SOURCE SEPARATION. Christophe De Luigi and Eric Moreau OPTIMAL COMBINATION OF FOURTH ORDER STATISTICS FOR NON-CIRCULAR SOURCE SEPARATION Chrstophe De Lug and Erc Moreau Unversty of Toulon LSEET UMR CNRS 607 av. G. Pompdou BP56 F-8362 La Valette du Var Cedex

More information

Properties of Least Squares

Properties of Least Squares Week 3 3.1 Smple Lnear Regresson Model 3. Propertes of Least Squares Estmators Y Y β 1 + β X + u weekly famly expendtures X weekly famly ncome For a gven level of x, the expected level of food expendtures

More information

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Logistic Regression. CAP 5610: Machine Learning Instructor: Guo-Jun QI Logstc Regresson CAP 561: achne Learnng Instructor: Guo-Jun QI Bayes Classfer: A Generatve model odel the posteror dstrbuton P(Y X) Estmate class-condtonal dstrbuton P(X Y) for each Y Estmate pror dstrbuton

More information

Expectation Maximization Mixture Models HMMs

Expectation Maximization Mixture Models HMMs -755 Machne Learnng for Sgnal Processng Mture Models HMMs Class 9. 2 Sep 200 Learnng Dstrbutons for Data Problem: Gven a collecton of eamples from some data, estmate ts dstrbuton Basc deas of Mamum Lelhood

More information

MACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression

MACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression 11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

Primer on High-Order Moment Estimators

Primer on High-Order Moment Estimators Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc

More information

Lecture 12: Discrete Laplacian

Lecture 12: Discrete Laplacian Lecture 12: Dscrete Laplacan Scrbe: Tanye Lu Our goal s to come up wth a dscrete verson of Laplacan operator for trangulated surfaces, so that we can use t n practce to solve related problems We are mostly

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Lecture 4 Hypothesis Testing

Lecture 4 Hypothesis Testing Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to

More information

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique

Outline and Reading. Dynamic Programming. Dynamic Programming revealed. Computing Fibonacci. The General Dynamic Programming Technique Outlne and Readng Dynamc Programmng The General Technque ( 5.3.2) -1 Knapsac Problem ( 5.3.3) Matrx Chan-Product ( 5.3.1) Dynamc Programmng verson 1.4 1 Dynamc Programmng verson 1.4 2 Dynamc Programmng

More information

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data

BIO Lab 2: TWO-LEVEL NORMAL MODELS with school children popularity data Lab : TWO-LEVEL NORMAL MODELS wth school chldren popularty data Purpose: Introduce basc two-level models for normally dstrbuted responses usng STATA. In partcular, we dscuss Random ntercept models wthout

More information

Learning with Maximum Likelihood

Learning with Maximum Likelihood Learnng wth Mamum Lelhood Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm,

More information

Gaussian Mixture Models

Gaussian Mixture Models Lab Gaussan Mxture Models Lab Objectve: Understand the formulaton of Gaussan Mxture Models (GMMs) and how to estmate GMM parameters. You ve already seen GMMs as the observaton dstrbuton n certan contnuous

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information