Big Data Analytics. Data Fitting and Sampling. Acknowledgement: Notes by Profs. R. Szeliski, S. Seitz, S. Lazebnik, K. Chaturvedi, and S. Shah


Fitting: Concepts and recipes

A bag of techniques. If we know which points belong to the line, how do we find the optimal line parameters? Least squares, probabilistic fitting. What if there are outliers? Robust fitting, RANSAC. What if there are many lines? Incremental fitting, K-lines, EM. What if we're not even sure it's a line? Model selection. Our main case study remains line fitting, but most of the concepts are very generic and widely applicable.

Least squares line fitting. Data: $(x_1, y_1), \ldots, (x_n, y_n)$. Line equation: $y = mx + b$. Find $(m, b)$ to minimize $E = \sum_{i=1}^n (y_i - m x_i - b)^2$. In matrix form, $E = \|Y - XB\|^2$, where $Y = [y_1, \ldots, y_n]^T$, $X$ is the $n \times 2$ matrix with rows $[x_i \;\; 1]$, and $B = [m, b]^T$. Expanding, $E = (Y - XB)^T (Y - XB) = Y^T Y - 2 (XB)^T Y + (XB)^T (XB)$, and setting $\frac{dE}{dB} = 2 X^T X B - 2 X^T Y = 0$ gives the normal equations $X^T X B = X^T Y$: the least squares solution to $XB = Y$.
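
As a concrete illustration, here is a minimal NumPy sketch of this normal-equations fit (the function name fit_line_lsq and the sample data are our own, not from the slides):

```python
import numpy as np

def fit_line_lsq(x, y):
    """Ordinary least squares fit of y = m*x + b."""
    X = np.column_stack([x, np.ones_like(x)])  # rows [x_i, 1]
    # lstsq solves the normal equations X^T X B = X^T Y in a
    # numerically stable way (no need to form X^T X explicitly).
    (m, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return m, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(fit_line_lsq(x, y))  # approximately (1.94, 1.09)
```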

Problem with "vertical" least squares: it is not rotation-invariant, and it fails completely for vertical lines.

Total least squares. Distance between point $(x_i, y_i)$ and line $ax + by = d$ (with unit normal $N = (a, b)$, $a^2 + b^2 = 1$): $|a x_i + b y_i - d|$. Find $(a, b, d)$ to minimize the sum of squared perpendicular distances $E = \sum_{i=1}^n (a x_i + b y_i - d)^2$. Setting $\frac{\partial E}{\partial d} = -2 \sum_{i=1}^n (a x_i + b y_i - d) = 0$ gives $d = \frac{a}{n}\sum_{i=1}^n x_i + \frac{b}{n}\sum_{i=1}^n y_i = a\bar{x} + b\bar{y}$. Substituting back, $E = \sum_{i=1}^n \big(a(x_i - \bar{x}) + b(y_i - \bar{y})\big)^2 = \|UN\|^2$, where $U$ is the $n \times 2$ matrix with rows $[x_i - \bar{x} \;\; y_i - \bar{y}]$. Then $\frac{dE}{dN} = 2(U^T U)N = 0$. The solution to $(U^T U)N = 0$ subject to $\|N\|^2 = 1$ is the eigenvector of $U^T U$ associated with the smallest eigenvalue (the least squares solution to the homogeneous linear system $UN = 0$).

Total least squares (continued). $U^T U = \begin{pmatrix} \sum_i (x_i - \bar{x})^2 & \sum_i (x_i - \bar{x})(y_i - \bar{y}) \\ \sum_i (x_i - \bar{x})(y_i - \bar{y}) & \sum_i (y_i - \bar{y})^2 \end{pmatrix}$ is the second moment matrix of the points about their centroid $(\bar{x}, \bar{y})$. The fitted line passes through the centroid and has unit normal $N = (a, b)$.
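
A minimal NumPy sketch of total least squares along these lines (fit_line_tls is an illustrative name; eigh returns eigenvalues in ascending order, so column 0 holds the smallest-eigenvalue eigenvector):

```python
import numpy as np

def fit_line_tls(x, y):
    """Total least squares line fit: a*x + b*y = d with a^2 + b^2 = 1."""
    xm, ym = x.mean(), y.mean()
    U = np.column_stack([x - xm, y - ym])  # rows [x_i - xbar, y_i - ybar]
    # Unit normal N = eigenvector of U^T U with the smallest eigenvalue.
    w, V = np.linalg.eigh(U.T @ U)
    a, b = V[:, 0]
    d = a * xm + b * ym  # the line passes through the centroid
    return a, b, d
```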

Least squares as likelihood maximization. Generative model: line points are corrupted by Gaussian noise in the direction perpendicular to the line, $(x, y) = (u, v) + \varepsilon\,(a, b)$, where $(u, v)$ is a point on the line $ax + by = d$, $\varepsilon$ is zero-mean Gaussian noise with std. dev. $\sigma$, and $(a, b)$ is the normal direction. Likelihood of the points given the line parameters $(a, b, d)$: $P(x_1, \ldots, x_n, y_1, \ldots, y_n \mid a, b, d) = \prod_{i=1}^n P(x_i, y_i \mid a, b, d) \propto \prod_{i=1}^n \exp\!\Big(-\frac{(a x_i + b y_i - d)^2}{2\sigma^2}\Big)$. Log-likelihood: $L = -\frac{1}{2\sigma^2} \sum_{i=1}^n (a x_i + b y_i - d)^2$, so maximizing the likelihood is equivalent to minimizing the total least squares error.

Probabilistic fitting: General concepts. Likelihood: $L(\theta) = P(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n P(x_i \mid \theta)$. Log-likelihood: $\log L(\theta) = \log P(x_1, \ldots, x_n \mid \theta) = \sum_{i=1}^n \log P(x_i \mid \theta)$. Maximum likelihood estimation: $\hat{\theta} = \arg\max_\theta L(\theta)$. Maximum a posteriori (MAP) estimation: $\hat{\theta} = \arg\max_\theta P(\theta \mid x_1, \ldots, x_n) = \arg\max_\theta L(\theta)\,P(\theta)$, where $P(\theta)$ is the prior.

Least squares for general curves. We would like to minimize the sum of squared geometric distances $d\big((x_i, y_i), C\big)^2$ between the data points and the curve $C$.

Calculating geometric distance. Let the curve be given implicitly by $C(u, v) = 0$, and let $(u_0, v_0)$ be the closest point on the curve to a data point $(x, y)$. The curve tangent at $(u_0, v_0)$, namely $\big(C_v(u_0, v_0), -C_u(u_0, v_0)\big)$, must be orthogonal to the vector connecting $(x, y)$ with the closest point on the curve: $C_v(u_0, v_0)\,(x - u_0) - C_u(u_0, v_0)\,(y - v_0) = 0$, together with $C(u_0, v_0) = 0$. We must solve this system of equations for $(u_0, v_0)$.
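
To show what solving that system looks like in practice, here is a toy SciPy sketch for the unit circle $C(u, v) = u^2 + v^2 - 1$ (the curve choice, the names, and the initial guess are our own assumptions):

```python
import numpy as np
from scipy.optimize import fsolve

# Toy implicit curve: the unit circle C(u, v) = u^2 + v^2 - 1 = 0.
C  = lambda u, v: u**2 + v**2 - 1.0
Cu = lambda u, v: 2.0 * u  # dC/du
Cv = lambda u, v: 2.0 * v  # dC/dv

def closest_point(x, y, guess=(1.0, 0.0)):
    """Solve C(u0, v0) = 0 together with the tangent-orthogonality condition."""
    def eqs(p):
        u0, v0 = p
        # Tangent (Cv, -Cu) must be orthogonal to (x - u0, y - v0).
        return [C(u0, v0),
                Cv(u0, v0) * (x - u0) - Cu(u0, v0) * (y - v0)]
    return fsolve(eqs, guess)

print(closest_point(2.0, 1.0))  # approx. (0.894, 0.447), i.e. (2,1)/|(2,1)|
```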

Least squares for conics. Equation of a general conic: $C(\mathbf{a}, \mathbf{x}) = \mathbf{a} \cdot \mathbf{x} = a x^2 + b x y + c y^2 + d x + e y + f = 0$, with $\mathbf{a} = [a, b, c, d, e, f]$ and $\mathbf{x} = [x^2, xy, y^2, x, y, 1]$. Minimizing the geometric distance is non-linear even for a conic, so we use the algebraic distance $C(\mathbf{a}, \mathbf{x})$ instead. Algebraic distance minimization by linear least squares: stack one row $[x_i^2 \;\; x_i y_i \;\; y_i^2 \;\; x_i \;\; y_i \;\; 1]$ per point into a matrix $D$ and solve $D\mathbf{a} = 0$.

Least squares for conics (continued). Least squares system: $D\mathbf{a} = 0$. A constraint on $\mathbf{a}$ is needed to prevent the trivial solution $\mathbf{a} = 0$. Discriminant $b^2 - 4ac$: negative for an ellipse, zero for a parabola, positive for a hyperbola. Minimizing the squared algebraic distance subject to such constraints leads to a generalized eigenvalue problem; many variations are possible. For more information: A. Fitzgibbon, M. Pilu, and R. Fisher, Direct least-squares fitting of ellipses, IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(5), 476–480, May 1999.
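
A minimal sketch of the simplest variant, which uses the constraint $\|\mathbf{a}\| = 1$ rather than the ellipse-specific constraint of Fitzgibbon et al. (function name is illustrative):

```python
import numpy as np

def fit_conic_algebraic(x, y):
    """Minimize the algebraic distance ||D a|| subject to ||a|| = 1."""
    D = np.column_stack([x**2, x*y, y**2, x, y, np.ones_like(x)])
    # The right singular vector for the smallest singular value of D
    # minimizes ||D a|| over all unit vectors a.
    _, _, Vt = np.linalg.svd(D)
    return Vt[-1]  # a = [a, b, c, d, e, f]
```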

Least squares: Robustness to noise. Figure: a least squares fit to the red points.

Least squares: Robustness to noise. Figure: a least squares fit with an outlier. Problem: squared error heavily penalizes outliers.

Robust estimators. General approach: minimize $\sum_i \rho\big(r_i(x_i, \theta); \sigma\big)$, where $r_i(x_i, \theta)$ is the residual of the $i$th point w.r.t. the model parameters $\theta$ and $\rho$ is a robust function with scale parameter $\sigma$. The robust function $\rho$ behaves like squared distance for small values of the residual $u$ but saturates for larger values of $u$.

Choosing the scale: Just right. The effect of the outlier is eliminated.

Choosing the scale: Too small. The error value is almost the same for every point, and the fit is very poor.

Choosing the scale: Too large. Behaves much the same as least squares.

Robust estimation: Notes. Robust fitting is a nonlinear optimization problem that must be solved iteratively; the least squares solution can be used for initialization. Adaptive choice of scale: a "magic number" times the median residual.
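
One way to realize this in practice is SciPy's least_squares with a Huber-type $\rho$; a minimal sketch, with f_scale playing the role of $\sigma$ and the least squares fit as initialization (names are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_line_robust(x, y, sigma=1.0):
    """Robustly fit y = m*x + b; the Huber loss saturates large residuals."""
    residuals = lambda p: p[0] * x + p[1] - y  # r_i(x_i; theta)
    m0, b0 = np.polyfit(x, y, 1)               # least squares initialization
    sol = least_squares(residuals, [m0, b0], loss='huber', f_scale=sigma)
    return sol.x  # (m, b)
```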

RANSAC. Robust fitting can deal with a few outliers, but what if we have very many? Random sample consensus (RANSAC) is a very general framework for model fitting in the presence of outliers. Outline: choose a small subset uniformly at random; fit a model to that subset; find all remaining points that are close to the model and reject the rest as outliers; do this many times and choose the best model. M. A. Fischler, R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. of the ACM, Vol. 24, pp. 381–395, 1981.

RANSAC for line fitting. Repeat N times: draw s points uniformly at random; fit a line to these s points; find the inliers to this line among the remaining points (i.e., points whose distance from the line is less than t); if there are d or more inliers, accept the line and refit using all inliers.
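
A minimal sketch of this loop for s = 2 (the parameter defaults and the skipping of vertical samples are our own simplifications):

```python
import numpy as np

def ransac_line(pts, n_iter=100, t=0.05, d=20, seed=0):
    """RANSAC line fit: sample 2 points, keep the largest consensus set."""
    rng = np.random.default_rng(seed)
    best, best_count = None, 0
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = pts[rng.choice(len(pts), size=2, replace=False)]
        if x1 == x2:
            continue  # vertical sample; skipped here for simplicity
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Perpendicular distance of every point to the line m*x - y + b = 0.
        dist = np.abs(m * pts[:, 0] - pts[:, 1] + b) / np.hypot(m, 1.0)
        inliers = dist < t
        if inliers.sum() >= d and inliers.sum() > best_count:
            # Accept the line and refit using all inliers.
            best = np.polyfit(pts[inliers, 0], pts[inliers, 1], 1)
            best_count = inliers.sum()
    return best  # (m, b), or None if no sample reached d inliers
```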

Choosing the parameters. Initial number of points s: typically the minimum number needed to fit the model. Distance threshold t: choose t so that the probability of an inlier falling within it is p (e.g. 0.95); for zero-mean Gaussian noise with std. dev. $\sigma$, $t^2 = 3.84\sigma^2$. Number of samples N: choose N so that, with probability p, at least one random sample is free from outliers (e.g. p = 0.99), given the outlier ratio e. Since each sample of s points is all-inlier with probability $(1 - e)^s$, we require $\big(1 - (1 - e)^s\big)^N = 1 - p$, i.e. $N = \frac{\log(1 - p)}{\log\big(1 - (1 - e)^s\big)}$. Consensus set size d: should match the expected inlier ratio. Source: M. Pollefeys

N for p = 0.99 as a function of s and the proportion of outliers e:

s | e = 5% | 10% | 20% | 25% | 30% | 40% | 50%
2 | 2 | 3 | 5 | 6 | 7 | 11 | 17
3 | 3 | 4 | 7 | 9 | 11 | 19 | 35
4 | 3 | 5 | 9 | 13 | 17 | 34 | 72
5 | 4 | 6 | 12 | 17 | 26 | 57 | 146
6 | 4 | 7 | 16 | 24 | 37 | 97 | 293
7 | 4 | 8 | 20 | 33 | 54 | 163 | 588
8 | 5 | 9 | 26 | 44 | 78 | 272 | 1177
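
A two-line check of the formula (the helper name is our own; the result matches the s = 2, e = 50% table entry):

```python
import math

def ransac_trials(p=0.99, e=0.5, s=2):
    """N = log(1 - p) / log(1 - (1 - e)^s), rounded up."""
    return math.ceil(math.log(1 - p) / math.log(1 - (1 - e) ** s))

print(ransac_trials())  # 17
```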

Adaptively determining the number of samples. The outlier ratio e is often unknown a priori, so pick the worst case, e.g. 50%, and adapt if more inliers are found; e.g. 80% inliers would yield e = 0.2. Adaptive procedure: set N = ∞ and sample_count = 0; while N > sample_count: choose a sample and count the number of inliers; set e = 1 − (number of inliers)/(total number of points); recompute N from e as $N = \log(1 - p) / \log\big(1 - (1 - e)^s\big)$; increment sample_count by 1. Source: M. Pollefeys
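
A sketch of that adaptive loop, assuming a caller-supplied sample_and_count callback that draws one random sample and returns its inlier count (the callback and the degenerate-case guards are our own):

```python
import math

def adaptive_ransac_trials(n_points, sample_and_count, p=0.99, s=2):
    """Adaptive RANSAC: re-estimate e (and thus N) after every sample."""
    N, count, best = float('inf'), 0, 0
    while N > count:
        best = max(best, sample_and_count())  # inlier count of one sample
        w = best / n_points                   # inlier ratio, i.e. 1 - e
        if 0 < w < 1:
            N = math.log(1 - p) / math.log(1 - w ** s)
        elif w == 1:
            N = 0  # every point is an inlier; no more samples needed
        # w == 0 leaves N infinite: keep sampling until an inlier appears
        count += 1
    return count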

RANSAC pros and cons. Pros: simple and general; applicable to many different problems; often works well in practice. Cons: lots of parameters to tune; can't always get a good initialization of the model based on the minimum number of samples; sometimes too many iterations are required; can fail for extremely low inlier ratios. We can often do better than brute-force sampling.

Fitting multiple lines. Voting strategies: Hough transform, RANSAC. Other approaches: incremental line fitting, K-lines, expectation maximization.

Incremental line fitting. Examine edge points in their order along an edge chain. Fit a line to the first s points. While the line fitting residual is small enough, continue adding points to the current line and refitting. When the residual exceeds a threshold, break off the current line and start a new one with the next s unassigned points.
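
A minimal sketch of this greedy walk over ordered points (the seed size s, the mean-squared-residual test, and the names are our own choices):

```python
import numpy as np

def incremental_lines(pts, s=3, max_resid=0.1):
    """Fit a sequence of lines to ordered edge points, breaking on residual."""
    lines, start, n = [], 0, len(pts)
    while start + s <= n:
        end = start + s  # seed the current line with the next s points
        m, b = np.polyfit(pts[start:end, 0], pts[start:end, 1], 1)
        # Grow the line while the refit residual stays small.
        while end < n:
            m2, b2 = np.polyfit(pts[start:end + 1, 0], pts[start:end + 1, 1], 1)
            resid = np.mean((m2 * pts[start:end + 1, 0] + b2
                             - pts[start:end + 1, 1]) ** 2)
            if resid > max_resid:
                break  # the next point would break the current line
            m, b, end = m2, b2, end + 1
        lines.append((m, b))
        start = end  # the next line starts with the next unassigned points
    return lines
```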

Incremental line fitting (figures: successive stages of lines being grown along an ordered point sequence).

Incremental fitting pros and cons. Pros: exploits locality; adaptively determines the number of lines. Cons: needs sequential ordering of features; can't cope with occlusion; sensitive to noise and to the choice of threshold.

K-Lines. Initialize k lines. Option 1: randomly initialize k sets of parameters. Option 2: randomly partition the points into k sets and fit lines to them. Iterate until convergence: assign each point to the nearest line; refit the parameters of each line.
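
A minimal sketch of these alternating steps, reusing fit_line_tls from the total least squares sketch above (Option 2 initialization; empty-cluster handling is omitted for brevity):

```python
import numpy as np

def k_lines(pts, k=2, n_iter=20, seed=0):
    """K-lines: alternate nearest-line assignment and per-line refitting."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(pts))  # random partition into k sets
    for _ in range(n_iter):
        lines = [fit_line_tls(pts[labels == j, 0], pts[labels == j, 1])
                 for j in range(k)]
        # Assign each point to the line with the smallest perpendicular distance.
        dist = np.stack([np.abs(a * pts[:, 0] + b * pts[:, 1] - d)
                         for a, b, d in lines])
        new_labels = dist.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break  # converged: assignments no longer change
        labels = new_labels
    return lines, labels
```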

K-Lines examples 1–3 (figures: for three different initializations, the alternating steps of assignment to the nearest line and refitting, iterated to convergence).

K-Lines pros and cons. Pros: guaranteed to reduce the line fitting residual at each iteration; can cope with occlusion. Cons: need to know k; can get stuck in local minima; sensitive to initialization.