Unsupervised Segmentation of Moving MPEG Blocks Based on Classification of Temporal Information

Unsupevised Segmenaion of Moving MPEG Blocs Based on Classificaion of Tempoal Infomaion Ofe Mille 1, Ami Avebuch 1, and Yosi Kelle 2 1 School of Compue Science,Tel-Aviv Univesiy, Tel-Aviv 69978, Isael 2 Dep. of Compue Science, Yale Univesiy, New Haven 208285, CT, USA {milleo,ami}@pos.au.ac.il Absac. Mos codecs such as MPEG-1,2 ae based on pocessing pixels in pedeemined iles of squae blocs. The pocessing of each bloc is independen of he objec-based conens ha may be pesen in hese blocs. Segmenaion of moving objec blocs enables, fo example, efficien disibuion of bis among consecuive fames in video sequences by allocaing moe bis o moving objec blocs a he expense of he bacgound blocs. This pape poposes an algoihm which deemines whehe he MPEG blocs conain saic o moving objecs. The poposed algoihm is based on empoal saisical infomaion obained by muliple compaisons of consecuive fames. A dynamic pocess of gaheing saisical infomaion is conveged ino hee cluses wih diffeen chaaceizaions. Auomaic classificaion deemines which blocs belong o eihe a moving objec (foegound) o a saic objec (bacgound). The numbe of loo-ahead fame compaisons, which is equied o deemine he ype of each bloc, is auomaically and adapively deemined accoding o he naue of he pocessed sequence. 1 Inoducion The infomaion whehe a MPEG bloc conains moving o saic objecs is impoan o many applicaions such as image sequence esoaion, moion compensaion, bi ae disibuion and conens ineacion. A bloc ha conains a moving pa o a complee moving objec is called a moving bloc (foegound) while one ha conains a saic objec is called a saic bloc (bacgound). If we can idenify he moving MPEG blocs hen we can deive a moe efficien and smae bi disibuion among he fames o achieve bee compession in he video-encoding phase. One immediae use is he impovemen of he compession aio wihin he famewo of many exising video sandads, lie H.263, H26L o MPEG-1/2/4. In his pape we develop an auomaic adapive algoihm ha analyzes video sequences in ode o sepaae beween bacgound and foegound MPEG blocs. One of he poblems in convenional bloc-maching algoihms, which is indiecly addessed in his pape, is ha he disoion measue being used is unable o disinguish beween eos inoduced by objec moion o by luminance and/o conas changes. Eos caused by luminance changes esul in moion vecos, which may negaively affec he image coheency. Luminance coecion ies o educe he maching-eo by modifying he disoion cieia wih he inoducion of illuminaion coecion paamees 175

(see 1,2,3). Howeve, much wo deal wih moion segmenaion, ignoe he quesion of deph odeing of layes (which ae in fon of which), he poblem of occlusion, and he poblem of luminance changes. When hey ae consideed, i is by idenifying hose oulies of he moion esimaion which could be due o he occlusion of one laye by anohe 4,5. Segmenaion beween saic and moving MPEG blocs enhances he bi ae disibuion beween he blocs, and hus, he coheency of he image is peseved. Ou poposed segmenaion algoihm compaes a fame o seveal of is consecuive fames. Each compaison povides infomaion on he inensiy changes acoss he sequence. Two empoal saisical ess ae used. The ess ae pefomed as an incemenal calculaion along he compaisons esuls beween he pais. Afe each compaison we classify he bloc accoding o he elaions beween hese saisical ess. We show ha afe sufficien ieaive compaisons beween he inpu fame and is consecuives, he moving objec blocs ae consanly classified ino pe-nown cluses, while ohes do no have any peiodic o consan cluse paen. The numbe of compaisons needed is auomaically deemined and depends on he naue of he sequence. The es of his pape is oganized as follows. Secion 2 descibes he main algoihm, his secion conains descipion fo he compaison modeling echnique and fo he classificaion pocess. In secion 3 we pesen expeimenal esuls. 2 The MPEG Bloc Segmenaion Algoihm The segmenaion algoihm conains hee main seps: muliple empoal compaisons, empoal analysis and classificaion. The fis is he muliple empoal compaisons, which is an exension of he change deecion echnique 7 ino a empoal compaison among a seies of consecuive fames. We compae a given fame I (efeence fame) wih is nex successive fames I, =+1,...,T, whee T is he numbe of equied compaisons. The esuls fom each compaison ae held in a empoal compaison maix (TCM). The change deecion algoihm assumes ha he images, which paicipaed in he compaison pocess, ae egiseed. The second sep is he empoal analysis, which analyzes he TCM by wo saisical ess: disibuion (ds) and mean. The esuls will be he inpus o he classificaion sep which he hid sep. The classificaion sep sepaaes beween moving blocs and he es of he blocs. I is called afe each ieaive compaison, and only when hee is sufficien infomaion o classify he MPEG blocs, he algoihm is eminaed. The phases of he algoihm depend on each ohe. As long as he esuls of he saisical ess do no convege ino a pese cluse, he second phase coninues o compae moe fames and he empoal analysis coninues o geneae updaed ess. 2.1 Muliple Compaisons Modeling Le be he index of he efeence fame. Each of is successive I, =+1,...,T, will be compaed wih he efeence fame I, whee T is he index of he las fame ha paicipaes in he compaisons. I is adapively deemined (secion 2.4) accoding o he 176

naue of he examined sequence. We assume he bacgound is saic afe he moion of he camea is compensaed 9,11. Then each fame I can be wien as: I = b( ) + obj( ), = +1,..., T (1) whee b () and obj () epesen he bacgound and objec pixels in fame I, especively. Each compaison esul (he compaison echnique is descibed in secion 2.2) beween I and I is held in a empoal compaison image (TCI ). We divide he compaison esuls beween I and I ino fou diffeen goups of pixels: b (TCI), cv (TCI), uncv (TCI) and ol (TCI) (shown in Figue 1), which ae he bacgound, he coveed, he uncoveed and he ovelapped goups of pixels, especively. Thus, he TCI can be wien as: TCI ( ) = b( TCI ) U cv( TCI ) U uncv( TCI ) U ol( TCI ) (2) whee b( TCI ) = b( I b( ) (2.1) is he common bacgound (in pixels) of I and I ; cv { } ( TCI ) = obj( ) obj( I obj( ) (2.2) and uncv { } ( TCI ) = obj( obj( I obj( ) (2.3) ae he coveed and uncoveed egions, especively, poduced by he objec's movemen and; ol( TCI ) = obj( I obj( ) (2.4) is he ovelapping pixels beween he goups obj () and obj (). (a) (b) (c) (e) b (f) cv uncv ol Figue 1: (a), (b), (c) ae he oiginal fames whee he lag beween hem is five. (e) and (f) epesen he empoal compaison images (TCI ()) beween (a),(b) and (a),(c) especively. b (TCI) is he common bacgound. uncv (TCI) and ol (TCI) conain he objec pixels in hei oiginal posiion (image (a)). cv (TCI) and ol (TCI) conain he objec pixels in I (images (b) and (c)). 177

Excep fo he b (TCI) goup of pixels, all he descibed egions ae assumed o be change aeas fo a ypical change deecion algoihm 7, 8. In geneal, when an objec does no say in a fixed posiion we expec ha fo a ceain, + 1,..., T : obj I obj = φ, uncv ( TCI ) = cv( TCI ) and ol = φ. In ohe cases, when he objec is ( ( ) moving bu emains close o is oiginal locaion in I, we expec ha ol( TCI ) obj(, obj I obj obj and cv φ. ( ( ) ( 2.2 Compaison beween Consecuive Fames Deecion of changes beween fames can be caied ou wih a vaiey of exising change deecion algoihms. Howeve, since he MC mehodology compaes fames wih gowing accumulaed ime diffeences, he algoihm has o handle luminance and noise changes ha ae amplified duing he compaisons. In ode fo he algoihm o be independen of hese changes, we exend he model ha was descibed in 7, o handle muliple empoal compaisons beween seveal consecuive fames. Fis, we choose a moving window η of N η = 16 elemens, which belongs o he division image. Le I τ = (3) I ( ( ) (, = τ ( ) + Nη η 1 ˆ µ x 1,..., T. (3.1) Then, fo each pixel, a saisical diffeence (SD) is compued beween I and I whee he esul indicaes ha a local change beween he fames has occued: η η 2 ( ˆ µ ) + 1 T 1 sd = τ,..., ; (3.2) N whee is he index of he cuen compaed fame. Sd (x, will be used as an indicaion whehe o no a change was deeced in he (x, coodinae of I elaive o I. Binay values of each pai of compaed fames ae soed in a Tempoal Compaison Maix (TCM) ha has he following sucue fo =+1,...,T : 1 sd > β TCM ( )( x, = (3) 0 ohewise whee β is a pedefined heshold. Noe ha analysis among sequence of fame compaisons caused he heshold β o be insensiive o a local eo in changes acoss he sequence. By applying Eq. (3) on T fames we obain he (TCM) maix, as illusaed in Figue 2. 178

Figue 2: The empoal compaison maix, which is filled using Eq. (3) on T K fames. Each binay value a (x,y,) is he oupu fom he compaison beween fame I and I fo an 8x8 bloc whose cene is locaed a (x,. Figue 3 demonsaes he binay changes of he TCM slices elaive o pevious slices as a funcion of he pogess of fames compaisons. The video sequence in Figue 3 is chaaceized by a slow moion of he objecs beween he fames. Theefoe, he advanage of he compaison mehodology becomes eviden when 7. Image (e), which is he oupu afe hee compaisons (=3) fom he efeence fame, shows ha he change deecion algoihm fails o deec many of he ineios of he objec s egions. In image (g), which is he oupu fo =10, hee is a significan impovemen in he change deecion mas inside he objec egions (bounded by ed cuve). Howeve, he bypoduc of having a lage lag () beween he compaed fames is ha many of he coveed, cv (TCI), aeas (ouside of he ed cuves) ae also maed as change. These egions ae visible because of he accumulaed eo duing he compaison pocess. Figue 3: Images (a), (b), (c) and (d) ae fou fames aen fom a video clip. (a) is he efeence fame I (=0), (b) is fame I =3, (c) is fame I =7 and (d) is fame I =10. Images (e), (f) and (g) ae he hee slices aen fom he TCM. (e) epesens he slice of TCM (3). (f) epesens TCM (7) and (g) epesens TCM (10). 179

2.3 Tempoal Analysis As was shown, hee exis T such ha he values of TCM ( ), obj(, ae classified as change (how T is deemined will be discussed in secion 2.4). Howeve, focusing on a single TCM (`) (x, slice wihou aing ino consideaion he pevious slices whee =+1,...,`, may lead o wong classificaions such as consideing bacgound blocs as foegound blocs (in paicula, ove he occluded and he uncoveed aea, which become lage as `>>+1). Change deecion algoihm, such as 7, does no design o deal wih such lag beween compaed fame no o disinguish beween occluded and uncoveed aea. The empoal analysis acoss he TCM slices enables us o diffeeniae beween hese egions. Each slice in he TCM (), =+1,...,T, is composed of fou diffeen egions: b (TCI), cv (TCI), uncv (TCI) and ol (TCI) (see secion 2.1). Thus, fo he efeence fame I we have: obj = uncv Uol (4) b ( ( TCI ) ( TCI ) = cv Ub. (5) ( ( TCI ) ( TCI ) Ou claim is ha by a saisical es, which is applied on he TCM slices, boh uncv (TCI) and ol (TCI) have he same disinc disibuion. The following fomulae he basis fo he saisical ess: I. > λ, ( x, obj( TCM ( ) > β + λ T, λ 1 II. < T, ( x, b( TCM ( ) > β whee is he index of he efeence fame and β is a pe-defined change deecion heshold. Saemen I assumes ha fo any λ, obj () pixels ae consideed as a change, independen of he objec s movemens in he sequence. The second saemen assumes ha hee is a ime diffeence beween he compaed fames such ha if an objec coves he bacgound egion in I hen his egion is consideed as a change a his compaison. Recall ha boh saemens ely on he empoal compaison definiion in secion 2.1. In addiion o he above saemens, he following defines he saisical ess, which compue he disibuion, ds, (Eq. 7) and he mean, mean, (Eq. 6) of he compaisons esuls. Is is done fo a single bloc in I wih cenal coodinaes (i,, fo ieaive compaisons: ds mean N N x +, y + 1 1 2 2 j ) = TCM ( s ) 2 s = 1 N N N x i, y= j 2 2 = N N i+, j+ 1 1 2 2 = TCM ( s) 2 s= 1 N N N x i, y= j 2 2 mean( i, = N 2 is he bloc size in pixels i>n/2, j>n/2 and =+1,...,T. The disibuion es (Eq. 7) calculaes he empoal vaiance of a specific bloc fo all TCM slices. Afe each compaison, a new slice is added o TCM (), and is bloc disibuion is updaed. Similaly, he mean es (Eq. 6) compuaion, which calculaes he empoal mean of a specific bloc fo all he TCM () slices, =+1,...,T. 2 (6) (7) 180

2.4 Bloc Classificaion The goal of he classificaion phase is o sepaae he saisical esuls beween foegound and bacgound blocs. The esuls of he wo saisical ess ae placed in a feaue veco (FV). Each MPEG bloc b, I = 1,...,, is associaed wih a N N disinc FV whee ( i, ae he cenal coodinaes of he bloc. The FV is denoed by FV b = ( FVb j ) (1), FVb (2)) whee FV b ( 1) = ds( i, and FV b ( 2) = mean( i,, = +1,..., T. Pese cluses of he bacgound, objec and occlusion egions ae defined. In geneal, a simple classificaion echnique, which models he cluses by minimal disances, is equied (10, Chape 16.3). Howeve, since eliabiliy of he FV b depends on he numbe of MC ieaions, we have o incopoae ino he classificaion echnique he abiliy o deemine how many MC ieaions ae needed fo eliable blocs classificaion. Assume FV b is assigned o each bloc b fo = +1,..., T ieaive compaisons. Each compaison adds essenial infomaion o FV b ha fis he objec empo- al movemen in I. As menioned (secion 3.2), he moe ieaive compaisons ha aing place, he moe eliable infomaion on objec movemens is conained in FV b. To use his fac, we fis define a classificaion space Ω : Ω = {( x, 0 x 100 0 y 100}. This space is paiioned by hee diffeen subspaces. Each subspace is called a pese cluse, denoed by C p, p =1,2, 3. The pese cluses ae deemined o indicae he high pobabiliy of belonging o one of he following: and C 1 C2 C3 Figue 5): { x, ( ol uncv )} C =, 1 ( ( TCI ) ( TCI ) { x, y cv } C =, 2 ( ) ( TCI ) { x, y b } C 3 = ( ) ( TCI ), = φ. As a esul, a void space (VS), denoed by C VS, is obained (see C vs = { Ω ( C1 C2 C3) }. Recall ha (as defined in secion 2.1) ol( TCI ) uncv( TCI ) epesens he objec's egion in I while CV (TCI) and b (TCI) epesen he bacgound egion. The pese cluses wee deemined (see Figue 5) afe applying he classificaion echnique on one hunded video sequences. Then, fo a ceain, = +1,..., T, a given bloc FV (, ) is classified o belong o eihe one of he pese cluses C p, p 1,2, 3, o o C VS. Based on MC mehodology, we claim ha if afe ieaive compaisons FVb ( i, j ) CVS, hen hee is insufficien infomaion o classify his bloc; ohewise FVb ( i, j ) C p fo one p, 1 p 3. In his case, moe ieaive compaisons ae equied and is inceased by one. Howeve, an excepional siuaion has o be aen ino consideaion when FVb ( i, j ) CVS fo all = + 1,..., T. This may occu when an MO becomes saic in is b i j 181

movemens afe λ fames whee + 1 < λ T. In such a case, he following minimum disance echnique is compued. Each cluse C, p = 1,2, 3, is defined by is cene of gaviy m p whee x = C 1 x C, p y = p x p 1 y x y C, C p p, p C y p C ae he x and y coodinaes of C p, and C p is he size of C p in pixels. FV b can be classified by seaching is minimal disance (md) fom he m p, p =1,2, 3, by: 2 2 md (, ) = min ( FV (, )(1) m ( x)) + ( FV (, )(2) m ( ) i j b i j p b i j p p= 1,2,3 b (8) whee FV b (1) and FV b (2) ae he ds and mean ( i, j ) of he bloc, especively. To combine he minimal disance, we se a consan T max o be he maximum numbe of ieaive compaisons allowed. FV b = ( FVb (1), FVb (2)), I = 1,...,, N N is classified only if FVb ( i, j ) C p fo one p, 1 p 3. Ohewise, if Tmax and b i, j ) VS FV ( C he bloc will be classified accoding o is minimal disance fom m p, p 1,2,3, using Eq. (8). In his case, T max = T and he algoihm is eminaed. The complee classificaion pe single ieaive compaison consiss of he following seps: Noaion: Lis lis of (FV ) FV b, = 1,..., I N N CLASSIFY ( Lis (FV ),, T max ) 1. fo I = 1,..., do: N N a. fo p =1,2, 3 do: if FVb ( i, j ) C p hen b p else b VS 2. if Tmax do: 3. fo I = 1,..., do: N N a. If b = VS do: i. compue he minimum disance beween 4. eun Lis (FV ) ii. FV = FV (1), FV (2)) and m p =1,2, 3 b Eq.(8). b ( b b p (p is closes cluse o FV, ) b i j ( ). p using 3 Expeimenal Resuls The following illusaes he algoihm oupu a fou diffeen ieaive classificaion. In Figue 4, he objecs (sign by he gay aows) blocs moved diffeenially one o each ohe, no all he objecs blocs move a he same ime. Some move afe one o 182

wo fames (fom he efeence fame) while ohes move afe five fames o even moe. Figue 4.0 is he efeence fame I o be segmened. 4.0 4.1 4.2 4.3 4.4 Figue 4: Five consecuive fames aen fom mohe-daughe video sequence Figue 5(a,b,c,d) pesens he modeled C p cluses, p 1,2, 3, he blac dos ae he FV of each ieaive compaison. Each cluse is shown by a blac polygon ha suounds i. The hee pesened cluses (a Figue 5 a,b,c and d) wee obained fom a aining goup of one hunded diffeen video sequences. Images a, b and c pesen he classificaion esuls fo =3, =7 and =12 ieaions. Figue 5d pesens he final oupu of he classificaion sep, which was obained afe T =15 ieaions. Since afe he final ieaion hee ae sill seveal blocs locaed inside he C vs subspace, hese blocs ae classified by he minimum disance echnique (see secion 2.4). Objecs Bacgound (a) (b) Bacgound (c) (d) Figue 5: FVs disibuion of he blocs in image 4.0. (a), (b) and (c) illusae he segmenaion esuls afe =3 =7, =12 ieaive compaisons. (d) is he final bloc classificaion. Accoding o he final classificaion in Figue 5d, we pesen in Figue 6a he gid of he bloc segmenaion esuls such ha he ed and he geen blocs belong o he ed and geen cluses (in Figue 5d), especively. 183

(a) (b) Figue 6: (a) The final segmenaion oupu fo Figue 4.0 afe he T =15 compaisons. (b) The final oupu fo single fame fom he Tennis movie afe he T =12 compaisons Refeences 1. M. Gilge. "Moion esimaion by scene adapive bloc maching (sabm) and illuminaion coecion". Image Pocessing Algoihms and Techniques, volume 1244 of SPIE Poceedings, pages 355-366, (1990). 2. P. Eise and B. Giod. Model-based 3D-moion esimaion wih illuminaion compensaion. Poceedings of Image Pocessing and is Applicaions (IPA'97), VOL 443, pages 194-198, Dublin, Ieland, July (1997). 3. P.M. Kuhn and W. Sechele. VLSI achiecue fo vaiable size moion esimaion wih luminance coecion. In F.T. Lu, edio, Advanced Signal Pocessing: Algoihms, Achiecues, and Implemenaions VII, volume 3162 of SPIE Poceedings, pages 497-508, Ocobe (1997). 4. L. Begen and F. Meye. Moion segmenaion and deph odeing based on mophological segmenaion. Poc. 5h ECCV, volume II, pages 531 547, Feibug, Gemany, June (1998). 5. Y. Weiss and E. H. Adelson. A unified mixue famewo fo moion segmenaion: Incopoaing spaial coheence and esimaing he numbe of models. In Poc. CVPR 96, pages 321 326, San Fancisco, CA, June (1996). 6. P. H. S. To, R. Szelisi, and P. Anandan. An inegaed Bayesian appoach o laye exacion fom image sequences. In Poc. 7h ICCV, volume II, pages 983 990, Keya, Geece, Sepembe (1999). 7. K. Sifsad R. Jain, Illuminaion independen change deecion fo eal wold image sequences, Compue Vision, Gaphics, and Image Pocessing, Vol. 46, pp. 387-399, (1989). 8. Til Aach, Ande Kaup, Rudolf Mese, Saisical model-based change deecion in moving video, Signal Pocessing, Vol. 31, pp. 165-180, (1993). 9. F. Dufaux and J. Moscheni and A. Lippman. Efficien, obus and fas global moion esimaion fo video coding. IEEE Tans on Image Pocessing, Vol.9, Mo. 3, pp. 497-500, (2000). 10. B. Jahne. Digial Image Pocessing conceps algoihms and scienific applicaion 4 h ediion. (1997). 11. A. Avebuch, Y. Kelle, Fas moion esimaion using bidiecional gadien mehods, o appea in IEEE Tans. on Image Pocessing. 184