4205 1 Pr D Ftr MPEG-4 AVC/H.264 Mssvy-Pr Artturs Brt Ptrs, Crs-Frr J. Hrs, J D C, Mr, IEEE, Ptr Lrt, Mr, IEEE, Wsy D Nv, R V W, Mr, IEEE. Astrt T tr t MPEG-4 AVC/H.264 str s utty x us ts tt tvty, rsut st ur t s. Ts t s trr wt r tr ut rs ssvy-r rtturs. I ts r, w tru v r rtt s r urrt t MPEG-4 AVC/H.264 str, s ur D Ftr Iy, rrt vrs t Lt Errr Prt Et rs t trtur. Our rs s s urrt r u ss wt t syrzt rt, ty s urt, s t wt t MPEG-4 H.264/AVC str. W t t t t ssvy-r rttur t Grs Prss Ut (GPU). Exrt rsuts sw tt ur GPU tt vs str-t r-t t 1309 rs r s r 1080 v turs. Bt stwr-s trs stt--t-rt GPU- rts r utrr trs s y trs u t 10.2 19.5 rstvy r 1080 v turs. Ix Trs, GPU, MPEG-4 AVC/H.264, - tr, ssvy-r I. INTRODUCTION THE - tr t MPEG-4 AVC/H.264 v str [1] s s t ru rtts us y qutzt. T tr s y tttv, rsut rs tr y, ut s rs utt xty [2]. Ts utt xty s y u t t t rss s t trs sussv tr sts. E tr s ss y x trs us u t v ts. Ts ur vr s r urs, tru s tw tr s w trr wt r xut. Trr, st rts rs t trtur r t [3] r sr [4] [6] rss rs. Musrt rv Mr, 2010; rvs My 20, 2010. At r ut Juy 24, 2010. Cyrt () 2010 IEEE. Prs us ts tr s rtt. Hwvr, rss t us ts tr r y tr urss ust t r t IEEE y s t us-rsss@.r. T rsr ts r ws u y Gt Uvrsty, t IBBT, t IWT, FWO-Frs, BFSPO, t Eur U. A utrs ut Wsy D Nv r wt Mut L, ELIS, Gt Uvrsty IBBT, Bu (-: Brt.Ptrs, CrsFrr.Hrs, J.DC, Ptr.Lrt, R.VW@ut.). Wsy D Nv s wt t I V Systs L, Drtt Etr Er, Kr Av Isttut S Ty (KAIST), Ru Kr (-: Wsy.DNv@st..r). I ts r, v r rss rt r t tr MPEG-4 AVC/H.264 s rst, tt urrt tr t u t rs t ssvy-r rttur t GPU, w sty t wt t str. Sy, w rs wy svr r y tt s t s D Ftr Iy (DFI). It s s ur rrt rv vrs t Lt Errr Prt Et tru y W t. [8]. By rv t urs r tr- ss tt us ssy tr y W t., rrt tr rsuts r t t MPEG-4 AVC/H.264 str v. Nxt, v r rtt s s rst us ur DFI, r rss t r v wt t syrzt rut vr. T rs s s t t ssvyr rttur t GPU us t NVIDIA CUDA tr [7] s vut. T rr ts r s rz s ws. St II ry srs t - tr s t MPEG-4 AVC/H.264 str rs t r t trtur r yz. St III rss t DFI w St IV trus ur v r rtt s t urrt tr us t DFI. St V susquty sws t xrt rsuts ur tt ur rss s w y ur uss St VI. II. DEBLOCKING FILTERING IN THE MPEG-4 AVC/H.264 DESIGN AND RELATED WORK Wt t - tr, r s tr rstr-s rr wt t tr vr s urs. T tr rr r t u t t urrt r s sw F. 1, strt wt tr ur vrt s, w y tr ur rzt s. T vut tr r t u t y rqur u t ur ss ( 0 3, q 0 3 ), wrs tr y ut u t tr ss ( 0 2, q 0 2 ) t ss. Ts s t tr - tr s t tr s vus r us t tr t xt. A t s F. 1 r ty tr s Bury-Strt (BS) rtr t s rt rss t ury. T BS rtr s ut us rt ut qutzt rtrs (QPs), rsus, t vtrs, rr rs t urrt t s. I s tr, t BS rtr
4205 2 1 2 3 4 3 2 1 0 q 0 q 1 q 2 q 3 () F. 1. Ftr u r s. (): rzt tr vrt s, (): vrt tr rzt s. qu t 4 t r s, t t us str tr tt trs ur ss ( 0 3 q 0 3 ) t s r. T rsut tr ss 0 2 q 0 2 r wrtt t tr s t. W t BS rtr s qu t 3, 2, r 1, t r tr s st r tr ss t s t, 0 2 q 0 2, tr ss 0 1 q 0 1. Tw t trs uts r us t tr wtr st ss r t ss t tr. Ts uts r t t QP st vus vy wt t s r s w s ss 0 1 q 0 1. Fy, w t BS rtr s qu t 0, t s t tr. As stt t trtur [4] [6], [11], t BS rtr uts r tr xut r r ss us t trs. Lu wvr sws ur trs tw tr u ss: tr ss r us susqut tr sts tr s urs rss r s urs. Trr, svr tqus r ry squt uts [4] [6]. Cvt r rs vv r rss rws r us ss r, tr wt s [3] r wtut s [4], [11], [12]. As rs sur y u ss, xu s rws r us rss urrty. O sust t t rs rs s t us wvrt tqus [4], [11], [12]. Hwvr, y t ut r rss s ss s t ur rs r wv vry. Aty, ts tqus rqur ur syrzt ts,.. r wv. O y ssvy-r rtturs, t vr sst wt syrzt t rss t rr t r r rss. W t. [8] v sw tt t ss v r r t rvusy-tr ss. As rvusy stt, t str tr y tr tr ss t s r S: S 2, S 1, S 0, q2 S, q1 S, q0 S s ustrt F. 2. T xt r tr s t ts tr ss, wvr, y rty. T st tr s q1 N s y t q3 S t rvusytr. Ts s s ut y t str tr trr r utr s. Furtr, t rsut t xt r tr y s s q1 N, t tr s t t str tr. Csquty, ts s tt ss strt r q1 N r sus t t t rvusy-tr r. W 5 6 7 8 () 3 2 1 0 q 0 q 1 q 2 q 3 t. sust r tqu s ts. Sy, ty rs t v v tur ur rtur rts, tr rzty r vrty, w rs r rss rstr-s rr. Lt r t s rt r tr rrty us r s trs, tru rrr. Hwvr, ts rrr rts y r t ut ss us t srv t s. W t. ts t Lt Errr Prt Et. Atr rss rtur rts, t t t s t rts r rut us t rrty tr ss r t t rt, tus rrt t rvusy-tru rrrs. Wt ts tqu, syrzt s z ut urry s t xz. III. PROPOSED DEBLOCKING FILTER INDEPENDENCY W t. tru t t Lt Errr Prt [8]. T utrs wvr r t t tt tr t s y urs w t trs ut us 0 1 s tru s sw F. 2 s sr t str. Ts s vus, r N 1 N 0, q1 N t N 0 N 1. As N 1 s t rvusy-tr vu q2 S, t s q1 N, ty tr s st t t tr rsuts y t str tr. Furtr x q2 S, w s tt t vu ts s s ut us ss s r s 0 t str tr (u F. 2), s tr t rvus r. Ts s tt t s ty tr, s tuy t rvusy-tr r ss. W t r tr s us t rss t r st t str tr, q2 S s ut y t tr. Trr, N 0 N 1 vut us utr vus t s tr t s rrty. Csquty, ts srv y s t ss t tr ss u t F. 2 r vr rt rs w t str tr s us. Trr, t t rs y W t. s t rv rrt rsuts w v tur rts r. W sut rsuts t t W t. sw rrrs tru us - us rrrs t utut. Ts sws tt t t W t. s t t wt t str. W rs ssry t t Lt Errr Prt t rrt tr r rtturs r t t MPEG-4 AVC/H.264 str. Ts t s s wt w t D Ftr Iy (DFI), w s s vsuz F. 2. W t tr tw u s vstt, w t tt r ts t s t tr t vut wtut t s. Ts s us t M 0 M 1 us r ss ut r t r tr xut t s. Wtr q1 M s tr r t s t r rvusy-tr vus q1 M y s utr ss M 0, q0 M, q1 M, q2 M. Trr, ss strt r q1 M, r u tr ty r ss. Bus t tr MPEG-4 AVC/H.264 s tw-s tr, w stu t DFI tw rts,
4205 3 Prvus MB ' ' ' ' 3 S 2 S 1 S 0 S q0 S q1 S q2 S q3 S Str tr R s Ftr s Bury S Frst r tr Bury N 1 N - 0 N < β 2 N 1 N 0 N q0 N q1 N q2 N S r tr Currt MB Bury M 2 M 1 M 0 M q0 M 1 M - 0 M < β q1 M Ss ty tr r t W t. q2 M Atu ss ty tr us t rs DFI S ut y E tr y F. 2. Lt Errr Prt Et rs y W t. t rs D Ftr Iy. () MFP F. 3. T DFI tw ss. Lt-ry: vrt r rzt, r-ry: vrt rzt, r: tv stt. (): DFI wt r; (): tr tr r rs. sw F. 3(). I t ur, t tt rt try rs rrst ss ty tr r t t ur t r rstvy. T tw rs F. 3() vr. Ts vr rt ts ss wr t tr rss strt ty r t t ur r trr tr r rs r. I ts r, ts tr rts r r Mr Ftr Prtts (MFPs). Essty, MFPs r st ustrs ss tt tr r-r tr tr rrs s t t s tr stt s tr y t rstr-s rr rt t v t. Fr CIF v tur, MFP wu sst ut 396 s ustrs, r r. Hwvr, y rt t MFP sw F. 3() ts t rrty-tr vus. I, F. 3() sws t u tr t r t t rt tt t urrt r w tr rstr-s rr. It s ss tt str tr s us r 1 (s F. 1) t r t t rt r 5 t r t t tt t urrt r, tvy u rt t vr r. T ss t MFP tt t u r trr tr tr stt, sw F. 3() r. I rs wr t tr ty t tr rs, y ts s rt t MFP wu tr rrty. Trr, rr t tr tr r rrty, s rss rr MFPs s rqur. () MFP 1: vrt s () MFP 3: rzt s () MFP 5: rzt s () MFP 7: rzt s () MFP 2: rzt s () MFP 4: vrt s () MFP 6: vrt s () MFP 8: rzt s F. 4. Prs r rtt vr sussv sss. Wt: utr ss, t-ry: rvusy-tr ss, r-ry: tr urrt ss, r: ss tr stt. IV. PROPOSED MACROBLOCK PARTITIONING SCHEME FOR MACROBLOCK-PARALLEL PROCESSING By rtt r, t-t ss r tr ttr, r MFP. Ss ts MFP, ttr wt r utr ss, t t s t ss r tr ss t xt MFP. W MFP s s t tr utut tr MFP, syrzt t s tru s rss t rvus MFP ust s r rs t tur. MFPs susqut sts r ur, wt MFP s r utr ss ss MFP,<. F. 4 sws t rs rtt s. T tr v tur s v t t rt MFPs, t s urt. W tr MFP wt susqut syrzt t ss. E su tur sws t t ss s r t v tur tr tr t MFP. T ts tr surru ()
4205 4 rs t s MFP t urrt r s s t su ur. W stus v tys ss F. 4: wt, utr ss; t-ry, rvusy-tr ss; r-ry, ss tr t urrt ss; r, tr ss tr stt. Ftr strts wt t rst tw sss, rrst t suss s F. 3. Vrt s 3 (rty) 4 r tr ss 1 (F. 4()) us t DFI stt tt ts us r t rvusy-tr rs. T tr rsuts (MFP 1 ) r susquty us r t tr rzt s 7 8 ss 2 (F. 4()), r t t DFI. As stt r, rt ts MFP ry ts tr ss tr stt, sw r F. 4(). I ss 3, ss MFP 3 r tr wrtt. Hr, t t r s ssy tr y t str tr, u u t tr rws (,, ) ss t t tt t r v t r qust. Ts s sw y t tr rws u ss t t tt F. 4(). Nt tt rt rw s tr r t tr t. Ts rrss wt t tr u F. 2. T rsut tr ss r u rrt tr stt us t tr rzt s MFP 3 y s rrty tr ss r MFP 2 t r v. Nt tt tr t tt-t ss t r t t t t-rt t urrt r rts us t tr us,,. T xt ss trs ss vrt s strt r rw MFP 4 (F. 4()). T DFI stts tt ts ss tr ty r t r v t urrt r. Rsuts MFP 4 u t ur us ss r t r t t t,.. tr rsuts r MFP 1-2. Nxt, t r rzt s us - r tr ss 5 (F. 4()) s rt MFP 5, t ss t tr vrt s 1 t 3 ss 6 (F. 4()). Nw t rss rt r MFP 7 (F. 4()) us t DFI t rs rrty-tr ss r rzt s 7 8. Fy, r rzt ss s ss MFP 7 tr ss 8 (F. 4()). L t MFP 2 MFP 3 F. 4, t s r tt ts MFPs r t MFP s rzt s r tr r t t tt. Lws, MFP 7 MFP 8 r. H ur rtt s ws tr v tur t tr us sx syrzt ts. Ts wy, r xut ts utw syrzt vr. Curry vr rs s xz s rs MFP t v tur tr r, t s urt. V. EXPERIMENTAL RESULTS AND DISCUSSION W t ur rs r-r rt xt t t stt--t-rt wvrt rt sr t t ssvy-r rttur t GPU us t NVIDIA CUDA tr (vrs 3.01). W r t GPU tts t t y-tz CPU tr rst v [10]. Bt u r t tr ws sur. Fr t rr tsts, rrssv v squs rt rsuts wr us wt YUV4:2:0 s r rsut, rsuts r tr v squs wr vr. E v tur sst r ur tr- ss us tr QP 27 r 45. Prr tsts wr syst ru Wws 7, AMD Q9950 CPU NVIDIA 8800GTX (G80), GTX280 (G200), GTX480 (GF100) rs r. Ts rs v rstvy 128, 240, 480 Str Prssrs (SPs). Outut t GPU trs ws r t tt t rr r ws u t-urt. I, s t r, t rs MB-r tr s r t t MPEG- 4 AVC/H.264 st trr trus rrr. Tr srs wr sut,.., r sr tw r srs v turs us I IPPP GOP strutur. Fr t r sr, sr 1, t tstr s u t t GPU, sut t GPU. As rt s rst GPU ry, tr s syrzt rqur tw GPU CPU. Our surts sw tt GOP strutur QP v y t u rss s r t GPU rts us t us rt xut struts r tr. Trr, w u rsuts r t wrst-s QP s urt r ts rts. T I rs ur rs GPU rt t t GPU wvrt sr tr t, sw t ur rs tr r s (rs r s; s). It s w ur rs MB-r t utrrs t sr wvrt ts y tr 187.0 19.5 rstvy r 1080 t GF100. Bt sr wvrt trs sw sstty w s s ty rqur ur syrzt ts xt t rs. Our rt zs syrzt ts xzs rs. T t ry sws w rr ss ry wt t ut str rssrs r t rs MB-r t. Fr x, t GF100 (480 SPs) trs wt tr 2.1 str t t G200 (240 SPs) r 1080. Ts s t t s r s rsuts wr tr r t u s v t tr SP. Nxt, w suss rsuts r srs 2 3 T I wr r rsss v turs us I IPPP GOP strutur rstvy. Ts srs rqur rt rss y t r, su s qutzt rtrs, rstrut tur, t. t u GPU ry. T ut st rqur t rv t GPU wt ut t ws u t surts, s w s t t t w tur t syst ry. I s I GOP strutur, rt GPU CPU uts s ss s t xt r strt r t rvus s. Strt r t G200, ut sts y r xut. Furtrr, GPU rr s t y t ut s tw GPU syst ry. Our surts sw ts t tru r t GF100, v s ts str rssrs. Nt tt rsuts r t G200 r sr 1 2 vr r rsuts s r ts v squs, r xut t utws trsr s. Fr IPPP GOP strutur, rt
4205 5 TABLE I EXPERIMENTAL RESULTS FOR CPU AND GPU IMPLEMENTATIONS, INCLUDING THE PROPOSED MB-PARALLEL APPROACH (IN FRAMES PER SECOND) USING AN AMD Q9950 CPU AND GEFORCE 8800GTX (G80), GTX240 (G200), AND GTX480 (GF100) GPUS. CPU GPU v Sr Wvrt Prs MB-r Sr Rsut S-Cr Qu-Cr G80 G200 GF100 G80 G200 GF100 G80 G200 GF100 QP27 QP45 QP27 QP45 1 CIF 2501 3096 5409 5335 90 92 143 133 173 509 1984 2892 4762 1 480 615 815 2091 2472 24 25 42 44 93 261 948 2129 4403 1 720 279 390 949 17 10 10 15 24 55 147 596 1204 2632 1 1080 76 81 258 275 4 4 7 10 35 67 267 637 1309 2 CIF 1591 1569 5409 5335 90 92 143 129 173 504 1567 2588 2811 2 480 499 474 97 12 22 25 42 42 92 262 668 1531 2702 2 720 185 178 618 587 8 10 15 23 55 146 332 1201 1599 2 1080 76 75 254 248 4 4 7 9 35 66 154 622 762 3 CIF 2501 3096 8503 10012 89 92 141 129 8 491 1443 2375 2717 3 480 615 815 2091 2472 22 24 42 42 89 260 600 1472 1748 3 720 279 390 949 17 8 10 14 22 52 133 310 724 860 3 1080 76 81 258 275 4 4 7 9 33 66 138 351 412 s t ss CPU GPU ust syrz xut us t GPU t. Fr srs 2 3, w r t rs MBr t wt t y-tz tt v. T t sws ur rs t t utrr CPU-s ts r t srs r -t rsuts. Fr x, t GF100 stss su tr 10.2 r 1080 vr s CPU r r sr 2. Fr rss wt ut-r vrs v, w s vr s urs us v squs wt ur ss. I sr 2, ur t sws su 3.0 r t ur CPU rs. T t sws w rr t rs rt wrs r 762 s sr 2 t 412 s sr 3 r 1080 squs s t trsr syrzt vr s rs. Furtrr, s ss tr s us wt tr ss, rr t CPU tr rss r 248 t 259 s, us GPU CPU s t vr. Fr s rsuts, t GPU t s utrr s tr r ut sts r t CPU t t tr ts try t CPU. T MBr tt utrrs t ut-r vrs r t st rsut wt su tr 1.7. VI. CONCLUSION I ts r, w rst v r rss rt r t tr t MPEG-4 AVC/H.264 str, urrt tr rs. W sw tt t v rs stt--t-rt r rts s sut t ur syrzt ts t r us ssvy-r rtturs. Trr, v r rtt rt ws tru, s ur rrt vrs t Lt Errr Prt Et, tt s t wt t MPEG-4 AVC/H.264 str. It ws xu r rss urry r v turs t r v, t s urt w rqur y sx syrzt ts. T rs r tqu ws tst t ssvy-r rttur t GPU t us t NVIDIA CUDA tr. Exrt rsuts sw tt ur t ts tt w r str t r-t t 1309 rs r s r 1080 v turs GPU. I rtur, ur tt utrrs t tz CPU-s tr stt--t-rt r GPU ts trs s y tr u t 10.2 19.5 rstvy, t y t syst us ut vr ty s utr systs. REFERENCES [1] Jt V T (JVT) ITU-T ISO/IEC JTC 1, Av V C r Gr Auvsu Srvs, ITU-T R. H.264 ISO/IEC 14496-10 (MPEG-4 AVC), Vrs 5, Juy 2007. [2] P. Lst, A. J, J. L, G. Bøtr, M. Krzwz, Atv D Ftr, IEEE Trs. Cruts Syst. V T., v. 13,. 7,. 614 619, Juy 2003. [3] K. Xu C. Cy, A Fv-St P, 204 Cys/MB, S-Prt SRAM-Bs D Ftr r H.264/AVC, IEEE Trs. Cruts Syst. V T., v. 18,. 3,. 363 374, Mr 2008. [4] Z. Z P. L, Dt Prtt r Wvrt Przt H.264 V Er, IEEE Itrt Sysu Cruts Systs, My 2006. [5] J. C, N. Sts, B. Ctzr, K. Rvr, K. Kutzr, Et Przt H.264 D wt Mr B Lv Su, IEEE Itrt Cr Mut Ex, Juy 2007,. 1874 1877. [6] G. At A. Ps, R-T H.264 E y Tr-Lv Prs: Gs Pts. IASTED PDCS, Str 2005,. 254 259. [7] NVIDIA CUDA Cut U Dv Arttur: Prr Gu Vrs 2.0, NVIDIA Crrt, Juy 2008. [8] S.-W. W, S.-S. Y, H.-M. C, C.-L. Y, J.-L. Wu, A ut-r rttur s r rwr r H.264/AVC trs, S Prss Systs, v. 57,. 2,. 195 211, 2009. [9] G. J. Suv T. W, V Crss - Fr Cts t t H.264/AVC Str, Pr. t IEEE, S Issu Avs V C Dvry, v. 93,. 1,. 18 31, Jury 2005. [10] v, Prt FFMPEG, vrs 1.0, Juy 2007. [O]. Av: tt://.yrq.u/ [11] J. C. A. Bz, W. C, E. Crstrs, D. Du, B. Fr, R-T H Dt H.264 V D Us t Xx 360 GPU, Pr. SPIE: Ats Dt I Prss XXX, v. 6696,. 1, 2007. [12] F. H. Str, M. Byr, M. Gutz, R. M. Bus, Evut t-r H.264 rs r stry rsur-rstrt rtturs, Mut Ts Ats,. 1380 7501, 2009.