A Systematic Framework for Dynamically Optimizing Multi-User Wireless Video Transmission

Fangwen Fu, Mihaela van der Schaar
Electrical Engineering Department, UCLA
{fwfu, mihaela}@ee.ucla.edu

Abstract
In this paper, we formulate the collaborative multi-user wireless video transmission problem as a multi-user Markov decision process (MUMDP) by explicitly considering the users' heterogeneous video traffic characteristics, time-varying network conditions and the resulting dynamic coupling between the wireless users. These environment dynamics are often ignored in existing multi-user video transmission solutions. To comply with the decentralized nature of wireless networks, we propose to decompose the MUMDP into local MDPs using Lagrangian relaxation. Unlike in conventional multi-user video transmission solutions stemming from the network utility maximization framework, the proposed decomposition enables each wireless user to individually solve its own dynamic cross-layer optimization (i.e. the local MDP) and the network coordinator to update the Lagrangian multipliers (i.e. resource prices) based on not only current, but also future resource needs of all users, such that the long-term video quality of all users is maximized. However, solving the MUMDP requires statistical knowledge of the experienced environment dynamics, which is often unavailable before transmission time. To overcome this obstacle, we then propose a novel online learning algorithm, which allows the wireless users to update their policies in multiple states during one time slot. This is different from conventional learning solutions, which often update one state per time slot. The proposed learning algorithm can significantly improve the learning performance, thereby dramatically improving the video quality experienced by the wireless users over time. Our simulation results demonstrate the efficiency of the proposed MUMDP framework as compared to conventional multi-user video transmission solutions.

Keywords- Multi-user Video Transmission, Markov Decision Process, Lagrangian Relaxation, Online Learning.

I. INTRODUCTION
Due to their flexible and low cost infrastructure, wireless networks are poised to enable a variety of delay-sensitive multimedia transmission applications, such as videoconferencing, surveillance, telemedicine, remote teaching and training, and distributed gaming. However, existing wireless networks
provide dynamically varying resources with only limited support for the Quality of Service (QoS) required by delay-sensitive, bandwidth-intense and loss-tolerant multimedia applications. Moreover, as multimedia applications continue to proliferate, wireless network infrastructures will often need to support multiple simultaneously running applications. Key challenges associated with robust and efficient multi-user video transmission over wireless networks are the dynamic allocation of the scarce network resources among heterogeneous users experiencing different time-varying network conditions and traffic characteristics, and the dynamic adaptation at the individual users given their allocated network resources.

Existing wireless video transmission solutions can be broadly divided into two categories: single-user video transmission solutions, focusing on packet scheduling, error protection or cross-layer adaptation in order to maximize the receiving user's video quality [1]-[6], and multi-user video transmission, emphasizing multi-user resource allocation among multiple users simultaneously transmitting video and sharing the same wireless resources [4]-[8]. However, most existing solutions in both categories do not explicitly consider both the heterogeneous characteristics of the video traffic and the time-varying network conditions (e.g. time-varying channel conditions, dynamic multi-user channel access, etc.), thereby often leading to suboptimal performance for wireless media systems.

For example, in the single-user video transmission category, most solutions employ Unequal-Error-Protection (UEP) techniques [7][8] to differentially protect the video packets based on their distortion impact and delay deadline. The work in [2] further proposes a rate-distortion optimized packet scheduling solution, which explicitly considers the video packets' dependencies by using a directed acyclic graph (DAG). This rate-distortion optimization method was subsequently improved in [4] by reducing the policy search space and in [6] by reducing the start-up latency for real-time video streaming applications. However, these solutions assume only simplistic underlying network (channel) models and they do not consider the adaptation of transmission parameters at the other layers of the network stack, besides the application layer.
To deal with the dynamics in the wireless network, cross-layer adaptation methods [1][3][5] have been proposed to optimize on-the-fly the transmission parameters at various layers, based on current observations of the
channel conditions. However, these cross-layer solutions are myopic and result in suboptimal performance because they do not account for the future channel conditions and video traffic.

In the multi-user video transmission category, many current techniques [4]~[8] are based on the network utility maximization (NUM) framework [3]. In the NUM framework, the basic assumption is that each user has a static utility function of the (average) allocated transmission rate (or QoS). For example, the authors in [5] simply consider the utility to be a function of the average allocated rate. The solutions in [4][6][7] (including our previous work in [8]) defined the utility function (i.e. the average video quality or distortion) as a function of the average rate and packet loss. To deal with the dynamic wireless channel conditions, the resource allocation among the multiple users is repeatedly performed to maximize the current video quality. However, these solutions only myopically maximize the video quality for all the users at the current time and do not predict the impact of the current resource allocation on the future video quality of all the users. This myopic solution leads to fluctuations in the long-term video quality. Therefore, it is crucial to judiciously allocate the limited resources to individual wireless video users such that their long-term utility (i.e. video quality) is maximized.

To address the abovementioned challenges associated with efficient multi-user video transmission over the time-varying wireless network, we propose a systematic framework for dynamically and foresightedly optimizing the long-term video quality of multiple users coexisting in the same wireless channel. Under the proposed framework, unlike the existing single-user video transmission solutions, we explicitly consider, when performing the cross-layer optimization for each wireless user, both the heterogeneous video traffic characteristics and the experienced time-varying network conditions. First, to characterize the heterogeneous video data, we define a traffic state for each user which considers the amount of data units (e.g. video frames or video packets) to be transmitted, their distortion impacts, and the dependencies between them at each transmission time^1. The traffic and network state (e.g. channel conditions) transitions characterize the environment dynamics experienced by each user. These dynamics are affected by the packet scheduling deployed by all the users and their resource acquisition. We then formulate the packet scheduling and resource acquisition for each user as an MDP. The deployed solution to the single-user video transmission problem using MDP can be found in our prior work [35].

^1 In order to facilitate the dynamic optimization of video transmission over time-varying wireless networks, we consider a time-slotted wireless transmission system in which the decision is made every time slot. The length of one time slot is determined based on how fast the network and video traffic change [8].

We further formulate the optimization of the packet scheduling and resource allocation over the dynamic multi-user video transmission system as an MUMDP [9] problem. Similar to the MDP formulation for single-user video transmission, the MUMDP formulation allows each user to make foresighted transmission decisions by taking into account the future impact of its current decisions on the long-term utilities of all the users. Although the MUMDP problem can be solved in a centralized manner using conventional value or policy iteration algorithms or linear programming [9], this requires the network coordinator to know the dynamics experienced by each user and to solve a highly complex centralized MUMDP, involving a very large state space. Therefore, this solution leads to very high computation complexity, unacceptable communication overheads and incurred delays. Fortunately, the MUMDP is weakly-coupled [20], since the state transition of each user is coupled with that of other users only through the multi-user resource allocation at each time slot. We propose to decompose this weakly-coupled MUMDP problem using Lagrangian relaxation into multiple local MDPs, each of which can be separately solved by the individual users. This decomposition is different from the conventional dual solutions [5] to the multi-user NUM-based video transmission problem in two ways: (i) instead of maximizing the static utility at each transmission time, our approach allows each wireless user to solve the dynamic cross-layer optimization problem (formulated as the local MDP), which is vital for the delay-sensitive video applications; (ii) instead of updating the Lagrangian multipliers only based on the current resource requirements of all users, our approach updates the multipliers based on not only the current but also future resource needs, such that the long-term video quality of all the users is maximized.
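As a rough illustration of the price-based decomposition just described, the following toy sketch lets each user solve a small local MDP for a given resource price while a coordinator adjusts a single scalar price from the users' discounted resource demand. It is a sketch under stated assumptions and not the algorithm developed in the later sections: the two-state user model, the square-root utility, the discrete resource levels, the constant step size and every numerical value are hypothetical and chosen only to keep the example short.

```python
import math
from typing import List, Tuple

ALPHA = 0.8                              # discount factor (assumed value)
LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]     # hypothetical time fractions x


class ToyUser:
    """Hypothetical two-state user: state 0 = degraded, 1 = healthy traffic."""

    def __init__(self, gain: Tuple[float, float]) -> None:
        self.gain = gain                 # utility scale in each state
        self.v0 = (0.5, 0.5)             # initial-state distribution

    def utility(self, s: int, x: float) -> float:
        return self.gain[s] * math.sqrt(x)   # assumed concave utility

    def p_healthy(self, x: float) -> float:
        return 0.2 + 0.6 * x             # more resource keeps traffic healthy


def solve_local_mdp(user: ToyUser, price: float, iters: int = 200) -> List[float]:
    """Value iteration on max_x u(s, x) - price * x + ALPHA * E[U(s')].

    Price terms that are constant in x are omitted; they do not change
    the maximizing action. Returns the resource level chosen per state.
    """
    U = [0.0, 0.0]
    policy = [0.0, 0.0]
    for _ in range(iters):
        new_U = [0.0, 0.0]
        for s in (0, 1):
            best = -math.inf
            for x in LEVELS:
                ph = user.p_healthy(x)
                q = (user.utility(s, x) - price * x
                     + ALPHA * ((1 - ph) * U[0] + ph * U[1]))
                if q > best:
                    best, policy[s] = q, x
            new_U[s] = best
        U = new_U
    return policy


def discounted_consumption(user: ToyUser, policy: List[float],
                           iters: int = 200) -> float:
    """Expected discounted sum of x_t under a stationary policy."""
    z = [0.0, 0.0]
    for _ in range(iters):
        z = [policy[s] + ALPHA * ((1 - user.p_healthy(policy[s])) * z[0]
                                  + user.p_healthy(policy[s]) * z[1])
             for s in (0, 1)]
    return user.v0[0] * z[0] + user.v0[1] * z[1]


def coordinate(users: List[ToyUser], step: float = 0.05,
               rounds: int = 60) -> float:
    """Raise the price when discounted demand exceeds the discounted budget."""
    price = 0.0
    budget = 1.0 / (1.0 - ALPHA)         # one unit of channel time per slot
    for _ in range(rounds):
        demand = sum(discounted_consumption(u, solve_local_mdp(u, price))
                     for u in users)
        price = max(0.0, price + step * (demand - budget))
    return price


users = [ToyUser((1.0, 2.0)), ToyUser((1.5, 3.0))]
final_price = coordinate(users)
```

In this toy setting a zero price makes every user request the whole channel in every state, so the coordinator is forced to raise the price. The point of the sketch is only that the coordinator sees one scalar per user (its discounted demand) and never needs the users' states, utilities or transition probabilities.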
We would like to note that the Lagrangian relaxation using a scalar multiplier has been proposed in [20][2] to decompose a weakly-coupled MUMDP problem. However, the duality between the proposed relaxed problem and the original MUMDP is not analyzed in these existing works. In this paper, we mathematically derive the duality and propose a systematic way to compute the subgradient for updating the Lagrangian multiplier. Furthermore, to the best of our knowledge, this is the first attempt to formalize the multi-user video communication problem using MUMDP and to decompose the MUMDP for autonomous, but collaborative, video users.

To solve the local MDP, each wireless user is required to know the transition probabilities of traffic states and channel states beforehand, which are often difficult to accurately characterize before transmission time, especially for video applications operating in dynamic multi-user networks. Hence, the local MDP cannot be solved in practice using the method in [35] and the Lagrangian multiplier cannot be updated based on the subgradient method. Importantly though, the MUMDP framework provides the necessary foundations and principles for how the users can autonomously learn online to cooperatively optimize the global video utility when the dynamics are unknown. In practice, to deal with the unknown dynamics, each wireless user will deploy online reinforcement actor-critic learning [27], and the network coordinator will update the resource price dynamically using stochastic subgradient methods [26]. This approach has two advantages: (1) it does not require each user to know the statistical distribution of channel conditions and incoming video traffic; and (2) the wireless user and the network coordinator need to perform only very simple computations during each time slot. Unlike conventional online learning algorithms [27], which often update the learning policy for only one state during each time slot, our proposed learning algorithm can update multiple states simultaneously during each time slot. As demonstrated by our simulation results, the proposed reinforcement learning solution significantly increases the learning performance, which leads to dramatic improvements in the received video quality.

The paper is organized as follows. Section II defines the traffic states, the state transition and the utility function for each wireless user at each time slot. Section III discusses the dynamic optimization for the single user. This is a local problem solved by the individual users in the multi-user video transmission problem.
Section IV formulates the multi-user video transmission problem as an MUMDP. Section V presents how the MUMDP can be decomposed into multiple local MDPs using the Lagrangian relaxation method and develops the corresponding subgradient method to update the resource price. Subsequently,
Section VI describes the proposed distributed online learning algorithm to deal with the unknown video characteristics and channel conditions. Section VII presents numerical results to validate the proposed framework. The conclusions are drawn in Section VIII.

II. MODELS FOR HETEROGENEOUS VIDEO TRAFFIC
Unlike traditional traffic models [33], which only characterize the rate changes of video traffic, in this section we aim to develop a general model for representing the encoded video traffic with heterogeneous characteristics (e.g. various delay deadlines, distortion impacts, dependencies, etc.). Using this video traffic model, we will be able to dynamically optimize the resource acquisition and packet scheduling for video transmission over time-varying networks.

A. Attributes of DUs
In this subsection, we discuss how the heterogeneous attributes of the video traffic^2 can be modelled. This will be used to define the traffic states at each time slot in Subsection B. The video data is encoded periodically using a Group of Pictures (GOP) structure as in [22][23], which lasts a period of $T$ time slots. The video frames within one GOP are encoded interdependently using motion estimation, while the frames belonging to different GOPs are encoded independently. Note that the prediction-based coding schemes can lead to sophisticated dependencies between the video data. After being encoded, each GOP contains $N$ data units (DUs), each of which represents one or multiple encoded frames (e.g. one of I, P, and B frames)^3. The attributes of the DUs are listed below.

^2 Note that the video traffic can be generated in real-time or pre-encoded.
^3 In this paper, we consider that the video is encoded using a fixed GOP structure. However, the proposed traffic representation can also be adapted to the dynamic GOP structure.

Size: The size of DU $j \in \{1, \ldots, N\}$ is denoted as $l_j$ in packets^4, where $l_j \in [1, l_j^{\max}]$ and $l_j^{\max}$ is the maximum allowable size. The size of DU $j$ is the amount of packets when DU $j$ is generated by the video codec. For example, the size of DU $j$ at GOP $g$ can differ from the size of DU $j$ at GOP $g+1$. To simplify the exposition, this is modelled as an i.i.d. random variable^5.

^4 For simplicity, we assume in this paper that each packet has the same length, but this does not affect our proposed solution. It just simplifies our exposition given the space limitations.
^5 The DU size can also be modeled as a random variable depending on the previous DUs [28].

Distortion impact: Each DU $j$ has a distortion impact $q_j$ per packet, which is assumed to be constant for all the GOPs.

Delay deadline: We define the relative delay deadline (RDD) of DU $j$ as the difference between the delay deadlines of DU 1 and DU $j$ in the same GOP, denoted by $d_j$ (measured in time slots). Hence, $d_1 = 0$ and $d_j \le T$. If the delay deadline for DU 1 at the first GOP is set to be $d_0$, then the delay deadline of DU $j$ at the $g$-th GOP is $d_0 + d_j + (g-1)T$.

Dependency: The dependencies between the DUs within one GOP are expressed as a directed acyclic graph (DAG). The DAG remains the same for a fixed GOP structure. One illustrative example of DAGs for video data is given in Figure 1. In this paper, we assume that, if DU $j$ depends on DU $j'$ (i.e. there exists a path directed from DU $j'$ to DU $j$, denoted by $j' \prec j$), then $d_{j'} \le d_j$ and $q_{j'} \ge q_j$. In other words, DU $j'$ should be decoded prior to DU $j$ and DU $j'$ has a higher distortion impact.

B. Traffic state representation at each time slot
In this subsection, we model the video traffic which can be potentially transmitted at each time slot. At time slot $t$, as in [2], we assume that the wireless user will only consider for transmission DUs with delay deadlines in the range of $[t, t+W)$, where $W$ is referred to as the scheduling time window (STW) and assumed to be given a priori^6. In this paper, we further assume that the STW is chosen to satisfy the following condition: if DU $j$ directly depends on DU $j'$ (i.e. there is a direct arc from $j'$ to $j$), then $d_j - d_{j'} < W$. This assumption ensures that DU $j$ and $j'$ can be in one STW.

^6 Note that the STW can be determined based on the channel conditions experienced by the user at each time slot. For example, the wireless user may set the STW to be small when the channel conditions are poor, and the STW to be large whenever the channel conditions are good.

We define the traffic state as the DUs within the STW at time slot $t$, which is denoted by $\mathcal{T}_t = (G_t, B_t)$. In the traffic state $\mathcal{T}_t$,

$$G_t = \{\, j_g \mid \exists g:\ d_0 + d_j + (g-1)T \in [t, t+W),\ \text{dependencies of } j_g \text{ expressed by the DAG} \,\}$$

is called the dependency pattern of DUs within the STW. The dependency pattern $G_t$ represents the number of DUs that can be potentially transmitted during the next time slot and the dependencies between them. In the example illustrated in Figure 1, $G_t = \{1_g, 2_g, 3_g \mid 1_g \to 2_g \to 3_g,\ 1_g \to 3_g\}$ and
$G_{t+1} = \{4_g, 5_g, 1_{g+1} \mid 4_g \to 5_g\}$, where $g$ is the GOP index. $B_t = \{ b_{t,j} \mid j \in G_t \}$ represents the amount of packets at each DU available for potential transmission at time slot $t$. Note that $b_{t,j} \le l_j$. From the example in Figure 1, we note that the transition of $G_t$ is deterministic and periodic for a predetermined GOP structure and hence, is Markovian and denoted by $p_G(G_{t+1} \mid G_t)$. The traffic state $\mathcal{T}_t$ is able to capture heterogeneous video traffic and is a superset of existing well-known single-buffer models [9][10] (which ignore both the packet dependencies and delay deadlines) and multi-buffer models [3][2][7][35] (which ignore the packet dependencies or the delay deadlines).

C. Scheduling policy
Given a transmission rate^7 $r_t$, the wireless user has to determine the amount of data to be transmitted for each DU in $G_t$ through its scheduling policy. The scheduling policy maps the current traffic state and transmission rate $r_t$ into the amount of packets transmitted for each DU, $y_t = [y_{t,j}]_{j \in G_t}$, in the current STW, i.e. $\pi(\mathcal{T}_t, r_t) = y_t$. Formally, the scheduling policy $\pi$ satisfies the following conditions^8:
(i) Underflow constraint: $0 \le y_{t,j} \le b_{t,j},\ \forall j \in G_t$;
(ii) Rate constraint: $\sum_{j \in G_t} y_{t,j} \le r_t$.
The set of possible policies in each traffic state $\mathcal{T}_t$ given a certain transmission rate $r_t$ is denoted by $\mathcal{P}(\mathcal{T}_t, r_t)$.

^7 The transmission rate can be determined by the allocated network resource and transmission strategies at the layers below the application layer.
^8 Similar constraints are also considered in [29]. However, the authors therein did not consider the time-varying transmission rate and foresighted packet scheduling decisions aimed at maximizing the long-term video quality.

D. Traffic state transition and immediate reward
In the following, we discuss the transition of the traffic state $\mathcal{T}_t$, given the transmission rate $r_t$. First, the transition of the dependency pattern $G_t$ is $p_G(G_{t+1} \mid G_t)$, which is deterministic and does not depend on the transmission rate $r_t$. In order to compute the transition from $B_t$ to $B_{t+1}$, we separate $G_{t+1}$ into two disjoint sets: $G_t \cap G_{t+1}$ and $G_{t+1} / G_t$^9. It is clear that $G_{t+1} = (G_t \cap G_{t+1}) \cup (G_{t+1} / G_t)$.

^9 Here $G_{t+1} / G_t = G_{t+1} - G_{t+1} \cap G_t$.

First, we consider the DUs in the set $G_t / G_{t+1}$ that will expire before time slot $t+1$. In this paper, we consider the case that, if DU $j \in G_t / G_{t+1}$ has the remaining data $b_{t,j} - y_{t,j}$ greater than a certain
threshold^10 (say $V_j$, i.e. $b_{t,j} - y_{t,j} > V_j$), then all its descendants will be undecodable (or useless) and hence, will be discarded for transmission. Let us denote $E_t = \{ j \mid j \in G_t / G_{t+1},\ b_{t,j} - y_{t,j} > V_j \}$ to be the set of DUs that will expire before time slot $t+1$ and will also result in unusable descendant DUs due to the large amount of data lost. We further denote $\varepsilon_t = \{ j \mid j \in G_t \cap G_{t+1},\ b_{t,j} = -1 \}$ to be the set of DUs which are in $G_t \cap G_{t+1}$, but have $b_{t,j} = -1$ (we use $-1$ to indicate that a certain DU is useless). Then, the transition from $B_t$ to $B_{t+1}$ is computed as:

$$b_{t+1,j} = \begin{cases} -1 & \text{if } A(j) \cap (E_t \cup \varepsilon_t) \neq \emptyset \\ b_{t,j} - y_{t,j} & \text{if } j \in G_t \cap G_{t+1} \text{ and } A(j) \cap (E_t \cup \varepsilon_t) = \emptyset \\ l_j & \text{if } j \in G_{t+1} / G_t \text{ and } A(j) \cap (E_t \cup \varepsilon_t) = \emptyset \end{cases} \tag{1}$$

Note that $A(j)$ represents all the ancestors of DU $j$, including itself. The first line in Eq. (1) means that DU $j$ is unusable, because its ancestors are not successfully received. The second line means that DU $j$ is inherited from the previous traffic state, i.e. DU $j$ did not expire at the current moment. The third line means that DU $j$ is a new DU entering the STW. Since the transition of the dependency pattern is deterministic and the incoming DUs are i.i.d., it is clear that the transition of $B_t$ is Markovian as well and hence, the traffic state transition is Markovian. The transition probability from $B_t$ to $B_{t+1}$ is denoted as $p_B(B_{t+1} \mid B_t, G_t, G_{t+1}, y_t, r_t)$, and depends on the dependency pattern transition, the scheduling policy and the available transmission rate. The traffic state transition can be rewritten as

$$p_T(\mathcal{T}_{t+1} \mid \mathcal{T}_t, y_t, r_t) = p_G(G_{t+1} \mid G_t)\, p_B(B_{t+1} \mid B_t, G_t, G_{t+1}, y_t, r_t).$$

Given the scheduling policy $y_t = \pi(\mathcal{T}_t, r_t)$ and transmission rate $r_t$, the distortion reduction experienced by the wireless user can be computed as

$$u(\mathcal{T}_t, y_t, r_t) = \sum_{j \in G_t} q_j\, y_{t,j}. \tag{2}$$

III. DYNAMIC OPTIMIZATION FOR A SINGLE USER

^10 More complicated models can also be developed for the dependency impact between two DUs. For example, an error propagation function can be defined to capture the dependency impact. However, in this case, the error propagation function should be included in the traffic state, which may dramatically increase the state space. In this paper, we only consider a simple dependency impact model (i.e. the on-off model).
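The on-off dependency model of footnote 10, as used in the buffer update of Eq. (1) and the distortion reduction of Eq. (2) in Section II.D, can be sketched in a few lines. This is only an illustrative sketch: the function and variable names, the four-DU DAG and all numbers are hypothetical and are not taken from the paper or its Figure 1.

```python
from typing import Dict, List, Set

USELESS = -1  # value Eq. (1) assigns to a DU whose ancestors were lost


def ancestors(parents: Dict[int, List[int]], j: int) -> Set[int]:
    """A(j): all ancestors of DU j in the DAG, including j itself."""
    seen = {j}
    stack = [j]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def next_buffers(b: Dict[int, int], y: Dict[int, int],
                 g_now: Set[int], g_next: Set[int],
                 parents: Dict[int, List[int]],
                 threshold: Dict[int, int],
                 new_size: Dict[int, int]) -> Dict[int, int]:
    """One step of Eq. (1): buffers B_t and schedule y_t give B_{t+1}."""
    expired = g_now - g_next
    # E_t: expiring DUs that still have more than V_j packets unsent.
    lost = {j for j in expired if b[j] - y.get(j, 0) > threshold[j]}
    # eps_t: DUs that stay in the window but are already useless.
    dead = {j for j in g_now & g_next if b[j] == USELESS}
    blocked = lost | dead
    nxt: Dict[int, int] = {}
    for j in g_next:
        if ancestors(parents, j) & blocked:
            nxt[j] = USELESS                 # first line of Eq. (1)
        elif j in g_now:
            nxt[j] = b[j] - y.get(j, 0)      # second line: inherited DU
        else:
            nxt[j] = new_size[j]             # third line: new DU of size l_j
    return nxt


def distortion_reduction(q: Dict[int, float], y: Dict[int, int]) -> float:
    """Eq. (2): sum of q_j * y_j over the scheduled DUs."""
    return sum(q[j] * y[j] for j in y)


# Hypothetical example: DU 1 is an ancestor of DUs 2 and 3; DU 4 is new.
parents = {2: [1], 3: [1, 2], 4: []}
b = {1: 4, 2: 3, 3: 2}
V = {1: 1, 2: 1, 3: 1}
starved = next_buffers(b, {1: 1, 2: 2, 3: 0}, {1, 2, 3}, {2, 3, 4},
                       parents, V, {4: 5})
served = next_buffers(b, {1: 4, 2: 1, 3: 0}, {1, 2, 3}, {2, 3, 4},
                      parents, V, {4: 5})
```

In the first call DU 1 expires with three packets unsent, which exceeds its threshold, so DUs 2 and 3 become useless; in the second call DU 1 is fully sent and its descendants keep their remaining packets.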

In this section, we first consider the optimization of both the packet scheduling and resource acquisition for a single wireless video user experiencing a slow fading wireless channel. In each time slot, the wireless user experiences a channel condition $h_t$. We assume that the channel condition $h_t$ remains constant within one time slot, but varies across time slots. The changes of $h_t$ can be modelled as a finite state Markov chain (FSMC) [24] with the state transition probability given by $p_h(h_{t+1} \mid h_t)$, which is independent of the traffic state transition. The transmission rate attained by the wireless user is determined by $r_t(h_t, x_t)$, where $x_t \in X$ represents the amount of network resource (e.g. the transmission time in the TDMA-like network [25] as discussed in Section IV) acquired by the wireless user from the network and $X$ represents the set of possible resource allocations. As we will discuss in Section IV for the multi-user video transmission, the resource acquisition will be affected by other users. The transmission rate function $r_t(h_t, x_t)$ is assumed to be an increasing function of $x_t$, given the channel condition $h_t$ [36], and thereby, a larger $x_t$ leads to a higher transmission rate $r_t$.

We define the state for the wireless user at time slot $t$ as $s_t = (\mathcal{T}_t, h_t) \in S$, which includes the video traffic state and channel state and satisfies the Markovian property since both the traffic state and channel state are Markovian. Then, the wireless user state transition is expressed by

$$p(s_{t+1} \mid s_t, y_t, x_t) = p_T(\mathcal{T}_{t+1} \mid \mathcal{T}_t, y_t, r_t)\, p_h(h_{t+1} \mid h_t). \tag{3}$$

At each state $s_t$, the wireless user takes the actions including the resource acquisition $x_t$ and scheduling $y_t$, thereby leading to the immediate utility $u(s_t, y_t, x_t) - \lambda_{s_t} x_t$, where $\lambda_{s_t}$ is interpreted as the resource price as in [2]. Note that we express $u(\mathcal{T}_t, y_t, r_t(h_t, x_t))$ as $u(s_t, y_t, x_t)$ to emphasize that the immediate utility is a function of the state $s_t$, scheduling action $y_t$ and allocated time $x_t$. In this section, we assume that $\lambda_{s_t}$ is determined a priori. In Section V, we will discuss how the resource price can be determined in a multi-user scenario. The objective of the wireless user is to maximize its expected discounted accumulated utility (we call this the single-user primary problem, SUP)
as follows^11:

$$\max_{y_t \in \mathcal{P}(s_t, x_t),\ x_t \ge 0,\ \forall t \ge 0}\ \sum_{s_0 \in S} v(s_0)\, \mathbb{E}\left[ \sum_{t=0}^{\infty} \alpha^{t} \left( u(s_t, y_t, x_t) - \lambda_{s_t} x_t \right) \,\middle|\, s_0 \right], \tag{SUP}$$

where $\alpha$ is the discount factor in the range^12 of $[0, 1)$, and $v(s_0)$ is the distribution of the initial state.

^11 In this formulation, we interchangeably express the admissible policy as $\mathcal{P}(\mathcal{T}_t, r_t)$ and $\mathcal{P}(s_t, x_t)$.
^12 Our solutions discussed below are also applicable to the problem of maximizing the average accumulated utility by allowing $\alpha \to 1$.

The reasons why we consider the discounted accumulated utility are: (1) for our considered delay-sensitive applications, the data needs to be sent out as soon as possible to avoid missing delay deadlines; and (2) since a wireless user may encounter unexpected environmental dynamics in the future, it may care more about its immediate reward than about the future reward. Based on the discussion in Section II.D, the transition of the state $s_t$ only depends on the current actions $x_t$ and $y_t$. Hence, the problem above can be formulated as an MDP. Note that unlike the previous video transmission solutions in [1]-[7], here we explicitly take into consideration the heterogeneous characteristics of video traffic (represented by the traffic states) and time-varying channel conditions (represented as channel states). Similar to the work in [2], we optimize the trade-off between the consumed resource and the received reward in terms of distortion reduction, but focusing on a dynamic setting.

The optimization of SUP is called the foresighted optimization for video transmission since it considers the impact of current decisions on the future utility. Based on [9], the optimization of SUP can be solved using the following Bellman's equations:

$$U(s, \boldsymbol{\lambda}) = \max_{y \in \mathcal{P}(s, x),\ x \ge 0} \left[ u(s, y, x) - \lambda_{s}\, x + \alpha \sum_{s' \in S} p(s' \mid s, y, x)\, U(s', \boldsymbol{\lambda}) \right], \tag{4}$$

where $U(s, \boldsymbol{\lambda})$ is the optimal reward-to-go starting at state $s$, given $\boldsymbol{\lambda} = [\lambda_s]_{s \in S}$. The Bellman's equations can be solved using the value iteration or policy iteration methods. One solution to solve the SUP is proposed in our previous work [35].

IV. MULTI-USER WIRELESS VIDEO TRANSMISSION FORMULATION
In Section III, we have formulated the dynamic optimization problem for the single-user video transmission. In this section, we aim to formulate the problem of multi-user video transmission over a
slow fading wireless channel. We will show that the single-user video transmission serves as a subproblem of the multi-user problem in Section V, which is illustrated in Figure 2. The users are indexed by $i \in \{1, \ldots, M\}$, where $M$ is the number of users sharing the channel. At each time slot $t$, the channel condition experienced by user $i$ is represented by $h_t^i$ (the superscript $i$ denotes user $i$ here and in the remainder of the paper), and the transition of $h_t^i$ is independent of the transitions of other users' channel conditions. Recall that in this paper we assume that the multiple users access the shared channel using the TDMA-like protocol [25]. Then, at each time slot $t$, the portion of time allocated to user $i$ is denoted by $x_t^i \in [0, 1]$. Due to the resource constraint, the allocations to all the users satisfy the following inequality:

$$\sum_{i=1}^{M} x_t^i \le 1, \quad \forall t. \tag{5}$$

In this paper, we consider a collaborative multi-user video transmission problem with the goal of maximizing the expected discounted accumulated video quality of all the users under the stage resource constraints (we call this problem the multi-user primary problem with stage resource constraints, MUP/SRC), i.e.

$$U^{*} = \max_{x_t^i,\, y_t^i,\ i = 1, \ldots, M,\ \forall t \ge 0}\ \sum_{s_0^1 \in S^1, \ldots, s_0^M \in S^M}\ \prod_{i=1}^{M} v^i(s_0^i)\ \mathbb{E}\left[ \sum_{t=0}^{\infty} \alpha^{t} \sum_{i=1}^{M} u^i(s_t^i, y_t^i, x_t^i) \,\middle|\, s_0^1, \ldots, s_0^M \right]$$
$$\text{s.t.}\quad y_t^i \in \mathcal{P}^i(s_t^i, x_t^i),\quad \sum_{i=1}^{M} x_t^i \le 1,\quad \forall t \ge 0, \tag{MUP/SRC}$$

where we assume that the initial states of the video users are independent. Note that, in this paper, we consider that each user has the same discount factor^13.

^13 Since we consider a collaborative multi-user video transmission, for simplicity, we enforce that each user has the same discount factor. However, in general, different users may have different discount factors, especially in the non-collaborative scenarios.

Based on the discussion on the video traffic representation in Section II and the formulation for the single-user video transmission, the multi-user transmission problem in MUP/SRC can also be formulated as an MUMDP. Specifically, we define the state of the multi-user system as $\mathbf{s}_t = (s_t^1, \ldots, s_t^M)$. The action performed by the users is $\mathbf{a}_t = ((x_t^1, y_t^1), \ldots, (x_t^M, y_t^M))$. It is easy to verify that
$p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{a}_t) = \prod_{i=1}^{M} p^i(s_{t+1}^i \mid s_t^i, y_t^i, x_t^i)$, given the resource allocation $\sum_{i=1}^{M} x_t^i \le 1$. The reward at each time slot is given by $u = \sum_{i=1}^{M} u^i(s_t^i, y_t^i, x_t^i)$. We note that, when $\alpha = 0$ (i.e. all the users make myopic decisions), the MUMDP problem reduces to the traditional multi-user NUM-based resource allocation problems for video transmission [3]~[8]:

$$\max_{x^i,\, y^i,\ i = 1, \ldots, M}\ \sum_{i=1}^{M} u^i(s^i, y^i, x^i) \quad \text{s.t.}\quad y^i \in \mathcal{P}^i(s^i, x^i),\quad \sum_{i=1}^{M} x^i \le 1. \tag{6}$$

However, we consider here the dynamic optimization for the multi-user video transmission by taking into account the resource allocation and corresponding scheduling across time (i.e. $\alpha > 0$). From [9], we know that, for this multi-user MDP problem, there is at least one optimal stationary policy that only depends on the current multi-user system state. Hence, in this paper, we restrict our focus to the stationary policies, i.e. the policy only depends on the current state. Then, solving the maximization problem in MUP/SRC is equivalent to solving the following Bellman's equations [9]:

$$U(\mathbf{s}) = \max_{y^i \in \mathcal{P}^i(s^i, x^i),\ i = 1, \ldots, M,\ \sum_{i=1}^{M} x^i \le 1} \left\{ \sum_{i=1}^{M} u^i(s^i, y^i, x^i) + \alpha \sum_{\mathbf{s}'} \prod_{k=1}^{M} p^k(s'^k \mid s^k, y^k, x^k)\, U(\mathbf{s}') \right\}, \quad \forall \mathbf{s}, \tag{7}$$

and $U^{*} = \sum_{s_0^1 \in S^1, \ldots, s_0^M \in S^M} \prod_{i=1}^{M} v^i(s_0^i)\, U(\mathbf{s}_0)$. From these Bellman's equations, we have the following observations:
- To solve the Bellman's equations, we can use the centralized value iteration or policy iteration to find the optimal reward-to-go $U(\mathbf{s})$ for the multi-user MDP problem [9]. However, this centralized solution requires knowing all the users' information (state spaces, action spaces, transition probabilities, and utility functions) and also has a high computation complexity. Hence, this centralized solution is not applicable to the multi-user wireless video transmission.
- The coupling among the multiple users' video transmission is only through the resource allocation
performed at each time slot. For example, the optimal scheduling policy performed by each user depends on the multi-user system state through the resource allocation $x^i$. Then, given the resource allocation $x^i$, the scheduling policy is independent of other users' states. This type of MUMDP is referred to as the weakly-coupled MDP [20] and the decomposition into multiple local MDPs is possible. In the next section, we will discuss how the multi-user MDP problem can be decomposed when the resource allocation is dynamic and depends on the multi-user system's state.
The relationships among the proposed solutions are illustrated in Figure 2.

V. DUAL DECOMPOSITION OF MULTI-USER MDP
In this section, we will consider the dual problem of the multi-user MDP by relaxing the per-stage resource constraints and show how we can decompose the MUMDP. First, in Subsection A, we introduce a per-state Lagrangian multiplier associated with the resource constraint at each state. This dual solution leads to zero duality gap compared to the primary problem MUP/SRC, but requires a centralized solution since the resource price depends on the multi-user state, which cannot be observed by each individual user. Then, in Subsection B, we impose a uniform resource price, which is independent of the multi-user state. With this uniform resource price, the MUMDP problem can be decomposed into multiple local MDPs, which represent a dynamic cross-layer optimization problem that can be separately solved by each individual user. This decomposition is promising since (1) it enables each user to optimize its packet scheduling and resource acquisition independently of other users; and (2) the network coordinator only needs to update the resource price, which involves only very few computations.

A. Dual solution with per-state resource prices
At each state $\mathbf{s}$, we introduce a Lagrangian multiplier $\lambda_{\mathbf{s}}$ associated with the resource constraint ($\sum_{i=1}^{M} x^i \le 1$) at each state $\mathbf{s}$. Then the dual function is given by

$$U^{\boldsymbol{\lambda}}(\boldsymbol{\lambda}) = \max_{y_t^i \in \mathcal{P}^i(s_t^i, x_t^i),\ x_t^i \ge 0,\ i = 1, \ldots, M,\ \forall t \ge 0}\ \sum_{s_0^1 \in S^1, \ldots, s_0^M \in S^M}\ \prod_{i=1}^{M} v^i(s_0^i)\ \mathbb{E}\left[ \sum_{t=0}^{\infty} \alpha^{t} \left( \sum_{i=1}^{M} \left( u^i(s_t^i, y_t^i, x_t^i) - \lambda_{\mathbf{s}_t} x_t^i \right) + \lambda_{\mathbf{s}_t} \right) \,\middle|\, \mathbf{s}_0 \right], \tag{8}$$
with $\boldsymbol{\lambda} = [\lambda_{\mathbf{s}}]$. As in Section III, we can interpret the Lagrangian multiplier $\lambda_{\mathbf{s}}$ as the resource price in state $\mathbf{s}$. We refer to this as the per-state resource price. Then, $\lambda_{\mathbf{s}} x^i$ is the cost user $i$ has to pay in state $\mathbf{s}$ and $\lambda_{\mathbf{s}} \sum_{i=1}^{M} x^i$ is the amount of revenue received by the multi-user system by allowing the users to consume the resources (i.e. access the wireless channel). However, we should note that, in our collaborative communications, the resource price is used in order to efficiently allocate the limited resource, instead of maximizing the revenue of the multi-user system. The multi-user dual problem with the per-state resource price (MUD/PSRP) is then given by

$$U^{\boldsymbol{\lambda},*} = \min_{\boldsymbol{\lambda} \ge 0} U^{\boldsymbol{\lambda}}(\boldsymbol{\lambda}). \tag{MUD/PSRP}$$

The following proposition shows that the dual problem MUD/PSRP has zero duality gap compared to the primary problem in MUP/SRC and thus, the optimal time allocation and scheduling policies corresponding to the optimal resource price $\lambda_{\mathbf{s}}$ at each state are also the optimal policies for the primary problem.

Proposition 1: $U^{\boldsymbol{\lambda},*} = U^{*}$.
Proof: Due to the limited space, we omit the proof of this proposition. However, the proof can be easily performed by showing that the optimal scheduling policy leads to an objective function which is a concave function. This is because, in each time slot, the optimal scheduling policy always transmits the packets resulting in the highest increase in the long-term utility. Given the concavity of the objective function and the compactness of the feasible resource allocation space, we can prove that the duality gap is zero [26].

Similar to the Bellman's equations in Eq. (7) for the primary problem MUP/SRC, we have the following Bellman's equations corresponding to the dual function in MUD/PSRP:

$$U(\mathbf{s}, \boldsymbol{\lambda}) = \max_{y^i \in \mathcal{P}^i(s^i, x^i),\ x^i \ge 0,\ i = 1, \ldots, M} \left\{ \sum_{i=1}^{M} \left( u^i(s^i, y^i, x^i) - \lambda_{\mathbf{s}} x^i \right) + \lambda_{\mathbf{s}} + \alpha \sum_{\mathbf{s}'} \prod_{k=1}^{M} p^k(s'^k \mid s^k, y^k, x^k)\, U(\mathbf{s}', \boldsymbol{\lambda}) \right\}, \quad \forall \mathbf{s}. \tag{9}$$

We note that, by setting $\alpha = 0$, the Bellman's equations above reduce to the dual solutions [3][5]
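to the conventional multi-user video transmission as shown in Eq. (6). The reduced Bellman's equations can be decomposed into multiple sub-equations, each corresponding to one user, by letting the user know the resource price. However, in general, these Bellman's equations cannot be decomposed into independent subproblems which can be autonomously solved by each user, since they are coupled through the resource price $\lambda_{\mathbf{s}}$, which varies with the state of the multi-user system. Hence, a centralized solution has to be deployed by the network coordinator, which requires all the users' information, as in the primary solution to MUP/SRC.

B. Dual solution with uniform resource price
In Subsection A, we assumed that, depending on the state of the multi-user system, a different resource price is determined. However, the drawback of this is that the Bellman's equations in Eq. (9) cannot be decomposed, thereby requiring a centralized solution. In this subsection, we consider a scenario where the same price (referred to as the uniform resource price) is imposed in all the states of the multi-user system, i.e. $\lambda_{\mathbf{s}} = \lambda,\ \forall \mathbf{s}$. Then, the dual function is given by

$$U^{\lambda}(\lambda) = \max_{y_t^i \in \mathcal{P}^i(s_t^i, x_t^i),\ x_t^i \ge 0,\ i = 1, \ldots, M,\ \forall t \ge 0}\ \sum_{s_0^1 \in S^1, \ldots, s_0^M \in S^M}\ \prod_{i=1}^{M} v^i(s_0^i)\ \mathbb{E}\left[ \sum_{t=0}^{\infty} \alpha^{t} \left( \sum_{i=1}^{M} \left( u^i(s_t^i, y_t^i, x_t^i) - \lambda x_t^i \right) + \lambda \right) \,\middle|\, \mathbf{s}_0 \right]. \tag{10}$$

By minimizing over the uniform resource price $\lambda$, we have the multi-user dual problem with uniform resource price (MUD/URP):

$$U^{\lambda,*} = \min_{\lambda \ge 0} U^{\lambda}(\lambda). \tag{MUD/URP}$$

Interestingly, by setting the uniform resource price, the dual problem MUD/URP is not dual to the primary problem in MUP/SRC. Instead, it is dual to the following problem:

$$\hat{U}^{*} = \max_{x_t^i,\, y_t^i,\ i = 1, \ldots, M,\ \forall t \ge 0}\ \sum_{s_0^1 \in S^1, \ldots, s_0^M \in S^M}\ \prod_{i=1}^{M} v^i(s_0^i)\ \mathbb{E}\left[ \sum_{t=0}^{\infty} \alpha^{t} \sum_{i=1}^{M} u^i(s_t^i, y_t^i, x_t^i) \,\middle|\, s_0^1, \ldots, s_0^M \right]$$
$$\text{s.t.}\quad y_t^i \in \mathcal{P}^i(s_t^i, x_t^i),\quad \sum_{t=0}^{\infty} \alpha^{t} \left( \sum_{i=1}^{M} x_t^i - 1 \right) \le 0. \tag{MUP/ARC}$$

We call this optimization the multi-user video transmission optimization with accumulated resource constraint (MUP/ARC). The duality between MUD/URP and MUP/ARC can be easily verified. Similar to Proposition 1, we can prove that the duality gap between MUD/URP and MUP/ARC is zero.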
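To make the claimed duality concrete, the following short derivation sketches how relaxing the single accumulated constraint of MUP/ARC with a scalar multiplier $\lambda \ge 0$ yields the per-slot terms of the dual function in Eq. (10). It assumes that the accumulated constraint is read in expectation over the initial-state distribution and the sample paths, which is how it enters Eq. (10); the symbol $L(\lambda)$ for the Lagrangian is introduced here only for illustration, and the weighting by $v^i(s_0^i)$ is absorbed into $\mathbb{E}[\cdot]$.

```latex
\begin{align*}
L(\lambda)
 &= \mathbb{E}\Big[\sum_{t=0}^{\infty}\alpha^{t}\sum_{i=1}^{M}
      u^{i}(s_t^{i},y_t^{i},x_t^{i})\Big]
    -\lambda\,\mathbb{E}\Big[\sum_{t=0}^{\infty}\alpha^{t}
      \Big(\sum_{i=1}^{M}x_t^{i}-1\Big)\Big] \\
 &= \mathbb{E}\Big[\sum_{t=0}^{\infty}\alpha^{t}\Big(\sum_{i=1}^{M}
      \big(u^{i}(s_t^{i},y_t^{i},x_t^{i})-\lambda x_t^{i}\big)
      +\lambda\Big)\Big] \\
 &= \mathbb{E}\Big[\sum_{t=0}^{\infty}\alpha^{t}\sum_{i=1}^{M}
      \big(u^{i}(s_t^{i},y_t^{i},x_t^{i})-\lambda x_t^{i}\big)\Big]
    +\frac{\lambda}{1-\alpha},
 \qquad \alpha\in[0,1).
\end{align*}
```

Maximizing the second line over the admissible scheduling and resource acquisition policies gives $U^{\lambda}(\lambda)$ in Eq. (10), and minimizing the result over $\lambda \ge 0$ gives MUD/URP. The last line shows that the price enters each user's term only through $-\lambda x_t^i$ plus a constant, which is why a uniform price leaves the users' problems separable.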

We further notice that the resource constraint in the primary problem MUP/SRC satisfies the following condition:

$$\left\{ (x_t^1, \ldots, x_t^M)_{t \ge 0} \;\middle|\; \sum_{i=1}^{M} x_t^i \le 1,\ \forall t \ge 0 \right\} \subseteq \left\{ (x_t^1, \ldots, x_t^M)_{t \ge 0} \;\middle|\; \sum_{t=0}^{\infty} \alpha^{t} \left( \sum_{i=1}^{M} x_t^i - 1 \right) \le 0 \right\}, \tag{11}$$

which means that the feasible resource allocations in MUP/SRC are a subset of the feasible resource allocations in MUP/ARC. Then, comparing to the dual solution with state-wise prices, we have the following proposition, which shows that $U^{\lambda,*}$ serves as an upper bound of the optimal value for the primary problem.

Proposition 2: $U^{\lambda,*} = \hat{U}^{*} \ge U^{*} = U^{\boldsymbol{\lambda},*}$.
Proof: This proof is based on the fact that there is no duality gap in the two primary-dual problems (i.e. MUP/SRC and MUD/PSRP, MUP/ARC and MUD/URP) and that the feasible resource allocations in MUP/SRC are a subset of the feasible allocations in MUP/ARC as shown in Eq. (11). The details of the proof are omitted due to the limited space.

The Bellman's equations corresponding to the dual function in Eq. (10) can be decomposed into local Bellman's equations, each corresponding to one user, as presented in the following theorem.

Theorem 2: Given $\lambda_{\mathbf{s}} = \lambda,\ \forall \mathbf{s}$, the optimization in Eq. (10) is given by

$$U^{\lambda}(\lambda) = \sum_{i=1}^{M} \sum_{s_0^i} v^i(s_0^i)\, U^i(s_0^i, \lambda), \tag{12}$$

with $U^i(s^i, \lambda)$ satisfying the following local Bellman's equation:

$$U^i(s^i, \lambda) = \max_{y^i \in \mathcal{P}^i(s^i, x^i),\ x^i \ge 0} \left[ u^i(s^i, y^i, x^i) - \lambda x^i + \frac{\lambda}{M} + \alpha \sum_{s'^i} p^i(s'^i \mid s^i, y^i, x^i)\, U^i(s'^i, \lambda) \right]. \tag{13}$$

Proof: The key idea is that, by introducing the uniform resource price, the utility functions and state transition probabilities of all wireless users are separable, which makes the reward-to-go functions separable. The details can be seen in Appendix A.

The key result of Theorem 2 is that $U(\mathbf{s}, \lambda)$ can be decomposed into local Bellman's equations, which can be solved in a distributed fashion. Each user can solve its own Bellman's equations (and accordingly solve its own cross-layer optimization problem) provided the resource price $\lambda$. These local
Bellman's equations correspond to the local MDP discussed in Section III. Next, we discuss how the resource price can be updated.

Given the resource price $\lambda$, each user can solve its own Bellman's equations using e.g. value iteration, which results in the optimal resource allocation $x^{i,*}(s^i, \lambda)$ and scheduling policy $y^{i,*}(s^i, \lambda)$. Note that the resource acquisition is independent of the other users' states given the uniform resource price. In the following proposition, we formally compute the subgradient with respect to the resource price $\lambda$ for the dual problem MUD/URP, which will be used to update the resource price in each iteration.

Proposition 3: The subgradient with respect to $\lambda$ is given by

$$\frac{1}{1 - \alpha} - \sum_{i=1}^{M} Z^i, \tag{14}$$

where $Z^i = \sum_{s_0^i} v^i(s_0^i)\, e_{s_0^i}^T (I - \alpha P^i)^{-1} x^i(\lambda)$ is the expected discounted accumulated resource consumption (note that the expectation is taken over all the possible sample paths), $P^i$ is the state transition probability matrix, and $e_{s^i}$ is the vector with the $s^i$-th component being 1 and the others being zero.

Proof: The key idea is to show that $\frac{1}{1-\alpha} - \sum_{i=1}^{M} Z^i$ satisfies the subgradient definition [26]. The details can be found in Appendix B.

Using the subgradient method, the resource price is then updated as follows:

$$\lambda^{k+1} = \left[ \lambda^k - \beta^k \left( \frac{1}{1 - \alpha} - \sum_{i=1}^{M} Z^i \right) \right]^+ \tag{15}$$

We notice that the subgradient computed in Eq. (14) accounts for not only the current resource constraint, but also the future constraints, since the MUMDP aims to maximize the long-term utility. The subgradient method shown in Eq. (15) converges to the optimal dual solution because the dual function, a pointwise maximum of functions affine in $\lambda$, is convex in $\lambda$.

The advantages of the decomposition developed above for multi-user video transmission are summarized as follows:

Given a uniform resource price, each wireless user can solve its own local MDP independently of the other users. This enables us to decompose the considered multi-user video transmission problem by

enabling each user to autonomously optimize its packet scheduling and resource acquisition. This decomposition allows the network coordinator to simply update the scalar resource price as shown in Eq. (15). Furthermore, the proposed approach only requires two scalar messages$^4$ (as shown in Figure 3) to be exchanged between the wireless users and the network coordinator. This significantly simplifies the design of the network coordinator (e.g. access points) and reduces the cost of building a wireless network to support video applications.

Unlike the dual solution to the NUM-based multi-user video transmission [15], in which the network coordinator has to find the optimal resource allocation to all the users before each user optimizes its packet scheduling, our approach allows the resource allocation and packet scheduling optimization to be performed simultaneously. This significantly reduces the incurred delay, since each user does not need to wait for the optimal resource allocation.

Previously, we enforced a uniform price for all the states, which enables a decomposition in the dual function computation and provides an upper bound on the utility function $U^*$ (recall that $U^{\lambda,*} \ge U^*$). However, the solution to the dual problem may be infeasible (i.e. it may violate the resource constraint in some time slots). We note that the optimal allocation can be obtained by solving a one-stage multi-user resource allocation problem (e.g. as in [20]) in each time slot. However, this again requires high computational and communication complexity. In order to avoid violating the resource constraint, i.e. when $\sum_{i=1}^{M} x^{i,*,\lambda}(s^i) > 1$, we proportionally scale down the resource allocation as follows:

$$\hat{x}^i(s) = \frac{x^{i,*,\lambda}(s^i)}{\sum_{j=1}^{M} x^{j,*,\lambda}(s^j)}, \tag{16}$$

so that the resource constraint is satisfied, i.e. $\sum_{i=1}^{M} \hat{x}^i(s) = 1$. This scaling can be performed by the network coordinator: at the beginning of each time slot, the users submit the required resource $x^{i,*,\lambda}(s^i)$ to the network coordinator, and the coordinator performs the resource allocation scaling if the resource constraint is violated. After the scaling, the network coordinator polls the users according to the scaled resource allocation [25].

$^4$ In protocol design, other messages such as hand-shaking messages are needed for successful transmission. However, we ignore this type of message since it is independent of our proposed algorithms.

VI. DISTRIBUTED MODIFIED ACTOR-CRITIC LEARNING

In Section V, we discussed how the MUMDP can be decomposed such that each user can autonomously determine its optimal transmission strategy using the uniform resource price. However, when implementing this solution in practice, we still face the following difficulties: (i) in the distributed solution, each user still has to solve its own local MDP problem for each updated resource price, which still leads to a very high computation complexity for each user; (ii) the channel state and incoming DU dynamics are often difficult to characterize a priori, such that the single-user MDP cannot be solved online; (iii) without being able to optimally solve the local MDP, the subgradient in Eq. (14) cannot be computed. To address these challenges, we aim in this section at developing an online learning algorithm. Specifically, we deploy a modified actor-critic learning algorithm [27] to solve the single-user MDP online and the stochastic subgradient method [26] to update the uniform resource price. The advantages of the online learning are: at each time slot, the wireless user only needs to perform limited computations (i.e. it incurs a low computation complexity); and the online learning does not require a priori knowledge of the channel state and incoming DU dynamics.

A. Updating state-value function and resource acquisition policy

To perform the actor-critic learning algorithm for multi-user video transmission, as shown in Figure 5, each wireless user needs to implement two components: the actor and the critic. The actor is endowed with a resource acquisition policy representation $\rho(s, x, \lambda)$, which indicates the tendency to select the resource acquisition action $x$ at the state $s$, i.e. the higher $\rho(s, x, \lambda)$ is, the larger the probability of selecting the action $x$ should be. The critic is endowed with the state-value function $U(s, \lambda)$, which is used to evaluate the resource acquisition policy updated by the actor: the higher $U(s, \lambda)$ is, the higher the long-term utility the policy will provide. To evaluate the policy, the critic keeps updating the state-value function, which is similar to the policy evaluation in the policy iteration algorithm [27].
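For reference, the offline procedure of Section V that the learning algorithm is meant to replace can be sketched as follows: each user solves its local Bellman's equation (Eq. (13)) by value iteration for a given price, the expected discounted consumption $Z^i$ is obtained from a linear solve, and the price moves along the subgradient of Eqs. (14) and (15). Everything concrete here is an assumption for illustration: the toy utilities and transition matrices, the three-point action grid, the omission of the scheduling action $y$, and all function names are hypothetical and not part of the paper.

```python
import numpy as np

ALPHA = 0.9                            # discount factor (illustrative value)
ACTIONS = np.array([0.0, 0.5, 1.0])    # hypothetical grid of resource requests x

def solve_local_mdp(u, P, lam, M, iters=200):
    """Value iteration on the local Bellman equation, Eq. (13), without scheduling."""
    U = np.zeros(u.shape[0])
    for _ in range(iters):
        # Q[s, a] = u[s, a] - lam * x_a + lam / M + ALPHA * sum_s' P[a, s, s'] * U[s']
        Q = u - lam * ACTIONS + lam / M + ALPHA * np.einsum("asn,n->sa", P, U)
        U = Q.max(axis=1)
    return U, Q.argmax(axis=1)

def discounted_consumption(P, policy, v0):
    """Z = v0^T (I - ALPHA * P_pi)^{-1} x_pi: expected discounted resource use."""
    n = len(policy)
    P_pi = P[policy, np.arange(n), :]   # row s follows the action chosen in state s
    x_pi = ACTIONS[policy]
    return float(v0 @ np.linalg.solve(np.eye(n) - ALPHA * P_pi, x_pi))

def price_step(users, lam, beta):
    """One projected subgradient update of the uniform price, Eqs. (14)-(15)."""
    M = len(users)
    Z = sum(discounted_consumption(P, solve_local_mdp(u, P, lam, M)[1], v0)
            for u, P, v0 in users)
    g = 1.0 / (1.0 - ALPHA) - Z         # discounted budget minus discounted consumption
    return max(0.0, lam - beta * g), g

def toy_user(rng, n_states=4):
    """Hypothetical user: utility grows with x; the state chain ignores the action."""
    gain = rng.uniform(1.0, 2.0, n_states)
    u = np.outer(gain, np.sqrt(ACTIONS))                       # shape (S, A)
    chain = rng.dirichlet(np.ones(n_states), size=n_states)    # shape (S, S)
    P = np.repeat(chain[None, :, :], len(ACTIONS), axis=0)     # shape (A, S, S)
    v0 = np.full(n_states, 1.0 / n_states)                     # initial distribution
    return u, P, v0

rng = np.random.default_rng(0)
users = [toy_user(rng) for _ in range(3)]
lam = 0.0
for k in range(50):
    lam, g = price_step(users, lam, beta=0.05 / (1 + k))
print(f"price after 50 iterations: {lam:.3f}, last subgradient: {g:.3f}")
```

Note that every price update in this sketch requires each user to re-solve its local MDP with known transition probabilities, which is exactly the cost and the knowledge requirement listed in difficulties (i) to (iii).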
At each time slot, the wireless user selects a resource acquisition action $x_t$ based on the following softmax method [27]:

$$\pi(s_t, x_t) = \frac{e^{\rho(s_t, x_t)}}{\sum_{x'} e^{\rho(s_t, x')}}. \tag{17}$$

This resource acquisition $x_t$ is then submitted to the network coordinator, which returns the true resource allocation $\hat{x}_t$ based on Eq. (16) (through polling [25]). With the scheduling policy $y_t(s_t, \hat{x}_t)$ presented in Subsection B, the wireless user receives the reward $u(s_t, y_t, \hat{x}_t) - \lambda \hat{x}_t$ and then moves to the next state $s_{t+1}$ by observing the channel condition $h_{t+1}$ and the incoming data of the DUs $j \in G_{t+1}/G_t$. The conventional actor-critic learning algorithm performs the following two operations:

state-value function update:

$$U(s_t, \lambda) \leftarrow U(s_t, \lambda) + \mu(G_t, h_t)\, \delta(s_t, \hat{x}_t, \lambda); \tag{18}$$

resource acquisition policy update:

$$\rho(s_t, x_t, \lambda) \leftarrow \rho(s_t, x_t, \lambda) + \nu(G_t, h_t, x_t)\, \delta(s_t, \hat{x}_t, \lambda), \tag{19}$$

where $\mu(G_t, h_t)$ and $\nu(G_t, h_t, x_t)$ are diminishing step-sizes and $\delta(s_t, \hat{x}_t, \lambda)$ is the time-difference error, which is computed as follows:

$$\delta(s_t, \hat{x}_t, \lambda) = u(s_t, y_t, \hat{x}_t) - \lambda \hat{x}_t + \alpha U(s_{t+1}, \lambda) - U(s_t, \lambda).$$

In this paper, we separately update the scheduling policy and the resource acquisition policy, which allows us to update multiple states in one time slot instead of one state per time slot as performed in conventional online learning [27]. The details are discussed below.

We first define the associated states, which are the states having the same dependency pattern and sharing the same channel condition as the current state. These states are denoted as $\bar{s}(s_t) \in \{(T, h_t) \mid T = (G_t, B), \forall B\}$. Given the resource allocation $\hat{x}_t$ and the current channel condition $h_t$, as discussed in Subsection B, the scheduling policy can be computed independently of the next channel condition $h_{t+1}$ and the incoming DUs $\{j \in G_{t+1}/G_t\}$. Assuming that the wireless user is now in the associated state $\bar{s}(s_t)$ instead of $s_t$, we are again able to compute the scheduling policy $\bar{y}_t(\bar{s}(s_t), \hat{x}_t)$ and virtually transmit the packets (i.e. not actually transmit them), which results in the virtual reward $u(\bar{s}(s_t), \bar{y}_t, \hat{x}_t) - \lambda \hat{x}_t$.
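One learning step as described above (softmax selection in Eq. (17), coordinator scaling in Eq. (16), and the actor and critic updates in Eqs. (18) and (19)) can be sketched as follows. This is a minimal tabular illustration under stated assumptions: states and actions are small integer indices, the step-sizes `mu` and `nu` are fixed constants rather than the diminishing, state-dependent step-sizes of the paper, and all function names are hypothetical.

```python
import numpy as np

def softmax_policy(rho_row):
    """Eq. (17): selection probabilities from the actor's preferences in one state."""
    z = np.exp(rho_row - rho_row.max())   # shifting by the max avoids overflow
    return z / z.sum()

def scale_requests(requests, budget=1.0):
    """Eq. (16): proportional scale-down when the total request exceeds the budget."""
    total = requests.sum()
    return requests * (budget / total) if total > budget else requests.copy()

def actor_critic_step(U, rho, s, a, reward, s_next, alpha, mu, nu):
    """Eqs. (18)-(19) for one state; reward stands for u(s, y, x_hat) - lam * x_hat."""
    delta = reward + alpha * U[s_next] - U[s]   # time-difference error
    U[s] += mu * delta                          # critic update, Eq. (18)
    rho[s, a] += nu * delta                     # actor update, Eq. (19)
    return delta

rng = np.random.default_rng(1)
U = np.zeros(3)                # state-value estimates for 3 toy states
rho = np.zeros((3, 4))         # preferences over 4 toy resource-acquisition actions

probs = softmax_policy(rho[0])                       # uniform, since all preferences are equal
a = rng.choice(4, p=probs)                           # sampled action index in state 0
x_hat = scale_requests(np.array([0.6, 0.5, 0.4]))    # coordinator scales three users' requests
delta = actor_critic_step(U, rho, s=0, a=a, reward=1.0, s_next=1,
                          alpha=0.9, mu=0.1, nu=0.1)
```

Under the modified scheme, the same `actor_critic_step` call would simply be repeated for every associated state, each with its own virtual reward and virtual next state, so that several table entries are refreshed within one time slot.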

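Besides being able to update the state-value function and resource acquisition policy in the current state $s_t$, we are also able to update them in the associated state $\bar{s}(s_t)$ using Eqs. (18) and (19), with the time-difference error computed as:

$$\delta(\bar{s}(s_t), \hat{x}_t, \lambda) = u(\bar{s}(s_t), \bar{y}_t, \hat{x}_t) - \lambda \hat{x}_t + \alpha U(\bar{s}'(s_t), \lambda) - U(\bar{s}(s_t), \lambda),$$

where $\bar{s}'(s_t)$ is the next state transiting from the associated state $\bar{s}(s_t)$. Updating the state-value function and resource acquisition policy in all the associated states (including the current state) can significantly increase the learning performance (i.e. the received distortion reduction increases faster) as compared to the standard online learning algorithm. In Section VII.C, we verify this through simulation.

B. Greedy scheduling policy

From the Bellman's equations shown in Eq. (4), we note that the optimal scheduling policy at each time slot can be computed as:

$$y_t^*(s_t, \lambda) = \arg\max_{y_t \in \mathcal{P}(s_t, x_t)} \left[ u(s_t, y_t, x_t) - \lambda x_t + \alpha \bar{U}(\tilde{s}_t, \lambda) \right], \tag{20}$$

where $\tilde{s}_t$ is the post-decision state defined as $\tilde{s}_t = \left( (b_j - y_j \mid j \in G_t \cap G_{t+1}),\, h_t \right)$, which keeps the DUs that have not yet expired in $s_t$. $\bar{U}(\tilde{s}_t, \lambda)$ is the average state-value function, which is computed as

$$\bar{U}(\tilde{s}_t, \lambda) = \sum_{h_{t+1} \in H} \; \sum_{b_j,\, j \in G_{t+1}/G_t} p_h(h_{t+1} \mid h_t) \prod_{j \in G_{t+1}/G_t} p(b_j)\; U((T_{t+1}, h_{t+1}), \lambda), \tag{21}$$

where $T_{t+1} = [(b_j - y_j \mid j \in G_t \cap G_{t+1}),\, (b_j \mid j \in G_{t+1}/G_t)]$ is the next traffic state transiting from the post-decision state $\tilde{s}_t$. It is clear that $\bar{U}(\tilde{s}_t, \lambda)$ does not depend on the realization of the channel state transition and the incoming new DUs. Hence, given $\bar{U}(\tilde{s}_t, \lambda)$, $y_t^*(s_t, \lambda)$ can be computed independently of the next channel condition and the incoming new DUs. However, since we do not know the distributions of the channel state transition and the incoming DUs, we cannot directly compute $\bar{U}(\tilde{s}_t, \lambda)$ as in Eq. (21). Instead, we update $\bar{U}(\tilde{s}_t, \lambda)$ as follows:

$$\bar{U}(\tilde{s}_t, \lambda) \leftarrow (1 - \phi(\tilde{s}_t))\, \bar{U}(\tilde{s}_t, \lambda) + \phi(\tilde{s}_t)\, U(s_{t+1}, \lambda), \tag{22}$$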

where $\phi(\tilde{s}_t)$ is a diminishing step-size. The key idea of the update is that, instead of computing $\bar{U}(\tilde{s}_t, \lambda)$ as in Eq. (21), we use one realization $U(s_{t+1}, \lambda)$ to represent $\bar{U}(\tilde{s}_t, \lambda)$ and update $\bar{U}(\tilde{s}_t, \lambda)$ by averaging all the past realizations. The learning procedure is further illustrated in Figure 5.

C. Stochastic subgradient-based resource price update

From Section V.B, we notice that the subgradient of the dual problem with uniform price is computed as in Eq. (14), which involves the expected discounted accumulated resource consumption. Since each wireless user does not know the transition probability, we only use the realized sample path to estimate the subgradient of the dual problem (i.e. we use the stochastic subgradient). Specifically, we update the Lagrangian multiplier as follows:

$$\lambda^{k+1} = \left[ \lambda^k - \kappa_k \sum_{t=0}^{\infty} \alpha^t \left( 1 - \sum_{i=1}^{M} x_t^i \right) \right]^+, \tag{23}$$

where $\sum_{t=0}^{\infty} \alpha^t \left( 1 - \sum_{i=1}^{M} x_t^i \right)$ is the stochastic subgradient approximating the subgradient $\frac{1}{1-\alpha} - \sum_{i=1}^{M} Z^i$ and $\kappa_k$ is a diminishing step-size. However, in practice, we cannot wait for an infinite time to update the Lagrangian multiplier. Instead, we update the multiplier every $K$ time slots, i.e. we use $\sum_{t=kK}^{(k+1)K-1} \alpha^{t-kK} \left( 1 - \sum_{i=1}^{M} x_t^i \right)$ instead of $\sum_{t=0}^{\infty} \alpha^t \left( 1 - \sum_{i=1}^{M} x_t^i \right)$. The proposed online learning algorithm is illustrated in Figure 6.

VII. SIMULATION RESULTS

In this section, we present simulation results highlighting the efficiency of the proposed single-user and multi-user video transmission solutions compared to existing solutions.

A. Single-user video transmission

To compress the video data, we used a scalable video coding scheme [23], which is attractive for wireless streaming applications because it provides on-the-fly application adaptation to channel conditions, support for a variety of wireless receivers with different resource capabilities and power constraints, and easy prioritization of various coding layers and video packets. In this section, we consider one user transmitting the video sequence "Coastguard" (CIF resolution, 30 Hz) over the time-varying

wireless channel, which is modelled as an FSMC with eight channel states (10 dB, 15 dB, 18 dB, 20 dB, 23 dB, 25 dB, 28 dB and 30 dB, respectively). We compare our proposed approach using the foresighted scheduling and resource acquisition policies to a state-of-the-art "distortion-impact"-based packet scheduling solution, which only maximizes the current video quality of the frames within the STW; we refer to this solution as the conventional priority-based solution [2][3][4]. Figure 7 shows the average distortion reduction (computed as in Eq. (2)) experienced by the user under various resource constraints (i.e. transmission time allocated to the user). From this figure, we notice that, to achieve the same distortion reduction, our proposed solution can save 5%~20% of the resources. The improvement is due to the fact that our solution considers the impact of the environmental dynamics and foresightedly schedules the video data for transmission.

We further note that the difference between our proposed solution and conventional priority-based solutions becomes smaller when the network resource is either very scarce or plentiful. This can be explained as follows. On one hand, when the network resource is very scarce, our proposed solution only schedules the most important data (e.g. low-frequency frames in the wavelet-based encoded data), which is the same as the "distortion-impact"-based policy. On the other hand, when the network resource is plentiful, both policies can transmit all the video data. However, in most usage scenarios, when the resources are neither plentiful nor very scarce (in which case the users may not transmit anyway), Figure 8 shows that our approach improves the video quality by 1.5 dB in terms of Peak Signal-to-Noise Ratio (PSNR).

B. Dual solutions with uniform price

In this section, we verify the convergence of the dual solution with uniform price to the proposed MUMDP. We further compare the performance of our approach to that of the conventional multi-user dual solution. We first consider three wireless users: User 1 streams the video sequence "Foreman" (CIF resolution, 30 Hz), User 2 streams the video sequence "Coastguard" (CIF resolution, 30 Hz) and User 3 streams the video sequence "Mobile" (CIF resolution, 30 Hz).
We compare our proposed dual solution with uniform price to the conventional dual solution [15] based on the NUM framework. Figure 9 shows the convergence of the resource prices with various initial price selections. We notice that our proposed dual solution with uniform price converges much faster (in fewer than 25 iterations) than the conventional dual solution (which needs more than 100 iterations). We also note that our solution converges to a lower resource price than the conventional one. This is because the conventional solution myopically maximizes the video transmission over each time slot. Hence, to achieve a feasible resource allocation, it has to increase its resource price to ensure that the resource allocations over all the states are feasible (corresponding to the worst-case scenario). In our solution, in contrast, we relax the stage resource constraints into the accumulated resource constraint (shown in the problem MUP/ARC), and the resource price is reduced. We then scale down the resource acquisition at each multi-user system state to enforce a feasible allocation.

Figure 10 shows the distortion reduction of each user with various initial price selections. It demonstrates that our proposed solution achieves a higher distortion reduction, which is further verified by the received PSNR shown in Figure 11 (i.e. User 1 receives 0.5 dB higher PSNR, User 2 receives 1 dB higher PSNR and User 3 receives 1.1 dB higher PSNR). The improvement is due to the foresighted decisions in our solution, as compared to the myopic decisions in the conventional NUM-based one.

C. Online learning

In this section, we verify the convergence rate of our proposed online learning algorithm and its corresponding impact on the video transmission. We also compare our algorithm to the conventional online learning algorithm [27], which is often used to improve wireless transmission strategies with unknown dynamics [37]. We consider three wireless users streaming video sequences as in Section VII.B. Different from the settings in Subsection VII.B, we assume that all the users initially do not have any statistical information about the channel conditions and incoming data, and thereby do not know the state transitions. Using the proposed online learning, the wireless users keep improving their own resource acquisition policies. The resource price is updated every 100 time slots. Figure 12 shows the average distortion reduction received by each user when deploying the proposed online learning and the standard online learning, respectively.
From this figure, we notice that, compared to the conventional learning algorithm, our proposed method significantly improves the learning curve (i.e. it significantly increases the average received distortion reduction) and dramatically improves the received distortion reduction over time. Figure 13 shows the received video quality (in terms of PSNR) of each user over time when using these two learning algorithms. This figure further confirms that our proposed learning algorithm can improve the video quality of all the users over time. On average, our proposed algorithm improves the video quality of User 1 by 0.9 dB, User 2 by 1.2 dB and User 3 by 1.4 dB in terms of PSNR. This improvement is due to the fact that our proposed approach can update the policy at multiple states during one time slot and hence exhibits a fast convergence rate.

VIII. CONCLUSIONS

Unlike the conventional formulations for video transmission over time-varying wireless networks, we systematically formulate the dynamic multi-user video transmission as an MUMDP problem to explicitly consider the heterogeneous video data and dynamic wireless network conditions. This MDP formulation allows the wireless users to make foresighted decisions in order to maximize the long-term utility (i.e. video quality) instead of the immediate reward, which is essential for video applications. The proposed distributed dynamic optimization approach using Lagrangian relaxation with a uniform resource price allows each wireless user to maximize its own video quality given the resource price. To deal with the unknown video characteristics and channel conditions, and to reduce the computation complexity for each user, a novel online reinforcement learning algorithm has been developed, which allows the wireless users to update their transmission policy in multiple states during one time slot, thereby significantly accelerating the learning speed and improving the received video quality.

Appendix

A. Proof of multi-user Bellman's equation using uniform price

Proof: We prove this by induction. We define

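$$U_0(s, \lambda) = \max_{y \in Y,\, x \ge 0} \left\{ \sum_{i=1}^{M} \left( u^i(s^i, y^i, x^i) - \lambda x^i \right) + \lambda \right\} = \sum_{i=1}^{M} \max_{y^i \in Y^i,\, x^i \ge 0} \left\{ u^i(s^i, y^i, x^i) - \lambda x^i + \frac{\lambda}{M} \right\} = \sum_{i=1}^{M} U_0^i(s^i, \lambda),$$

with

$$U_0^i(s^i, \lambda) = \max_{y^i \in Y^i,\, x^i \ge 0} \left\{ u^i(s^i, y^i, x^i) - \lambda x^i + \frac{\lambda}{M} \right\}.$$

Similarly,

$$U_1(s, \lambda) = \max_{y \in Y,\, x \ge 0} \left\{ \sum_{i=1}^{M} \left( u^i(s^i, y^i, x^i) - \lambda x^i \right) + \lambda + \alpha \sum_{s'} \prod_{i=1}^{M} p(s'^i \mid s^i, y^i, x^i)\, U_0(s', \lambda) \right\} = \sum_{i=1}^{M} U_1^i(s^i, \lambda),$$

with

$$U_1^i(s^i, \lambda) = \max_{y^i \in Y^i,\, x^i \ge 0} \left\{ u^i(s^i, y^i, x^i) - \lambda x^i + \frac{\lambda}{M} + \alpha \sum_{s'^i} p(s'^i \mid s^i, y^i, x^i)\, U_0^i(s'^i, \lambda) \right\}.$$

Recursively, we have $U_n(s, \lambda) = \sum_{i=1}^{M} U_n^i(s^i, \lambda)$ with

$$U_n^i(s^i, \lambda) = \max_{y^i \in Y^i,\, x^i \ge 0} \left\{ u^i(s^i, y^i, x^i) - \lambda x^i + \frac{\lambda}{M} + \alpha \sum_{s'^i} p(s'^i \mid s^i, y^i, x^i)\, U_{n-1}^i(s'^i, \lambda) \right\}.$$

Hence,

$$U(s, \lambda) = \lim_{n \to \infty} U_n(s, \lambda) = \lim_{n \to \infty} \sum_{i=1}^{M} U_n^i(s^i, \lambda) = \sum_{i=1}^{M} \lim_{n \to \infty} U_n^i(s^i, \lambda) = \sum_{i=1}^{M} U^i(s^i, \lambda),$$

where

$$U^i(s^i, \lambda) = \max_{y^i, x^i} \left\{ u^i(s^i, y^i, x^i) - \lambda x^i + \frac{\lambda}{M} + \alpha \sum_{s'^i} p(s'^i \mid s^i, y^i, x^i)\, U^i(s'^i, \lambda) \right\}.$$

B. Proof of subgradient for uniform price

Proof: For each given $\lambda$, suppose that $x^{i,*}(s^i, \lambda)$ and $y^{i,*}(s^i, \lambda)$, $i = 1, \dots, M$, maximize the dual Bellman's equations in Eq. (13) and hence maximize the objective in Eq. (10). Then, for any $\lambda'$, we have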

$$\begin{aligned} U^{\lambda',*}(s) &= \max_{y, x} \left\{ \sum_{i=1}^{M} \left( u^i(s^i, y^i, x^i) - \lambda' x^i \right) + \lambda' + \alpha \sum_{s'} p(s' \mid s, y, x)\, U^{\lambda',*}(s') \right\} \\ &\ge \sum_{i=1}^{M} \left( u^i(s^i, y^{i,*}(s^i, \lambda), x^{i,*}(s^i, \lambda)) - \lambda' x^{i,*}(s^i, \lambda) \right) + \lambda' + \alpha \sum_{s'} p(s' \mid s, y^*, x^*)\, U^{\lambda',*}(s') \\ &= \sum_{i=1}^{M} \left( u^i(s^i, y^{i,*}(s^i, \lambda), x^{i,*}(s^i, \lambda)) - \lambda x^{i,*}(s^i, \lambda) \right) + \lambda + (\lambda' - \lambda) \left( 1 - \sum_{i=1}^{M} x^{i,*}(s^i, \lambda) \right) + \alpha \sum_{s'} p(s' \mid s, y^*, x^*)\, U^{\lambda',*}(s'). \end{aligned}$$

Recursively applying this inequality to $U^{\lambda',*}(s')$, we further have

$$\begin{aligned} U^{\lambda',*}(s) \ge{} & \sum_{i=1}^{M} \left( u^i(s^i, y^{i,*}, x^{i,*}) - \lambda x^{i,*}(s^i, \lambda) \right) + \lambda + \alpha \sum_{s'} p(s' \mid s, y^*, x^*) \left[ \sum_{i=1}^{M} \left( u^i(s'^i, y^{i,*}, x^{i,*}) - \lambda x^{i,*}(s'^i, \lambda) \right) + \lambda \right] \\ & + (\lambda' - \lambda) \left[ \left( 1 - \sum_{i=1}^{M} x^{i,*}(s^i, \lambda) \right) + \alpha \sum_{s'} p(s' \mid s, y^*, x^*) \left( 1 - \sum_{i=1}^{M} x^{i,*}(s'^i, \lambda) \right) \right] \\ & + \alpha^2 \sum_{s'} \sum_{s''} p(s' \mid s, y^*, x^*)\, p(s'' \mid s', y^*, x^*)\, U^{\lambda',*}(s''). \end{aligned}$$

Finally, we have

$$\begin{aligned} U^{\lambda',*}(s) &\ge U^{\lambda,*}(s) + (\lambda' - \lambda) \sum_{t=0}^{\infty} \alpha^t \sum_{s'} [P^t]_{s s'} \left( 1 - \sum_{i=1}^{M} x^{i,*}(s'^i, \lambda) \right) \\ &= U^{\lambda,*}(s) + (\lambda' - \lambda) \left( \frac{1}{1 - \alpha} - \sum_{i=1}^{M} e_{s^i}^T (I - \alpha P^i)^{-1} x^i(\lambda) \right), \end{aligned}$$

where $P = \left[ p(s' \mid s, y^*(s, \lambda), x^*(s, \lambda)) \right]_{s \in S,\, s' \in S}$, $P^0 = I$, and $e_{s^i}$ is a vector with the $s^i$-th component being 1 and the others being zero.

Hence, the subgradient with respect to $\lambda$ is given by $\frac{1}{1 - \alpha} - \sum_{i=1}^{M} \sum_{s_0^i} v^i(s_0^i)\, e_{s_0^i}^T (I - \alpha P^i)^{-1} x^i(\lambda)$.

REFERENCES

[1] M. van der Schaar and S. Shankar, "Cross-layer wireless multimedia transmission: challenges, principles, and new paradigms," IEEE Wireless Commun. Mag., vol. 12, no. 4, Aug. 2005.
[2] P. Chou and Z. Miao, "Rate-distortion optimized streaming of packetized media," IEEE Trans. Multimedia, vol. 8, no. 2, pp. 390-404, 2005.
[3] M. van der Schaar and D. Turaga, "Cross-Layer Packetization and Retransmission Strategies for Delay-Sensitive Wireless Multimedia Transmission," IEEE Transactions on Multimedia, vol. 9, no. 1, pp. 185-197, Jan. 2007.
[4] C. D. Vleeschouwer and P. Frossard, "Dependent packet transmission policies in rate-distortion optimized media scheduling," IEEE Transactions on Multimedia, vol. 9, no. 6, pp. 1241-1258, Oct. 2007.
[5] Z. Li, F. Zhai, and A. K. Katsaggelos, "Joint Video Summarization and Transmission Adaptation for Energy-Efficient Wireless Video Streaming," EURASIP Journal on Advances in Signal Processing, special issue on Wireless Video, vol. 2008.
[6] Y. J. Liang and B. Girod, "Network-Adaptive Low-Latency Video Communication over Best-Effort Networks," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, pp. 72-81, Jan. 2006.
[7] R. Hamzaoui, V. Stankovic, and Z. Xiong, "Optimized error protection of scalable image bitstreams," IEEE Signal Processing Magazine, vol. 22, pp. 91-107, Nov. 2005.
[8] B. Lamparter, A. Albanese, M. Kalfane, and M. Luby, "PET-priority encoding transmission: a new, robust and efficient video broadcast technology," Proc. of ACM Multimedia, 1995.
[9] R. Berry and R. G. Gallager, "Communications over fading channels with delay constraints," IEEE Trans. Inf. Theory, vol. 48, no. 5, pp. 1135-1149, May 2002.
[10] Q. Liu, S. Zhou, and G. B. Giannakis, "Cross-layer combining of adaptive modulation and coding with truncated ARQ over wireless links," IEEE Trans. Wireless Commun., vol. 4, no. 3, May 2005.
[11] W. Chen, U. Mitra, and M. J. Neely, "Energy-Efficient Scheduling with Individual Packet Delay Constraints over a Fading Channel," Wireless Networks, DOI 10.1007/s11276-007-0093-y.
[12] T. Holliday, A. Goldsmith, and P. Glynn, "Optimal Power Control and Source-Channel Coding for Delay Constrained Traffic over Wireless Channels," Proceedings of IEEE International Conference on Communications, vol. 2, pp. 831-835, May 2002.
[13] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, "Layering as optimization decomposition: A mathematical theory of network architectures," Proceedings of IEEE, vol. 95, no. 1, 2007.
[14] X. Zhu, P. Agrawal, J. P. Singh, T. Alpcan, and B. Girod, "Rate Allocation for Multi-User Video Streaming over Heterogenous Access Networks," Proc. ACM Multimedia, MM'07, Augsburg, Germany, Sept. 2007.
[15] J. Huang, Z. Li, M. Chiang, and A. K. Katsaggelos, "Joint Source Adaptation and Resource Allocation for Multi-User Wireless Video Streaming," IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 5, pp. 582-595, May 2008.
[16] E. Maani, P. Pahalawatta, R. Berry, T. N. Pappas, and A. K. Katsaggelos, "Resource Allocation for Downlink Multiuser Video Transmission over Wireless Lossy Networks," IEEE Transactions on Image Processing, vol. 17, no. 9, pp. 1663-1671, Sept. 2008.
[17] G.-M. Su, Z. Han, M. Wu, and K. J. R. Liu, "Joint Uplink and Downlink Optimization for Real-Time Multiuser Video Streaming Over WLANs," IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 2, pp. 280-294, Aug. 2007.
[18] F. Fu and M. van der Schaar, "Noncollaborative Resource Management for Wireless Multimedia Applications Using Mechanism Design," IEEE Trans. Multimedia, vol. 9, no. 4, pp. 851-868, Jun. 2007.
[19] D. P. Bertsekas, Dynamic programming and optimal control, 3rd ed., Athena Scientific, Belmont, Massachusetts, 2005.
[20] J. Hawkins, "A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications," PhD Dissertation, MIT, Cambridge, MA, 2003.
[21] D. Adelman and A. J. Mersereau, "Relaxation of weakly coupled stochastic dynamic programs," Operations Research, vol. 56, no. 3, pp. 712-727, May-June 2008.
[22] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[23] J. R. Ohm, "Three-dimensional subband coding with motion compensation," IEEE Trans. Image Processing, vol. 3, no. 5, Sept. 1994.
[24] Q. Zhang and S. A. Kassam, "Finite-state Markov Model for Rayleigh fading channels," IEEE Trans. Commun., vol. 47, no. 11, Nov. 1999.
[25] IEEE 802.11e/D5.0, wireless medium access control (MAC) and physical layer (PHY) specifications: Medium access control (MAC) enhancements for Quality of Service (QoS), draft supplement, June 2003.
[26] D. P. Bertsekas, Nonlinear programming, Belmont, MA: Athena Scientific, 2nd Edition, 1999.
[27] R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, Cambridge, MA: MIT Press, 1998.
[28] D. S. Turaga and T. Chen, "Hierarchical Modeling of Variable Bit Rate Video Sources," Packet Video, 2001.
[29] Q. Li, Y. Andreopoulos, and M. van der Schaar, "Streaming-Viability Analysis and Packet Scheduling for Video over QoS-enabled Networks," IEEE Trans. Veh. Technol., vol. 56, no. 6, pp. 3533-3549, Nov. 2007.
[30] D. Djonin and V. Krishnamurthy, "Transmission Control in Fading Channels: A Constrained Markov Decision Process Formulation with Monotone Randomized Policies," IEEE Transactions on Signal Processing, vol. 55, no. 10, pp. 5069-5083, Oct. 2007.
[31] D. Djonin and V. Krishnamurthy, "Q-Learning Algorithms for Constrained Markov Decision Processes with Randomized Monotone Policies: Applications to MIMO Transmission Control," IEEE Transactions on Signal Processing, vol. 55, no. 5, pp. 2170-2181, 2007.
[32] D. M. Topkis, Supermodularity and complementarity, Princeton University Press, Princeton, NJ, 1998.
[33] A. Reibman and A. Berger, "Traffic descriptors for VBR video teleconferencing over ATM networks," IEEE/ACM Transactions on Networking, vol. 3, no. 3, pp. 329-339, 1995.
[34] V. Borkar and V. Konda, "The actor-critic algorithm as multi-time-scale stochastic approximation," Sadhana, vol. 22, part 4, pp. 525-543, Aug. 1997.
[35] F. Fu and M. van der Schaar, "A New Systematic Framework for Autonomous Cross-Layer Optimization," IEEE Trans. Veh. Tech., to appear.
[36] D. Bertsekas and R. Gallager, Data networks, Prentice Hall, Inc., Upper Saddle River, NJ, 1987.
[37] C. Pandana and K. J. R. Liu, "Near-optimal reinforcement learning framework for energy-aware sensor communications," IEEE J. Select. Areas Commun., vol. 23, no. 4, April 2005.

Figures and Tables

Figure 1. DAG-based dependencies and traffic states at each time slot using IBPBP GOP structure
Figure 2. Relationship between the various proposed solutions for the considered multi-user MDP problem
Figure 3. Message exchange in the dual method with uniform price for the considered multi-user wireless video transmission
Figure 4. Associated state transition in the online learning given the scheduling and resource allocation
Figure 5. Procedures of the modified Actor-Critic learning within one user
Figure 6. Flowchart of the online learning algorithm
Figure 7. Distortion reduction vs. consumed resource using various approaches
Figure 8. Received PSNR using various approaches
Figure 9. Convergence of the dual solutions under various initial resource price selection
Figure 10. Distortion reduction of each user with the dual solutions under various initial resource price selection
Figure 11. PSNR of streamed video sequences by each user with various solutions
Figure 12. Learning curves of each user with different online learning algorithms
Figure 13. Received PSNR of each user with different online algorithms