Foresighted Resource Reciprocation Strategies in P2P Networks


Hyunggon Park and Mihaela van der Schaar
Electrical Engineering Department, University of California, Los Angeles (UCLA)
Email: {hgpark, mihaela}@ee.ucla.edu

Abstract: We consider peer-to-peer (P2P) networks, where multiple peers are interested in sharing content. While sharing resources, autonomous and self-interested peers need to make decisions on the amount of their resource reciprocation (i.e., representing their actions) such that their individual rewards are maximized. We model the resource reciprocation among the peers as a stochastic game and show how the peers can determine their optimal strategies for the actions using a Markov Decision Process (MDP) framework. The optimal strategies determined based on the MDP enable the peers to make foresighted decisions about resource reciprocation, such that they can explicitly consider both their immediate and their future expected rewards. To successfully formulate the MDP framework, we propose a novel algorithm that efficiently identifies the state transition probabilities using representative resource reciprocation models of peers. Simulation results show that the proposed approach based on the reciprocation models can effectively cope with a dynamically changing environment of P2P networks. Moreover, we show that the foresighted decisions lead to the best performance in terms of the cumulative expected rewards.

I. INTRODUCTION

P2P applications (e.g., [1], [2]) have become increasingly popular, and P2P networks provide a cost-effective and easily deployable framework for disseminating large files without relying on a centralized infrastructure [3]. Hence, it has recently been proposed to use P2P networks for general file sharing [2], [4] or multimedia streaming [3], [5], [6]. In this paper, we consider data-driven P2P systems such as CoolStreaming [5], Chainsaw [6], or BitTorrent systems [4], which adopt pull-based techniques [5], [6]. While this approach has been successfully deployed in real-time multimedia streaming and file distribution over P2P networks, key challenges such as resource reciprocation among autonomous and self-interested peers still remain open.

The resource reciprocation strategy deployed in BitTorrent (i.e., the Tit-for-Tat (TFT) strategy) distributes the bandwidth equally: a peer equally divides its available upload bandwidth among multiple leechers [4]. The resource reciprocation in [5] is based on a heuristic scheduling algorithm, which enables the peers to determine the suppliers of required chunks and select the peers with the highest bandwidth. Alternatively, the resource reciprocation can be based on random chunk selection [6]. Since the solutions in [5] and [6] are implemented assuming that the associated peers are altruistic, the resource reciprocation strategies based on these solutions do not consider the interactions of autonomous, self-interested, and heterogeneous peers.

We model the resource reciprocation among the interested peers as a stochastic game, where peers decide their resource distributions by explicitly considering the probabilistic behaviors (reciprocation) of their associated peers. Unlike existing resource reciprocation strategies, which focus on myopic decisions, we formalize the resource reciprocation game as a Markov Decision Process (MDP) [7] to enable peers to make foresighted decisions on their resource distribution in a way that maximizes their cumulative rewards, i.e., their immediate as well as future rewards.

To successfully formulate the resource reciprocation game as an MDP problem, the peers need to identify the associated peers' probabilistic behaviors for resource reciprocation. The probabilistic behaviors of the associated peers can be estimated using the past history of resource reciprocation and are represented by state transition probabilities in the MDP framework. In this paper, the state of a peer is defined as the set of received resources from each of the associated peers. We propose a novel algorithm that can efficiently identify the state transition probabilities using peers' reciprocation models. The reciprocation models of the peers are motivated by [8], which classifies the attitudes of players in a game toward their strategies as optimistic, pessimistic, and neutral archetypes. We show that this approach for resource reciprocation based on the reciprocation models provides faster convergence. Hence, this approach enables peers to efficiently capture changes in the state transition probabilities, resulting in an efficient solution for P2P networks.
This paper is organized as follows. In Section II, we model the resource reciprocation among peers as a resource reciprocation game. In Section III, the types of peers in the considered P2P networks are discussed. In Section IV, an algorithm that determines the state transition probabilities based on the reciprocation models is proposed. Simulation results are provided in Section V, and conclusions are drawn in Section VI.

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2008 proceedings.

II. A NEW FRAMEWORK FOR RESOURCE RECIPROCATION

In this section, we model the resource reciprocation among the peers as a resource reciprocation game. Resource reciprocation games in P2P networks are played by the peers that are interested in each other. A resource reciprocation game is played in a group, where a group consists of a peer and its associated peers; this is similar to concepts such as the swarm in [4], partnership in [5], or neighbors in [6]. We denote the associated peers in a group of a peer $i$ by $C_i$, and an illustrative example of $C_i$ is depicted in Fig. 1. As shown in Fig. 1, $C_i$ does not include peer $i$. The peers in $C_i$ are indexed by $1, \dots, N_{C_i}$, i.e., $C_i = \{1, \dots, N_{C_i}\}$. Note that peer $k \in C_i$ also has its own group $C_k$, which includes peer $i$.

[Fig. 1. An illustrative example for $C_i$ with 5 matched peers.]

The resource reciprocation game in $C_i$ is a stochastic game [7]. To play the resource reciprocation game, a peer can deploy an MDP. An MDP of a peer $i$ is a tuple $\langle S_i, A_i, P_i, R_i \rangle$, where $S_i$ is the state space, $A_i$ is the action space, $P_i : S_i \times A_i \times S_i \to [0, 1]$ is a state transition probability function, and $R_i : S_i \to \mathbb{R}$ is a reward function. The details are explained as follows.

1) State Space $S_i$: A state of peer $i$ represents the set of received resources from the peers in $C_i$, and the state space of peer $i$ can be expressed as

$$S_i = \left\{ s_i \;\middle|\; s_{ik} = \psi_{ik}(x_{ki}) \in \Delta_{ik} = \{\Delta_{ik}^{1}, \dots, \Delta_{ik}^{n_{ik}}\},\ k \in C_i \right\},$$

where $s_i = (s_{i1}, \dots, s_{iN_{C_i}})$ and $x_{ki}$ denotes the resources (i.e., rate) provided by peer $k \in C_i$, which are limited by its maximum upload bandwidth $L_k$. $s_{ik}$ is determined by a peer-dependent function $\psi_{ik} : \mathbb{R} \to \Delta_{ik} = \{\Delta_{ik}^{1}, \dots, \Delta_{ik}^{n_{ik}}\}$ that maps $x_{ki}$ into one of $n_{ik}$ discrete values (see Footnote 1), which are called state descriptions in this paper.

Footnote 1: A continuous value of $x_{ki}$ can be discretized by peer $i$ based on its quantization policy, as the bandwidth of each peer can be decomposed into several units of bandwidth by the client software, e.g., [9].

2) Action Space $A_i$: An action of peer $i$ is its resource allocation to the peers in $C_i$. Hence, the action space of peer $i$ in $C_i$ can be expressed as

$$A_i = \left\{ a_i \;\middle|\; 0 \le a_{ik} \le L_i,\ 1 \le k \le N_{C_i},\ \sum_{k \in C_i} a_{ik} \le L_i \right\},$$

where $a_i = (a_{i1}, \dots, a_{iN_{C_i}}) \in A_i = A_{i1} \times \cdots \times A_{iN_{C_i}}$ and $a_{ik} \in A_{ik}$ denotes the resources allocated to peer $k$ by peer $i$ in $C_i$. Hence, peer $i$'s action $a_{ik}$ to peer $k$ becomes peer $k$'s received resources from peer $i$, i.e., $a_{ik} = x_{ik}$. We assume that the actions are the number of allocated units of bandwidth [9] to the associated peers in their group.

We define the resource reciprocation $(\hat{a}_i, \hat{s}_i) = \big( (\hat{a}_{i1}, \dots, \hat{a}_{iN_{C_i}}), (\hat{s}_{i1}, \dots, \hat{s}_{iN_{C_i}}) \big)$ as a pair comprising peer $i$'s action $\hat{a}_i$ and the corresponding modeled resource reciprocation $\hat{s}_i$, which is determined as $\hat{s}_{ik} = \psi_{ik}(\hat{x}_{ki})$ for all $k \in C_i$.

3) State Transition Probability $P_{a_i}(s_i, s_i')$: A state transition probability represents the probability that, by taking an action, a peer will transit into a new state. Given a state $s_i \in S_i$ at time $t$, an action $a_i \in A_i$ of peer $i$ can lead to another state $s_i' \in S_i$ at $t + 1$ with probability $P_{a_i}(s_i, s_i') = \Pr(s_i' \mid s_i, a_i)$. Hence, for a state $s_i = (s_{i1}, \dots, s_{iN_{C_i}})$ of peer $i$ in $C_i$, the probability that an action $a_i$ leads to the state transition from $s_i$ to $s_i'$ can be expressed as

$$P_{a_i}(s_i, s_i') = \prod_{l=1}^{N_{C_i}} P_{a_{il}}(s_{il}, s_{il}'). \tag{1}$$

4) Reward $R_i$: Since the peers prefer higher download rates, we consider the reward $R_i(s_i)$ for a peer $i$ in state $s_i$ to be the total received resources in $C_i$, expressed as

$$R_i(s_i) = R_i(s_{i1}, \dots, s_{iN_{C_i}}) = \sum_{k \in C_i} r_{ik}(s_{ik}), \tag{2}$$

where $r_{ik}(s_{ik})$ represents the received resources in $s_{ik}$.

5) Reciprocation Policy $\pi_i$: The solution to the MDP is represented by peer $i$'s optimal policy $\pi_i^*$, which is a mapping from the states to optimal actions. The optimal policy can be obtained using well-known methods such as value iteration and policy iteration [7]. Hence, peer $i$ can decide its actions based on the optimal policy $\pi_i^*$, i.e., $\pi_i^*(s_i) = a_i^*$ for all $s_i \in S_i$.

Note that our focus is on the resource reciprocation game in a group, as the resource reciprocation game in a P2P network can be considered as multiple resource reciprocation games in groups. We will discuss how the optimal policies can be determined for different types of peers in the next section.

III. PEER TYPES IN P2P NETWORKS

In this paper, we categorize the peers based on their adopted utilities and their resource reciprocation attitudes.

A. Peer types depending on their adopted utilities

Peers in the P2P network can be considered as myopic or foresighted based on how they compute their rewards. A peer $i$ in state $s_i^{(t)}$ is myopic if it focuses on maximizing its immediate expected reward $R_i^{\mathrm{myo}}(s_i^{(t)})$, defined as

$$R_i^{\mathrm{myo}}(s_i^{(t)}) \triangleq \sum_{s_i^{(t+1)} \in S_i} P_{a_i}(s_i^{(t)}, s_i^{(t+1)})\, R_i(s_i^{(t+1)}). \tag{3}$$

Hence, the myopic peer determines its action $a_i^*$ such that it maximizes its adopted expected reward $R_i^{\mathrm{myo}}(s_i^{(t)})$, i.e.,

$$a_i^* = \arg\max_{a_i \in A_i} R_i^{\mathrm{myo}}(s_i^{(t)}) \quad \text{subject to} \quad \sum_{k \in C_i} a_{ik} \le L_i.$$

Unlike the myopic peers, the foresighted peers take their actions such that the actions maximize a cumulative discounted expected reward [7].
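The myopic rule in (3) can be illustrated with a small, self-contained sketch. It is only an illustration under stated assumptions: actions are integer units of bandwidth, states are integer indices, and the toy transition tables built by `toy_peer` (a hypothetical helper) are invented for the example and do not come from the paper. The joint transition probability uses the product form of (1) and the reward uses the sum in (2).

```python
from itertools import product

def joint_transition(P, a, s, s_next):
    """Eq. (1): product of per-peer transition probabilities.
    P[k][a_k][s_k][s_k'] is the (assumed known) table for associated peer k."""
    prob = 1.0
    for k in range(len(P)):
        prob *= P[k][a[k]][s[k]][s_next[k]]
    return prob

def reward(s, r):
    """Eq. (2): total received resources, r[k][s_k] per associated peer."""
    return sum(r[k][s[k]] for k in range(len(s)))

def feasible_actions(n_peers, L):
    """All allocations of integer bandwidth units with sum(a) <= L."""
    return [a for a in product(range(L + 1), repeat=n_peers) if sum(a) <= L]

def myopic_action(P, r, s, L):
    """Eq. (3): pick the action maximizing the immediate expected reward."""
    n_peers = len(P)
    state_ranges = [range(len(r[k])) for k in range(n_peers)]
    best_a, best_val = None, float("-inf")
    for a in feasible_actions(n_peers, L):
        val = sum(joint_transition(P, a, s, s_next) * reward(s_next, r)
                  for s_next in product(*state_ranges))
        if val > best_val:
            best_a, best_val = a, val
    return best_a, best_val

def toy_peer(q):
    """Hypothetical two-state peer: with a units allocated, it moves to the
    high state with probability q[a], independent of the current state."""
    return [[[1.0 - qa, qa] for _ in range(2)] for qa in q]

# Toy data (assumed): two associated peers, L = 2 units of upload bandwidth.
P = [toy_peer([0.1, 0.5, 0.6]), toy_peer([0.2, 0.4, 0.9])]
r = [[0.0, 1.0], [0.0, 1.0]]
best_a, best_val = myopic_action(P, r, s=(0, 0), L=2)
```

In this toy setting the exhaustive search is cheap; for larger groups the action space grows quickly, which is one reason the number of bandwidth units is kept small in practice.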
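Hence, the objective of a foresighted peer $i$ in state $s_i^{(t_c)}$ at time $t = t_c$, given a discount factor $\gamma$, can be expressed as

$$R_i^{\mathrm{fore}}(s_i^{(t_c)}) = \sum_{t = t_c + 1}^{\infty} \gamma^{(t - (t_c + 1))}\, E\big[ R_i(s_i^{(t)}) \big], \tag{4}$$

where

$$E\big[ R_i(s_i^{(t+1)}) \big] = \sum_{s_i^{(t+1)} \in S_i} P_{a_i}(s_i^{(t)}, s_i^{(t+1)})\, R_i(s_i^{(t+1)}).$$

Hence, the foresighted peer $i$ can determine a set of actions that maximizes $R_i^{\mathrm{fore}}(s_i^{(t)})$ for every state in $S_i$, which leads to an optimal policy $\pi_i^*$. The optimal policy $\pi_i^*$ thus maps each state $s_i \in S_i$ into a corresponding optimal action $a_i^*$, i.e., $\pi_i^*(s_i) = a_i^*$ for all $s_i \in S_i$. From (3) and (4), we can observe that the myopic decisions are a special case of the foresighted decisions when $\gamma = 0$.

Note that the discount factor $\gamma$ in the considered P2P network can alternatively represent the belief of the peers about the validity of the expected future rewards, since the state transition probability can be affected by system dynamics such as other peers joining, switching, or leaving groups. Hence, for example, if the P2P network is in a transient regime, a small discount factor is desirable. However, a large discount factor can be used if the P2P network is in a stationary regime [10]. Thus, we assume that the value of the discount factor can be determined by the peers using information based on their past experiences, the reputation of their associated peers [11], [12], etc.

B. Peer types based on their attitudes

Peers in the considered P2P network can also be characterized based on their attitudes toward the resource reciprocation, which are pessimistic, neutral, or optimistic [8]. Let $(\hat{a}_{ik}, \hat{s}_{ik})$ be a resource reciprocation between peer $i$ and peer $k$. A peer $i$ is neutral if it presumes that the resource reciprocation changes linearly depending on its actions. A peer $i$ is pessimistic if it presumes that the resource reciprocation decreases fast for $a_{ik} < \hat{a}_{ik}$ and increases slowly for $a_{ik} > \hat{a}_{ik}$. On the other hand, an optimistic peer $i$ presumes that the resource reciprocation decreases slowly for $a_{ik} < \hat{a}_{ik}$ and increases fast for $a_{ik} > \hat{a}_{ik}$. Illustrative examples of resource reciprocation shapes that correspond to peers' different attitudes are shown in Fig. 2. In this paper, we consider these resource reciprocation profiles, which will be referred to as reciprocation models.

[Fig. 2. Illustration of resource reciprocation based on peer's attitude. Curves: optimistic, neutral, pessimistic, drawn through the resource reciprocation $(\hat{a}_{ik}, \hat{s}_{ik})$.]

The types of peers discussed above obviously affect their resource reciprocation strategies. In the following section, we discuss how the peers' attitudes can impact the way in which peers model the other peers' resource reciprocation behaviors.

IV. DETERMINING THE STATE TRANSITION PROBABILITIES

A. Empirical Frequency based State Transition Probabilities

A peer can identify its state transition probabilities based on the frequency of the reciprocation. For this, we consider a table $T_{ik}$ that stores the history of resource reciprocation of peer $k$ given the actions of peer $i$. An element $T_{ik}(\Delta_{ik}^{l_1}, \Delta_{ik}^{l_2}, a_{ik})$ of the table $T_{ik}$ denotes the number of state transitions from $s_{ik} = \Delta_{ik}^{l_1}$ to $s_{ik}' = \Delta_{ik}^{l_2}$ given an action $a_{ik}$. Hence, the state transition probability $P_{a_{ik}}(s_{ik} = \Delta_{ik}^{l_1}, s_{ik}' = \Delta_{ik}^{l_2})$ based on the empirical frequency can be expressed as

$$P_{a_{ik}}(s_{ik} = \Delta_{ik}^{l_1}, s_{ik}' = \Delta_{ik}^{l_2}) = \frac{T_{ik}(\Delta_{ik}^{l_1}, \Delta_{ik}^{l_2}, a_{ik})}{\sum_{h=1}^{n_{ik}} T_{ik}(\Delta_{ik}^{l_1}, \Delta_{ik}^{h}, a_{ik})}.$$

A disadvantage of this approach is that it may require a considerable number of observations of the resource reciprocation over time to accurately identify the state transition probabilities. To reduce the number of required observations, we propose an alternative algorithm that is based on the resource reciprocation models.

B. Reciprocation Model based State Transition Probabilities

A set of state transition probabilities that corresponds to a resource reciprocation model is called a reciprocation matrix. The set of $m_{ik}$ available reciprocation matrices of peer $i$ in $s_{ik}$ for peer $k$ is denoted by $M_{ik}(s_{ik}) = \{ M_{ik}^{1}(s_{ik}), \dots, M_{ik}^{m_{ik}}(s_{ik}) \}$, where $M_{ik}^{l}(s_{ik})$ is an $|A_{ik}| \times n_{ik}$ matrix with elements $M_{ik}^{l}(s_{ik})[a_{ik}, s_{ik}'] = P_{a_{ik}}(s_{ik}, s_{ik}')$. A reciprocation matrix $M_{ik}^{l}(s_{ik})$ for a pessimistic peer $i$ taking action $a_{ik}$ in $s_{ik}$, shown in Fig. 2, can be expressed as

$$M_{ik}^{l}(s_{ik})[a_{ik}, s_{ik}'] = \begin{cases} 1, & \text{if } a_{ik} < \hat{a}_{ik},\ s_{ik}' = \psi_{ik}(0), \text{ or } a_{ik} > \hat{a}_{ik},\ s_{ik}' = s_{ik}, \\ 1 / W_P, & \text{if } a_{ik} = \hat{a}_{ik},\ \psi_{ik}(0) \le s_{ik}' \le s_{ik}, \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

where $W_P = \big| \{ s_{ik}^{l} \mid \psi_{ik}(0) \le s_{ik}^{l} \le s_{ik} \} \big|$ is the number of state descriptions between $s_{ik}$ and $\psi_{ik}(0)$, and $a_{ik}$ denotes the available actions that can be taken. Similarly, a reciprocation matrix of an optimistic peer $i$ in $s_{ik}$ for peer $k$, shown in Fig. 2, can be represented by

$$M_{ik}^{l}(s_{ik})[a_{ik}, s_{ik}'] = \begin{cases} 1, & \text{if } a_{ik} < \hat{a}_{ik},\ s_{ik}' = s_{ik}, \text{ or } a_{ik} > \hat{a}_{ik},\ s_{ik}' = \psi_{ik}(L_k), \\ 1 / W_O, & \text{if } a_{ik} = \hat{a}_{ik},\ s_{ik} \le s_{ik}' \le \psi_{ik}(L_k), \\ 0, & \text{otherwise}, \end{cases} \tag{6}$$

where $W_O = \big| \{ s_{ik}^{l} \mid s_{ik} \le s_{ik}^{l} \le \psi_{ik}(L_k) \} \big|$ is the number of state descriptions between $s_{ik}$ and $\psi_{ik}(L_k)$. A neutral peer presumes that an action $a_{ik} \ne \hat{a}_{ik}$ will lead to linear changes in resource reciprocation. Hence, the reciprocation matrix of a neutral peer can be expressed as

$$M_{ik}^{l}(s_{ik})[a_{ik}, s_{ik}'] = \begin{cases} 1, & \text{if } s_{ik}' = \psi_{ik}(x_{ki} = \alpha_{ik} a_{ik}), \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$

where $\alpha_{ik} = s_{ik} / \hat{a}_{ik}$ denotes the slope determined based on the current resource reciprocation.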
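The three reciprocation matrices in (5), (6), and (7) can be written out in a few lines. The sketch below is a minimal illustration under assumptions that are not in the paper: state descriptions are represented by integer indices `0..n_states-1` (so index `0` plays the role of $\psi_{ik}(0)$ and index `n_states-1` the role of $\psi_{ik}(L_k)$), actions are integer units of bandwidth, and the neutral model quantizes $\alpha_{ik} a_{ik}$ by integer division. The function names are hypothetical.

```python
def pessimistic_matrix(n_actions, n_states, a_hat, s):
    """Eq. (5): rows are actions, columns are next-state indices."""
    M = [[0.0] * n_states for _ in range(n_actions)]
    for a in range(n_actions):
        if a < a_hat:
            M[a][0] = 1.0                  # reciprocation drops to psi(0)
        elif a > a_hat:
            M[a][s] = 1.0                  # reciprocation stays at s
        else:
            w_p = s + 1                    # states between psi(0) and s
            for s_next in range(0, s + 1):
                M[a][s_next] = 1.0 / w_p
    return M

def optimistic_matrix(n_actions, n_states, a_hat, s):
    """Eq. (6): mirror image of the pessimistic model."""
    M = [[0.0] * n_states for _ in range(n_actions)]
    top = n_states - 1                     # index standing in for psi(L_k)
    for a in range(n_actions):
        if a < a_hat:
            M[a][s] = 1.0                  # reciprocation stays at s
        elif a > a_hat:
            M[a][top] = 1.0                # reciprocation jumps to psi(L_k)
        else:
            w_o = top - s + 1              # states between s and psi(L_k)
            for s_next in range(s, top + 1):
                M[a][s_next] = 1.0 / w_o
    return M

def neutral_matrix(n_actions, n_states, a_hat, s):
    """Eq. (7): linear model with slope alpha = s / a_hat (requires a_hat > 0).
    Assumption: psi quantizes alpha * a by flooring and clipping."""
    M = [[0.0] * n_states for _ in range(n_actions)]
    for a in range(n_actions):
        s_next = min(n_states - 1, (s * a) // a_hat)
        M[a][s_next] = 1.0
    return M

# Example: 3 actions (0, 1, 2 units), 4 state descriptions, a_hat = 1, s = 2.
M_pes = pessimistic_matrix(3, 4, a_hat=1, s=2)
M_opt = optimistic_matrix(3, 4, a_hat=1, s=2)
M_neu = neutral_matrix(3, 4, a_hat=1, s=2)
```

Each row of each matrix sums to one, so every row can be read directly as the distribution $P_{a_{ik}}(s_{ik}, \cdot)$ implied by the corresponding attitude.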
In the following subsection, we propose an algorithm that uses the discussed reciprocation matrices to efficiently identify the state transition probability.

C. Building State Transition Probabilities based on Reciprocation Matrices

We assume that each peer $i$ has a pre-determined initial action $a_i^{I} = (a_{i1}^{I}, \dots, a_{iN_{C_i}}^{I}) \in A_i$ that is used for initializing the reciprocation matrices, i.e., peer $i$ has a pre-determined action $a_{ik}^{I} \in A_{ik}$ for peer $k$ and the resulting $s_{ik}$. Based on the initial resource reciprocation $(a_{ik}^{I}, s_{ik})$ between peer $i$ and peer $k$, the reciprocation matrices of peer $i$ can be initialized based on (5), (6), and (7). Let $M_{ik}(s_{ik})$ be the set of $m_{ik}$ reciprocation matrices that are initialized based on $(a_{ik}^{I}, s_{ik})$. The weights of peer $i$ for the reciprocation matrices are denoted by $w_{ik}(s_{ik}) = (w_{ik}^{1}(s_{ik}), \dots, w_{ik}^{m_{ik}}(s_{ik}))$ for peer $k \in C_i$. We define $H_{ik}(s_{ik}) = (h_{ik}^{1}(s_{ik}), \dots, h_{ik}^{m_{ik}}(s_{ik}))$ as the set of the numbers of hits, where the resource reciprocations between peer $i$ and peer $k$ are matched to non-zero elements in $M_{ik}(s_{ik})$. Specifically, if $(a_{ik}, s_{ik}')$ is matched to a non-zero element in $M_{ik}^{l}(s_{ik})[a_{ik}, s_{ik}']$, then $h_{ik}^{l}(s_{ik}) \leftarrow h_{ik}^{l}(s_{ik}) + 1$. Based on $H_{ik}(s_{ik})$, the weights of the reciprocation matrices for peer $k$ can be computed as

$$w_{ik}^{l}(s_{ik}) = \frac{h_{ik}^{l}(s_{ik})}{\sum_{l'=1}^{m_{ik}} h_{ik}^{l'}(s_{ik})}.$$

This update process is depicted in Fig. 3. Finally, based on the identified weights, the probability $P_{a_{ik}}(s_{ik}, s_{ik}')$ can be obtained by

$$P_{a_{ik}}(s_{ik}, s_{ik}') = \sum_{l=1}^{m_{ik}} w_{ik}^{l}(s_{ik})\, M_{ik}^{l}(s_{ik})[a_{ik}, s_{ik}']. \tag{8}$$

[Fig. 3. Block diagram for updating the weights of the reciprocation matrices.]

V. SIMULATION RESULTS

A. Comparison of Various Approaches for Identifying the State Transition Probabilities

To illustrate the tradeoff between efficiency and accuracy, we deploy the two approaches discussed in Section IV to identify the true state transition probability.

[Fig. 4. (Averaged) $D_{J_l}$ for estimated state transition probability matrices with/without reciprocation models. Panel (a): $D_{J_l}$ (average) versus number of observations; panel (b): $D_{J_l}$ (average) versus duration. Curves: without reciprocation models, with reciprocation models.]
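The two estimators compared in this subsection, the empirical frequency of Section IV-A and the weighted reciprocation matrices of Section IV-C, can be sketched side by side as follows. This is a minimal illustration rather than the authors' implementation: the two toy matrices and the observation sequence are invented, all function names are hypothetical, and the fallback to uniform values when nothing has been observed yet is an added assumption that the paper does not specify.

```python
def empirical_row(counts):
    """Section IV-A: counts[l2] is the table entry T(l1, l2, a) for a fixed
    current state l1 and action a; returns the estimated transition row.
    Assumption: with no observations, fall back to a uniform row."""
    total = sum(counts)
    if total == 0:
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]

def update_hits(hits, matrices, a, s_next):
    """Section IV-C: increment h^l when (a, s_next) matches a non-zero
    element of reciprocation matrix l."""
    for l, M in enumerate(matrices):
        if M[a][s_next] > 0:
            hits[l] += 1

def weights_from_hits(hits):
    """w^l = h^l / sum over l' of h^l'.
    Assumption: with no hits yet, weight all matrices equally."""
    total = sum(hits)
    if total == 0:
        return [1.0 / len(hits)] * len(hits)
    return [h / total for h in hits]

def mixture_row(matrices, weights, a):
    """Eq. (8): weighted combination of the reciprocation matrices."""
    n_states = len(matrices[0][a])
    return [sum(w * M[a][s_next] for w, M in zip(weights, matrices))
            for s_next in range(n_states)]

# Toy data (assumed): one action, three state descriptions, two models.
matrices = [[[1.0, 0.0, 0.0]],       # model 1: next state is always index 0
            [[0.0, 0.5, 0.5]]]       # model 2: next state is index 1 or 2
observed_next_states = [1, 2, 0, 2]  # observed reciprocations for action 0

hits = [0, 0]
counts = [0, 0, 0]
for s_next in observed_next_states:
    update_hits(hits, matrices, 0, s_next)
    counts[s_next] += 1

w = weights_from_hits(hits)
row_model = mixture_row(matrices, w, 0)
row_empirical = empirical_row(counts)
```

With only four observations the two estimates already differ: the model-based row is constrained to combinations of the pre-determined matrices, which is what allows it to settle with few observations but also limits how close it can get to an arbitrary true distribution.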
In the simulations, we assume that peer $i$ and peer $k$ are in a group and reciprocate their resources. To study the impact of the two approaches on the discounted expected rewards, we consider the discounted expected rewards $J_{ik} = [J_{ik}(s_{ik}^{1}), \dots, J_{ik}(s_{ik}^{n_{ik}})]^{T}$ and $\hat{J}_{ik} = [\hat{J}_{ik}(s_{ik}^{1}), \dots, \hat{J}_{ik}(s_{ik}^{n_{ik}})]^{T}$ of peer $i$ from peer $k$, obtained based on the true and estimated state transition probabilities $P_{ik}$ and $\hat{P}_{ik}$ and the corresponding optimal policies $\pi^*$ and $\hat{\pi}^*$, respectively. We assume that $P_{ik}$ is stationary. Note that $J_{ik}$ can be computed by $J_{ik} = P_{ik} r_{ik} + \gamma P_{ik} J_{ik}$ [7], or equivalently,

$$J_{ik} = [I - \gamma P_{ik}]^{-1} P_{ik} r_{ik} = [P_{ik} + \gamma P_{ik}^{2} + \gamma^{2} P_{ik}^{3} + \cdots]\, r_{ik},$$

where $r_{ik} = [r_{ik}(s_{ik}^{1}), \dots, r_{ik}(s_{ik}^{n_{ik}})]^{T}$. Similarly, $\hat{J}_{ik}$ can be computed by $\hat{J}_{ik} = [I - \gamma \hat{P}_{ik}]^{-1} \hat{P}_{ik} r_{ik}$. Without loss of generality, we consider the discounted expected reward from $s_{ik}^{l}$, i.e., $J_{ik}(s_{ik}^{l})$ and $\hat{J}_{ik}(s_{ik}^{l})$. To quantify their difference, we use a metric $D_{J_l}$ defined by

$$D_{J_l} = \big| J_{ik}(s_{ik}^{l}) - \hat{J}_{ik}(s_{ik}^{l}) \big|.$$

The results are shown in Fig. 4. In Fig. 4 (a), since the true state transition probability matrix is stationary, it can be well identified by the empirical frequency based approach if there are enough observations of resource reciprocation. However, this approach may require a considerable number of observations of resource reciprocation among peers to obtain accurate state transition probabilities. In contrast, the approach based on the reciprocation models can identify the state transition probabilities based on fewer observations. However, unlike the empirical frequency based approach, the improvement gained from more observations diminishes rapidly (before reaching the best performance of the empirical frequency based approach), as the estimation relies on pre-determined reciprocation models.

As shown in Fig. 4 (a), the approach based on the reciprocation models provides faster convergence, which becomes important when the state transition probabilities vary over time. Illustrative simulation results are shown in Fig. 4 (b). To emulate a dynamic environment, randomly generated different state transition probability matrices of a peer are deployed periodically, after a fixed number of resource reciprocations. As expected, the approach based on the reciprocation models provides faster convergence, thereby enabling peers to efficiently capture the changes of the state transition probability. Therefore, the proposed approach can cope with a dynamic environment, making it more suitable than the empirical frequency based approach for P2P networks.

B. Impact of Myopic and Foresighted Policies on Rewards

The impact of the myopic and foresighted policies on the achieved cumulative expected rewards is quantified in this subsection. To focus on the impact of the myopic and foresighted policies, we assume that the associated peers' behaviors are randomly generated and stationary. The solution to the MDP is implemented based on the well-known policy iteration method [7]. In addition, as an illustration, we compare against the TFT strategy implemented in BitTorrent-like systems, which supports two leechers simultaneously. The simulation results are shown in Fig. 5.

[Fig. 5. Immediate (a) and cumulative expected discounted rewards (b) achieved by different policies. Panel (a): immediate expected rewards versus state index; panel (b): cumulative discounted rewards ($\gamma = 0.8$) versus state index. Curves: foresighted decision, myopic decision, TFT decision.]

Fig. 5 shows the immediate and cumulative expected discounted rewards with $\gamma = 0.8$ obtained based on the myopic, foresighted, and TFT policies. The state index represents the available states in the state space. To focus on the comparison among the considered policies, we assume that the rewards that can be obtained in each state are normalized as the state index. We can observe that the rewards obtained based on the TFT policy are the worst, since the actions determined by the TFT policy do not consider the expected rewards. Moreover, the constraint of a fixed number of concurrent allowable uploads to the leechers can prevent the decision process from selecting better actions. As discussed previously, the myopic decisions are made based on (3), which maximizes the immediate expected rewards. Hence, we can verify in Fig. 5 (a) that the immediate expected rewards obtained by the actions of the myopic policy are always higher than or equal to those of the other policies.
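The discounted expected reward $J = P r + \gamma P J$ and the metric $D_{J_l}$ used in this section can be evaluated with a short fixed-point iteration, sketched below in plain Python. The transition matrices, the rewards, and the function names are assumptions made for the illustration; the iteration is a standard way to solve the recursion when $0 \le \gamma < 1$ and stands in for the matrix inverse $[I - \gamma P]^{-1} P r$ written in the text.

```python
def mat_vec(P, v):
    """Multiply a row-stochastic matrix P (list of rows) by a vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def discounted_reward(P, r, gamma, tol=1e-12, max_iter=100000):
    """Iterate J <- P r + gamma P J until successive iterates agree.
    Converges for 0 <= gamma < 1 because the update is a contraction."""
    pr = mat_vec(P, r)
    J = list(pr)
    for _ in range(max_iter):
        PJ = mat_vec(P, J)
        J_new = [pr[l] + gamma * PJ[l] for l in range(len(r))]
        if max(abs(x - y) for x, y in zip(J_new, J)) < tol:
            return J_new
        J = J_new
    return J

def d_metric(J, J_hat, l):
    """D_{J_l} = |J(s^l) - J_hat(s^l)| for state index l."""
    return abs(J[l] - J_hat[l])

# Toy data (assumed): two state descriptions with rewards 0 and 1.
P_true = [[0.5, 0.5], [0.2, 0.8]]
P_est = [[0.6, 0.4], [0.3, 0.7]]
r = [0.0, 1.0]

J_true = discounted_reward(P_true, r, gamma=0.8)
J_est = discounted_reward(P_est, r, gamma=0.8)
J_immediate = discounted_reward(P_true, r, gamma=0.0)  # reduces to P r
gap = d_metric(J_true, J_est, 0)
```

Setting `gamma=0.0` returns the immediate expected rewards $P r$, which is the quantity a myopic peer maximizes, while `gamma=0.8` matches the discount factor used for Fig. 5 (b).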
However, the foresighted decisions are made based on (4), such that they maximize the cumulative discounted expected rewards, as shown in Fig. 5 (b). Therefore, the foresighted policy enables the peers to determine the decisions that lead to the highest cumulative expected discounted rewards.

VI. CONCLUSION

In this paper, the resource reciprocation among the peers is modeled as a reciprocation game, and the game is formulated based on the MDP framework. Hence, peers can determine their resource reciprocation such that they maximize their cumulative expected rewards. To successfully formulate the MDP framework in P2P networks, we propose a reciprocation model based approach that enables peers to efficiently identify the state transition probability matrix. We study the tradeoff between efficiency and accuracy when different numbers of reciprocation models are deployed. In the simulations, we show that the proposed reciprocation based approach is more suitable for P2P networks. We also show that the proposed foresighted decisions lead to the best performance in terms of the cumulative expected rewards. Therefore, if the proposed approach is deployed in existing resource reciprocation strategies such as TFT in BitTorrent, it enables peers to make foresighted decisions on their resource allocation, leading to performance improvement.

REFERENCES

[1] Napster. [Online]. Available:
[2] Gnutella. [Online]. Available:
[3] J. Liu, S. G. Rao, B. Li, and H. Zhang, "Opportunities and challenges of peer-to-peer internet video broadcast," Proc. of the IEEE, Special Issue on Recent Advances in Distributed Multimedia Commun., 2007.
[4] A. Legout, N. Liogkas, E. Kohler, and L. Zhang, "Clustering and sharing incentives in BitTorrent systems," INRIA-00112066, Tech. Rep. 1-1, Nov. 2006.
[5] X. Zhang, J. Liu, B. Li, and T. S. P. Yum, "CoolStreaming/DONet: A data-driven overlay network for efficient live media streaming," in Proc. INFOCOM 2005, 2005.
[6] V. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, and A. E. Mohr, "Chainsaw: Eliminating trees from overlay multicast," in Proc. 4th Int. Workshop on Peer-to-Peer Systems (IPTPS), Feb. 2005.
[7] D. P. Bertsekas, Dynamic Programming and Stochastic Control. Academic Press.
[8] E. Haruvy, D. O. Stahl, and P. W. Wilson, "Evidence for optimistic and pessimistic behavior in normal-form games," Economic Lett., vol. 63.
[9] K. Jain, L. Lovász, and P. A. Chou, "Building scalable and robust peer-to-peer overlay networks for broadcasting using network coding," Distributed Computing, vol. 19, no. 4.
[10] G. de Veciana and X. Yang, "Fairness, incentives and performance in peer-to-peer networks," in 41st Annual Allerton Conference on Communication, Control and Computing, 2003.
[11] E. Damiani, D. C. di Vimercati, S. Paraboschi, P. Samarati, and F. Violante, "A reputation-based approach for choosing reliable resources in peer-to-peer networks," in Proc. the 9th ACM Conf. on Comput. and Commun. Security (CCS 2002). ACM Press.
[12] M. Gupta, P. Judge, and M. Ammar, "A reputation system for peer-to-peer networks," in Proc. the 13th Int. Workshop on Netw. and Operating Systems Support for Digital Audio and Video (NOSSDAV 2003). ACM Press, 2003.
