Automated Recommendation Systems
Automated Recommendation Systems: Collaborative Filtering Through Reinforcement Learning

Mostafa Akhamizadeh, Department of MS&E, Stanford University. Email: makhami@stanford.edu
Alexei Avakov, Department of Electrical Engineering, Stanford University. Email: alinarv@stanford.edu
Reza Takapoui, Department of Electrical Engineering, Stanford University. Email: takapoui@stanford.edu

Abstract—Within this work we explore the topic of large scale, automated recommendation systems. We focus on collaborative filtering approaches, wherein a system suggests new products to users based on their viewing history as well as other known demographics. There are several approaches to this in the current literature, the simplest of which treat it as a matrix completion problem. We explore the setting from a reinforcement learning perspective by applying traditional algorithms for reinforcement learning to the problem.

I. PROBLEM FORMULATION

Numerous online services such as Netflix, Amazon, Yelp, Pandora, online advertising, etc. provide automated recommendations to help users navigate through a large collection of items. Every time a user queries the system for a new item, a suggestion is made on the basis of the user's past history and, when available, their demographic profile. Two typical ways of producing these recommendations are collaborative filtering and content-based filtering. There are two simultaneous goals to be satisfied: helping the user to explore the available items and probing the user's preferences. One of the models that captures this setting well is the multi-armed bandit, an important model for decision making under uncertainty. In this model a set of arms with unknown reward profiles is given and, at each time slot, the decision maker must choose an arm to maximize his expected reward. Clearly, the decision at each time slot should depend on previous observations. Thus, there is a trade-off between exploration, trying arms with more uncertain reward in order to gather more information, and exploitation, pulling arms with relatively high reward expectations.
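As a concrete illustration of this trade-off, consider a minimal ε-greedy learner on Gaussian arms. This sketch is purely illustrative and not part of our experimental setup; the arm means, horizon, ε value, and all names in it are assumptions made for the example.

```python
import random

# Minimal sketch of the explore-exploit trade-off on a Gaussian multi-armed
# bandit. The arm means are fixed but unknown to the learner; epsilon controls
# how often we explore a random arm instead of exploiting the current best
# estimate. All parameters here are illustrative, not from the experiments.

def run_epsilon_greedy(arm_means, horizon, epsilon, seed=0):
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n          # number of pulls per arm
    estimates = [0.0] * n     # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore
        else:
            arm = max(range(n), key=lambda j: estimates[j])  # exploit
        reward = rng.gauss(arm_means[arm], 1.0)              # noisy observation
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, counts

# A little exploration lets the learner discover the best arm (mean 1.0 here)
# instead of locking onto whichever arm looked good early on.
means = [0.1, 0.5, 1.0]
total, counts = run_epsilon_greedy(means, horizon=5000, epsilon=0.1)
```

With ε = 0.1 the learner spends roughly 90% of its pulls exploiting, and after enough exploration those pulls concentrate on the best arm.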
For our purposes the arms have a very specific structure, and this setting has previously been referred to as the linear bandits model; see [3]. Here, it is assumed that the underlying matrix of preferences (which contains the rating user i gives to item j at entry (i, j)) has a low-rank structure. Hence, the rating made by user i for item j can be approximated by a scalar product of two feature vectors a_i, b_j ∈ R^p, characterizing the user and item respectively. In other words our observations r_ij can be viewed as

r_ij = a_i^T b_j + z_ij

where z_ij represents the unexplained factors. In the general setting, both the user and item feature vectors are treated as unknown, and our recommendation algorithms must estimate them over time. However, some works like [1] make the simplifying assumption that the item feature vectors are known. We explore both settings, but find more meaningful results in the case where the item feature vectors are known. In this case, the item feature vectors can either be constructed explicitly, or derived from users' feedback using matrix factorization methods. With the item latent vectors in hand, we can treat each user independently and, throughout the explore-exploit trade-off, we can try to estimate and exploit the users' latent vectors. These feature vectors can depend on users' demographic information and their past behavior in rating items. The goal of our system is to develop a recommendation policy, which suggests items to users. This policy will, at each time slot, output a recommendation based on the previous observations. The policy must properly adjust for the explore-exploit trade-off, and classically there are two types of policies, which differ in the way they perform exploration: optimistic policies, e.g. upper confidence bound (UCB), and probabilistic policies, e.g. posterior sampling. UCB algorithms have been applied to this problem in the past, but posterior sampling is less common. Posterior sampling (also known as Thompson sampling) was introduced in 1933 and offers significant advantages over UCB methods, as shown in [2]; however, until recently it has not been very popular or feasible.
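The low-rank observation model above can be sketched directly. The dimensions, noise level, and variable names below are illustrative assumptions, not values from our experiments.

```python
import numpy as np

# Sketch of the low-rank rating model r_ij = a_i^T b_j + z_ij.
# p is the latent dimension, m the number of users, n the number of items;
# these sizes and sigma_z are illustrative choices only.
p, m, n = 3, 100, 50
sigma_z = 0.1
rng = np.random.default_rng(0)

A = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, m))   # user features, N(0, I_p/p)
B = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, n))   # item features, N(0, I_p/p)
Z = rng.normal(0.0, sigma_z, size=(m, n))            # unexplained factors

R = A.T @ B + Z          # observed ratings matrix
# The noiseless preference matrix A^T B has rank at most p.
rank = np.linalg.matrix_rank(A.T @ B)
```

The key structural fact is that the noiseless m × n preference matrix has rank at most p, even though it contains m·n entries.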
Our primary objective is to explore the feasibility of collaborative filtering through posterior sampling. We analyze its performance on real world data, specifically the freely available MovieLens datasets, and compare it to existing methods such as UCB and the work done in [1].

II. SYSTEM MODELS AND ALGORITHMS

In this section we introduce some notation used throughout the rest of this work, as well as the algorithms that we seek to implement.

A. Notation

We have a set of users, i ∈ {1, 2, ..., m}, with corresponding feature vectors a_i ∈ R^p; and items, j ∈ {1, 2, ..., n}, with corresponding feature vectors b_j ∈ R^p. We refer to these feature vectors collectively as A ∈ R^{p×m} and B ∈ R^{p×n}; thus the true ratings can be captured in the matrix A^T B. At each time t ∈ Z_+, a user i_t will enter the system and
be recommended an item j_t, after which they will give it a rating r_t according to

r_t = a_{i_t}^T b_{j_t} + z_t

where z_t captures the unexplainable deviation of the observation from our model. We refer to the viewing history at time t as the sequence H_t = {(i_τ, j_τ, r_τ)}_{τ<t}, i.e. all the viewings in the system before time t. Thus, on a high level, at time t our program seeks to use its knowledge of user i_t to make the best possible recommendation. The job of a recommendation system is to define a function µ(H), which given a user will output a recommendation for that user. Unknown to the system, there is some optimal policy which at each time t would output a recommendation j_t^*. To measure the performance of our system, we will compare the system's recommendations to the best recommendation. Specifically, define the regret of the system, at time t, to be

R(t) = Σ_{τ=1}^{t} ( a_{i_τ}^T b_{j_τ^*} − E[r_τ] )

That is, at each time step we increase our regret by how far the expected rating of our recommendation differs from the best possible rating. Ultimately we seek to derive a policy which achieves minimal regret.

B. Posterior Sampling

Algorithm 1 Posterior Sampling
  Start with a prior distribution on (A, B), f(A, B)
  for t = 1, 2, ... do
    observe the arrival of user i_t
    sample (Â, B̂) ~ f(A, B | H_t)
    compute and output the recommendation j_t = argmax_j â_{i_t}^T b̂_j
    observe the user's rating r_t
  end for

The idea behind the posterior sampling algorithm is to force optimism through probabilistic action. Specifically, at each time step t we make a recommendation j_t based on the probability that it is the best possible recommendation, P(j_t = j_t^*). However, this probability is inaccessible, so instead the algorithm samples a model for the unknown feature vectors based on the probability that they are the true feature vectors given the viewing history, and finds the optimal recommendation should this be the true model. It can be shown that this sampling technique is equivalent to sampling a recommendation based on the probability it is optimal; a more detailed description of the algorithm and its motivations can be seen in [2].
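In the simplified case where the item vectors b_j are known (treated in detail below), Algorithm 1 reduces to Thompson sampling with a conjugate Gaussian posterior for the single user's vector a. The following sketch uses the recursive posterior updates derived later in this section; the dimensions, horizon, and noise level are illustrative assumptions.

```python
import numpy as np

# Sketch of Algorithm 1 in the simplified setting: item features B are known,
# a single user's vector a_true is unknown. Prior: a ~ N(0, I_p/p), noise
# z_t ~ N(0, sigma_z^2). All sizes below are illustrative choices.
rng = np.random.default_rng(1)
p, n, T = 3, 20, 500
sigma_z = 0.1

a_true = rng.normal(0.0, np.sqrt(1.0 / p), size=p)   # hidden user features
B = rng.normal(0.0, np.sqrt(1.0 / p), size=(p, n))   # known item features
best = float(np.max(a_true @ B))                     # best expected rating

Sigma_inv = p * np.eye(p)   # prior precision (I_p / p)^{-1}
theta = np.zeros(p)         # running Sigma_t^{-1} @ mu_t, starts at 0
counts = [0] * n
regret = 0.0

for t in range(T):
    Sigma = np.linalg.inv(Sigma_inv)
    Sigma = (Sigma + Sigma.T) / 2            # enforce exact symmetry
    mu = Sigma @ theta
    a_hat = rng.multivariate_normal(mu, Sigma)        # sample a ~ f(a | H_t)
    j = int(np.argmax(a_hat @ B))                     # best item under sample
    b = B[:, j]
    r = float(a_true @ b) + rng.normal(0.0, sigma_z)  # observed noisy rating
    counts[j] += 1
    regret += best - float(a_true @ b)
    # Recursive conjugate-Gaussian posterior update:
    Sigma_inv += np.outer(b, b) / sigma_z**2          # Sigma_t^{-1}
    theta += r * b / sigma_z**2                       # so mu_t = Sigma_t @ theta

mu = np.linalg.solve(Sigma_inv, theta)                # final posterior mean
j_top = int(np.argmax(counts))                        # most-recommended item
```

As the posterior concentrates, the sampled models agree more often and the algorithm settles on high-reward items, which is exactly the explore-then-exploit behavior we observe in the experiments.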
Thus the algorithm proceeds by keeping track of the distribution of the model parameters at each time step, and updating it accordingly. To implement this algorithm all that remains is to choose a prior on the model parameters, and to compute their posterior distribution given a viewing history. As in [3], and other prior literature, we assume a_i, b_j ~ N(0, I_p/p) i.i.d. Furthermore we assume that the unexplained deviations of the observations are Gaussian, i.e. z_t ~ N(0, σ_z^2). Now we are ready to compute the posterior distribution. Using Bayes' rule (for compactness we use f(·) to denote the distribution of the argument):

f(A, B | H_t) = f(H_t | A, B) f(A) f(B) / f(H_t)
             = f(A) f(B) Π_{τ=1}^{t} f_z(z_τ) / ∫ f(A) f(B) Π_{τ=1}^{t} f_z(z_τ) dA dB

In the above z_τ = r_τ − a_{i_τ}^T b_{j_τ}, and the integral in the denominator is over the entire space R^{p×m} × R^{p×n}. For the rest of this report, we consider the simpler case where the vectors b_j are given and we treat each user independently. This problem is extensively studied in the literature, but as far as we can tell it has never been solved or analyzed through posterior sampling. We explore it more concretely below. For compactness we will consider only the feature vector of a single user, a ∈ R^p; a priori we assume it comes from N(0, I_p/p) as above, and we now consider the viewing history H_t to be the history of the active user (as opposed to all users). We can now compute the posterior distribution as follows:

f(a | H_t) = f(H_t | a) f(a) / f(H_t)
           = f(a) Π_{τ=1}^{t} f_z(r_τ − a^T b_{j_τ}) / ∫_{R^p} f(a) Π_{τ=1}^{t} f_z(r_τ − a^T b_{j_τ}) da

But observe that in this simple case computing the posterior is much easier. The numerator is clearly Gaussian, and the denominator is just a normalizing term; thus we determine a | H_t ~ N(µ_t, Σ_t). We can formulate a recursive update rule for the parameters µ_t, Σ_t by massaging the numerator into an appropriate form (this is done in the appendix). We find the following update equations for the posterior:

Σ_t^{-1} = Σ_{t−1}^{-1} + b_t b_t^T / σ_z^2
µ_t = Σ_t ( Σ_{t−1}^{-1} µ_{t−1} + r_t b_t / σ_z^2 )

These recursive update equations are convenient for implementation, and can be used efficiently by storing Σ_t^{-1}; however, some intuition as to their operation can be seen by applying
the matrix inversion lemma. Through it, we find:

Σ_t = Σ_{t−1} − Σ_{t−1} b_{j_t} b_{j_t}^T Σ_{t−1} / ( σ_z^2 + b_{j_t}^T Σ_{t−1} b_{j_t} )
µ_t = µ_{t−1} + ( r_t − b_{j_t}^T µ_{t−1} ) Σ_{t−1} b_{j_t} / ( σ_z^2 + b_{j_t}^T Σ_{t−1} b_{j_t} )

Thus, essentially, at each step the posterior mean shifts towards or away from the feature vector of the recommended item, and the covariance Σ_t thins out along that direction. The rest of our work revolves mostly around analyzing the simplified problem setting; however, this simplification is extremely useful for the general case as well. Observe,

f(A, B | H_t) = f(A | B, H_t) f(B | H_t)

Thus we can perform posterior sampling in the general case by first sampling item features B̂ according to f(B | H_t), and then sampling A from a Gaussian distribution with mean and variance determined by the previously derived update equations given the selected features B̂. Unfortunately, the distribution of B | H_t is quite complicated; after vectorizing the matrix B into a vector B̃ ∈ R^{np} we find:

f(B̃ | H_t) ∝ (1 / p(B̃)) exp( −k B̃^T B̃ + c^T B̃ )

In the above p(·) is a polynomial function in the entries of B̃, k is some scalar, and c is a vector in R^{np}. Unfortunately, even in this form, it is still unclear how to sample from this distribution.

C. A UCB Approach

Algorithm 2 UCB
  Start with a prior distribution on (A, B), f(A, B), and an optimism parameter p ∈ (0, 1)
  for t = 1, 2, ... do
    observe the arrival of user i_t
    compute the distribution on the reward of each item
    for all items, compute U_j, the p-th percentile of the reward of item j
    compute and output the recommendation j_t = argmax_j U_j
    observe the user's rating r_t
  end for

UCB is a completely different approach from posterior sampling. At each time step the algorithm computes an upper confidence bound on the reward of each of the items, and then suggests the item with the highest UCB. For our purposes, we will use a specific percentile of the reward as the UCB of each item. This is generally hard to do, and other literature uses various heuristics to determine U_j. In the general problem setting, it is unclear how to implement UCB in any meaningful way; however, it is rather elegant in the simplified case of given item feature vectors.
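In the simplified case the percentile index U_j has a closed form via the standard normal quantile function. The sketch below assumes a toy posterior (µ, Σ) purely for illustration; the function and variable names are ours, not from the paper.

```python
import numpy as np
from statistics import NormalDist

# Sketch of the UCB index in the simplified case: the posterior reward of
# item j is Gaussian with mean mu_j = b_j^T mu and variance
# sigma_j^2 = b_j^T Sigma b_j + sigma_z^2, so its p-th percentile is obtained
# by inverting the normal CDF. The toy posterior below is an assumption.

def ucb_indices(B, mu, Sigma, sigma_z, pct):
    z = NormalDist().inv_cdf(pct)                       # standard normal quantile
    means = B.T @ mu                                    # mu_j = b_j^T mu
    variances = np.einsum("ji,jk,ki->i", B, Sigma, B) + sigma_z**2
    return means + z * np.sqrt(variances)               # p-th reward percentile

p, n = 3, 5
rng = np.random.default_rng(2)
B = rng.normal(size=(p, n))        # item feature vectors (columns)
mu = np.zeros(p)                   # toy posterior mean (prior mean)
Sigma = np.eye(p) / p              # toy posterior covariance (prior I_p/p)
U = ucb_indices(B, mu, Sigma, sigma_z=0.1, pct=0.9)
j_t = int(np.argmax(U))            # recommended item
```

Note how the optimism parameter enters only through the quantile z: raising the percentile uniformly inflates every index, rewarding items whose reward is still uncertain.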
In the simplified case, using the priors described in the previous section, we observe that the posterior of a is Gaussian; thus the distribution of the reward of recommending item j is also Gaussian. We compute the mean and variance as follows:

σ_j^2 = b_j^T Σ_t b_j + σ_z^2
µ_j = b_j^T µ_t

Thus computing the p-th percentile of the reward can be done simply by inverting the CDF of the normal distribution.

D. Mixed Approaches

From evaluation we observe that UCB and posterior sampling each have unique advantages. Thus we propose various schemes that allow one to achieve the performance trade-offs of both. First we propose an ɛ-greedy approach, and second we propose a two-phase approach. These were both studied in the simplified case, but could potentially be applied to the general setting as well. The ɛ-UCB algorithm will flip a weighted coin at each time step to decide whether to obtain a recommendation through posterior sampling or through UCB; specifically, the algorithm will elect to perform UCB ɛ percent of the time. The two-phase approach will begin by learning through posterior sampling until some time T, after which it proceeds to output recommendations through the UCB approach. In the next section, we thoroughly study the performance of all of the algorithms presented in this section.

E. The Case of No-repeat Recommendations

Throughout this work we assumed that it is relevant to recommend the same item several times. However, in some settings this is not very natural. For instance, if the system provides recommendations for viewing movies, all of the above algorithms would eventually choose to show the same movie over and over. Clearly this is not very useful, and it can be resolved in several ways. We could lower the reward of successive viewings, but this adds a complicated time dependence to our model. More simply, we can prohibit the algorithm from suggesting the same item multiple times. In the case of suggesting movies this is natural, since users would rarely view the same production multiple times.

III.
IMPLEMENTATION AND EVALUATION

In this section, we present our implementation results for the aforementioned algorithms. For the purpose of numerical simulations, we used MATLAB. We have carried out the algorithms both on synthetic data and on the freely available MovieLens dataset. For the synthetic data, we generate a random matrix of the same size as the MovieLens data with rank 3, by generating random Gaussian feature matrices and multiplying them together. Each of the entries of the feature
matrices comes from N(0, 1/p). Then, we take the item feature vectors as granted and try to estimate the user feature matrix by considering a Gaussian prior. Figure 1 shows the cumulative regret versus time, for posterior sampling and UCB with four different parameters. We can see that posterior sampling might work worse at first by exploring too much, but this pays off later, when the better understanding of the arms comes to help.

Fig. 1. Cumulative regret of posterior sampling and UCB algorithms on synthetic data.

Notice that the regret observed by each of these algorithms is very small compared to the total reward obtained (the cumulative reward at the end of the run is orders of magnitude larger than the differences between the algorithms). In order to show this, we have plotted the cumulative reward versus time for all of these algorithms in Figure 2, and it can be seen that it is close to the optimal reward.

Fig. 2. Cumulative reward of posterior sampling and UCB algorithms on synthetic data, compared to the optimal reward.

We also tried similar simulations for different ranks of the underlying matrix. Figure 3 shows the performance of these algorithms when the rank of the preference matrix is 10. It can be seen that posterior sampling outperforms UCB. One interesting observation is that, unlike posterior sampling, UCB methods are very sensitive to the parameters used in the algorithms (in our case the percentile parameter), and using an inappropriate parameter may result in non-zero asymptotic regret. We observe that the optimal tuning is highly sensitive to the data, specifically to its rank. As a variation of the introduced algorithms, we have carried out a combination of posterior sampling and greedy

Fig. 3.
Cumulative regret of posterior sampling and UCB algorithms for synthetic data with rank 10.

algorithms. Here, the greedy algorithm chooses the arm that maximizes the expected instantaneous reward, and can be considered as UCB with percentile 50%. At each time, the ɛ-greedy algorithm makes a greedy decision with probability ɛ, and performs an iteration of posterior sampling with probability 1 − ɛ. By looking at Figure 4, we can see that the performance of the greedy algorithm improves dramatically when it is combined with posterior sampling 20% of the time. This results in even more computationally efficient methods while the regret still remains acceptably low.

Fig. 4. Cumulative regret for the hybrid approach (posterior sampling, ɛ-greedy with ɛ = 0.2 and ɛ = 0.8, and greedy) on synthetic data.

We also carried out the posterior sampling and UCB algorithms under the assumption that no item can be recommended to a user more than once. Notice that in this case, we expect the regret to be decreasing at some point, because the expected regret at time 1682 is equal to zero. Figure 5 shows the performance of these algorithms in this case. We have implemented all these methods for the MovieLens dataset and got similar results. For example, Figure 6 shows the cumulative regret for posterior sampling and UCB with different parameters on the MovieLens dataset. As seen in Figure 6, the UCB algorithm with parameter 0.9 works better than the other instances of UCB, which further shows that there is no rule for finding the best parameters for UCB algorithms.

IV. CONCLUSIONS

All of the algorithms we analyzed perform extremely well. The difference in regret between them is negligible compared
Fig. 5. Cumulative regret for the case of no repetitions, on synthetic data.

Fig. 6. Cumulative regret of posterior sampling and UCB on the MovieLens dataset.

to the total reward collected. Thus we advocate posterior sampling as the best general purpose solution, for several reasons. First, it is extremely efficient compared to the UCB style approach. Second, it does not require any tuning; while we observed that UCB can outperform posterior sampling, it is extremely reliant on proper tuning, which can be hard to determine in practice. Furthermore, posterior sampling can clearly be extended to the general problem setting, whereas our stated UCB method cannot. Lastly, we note that using the previously mentioned hybrid approaches it is possible to achieve many different efficiency/regret trade-offs.

V. FUTURE WORK

It would be interesting to more closely analyze the general case. Posterior sampling could be implemented through Gibbs sampling, or the Metropolis-Hastings algorithm. UCB as described in this paper would be much more difficult to implement, but we could try various heuristics and other UCB style algorithms. Alternatively, this work could be continued in the practical direction by building a real-life recommendation system utilizing these algorithms and studying its performance.

APPENDIX

A. Derivation of Posterior Update Rules

Again consider the simplified case where we know the latent feature vectors of the items. For compactness we will consider only the feature vector of a single user, a ∈ R^p. Then:

f(a | H_t) = f(a) Π_{τ=1}^{t} f_z(r_τ − a^T b_{j_τ}) / ∫_{R^p} f(a) Π_{τ=1}^{t} f_z(r_τ − a^T b_{j_τ}) da
           = C_1 f(a) Π_{τ=1}^{t−1} f_z(r_τ − a^T b_{j_τ}) f_z(r_t − a^T b_{j_t})
           = C_2 f(a | H_{t−1}) f_z(r_t − a^T b_{j_t})
           = C_3 exp( −(r_t − a^T b_{j_t})^2 / (2σ_z^2) ) exp( −(1/2) (a − µ_{t−1})^T Σ_{t−1}^{-1} (a − µ_{t−1}) )

Now it is clear that the distribution remains Gaussian. At this point simply compute the coefficients of the quadratic and linear terms to solve for the new mean and covariance. This yields

Σ_t^{-1} = Σ_{t−1}^{-1} + b_t b_t^T / σ_z^2
µ_t = Σ_t ( Σ_{t−1}^{-1} µ_{t−1} + r_t b_t / σ_z^2 )

B.
Woodbury Matrix Identity & Update Rules

Recall the Woodbury matrix identity:

(A + UCV)^{-1} = A^{-1} − A^{-1} U ( C^{-1} + V A^{-1} U )^{-1} V A^{-1}

For ease of notation in this section we will refer to Σ_{t−1} as Σ, µ_{t−1} as µ, r_t as r, and lastly we refer to b_{j_t} simply as b. Applying the identity to the previously derived update rules:

Σ_t = ( Σ^{-1} + b b^T / σ_z^2 )^{-1} = Σ − Σ b b^T Σ / ( σ_z^2 + b^T Σ b )

Now we can plug this into the derivation of µ_t:

µ_t = Σ_t ( Σ^{-1} µ + r b / σ_z^2 )
    = ( Σ − Σ b b^T Σ / ( σ_z^2 + b^T Σ b ) ) ( Σ^{-1} µ + r b / σ_z^2 )
    = µ − Σ b b^T µ / ( σ_z^2 + b^T Σ b ) + (r / σ_z^2) ( Σ b − Σ b b^T Σ b / ( σ_z^2 + b^T Σ b ) )
    = µ + ( r − b^T µ ) Σ b / ( σ_z^2 + b^T Σ b )

REFERENCES

[1] Yash Deshpande, Andrea Montanari, "Linear Bandits in High Dimension and Recommendation Systems." Available online.
[2] Daniel Russo, Benjamin Van Roy, "Learning to Optimize via Posterior Sampling." Available online.
[3] Paat Rusmevichientong, John N. Tsitsiklis, "Linearly Parameterized Bandits." Available online.
More informationQuadratic Forms. Quadratic Forms
Qudrtic Forms Recll the Simon & Blume excerpt from n erlier lecture which sid tht the min tsk of clculus is to pproximte nonliner functions with liner functions. It s ctully more ccurte to sy tht we pproximte
More informationGoals: Determine how to calculate the area described by a function. Define the definite integral. Explore the relationship between the definite
Unit #8 : The Integrl Gols: Determine how to clculte the re described by function. Define the definite integrl. Eplore the reltionship between the definite integrl nd re. Eplore wys to estimte the definite
More informationMIXED MODELS (Sections ) I) In the unrestricted model, interactions are treated as in the random effects model:
1 2 MIXED MODELS (Sections 17.7 17.8) Exmple: Suppose tht in the fiber breking strength exmple, the four mchines used were the only ones of interest, but the interest ws over wide rnge of opertors, nd
More informationMatrix Solution to Linear Equations and Markov Chains
Trding Systems nd Methods, Fifth Edition By Perry J. Kufmn Copyright 2005, 2013 by Perry J. Kufmn APPENDIX 2 Mtrix Solution to Liner Equtions nd Mrkov Chins DIRECT SOLUTION AND CONVERGENCE METHOD Before
More informationChapter 10: Symmetrical Components and Unbalanced Faults, Part II
Chpter : Symmetricl Components nd Unblnced Fults, Prt.4 Sequence Networks o Loded Genertor n the igure to the right is genertor supplying threephse lod with neutrl connected through impednce n to ground.
More informationNumerical Integration
Chpter 5 Numericl Integrtion Numericl integrtion is the study of how the numericl vlue of n integrl cn be found. Methods of function pproximtion discussed in Chpter??, i.e., function pproximtion vi the
More information3.4 Numerical integration
3.4. Numericl integrtion 63 3.4 Numericl integrtion In mny economic pplictions it is necessry to compute the definite integrl of relvlued function f with respect to "weight" function w over n intervl [,
More informationA recursive construction of efficiently decodable list-disjunct matrices
CSE 709: Compressed Sensing nd Group Testing. Prt I Lecturers: Hung Q. Ngo nd Atri Rudr SUNY t Bufflo, Fll 2011 Lst updte: October 13, 2011 A recursive construction of efficiently decodble list-disjunct
More informationLecture 3. In this lecture, we will discuss algorithms for solving systems of linear equations.
Lecture 3 3 Solving liner equtions In this lecture we will discuss lgorithms for solving systems of liner equtions Multiplictive identity Let us restrict ourselves to considering squre mtrices since one
More informationBest Approximation in the 2-norm
Jim Lmbers MAT 77 Fll Semester 1-11 Lecture 1 Notes These notes correspond to Sections 9. nd 9.3 in the text. Best Approximtion in the -norm Suppose tht we wish to obtin function f n (x) tht is liner combintion
More informationInformation synergy, part 3:
Informtion synergy prt : belief updting These notes describe belief updting for dynmic Kelly-Ross investments where initil conditions my mtter. This note diers from the first two notes on informtion synergy
More informationNon-Linear & Logistic Regression
Non-Liner & Logistic Regression If the sttistics re boring, then you've got the wrong numbers. Edwrd R. Tufte (Sttistics Professor, Yle University) Regression Anlyses When do we use these? PART 1: find
More informationECO 317 Economics of Uncertainty Fall Term 2007 Notes for lectures 4. Stochastic Dominance
Generl structure ECO 37 Economics of Uncertinty Fll Term 007 Notes for lectures 4. Stochstic Dominnce Here we suppose tht the consequences re welth mounts denoted by W, which cn tke on ny vlue between
More informationWeek 10: Line Integrals
Week 10: Line Integrls Introduction In this finl week we return to prmetrised curves nd consider integrtion long such curves. We lredy sw this in Week 2 when we integrted long curve to find its length.
More informationDiscrete Mathematics and Probability Theory Summer 2014 James Cook Note 17
CS 70 Discrete Mthemtics nd Proility Theory Summer 2014 Jmes Cook Note 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion, y tking
More information1 Linear Least Squares
Lest Squres Pge 1 1 Liner Lest Squres I will try to be consistent in nottion, with n being the number of dt points, nd m < n being the number of prmeters in model function. We re interested in solving
More informationLecture 3 Gaussian Probability Distribution
Introduction Lecture 3 Gussin Probbility Distribution Gussin probbility distribution is perhps the most used distribution in ll of science. lso clled bell shped curve or norml distribution Unlike the binomil
More information1. Gauss-Jacobi quadrature and Legendre polynomials. p(t)w(t)dt, p {p(x 0 ),...p(x n )} p(t)w(t)dt = w k p(x k ),
1. Guss-Jcobi qudrture nd Legendre polynomils Simpson s rule for evluting n integrl f(t)dt gives the correct nswer with error of bout O(n 4 ) (with constnt tht depends on f, in prticulr, it depends on
More informationThe steps of the hypothesis test
ttisticl Methods I (EXT 7005) Pge 78 Mosquito species Time of dy A B C Mid morning 0.0088 5.4900 5.5000 Mid Afternoon.3400 0.0300 0.8700 Dusk 0.600 5.400 3.000 The Chi squre test sttistic is the sum of
More informationDiscrete Mathematics and Probability Theory Spring 2013 Anant Sahai Lecture 17
EECS 70 Discrete Mthemtics nd Proility Theory Spring 2013 Annt Shi Lecture 17 I.I.D. Rndom Vriles Estimting the is of coin Question: We wnt to estimte the proportion p of Democrts in the US popultion,
More informationChapter 4 Contravariance, Covariance, and Spacetime Diagrams
Chpter 4 Contrvrince, Covrince, nd Spcetime Digrms 4. The Components of Vector in Skewed Coordintes We hve seen in Chpter 3; figure 3.9, tht in order to show inertil motion tht is consistent with the Lorentz
More informationMath 8 Winter 2015 Applications of Integration
Mth 8 Winter 205 Applictions of Integrtion Here re few importnt pplictions of integrtion. The pplictions you my see on n exm in this course include only the Net Chnge Theorem (which is relly just the Fundmentl
More information1.9 C 2 inner variations
46 CHAPTER 1. INDIRECT METHODS 1.9 C 2 inner vritions So fr, we hve restricted ttention to liner vritions. These re vritions of the form vx; ǫ = ux + ǫφx where φ is in some liner perturbtion clss P, for
More informationP 3 (x) = f(0) + f (0)x + f (0) 2. x 2 + f (0) . In the problem set, you are asked to show, in general, the n th order term is a n = f (n) (0)
1 Tylor polynomils In Section 3.5, we discussed how to pproximte function f(x) round point in terms of its first derivtive f (x) evluted t, tht is using the liner pproximtion f() + f ()(x ). We clled this
More informationAQA Further Pure 1. Complex Numbers. Section 1: Introduction to Complex Numbers. The number system
Complex Numbers Section 1: Introduction to Complex Numbers Notes nd Exmples These notes contin subsections on The number system Adding nd subtrcting complex numbers Multiplying complex numbers Complex
More information1B40 Practical Skills
B40 Prcticl Skills Comining uncertinties from severl quntities error propgtion We usully encounter situtions where the result of n experiment is given in terms of two (or more) quntities. We then need
More informationfractions Let s Learn to
5 simple lgebric frctions corne lens pupil retin Norml vision light focused on the retin concve lens Shortsightedness (myopi) light focused in front of the retin Corrected myopi light focused on the retin
More informationFrobenius numbers of generalized Fibonacci semigroups
Frobenius numbers of generlized Fiboncci semigroups Gretchen L. Mtthews 1 Deprtment of Mthemticl Sciences, Clemson University, Clemson, SC 29634-0975, USA gmtthe@clemson.edu Received:, Accepted:, Published:
More informationNew Expansion and Infinite Series
Interntionl Mthemticl Forum, Vol. 9, 204, no. 22, 06-073 HIKARI Ltd, www.m-hikri.com http://dx.doi.org/0.2988/imf.204.4502 New Expnsion nd Infinite Series Diyun Zhng College of Computer Nnjing University
More informationCS667 Lecture 6: Monte Carlo Integration 02/10/05
CS667 Lecture 6: Monte Crlo Integrtion 02/10/05 Venkt Krishnrj Lecturer: Steve Mrschner 1 Ide The min ide of Monte Crlo Integrtion is tht we cn estimte the vlue of n integrl by looking t lrge number of
More informationODE: Existence and Uniqueness of a Solution
Mth 22 Fll 213 Jerry Kzdn ODE: Existence nd Uniqueness of Solution The Fundmentl Theorem of Clculus tells us how to solve the ordinry dierentil eqution (ODE) du f(t) dt with initil condition u() : Just
More informationCredibility Hypothesis Testing of Fuzzy Triangular Distributions
666663 Journl of Uncertin Systems Vol.9, No., pp.6-74, 5 Online t: www.jus.org.uk Credibility Hypothesis Testing of Fuzzy Tringulr Distributions S. Smpth, B. Rmy Received April 3; Revised 4 April 4 Abstrct
More informationReversals of Signal-Posterior Monotonicity for Any Bounded Prior
Reversls of Signl-Posterior Monotonicity for Any Bounded Prior Christopher P. Chmbers Pul J. Hely Abstrct Pul Milgrom (The Bell Journl of Economics, 12(2): 380 391) showed tht if the strict monotone likelihood
More informationThe First Fundamental Theorem of Calculus. If f(x) is continuous on [a, b] and F (x) is any antiderivative. f(x) dx = F (b) F (a).
The Fundmentl Theorems of Clculus Mth 4, Section 0, Spring 009 We now know enough bout definite integrls to give precise formultions of the Fundmentl Theorems of Clculus. We will lso look t some bsic emples
More information( dg. ) 2 dt. + dt. dt j + dh. + dt. r(t) dt. Comparing this equation with the one listed above for the length of see that
Arc Length of Curves in Three Dimensionl Spce If the vector function r(t) f(t) i + g(t) j + h(t) k trces out the curve C s t vries, we cn mesure distnces long C using formul nerly identicl to one tht we
More informationInfinite Geometric Series
Infinite Geometric Series Finite Geometric Series ( finite SUM) Let 0 < r < 1, nd let n be positive integer. Consider the finite sum It turns out there is simple lgebric expression tht is equivlent to
More informationCzechoslovak Mathematical Journal, 55 (130) (2005), , Abbotsford. 1. Introduction
Czechoslovk Mthemticl Journl, 55 (130) (2005), 933 940 ESTIMATES OF THE REMAINDER IN TAYLOR S THEOREM USING THE HENSTOCK-KURZWEIL INTEGRAL, Abbotsford (Received Jnury 22, 2003) Abstrct. When rel-vlued
More informationEntropy and Ergodic Theory Notes 10: Large Deviations I
Entropy nd Ergodic Theory Notes 10: Lrge Devitions I 1 A chnge of convention This is our first lecture on pplictions of entropy in probbility theory. In probbility theory, the convention is tht ll logrithms
More informationMATH34032: Green s Functions, Integral Equations and the Calculus of Variations 1
MATH34032: Green s Functions, Integrl Equtions nd the Clculus of Vritions 1 Section 1 Function spces nd opertors Here we gives some brief detils nd definitions, prticulrly relting to opertors. For further
More informationLecture Note 9: Orthogonal Reduction
MATH : Computtionl Methods of Liner Algebr 1 The Row Echelon Form Lecture Note 9: Orthogonl Reduction Our trget is to solve the norml eution: Xinyi Zeng Deprtment of Mthemticl Sciences, UTEP A t Ax = A
More informationLecture 1: Introduction to integration theory and bounded variation
Lecture 1: Introduction to integrtion theory nd bounded vrition Wht is this course bout? Integrtion theory. The first question you might hve is why there is nything you need to lern bout integrtion. You
More informationMATH 144: Business Calculus Final Review
MATH 144: Business Clculus Finl Review 1 Skills 1. Clculte severl limits. 2. Find verticl nd horizontl symptotes for given rtionl function. 3. Clculte derivtive by definition. 4. Clculte severl derivtives
More informationStudent Activity 3: Single Factor ANOVA
MATH 40 Student Activity 3: Single Fctor ANOVA Some Bsic Concepts In designed experiment, two or more tretments, or combintions of tretments, is pplied to experimentl units The number of tretments, whether
More informationMapping the delta function and other Radon measures
Mpping the delt function nd other Rdon mesures Notes for Mth583A, Fll 2008 November 25, 2008 Rdon mesures Consider continuous function f on the rel line with sclr vlues. It is sid to hve bounded support
More informationNUMERICAL INTEGRATION
NUMERICAL INTEGRATION How do we evlute I = f (x) dx By the fundmentl theorem of clculus, if F (x) is n ntiderivtive of f (x), then I = f (x) dx = F (x) b = F (b) F () However, in prctice most integrls
More informationAn approximation to the arithmetic-geometric mean. G.J.O. Jameson, Math. Gazette 98 (2014), 85 95
An pproximtion to the rithmetic-geometric men G.J.O. Jmeson, Mth. Gzette 98 (4), 85 95 Given positive numbers > b, consider the itertion given by =, b = b nd n+ = ( n + b n ), b n+ = ( n b n ) /. At ech
More information