Conservative Contextual Linear Bandits


Abbas Kazerouni (Stanford University, abbask@stanford.edu), Mohammad Ghavamzadeh (Adobe Research, ghavamza@adobe.com), Yasin Abbasi-Yadkori (Adobe Research, abbasiya@adobe.com), and Benjamin Van Roy (Stanford University, bvr@stanford.edu)

Abstract

Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we study the issue of safety in contextual linear bandits, a setting with applications in many different fields including personalized ad recommendation in online marketing. We formulate a notion of safety for this class of algorithms. We develop a safe contextual linear bandit algorithm, called conservative linear UCB (CLUCB), that simultaneously minimizes its regret and satisfies the safety constraint, i.e., maintains its performance above a fixed percentage of the performance of a baseline strategy, uniformly over time. We prove an upper bound on the regret of CLUCB and show that it can be decomposed into two terms: 1) an upper bound on the regret of the standard linear UCB algorithm that grows with the time horizon, and 2) a constant term (one that does not grow with the time horizon) that accounts for the loss of being conservative in order to satisfy the safety constraint. We empirically show that our algorithm is safe and validate our theoretical analysis.

1 Introduction

Many problems in science and engineering can be formulated as decision-making problems under uncertainty. Although many learning algorithms have been developed to find a good policy/strategy for these problems, most of them do not provide any guarantee that their resulting policy will perform well when it is deployed. This is a major obstacle to using learning algorithms in many different fields, such as online marketing, health sciences, finance, and robotics. Therefore, developing learning algorithms with safety guarantees can

immensely increase the applicability of learning in solving decision problems. A policy generated by a learning algorithm is considered to be safe if it is guaranteed to perform at least as well as a baseline. The baseline can be either a baseline value or the performance of a baseline strategy. It is important to note that since the policy is learned from data, and data is often random, the generated policy is a random variable, and thus the safety guarantees hold with high probability.

Safety can be studied in both offline and online scenarios. In the offline case, the algorithm learns the policy from a batch of data, usually generated by the current or recent strategies of the company, and the question is whether the learned policy, when deployed, will perform as well as the current strategy or no worse than a baseline value. This scenario has recently been studied heavily in both model-based (e.g., [7]) and model-free (e.g., [3, 13, 14, 11, 10, 6]) settings. In the model-based approach, we first use the batch of data to build a simulator that mimics the behavior of the dynamical system under study (a hospital's ER, a financial market, a robot), and then use this simulator to generate data and learn the policy. The main challenge here is to have guarantees on the performance of the learned policy, given the error in the simulator. This line of research is closely related to the area of robust learning and control. In the model-free approach, we learn the policy directly from the batch of data, without building a simulator. This line of research is related to off-policy evaluation and control. While the model-free approach is more suitable for problems in which we have access to a large batch of data, such as in online marketing, the model-based approach works better in problems in which data is harder to collect, but instead we have good knowledge about the underlying dynamical system that allows us to build an accurate simulator.

In the online scenario, the algorithm learns a policy while interacting with the real system. Although (reasonable) online algorithms will eventually learn a good or an optimal policy, there is no guarantee on their performance along the way (the performance of their intermediate policies), especially at the very beginning, when they perform a large amount of exploration. Thus, in order to guarantee safety in online algorithms, it is important to control their exploration and make it more conservative. Consider a manager who lets our learning algorithm run together with her company's current strategy (the baseline policy) as long as it is safe, i.e., the loss incurred by letting a portion of the traffic be handled by our algorithm (instead of by the baseline policy) does not exceed a certain threshold. Although we are confident that our algorithm will eventually perform at least as well as the baseline strategy, it should be able to remain alive (not be terminated by the manager) long enough for this to happen. Therefore, we should make it more conservative (less exploratory) in a way that does not violate the manager's safety constraint. This setting has been studied in the multi-armed bandit (MAB) literature [15]: [15] treated the baseline policy as a fixed arm in a MAB, formulated safety using a constraint defined on the performance of the baseline policy (the mean of the baseline arm), and modified the UCB algorithm [2] to satisfy this constraint.

In this paper, we study the notion of safety in contextual linear bandits, a setting that has applications in many different fields including online personalized ad recommendation. (Other definitions of safety have also been studied in contextual linear bandits; see, e.g., Example 2 in [9].)

We first formulate safety in this setting, as a constraint that must hold uniformly in time, in Section 2. Our goal is to design learning algorithms that minimize regret under the constraint that at any given time, their expected sum of rewards remains above a fixed percentage of the expected sum of rewards of the baseline policy. This fixed percentage depends on the amount of risk that the manager is willing to take. Then in Section 3, we propose an algorithm, called conservative linear UCB (CLUCB), that satisfies the safety constraint. At each round, CLUCB plays the action suggested by the standard linear UCB (LUCB) algorithm (e.g., [5, 8, 1, 4, 9]) only if it satisfies the safety constraint for the worst choice of the parameter in the confidence set, and plays the action suggested by the baseline policy otherwise. We also prove an upper bound on the regret of CLUCB, which can be decomposed into two terms. The first term is an upper bound on the regret of LUCB that grows at the rate $\sqrt{T}\log(T)$. The second term is constant (it does not grow with the horizon T) and accounts for the loss of being conservative in order to satisfy the safety constraint. This improves over the regret bound derived in [15] for the MAB setting, where the regret of being conservative grows with time. In Section 4, we show how CLUCB can be extended to the case where the reward of the baseline policy is unknown, without a change in its rate of regret. Finally, in Section 5, we report experimental results that show CLUCB behaves as expected in practice and validate our theoretical analysis.

2 Problem Formulation

In this section, we first review the standard linear bandit setting and then introduce the conservative linear bandit formulation considered in this paper.

2.1 Linear Bandit

In the linear bandit setting, at any time t, the agent is given a set of (possibly infinitely many) actions/options $A_t$, where each action $a \in A_t$ is associated with a feature vector $\phi^t_a \in \mathbb{R}^d$. At each round t, the agent selects an action $a_t \in A_t$. Upon selecting $a_t$, the agent observes a random reward $y_t$ generated as
$$y_t = \langle \theta^*, \phi^t_{a_t} \rangle + \eta_t, \qquad (1)$$
where $\theta^* \in \mathbb{R}^d$ is the unknown reward parameter, $\langle \theta^*, \phi^t_{a_t} \rangle = r^t_{a_t}$ is the expected reward of action $a_t$ at time t, i.e., $r^t_{a_t} = \mathbb{E}[y_t]$, and $\eta_t$ is a random noise that satisfies

Assumption 1. Each element $\eta_t$ of the noise sequence $\{\eta_t\}_{t=1}^{\infty}$ is conditionally $\sigma$-sub-Gaussian, i.e.,
$$\forall \zeta \in \mathbb{R}, \qquad \mathbb{E}\big[e^{\zeta \eta_t} \mid a_{1:t}, \eta_{1:t-1}\big] \le \exp\Big(\frac{\zeta^2 \sigma^2}{2}\Big).$$

The sub-Gaussian assumption automatically implies that $\mathbb{E}[\eta_t \mid a_{1:t}, \eta_{1:t-1}] = 0$ and $\mathrm{Var}[\eta_t \mid a_{1:t}, \eta_{1:t-1}] \le \sigma^2$.
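As a concrete illustration of the observation model (1), the following minimal sketch (our own, not from the paper; Gaussian noise is used as one example of a $\sigma$-sub-Gaussian distribution, and all variable names are ours) simulates a reward pull:

```python
import numpy as np

# Minimal sketch of the observation model in Eq. (1): y_t = <theta*, phi_a> + eta_t.
rng = np.random.default_rng(0)
d, sigma = 4, 1.0
theta_star = rng.normal(size=d)  # the unknown reward parameter theta*

def pull(phi_a: np.ndarray) -> float:
    """Return a noisy reward for the action whose feature vector is phi_a."""
    return float(theta_star @ phi_a + sigma * rng.normal())
```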

Note that the above formulation allows time-varying action sets and time-dependent feature vectors for each action, and thus includes the linear contextual bandit setting. In a linear contextual bandit, if we denote by $x_t$ the state of the system at time t, the time-dependent feature vector $\phi^t_a$ for action a equals $\phi(x_t, a)$, the feature vector of the state-action pair $(x_t, a)$. We also make the following standard assumption on the unknown parameter $\theta^*$ and the feature vectors:

Assumption 2. There exist constants $B, D \ge 0$ such that $\|\theta^*\|_2 \le B$, $\|\phi^t_a\|_2 \le D$, and $\langle \theta^*, \phi^t_a \rangle \in [0,1]$, for all t and all $a \in A_t$.

We define $\mathcal{B} = \{\theta \in \mathbb{R}^d : \|\theta\|_2 \le B\}$ and $\mathcal{F} = \{\phi \in \mathbb{R}^d : \|\phi\|_2 \le D,\ \langle \theta^*, \phi \rangle \in [0,1]\}$ to be the parameter space and feature space, respectively.

Obviously, if the agent knew $\theta^*$, she would choose the optimal action $a^*_t = \arg\max_{a \in A_t} \langle \theta^*, \phi^t_a \rangle$ at each round t. Since $\theta^*$ is unknown, the agent's goal is to maximize her cumulative expected reward after T rounds, i.e., $\sum_{t=1}^{T} \langle \theta^*, \phi^t_{a_t} \rangle$, or equivalently, to minimize the (pseudo-)regret
$$R_T = \sum_{t=1}^{T} \langle \theta^*, \phi^t_{a^*_t} \rangle - \sum_{t=1}^{T} \langle \theta^*, \phi^t_{a_t} \rangle, \qquad (2)$$
which is the difference between the cumulative expected rewards of the optimal and the agent's strategies.

2.2 Conservative Linear Bandit

The conservative linear bandit setting is exactly the same as the linear bandit, except that there exists a baseline policy $\pi_b$ (the company's strategy) that at each round t selects action $b_t \in A_t$ and incurs the expected reward $r^t_{b_t} = \langle \theta^*, \phi^t_{b_t} \rangle$. We assume that the expected rewards of the actions taken by the baseline policy, $r^t_{b_t}$, are known. This is often a reasonable assumption, since we usually have access to a large amount of data generated by the baseline policy, as it is our company's strategy, and thus have a good estimate of its performance (see Remark 1 at the end of this section). We relax this assumption in Section 4 and extend our proposed algorithm to the case where the reward function of the baseline policy is not known in advance.

Another difference between the conservative and standard linear bandit settings is the performance constraint, which is defined as follows:

Definition 1 (Performance Constraint). At each round t, the difference between the performances of the baseline and the agent's policies should remain below a predefined fraction $\alpha \in (0,1)$ of the baseline performance. This constraint may be written formally as
$$\sum_{i=1}^{t} r^i_{b_i} - \sum_{i=1}^{t} r^i_{a_i} \le \alpha \sum_{i=1}^{t} r^i_{b_i}, \qquad \forall t \in \{1, \dots, T\},$$

or equivalently as
$$\sum_{i=1}^{t} r^i_{a_i} \ge (1 - \alpha) \sum_{i=1}^{t} r^i_{b_i}, \qquad \forall t \in \{1, \dots, T\}. \qquad (3)$$

The parameter $\alpha \in (0,1)$ controls how conservative the agent should be. Small values of $\alpha$ mean that only small losses are tolerated, and thus the agent should be overly conservative, whereas large values of $\alpha$ indicate that the manager is willing to take risk, and thus the agent can explore more and be less conservative. Given the value of $\alpha$, the goal of the agent is to select her actions in a way that both minimizes her regret (2) and satisfies the performance constraint (3). In the next section, we propose a linear bandit algorithm that achieves this goal.

Remark 1. As mentioned above, it is often reasonable to assume that we have access to a good estimate of the baseline reward function. If, in addition to this estimate, we have access to the data generated by the baseline policy that was used to compute it, we can use that data in our algorithm. The reason we do not use the data generated by the actions suggested by the baseline policy in constructing the confidence sets in our algorithm is mainly to keep the analysis simple. However, when we deal with the more general case of unknown baseline reward in Section 4, we construct the confidence sets using all available data, including the data generated by the baseline policy. It is also important to note that having a good estimate of the baseline reward function does not necessarily mean that we know the unknown parameter $\theta^*$, since the data used for this estimate has only been generated by the baseline policy, and thus may provide a good estimate of $\theta^*$ only in a limited subspace.

3 A Conservative Linear Bandit Algorithm

In this section, we propose a linear bandit algorithm, called conservative linear upper confidence bound (CLUCB), that is based on the optimism-in-the-face-of-uncertainty principle and, given the value of $\alpha$, both minimizes the regret (2) and satisfies the performance constraint (3). Algorithm 1 contains the pseudocode of CLUCB. At each round t, CLUCB uses the previous observations to build a confidence set $C_t$ that, with high probability, contains the unknown parameter $\theta^*$. It then selects the optimistic action $a'_t \in \arg\max_{a \in A_t} \max_{\theta \in C_t} \langle \theta, \phi^t_a \rangle$, which has the best performance among all the actions available in $A_t$, within the confidence set $C_t$. In order to make sure that constraint (3) is satisfied, the algorithm plays the optimistic action $a'_t$ only if it satisfies the constraint for the worst choice of the parameter $\theta \in C_t$. To make this more precise, let $S_{t-1}$ be the set of rounds $i < t$ at which CLUCB has played the optimistic action, i.e., $a_i = a'_i$. Accordingly, $S^c_{t-1} = \{1, 2, \dots, t-1\} \setminus S_{t-1}$ is the set of rounds $j < t$ at which CLUCB has followed the baseline policy, i.e., $a_j = b_j$.
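Before presenting the algorithm, note that (3) is a simple prefix condition on cumulative expected rewards; a direct check of it looks as follows (a minimal sketch of our own, with hypothetical argument names):

```python
def satisfies_constraint(agent_rewards, baseline_rewards, alpha):
    """Check Eq. (3): sum_i r_{a_i} >= (1 - alpha) * sum_i r_{b_i} at every round t."""
    agent_sum = baseline_sum = 0.0
    for r_a, r_b in zip(agent_rewards, baseline_rewards):
        agent_sum += r_a          # cumulative expected reward of the agent
        baseline_sum += r_b       # cumulative expected reward of the baseline
        if agent_sum < (1.0 - alpha) * baseline_sum:
            return False          # the safety constraint is violated at this round
    return True
```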

Algorithm 1 Pseudocode of CLUCB

Input: $\alpha$, $\mathcal{B}$, $\mathcal{F}$
Initialize: $S_0 = \emptyset$, $z_0 = \mathbf{0} \in \mathbb{R}^d$, and $C_1 = \mathcal{B}$
for t = 1, 2, 3, ... do
    Find $(a'_t, \theta'_t) \in \arg\max_{(a,\theta) \in A_t \times C_t} \langle \theta, \phi^t_a \rangle$
    Compute $L_t = \min_{\theta \in C_t} \langle \theta, z_{t-1} + \phi^t_{a'_t} \rangle$
    if $L_t + \sum_{i \in S^c_{t-1}} r^i_{b_i} \ge (1-\alpha) \sum_{i=1}^{t} r^i_{b_i}$ then
        Play $a_t = a'_t$ and observe the reward $y_t$ defined by (1)
        Set $z_t = z_{t-1} + \phi^t_{a_t}$, $S_t = S_{t-1} \cup \{t\}$, $S^c_t = S^c_{t-1}$
        Given $a_t$ and $y_t$, construct the confidence set $C_{t+1}$ according to (5)
    else
        Play $a_t = b_t$ and observe the reward $y_t$ defined by (1)
        Set $z_t = z_{t-1}$, $S_t = S_{t-1}$, $S^c_t = S^c_{t-1} \cup \{t\}$, $C_{t+1} = C_t$
    end if
end for

In order to guarantee that it does not violate constraint (3), at each round t, CLUCB plays the optimistic action, i.e., $a_t = a'_t$, only if
$$\min_{\theta \in C_t} \Bigg[ \sum_{i \in S^c_{t-1}} r^i_{b_i} + \Big\langle \theta,\ \underbrace{\sum_{i \in S_{t-1}} \phi^i_{a_i}}_{z_{t-1}} + \phi^t_{a'_t} \Big\rangle \Bigg] \ge (1-\alpha) \sum_{i=1}^{t} r^i_{b_i},$$
and plays the conservative action, i.e., $a_t = b_t$, otherwise. In the next section, we describe how CLUCB constructs and updates the confidence sets $C_t$.

3.1 Construction of Confidence Sets

CLUCB starts with the most general confidence set $C_1 = \mathcal{B}$ and updates its confidence set only when it plays an optimistic action. This is mainly for simplicity and is based on the idea that since the reward function of the baseline policy is known ahead of time, playing a baseline action does not provide any new information about the unknown parameter $\theta^*$. However, this can easily be changed so that the confidence set is updated after each action; this is in fact what we do in the algorithm proposed in Section 4.
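For ellipsoidal confidence sets of the form (5) below, the inner optimizations have closed forms: $\max_{\theta \in C_t} \langle \theta, x \rangle = \langle \hat{\theta}, x \rangle + \beta \|x\|_{V^{-1}}$ and $\min_{\theta \in C_t} \langle \theta, x \rangle = \langle \hat{\theta}, x \rangle - \beta \|x\|_{V^{-1}}$. The sketch below is our own illustration of one CLUCB round using these closed forms (it is not the paper's implementation); variable names mirror the pseudocode:

```python
import numpy as np

def clucb_step(features, theta_hat, V_inv, beta, z, paid_baseline, total_baseline, alpha):
    """One round of Algorithm 1. `features` has one row per action in A_t;
    z = sum of features of optimistic plays so far; paid_baseline = sum of r_b
    over conservative rounds; total_baseline = sum of r_b over all rounds 1..t."""
    # Optimistic action: largest upper confidence bound over the ellipsoid C_t.
    widths = np.sqrt(np.einsum('ij,jk,ik->i', features, V_inv, features))
    a_opt = int(np.argmax(features @ theta_hat + beta * widths))
    # L_t: pessimistic value of all optimistic plays, including the candidate one.
    x = z + features[a_opt]
    L_t = theta_hat @ x - beta * np.sqrt(x @ V_inv @ x)
    # Safety check of Algorithm 1; if it fails, the baseline action b_t is played.
    if L_t + paid_baseline >= (1.0 - alpha) * total_baseline:
        return a_opt, True   # play optimistically; round t joins S_t
    return None, False       # play conservatively (a_t = b_t)
```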

We follow the approach of [1] to build confidence sets for the unknown parameter $\theta^*$. Let $S_t = \{i_1, \dots, i_{m_t}\}$ be the set of rounds up to and including round t at which CLUCB has played the optimistic action, where we define $m_t = |S_t|$. For a fixed value of $\lambda > 0$, let
$$\hat{\theta}_t = \big( \Phi_t^\top \Phi_t + \lambda I \big)^{-1} \Phi_t^\top Y_t \qquad (4)$$
be the regularized least-squares estimate of $\theta^*$ at round t, where $\Phi_t = [\phi^{i_1}_{a_{i_1}}, \dots, \phi^{i_{m_t}}_{a_{i_{m_t}}}]^\top$ and $Y_t = [y_{i_1}, \dots, y_{i_{m_t}}]^\top$. For a fixed confidence parameter $\delta \in (0,1)$, we construct the confidence set for the next round t+1 as
$$C_{t+1} = \big\{ \theta \in \mathbb{R}^d : \|\theta - \hat{\theta}_t\|_{V_t} \le \beta_{t+1} \big\}, \qquad (5)$$
where $\beta_{t+1} = \sigma \sqrt{d \log\big( \frac{1 + (m_t+1) D^2/\lambda}{\delta} \big)} + \sqrt{\lambda} B$, $V_t = \lambda I + \Phi_t^\top \Phi_t$, and the weighted norm is defined as $\|x\|_V = \sqrt{x^\top V x}$ for any $x \in \mathbb{R}^d$ and any positive definite $V \in \mathbb{R}^{d \times d}$. Note that, similar to the linear UCB algorithm (LUCB) in [1], the sub-Gaussian parameter $\sigma$ and the regularization parameter $\lambda$ that appear in the definitions of $\beta_{t+1}$ and $V_t$ should also be given to the CLUCB algorithm as input.

The following proposition (Theorem 2 in [1]) shows that the confidence sets constructed as in (5) contain the true parameter $\theta^*$ with high probability.

Proposition 1. For any $\delta > 0$ and the confidence sets $C_t$ defined by (5), we have $\mathbb{P}[\theta^* \in C_t,\ \forall t \in \mathbb{N}] \ge 1 - \delta$.

Proposition 1 implies that the CLUCB algorithm satisfies the performance constraint (3) at every round t with probability at least $1 - \delta$. This is because at each round t, CLUCB ensures that (3) holds for all $\theta \in C_t$, and $\mathbb{P}[\theta^* \in C_t] \ge 1 - \delta$.
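In code, (4) and (5) reduce to a few lines of linear algebra; the following sketch (ours, under the notation above) computes the estimate, the Gram matrix, and the radius:

```python
import numpy as np

def confidence_set(Phi, Y, lam, sigma, B, D, delta):
    """Compute theta_hat_t of Eq. (4) and the radius beta_{t+1} of Eq. (5).
    Phi: (m_t x d) matrix of features of optimistic plays; Y: their rewards."""
    m_t, d = Phi.shape
    V = lam * np.eye(d) + Phi.T @ Phi                    # V_t = lambda*I + Phi'Phi
    theta_hat = np.linalg.solve(V, Phi.T @ Y)            # regularized least squares
    beta = sigma * np.sqrt(d * np.log((1 + (m_t + 1) * D**2 / lam) / delta)) \
           + np.sqrt(lam) * B
    return theta_hat, V, beta   # C_{t+1} = {theta : ||theta - theta_hat||_V <= beta}
```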

3.2 Regret Analysis of CLUCB

In this section, we prove a regret bound for the proposed CLUCB algorithm. Let $\Delta^t_{b_t} = r^t_{a^*_t} - r^t_{b_t}$ be the baseline gap at round t, i.e., the difference between the expected rewards of the optimal and baseline actions at round t. This quantity shows how suboptimal the action suggested by the baseline policy is at round t. We make the following assumption on the performance of the baseline policy $\pi_b$.

Assumption 3. There exist $0 \le \Delta_l \le \Delta_h$ and $0 < r_l < r_h$ such that, at each round t,
$$\Delta_l \le \Delta^t_{b_t} \le \Delta_h \qquad \text{and} \qquad r_l \le r^t_{b_t} \le r_h. \qquad (6)$$

An obvious candidate for both $\Delta_h$ and $r_h$ is 1, as all the mean rewards are confined in [0,1]. The reward lower bound $r_l$ ensures that the baseline policy maintains a minimum level of performance at each round. Finally, $\Delta_l = 0$ is a reasonable candidate for the lower bound of the baseline gap.

The following proposition shows that the regret of CLUCB can be decomposed into the regret of a linear UCB (LUCB) algorithm (e.g., [1]) and a regret caused by being conservative in order to satisfy the performance constraint (3).

Proposition 2. The regret of CLUCB can be decomposed into two terms as follows:
$$R_T(\text{CLUCB}) \le R_{S_T}(\text{LUCB}) + n_T \Delta_h, \qquad (7)$$
where $R_{S_T}(\text{LUCB})$ is the cumulative (pseudo-)regret of LUCB over the rounds $t \in S_T$, and $n_T = |S^c_T| = T - |S_T| = T - m_T$ is the number of rounds at which CLUCB has played the conservative action.

Proof. From the definition of the regret (2), we have
$$R_T(\text{CLUCB}) = \sum_{t=1}^{T} r^t_{a^*_t} - \sum_{t=1}^{T} r^t_{a_t} = \sum_{t \in S_T} \big( r^t_{a^*_t} - r^t_{a_t} \big) + \sum_{t \in S^c_T} \big( r^t_{a^*_t} - r^t_{b_t} \big) = \sum_{t \in S_T} \big( r^t_{a^*_t} - r^t_{a_t} \big) + \sum_{t \in S^c_T} \Delta^t_{b_t} \le \sum_{t \in S_T} \big( r^t_{a^*_t} - r^t_{a_t} \big) + n_T \Delta_h. \qquad (8)$$
The result follows from the fact that for $t \in S_T$, CLUCB plays exactly the same actions as LUCB, and thus the first term in (8) represents LUCB's regret over these rounds.

The regret bound of LUCB for the confidence sets defined by (5) can be derived from the results of [1]. Let $E$ be the event that $\theta^* \in C_t,\ \forall t \in \mathbb{N}$, which according to Proposition 1 holds with probability at least $1 - \delta$. The following proposition provides a bound on $R_{S_T}(\text{LUCB})$. Since this proposition is a direct application of Theorem 3 in [1], we omit its proof here.

Proposition 3. On event $E$, for any $T \in \mathbb{N}$, we have
$$R_{S_T}(\text{LUCB}) \le 4 \sqrt{m_T\, d \log\Big( \lambda + \frac{m_T D}{d} \Big)} \Bigg[ B \sqrt{\lambda} + \sigma \sqrt{2 \log\big( \tfrac{1}{\delta} \big) + d \log\Big( 1 + \frac{m_T D}{\lambda d} \Big)} \Bigg] = O\Big( d \log\Big( \frac{D T}{\lambda \delta} \Big) \sqrt{T} \Big). \qquad (9)$$

Now, in order to bound the regret of CLUCB, we only need an upper bound on $n_T$, i.e., the number of times CLUCB deviates from LUCB and selects the action suggested by the baseline policy. We start this part of the proof with the following lemma.

Lemma 4. For given $k \in \mathbb{N}$, $\lambda > 0$, and any sequence $X_1, X_2, \dots, X_k$ in $\mathbb{R}^d$ such that $\|X_i\|_2 \le D$ for all i, let $V_0 = \lambda I$ and $V_i = \lambda I + \sum_{j=1}^{i} X_j X_j^\top$ for $1 \le i \le k$. Then, we have
$$\sum_{i=1}^{k} \min\Big( 1, \|X_i\|^2_{V_{i-1}^{-1}} \Big) \le 2 d \log\Big( 1 + \frac{k D^2}{\lambda d} \Big). \qquad (10)$$

Lemma 4 is a direct application of Lemma 11 in [1], so we omit its proof here. The following theorem provides a bound on the number of rounds at which CLUCB acts conservatively and follows the baseline policy $\pi_b$.

Theorem 5. Assume that $\lambda \ge D^2$. On event $E$, for any horizon $T \in \mathbb{N}$, we have
$$n_T \le 1 + \frac{114\, d^2 (B\sqrt{\lambda} + \sigma)^2}{\alpha r_l\, (\Delta_l + \alpha r_l)} \Bigg[ \log\Bigg( \frac{64\, d (B\sqrt{\lambda} + \sigma) D}{\sqrt{\lambda \delta}\, (\Delta_l + \alpha r_l)} \Bigg) \Bigg]^2.$$

Proof. Let $\tau$ be the last round at which CLUCB plays conservatively (the action suggested by the baseline policy), i.e., $\tau = \max\{ 1 \le t \le T : a_t = b_t \}$. From Algorithm 1, at round $\tau$, we may write
$$\min_{\theta \in C_\tau} \Big\langle \theta, \sum_{t \in S_{\tau-1}} \phi^t_{a_t} + \phi^\tau_{a'_\tau} \Big\rangle + \sum_{t \in S^c_{\tau-1}} r^t_{b_t} < (1-\alpha) \sum_{t=1}^{\tau} r^t_{b_t},$$
or equivalently,
$$\alpha \sum_{t=1}^{\tau} r^t_{b_t} < \sum_{t \in S_{\tau-1}} r^t_{b_t} + r^\tau_{b_\tau} - \min_{\theta \in C_\tau} \Big\langle \theta, \sum_{t \in S_{\tau-1}} \phi^t_{a_t} + \phi^\tau_{a'_\tau} \Big\rangle. \qquad (11)$$
We may rewrite the right-hand side of (11) as
$$\sum_{t \in S_{\tau-1}} \Big[ r^t_{b_t} - \max_{\theta \in C_t} \langle \theta, \phi^t_{a_t} \rangle \Big] + \sum_{t \in S_{\tau-1}} \Big[ \max_{\theta \in C_t} \langle \theta, \phi^t_{a_t} \rangle - \langle \theta^*, \phi^t_{a_t} \rangle \Big] + \Big[ r^\tau_{b_\tau} - \max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle \Big] + \Big[ \max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle - \langle \theta^*, \phi^\tau_{a'_\tau} \rangle \Big] + \max_{\theta \in C_\tau} \Big\langle \theta^* - \theta,\ \sum_{t \in S_{\tau-1}} \phi^t_{a_t} + \phi^\tau_{a'_\tau} \Big\rangle. \qquad (12)$$
Note that for each $t \in S_{\tau-1}$, the optimistic choice of $a_t$ and the fact that $\theta^* \in C_t$ give
$$r^t_{b_t} - \max_{\theta \in C_t} \langle \theta, \phi^t_{a_t} \rangle \le r^t_{b_t} - r^t_{a^*_t} = -\Delta^t_{b_t}, \qquad (13)$$
and similarly, for the round $\tau$, we have
$$r^\tau_{b_\tau} - \max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle \le r^\tau_{b_\tau} - r^\tau_{a^*_\tau} = -\Delta^\tau_{b_\tau}. \qquad (14)$$
Using inequalities (12) to (14), we may rewrite (11) as

$$\alpha \sum_{t=1}^{\tau} r^t_{b_t} < -(m_{\tau-1}+1)\Delta_l + \sum_{t \in S_{\tau-1}} \max_{\theta \in C_t} \langle \theta - \theta^*, \phi^t_{a_t} \rangle + \max_{\theta \in C_\tau} \langle \theta - \theta^*, \phi^\tau_{a'_\tau} \rangle + \max_{\theta \in C_\tau} \Big\langle \theta^* - \theta,\ \sum_{t \in S_{\tau-1}} \phi^t_{a_t} + \phi^\tau_{a'_\tau} \Big\rangle, \qquad (15)$$
where we also used Assumption 3 to lower-bound the baseline gaps by $\Delta_l$. Bounding each remaining term via the Cauchy-Schwarz inequality and the definition of the confidence sets (each max term by $2\beta_t \|\phi^t_{a_t}\|_{V_{t-1}^{-1}}$, and the last term analogously), and using that the $\beta_t$ are non-decreasing, we obtain
$$\alpha \sum_{t=1}^{\tau} r^t_{b_t} \le -(m_{\tau-1}+1)\Delta_l + 4\beta_\tau \|\phi^\tau_{a'_\tau}\|_{V_{\tau-1}^{-1}} + 4\beta_\tau \sum_{t \in S_{\tau-1}} \|\phi^t_{a_t}\|_{V_{t-1}^{-1}}. \qquad (16)$$
On the other hand, it follows from (15) and the fact that all rewards are in [0,1] that
$$\alpha \sum_{t=1}^{\tau} r^t_{b_t} \le -(m_{\tau-1}+1)\Delta_l + 4(m_{\tau-1}+1). \qquad (17)$$
Combining (16) and (17), we may write
$$\alpha \sum_{t=1}^{\tau} r^t_{b_t} \le -(m_{\tau-1}+1)\Delta_l + 4\beta_\tau \Bigg[ \min\Big( \|\phi^\tau_{a'_\tau}\|_{V_{\tau-1}^{-1}},\ 1 \Big) + \sum_{t \in S_{\tau-1}} \min\Big( \|\phi^t_{a_t}\|_{V_{t-1}^{-1}},\ 1 \Big) \Bigg]. \qquad (18)$$
In order to write the next equation more compactly, let us define
$$\Gamma_\tau = \min\Big( \|\phi^\tau_{a'_\tau}\|^2_{V_{\tau-1}^{-1}},\ 1 \Big) + \sum_{t \in S_{\tau-1}} \min\Big( \|\phi^t_{a_t}\|^2_{V_{t-1}^{-1}},\ 1 \Big).$$
From the Cauchy-Schwarz inequality and Lemma 4, we have

$$\alpha \sum_{t=1}^{\tau} r^t_{b_t} \le -(m_{\tau-1}+1)\Delta_l + 4\beta_\tau \sqrt{(m_{\tau-1}+1)\, \Gamma_\tau} \le -(m_{\tau-1}+1)\Delta_l + 4\beta_\tau \sqrt{2 (m_{\tau-1}+1)\, d \log\Big( 1 + \frac{m_\tau D^2}{\lambda d} \Big)}$$
$$\le -(m_{\tau-1}+1)\Delta_l + 8 \sqrt{(m_{\tau-1}+1)\, d \log\Big( 1 + \frac{(m_{\tau-1}+1) D^2}{\lambda d} \Big)} \Bigg[ \sqrt{\lambda} B + \sigma \sqrt{d \log\Big( \frac{1 + (m_{\tau-1}+1) D^2/\lambda}{\delta} \Big)} \Bigg]$$
$$\le -(m_{\tau-1}+1)\Delta_l + 8\, d\, (B\sqrt{\lambda} + \sigma) \sqrt{m_{\tau-1}+1}\ \log\Big( \frac{2 (m_{\tau-1}+1) D^2}{\sqrt{\lambda \delta}} \Big),$$
where the last inequality follows from the fact that $\lambda \ge D^2$. On the other hand, since $r^t_{b_t} \ge r_l$ for all t and $\tau = n_{\tau-1} + m_{\tau-1} + 1$, we may write
$$\alpha r_l\, n_{\tau-1} \le -(m_{\tau-1}+1)(\Delta_l + \alpha r_l) + 8\, d\, (B\sqrt{\lambda} + \sigma) \sqrt{m_{\tau-1}+1}\ \log\Big( \frac{2 (m_{\tau-1}+1) D^2}{\sqrt{\lambda \delta}} \Big). \qquad (19)$$
The RHS of (19) is positive only for a finite range of $m_{\tau-1}$, and thus has a finite upper bound. For $\bar{m} = m_{\tau-1}+1$, $c_1 = 8 d (B\sqrt{\lambda}+\sigma)$, $c_2 = \frac{2 D^2}{\sqrt{\lambda \delta}}$, and $c_3 = \Delta_l + \alpha r_l$, Lemma 8 (reported in Appendix A) provides the following upper bound on the RHS (and thus on the LHS) of (19):
$$\alpha r_l\, n_{\tau-1} \le \frac{114\, d^2 (B\sqrt{\lambda}+\sigma)^2}{\Delta_l + \alpha r_l} \Bigg[ \log\Bigg( \frac{64\, d (B\sqrt{\lambda}+\sigma) D}{\sqrt{\lambda \delta}\, (\Delta_l + \alpha r_l)} \Bigg) \Bigg]^2.$$
The result follows from $n_T = n_\tau = n_{\tau-1} + 1$.

We now have all the necessary ingredients to derive a regret bound on the performance of the CLUCB algorithm. We report the regret bound of CLUCB in Theorem 6, whose proof is a direct consequence of the results of Propositions 2 and 3 and Theorem 5.

Theorem 6. With probability at least $1-\delta$, the CLUCB algorithm satisfies the performance constraint (3) for all $t \in \mathbb{N}$, and has the following regret bound:
$$R_T(\text{CLUCB}) = O\Bigg( d \log\Big( \frac{D T}{\lambda \delta} \Big) \sqrt{T} + \frac{K\, \Delta_h}{\alpha r_l} \Bigg), \qquad (20)$$
where K is a constant that depends only on the parameters of the problem:
$$K = \frac{d^2 (B\sqrt{\lambda}+\sigma)^2}{\Delta_l + \alpha r_l} \Bigg[ \log\Bigg( \frac{64\, d (B\sqrt{\lambda}+\sigma) D}{\sqrt{\lambda \delta}\, (\Delta_l + \alpha r_l)} \Bigg) \Bigg]^2.$$
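To get a feel for the size of the conservative term, the constant K can be evaluated numerically. The sketch below is ours and uses our reconstruction of the displayed formula; all input values are illustrative:

```python
import numpy as np

def theorem6_K(d, B, sigma, lam, delta, D, delta_l, alpha, r_l):
    """Evaluate the constant K of Theorem 6 for given problem parameters."""
    denom = delta_l + alpha * r_l
    log_term = np.log(64 * d * (B * np.sqrt(lam) + sigma) * D
                      / (np.sqrt(lam * delta) * denom))
    return d**2 * (B * np.sqrt(lam) + sigma)**2 / denom * log_term**2

# The conservative regret term K * Delta_h / (alpha * r_l) blows up as alpha -> 0,
# matching Remark 2 below; e.g., compare alpha = 0.01 and alpha = 0.1:
for alpha in (0.01, 0.1):
    print(alpha, theorem6_K(4, 1.0, 1.0, 1.0, 0.001, 1.0, 0.0, alpha, 0.5) / (alpha * 0.5))
```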

Remark 2. The first term in the regret bound (20) is the regret of LUCB, which grows at the rate $\sqrt{T}\log(T)$. The second term accounts for the loss incurred by being conservative in order to satisfy the performance constraint (3). Our results indicate that this loss does not grow with time (since CLUCB acts conservatively only in a finite number of rounds). This improves over the regret bound derived in [15] for the MAB setting, where the regret of being conservative grows with time. Furthermore, the regret bound of Theorem 6 clearly indicates that CLUCB's regret is larger for smaller values of $\alpha$. This perfectly matches the intuition that the agent must be more conservative, and thus suffers higher regret, for smaller values of $\alpha$. Theorem 6 also indicates that CLUCB's regret is smaller for smaller values of $\Delta_h$, because when the baseline policy $\pi_b$ is close to optimal, the algorithm does not lose much by being conservative.

4 Unknown Baseline Reward

In this section, we consider the case where the expected rewards of the actions taken by the baseline policy, $r^t_{b_t}$, are unknown at the beginning. We show how the CLUCB algorithm presented in Section 3 should be changed to handle this case, and present a new algorithm, called CLUCB2. We prove a regret bound for CLUCB2 of the same rate as that of CLUCB. This shows that the lack of knowledge about the reward function of the baseline policy does not hurt our algorithm in terms of the rate of the regret.

Algorithm 2 contains the pseudocode of CLUCB2. The main difference with CLUCB is in the condition that is checked at each round t to decide whether to play the optimistic action $a'_t$ or the conservative action $b_t$. This condition should be selected in a way that CLUCB2 satisfies the constraint (3). We may rewrite (3) as
$$\sum_{i \in S_{t-1}} r^i_{a_i} + r^t_{a_t} + \alpha \sum_{i \in S^c_{t-1}} r^i_{b_i} \ge (1-\alpha) \Big( \sum_{i \in S_{t-1}} r^i_{b_i} + r^t_{b_t} \Big). \qquad (21)$$
If we lower-bound the LHS and upper-bound the RHS of (21), we obtain
$$\min_{\theta \in C_t} \Big\langle \theta, \sum_{i \in S_{t-1}} \phi^i_{a_i} + \phi^t_{a_t} \Big\rangle + \alpha \min_{\theta \in C_t} \Big\langle \theta, \sum_{i \in S^c_{t-1}} \phi^i_{b_i} \Big\rangle \ge (1-\alpha) \max_{\theta \in C_t} \Big\langle \theta, \sum_{i \in S_{t-1}} \phi^i_{b_i} + \phi^t_{b_t} \Big\rangle. \qquad (22)$$
Since each confidence set $C_t$ is built in a way to contain the true parameter $\theta^*$ with high probability, it is easy to see that (21) is satisfied whenever (22) is true.

CLUCB2 uses both optimistic and conservative actions, and their corresponding rewards, in building its confidence sets. Specifically, for any t, we let $\Phi_t = [\phi^1_{a_1}, \phi^2_{a_2}, \dots, \phi^t_{a_t}]^\top$, $Y_t = [y_1, y_2, \dots, y_t]^\top$, $V_t = \lambda I + \Phi_t^\top \Phi_t$, and define the least-squares estimate after round t as
$$\hat{\theta}_t = (\Phi_t^\top \Phi_t + \lambda I)^{-1} \Phi_t^\top Y_t. \qquad (23)$$

Algorithm 2 CLUCB2

Input: $\alpha$, $r_l$, $\mathcal{B}$, $\mathcal{F}$
Initialize: $n \leftarrow 0$, $z \leftarrow \mathbf{0}$, $w \leftarrow \mathbf{0}$, $v \leftarrow \mathbf{0}$, and $C_1 \leftarrow \mathcal{B}$
for t = 1, 2, 3, ... do
    Let $b_t$ be the action suggested by $\pi_b$ at round t
    Find $(a'_t, \tilde{\theta}_t) \in \arg\max_{(a,\theta) \in A_t \times C_t} \langle \theta, \phi^t_a \rangle$
    Find $R_t = \max_{\theta \in C_t} \langle \theta, v + \phi^t_{b_t} \rangle$ and $L_t = \min_{\theta \in C_t} \langle \theta, z + \phi^t_{a'_t} \rangle + \alpha \max\big\{ \min_{\theta \in C_t} \langle \theta, w \rangle,\ n\, r_l \big\}$
    if $L_t \ge (1-\alpha) R_t$ then
        Play $a_t = a'_t$ and observe $y_t$ defined by (1)
        Set $z \leftarrow z + \phi^t_{a_t}$ and $v \leftarrow v + \phi^t_{b_t}$
    else
        Play $a_t = b_t$ and observe $y_t$ defined by (1)
        Set $w \leftarrow w + \phi^t_{b_t}$ and $n \leftarrow n + 1$
    end if
    Given $a_t$ and $y_t$, construct the confidence set $C_{t+1}$ according to (24)
end for

Given $V_t$ and $\hat{\theta}_t$, the confidence set for round t+1 is constructed as
$$C_{t+1} = \big\{ \theta \in C_t : \|\theta - \hat{\theta}_t\|_{V_t} \le \beta_{t+1} \big\}, \qquad (24)$$
where $C_1 = \mathcal{B}$ and $\beta_t = \sigma \sqrt{d \log\big( \frac{1 + t D^2/\lambda}{\delta} \big)} + B \sqrt{\lambda}$. Similar to Proposition 1, we can easily prove that the confidence sets built by (24) contain the true parameter $\theta^*$ with high probability, i.e., $\mathbb{P}[\theta^* \in C_t,\ \forall t \in \mathbb{N}] \ge 1 - \delta$.

Remark 3. Note that unlike the CLUCB algorithm, here we build nested confidence sets, i.e., $C_{t+1} \subseteq C_t \subseteq C_{t-1} \subseteq \cdots$, which is necessary for the proof of the algorithm. Potentially, this can increase the computational complexity of CLUCB2, but from a practical point of view, the confidence sets become nested automatically after sufficient data has been observed. Therefore, the nestedness constraint in building the confidence sets can be relaxed at sufficiently large rounds.

The following theorem guarantees that CLUCB2 satisfies the safety constraint (3) with high probability, while its regret has the same rate as that of CLUCB and is worse than that of LUCB only up to an additive constant.

Theorem 7. With probability at least $1-\delta$, the CLUCB2 algorithm satisfies the performance constraint (3) for all $t \in \mathbb{N}$, and has the following regret bound:
$$R_T(\text{CLUCB2}) = O\Bigg( d \log\Big( \frac{D T}{\lambda \delta} \Big) \sqrt{T} + \frac{K\, \Delta_h}{\alpha^2 r_l^2} \Bigg), \qquad (25)$$
where K is a constant that depends only on the parameters of the problem:
$$K = 256\, d^2 (B\sqrt{\lambda}+\sigma)^2 \Bigg[ \log\Bigg( \frac{10\, d (B\sqrt{\lambda}+\sigma) \sqrt{D}}{\alpha r_l\, (\lambda \delta)^{1/4}} \Bigg) \Bigg]^2 + 1.$$
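Returning to Algorithm 2's decision rule: since the confidence sets in (24) are intersections of ellipsoids, a practical sketch can treat $C_t$ as the current ellipsoid alone (Remark 3 notes that the nestedness can be relaxed), in which case $L_t$ and $R_t$ again have closed forms. The following is our own illustration, not the paper's implementation:

```python
import numpy as np

def clucb2_check(theta_hat, V_inv, beta, z, v, w, n, phi_a, phi_b, r_l, alpha):
    """Return True if CLUCB2 may play the optimistic action (L_t >= (1-alpha)*R_t).
    z, v: feature sums over optimistic rounds (actions played / baseline actions);
    w, n: feature sum and count of conservative rounds."""
    def lo(x):  # min over the ellipsoid C_t of <theta, x>
        return theta_hat @ x - beta * np.sqrt(x @ V_inv @ x)
    def hi(x):  # max over the ellipsoid C_t of <theta, x>
        return theta_hat @ x + beta * np.sqrt(x @ V_inv @ x)
    L_t = lo(z + phi_a) + alpha * max(lo(w), n * r_l)
    R_t = hi(v + phi_b)
    return L_t >= (1.0 - alpha) * R_t
```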

Figure 1: Average (over 1,000 runs) per-step regret of LUCB and CLUCB for different values of α.

We report the proof of Theorem 7 in Appendix B. The proof follows the same steps as that of Theorem 6, with additional non-trivial technicalities that are highlighted there.

5 Simulation Results

In this section, we provide simulation results to illustrate the performance of the proposed CLUCB algorithm. We consider a time-independent action set of 100 arms, each with a time-independent feature vector living in $\mathbb{R}^4$. These feature vectors and the parameter $\theta^*$ are randomly drawn from $N(0, I_4)$ such that the mean reward associated with each arm is positive. The observation noise at each time step is generated independently from $N(0,1)$, and the mean reward of the baseline policy at any time is taken to be the reward of the 10th best action. We set $\lambda = 1$, fix the confidence parameter $\delta$, and average the results over 1,000 realizations.
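For reference, the experimental setup above can be reproduced along the following lines (a sketch under our reading of the setup; in particular, redrawing each arm's feature vector until its mean reward is positive is one simple way to realize the stated condition):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_arms = 4, 100
theta_star = rng.normal(size=d)             # parameter drawn from N(0, I_4)
features = np.empty((n_arms, d))
for i in range(n_arms):                     # redraw each arm until its mean reward
    phi = rng.normal(size=d)                # <theta*, phi> is positive
    while phi @ theta_star <= 0:
        phi = rng.normal(size=d)
    features[i] = phi
means = features @ theta_star
baseline_reward = np.sort(means)[-10]       # baseline: reward of the 10th best arm
noise = lambda: rng.normal()                # N(0, 1) observation noise
```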

In Figure 1, we plot the per-step regret (i.e., $R_t/t$) of LUCB and CLUCB for different values of α over a horizon T = 40,000. Figure 1 shows that the per-step regret of CLUCB remains constant at the beginning (the conservative phase), because during this phase, CLUCB follows the baseline policy to make sure that the performance constraint (3) is satisfied. As expected, the length of the conservative phase decreases as α is increased, since the performance constraint is relaxed for larger values of α, and hence CLUCB starts playing optimistic actions more quickly. After this initial conservative phase, CLUCB has learned enough about the optimal action, and its performance starts converging to that of LUCB. On the other hand, Figure 1 shows that the per-step regret of CLUCB in the first few periods remains much lower than that of LUCB. This is because LUCB plays agnostically to the safety constraint, and thus may select very poor actions in its initial exploration phase.

Figure 2: Percentage of the rounds, in the first 1,000 rounds, at which the safety constraint is violated by LUCB and CLUCB for different values of α.

In this regard, Figure 2 plots the percentage of the rounds, in the first 1,000 rounds, at which the safety constraint (3) is violated by LUCB and CLUCB for different values of α. According to this figure, CLUCB always satisfies the performance constraint for all the values of α, while LUCB fails in a significant number of rounds, especially for small values of α (i.e., a tight constraint). To better see the effect of the safety constraint on the regret of the algorithms, Figure 3 plots the per-step regret achieved by CLUCB at round t = 40,000 for different values of α, as well as that of LUCB. As expected from our analysis and as shown in Figure 1, the performance of CLUCB converges to that of LUCB after an initial conservative phase. Figure 3 confirms that this convergence happens more quickly for larger values of α, where the safety constraint is relaxed.

Figure 3: Per-step regret of LUCB and CLUCB for different values of α, at round t = 40,000.

6 Conclusion

In this paper, we studied the concept of safety in contextual linear bandits to address the challenges that arise in implementing such algorithms in practical situations such as ad recommendation systems. Most of the existing linear bandit algorithms, such as LUCB [1], suffer from a large regret in their initial exploratory rounds. This unsafe behavior is not acceptable in many practical situations, where having a reasonable performance at all times is necessary for a learning algorithm to be considered reliable and to remain in production. To guarantee safe learning, we formulated a conservative linear bandit problem, where the performance of the learning algorithm (measured in terms of its cumulative rewards) at any time is constrained to be at least as good as a fraction of the performance of a baseline policy. We proposed a conservative version of the LUCB algorithm, called CLUCB, to solve this constrained problem, and showed that it satisfies the safety constraint with high probability, while achieving a regret bound equivalent to that of LUCB up to an additive time-independent constant. We designed two versions of CLUCB that can be used depending on whether the reward function of the baseline policy is known or unknown, and showed that in each case, CLUCB acts conservatively (i.e., plays the action suggested by the baseline policy) only in a finite number of rounds, which depends on how suboptimal the baseline policy is. We reported simulation results that support our analysis and show the performance of the proposed CLUCB algorithm.

References

[1] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, 2011.

[2] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47, 2002.

[3] L. Bottou, J. Peters, J. Quiñonero-Candela, D. Charles, D. Chickering, E. Portugaly, D. Ray, P. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14, 2013.

[4] W. Chu, L. Li, L. Reyzin, and R. Schapire. Contextual bandits with linear payoff functions. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

[5] V. Dani, T. Hayes, and S. Kakade. Stochastic linear optimization under bandit feedback. In COLT, 2008.

[6] N. Jiang and L. Li. Doubly robust off-policy value evaluation for reinforcement learning. In Proceedings of the Thirty-Third International Conference on Machine Learning, 2016.

[7] M. Petrik, M. Ghavamzadeh, and Y. Chow. Safe policy improvement by minimizing robust baseline regret. In Advances in Neural Information Processing Systems, 2016.

[8] P. Rusmevichientong and J. Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2), 2010.

[9] D. Russo and B. Van Roy. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4), 2014.

[10] A. Swaminathan and T. Joachims. Batch learning from logged bandit feedback through counterfactual risk minimization. Journal of Machine Learning Research, 16, 2015.

[11] A. Swaminathan and T. Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In Proceedings of the 32nd International Conference on Machine Learning, 2015.

[12] G. Theocharous, P. Thomas, and M. Ghavamzadeh. Building personal ad recommendation systems for life-time value optimization with guarantees. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[13] P. Thomas, G. Theocharous, and M. Ghavamzadeh. High confidence off-policy evaluation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[14] P. Thomas, G. Theocharous, and M. Ghavamzadeh. High confidence policy improvement. In Proceedings of the Thirty-Second International Conference on Machine Learning, 2015.

[15] Y. Wu, R. Shariff, T. Lattimore, and C. Szepesvári. Conservative bandits. In Proceedings of the 33rd International Conference on Machine Learning, 2016.

A Technical Details of the Proofs of Theorems 5 and 7

In the proofs of Theorems 5 and 7, we use the following lemma to bound the RHS of (19) and (43).

Lemma 8. For any $\bar{m} \ge 2$ and $c_1, c_2, c_3 > 0$, the following holds:
$$-c_3 \bar{m} + c_1 \sqrt{\bar{m}} \log(c_2 \bar{m}) \le \frac{16 c_1^2}{9 c_3} \Bigg[ \log\Bigg( \frac{2 c_1 e \sqrt{c_2}}{c_3} \Bigg) \Bigg]^2. \qquad (26)$$

Proof. Define the LHS of (26) as a function $g(\bar{m})$, $\bar{m} \ge 2$, i.e., $g(\bar{m}) = -c_3 \bar{m} + c_1 \sqrt{\bar{m}} \log(c_2 \bar{m})$. First, note that we have
$$g'(\bar{m}) = -c_3 + \frac{c_1 \big( 2 + \log(c_2 \bar{m}) \big)}{2 \sqrt{\bar{m}}} \qquad \text{and} \qquad g''(\bar{m}) = -\frac{c_1 \log(c_2 \bar{m})}{4 \bar{m} \sqrt{\bar{m}}}.$$
This implies that, since $c_2 > 1$, g is a differentiable concave function over its domain $[2, \infty)$, and thus we can find $m^*$, the global maximum of the function g. The first-order condition $g'(m^*) = 0$ gives us
$$2 + \log(c_2 m^*) = \frac{2 c_3 \sqrt{m^*}}{c_1}. \qquad (27)$$
Plugging this into the definition of g, we obtain
$$g^* = \max_{\bar{m} \ge 2} g(\bar{m}) = g(m^*) = c_3 m^* - 2 c_1 \sqrt{m^*}.$$
Now, we use the change of variable $x = \frac{c_3 \sqrt{m^*}}{2 c_1}$, which, combined with the last display, gives us
$$g^* = \frac{4 c_1^2}{c_3} (x^2 - x). \qquad (28)$$
On the other hand, (27) becomes
$$\log\Big( \frac{4 c_1^2 c_2 e^2}{c_3^2} \Big) + 2 \log(x) = 4x.$$
Taking exp of both sides gives us
$$\frac{e^{4x}}{x^2} = \frac{4 c_1^2 c_2 e^2}{c_3^2}.$$
Now, since $x^2 \le e^x$ for all $x \ge 0$, it follows from the last display that
$$\frac{4 c_1^2 c_2 e^2}{c_3^2} = \frac{e^{4x}}{x^2} \ge \frac{e^{4x}}{e^x} = e^{3x},$$

which indicates that
$$x \le \frac{1}{3} \log\Big( \frac{4 c_1^2 c_2 e^2}{c_3^2} \Big).$$
Plugging this into (28) gives us
$$g^* \le \frac{4 c_1^2}{c_3} x^2 \le \frac{4 c_1^2}{9 c_3} \Bigg[ \log\Big( \frac{4 c_1^2 c_2 e^2}{c_3^2} \Big) \Bigg]^2 = \frac{16 c_1^2}{9 c_3} \Bigg[ \log\Bigg( \frac{2 c_1 e \sqrt{c_2}}{c_3} \Bigg) \Bigg]^2.$$
The statement follows from the fact that $g(\bar{m}) \le g^*$ for any $\bar{m} \ge 2$.

B Proof of Theorem 7

Proof. Suppose the confidence sets do not fail, which is true with probability at least $1-\delta$. Then, all the constraints are satisfied by CLUCB2, since it ensures that they hold for the worst parameter in the confidence set at every round. Similar to Proposition 2, we can decompose the regret of CLUCB2 as
$$R_T(\text{CLUCB2}) = \sum_{t \in S_T} \big( r^t_{a^*_t} - r^t_{a_t} \big) + \sum_{t \in S^c_T} \big( r^t_{a^*_t} - r^t_{b_t} \big) \le \sum_{t \in S_T} \big( r^t_{a^*_t} - r^t_{a_t} \big) + n_T \Delta_h,$$
where $n_T = |S^c_T|$ is the number of times CLUCB2 follows the baseline policy in T time steps. Now note that for $t \in S_T$, CLUCB2 follows the action suggested by LUCB, and hence
$$\sum_{t \in S_T} \big( r^t_{a^*_t} - r^t_{a_t} \big) \le R_{S_T}(\text{LUCB}) \le R_T(\text{LUCB}), \qquad (29)$$
where $R_{S_T}(\text{LUCB})$ denotes the regret of LUCB played at the time steps $t \in S_T$, which is upper-bounded by the regret of LUCB played at all T time steps. On the other hand, by Proposition 3, we have the following regret bound for LUCB:
$$R_T(\text{LUCB}) = O\Big( d \log\Big( \frac{D T}{\lambda \delta} \Big) \sqrt{T} \Big).$$
Thus, it follows that
$$R_T(\text{CLUCB2}) = O\Big( d \log\Big( \frac{D T}{\lambda \delta} \Big) \sqrt{T} + n_T \Delta_h \Big). \qquad (30)$$
Note that, according to (24), the confidence set $C_t$ that CLUCB2 uses to find the optimistic action at round t is built not only on the observations made by previously played optimistic actions, but also on the observations made when the baseline policy was followed at rounds before t. Therefore, the confidence set $C_t$ used by CLUCB2 at round

t would be tighter than the one LUCB would have had if it were applied only to the rounds in $S_t$. Hence, the first inequality in (29) still holds.

Given (30), we only need to show that CLUCB2 follows the baseline policy only at a finite number of rounds. Let $\tau$ be the last round at which CLUCB2 follows the baseline policy (plays conservatively), i.e., $\tau = \max\{ 1 \le t \le T : a_t = b_t \}$. At round $\tau$, let
$$L_\tau = \min_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1}} \phi^i_{a_i} + \phi^\tau_{a'_\tau} \Big\rangle + \alpha \max\Big\{ \min_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S^c_{\tau-1}} \phi^i_{b_i} \Big\rangle,\ n_{\tau-1} r_l \Big\},$$
which satisfies
$$L_\tau \ge \min_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1}} \phi^i_{a_i} + \phi^\tau_{a'_\tau} \Big\rangle + \alpha\, n_{\tau-1} r_l,$$
and
$$R_\tau = \max_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1} \cup \{\tau\}} \phi^i_{b_i} \Big\rangle.$$
From Algorithm 2 at round $\tau$, we have $L_\tau < (1-\alpha) R_\tau$, which with some simple algebra translates to
$$\alpha\, n_{\tau-1} r_l \le \max_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1} \cup \{\tau\}} \phi^i_{b_i} \Big\rangle - \min_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1}} \phi^i_{a_i} + \phi^\tau_{a'_\tau} \Big\rangle - \alpha \max_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1} \cup \{\tau\}} \phi^i_{b_i} \Big\rangle. \qquad (31)$$
The rest of the proof is devoted to using (31) to prove a time-independent upper bound on $n_{\tau-1}$. Unlike in the proof of Theorem 5, we rely here on the nestedness of the confidence sets built in (24). If the confidence sets do not fail (i.e., $\theta^* \in C_i$ for all i), then since $C_\tau \subseteq C_i$ for all $i \le \tau$, we have
$$\max_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1} \cup \{\tau\}} \phi^i_{b_i} \Big\rangle \le \sum_{i \in S_{\tau-1} \cup \{\tau\}} \max_{\theta \in C_i} \langle \theta, \phi^i_{b_i} \rangle. \qquad (32)$$
First, since the maximum is at least its value at $\theta^* \in C_\tau$ and $\langle \theta^*, \phi^i_{b_i} \rangle = r^i_{b_i} \ge r_l$, it follows that
$$-\alpha \max_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1} \cup \{\tau\}} \phi^i_{b_i} \Big\rangle \le -\alpha (m_{\tau-1}+1) r_l. \qquad (33)$$
On the other hand, by the definition of the optimistic action at round i, it follows that $\max_{\theta \in C_i} \langle \theta, \phi^i_{b_i} \rangle \le \max_{a \in A_i} \max_{\theta \in C_i} \langle \theta, \phi^i_a \rangle = \max_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle$ for $i \in S_{\tau-1}$ (and likewise $\le \max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle$ for $i = \tau$). Then, from (32), it follows that
$$\max_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1} \cup \{\tau\}} \phi^i_{b_i} \Big\rangle \le \sum_{i \in S_{\tau-1}} \max_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle + \max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle. \qquad (34)$$
Furthermore, since $C_\tau \subseteq C_i$ for all $i \le \tau$, we have
$$-\min_{\theta \in C_\tau} \Big\langle \theta, \sum_{i \in S_{\tau-1}} \phi^i_{a_i} + \phi^\tau_{a'_\tau} \Big\rangle \le -\sum_{i \in S_{\tau-1}} \min_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle - \min_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle. \qquad (35)$$

Combining (33), (34), and (35) with (31) gives
$$\alpha\, n_{\tau-1} r_l \le -\alpha (m_{\tau-1}+1) r_l + \sum_{i \in S_{\tau-1}} \Big[ \max_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle - \min_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle \Big] + \Big[ \max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle - \min_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle \Big]. \qquad (36)$$
Now, note that for any i, the reward of playing any action is in [0,1], and hence each bracketed term is at most 1. On the other hand, since the confidence sets do not fail, we have
$$\max_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle - \min_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle = \max_{\theta \in C_i} \langle \theta - \hat{\theta}_{i-1}, \phi^i_{a_i} \rangle - \min_{\theta \in C_i} \langle \theta - \hat{\theta}_{i-1}, \phi^i_{a_i} \rangle \le 2 \max_{\theta \in C_i} \|\theta - \hat{\theta}_{i-1}\|_{V_{i-1}} \|\phi^i_{a_i}\|_{V_{i-1}^{-1}} \le 2 \beta_i \|\phi^i_{a_i}\|_{V_{i-1}^{-1}}.$$
Hence, since the $\beta_i$ are non-decreasing and all larger than 1, it follows that
$$\max_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle - \min_{\theta \in C_i} \langle \theta, \phi^i_{a_i} \rangle \le 2 \beta_\tau \min\Big( 1,\ \|\phi^i_{a_i}\|_{V_{i-1}^{-1}} \Big). \qquad (37)$$
Similarly, we can show that
$$\max_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle - \min_{\theta \in C_\tau} \langle \theta, \phi^\tau_{a'_\tau} \rangle \le 2 \beta_\tau \min\Big( 1,\ \|\phi^\tau_{a'_\tau}\|_{V_{\tau-1}^{-1}} \Big). \qquad (38)$$
Substituting (37) and (38) into (36) gives
$$\alpha\, n_{\tau-1} r_l \le -\alpha (m_{\tau-1}+1) r_l + 2 \beta_\tau \Bigg[ \sum_{i \in S_{\tau-1}} \min\Big( 1,\ \|\phi^i_{a_i}\|_{V_{i-1}^{-1}} \Big) + \min\Big( 1,\ \|\phi^\tau_{a'_\tau}\|_{V_{\tau-1}^{-1}} \Big) \Bigg]. \qquad (39)$$
Bounding the RHS of (39) gives rise to another key difference between this proof and that of Theorem 5. Note that $V_i = \lambda I + \sum_{j \in S_i} \phi^j_{a_j} (\phi^j_{a_j})^\top + \sum_{j \in S^c_i} \phi^j_{b_j} (\phi^j_{b_j})^\top$ is built not only on the actions played at non-conservative rounds but also on the actions played at conservative rounds ($j \in S^c_i$), and hence Lemma 4 cannot be directly used to bound the RHS of (39). Instead, we define, for any i,
$$\tilde{V}_i = \lambda I + \sum_{j \in S_i} \phi^j_{a_j} \big( \phi^j_{a_j} \big)^\top, \qquad \text{which satisfies} \qquad V_i = \tilde{V}_i + \sum_{j \in S^c_i} \phi^j_{b_j} \big( \phi^j_{b_j} \big)^\top.$$
It therefore follows that
$$\|\phi^\tau_{a'_\tau}\|_{V_{\tau-1}^{-1}} \le \|\phi^\tau_{a'_\tau}\|_{\tilde{V}_{\tau-1}^{-1}}, \qquad \|\phi^i_{a_i}\|_{V_{i-1}^{-1}} \le \|\phi^i_{a_i}\|_{\tilde{V}_{i-1}^{-1}},$$
and hence, from (39), it follows that
$$\alpha\, n_{\tau-1} r_l \le -\alpha (m_{\tau-1}+1) r_l + 2 \beta_\tau \Bigg[ \sum_{i \in S_{\tau-1}} \min\Big( 1,\ \|\phi^i_{a_i}\|_{\tilde{V}_{i-1}^{-1}} \Big) + \min\Big( 1,\ \|\phi^\tau_{a'_\tau}\|_{\tilde{V}_{\tau-1}^{-1}} \Big) \Bigg]. \qquad (40)$$

Now, similar to the proof of Theorem 5, we define
$$\Gamma_\tau = \sum_{i \in S_{\tau-1}} \min\Big( 1,\ \|\phi^i_{a_i}\|^2_{\tilde{V}_{i-1}^{-1}} \Big) + \min\Big( 1,\ \|\phi^\tau_{a'_\tau}\|^2_{\tilde{V}_{\tau-1}^{-1}} \Big),$$
which by Lemma 4 satisfies
$$\Gamma_\tau \le 2 d \log\Big( 1 + \frac{(m_{\tau-1}+1) D^2}{\lambda d} \Big). \qquad (41)$$
On the other hand, for $n_{\tau-1} \ge 3$ (if this condition does not hold, it results in the simple bound $n_T \le 3$), we have
$$\beta_\tau = \sigma \sqrt{d \log\Big( \frac{1 + \tau D^2/\lambda}{\delta} \Big)} + B \sqrt{\lambda} \le (\sigma + B\sqrt{\lambda}) \sqrt{d \log\Big( \frac{1 + (m_{\tau-1}+1)\, n_{\tau-1} D^2/\lambda}{\delta} \Big)}, \qquad (42)$$
where we used $\tau = m_{\tau-1} + n_{\tau-1} + 1$. Using (41) and (42), and an application of the Cauchy-Schwarz inequality to (40), gives
$$\alpha\, n_{\tau-1} r_l \le -\alpha (m_{\tau-1}+1) r_l + 2 \beta_\tau \sqrt{(m_{\tau-1}+1)\, \Gamma_\tau} \le -\alpha (m_{\tau-1}+1) r_l + 2 \beta_\tau \sqrt{2 d (m_{\tau-1}+1) \log\Big( 1 + \frac{(m_{\tau-1}+1) D^2}{\lambda d} \Big)}$$
$$\le -\alpha (m_{\tau-1}+1) r_l + 3 d (B\sqrt{\lambda} + \sigma) \sqrt{m_{\tau-1}+1}\ \log\Big( \frac{2\, n_{\tau-1} D^2 (m_{\tau-1}+1)}{\lambda \delta} \Big). \qquad (43)$$
Note that, in contrast to the proof of Theorem 5, where only $m_{\tau-1}$ appeared on the RHS of (19), here both $n_{\tau-1}$ and $m_{\tau-1}$ appear on the RHS of (43). To bound $n_{\tau-1}$, we first provide an upper bound on the RHS of (43) in terms of $n_{\tau-1}$. For $\bar{m} = m_{\tau-1}+1$, $c_1 = 3 d (B\sqrt{\lambda}+\sigma)$, $c_2 = \frac{2\, n_{\tau-1} D^2}{\lambda \delta}$, and $c_3 = \alpha r_l$, Lemma 8 provides the following upper bound on the RHS (and hence the LHS) of (43):
$$\alpha\, n_{\tau-1} r_l \le \frac{16\, d^2 (B\sqrt{\lambda}+\sigma)^2}{\alpha r_l} \Bigg[ \log\Bigg( \frac{24\, d (B\sqrt{\lambda}+\sigma) D \sqrt{n_{\tau-1}}}{\sqrt{\lambda \delta}\, \alpha r_l} \Bigg) \Bigg]^2,$$
which is equivalent to
$$\sqrt{n_{\tau-1}} \le \frac{4\, d (B\sqrt{\lambda}+\sigma)}{\alpha r_l} \log\Bigg( \frac{24\, d (B\sqrt{\lambda}+\sigma) D \sqrt{n_{\tau-1}}}{\sqrt{\lambda \delta}\, \alpha r_l} \Bigg). \qquad (44)$$
Now, note that the LHS of (44) grows polynomially in $n_{\tau-1}$ while the RHS grows only logarithmically. Thus, this inequality can hold only for a finite range of $n_{\tau-1}$. Lemma 9, applied

with $x = \sqrt{n_{\tau-1}}$, $c_1 = \frac{4 d (B\sqrt{\lambda}+\sigma)}{\alpha r_l}$, and $c_2 = \frac{24\, d (B\sqrt{\lambda}+\sigma) D}{\sqrt{\lambda \delta}\, \alpha r_l}$, gives the following upper bound on $n_{\tau-1}$:
$$n_{\tau-1} \le \frac{256\, d^2 (B\sqrt{\lambda}+\sigma)^2}{\alpha^2 r_l^2} \Bigg[ \log\Bigg( \frac{10\, d (B\sqrt{\lambda}+\sigma) \sqrt{D}}{\alpha r_l\, (\lambda \delta)^{1/4}} \Bigg) \Bigg]^2.$$
Therefore, CLUCB2 follows the baseline policy only at
$$n_T = n_\tau = n_{\tau-1} + 1 \le \frac{256\, d^2 (B\sqrt{\lambda}+\sigma)^2}{\alpha^2 r_l^2} \Bigg[ \log\Bigg( \frac{10\, d (B\sqrt{\lambda}+\sigma) \sqrt{D}}{\alpha r_l\, (\lambda \delta)^{1/4}} \Bigg) \Bigg]^2 + 1$$
rounds, and hence, according to (30), achieves the regret bound
$$R_T(\text{CLUCB2}) = O\Bigg( d \log\Big( \frac{D T}{\lambda \delta} \Big) \sqrt{T} + \frac{K\, \Delta_h}{\alpha^2 r_l^2} \Bigg), \qquad \text{where} \quad K = 256\, d^2 (B\sqrt{\lambda}+\sigma)^2 \Bigg[ \log\Bigg( \frac{10\, d (B\sqrt{\lambda}+\sigma) \sqrt{D}}{\alpha r_l\, (\lambda \delta)^{1/4}} \Bigg) \Bigg]^2 + 1.$$

C Technical Detail Used in the Proof of Theorem 7

We used the following lemma in the proof of Theorem 7.

Lemma 9. Let $c_1$ and $c_2$ be two positive constants such that $\log(c_1 c_2) \ge 1$. Then, any $x > 0$ satisfying $x \le c_1 \log(c_2 x)$ also satisfies $x \le 2 c_1 \log(c_1 c_2)$.

Proof. Assume that $x \le c_1 \log(c_2 x)$ holds. Define $a = \frac{1}{c_1 c_2}$ and change the variable to $z = c_2 x$. Then, we have $a z \le \log(z)$. Let $q(z) = a z$ and $l(z) = \log(z)$, and define $z^* = \frac{2}{a} \log\big( \frac{1}{a} \big)$. First, since $\log 2 + \log\log\frac{1}{a} \le \log\frac{1}{a}$ whenever $\log\frac{1}{a} \ge 1$, we have
$$l(z^*) = \log\Big( \frac{2}{a} \log\frac{1}{a} \Big) = \log 2 + \log\frac{1}{a} + \log\log\frac{1}{a} \le 2 \log\frac{1}{a} = q(z^*),$$
i.e.,
$$q(z^*) \ge l(z^*). \qquad (45)$$
Furthermore, since $\log(1/a) \ge 1$, for any $z \ge z^*$ we have
$$q'(z) = a \ge \frac{a}{2 \log(1/a)} = l'(z^*) \ge l'(z). \qquad (46)$$
Thus, it follows from (45) and (46) that $q(z) \ge l(z)$ for all $z \ge z^*$, and so $a z \le \log(z)$ is possible only for $z \le z^*$. Replacing the definitions of a, z, and $z^*$, we deduce that $x \le c_1 \log(c_2 x)$ is possible only if $x \le 2 c_1 \log(c_1 c_2)$.


Outline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1] DYNAMIC SHORTEST PATH SEARCH AND SYNCHRONIZED TASK SWITCHING Jay Wagenpfel, Adran Trachte 2 Outlne Shortest Communcaton Path Searchng Bellmann Ford algorthm Algorthm for dynamc case Modfcatons to our algorthm

More information

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning

Erratum: A Generalized Path Integral Control Approach to Reinforcement Learning Journal of Machne Learnng Research 00-9 Submtted /0; Publshed 7/ Erratum: A Generalzed Path Integral Control Approach to Renforcement Learnng Evangelos ATheodorou Jonas Buchl Stefan Schaal Department of

More information

Finding Primitive Roots Pseudo-Deterministically

Finding Primitive Roots Pseudo-Deterministically Electronc Colloquum on Computatonal Complexty, Report No 207 (205) Fndng Prmtve Roots Pseudo-Determnstcally Ofer Grossman December 22, 205 Abstract Pseudo-determnstc algorthms are randomzed search algorthms

More information

Lecture 10 Support Vector Machines. Oct

Lecture 10 Support Vector Machines. Oct Lecture 10 Support Vector Machnes Oct - 20-2008 Lnear Separators Whch of the lnear separators s optmal? Concept of Margn Recall that n Perceptron, we learned that the convergence rate of the Perceptron

More information

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 7, Number 2, December 203 Avalable onlne at http://acutm.math.ut.ee A note on almost sure behavor of randomly weghted sums of φ-mxng

More information

On the correction of the h-index for career length

On the correction of the h-index for career length 1 On the correcton of the h-ndex for career length by L. Egghe Unverstet Hasselt (UHasselt), Campus Depenbeek, Agoralaan, B-3590 Depenbeek, Belgum 1 and Unverstet Antwerpen (UA), IBW, Stadscampus, Venusstraat

More information

Complete subgraphs in multipartite graphs

Complete subgraphs in multipartite graphs Complete subgraphs n multpartte graphs FLORIAN PFENDER Unverstät Rostock, Insttut für Mathematk D-18057 Rostock, Germany Floran.Pfender@un-rostock.de Abstract Turán s Theorem states that every graph G

More information

Computing Correlated Equilibria in Multi-Player Games

Computing Correlated Equilibria in Multi-Player Games Computng Correlated Equlbra n Mult-Player Games Chrstos H. Papadmtrou Presented by Zhanxang Huang December 7th, 2005 1 The Author Dr. Chrstos H. Papadmtrou CS professor at UC Berkley (taught at Harvard,

More information

Stochastic Multi-armed Bandits in Constant Space

Stochastic Multi-armed Bandits in Constant Space Davd Lau davdlau@utexas.edu Erc Prce Zhao Song ecprce@cs.utexas.edu zhaos@utexas.edu The Unversty of Texas at Austn Ger Yang geryang@utexas.edu Abstract We consder the stochastc bandt problem n the sublnear

More information

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

Power law and dimension of the maximum value for belief distribution with the max Deng entropy Power law and dmenson of the maxmum value for belef dstrbuton wth the max Deng entropy Bngy Kang a, a College of Informaton Engneerng, Northwest A&F Unversty, Yanglng, Shaanx, 712100, Chna. Abstract Deng

More information

Week 5: Neural Networks

Week 5: Neural Networks Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

Société de Calcul Mathématique SA

Société de Calcul Mathématique SA Socété de Calcul Mathématque SA Outls d'ade à la décson Tools for decson help Probablstc Studes: Normalzng the Hstograms Bernard Beauzamy December, 202 I. General constructon of the hstogram Any probablstc

More information

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that

Online Appendix. t=1 (p t w)q t. Then the first order condition shows that Artcle forthcomng to ; manuscrpt no (Please, provde the manuscrpt number!) 1 Onlne Appendx Appendx E: Proofs Proof of Proposton 1 Frst we derve the equlbrum when the manufacturer does not vertcally ntegrate

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Supporting Information

Supporting Information Supportng Informaton The neural network f n Eq. 1 s gven by: f x l = ReLU W atom x l + b atom, 2 where ReLU s the element-wse rectfed lnear unt, 21.e., ReLUx = max0, x, W atom R d d s the weght matrx to

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals

Simultaneous Optimization of Berth Allocation, Quay Crane Assignment and Quay Crane Scheduling Problems in Container Terminals Smultaneous Optmzaton of Berth Allocaton, Quay Crane Assgnment and Quay Crane Schedulng Problems n Contaner Termnals Necat Aras, Yavuz Türkoğulları, Z. Caner Taşkın, Kuban Altınel Abstract In ths work,

More information

ECE559VV Project Report

ECE559VV Project Report ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate

More information

Lecture 10: May 6, 2013

Lecture 10: May 6, 2013 TTIC/CMSC 31150 Mathematcal Toolkt Sprng 013 Madhur Tulsan Lecture 10: May 6, 013 Scrbe: Wenje Luo In today s lecture, we manly talked about random walk on graphs and ntroduce the concept of graph expander,

More information

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India February 2008

Game Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India February 2008 Game Theory Lecture Notes By Y. Narahar Department of Computer Scence and Automaton Indan Insttute of Scence Bangalore, Inda February 2008 Chapter 10: Two Person Zero Sum Games Note: Ths s a only a draft

More information

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS

A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS HCMC Unversty of Pedagogy Thong Nguyen Huu et al. A PROBABILITY-DRIVEN SEARCH ALGORITHM FOR SOLVING MULTI-OBJECTIVE OPTIMIZATION PROBLEMS Thong Nguyen Huu and Hao Tran Van Department of mathematcs-nformaton,

More information

A new Approach for Solving Linear Ordinary Differential Equations

A new Approach for Solving Linear Ordinary Differential Equations , ISSN 974-57X (Onlne), ISSN 974-5718 (Prnt), Vol. ; Issue No. 1; Year 14, Copyrght 13-14 by CESER PUBLICATIONS A new Approach for Solvng Lnear Ordnary Dfferental Equatons Fawz Abdelwahd Department of

More information

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS

BOUNDEDNESS OF THE RIESZ TRANSFORM WITH MATRIX A 2 WEIGHTS BOUNDEDNESS OF THE IESZ TANSFOM WITH MATIX A WEIGHTS Introducton Let L = L ( n, be the functon space wth norm (ˆ f L = f(x C dx d < For a d d matrx valued functon W : wth W (x postve sem-defnte for all

More information

Economics 101. Lecture 4 - Equilibrium and Efficiency

Economics 101. Lecture 4 - Equilibrium and Efficiency Economcs 0 Lecture 4 - Equlbrum and Effcency Intro As dscussed n the prevous lecture, we wll now move from an envronment where we looed at consumers mang decsons n solaton to analyzng economes full of

More information

Semi-supervised Classification with Active Query Selection

Semi-supervised Classification with Active Query Selection Sem-supervsed Classfcaton wth Actve Query Selecton Jao Wang and Swe Luo School of Computer and Informaton Technology, Beng Jaotong Unversty, Beng 00044, Chna Wangjao088@63.com Abstract. Labeled samples

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE : Lnear Systems Summary #: Backpropagaton BACKPROPAGATION The perceptron rule as well as the Wdrow Hoff learnng were desgned to tran sngle layer networks. They suffer from the same dsadvantage: they

More information

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011 Stanford Unversty CS359G: Graph Parttonng and Expanders Handout 4 Luca Trevsan January 3, 0 Lecture 4 In whch we prove the dffcult drecton of Cheeger s nequalty. As n the past lectures, consder an undrected

More information

Lecture 17 : Stochastic Processes II

Lecture 17 : Stochastic Processes II : Stochastc Processes II 1 Contnuous-tme stochastc process So far we have studed dscrete-tme stochastc processes. We studed the concept of Makov chans and martngales, tme seres analyss, and regresson analyss

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

Appendix B: Resampling Algorithms

Appendix B: Resampling Algorithms 407 Appendx B: Resamplng Algorthms A common problem of all partcle flters s the degeneracy of weghts, whch conssts of the unbounded ncrease of the varance of the mportance weghts ω [ ] of the partcles

More information

Fuzzy Boundaries of Sample Selection Model

Fuzzy Boundaries of Sample Selection Model Proceedngs of the 9th WSES Internatonal Conference on ppled Mathematcs, Istanbul, Turkey, May 7-9, 006 (pp309-34) Fuzzy Boundares of Sample Selecton Model L. MUHMD SFIIH, NTON BDULBSH KMIL, M. T. BU OSMN

More information

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin

LOW BIAS INTEGRATED PATH ESTIMATORS. James M. Calvin Proceedngs of the 007 Wnter Smulaton Conference S G Henderson, B Bller, M-H Hseh, J Shortle, J D Tew, and R R Barton, eds LOW BIAS INTEGRATED PATH ESTIMATORS James M Calvn Department of Computer Scence

More information

Welfare Properties of General Equilibrium. What can be said about optimality properties of resource allocation implied by general equilibrium?

Welfare Properties of General Equilibrium. What can be said about optimality properties of resource allocation implied by general equilibrium? APPLIED WELFARE ECONOMICS AND POLICY ANALYSIS Welfare Propertes of General Equlbrum What can be sad about optmalty propertes of resource allocaton mpled by general equlbrum? Any crteron used to compare

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION Advanced Mathematcal Models & Applcatons Vol.3, No.3, 2018, pp.215-222 ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EUATION

More information

GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results

GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results : A Generalzaton of Natural Gradent Ascent wth Postve and Negatve Convergence Results Phlp S. Thomas School of Computer Scence, Unversty of Massachusetts, Amherst, MA 13 USA PTHOMAS@CS.UMASS.EDU Abstract

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:

More information

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space.

Linear, affine, and convex sets and hulls In the sequel, unless otherwise specified, X will denote a real vector space. Lnear, affne, and convex sets and hulls In the sequel, unless otherwse specfed, X wll denote a real vector space. Lnes and segments. Gven two ponts x, y X, we defne xy = {x + t(y x) : t R} = {(1 t)x +

More information

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty

Additional Codes using Finite Difference Method. 1 HJB Equation for Consumption-Saving Problem Without Uncertainty Addtonal Codes usng Fnte Dfference Method Benamn Moll 1 HJB Equaton for Consumpton-Savng Problem Wthout Uncertanty Before consderng the case wth stochastc ncome n http://www.prnceton.edu/~moll/ HACTproect/HACT_Numercal_Appendx.pdf,

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem.

Lecture 20: Lift and Project, SDP Duality. Today we will study the Lift and Project method. Then we will prove the SDP duality theorem. prnceton u. sp 02 cos 598B: algorthms and complexty Lecture 20: Lft and Project, SDP Dualty Lecturer: Sanjeev Arora Scrbe:Yury Makarychev Today we wll study the Lft and Project method. Then we wll prove

More information

The Experts/Multiplicative Weights Algorithm and Applications

The Experts/Multiplicative Weights Algorithm and Applications Chapter 2 he Experts/Multplcatve Weghts Algorthm and Applcatons We turn to the problem of onlne learnng, and analyze a very powerful and versatle algorthm called the multplcatve weghts update algorthm.

More information

Portfolios with Trading Constraints and Payout Restrictions

Portfolios with Trading Constraints and Payout Restrictions Portfolos wth Tradng Constrants and Payout Restrctons John R. Brge Northwestern Unversty (ont wor wth Chrs Donohue Xaodong Xu and Gongyun Zhao) 1 General Problem (Very) long-term nvestor (eample: unversty

More information