DISCRETE PREDICTION PROBLEMS: RANDOMIZED PREDICTION
1 DISCRETE PREDICTION PROBLEMS: RANDOMIZED PREDICTION Csaba Szepesvári, University of Alberta, CMPUT, UofA, October 2006
2 OUTLINE 1 DISCRETE PREDICTION PROBLEMS 2 RANDOMIZED FORECASTERS 3 WEIGHTED AVERAGE FORECASTER 4 FOLLOW THE PERTURBED LEADER 5 BIBLIOGRAPHY
3 BINARY PREDICTION PROBLEMS Binary prediction problem: $D = Y = \{0, 1\}$, $\ell(p, y) = \mathbb{I}\{p \ne y\}$. Loss of forecaster: $\hat L_n = \sum_{t=1}^n \ell(\hat p_t, y_t)$. Loss of expert $i$: $L_{i,n} = \sum_{t=1}^n \ell(f_{i,t}, y_t)$. Loss of best expert: $L_n^* = \min_i L_{i,n}$. Goal: minimize the regret, i.e., $R_n = \hat L_n - L_n^*$.
4 BINARY PREDICTION PROBLEMS/2 Proposition: Consider binary prediction problems. For any deterministic forecaster there exists $y_{1:n}$ s.t. $\hat L_n(y_{1:n}) = n$, where $\hat L_n(y_{1:n})$ is the forecaster's loss on $y_{1:n}$. Proof: $\hat p_t$ is based on past information. Hence, for every $t$, $y_t$ can be selected so that $\ell(\hat p_t, y_t) = 1$. Q.e.d. Corollary: There is no deterministic forecaster whose regret is sublinear for any binary prediction problem and any set of experts. Proof: Let $N = 2$, $f_{1t} \equiv 0$, $f_{2t} \equiv 1$. Then for all $y_{1:n}$, $L_n^*(y_{1:n}) \le n/2$. Pick some $y_{1:n}$ that forces $\hat L_n(y_{1:n}) = n$. Hence $\hat L_n(y_{1:n}) - L_n^*(y_{1:n}) \ge n - n/2 = n/2$. Idea: Randomize the forecaster, as this falsifies the above proposition! (prevents the worst case)
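The proposition's adversary argument can be replayed in code. The sketch below is illustrative (the forecaster and function names are ours, not from the lecture): an adversary that observes each deterministic prediction simply plays the opposite outcome, forcing loss 1 in every round.

```python
# Any deterministic forecaster can be forced to err every round: the
# adversary recomputes the prediction from the shared history and flips it.
def adversary_forces_full_loss(forecaster, n):
    history, loss = [], 0
    for _ in range(n):
        p = forecaster(history)   # deterministic prediction in {0, 1}
        y = 1 - p                 # adversary picks the other outcome
        loss += int(p != y)       # l(p, y) = I{p != y} = 1 every round
        history.append(y)
    return loss

# an illustrative deterministic forecaster: majority vote of past outcomes
def majority_vote(history):
    return int(sum(history) * 2 > len(history))
```

Whatever deterministic rule is plugged in, the returned loss equals $n$.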
5 RANDOMIZED FORECASTERS $\mathcal{N} \stackrel{def}{=} \{1, 2, \dots, N\}$. Convention: $\ell : \mathcal{N} \times Y \to \mathbb{R}$, $\ell(i, y)$. Note: since $\ell$ and $Y$ are not further restricted, generality is not lost. Random choice: $I_t \in \mathcal{N}$ is a random variable. The forecaster computes $I_t$ based on past information (past decisions, past outcomes) and $U_t \sim U[0,1)$. Notation: $p_{it} \stackrel{def}{=} P(I_t = i \mid I_{1:t-1}, Y_{1:t-1})$. Outcomes can also be randomized, but outcomes do not depend on the past actions $I_{1:t-1}$! Oblivious or non-reactive opponent/environment (stock, weather, etc.)
6 WEIGHTED AVERAGE FORECASTER [LITTLESTONE AND WARMUTH, 1994] Previous result on EWA: THEOREM (LOSS BOUND FOR THE EWA FORECASTER) Assume that $D$ is a convex subset of some vector space. Let $\ell : D \times Y \to [0,1]$ be convex in its first argument. Then, for EWA ($\hat p_t = \sum_i w_{i,t-1} f_{it} / \sum_j w_{j,t-1}$, $w_{i,t-1} = e^{-\eta L_{i,t-1}}$) it holds: $\hat L_n - L_n^* \le \frac{\ln N}{\eta} + \frac{n\eta}{8}$. With $\eta = \sqrt{8 \ln N / n}$: $\hat L_n - L_n^* \le \sqrt{(n/2) \ln N}$. Let $f_{it} = e_i$ ($i$th unit vector), $\hat p_{it} = w_{i,t-1} / \sum_{j=1}^N w_{j,t-1}$, $\ell(p, y) \stackrel{def}{=} \sum_{i=1}^N p_i \ell(i, y)$; $\ell$ is convex in $p$, and $D \stackrel{def}{=} \Delta_N = \{p \in \mathbb{R}^N \mid p_i \ge 0,\ \sum_i p_i = 1\} \subset \mathbb{R}^N$ is convex.
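A minimal sketch of the EWA forecaster of the theorem (function and variable names are ours; the default loss is the convex absolute loss on $D = [0,1]$, which fits the theorem's assumptions):

```python
import math

def ewa(expert_preds, outcomes, eta, loss=lambda p, y: abs(p - y)):
    """Exponentially weighted average forecaster (sketch).

    expert_preds[t][i] is the advice f_{i,t}; outcomes[t] is y_t.
    Returns the forecaster's total loss and the experts' cumulative losses.
    """
    N = len(expert_preds[0])
    cum = [0.0] * N                                  # L_{i,t-1}
    total = 0.0
    for f_t, y in zip(expert_preds, outcomes):
        w = [math.exp(-eta * L) for L in cum]        # w_{i,t-1}
        p_hat = sum(wi * fi for wi, fi in zip(w, f_t)) / sum(w)
        total += loss(p_hat, y)
        cum = [L + loss(fi, y) for L, fi in zip(cum, f_t)]
    return total, cum
```

On the corollary's hard instance (one expert always predicting 0, one always 1) the bound $\hat L_n - L_n^* \le \ln N / \eta + n\eta/8$ can be checked numerically.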
7 BOUND ON THE PSEUDO-EXPECTED REGRET EWA: $\hat p_t = \sum_i w_{i,t-1} f_{it} / \sum_j w_{j,t-1}$, $w_{i,t-1} = e^{-\eta L_{i,t-1}}$. THEOREM (LOSS BOUND FOR THE EWA FORECASTER: RANDOMIZED PREDICTIONS) Let $\ell : \mathcal{N} \times Y \to [0,1]$. Then, for EWA it holds: $\bar L_n - L_n^* \le \frac{\ln N}{\eta} + \frac{n\eta}{8}$. With $\eta = \sqrt{8 \ln N / n}$: $\bar L_n - L_n^* \le \sqrt{(n/2) \ln N}$. Here $\bar L_n = \sum_{t=1}^n \ell(\hat p_t, Y_t) = \sum_{t=1}^n \sum_{i=1}^N \hat p_{it}\, \ell(i, Y_t)$. Note: $\ell(\hat p_t, Y_t) = E[\ell(I_t, Y_t) \mid Y_{1:t}, I_{1:t-1}]\ (= E_t[\ell(I_t, Y_t)])$.
8 BOUND ON THE ACTUAL REGRET? What about $\hat L_n - L_n^*$? Is $\hat L_n = \sum_t \ell(I_t, Y_t)$ close to $\sum_t \ell(\hat p_t, Y_t) = \bar L_n$? $\ell(\hat p_t, Y_t)$ is the (conditional) expected value of $\ell(I_t, Y_t)$. Sums of independent random variables are $O(\sqrt{n})$-close to their expectations! Hoeffding: if $X_1, \dots, X_n$ are independent with $a_t \le X_t \le b_t$, then for $S_n = \sum_t (X_t - E[X_t])$, $P(S_n > \epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t (b_t - a_t)^2}\big)$ and $P(S_n < -\epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t (b_t - a_t)^2}\big)$. When $b_t - a_t \le 1$, with prob. $1 - \delta$, $\sum_t (X_t - E[X_t]) \le \sqrt{(n/2) \ln(1/\delta)}$.
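A quick Monte Carlo sanity check of the displayed corollary (illustrative numbers, not from the lecture): for bounded i.i.d. variables, deviations above $\sqrt{(n/2)\ln(1/\delta)}$ should occur with frequency at most roughly $\delta$.

```python
import math, random

# Empirical check of the Hoeffding tail: X_t ~ Bernoulli(1/2), so
# b_t - a_t = 1 and E[X_t] = 1/2; count how often the centered sum
# exceeds the sqrt((n/2) ln(1/delta)) threshold.
def deviation_frequency(n, delta, trials, rng):
    eps = math.sqrt((n / 2) * math.log(1 / delta))
    exceed = 0
    for _ in range(trials):
        s = sum(rng.random() < 0.5 for _ in range(n)) - n / 2
        exceed += s > eps
    return exceed / trials

rng = random.Random(0)
freq = deviation_frequency(100, 0.05, 2000, rng)
```

The observed frequency sits well below $\delta = 0.05$, since Hoeffding is a conservative bound.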
9 WEIGHTED AVERAGE FORECASTER: ACTUAL REGRET $\hat L_n - \bar L_n = \sum_t \big(\ell(I_t, Y_t) - \ell(\hat p_t, Y_t)\big)$; bound? Indeed, $E[\ell(I_t, Y_t) \mid Y_{1:t}, I_{1:t-1}] = \sum_{i=1}^N E[\ell(I_t, Y_t) \mid Y_{1:t}, I_{1:t-1}, I_t = i]\, P(I_t = i \mid Y_{1:t}, I_{1:t-1}) = \sum_{i=1}^N \ell(i, Y_t)\, P(I_t = i \mid Y_{1:t-1}, I_{1:t-1})$ (by (1)) $= \sum_{i=1}^N \ell(i, Y_t)\, p_{it} = \ell(\hat p_t, Y_t)$. Also, since $0 \le \ell(\hat p_t, Y_t) \le 1$: $-\ell(\hat p_t, Y_t) \le \ell(I_t, Y_t) - \ell(\hat p_t, Y_t) \le 1 - \ell(\hat p_t, Y_t)$ (boundedness). (1): $I_t$ and $Y_t$ are independent given the past.
10 HOEFFDING-AZUMA INEQUALITY DEFINITION (MARTINGALE DIFFERENCE SERIES) The sequence of random variables $V_1, V_2, \dots$ is a martingale difference series w.r.t. $X_1, X_2, \dots$ if for all $t \in \mathbb{N}$, $V_t$ is a function of $X_1, \dots, X_t$ and $E[V_t \mid X_{1:t-1}] = 0$ w.p. 1. THEOREM (HOEFFDING-AZUMA) Assume that $V_1, V_2, \dots$ is a martingale difference series w.r.t. $X_1, X_2, \dots$ such that $V_t \in [A_t, A_t + c_t]$, where $c_t$ is a (non-random) positive constant and $A_t$ is a function of $X_{1:t-1}$. Then, for $S_n = \sum_t V_t$, $P(S_n > \epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t c_t^2}\big)$ and $P(S_n < -\epsilon) \le \exp\!\big(\!-\frac{2\epsilon^2}{\sum_t c_t^2}\big)$. COROLLARY If $c_t \le 1$ then w.p. $1 - \delta$, $S_n \le \sqrt{(n/2) \ln(1/\delta)}$.
11 BOUND ON THE RANDOM REGRET By applying the Hoeffding-Azuma (H-A) inequality to $V_t = \ell(I_t, Y_t) - \ell(\hat p_t, Y_t)$, $X_1 = (I_1, Y_1, Y_2)$, $X_t = (I_t, Y_{t+1})$ ($t > 1$), we get: THEOREM (LOSS BOUND FOR THE EWA FORECASTER: RANDOM REGRET) Let $\ell : \mathcal{N} \times Y \to [0,1]$. Then, for EWA, with probability $1 - \delta$: $\hat L_n - L_n^* \le \frac{\ln N}{\eta} + \frac{n\eta}{8} + \sqrt{(n/2) \ln(1/\delta)}$. With $\eta = \sqrt{8 \ln N / n}$: $\hat L_n - L_n^* \le \sqrt{(n/2) \ln N} + \sqrt{(n/2) \ln(1/\delta)}$.
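A sketch of the randomized EWA forecaster of the theorem, which samples $I_t$ from the exponential weights instead of averaging the advice (helper names are ours):

```python
import math, random

def randomized_ewa(losses, eta, rng):
    """Randomized EWA (sketch): play I_t ~ p_t with p_it proportional to
    exp(-eta * L_{i,t-1}). losses[t][i] = l(i, Y_t) in [0, 1]."""
    N = len(losses[0])
    cum = [0.0] * N                               # L_{i,t-1}
    total = 0.0
    for l_t in losses:
        w = [math.exp(-eta * L) for L in cum]
        i = rng.choices(range(N), weights=w)[0]   # I_t ~ p_t
        total += l_t[i]                           # actual (random) loss
        cum = [L + l for L, l in zip(cum, l_t)]
    return total, min(cum)                        # \hat L_n and L_n^*
```

When one expert is always right, the random regret stays bounded: the probability of sampling the bad expert decays geometrically.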
12 BERNSTEIN'S INEQUALITY THEOREM (BERNSTEIN'S INEQUALITY FOR MARTINGALE DIFFERENCES) Assume that $V_1, V_2, \dots$ is a martingale difference series w.r.t. $X_1, X_2, \dots$ such that $|V_t| \le K$. Let $\Sigma_n^2 = \sum_t E[V_t^2 \mid X_{1:t-1}]$ and $S_n = \sum_t V_t$. Then for all $\Sigma, \delta > 0$, with probability at least $1 - \delta$, either $S_n \le \sqrt{2\Sigma^2 \log(1/\delta)} + \frac{2}{3} K \log(1/\delta)$ or $\Sigma_n^2 > \Sigma^2$.
13 SMALL LOSSES Previous small-loss bound: $\sqrt{2 L_n^* \ln N} + \ln N$. Random fluctuations would add $\sqrt{(n/2) \ln(1/\delta)}$, which is too big! Bernstein's inequality uses the predictable variance to bound the fluctuations. Bound on the predictable variance: $E_t\big[(\ell(I_t, Y_t) - \ell(\hat p_t, Y_t))^2\big] = E_t\big[\ell(I_t, Y_t)^2\big] - \ell^2(\hat p_t, Y_t) \le E_t\big[\ell(I_t, Y_t)^2\big] \le E_t[\ell(I_t, Y_t)] = \ell(\hat p_t, Y_t)$. Hence the effect of random fluctuations is comparable with the bound on the expected regret: $\sum_t \big(\ell(I_t, Y_t) - \ell(\hat p_t, Y_t)\big) \lesssim \sqrt{2 L_n^* \ln(1/\delta)} + \ln(1/\delta)$.
14 FOLLOW THE LEADER Does it work? Take $N = 2$ with losses $\ell(1, y_t)$: $1/2, 0, 1, 0, 1, 0, \dots$ and $\ell(2, y_t)$: $1/2, 1, 0, 1, 0, 1, \dots$ Cumulative losses: $L_{1,1} = .5$, $L_{1,2} = .5$, $L_{1,3} = 1.5$, $L_{1,4} = 1.5$, $L_{1,5} = 2.5, \dots$; $L_{2,1} = .5$, $L_{2,2} = 1.5$, $L_{2,3} = 1.5$, $L_{2,4} = 2.5$, $L_{2,5} = 2.5, \dots$ With adversarial tie-breaking the leader suffers loss 1 in every round from $t = 2$ on, so $\hat L_n = n - 1/2$, whilst $L_{i,n} \approx n/2$, $i = 1, 2$, and therefore $\hat L_n - L_n^* \ge n/2 - 1.5$.
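The failure can be replayed in code. The simulation below uses the loss sequence from the slide and breaks ties adversarially (toward expert 2), which makes FTL pay loss 1 in every round after the first:

```python
def ftl_on_alternating_losses(n):
    """Follow the Leader on the slide's N=2 example.
    Returns (FTL total loss, best expert's total loss)."""
    cum = [0.0, 0.0]
    total = 0.0
    for r in range(1, n + 1):
        if r == 1:
            l_t = (0.5, 0.5)
        elif r % 2 == 0:
            l_t = (0.0, 1.0)     # (l(1, y_r), l(2, y_r))
        else:
            l_t = (1.0, 0.0)
        i = 0 if cum[0] < cum[1] else 1   # ties broken toward expert 2
        total += l_t[i]
        cum = [cum[0] + l_t[0], cum[1] + l_t[1]]
    return total, min(cum)
```

For even $n$ this gives regret exactly $n/2$, matching the slide's conclusion that FTL's regret is linear.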
15 FOLLOW THE PERTURBED LEADER [Hannan, 1957] Follow the perturbed leader (randomized fictitious play): $I_t = \mathrm{argmin}_{i=1,\dots,N}\ \big(L_{i,t-1} + Z_{it}\big)$, $Z_t \sim f(\cdot)$, i.i.d. Goal: develop a bound on $\bar L_n$! Relate to BEH: $\hat I_t = \mathrm{argmin}_{i \in \mathcal{N}}\ \big(L_{i,t} + Z_{i,t}\big)$.
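A minimal sketch of FPL with two-sided exponential perturbations (the density used later in the analysis), drawn fresh and i.i.d. each round; all names are illustrative:

```python
import random

def fpl(losses, eta, rng):
    """Follow the perturbed leader (sketch).
    losses[t][i] = l(i, y_t) in [0, 1]; Z_it has density (eta/2) e^{-eta|z|}."""
    N = len(losses[0])
    cum = [0.0] * N                     # L_{i,t-1}
    total = 0.0
    for l_t in losses:
        # two-sided exponential draw: exponential magnitude, random sign
        z = [rng.expovariate(eta) * rng.choice((-1.0, 1.0)) for _ in range(N)]
        i = min(range(N), key=lambda j: cum[j] + z[j])   # perturbed leader
        total += l_t[i]
        cum = [c + l for c, l in zip(cum, l_t)]
    return total, min(cum)
```

On the alternating-loss example above, the fresh perturbations prevent the deterministic switching that ruins plain FTL.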
16 FPL: ANALYSIS, PLAN 1. $\hat L_n$ and $\hat L_n^{BEH}$ are close in expectation: $E[\sum_t \ell(I_t, y_t)] \approx E[\sum_t \ell(\hat I_t, y_t)]$. 2. $\hat L_n^{BEH}$ and $L_n^*$ are close: $\sum_t \ell(\hat I_t, y_t) \le \sum_t \ell(\hat I_n, y_t) + \mathrm{Bound} \le L_n^* + \mathrm{Bound}$. 3. Estimate $E[\mathrm{Bound}]$.
17 STEP 1: $\hat L^{BEH} \approx \hat L$ BOUND Goal: $E[\sum_t \ell(I_t, y_t)] \approx E[\sum_t \ell(\hat I_t, y_t)]$. $E[\ell(I_t, y_t)] = E\big[\ell(\mathrm{argmin}_i (L_{i,t-1} + Z_{it}), y_t)\big] = E[F_t(Z_t)] = \int F_t(z) f(z)\,dz$, where $F_t(z) = \ell(\mathrm{argmin}_i (L_{i,t-1} + z_i), y_t)$. $E[\ell(\hat I_t, y_t)] = E\big[\ell(\mathrm{argmin}_i (L_{i,t} + Z_{it}), y_t)\big] = E\big[\ell(\mathrm{argmin}_i (L_{i,t-1} + \ell_{it} + Z_{it}), y_t)\big] = E[F_t(Z_t + \ell_t)] = \int F_t(z + \ell_t) f(z)\,dz$, where $\ell_{it} = \ell(i, y_t)$, $\ell_t = (\ell(1, y_t), \dots, \ell(N, y_t))$.
18 STEP 1: $\hat L^{BEH} \approx \hat L$ BOUND/2 $E[\ell(I_t, y_t)] = \int F_t(z) f(z)\,dz$ and $E[\ell(\hat I_t, y_t)] = \int F_t(z + \ell_t) f(z)\,dz = \int F_t(z) f(z - \ell_t)\,dz$. Hence $E[\ell(I_t, y_t)] = \int F_t(z) f(z)\,dz \le \sup_{z,t} \Big(\frac{f(z)}{f(z - \ell_t)}\Big) \int F_t(z) f(z - \ell_t)\,dz = \sup_{z,t} \Big(\frac{f(z)}{f(z - \ell_t)}\Big)\, E[\ell(\hat I_t, y_t)]$. Choose e.g. $f(z) = (\eta/2)^N e^{-\eta \|z\|_1}$; then $\frac{f(z)}{f(z - \ell_t)} = e^{-\eta(\|z\|_1 - \|z - \ell_t\|_1)} \le e^{\eta \|\ell_t\|_1} \le e^{\eta}$, provided that $\|\ell_t\|_1 \le 1$: TODO!
19 STEP 2: $\hat L^{BEH}$ VS. $L^*$ BOUND $\hat L_n^{BEH} = \sum_t \ell(\hat I_t, y_t)$, $\hat I_n = \mathrm{argmin}_i \big((\sum_{s=1}^n \ell(i, y_s)) + Z_{in}\big)$. Plan: 1. Bound $\hat L_n^{BEH} = \sum_t \ell(\hat I_t, y_t)$ by $\sum_t \ell(\hat I_n, y_t)$. 2. Bound $\sum_t \ell(\hat I_n, y_t)$ by $L_n^*$. In fact, for Step 2: $\sum_t \ell(\hat I_n, y_t) + Z_{\hat I_n, n} = \min_i \big(L_{i,n} + Z_{i,n}\big) \le \min_i \big(L_{i,n} + \max_j Z_{j,n}\big) = L_n^* + \max_j Z_{j,n}$. Here $L_n^*$ has $n$ terms, so it outgrows $\max_j Z_{j,n}$!
20 STEP 2.1: $\hat L^{BEH}$ VS. $L^*$ BOUND/2 We know: for $L_t(p) = \sum_{s=1}^t \ell_s(p)$ and $p_t^* = \mathrm{argmin}_p L_t(p)$, $\sum_t \ell_t(p_t^*) \le \sum_t \ell_t(p_n^*)$. Reuse? $\hat I_t = \mathrm{argmin}_i \{(\sum_{s=1}^t \ell(i, y_s)) + Z_{it}\}$. Rewrite as a minimizer of a sum of losses: $\hat I_t = \mathrm{argmin}_i \sum_{s=1}^t \big(\ell(i, y_s) + Z_{is} - Z_{i,s-1}\big) =: \mathrm{argmin}_i \sum_{s=1}^t \hat\ell_s(i)$, where $Z_{i0} = 0$ and $\hat\ell_s(i) \stackrel{def}{=} \ell(i, y_s) + Z_{is} - Z_{i,s-1}$. Reuse: $\sum_t \hat\ell_t(\hat I_t) \le \sum_t \hat\ell_t(\hat I_n)$.
21 STEP 2.1: $\hat L^{BEH}$ VS. $L^*$ BOUND/3 From $\sum_t \hat\ell_t(\hat I_t) \le \sum_t \hat\ell_t(\hat I_n)$: $\sum_t \hat\ell_t(\hat I_t) = \sum_t \big(\ell(\hat I_t, y_t) + Z_{\hat I_t, t} - Z_{\hat I_t, t-1}\big) = \hat L_n^{BEH} + \sum_t \big(Z_{\hat I_t, t} - Z_{\hat I_t, t-1}\big)$, while $\sum_t \hat\ell_t(\hat I_n) = \sum_t \ell(\hat I_n, y_t) + Z_{\hat I_n, n} \le L_n^* + \max_j Z_{j,n}$ (see above). Hence $\hat L_n^{BEH} \le L_n^* + \max_i Z_{i,n} - \sum_t \big(Z_{\hat I_t, t} - Z_{\hat I_t, t-1}\big) \le L_n^* + \max_i Z_{i,n} + \sum_t \max_i \big(Z_{i,t-1} - Z_{i,t}\big)$.
22 STEP 3: TAKE EXPECTATIONS $\hat L_n^{BEH} = \sum_t \ell(\hat I_t, y_t) \le L_n^* + \max_i Z_{i,n} + \sum_t \max_i (Z_{i,t-1} - Z_{i,t})$ (*) Plan: take expectations. Problem: hard to control the $E[\max_i (Z_{i,t-1} - Z_{i,t})]$ terms! Plan: get rid of them! Idea: If $Z'_t = Z'_{t-1}$ ($t \ge 2$) but $Z'_1 \sim f(\cdot)$, then for $\hat I'_t = \mathrm{argmin}_i (L_{it} + Z'_{it})$ we have $E[\ell(\hat I'_t, y_t)] = E[\ell(\hat I_t, y_t)]$, and (*) still applies to $Z'_t$ and $\hat I'_t$! With the constant perturbation the sum in (*) telescopes, so $E[\sum_t \ell(\hat I_t, y_t)] = E[\sum_t \ell(\hat I'_t, y_t)] \le L_n^* + E[\max_i Z_i] + E[\max_i (-Z_i)]$.
23 SUMMARY Assuming that $\|\ell_t\|_1 \le 1$ and $Z_t \sim f(z) = (\eta/2)^N e^{-\eta \|z\|_1}$: $E[\sum_t \ell(I_t, y_t)] \le \sup_{z,t} \Big(\frac{f(z)}{f(z - \ell_t)}\Big) E[\sum_t \ell(\hat I_t, y_t)] \le e^{\eta}\, E[\sum_t \ell(\hat I_t, y_t)]$, and $E[\sum_t \ell(\hat I_t, y_t)] \le L_n^* + E[\max_i Z_i] + E[\max_i (-Z_i)]$. Hence $E[\sum_t \ell(I_t, y_t)] \le e^{\eta} \big(L_n^* + E[\max_i Z_i] + E[\max_i (-Z_i)]\big)$. Outstanding issues: show that we may assume $\|\ell_t\|_1 \le 1$; estimate $E[\max_i Z_i]$ and $E[\max_i (-Z_i)]$. Note: $Z$ and $-Z$ are identically distributed, hence $E[\max_i Z_i] = E[\max_i (-Z_i)]$.
24 ESTIMATE OF $E[\max_i Z_{i1}]$ $E[\max_i Z_{i1}] \le \int_0^{\infty} P(\max_i Z_{i1} > u)\,du$ (integrate the tail). $P(\max_i Z_{i1} > u) \le N P(Z_{11} > u) \le N e^{-\eta u}$ (union bound), and $\int_v^{\infty} e^{-\eta u}\,du = e^{-\eta v}/\eta$. Hence $\int_0^{\infty} P(\max_i Z_{i1} > u)\,du \le \int_0^v 1\,du + \int_v^{\infty} N e^{-\eta u}\,du \le v + \frac{N}{\eta} e^{-\eta v}$. Choose $v = \ln(N)/\eta$ to get $E[\max_i Z_{i1}] \le (1 + \ln N)/\eta$, and so $\bar L_n \le e^{\eta} \big(L_n^* + \frac{2(1 + \ln N)}{\eta}\big)$.
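The estimate can be sanity-checked by Monte Carlo (a numerical sketch, not part of the proof): sample i.i.d. two-sided exponential perturbations and compare the empirical mean of the maximum with $(1 + \ln N)/\eta$.

```python
import math, random

# Empirical E[max_i Z_i] for Z_i with density (eta/2) e^{-eta|z|},
# to be compared against the bound (1 + ln N) / eta derived above.
def mean_max_perturbation(N, eta, trials, rng):
    s = 0.0
    for _ in range(trials):
        s += max(rng.expovariate(eta) * rng.choice((-1.0, 1.0))
                 for _ in range(N))
    return s / trials

rng = random.Random(1)
est = mean_max_perturbation(8, 1.0, 20000, rng)
bound = (1 + math.log(8)) / 1.0      # (1 + ln N) / eta
```

The estimate comes out comfortably below the bound, as expected from the union-bound slack.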
25 CAN WE ASSUME THAT $\|\ell_t\|_1 \le 1$? In general: NO. Idea: $\|\ell_t\|_1$ small corresponds to sparse losses (many zeroes). Sparsify the losses!
26 SPARSIFYING THE LOSSES Transform the $n$ rounds into $nN$ rounds: round $\#t$ with loss vector $(\ell_{1t}, \ell_{2t}, \dots, \ell_{Nt})$ becomes round $\#N(t-1)+1$: $(\ell_{1t}, 0, \dots, 0)$, round $\#N(t-1)+2$: $(0, \ell_{2t}, 0, \dots, 0)$, ..., round $\#Nt$: $(0, \dots, 0, \ell_{Nt})$. We have $\|\ell_s^{new}\|_1 \le 1$, since $0 \le \ell_{it} \le 1$. Let $p_{it}^{orig}$, $p_{is}^{new}$ denote the action probabilities and $T_t = N(t-1)+1$. Synchronicity of losses: $L_{i,t-1}^{orig} = L_{i,T_t-1}^{new}$, hence $p_{it}^{orig} = p_{i,T_t}^{new}$. Since $\ell_{1t} \ge 0$, the first action's probability decreases from $T_t$ to $T_t + 1$: $p_{1,T_t+1}^{new} \le p_{1,T_t}^{new}$, while the others increase; repeat for $T_t + 2, \dots$ Hence $\bar L_t^{orig} \le \bar L_{Nt}^{new}$, and so $\bar L_n^{orig} \le \bar L_{nN}^{new} \le e^{\eta} \big(L^{*,new}_{nN} + \frac{2(1 + \ln N)}{\eta}\big) = e^{\eta} \big(L^{*,orig}_n + \frac{2(1 + \ln N)}{\eta}\big)$.
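The round-splitting transformation is straightforward to write down (a sketch; the lecture gives only the table):

```python
def sparsify(losses):
    """Turn n rounds with loss vectors (l_1t, ..., l_Nt) into n*N rounds,
    the k-th sub-round charging only expert k. Each new vector then has
    L1 norm at most 1 (for losses in [0, 1]) while each expert's
    cumulative loss is preserved."""
    out = []
    for l_t in losses:
        N = len(l_t)
        for k in range(N):
            out.append([l_t[k] if i == k else 0.0 for i in range(N)])
    return out
```

For example, `sparsify([[0.5, 1.0]])` yields the two sub-rounds `[[0.5, 0.0], [0.0, 1.0]]`.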
27 FPL BOUND THEOREM (FPL BOUND [KALAI AND VEMPALA, 2003]) Let $\ell : \mathcal{N} \times Y \to [0,1]$ and consider FPL with $Z_t \sim (\eta/2)^N e^{-\eta \|z\|_1}$. Then $E[\hat L_n] \le e^{\eta} \big(E[L_n^*] + \frac{2(1 + \ln N)}{\eta}\big)$. Choose $\eta = \min\{1, \sqrt{2(1 + \ln N)/((e-1)L_n^*)}\}$. Then $E[\hat L_n] - E[L_n^*] \le 2\sqrt{2(e-1)L_n^*(1 + \ln N)} + 2(e+1)(1 + \ln N)$. PROOF. Just combine the facts of the previous slides!
28 REFERENCES Hannan, J. (1957). Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3. Kalai, A. and Vempala, S. (2003). Efficient algorithms for the online decision problem. In Proceedings of the 16th Annual Conference on Learning Theory. Springer. Littlestone, N. and Warmuth, M. (1994). The weighted majority algorithm. Information and Computation, 108.
More informationA New Solution Method for the Finite-Horizon Discrete-Time EOQ Problem
This is the Pre-Published Versio. A New Solutio Method for the Fiite-Horizo Discrete-Time EOQ Problem Chug-Lu Li Departmet of Logistics The Hog Kog Polytechic Uiversity Hug Hom, Kowloo, Hog Kog Phoe: +852-2766-7410
More information7.1 Convergence of sequences of random variables
Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite
More informationAdvanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology
Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More informationUnbiased Estimation. February 7-12, 2008
Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationApproximations and more PMFs and PDFs
Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.
More informationIntroduction to Extreme Value Theory Laurens de Haan, ISM Japan, Erasmus University Rotterdam, NL University of Lisbon, PT
Itroductio to Extreme Value Theory Laures de Haa, ISM Japa, 202 Itroductio to Extreme Value Theory Laures de Haa Erasmus Uiversity Rotterdam, NL Uiversity of Lisbo, PT Itroductio to Extreme Value Theory
More informationSlide Set 13 Linear Model with Endogenous Regressors and the GMM estimator
Slide Set 13 Liear Model with Edogeous Regressors ad the GMM estimator Pietro Coretto pcoretto@uisa.it Ecoometrics Master i Ecoomics ad Fiace (MEF) Uiversità degli Studi di Napoli Federico II Versio: Friday
More informationInformation Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame
Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for
More informationCS 330 Discussion - Probability
CS 330 Discussio - Probability March 24 2017 1 Fudametals of Probability 11 Radom Variables ad Evets A radom variable X is oe whose value is o-determiistic For example, suppose we flip a coi ad set X =
More information1 Convergence in Probability and the Weak Law of Large Numbers
36-752 Advaced Probability Overview Sprig 2018 8. Covergece Cocepts: i Probability, i L p ad Almost Surely Istructor: Alessadro Rialdo Associated readig: Sec 2.4, 2.5, ad 4.11 of Ash ad Doléas-Dade; Sec
More information