Stochastic Optimization Methods

1 Stochastic Optimization Methods
Lecturer: Pradeep Ravikumar. Co-instructor: Aarti Singh.
Convex Optimization 10-725/36-725. Adapted from slides by Ryan Tibshirani.

2 Stochastic gradient descent

Consider a sum of functions:
$$\min_x \; \frac{1}{n} \sum_{i=1}^n f_i(x)$$

Gradient descent applied to this problem would repeat
$$x^{(k)} = x^{(k-1)} - t_k \cdot \frac{1}{n} \sum_{i=1}^n \nabla f_i(x^{(k-1)}), \quad k = 1, 2, 3, \ldots$$

In comparison, stochastic gradient descent (or incremental gradient descent) repeats
$$x^{(k)} = x^{(k-1)} - t_k \cdot \nabla f_{i_k}(x^{(k-1)}), \quad k = 1, 2, 3, \ldots$$
where $i_k \in \{1, \ldots, n\}$ is some chosen index at iteration $k$.

3 Notes:

- Typically we make a (uniform) random choice $i_k \in \{1, \ldots, n\}$.
- Also common: mini-batch stochastic gradient descent, where we choose a random subset $I_k \subseteq \{1, \ldots, n\}$ of size $b \ll n$, and update according to
$$x^{(k)} = x^{(k-1)} - t_k \cdot \frac{1}{b} \sum_{i \in I_k} \nabla f_i(x^{(k-1)}), \quad k = 1, 2, 3, \ldots$$
- In both cases, we are approximating the full gradient by a noisy estimate, and our noisy estimate is unbiased:
$$E[\nabla f_{i_k}(x)] = \nabla f(x), \qquad E\Big[\frac{1}{b} \sum_{i \in I_k} \nabla f_i(x)\Big] = \nabla f(x)$$
- The mini-batch reduces the variance by a factor of $1/b$, but is also $b$ times more expensive!
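To make the updates above concrete, here is a minimal NumPy sketch of stochastic and mini-batch gradient descent for a generic finite sum. The helper `grad_fi(x, i)`, returning $\nabla f_i(x)$, and the fixed step size are illustrative assumptions, not part of the slides; the sketch also samples indices with replacement, a minor simplification of the random subset $I_k$.

```python
import numpy as np

def sgd(grad_fi, x0, n, step, iters, batch=1, seed=0):
    """Mini-batch SGD on f(x) = (1/n) sum_i f_i(x).

    grad_fi(x, i) is an assumed helper returning the gradient of f_i at x;
    batch=1 recovers plain stochastic gradient descent.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(iters):
        idx = rng.integers(0, n, size=batch)               # indices I_k (with replacement)
        g = np.mean([grad_fi(x, i) for i in idx], axis=0)  # (1/b) sum_{i in I_k} grad f_i(x)
        x = x - step * g                                   # x^{(k)} = x^{(k-1)} - t_k * estimate
    return x
```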

4 Example: regularized logistic regression

Given labels $y_i \in \{0, 1\}$ and features $x_i \in \mathbb{R}^p$, $i = 1, \ldots, n$, consider logistic regression with ridge regularization:
$$\min_{\beta \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^n \Big( -y_i x_i^T \beta + \log(1 + e^{x_i^T \beta}) \Big) + \frac{\lambda}{2} \|\beta\|_2^2$$

Write the criterion as $f(\beta) = \frac{1}{n} \sum_{i=1}^n f_i(\beta)$, with
$$f_i(\beta) = -y_i x_i^T \beta + \log(1 + e^{x_i^T \beta}) + \frac{\lambda}{2} \|\beta\|_2^2$$

The full gradient computation
$$\nabla f(\beta) = \frac{1}{n} \sum_{i=1}^n \big( p_i(\beta) - y_i \big) x_i + \lambda \beta, \qquad p_i(\beta) = \frac{e^{x_i^T \beta}}{1 + e^{x_i^T \beta}},$$
is doable when $n$ is moderate, but not when $n$ is huge. Note that:
- One batch update costs $O(np)$
- One stochastic update costs $O(p)$
- One mini-batch update costs $O(bp)$
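As a concrete instance of the $O(p)$ per-update cost, here is a sketch of the per-example gradient $\nabla f_i(\beta)$ for the ridge logistic criterion above. The array names `X`, `y`, `lam` are assumptions of the sketch, not from the slides; a batch gradient would average these over all $i$, at $O(np)$ cost.

```python
import numpy as np

def grad_fi_logistic(beta, X, y, lam, i):
    """Gradient of f_i(beta) = -y_i x_i^T beta + log(1 + exp(x_i^T beta)) + (lam/2)||beta||^2.

    Costs O(p): it touches only row i of X.
    """
    xi, yi = X[i], y[i]
    pi = 1.0 / (1.0 + np.exp(-xi @ beta))   # p_i(beta) = sigmoid(x_i^T beta)
    return (pi - yi) * xi + lam * beta
```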

5 Example with $n = 10{,}000$ and $p = 20$, where all methods employ fixed step sizes (diminishing step sizes give roughly similar results):

[Figure: criterion $f(x^{(k)})$ versus iteration number $k$, for full gradient, stochastic gradient, and mini-batch gradient descent with $b = 10$ and $b = 100$.]

6 What is happening? Iterations make better progress as the mini-batch size $b$ gets bigger. But now let's parametrize by flops:

[Figure: criterion $f(x^{(k)})$ versus flop count (roughly 1e+02 to 1e+06), for full gradient, stochastic gradient, and mini-batch gradient descent with $b = 10$ and $b = 100$.]
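To make the flop comparison concrete, using the per-update costs from slide 4: with $n = 10{,}000$ and $p = 20$, one full-gradient step costs on the order of $np = 200{,}000$ flops, one stochastic step on the order of $p = 20$, and one mini-batch step with $b = 10$ on the order of $bp = 200$. So roughly 10,000 stochastic steps, or 1,000 mini-batch steps, fit in the flop budget of a single full-gradient step.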

7 Convergence rates

Recall that, under suitable step sizes, when $f$ is convex and has a Lipschitz gradient, full gradient (FG) descent satisfies
$$f(x^{(k)}) - f^\star = O(1/k)$$

What about stochastic gradient (SG) descent? Under diminishing step sizes, when $f$ is convex (plus other conditions),
$$E[f(x^{(k)})] - f^\star = O(1/\sqrt{k})$$

Finally, what about mini-batch stochastic gradient? Again, under diminishing step sizes, for $f$ convex (plus other conditions),
$$E[f(x^{(k)})] - f^\star = O(1/\sqrt{bk} + 1/k)$$
But each iteration here is $b$ times more expensive... and (for small $b$), in terms of flops, this is the same rate.

8 Back to our ridge logistic regression example, we gain important insight by looking at the suboptimality gap (on a log scale):

[Figure: criterion gap $f(x^{(k)}) - f^\star$ (log scale, roughly 1e-12 to 1e-03) versus iteration number $k$, for full gradient, stochastic gradient, and mini-batch gradient descent with $b = 10$ and $b = 100$.]

9 Recall that, under suitable step sizes, when $f$ is strongly convex with a Lipschitz gradient, gradient descent satisfies
$$f(x^{(k)}) - f^\star = O(\rho^k), \quad \text{where } \rho < 1$$
But, under diminishing step sizes, when $f$ is strongly convex (plus other conditions), stochastic gradient descent gives
$$E[f(x^{(k)})] - f^\star = O(1/k)$$
So stochastic methods do not enjoy the linear convergence rate of gradient descent under strong convexity.

For a while, this was believed to be inevitable, as Nemirovski and others had established matching lower bounds... but these applied to stochastic minimization of criteria of the form
$$f(x) = \int F(x, \xi) \, d\xi$$
Can we do better for finite sums?

10 Stochastic Gradient: Bias

For solving $\min_x f(x)$, stochastic gradient is actually a class of algorithms that use the iterates
$$x^{(k)} = x^{(k-1)} - \eta_k \, g(x^{(k-1)}; \xi_k),$$
where $g(x^{(k-1)}; \xi_k)$ is a stochastic gradient of the objective $f$ evaluated at $x^{(k-1)}$.

Bias: the bias of the stochastic gradient is defined as
$$\mathrm{bias}\big(g(x^{(k-1)}; \xi_k)\big) := E_{\xi_k}\big[g(x^{(k-1)}; \xi_k)\big] - \nabla f(x^{(k-1)})$$

Unbiased: when $E_{\xi_k}[g(x^{(k-1)}; \xi_k)] = \nabla f(x^{(k-1)})$, the stochastic gradient is said to be unbiased (e.g., the stochastic gradient scheme discussed so far).

Biased: we might also be interested in biased estimators, but where the bias is small, so that $E_{\xi_k}[g(x^{(k-1)}; \xi_k)] \approx \nabla f(x^{(k-1)})$.

11 Stochastic Gradient: Variance

Variance: in addition to small (or zero) bias, we also want the variance of the estimator to be small:
$$\mathrm{variance}\big(g(x^{(k-1)}; \xi_k)\big) := E_{\xi_k}\big\| g(x^{(k-1)}; \xi_k) - E_{\xi_k}[g(x^{(k-1)}; \xi_k)] \big\|^2 \;\le\; E_{\xi_k}\big\| g(x^{(k-1)}; \xi_k) \big\|^2$$

The caveat with the stochastic gradient scheme we have seen so far is that its variance is large, and in particular does not decay to zero with the iteration index.

Loosely: because of the above, we have to decay the step size $\eta_k$ to zero, which in turn means we cannot take large steps, and hence the convergence rate is slow.

Can we get the variance to be small, and decay to zero with the iteration index?

12 Variance Reduction

Consider an estimator $X$ for a parameter $\theta$. Note that for an unbiased estimator, $E(X) = \theta$. Now consider the following modified estimator:
$$Z := X - Y, \quad \text{such that } E(Y) \approx 0$$
Then the bias of $Z$ is also close to zero, since $E(Z) = E(X) - E(Y) \approx \theta$. If $E(Y) = 0$, then $Z$ is unbiased iff $X$ is unbiased.

What about the variance of the estimator $Z$?
$$\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) - 2 \, \mathrm{Cov}(X, Y)$$
This can be seen to be much less than $\mathrm{Var}(X)$ if $Y$ is highly correlated with $X$.

Thus, given any estimator $X$, we can reduce its variance if we can construct a $Y$ that (a) has expectation (close to) zero, and (b) is highly correlated with $X$. This is the abstract template followed by SAG, SAGA, SVRG, SDCA, ...
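A quick simulation of this template, with toy random variables chosen purely for illustration: $X$ is a noisy unbiased estimate of $\theta$ and $Y$ is a correlated, zero-mean variable, so $Z = X - Y$ keeps the mean but has far smaller variance.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0
noise = rng.normal(size=100_000)

X = theta + noise                              # unbiased: E[X] = theta, Var(X) ~ 1
Y = noise - 0.1 * rng.normal(size=noise.size)  # E[Y] = 0, highly correlated with X
Z = X - Y                                      # still unbiased, Var(Z) ~ 0.01

print(np.mean(X), np.var(X))   # ~2.0, ~1.0
print(np.mean(Z), np.var(Z))   # ~2.0, ~0.01
```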

13 Outline

Rest of today:
- Stochastic average gradient (SAG)
- SAGA (does this stand for something?)
- Stochastic Variance Reduced Gradient (SVRG)
- Many, many others

14 Stochastic average gradient

Stochastic average gradient or SAG (Schmidt, Le Roux, Bach 2013) is a breakthrough method in stochastic optimization. The idea is fairly simple:
- Maintain a table containing the gradient $g_i$ of $f_i$, $i = 1, \ldots, n$
- Initialize $x^{(0)}$, and $g_i^{(0)} = \nabla f_i(x^{(0)})$, $i = 1, \ldots, n$
- At steps $k = 1, 2, 3, \ldots$, pick a random $i_k \in \{1, \ldots, n\}$ and then let $g_{i_k}^{(k)} = \nabla f_{i_k}(x^{(k-1)})$ (most recent gradient of $f_{i_k}$); set all other $g_i^{(k)} = g_i^{(k-1)}$, $i \neq i_k$, i.e., these stay the same
- Update
$$x^{(k)} = x^{(k-1)} - t_k \cdot \frac{1}{n} \sum_{i=1}^n g_i^{(k)}$$

(A runnable sketch of these steps appears below.)
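A minimal sketch of the SAG iteration, assuming the same illustrative `grad_fi(x, i)` helper as earlier; it maintains the gradient table and its running average, so each iteration costs one gradient evaluation (the $O(p)$ running-average trick is the one discussed on the next slide).

```python
import numpy as np

def sag(grad_fi, x0, n, step, iters, seed=0):
    """SAG sketch for f(x) = (1/n) sum_i f_i(x), keeping a table of gradients."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g = np.array([grad_fi(x, i) for i in range(n)])   # g_i^{(0)} = grad f_i(x^{(0)})
    g_avg = g.mean(axis=0)                            # table average
    for _ in range(iters):
        i = rng.integers(n)                           # pick i_k uniformly at random
        g_new = grad_fi(x, i)                         # most recent gradient of f_{i_k}
        g_avg += (g_new - g[i]) / n                   # update table average in O(p)
        g[i] = g_new
        x = x - step * g_avg                          # x^{(k)} = x^{(k-1)} - t_k * (new table average)
    return x
```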

15 Notes:
- The key of SAG is to allow each $f_i$, $i = 1, \ldots, n$, to communicate a part of the gradient estimate at each step
- This basic idea can be traced back to incremental aggregated gradient (Blatt, Hero, Gauchman, 2006)
- SAG gradient estimates are no longer unbiased, but they have greatly reduced variance
- Isn't it expensive to average all these gradients? (Especially if $n$ is huge?) This is basically just as efficient as stochastic gradient descent, as long as we're clever:
$$x^{(k)} = x^{(k-1)} - t_k \cdot \underbrace{\Bigg( \frac{g_{i_k}^{(k)} - g_{i_k}^{(k-1)}}{n} + \underbrace{\frac{1}{n} \sum_{i=1}^n g_i^{(k-1)}}_{\text{old table average}} \Bigg)}_{\text{new table average}}$$

16 SAG Variance Reduction

Stochastic gradient in SAG:
$$X - Y, \qquad X = g_{i_k}^{(k)}, \qquad Y = g_{i_k}^{(k-1)} - \sum_{i=1}^n g_i^{(k-1)}$$
(the SAG update direction is $\tfrac{1}{n}(X - Y)$).

It can be seen that $E(X) = \nabla f(x^{(k-1)})$. But $E(Y) \neq 0$, so we have a biased estimator. We do have, however, that $Y$ is correlated with $X$ (in line with the variance reduction template).

In particular, $X - Y \to 0$ as $k \to \infty$: since $x^{(k-1)}$ and $x^{(k)}$ converge to $x^\star$, the difference between the first two terms converges to zero, and the last term converges to the gradient at the optimum, i.e., also to zero. Thus the overall estimator's $\ell_2$ norm (and accordingly its variance) decays to zero.

17 SAG convergence analysis

Assume that $f(x) = \frac{1}{n} \sum_{i=1}^n f_i(x)$, where each $f_i$ is differentiable and $\nabla f_i$ is Lipschitz with constant $L$. Denote $\bar{x}^{(k)} = \frac{1}{k} \sum_{l=0}^{k-1} x^{(l)}$, the average iterate after $k - 1$ steps.

Theorem (Schmidt, Le Roux, Bach): SAG, with a fixed step size $t = 1/(16L)$ and an initialization satisfying
$$g_i^{(0)} = \nabla f_i(x^{(0)}) - \nabla f(x^{(0)}), \quad i = 1, \ldots, n,$$
satisfies
$$E[f(\bar{x}^{(k)})] - f^\star \le \frac{48n}{k} \big( f(x^{(0)}) - f^\star \big) + \frac{128L}{k} \|x^{(0)} - x^\star\|_2^2$$
where the expectation is taken over the random choice of index at each iteration.

18 Notes:
- The result is stated in terms of the average iterate $\bar{x}^{(k)}$, but can also be shown to hold for the best iterate $x^{(k)}_{\mathrm{best}}$ seen so far
- This is an $O(1/k)$ convergence rate for SAG. Compare to the $O(1/k)$ rate for FG, and the $O(1/\sqrt{k})$ rate for SG
- But the constants are different! Bounds after $k$ steps:
$$\text{SAG}: \quad \frac{48n \big( f(x^{(0)}) - f^\star \big) + 128L \, \|x^{(0)} - x^\star\|_2^2}{k}$$
$$\text{FG}: \quad \frac{L \, \|x^{(0)} - x^\star\|_2^2}{2k}$$
$$\text{SG}: \quad \frac{\sqrt{5} \, L \, \|x^{(0)} - x^\star\|_2^2}{2\sqrt{k}} \quad \text{(*not a real bound, loose translation)}$$
- So the first term in the SAG bound suffers from a factor of $n$; the authors suggest a smarter initialization to make $f(x^{(0)}) - f^\star$ small (e.g., they suggest using the result of $n$ SG steps)

19 Convergence analysis under strong convexity

Assume further that each $f_i$ is strongly convex with parameter $m$.

Theorem (Schmidt, Le Roux, Bach): SAG, with a step size $t = 1/(16L)$ and the same initialization as before, satisfies
$$E[f(x^{(k)})] - f^\star \le \Big( 1 - \min\Big\{ \frac{m}{16L}, \frac{1}{8n} \Big\} \Big)^k \Big( \frac{3}{2} \big( f(x^{(0)}) - f^\star \big) + \frac{4L}{n} \|x^{(0)} - x^\star\|_2^2 \Big)$$

More notes:
- This is a linear convergence rate $O(\rho^k)$ for SAG. Compare this to $O(\rho^k)$ for FG, and only $O(1/k)$ for SG
- Like FG, we say SAG is adaptive to strong convexity (it achieves the better rate with the same settings)
- The proofs of these results are not easy: 15 pages, computer-aided!

20 Back to our ridge logistic regression example, SG versus SAG, over 30 reruns of these randomized algorithms:

[Figure: criterion gap $f(x^{(k)}) - f^\star$ versus iteration number $k$, for SG and SAG.]

21 SAG does well, but it did not work out of the box; it required a specific setup:
- Took one full cycle of SG (one pass over the data) to get $\beta^{(0)}$, and then started SG and SAG both from $\beta^{(0)}$. This warm start helped a lot
- SAG was initialized at $g_i^{(0)} = \nabla f_i(\beta^{(0)})$, $i = 1, \ldots, n$, computed during the initial SG cycle. Centering these gradients was much worse (and so was initializing them at 0)
- Tuning the fixed step size for SAG was very finicky; here it is hand-tuned to be about as large as possible before divergence

22 Experiments from Schmidt, Le Roux, Bach (each plot is a different problem setting):

[Fig. 1 from the paper: comparison of different FG and SG optimization strategies; each panel plots objective minus optimum versus effective passes, for AFG, ASG, IAG, L-BFGS, SG, and SAG-LS across nine problem settings.]

23 SAGA

SAGA (Defazio, Bach, Lacoste-Julien, 2014) is another recent stochastic method, similar in spirit to SAG. The idea is again simple:
- Maintain a table containing the gradient $g_i$ of $f_i$, $i = 1, \ldots, n$
- Initialize $x^{(0)}$, and $g_i^{(0)} = \nabla f_i(x^{(0)})$, $i = 1, \ldots, n$
- At steps $k = 1, 2, 3, \ldots$, pick a random $i_k \in \{1, \ldots, n\}$ and then let $g_{i_k}^{(k)} = \nabla f_{i_k}(x^{(k-1)})$ (most recent gradient of $f_{i_k}$); set all other $g_i^{(k)} = g_i^{(k-1)}$, $i \neq i_k$, i.e., these stay the same
- Update
$$x^{(k)} = x^{(k-1)} - t_k \cdot \Big( g_{i_k}^{(k)} - g_{i_k}^{(k-1)} + \frac{1}{n} \sum_{i=1}^n g_i^{(k-1)} \Big)$$

(A sketch of this update appears below.)
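Compared with the SAG sketch earlier, the only change in this sketch is the update direction, which drops the $1/n$ scaling on the new-minus-old gradient difference and uses the table average from before the refresh (same assumed `grad_fi` interface).

```python
import numpy as np

def saga(grad_fi, x0, n, step, iters, seed=0):
    """SAGA sketch: like SAG, but with an unbiased gradient estimate."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    g = np.array([grad_fi(x, i) for i in range(n)])   # g_i^{(0)} = grad f_i(x^{(0)})
    g_avg = g.mean(axis=0)
    for _ in range(iters):
        i = rng.integers(n)
        g_new = grad_fi(x, i)
        direction = g_new - g[i] + g_avg              # unbiased: E[direction] = grad f(x)
        x = x - step * direction                      # SAGA update
        g_avg += (g_new - g[i]) / n                   # then refresh the table average
        g[i] = g_new
    return x
```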

24 Notes:
- Compare the SAGA gradient estimate
$$g_{i_k}^{(k)} - g_{i_k}^{(k-1)} + \frac{1}{n} \sum_{i=1}^n g_i^{(k-1)}$$
versus the SAG gradient estimate
$$\frac{1}{n} g_{i_k}^{(k)} - \frac{1}{n} g_{i_k}^{(k-1)} + \frac{1}{n} \sum_{i=1}^n g_i^{(k-1)}$$
- Recall, the SAG estimate is biased; remarkably, the SAGA estimate is unbiased!

25 SAGA Variance Reduction

Stochastic gradient in SAGA:
$$X - Y, \qquad X = g_{i_k}^{(k)}, \qquad Y = g_{i_k}^{(k-1)} - \frac{1}{n} \sum_{i=1}^n g_i^{(k-1)}$$

It can be seen that $E(X) = \nabla f(x^{(k-1)})$, and that $E(Y) = 0$, so we have an unbiased estimator. Moreover, $Y$ is correlated with $X$ (in line with the variance reduction template).

In particular, $X - Y \to 0$ as $k \to \infty$: since $x^{(k-1)}$ and $x^{(k)}$ converge to $x^\star$, the difference between the first two terms converges to zero, and the last term converges to the gradient at the optimum, i.e., also to zero. Thus the overall estimator's $\ell_2$ norm (and accordingly its variance) decays to zero.

26 SAGA basically matches the strong convergence rates of SAG (for both the Lipschitz gradient and the strongly convex cases), but the proofs here are much simpler.

Another strength of SAGA is that it can extend to composite problems of the form
$$\min_x \; \frac{1}{n} \sum_{i=1}^n f_i(x) + h(x)$$
where each $f_i$ is smooth and convex, and $h$ is convex and nonsmooth but has a known prox. The updates are now
$$x^{(k)} = \mathrm{prox}_{h, t_k}\Big( x^{(k-1)} - t_k \cdot \Big( g_{i_k}^{(k)} - g_{i_k}^{(k-1)} + \frac{1}{n} \sum_{i=1}^n g_i^{(k-1)} \Big) \Big)$$
It is not known whether SAG is generally convergent under such a scheme.
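For the composite extension, each step wraps the same SAGA direction in a prox. A sketch for the particular choice $h(x) = \lambda \|x\|_1$, whose prox is elementwise soft-thresholding (an illustrative choice, not from the slides):

```python
import numpy as np

def soft_threshold(z, tau):
    """prox of tau * ||.||_1: elementwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_saga_step(x, g_new, g_old, g_avg, step, lam):
    """One proximal SAGA update for min (1/n) sum_i f_i(x) + lam * ||x||_1."""
    direction = g_new - g_old + g_avg          # usual SAGA direction
    return soft_threshold(x - step * direction, step * lam)
```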

27 Back to our ridge logistic regression example, now adding SAGA to the mix:

[Figure: criterion gap $f(x^{(k)}) - f^\star$ versus iteration number $k$, for SG, SAG, and SAGA.]

28 SAGA does well, but again it required a somewhat specific setup:
- As before, took one full cycle of SG (one pass over the data) to get $\beta^{(0)}$, and then started SG, SAG, and SAGA all from $\beta^{(0)}$. This warm start helped a lot
- SAGA was initialized at $g_i^{(0)} = \nabla f_i(\beta^{(0)})$, $i = 1, \ldots, n$, computed during the initial SG cycle. Centering these gradients was much worse (and so was initializing them at 0)
- Tuning the fixed step size for SAGA was fine; seemingly on par with tuning for SG, and more robust than tuning for SAG
- Interestingly, the SAGA criterion curves look like SG curves (realizations being jagged and highly variable); SAG looks very different, and this really emphasizes the fact that its updates have much lower variance

29 Stochastic Variance Reduced Gradient (SVRG)

The Stochastic Variance Reduced Gradient (SVRG) algorithm (Johnson, Zhang, 2013) runs in epochs:
- Initialize $x^{(0)}$
- For $k = 1, 2, \ldots$:
  - Set $\tilde{x} = x^{(k-1)}$, compute $\mu := \nabla f(\tilde{x})$, and set $x_{(0)} = \tilde{x}$
  - For $l = 1, \ldots, m$: pick an index $i_l$ at random from $\{1, \ldots, n\}$ and set
  $$x_{(l)} = x_{(l-1)} - \eta \big( \nabla f_{i_l}(x_{(l-1)}) - \nabla f_{i_l}(\tilde{x}) + \mu \big)$$
  - Set $x^{(k)} = x_{(m)}$

(A sketch of this epoch structure appears below.)
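A minimal sketch of these epochs, assuming the same illustrative `grad_fi(x, i)` helper and an assumed inner-loop length `m`; only the snapshot $\tilde{x}$ and its full gradient $\mu$ are stored, not a table of $n$ gradients.

```python
import numpy as np

def svrg(grad_fi, x0, n, step, epochs, m, seed=0):
    """SVRG sketch: store only the snapshot x_tilde and its full gradient mu."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        x_tilde = x.copy()
        mu = np.mean([grad_fi(x_tilde, i) for i in range(n)], axis=0)  # full gradient at snapshot
        for _ in range(m):
            i = rng.integers(n)
            direction = grad_fi(x, i) - grad_fi(x_tilde, i) + mu       # variance-reduced estimate
            x = x - step * direction
    return x
```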

30 Stochastic Variance Reduced Gradient (SVRG)

Just like SAG/SAGA, but SVRG does not store a full table of gradients, just an average, and updates this occasionally. Compare the gradient estimates:
- SAGA: $\nabla f_{i_k}(x^{(k-1)}) - g_{i_k}^{(k-1)} + \frac{1}{n} \sum_{i=1}^n g_i^{(k-1)}$
- SVRG: $\nabla f_{i_l}(x_{(l-1)}) - \nabla f_{i_l}(\tilde{x}) + \frac{1}{n} \sum_{i=1}^n \nabla f_i(\tilde{x})$

SVRG can be shown to achieve variance reduction similar to SAGA. Its convergence rates are similar to SAGA's, but the formal analysis is much simpler.

31 Many, many others

A lot of recent work is revisiting stochastic optimization:
- SDCA (Shalev-Shwartz, Zhang, 2013): applies randomized coordinate ascent to the dual of ridge-regularized problems. The effective primal updates are similar to SAG/SAGA
- There is also S2GD (Konecny, Richtarik, 2014), MISO (Mairal, 2013), Finito (Defazio, Caetano, Domke, 2014), etc.

32 [Table from Defazio, Bach, Lacoste-Julien (2014), Figure 1: basic summary of method properties for SAGA, SAG, SDCA, SVRG, and FINITO, across the rows Strongly Convex (SC), Convex Non-SC*, Prox Reg., Non-smooth, Low Storage Cost, Simple(-ish) Proof, and Adaptive to SC. Question marks denote unproven, but not experimentally ruled out.]

Are we approaching optimality with these methods? Agarwal and Bottou (2014) recently proved nonmatching lower bounds for minimizing finite sums. This leaves three possibilities: (i) the algorithms we currently have are not optimal; (ii) the lower bounds can be tightened; or (iii) the upper bounds can be tightened. This is a very active area of research, and it will likely be sorted out soon.

33 References and further reading
- D. Bertsekas (2010), "Incremental gradient, subgradient, and proximal methods for convex optimization: a survey"
- A. Defazio, F. Bach, and S. Lacoste-Julien (2014), "SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives"
- R. Johnson and T. Zhang (2013), "Accelerating stochastic gradient descent using predictive variance reduction"
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro (2009), "Robust stochastic approximation approach to stochastic programming"
- M. Schmidt, N. Le Roux, and F. Bach (2013), "Minimizing finite sums with the stochastic average gradient"
- S. Shalev-Shwartz and T. Zhang (2013), "Stochastic dual coordinate ascent methods for regularized loss minimization"
