Optimization. September 4, 2018


1 Optimization September 4, 2018

2 Optimization problem
An optimization problem is the problem of finding the best solution for an objective function. Optimization methods play an important role in statistics, for example, in finding the maximum likelihood estimate (MLE).
Unconstrained vs. constrained optimization: whether there are constraints on the solution space. Most algorithms are based on iterative procedures.
We will spend the next few lectures on several optimization methods in the context of statistics:
- Newton-Raphson, Fisher scoring, etc.
- EM and MM.
- Hidden Markov models.
- Linear and quadratic programming.

3 Review: Newton-Raphson (NR) method
Goal: find the root of the equation f(θ) = 0.
Approach:
1. Choose an initial value θ^(0) as the starting point.
2. By Taylor expansion at θ^(0), f(θ) ≈ f(θ^(0)) + f'(θ^(0))(θ − θ^(0)). Setting f(θ) = 0 gives an update of the parameter: θ^(1) = θ^(0) − f(θ^(0))/f'(θ^(0)).
3. Repeat the update until convergence: θ^(k+1) = θ^(k) − f(θ^(k))/f'(θ^(k)).
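A minimal R sketch of this update rule (the function names newton_root, f0, and df0 are illustrative, not from the slides):

# Newton-Raphson root finding: theta_{k+1} = theta_k - f(theta_k) / f'(theta_k)
newton_root <- function(f, fprime, theta0, tol = 1e-8, kmax = 100) {
  theta <- theta0
  for (k in 1:kmax) {
    step <- f(theta) / fprime(theta)
    theta <- theta - step
    if (abs(step) < tol) break            # stop when the update is tiny
  }
  list(root = theta, iterations = k)
}

# Example: root of cos(x) - x (chosen for illustration)
f0  <- function(x) cos(x) - x
df0 <- function(x) -sin(x) - 1
newton_root(f0, df0, theta0 = 1)          # converges to about 0.739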

4 NR method: convergence rate
Quadratic convergence (θ* is the solution):
lim_{k→∞} |θ^(k+1) − θ*| / |θ^(k) − θ*|² = c   (rate c > 0, order 2)
The number of significant digits nearly doubles at each step (in the neighborhood of θ*).
Proof: By Taylor expansion (to the second order) at θ^(k),
0 = f(θ*) = f(θ^(k)) + f'(θ^(k))(θ* − θ^(k)) + (1/2) f''(ξ^(k))(θ* − θ^(k))²,  with ξ^(k) between θ* and θ^(k).
Dividing the equation by f'(θ^(k)) gives
f(θ^(k))/f'(θ^(k)) + (θ* − θ^(k)) = − f''(ξ^(k)) / {2 f'(θ^(k))} (θ* − θ^(k))².
The definition θ^(k+1) = θ^(k) − f(θ^(k))/f'(θ^(k)) then gives
θ^(k+1) − θ* = f''(ξ^(k)) / {2 f'(θ^(k))} (θ* − θ^(k))².
What conditions are needed?
- f'(θ^(k)) ≠ 0 in a neighborhood of θ*
- f''(ξ^(k)) is bounded
- The starting point is sufficiently close to the root θ*
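A small R illustration (not from the slides) of the doubling of significant digits, using f(x) = x² − 2 so that θ* = √2:

# Track the error |theta_k - sqrt(2)| when solving x^2 - 2 = 0 by Newton-Raphson.
theta <- 2
for (k in 1:5) {
  theta <- theta - (theta^2 - 2) / (2 * theta)
  cat(sprintf("k = %d, error = %.2e\n", k, abs(theta - sqrt(2))))
}
# The exponent of the error roughly doubles at each iteration (quadratic convergence).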

5 Review: maximum likelihood
Here is a list of definitions related to maximum likelihood estimation:
- Parameter θ, a p-vector
- Data X
- Log likelihood l(θ) = log Pr(X | θ)
- Score function ∇l(θ) = (∂l/∂θ_1, ..., ∂l/∂θ_p)^T
- Hessian matrix ∇²l(θ) = {∂²l/∂θ_i ∂θ_j}_{i,j = 1,...,p}
- Fisher information I(θ) = −E ∇²l(θ) = E[∇l(θ){∇l(θ)}^T]
- Observed information −∇²l(θ̂)
When θ* is a local maximum of l, ∇l(θ*) = 0 and ∇²l(θ*) is negative definite.
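As a concrete example (not on the slide), suppose X_1, ..., X_n are i.i.d. Poisson(θ) with a scalar parameter. Then l(θ) = Σ_i (X_i log θ − θ) + const, the score is ∇l(θ) = Σ_i X_i/θ − n, and the Hessian is ∇²l(θ) = −Σ_i X_i/θ². The Fisher information is I(θ) = −E ∇²l(θ) = n/θ, and the observed information −∇²l(θ̂) evaluated at θ̂ = X̄ equals n/X̄, which here coincides with I(θ̂).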

6 Application of the NR method to MLE: when θ is a scalar
Maximum likelihood estimation (MLE): θ̂ = argmax_θ l(θ).
Approach: find θ̂ such that ∇l(θ̂) = 0. If a closed-form solution of ∇l(θ) = 0 is difficult to obtain, one can use the NR method (replace f by ∇l). The NR update for solving the MLE is:
θ^(k+1) = θ^(k) − ∇l(θ^(k)) / ∇²l(θ^(k)).
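For instance, here is a sketch (not from the slides) of this update for the MLE of a Gamma shape parameter α with the rate fixed at 1, a case with no closed-form solution; the score involves the digamma function:

# NR for the MLE of the Gamma shape alpha (rate fixed at 1):
#   l'(alpha)  = sum(log x) - n * digamma(alpha)
#   l''(alpha) = -n * trigamma(alpha)
set.seed(1)
x <- rgamma(200, shape = 3, rate = 1)    # simulated data for illustration
n <- length(x); slx <- sum(log(x))

alpha <- mean(x)                         # starting value
for (k in 1:20) {
  score <- slx - n * digamma(alpha)
  hess  <- -n * trigamma(alpha)
  step  <- score / hess
  alpha <- alpha - step
  if (abs(step) < 1e-8) break
}
alpha                                    # MLE of the shape parameter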

7 What can go wrong?
- Bad starting point
- May not converge to the global maximum
- Saddle point: ∇l(θ̂) = 0, but ∇²l(θ̂) is neither negative definite nor positive definite (a stationary point that is not a local extremum; the Hessian can be used to check the likelihood)
[Figures: starting point and local extremum; saddle points of l(θ) = θ³ and l(θ_1, θ_2) = θ_1² − θ_2²]

8 Generalization to higher dimensions: when θ is a vector
General algorithm:
1. (Starting point) Pick a starting point θ^(0) and let k = 0.
2. (Iteration) Determine the direction d^(k) (a p-vector) and the step size α^(k) (a scalar), and calculate θ^(k+1) = θ^(k) + α^(k) d^(k) such that l(θ^(k+1)) > l(θ^(k)).
3. (Stopping criteria) Stop the iteration if
|l(θ^(k+1)) − l(θ^(k))| / (|l(θ^(k))| + ε_1) < ε_2, or
|θ^(k+1)_j − θ^(k)_j| / (|θ^(k)_j| + ε_1) < ε_2 for j = 1, ..., p,
for small precisions ε_1 and ε_2 (e.g., ε_1 = 10^{-4}). Otherwise go to step 2.
Key: determine the direction and the step size.

9 Generalization to higher dimensions (continued)
Determining the direction (general framework, details later):
We generally pick d^(k) = R^{-1} ∇l(θ^(k)), where R is a positive definite matrix.
Choosing the step size (given the direction):
- Step halving: find α^(k) such that l(θ^(k+1)) > l(θ^(k)). Start at a large value of α^(k) and halve α^(k) until l(θ^(k+1)) > l(θ^(k)). Simple and robust, but relatively slow (a sketch follows below).
- Line search: find α^(k) = argmax_α l(θ^(k) + α d^(k)). Approximate l(θ^(k) + α d^(k)) by a polynomial interpolation and find the α^(k) that maximizes the polynomial. Fast.
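A minimal R sketch of step halving (the function and argument names are illustrative, not from the slides):

# Step halving: shrink alpha until the objective l increases.
# l: objective function, theta: current point, d: ascent direction.
step_halving <- function(l, theta, d, alpha = 1, max_halvings = 30) {
  for (j in 1:max_halvings) {
    if (l(theta + alpha * d) > l(theta)) break   # accept this step size
    alpha <- alpha / 2                            # otherwise halve and retry
  }
  alpha
}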

10 Polynomial interpolation
Given p + 1 data points from the function f(α) ≡ l(θ^(k) + α d^(k)), we can find a unique polynomial of degree p that goes through the p + 1 data points. (For a quadratic approximation, we only need 3 data points.)
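For example, with a quadratic fit (an illustration, not from the slides): given three trial step sizes α_1 < α_2 < α_3 with objective values f_1, f_2, f_3, fit f(α) ≈ a α² + b α + c and take the maximizer α = −b/(2a), which is valid when a < 0. A small R sketch:

# Quadratic interpolation line-search step (illustrative sketch).
# alphas: three trial step sizes; fvals: objective values l(theta + alpha * d).
quad_interp_step <- function(alphas, fvals) {
  coefs <- coef(lm(fvals ~ alphas + I(alphas^2)))   # exact fit through 3 points
  a <- coefs[3]; b <- coefs[2]
  if (a < 0) -b / (2 * a) else max(alphas)          # fall back if not concave
}

quad_interp_step(c(0, 0.5, 1), c(1.0, 1.8, 1.4))    # maximizer of the fitted parabola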

11 Survey of basic methods
1. Steepest ascent: R = I = identity matrix
d^(k) = ∇l(θ^(k))
α^(k) = argmax_α l(θ^(k) + α ∇l(θ^(k))), or a small fixed number
θ^(k+1) = θ^(k) + α^(k) ∇l(θ^(k))
Why is ∇l(θ^(k)) the steepest ascent direction? By Taylor expansion at θ^(k),
l(θ^(k) + Δ) − l(θ^(k)) = Δ^T ∇l(θ^(k)) + o(‖Δ‖).
By the Cauchy-Schwarz inequality, Δ^T ∇l(θ^(k)) ≤ ‖Δ‖ ‖∇l(θ^(k))‖, and the equality holds at Δ = α ∇l(θ^(k)). So for a given ‖Δ‖, the choice Δ = α ∇l(θ^(k)) increases l(θ^(k) + Δ) the most.
- Easy to implement; only requires the first derivative/gradient/score.
- Guarantees an increase at each step no matter where you start.
- Converges slowly: the directions of two consecutive steps are orthogonal, so the algorithm zigzags toward the maximum.

12 Steepest ascent (continued)
When α^(k) is chosen as argmax_α l(θ^(k) + α ∇l(θ^(k))), the directions of two consecutive steps are orthogonal, i.e., [∇l(θ^(k))]^T ∇l(θ^(k+1)) = 0.
Proof: By the definition of α^(k) and θ^(k+1),
0 = ∂l(θ^(k) + α ∇l(θ^(k))) / ∂α |_{α = α^(k)} = ∇l(θ^(k) + α^(k) ∇l(θ^(k)))^T ∇l(θ^(k)) = ∇l(θ^(k+1))^T ∇l(θ^(k)).

13 Example: Steepest Ascent
Maximize the function f(x) = 6x − x³.
[Figure: plot of f(x) = 6x − x³]

14 Example: Steepest Ascent (cont.)

fun0  <- function(x) return(- x^3 + 6*x)   # target function
grad0 <- function(x) return(- 3*x^2 + 6)   # gradient

# Steepest Ascent Algorithm
Steepest_Ascent <- function(x, fun=fun0, grad=grad0, step=0.01,
                            kmax=1000, tol1=1e-6, tol2=1e-4) {
  diff <- 2*x   # use a large value to get into the following "while" loop
  k <- 0        # count iterations
  while ( all(abs(diff) > tol1*(abs(x)+tol2)) & k <= kmax) {   # stop criteria
    g_x  <- grad(x)      # calculate gradient at x
    diff <- step * g_x   # calculate the difference used in the stop criteria
    x    <- x + diff     # update x
    k    <- k + 1        # update iteration counter
  }
  f_x <- fun(x)
  return(list(iteration=k, x=x, f_x=f_x, g_x=g_x))
}

15 Example: Steepest Ascent (cont.)

> Steepest_Ascent(x=2, step=0.01)
$iteration
[1] 117
The remaining components show convergence to the maximizer x ≈ √2 ≈ 1.414 with f_x ≈ 5.66 and gradient g_x ≈ 0.

> Steepest_Ascent(x=1, step=-0.01)
$iteration
[1] 159
With a negative step size the iteration moves downhill instead, converging to the minimizer x ≈ −√2 ≈ −1.414 with f_x ≈ −5.66 and gradient g_x ≈ 0.

16 In large datasets
The data log-likelihood is usually a sum over n observations: l(θ) = Σ_{i=1}^n l(x_i; θ). When n is large, this poses a computational burden. One can implement a stochastic version of the algorithm: stochastic gradient descent (SGD). Note: gradient descent is just steepest descent.
- Simple SGD algorithm: replace the gradient ∇l(θ) by the gradient computed from a single sample, ∇l(x_i; θ), where x_i is randomly sampled.
- Mini-batch SGD algorithm: compute the gradient based on a small number of observations.
Advantages of SGD:
- Evaluates the gradient at one (or a few) observations, so it requires less memory.
- Better able to escape from local minima (the gradient is noisy).
Disadvantage of SGD:
- Slower convergence.
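A minimal mini-batch SGD sketch in R for a least-squares objective (an illustration, not from the slides; the data and function names are made up):

# Mini-batch SGD for linear regression: minimize (1/n) * sum((y - X %*% beta)^2).
set.seed(2)
n <- 5000; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta_true <- c(1, -2, 0.5)
y <- X %*% beta_true + rnorm(n)

sgd_ls <- function(X, y, batch = 32, rate = 0.01, epochs = 20) {
  beta <- rep(0, ncol(X))
  for (e in 1:epochs) {
    for (b in seq(1, nrow(X), by = batch)) {
      idx <- sample(nrow(X), batch)                     # random mini-batch
      Xb <- X[idx, , drop = FALSE]; yb <- y[idx]
      grad <- -2 * t(Xb) %*% (yb - Xb %*% beta) / batch # gradient on the batch only
      beta <- beta - rate * grad                        # descent step
    }
  }
  drop(beta)
}
sgd_ls(X, y)   # should be close to beta_true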

17 Survey of basic methods (continued)
2. Newton-Raphson: R = −∇²l(θ^(k)) = observed information
d^(k) = [−∇²l(θ^(k))]^{-1} ∇l(θ^(k))
θ^(k+1) = θ^(k) + [−∇²l(θ^(k))]^{-1} ∇l(θ^(k))
α^(k) = 1 for all k
- Fast, quadratic convergence
- Needs very good starting points
Theorem: If R is positive definite, the equation system R d^(k) = ∇l(θ^(k)) has a unique solution for the direction d^(k), and this direction ensures ascent of l(θ).
Proof: When R is positive definite, it is invertible, so we have the unique solution d^(k) = R^{-1} ∇l(θ^(k)). Let θ^(k+1) = θ^(k) + α d^(k) = θ^(k) + α R^{-1} ∇l(θ^(k)). By Taylor expansion,
l(θ^(k+1)) ≈ l(θ^(k)) + α ∇l(θ^(k))^T R^{-1} ∇l(θ^(k)).
The positive definite matrix R ensures that l(θ^(k+1)) > l(θ^(k)) for sufficiently small positive α.

18 Newton-Raphson vs. steepest ascent
- Newton-Raphson converges much faster than steepest ascent (gradient descent).
- NR requires the computation of the second derivative, which can be difficult and computationally expensive. In contrast, gradient descent requires only the first derivative, which is easy to compute.
- For poorly behaved (non-convex) objective functions, gradient-based methods are often more stable.
- Gradient-based methods (especially SGD) are widely used in modern machine learning.

19 Example: Newton-Raphson

fun0  <- function(x) return(- x^3 + 6*x)   # target function
grad0 <- function(x) return(- 3*x^2 + 6)   # gradient
hes0  <- function(x) return(- 6*x)         # Hessian

# Newton-Raphson Algorithm
Newton_Raphson <- function(x, fun=fun0, grad=grad0, hes=hes0,
                           kmax=1000, tol1=1e-6, tol2=1e-4) {
  diff <- 2*x
  k <- 0
  while ( all(abs(diff) > tol1*(abs(x)+tol2)) & k <= kmax) {
    g_x  <- grad(x)      # calculate gradient
    h_x  <- hes(x)       # calculate the second derivative (Hessian)
    diff <- -g_x/h_x     # calculate the difference used by the stop criteria
    x    <- x + diff
    k    <- k + 1
  }
  f_x <- fun(x)
  return(list(iteration=k, x=x, f_x=f_x, g_x=g_x, h_x=h_x))
}

20 Example: Newton-Raphson (cont.)

> Newton_Raphson(x=2)
$iteration
[1] 5
The run converges to x ≈ √2 ≈ 1.414 with f_x ≈ 5.66, a gradient g_x on the order of 1e-11, and Hessian h_x ≈ −8.49.

> Newton_Raphson(x=1)
$iteration
[1] 5
Starting from x = 1, the algorithm also converges in 5 iterations to x ≈ 1.414 with a gradient on the order of 1e-11 (compare with the 117 and 159 iterations needed by steepest ascent).

21 Survey of basic methods (continued)
3. Modifications of Newton-Raphson
Fisher scoring: replace −∇²l(θ) with its expectation E[−∇²l(θ)] (the Fisher information).
- E[−∇²l(θ)] = E[∇l(θ) ∇l(θ)^T] is always positive semidefinite, which stabilizes the algorithm.
- E[−∇²l(θ)] can have a simpler form than ∇²l(θ).
- Newton-Raphson and Fisher scoring are equivalent for parameter estimation in GLMs with a canonical link.
Quasi-Newton: a.k.a. variable metric methods or secant methods.
- Approximate ∇²l(θ) in a way that avoids calculating the Hessian and its inverse.
- Has convergence properties similar to Newton's method.

22 Fisher scoring: example
In a Poisson regression model with n subjects:
- The responses Y_i ~ Poisson(λ_i), with Pr(Y_i = y) = (y!)^{-1} λ_i^y e^{−λ_i}. We know that λ_i = E(Y_i | X_i).
- We relate the mean of Y_i to X_i by g(λ_i) = X_i β. Taking derivatives on both sides: g'(λ_i) ∂λ_i/∂β = X_i, so ∂λ_i/∂β = X_i / g'(λ_i).
- Log likelihood: l(β) = Σ_{i=1}^n (Y_i log λ_i − λ_i), where the λ_i satisfy g(λ_i) = X_i β.
- Maximum likelihood estimation: β̂ = argmax_β l(β).
Newton-Raphson needs
∇l(β) = Σ_i (Y_i/λ_i − 1) ∂λ_i/∂β = Σ_i (Y_i/λ_i − 1) X_i / g'(λ_i)
∇²l(β) = −Σ_i (Y_i/λ_i²) X_i² / g'(λ_i)² − Σ_i (Y_i/λ_i − 1) g''(λ_i) X_i² / g'(λ_i)³
       = −Σ_i X_i² / {λ_i g'(λ_i)²} − Σ_i (Y_i/λ_i − 1) X_i² / {λ_i g'(λ_i)²} − Σ_i (Y_i/λ_i − 1) g''(λ_i) X_i² / g'(λ_i)³

23 Fisher scoring: example (continued)
Fisher scoring needs ∇l(β) and
E[−∇²l(β)] = Σ_i X_i² / {λ_i g'(λ_i)²},
which is −∇²l(β) without the extra terms (they have expectation zero since E(Y_i) = λ_i).
With the canonical link for Poisson regression, g(λ_i) = log λ_i, we have g'(λ_i) = λ_i^{-1} and g''(λ_i) = −λ_i^{-2}, so the extra terms equal zero (check this!), and we conclude that Newton-Raphson and Fisher scoring are equivalent.
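A minimal R sketch of Fisher scoring for Poisson regression with the log link (with the canonical link this is also the Newton-Raphson update); the simulated data and variable names are illustrative, not from the slides:

# Fisher scoring for Poisson regression with log link: g(lambda) = log(lambda).
# Score: X' (y - lambda); expected information: X' diag(lambda) X.
set.seed(3)
n <- 500
X <- cbind(1, rnorm(n))
beta_true <- c(0.5, 0.8)
y <- rpois(n, exp(X %*% beta_true))

beta <- c(0, 0)
for (k in 1:25) {
  lambda <- as.vector(exp(X %*% beta))
  score  <- t(X) %*% (y - lambda)
  info   <- t(X) %*% (X * lambda)           # X' diag(lambda) X
  delta  <- solve(info, score)
  beta   <- beta + as.vector(delta)
  if (max(abs(delta)) < 1e-8) break
}
beta
glm(y ~ X[, 2], family = poisson)$coef      # should agree with the loop above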

24 Quasi-Newton
1. Davidon-Fletcher-Powell (DFP) QNR algorithm
Let Δ∇l^(k) = ∇l(θ^(k)) − ∇l(θ^(k−1)) and Δθ^(k) = θ^(k) − θ^(k−1). Approximate the negative Hessian by
G^(k+1) = G^(k) + Δθ^(k) (Δθ^(k))^T / {(Δθ^(k))^T Δ∇l^(k)} − G^(k) Δ∇l^(k) (Δ∇l^(k))^T G^(k) / {(Δ∇l^(k))^T G^(k) Δ∇l^(k)}.
Use the starting matrix G^(0) = I.
Theorem: If the starting matrix G^(0) is positive definite, the above formula ensures that every G^(k) during the iteration is positive definite.
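In practice, quasi-Newton updates rarely need to be hand-coded; for example, R's optim() provides a BFGS quasi-Newton method, a close relative of DFP. A small illustrative call (the data and function names are made up), maximizing a log-likelihood by minimizing its negative:

# Quasi-Newton (BFGS) maximization of a log-likelihood via optim(),
# here for the Gamma shape/rate parameters of simulated data.
set.seed(4)
x <- rgamma(300, shape = 2, rate = 1.5)

negloglik <- function(par) {
  a <- exp(par[1]); r <- exp(par[2])      # log-parameterization keeps a, r > 0
  -sum(dgamma(x, shape = a, rate = r, log = TRUE))
}

fit <- optim(c(0, 0), negloglik, method = "BFGS")
exp(fit$par)                              # estimated (shape, rate)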

25 Nonlinear regression models
Data: (x_i, y_i) for i = 1, ..., n.
Notation and assumptions:
- Model: y_i = h(x_i, β) + ε_i, where the ε_i are i.i.d. N(0, σ²) and h(·) is known
- Residual: e_i(β) = y_i − h(x_i, β)
- Jacobian: {J(β)}_{ij} = ∂h(x_i, β)/∂β_j = −∂e_i(β)/∂β_j, an n × p matrix
Goal: obtain the MLE β̂ = argmin_β S(β), where S(β) = Σ_i {y_i − h(x_i, β)}² = [e(β)]^T e(β).
We could use the previously discussed Newton-Raphson algorithm:
- Gradient: g_j(β) = ∂S(β)/∂β_j = 2 Σ_i e_i(β) ∂e_i(β)/∂β_j, i.e., g(β) = −2 J(β)^T e(β)
- Hessian: H_{jr}(β) = ∂²S(β)/∂β_j ∂β_r = 2 Σ_i { e_i(β) ∂²e_i(β)/∂β_j ∂β_r + ∂e_i(β)/∂β_j · ∂e_i(β)/∂β_r }
Problem: the Hessian can be hard to obtain.

26 Gauss-Newton algorithm
Recall that in linear regression models we minimize
S(β) = Σ_i { y_i − x_i^T β }².
Because S(β) is a quadratic function, it is easy to get the MLE β̂ = (Σ_i x_i x_i^T)^{-1} Σ_i x_i y_i.
Now, in nonlinear regression models, we want to minimize
S(β) = Σ_i { y_i − h(x_i, β) }².
Idea: approximate h(x_i, β) by a linear function, iteratively at β^(k). Given β^(k) and by Taylor expansion of h(x_i, β) at β^(k), S(β) becomes
S(β) ≈ Σ_i { y_i − h(x_i, β^(k)) − (β − β^(k))^T ∂h(x_i, β^(k))/∂β }².

27 Gauss-Newton algorithm (cont.)
1. Find a good starting point β^(0).
2. At step k + 1:
(a) Form e(β^(k)) and J(β^(k)).
(b) Use a standard linear regression routine to obtain δ^(k) = [J(β^(k))^T J(β^(k))]^{-1} J(β^(k))^T e(β^(k)).
(c) Obtain the new estimate β^(k+1) = β^(k) + δ^(k).
- No need to compute the Hessian matrix.
- Needs good starting values.
- Requires J(β^(k))^T J(β^(k)) to be invertible.
- This is not a general optimization method; it applies only to least squares problems.
A sketch of the iteration in R follows below.
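A minimal R sketch of the Gauss-Newton iteration for an exponential-decay model y = β_1 exp(−β_2 x) + ε (the model, data, and names are illustrative, not from the slides):

# Gauss-Newton for y_i = b1 * exp(-b2 * x_i) + eps_i.
set.seed(5)
x <- runif(100, 0, 4)
beta_true <- c(2, 0.7)
y <- beta_true[1] * exp(-beta_true[2] * x) + rnorm(100, sd = 0.05)

h <- function(x, b) b[1] * exp(-b[2] * x)
# Jacobian of h with respect to (b1, b2): an n x 2 matrix.
J <- function(x, b) cbind(exp(-b[2] * x), -b[1] * x * exp(-b[2] * x))

beta <- c(1, 1)                               # starting values
for (k in 1:50) {
  e  <- y - h(x, beta)                        # residuals e(beta)
  Jk <- J(x, beta)
  delta <- solve(t(Jk) %*% Jk, t(Jk) %*% e)   # step from a linear least-squares fit
  beta  <- beta + as.vector(delta)
  if (max(abs(delta)) < 1e-8) break
}
beta                                          # close to beta_true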

28 Example: generalized linear models (GLM)
Data: (y_i, x_i) for i = 1, ..., n.
Notation and assumptions:
- Mean: E(y | x) = µ
- Link g: g(µ) = x^T β
- Variance function V: Var(y | x) = φ V(µ)
- Log likelihood (exponential family): l(θ, φ; y) = {yθ − b(θ)}/a(φ) + c(y, φ)
From this we obtain:
- Score function: ∂l/∂θ = {y − b'(θ)}/a(φ)
- Observed information: −∂²l/∂θ² = b''(θ)/a(φ)
- Mean (in terms of θ): E(y | x) = a(φ) E(∂l/∂θ) + b'(θ) = b'(θ)
- Variance (in terms of θ, φ): Var(y | x) = E{y − b'(θ)}² = a(φ)² E{(∂l/∂θ)²} = a(φ)² E(−∂²l/∂θ²) = b''(θ) a(φ)
- Canonical link: the g such that g(µ) = θ, i.e., g^{-1} = b'
Generally a(φ) = φ/w, in which case φ drops out of the following.
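For example (a quick check, not on the slide), the Poisson(λ) log likelihood y log λ − λ − log(y!) has this exponential-family form with θ = log λ, b(θ) = e^θ, a(φ) = 1, and c(y, φ) = −log(y!). Then b'(θ) = e^θ = λ = µ and b''(θ) a(φ) = λ, matching the Poisson mean and variance, and the canonical link g = (b')^{-1} is the log link, as in the table on the next slide.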

29 GLM

Model                 Normal     Poisson    Binomial          Gamma
φ                     σ²         1          1/m               1/ν
b(θ)                  θ²/2       exp(θ)     log(1 + e^θ)      −log(−θ)
µ                     θ          exp(θ)     e^θ/(1 + e^θ)     −1/θ
Canonical link g      identity   log        logit             reciprocal
Variance function V   1          µ          µ(1 − µ)          µ²

30 Towards iteratively reweighted least squares
In linear regression models, E(y_i | x_i) = x_i^T β, so we minimize
S(β) = Σ_i { y_i − x_i^T β }².
Because S(β) is a quadratic function, it is easy to get the MLE β̂ = (Σ_i x_i x_i^T)^{-1} Σ_i x_i y_i.
In generalized linear models, consider constructing a similar quadratic function S(β).
Question: can we use S(β) = Σ_i { g(y_i) − x_i^T β }²?
Answer: no, because E{g(y_i) | x_i} ≠ x_i^T β.
Idea: approximate g(y_i) by a linear function with expectation x_i^T β^(k), iteratively at β^(k).

31 Iteratively reweighted least squares
Linearize g(y_i) around µ̂_i^(k) = g^{-1}(x_i^T β^(k)), and denote the linearized value by ỹ_i^(k):
ỹ_i^(k) = g(µ̂_i^(k)) + (y_i − µ̂_i^(k)) g'(µ̂_i^(k)).
Check the variances of the ỹ_i^(k) and use them as weights:
W_i^(k) = {Var(ỹ_i^(k))}^{-1} = [{g'(µ̂_i^(k))}² V(µ̂_i^(k))]^{-1}.
Given β^(k), we consider minimizing
S(β) = Σ_i W_i^(k) { ỹ_i^(k) − x_i^T β }².
IRLS algorithm (an R sketch follows below):
1. Start with initial estimates, generally µ̂_i^(0) = y_i.
2. Form ỹ^(k) and W^(k).
3. Estimate β^(k+1) by regressing ỹ^(k) on x with weights W^(k).
4. Form µ̂^(k+1) = g^{-1}(x^T β^(k+1)) and return to step 2.
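A minimal R sketch of IRLS for logistic regression (binomial with logit link); the simulated data and names are illustrative, and the iteration is started from β = 0 rather than µ̂^(0) = y because g(y) is infinite for binary y:

# IRLS for logistic regression: g(mu) = log(mu / (1 - mu)), V(mu) = mu * (1 - mu).
set.seed(6)
n <- 400
X <- cbind(1, rnorm(n))
beta_true <- c(-0.5, 1.2)
y <- rbinom(n, 1, plogis(X %*% beta_true))

beta <- c(0, 0)                              # start from beta = 0 (mu = 0.5)
for (k in 1:25) {
  eta <- as.vector(X %*% beta)
  mu  <- plogis(eta)
  w   <- mu * (1 - mu)                       # W = 1 / (g'(mu)^2 * V(mu))
  z   <- eta + (y - mu) / w                  # linearized (working) response
  fit <- lm(z ~ X[, 2], weights = w)         # weighted least squares step
  beta_new <- coef(fit)
  if (max(abs(beta_new - beta)) < 1e-8) { beta <- beta_new; break }
  beta <- beta_new
}
beta
glm(y ~ X[, 2], family = binomial)$coef      # should match the IRLS loop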

32 Iteratively reweighted least squares (continued)

Model           Poisson   Binomial          Gamma
µ = g^{-1}(η)   e^η       e^η/(1 + e^η)     1/η
g'(µ)           1/µ       1/[µ(1 − µ)]      1/µ²
V(µ)            µ         µ(1 − µ)          µ²

McCullagh and Nelder (1983) justified IRLS by showing that it is equivalent to Fisher scoring. In the case of the canonical link, IRLS is also equivalent to Newton-Raphson.
IRLS is attractive because no special optimization algorithm is required, just a subroutine that computes weighted least squares estimates.

33 Miscellaneous things
Dispersion parameter: when we do not take φ = 1, the usual estimate is via the method of moments:
φ̂ = 1/(n − p) Σ_i (y_i − µ̂_i)² / V(µ̂_i).
Standard errors: Var(β̂) = φ̂ (X^T Ŵ X)^{-1}.
Quasi-likelihood: pick a link and a variance function, and IRLS can proceed without worrying about the full model. In other words, IRLS is a good thing!

34 A quick review
- Optimization methods are important in statistics (e.g., to find the MLE) and in machine learning more generally (to minimize some loss function).
- Maximizing/minimizing an objective function is achieved by solving the equation that sets the first derivative to 0 (and checking the second derivative).
- Steepest ascent method: only needs the gradient; slow convergence. In large datasets with an ill-behaved objective function, the stochastic version (SGD) usually works better.

35 Newton-Raphson (NR) method: quadratic convergence rate, but it can get stuck in a local maximum. In higher dimensions, the problems are to find the direction and the step size at each iteration.
- Fisher scoring: uses the expected information matrix, while NR uses the observed information matrix. The expected information is more stable and often simpler. Fisher scoring and Newton-Raphson are equivalent under the canonical link.
- Gauss-Newton algorithm for nonlinear regression: the Hessian matrix is not needed.
