Optimization. September 4, 2018
1 Optimization September 4, 2018
2 Optimization problem

An optimization problem is the problem of finding the best solution for an objective function. Optimization methods play an important role in statistics, for example, in finding the maximum likelihood estimate (MLE).
Unconstrained vs. constrained optimization problem: whether there are constraints on the solution space.
Most algorithms are based on iterative procedures.
We'll spend the next few lectures on several optimization methods, in the context of statistics:
- Newton-Raphson, Fisher scoring, etc.
- EM and MM.
- Hidden Markov models.
- Linear and quadratic programming.
3 Review: Newton-Raphson (NR) method

Goal: find the root of the equation f(θ) = 0.
Approach:
1. Choose an initial value θ^(0) as the starting point.
2. By Taylor expansion at θ^(0), we have f(θ) ≈ f(θ^(0)) + f'(θ^(0))(θ − θ^(0)). Setting f(θ) = 0 gives an update of the parameter: θ^(1) = θ^(0) − f(θ^(0))/f'(θ^(0)).
3. Repeat the update until convergence: θ^(k+1) = θ^(k) − f(θ^(k))/f'(θ^(k)).
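As a quick illustration of this update, here is a minimal R sketch (our own, not slide code; the function names are ours) that finds the root of f(θ) = θ² − 2:

```r
# Newton-Raphson root finding for f(theta) = theta^2 - 2 (root: sqrt(2))
f  <- function(theta) theta^2 - 2   # target equation f(theta) = 0
fp <- function(theta) 2 * theta     # derivative f'(theta)

theta <- 2                          # starting point theta^(0)
for (k in 1:20) {
  step  <- f(theta) / fp(theta)     # f(theta^(k)) / f'(theta^(k))
  theta <- theta - step             # theta^(k+1) = theta^(k) - step
  if (abs(step) < 1e-12) break      # stop when updates are negligible
}
theta                               # converges to sqrt(2) = 1.4142...
```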
4 NR method convergence rate

Quadratic convergence: let θ* be the solution. Then
lim_{k→∞} |θ^(k+1) − θ*| / |θ^(k) − θ*|² = c   (rate = c > 0, order = 2).
The number of significant digits nearly doubles at each step (in the neighborhood of θ*).
Proof: by Taylor expansion (to the second order) at θ^(k),
0 = f(θ*) = f(θ^(k)) + f'(θ^(k))(θ* − θ^(k)) + (1/2) f''(ξ^(k))(θ* − θ^(k))²,  with ξ^(k) ∈ [θ*, θ^(k)].
Dividing the equation by f'(θ^(k)) gives
f(θ^(k))/f'(θ^(k)) + (θ* − θ^(k)) = −f''(ξ^(k)) / (2 f'(θ^(k))) · (θ* − θ^(k))².
The definition of θ^(k+1) = θ^(k) − f(θ^(k))/f'(θ^(k)) then gives
θ^(k+1) − θ* = f''(ξ^(k)) / (2 f'(θ^(k))) · (θ* − θ^(k))².
What conditions are needed?
- f'(θ^(k)) ≠ 0 in the neighborhood of θ*
- f''(ξ^(k)) is bounded
- the starting point is sufficiently close to the root θ*
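The digit-doubling behavior is easy to see numerically; a short R sketch (our own illustration) tracks the error |θ^(k) − √2| for f(θ) = θ² − 2:

```r
# Track the Newton-Raphson error for f(theta) = theta^2 - 2
f  <- function(theta) theta^2 - 2
fp <- function(theta) 2 * theta

theta <- 2
errs <- abs(theta - sqrt(2))        # error of the starting point
for (k in 1:5) {
  theta <- theta - f(theta) / fp(theta)
  errs  <- c(errs, abs(theta - sqrt(2)))
}
# Each error is roughly a constant times the SQUARE of the previous one
print(errs)
```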
5 Review: maximum likelihood

Here is a list of some definitions related to the maximum likelihood estimate:
- Parameter θ, a p-vector
- Data X
- Log-likelihood l(θ) = log Pr(X | θ)
- Score function ∇l(θ) = (∂l/∂θ_1, ..., ∂l/∂θ_p)^T
- Hessian matrix ∇²l(θ) = {∂²l/∂θ_i∂θ_j}_{i,j=1,...,p}
- Fisher information I(θ) = −E ∇²l(θ) = E[∇l(θ){∇l(θ)}^T]
- Observed information −∇²l(θ̂)
When θ* is a local maximum of l, ∇l(θ*) = 0 and ∇²l(θ*) is negative definite.
6 Application of the NR method to MLE: when θ is a scalar

Maximum likelihood estimation (MLE): θ̂ = arg max_θ l(θ).
Approach: find θ̂ such that l'(θ̂) = 0. If a closed-form solution of l'(θ) = 0 is difficult to obtain, one can use the NR method (replace f by l'). The NR update for solving the MLE is:
θ^(k+1) = θ^(k) − l'(θ^(k)) / l''(θ^(k)).
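A concrete scalar case where no closed form exists (our own example, not from the slides): for a Gamma(α, 1) sample, the score for the shape is l'(α) = Σ log x_i − n ψ(α), with ψ the digamma function, so the MLE must be found iteratively:

```r
# Newton-Raphson for the MLE of the Gamma shape alpha (rate fixed at 1):
#   score:   l'(alpha)  = sum(log(x)) - n * digamma(alpha)
#   hessian: l''(alpha) = -n * trigamma(alpha)   (always negative: l is concave)
set.seed(1)
x   <- rgamma(200, shape = 3, rate = 1)
n   <- length(x)
slx <- sum(log(x))

alpha <- mean(x)                    # moment-based starting point
for (k in 1:50) {
  score <- slx - n * digamma(alpha)
  hess  <- -n * trigamma(alpha)
  step  <- score / hess
  alpha <- alpha - step             # alpha^(k+1) = alpha^(k) - l'/l''
  if (abs(step) < 1e-10) break
}
alpha                               # MLE of the shape parameter, near 3
```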
7 What can go wrong?

- Bad starting point
- May not converge to the global maximum
- Saddle point: ∇l(θ̂) = 0, but ∇²l(θ̂) is neither negative definite nor positive definite (a stationary point but not a local extremum; can be used to check the likelihood)
(Figures: starting point & local extremum; saddle point of l(θ) = θ³; saddle point of l(θ_1, θ_2) = θ_1² − θ_2².)
8 Generalization to higher dimensions: when θ is a vector

General algorithm:
1. (Starting point) Pick a starting point θ^(0) and let k = 0.
2. (Iteration) Determine the direction d^(k) (a p-vector) and the step size α^(k) (a scalar), and calculate θ^(k+1) = θ^(k) + α^(k) d^(k), such that l(θ^(k+1)) > l(θ^(k)).
3. (Stopping criteria) Stop the iteration if
|l(θ^(k+1)) − l(θ^(k))| / (|l(θ^(k))| + ε_1) < ε_2, or
|θ^(k+1)_j − θ^(k)_j| / (|θ^(k)_j| + ε_1) < ε_2 for j = 1, ..., p,
for precisions such as ε_1 = 10⁻⁴ and ε_2 = 10⁻⁶. Otherwise go to 2.
Key: determine the direction and the step size.
9 Generalization to higher dimensions (continued)

Determining the direction (general framework, details later): we generally pick d^(k) = R⁻¹ ∇l(θ^(k)), where R is a positive definite matrix.
Choosing a step size (given the direction):
- Step halving: find α^(k) such that l(θ^(k+1)) > l(θ^(k)). Start at a large value of α^(k) and halve α^(k) until l(θ^(k+1)) > l(θ^(k)). Simple and robust, but relatively slow.
- Line search: find α^(k) = arg max_α l(θ^(k) + α d^(k)). Approximate l(θ^(k) + α d^(k)) by a polynomial interpolation and find the α^(k) maximizing the polynomial. Fast.
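Step halving can be sketched in a few lines of R (a generic helper of our own, not slide code):

```r
# Step halving: shrink alpha until the objective actually increases
step_halving <- function(l, theta, d, alpha = 1, max_halvings = 30) {
  while (l(theta + alpha * d) <= l(theta) && max_halvings > 0) {
    alpha        <- alpha / 2        # halve the step size
    max_halvings <- max_halvings - 1
  }
  alpha
}

# Example: l(x) = -x^2, current point 1, ascent direction -2 (the gradient)
l <- function(x) -x^2
a <- step_halving(l, theta = 1, d = -2, alpha = 4)
l(1 + a * (-2)) > l(1)               # TRUE: the chosen step increases l
```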
10 Polynomial interpolation

Given a set of p + 1 data points from the function f(α) ≡ l(θ^(k) + α d^(k)), we can find a unique polynomial of degree p that goes through the p + 1 data points. (For a quadratic approximation, we only need 3 data points.)
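A sketch of the quadratic case in R (helper names are ours): fit a parabola through three points (α, f(α)) and take its vertex as the candidate step size:

```r
# Quadratic interpolation: vertex of the parabola through 3 points
quad_step <- function(a, fa) {
  # fit f(alpha) ~ c0 + c1*alpha + c2*alpha^2 through the 3 points
  co <- solve(cbind(1, a, a^2), fa)
  -co[2] / (2 * co[3])              # vertex of the fitted parabola
}

# Example: f(alpha) = 5 - (alpha - 1.5)^2, maximized at alpha = 1.5
f <- function(alpha) 5 - (alpha - 1.5)^2
a <- c(0, 1, 2)
quad_step(a, f(a))                  # recovers 1.5 (f is itself quadratic)
```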
11 Survey of basic methods

1. Steepest ascent: R = I = identity matrix
d^(k) = ∇l(θ^(k))
α^(k) = arg max_α l(θ^(k) + α ∇l(θ^(k))), or a small fixed number
θ^(k+1) = θ^(k) + α^(k) ∇l(θ^(k))
Why is ∇l(θ^(k)) the steepest ascent direction? By Taylor expansion at θ^(k),
l(θ^(k) + Δ) − l(θ^(k)) = Δ^T ∇l(θ^(k)) + o(‖Δ‖).
By the Cauchy-Schwarz inequality, Δ^T ∇l(θ^(k)) ≤ ‖Δ‖ · ‖∇l(θ^(k))‖, and the equality holds at Δ = α ∇l(θ^(k)). So when Δ = α ∇l(θ^(k)), l(θ^(k) + Δ) increases the most.
- Easy to implement; only requires the first derivative/gradient/score.
- Guarantees an increase at each step no matter where you start.
- Converges slowly. The directions of two consecutive steps are orthogonal, so the algorithm zigzags to the maximum.
12 Steepest ascent (continued)

When α^(k) is chosen as arg max_α l(θ^(k) + α ∇l(θ^(k))), the directions of two consecutive steps are orthogonal, i.e., [∇l(θ^(k+1))]^T ∇l(θ^(k)) = 0.
Proof: by the definitions of α^(k) and θ^(k+1),
0 = ∂l(θ^(k) + α ∇l(θ^(k)))/∂α |_{α=α^(k)} = ∇l(θ^(k) + α^(k) ∇l(θ^(k)))^T ∇l(θ^(k)) = ∇l(θ^(k+1))^T ∇l(θ^(k)).
13 Example: Steepest Ascent

Maximize the function f(x) = 6x − x³. (Figure: plot of f(x) against x.)
14 Example: Steepest Ascent (cont.)

fun0 <- function(x) return(- x^3 + 6*x)   # target function
grd0 <- function(x) return(- 3*x^2 + 6)   # gradient

# Steepest Ascent Algorithm
Steepest_Ascent <- function(x, fun=fun0, grd=grd0, step=0.01,
                            kmax=1000, tol1=1e-6, tol2=1e-4) {
  diff <- 2*x   # use a large value to get into the following "while" loop
  k <- 0        # count iterations
  while ( all(abs(diff) > tol1*(abs(x)+tol2)) & k <= kmax ) {  # stop criteria
    g_x  <- grd(x)        # calculate gradient using x
    diff <- step * g_x    # calculate the difference used in the stop criteria
    x    <- x + diff      # update x
    k    <- k + 1         # update iteration count
  }
  f_x <- fun(x)
  return(list(iteration=k, x=x, f_x=f_x, g_x=g_x))
}
15 Example: Steepest Ascent (cont.)

> Steepest_Ascent(x=2, step=0.01)
$iteration
[1] 117
$x
[1]
$f_x
[1]
$g_x
[1]

> Steepest_Ascent(x=1, step=-0.01)
$iteration
[1] 159
$x
[1]
$f_x
[1]
$g_x
[1]
16 In large datasets

The data log-likelihood is usually summed over n observations: l(θ) = Σ_{i=1}^n l(x_i; θ). When n is large, this poses a computational burden. One can implement a stochastic version of the algorithm: stochastic gradient descent (SGD). Note: gradient descent is just steepest descent.
- Simple SGD algorithm: replace the gradient ∇l(θ) by the gradient computed from a single sample, ∇l(x_i; θ), where x_i is randomly sampled.
- Mini-batch SGD algorithm: compute the gradient based on a small number of observations.
Advantages of SGD:
- Evaluates the gradient at one (or a few) observations, so it requires less memory.
- Better able to escape from local minima (the gradient is noisy).
Disadvantage of SGD:
- Slower convergence.
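A minimal mini-batch sketch in R (a toy of our own, not slide code): stochastic gradient ascent on the Gaussian log-likelihood of the mean, where each update uses the average per-sample gradient (x_i − µ) over a small batch:

```r
# Mini-batch stochastic gradient ascent for the mean of a Gaussian:
# l(mu) = -sum((x - mu)^2)/2, per-sample gradient: (x_i - mu)
set.seed(42)
x <- rnorm(10000, mean = 2)

mu   <- 0
step <- 0.05
for (epoch in 1:20) {
  idx <- sample(length(x))                                # shuffle each epoch
  for (b in split(idx, ceiling(seq_along(idx) / 100))) {  # batches of 100
    g  <- mean(x[b] - mu)                                 # noisy gradient estimate
    mu <- mu + step * g                                   # ascent step
  }
}
mu                                                        # close to mean(x)
```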
17 Survey of basic methods (continued)

2. Newton-Raphson: R = −∇²l(θ^(k)) = observed information
d^(k) = −[∇²l(θ^(k))]⁻¹ ∇l(θ^(k))
θ^(k+1) = θ^(k) − [∇²l(θ^(k))]⁻¹ ∇l(θ^(k))
α^(k) = 1 for all k
- Fast, quadratic convergence
- Needs very good starting points
Theorem: if R is positive definite, the equation set R d^(k) = ∇l(θ^(k)) has a unique solution for the direction d^(k), and the direction ensures ascent of l(θ).
Proof: when R is positive definite, it is invertible, so we have the unique solution d^(k) = R⁻¹ ∇l(θ^(k)). Let θ^(k+1) = θ^(k) + α d^(k) = θ^(k) + α R⁻¹ ∇l(θ^(k)). By Taylor expansion,
l(θ^(k+1)) ≈ l(θ^(k)) + α ∇l(θ^(k))^T R⁻¹ ∇l(θ^(k)).
The positive definite matrix R ensures that l(θ^(k+1)) > l(θ^(k)) for sufficiently small positive α.
18 Newton-Raphson vs. steepest ascent

- Newton-Raphson converges much faster than steepest ascent (gradient descent).
- NR requires the computation of the second derivative, which can be difficult and computationally expensive. In contrast, gradient descent requires only the first derivative, which is easy to compute.
- For poorly behaved (non-convex) objective functions, gradient-based methods are often more stable.
- Gradient-based methods (especially SGD) are widely used in modern machine learning.
19 Example: Newton Raphson

fun0 <- function(x) return(- x^3 + 6*x)   # target function
grd0 <- function(x) return(- 3*x^2 + 6)   # gradient
hes0 <- function(x) return(- 6*x)         # Hessian

# Newton-Raphson Algorithm
Newton_Raphson <- function(x, fun=fun0, grd=grd0, hes=hes0,
                           kmax=1000, tol1=1e-6, tol2=1e-4) {
  diff <- 2*x
  k <- 0
  while ( all(abs(diff) > tol1*(abs(x)+tol2)) & k <= kmax ) {
    g_x  <- grd(x)
    h_x  <- hes(x)      # calculate the second derivative (Hessian)
    diff <- -g_x/h_x    # calculate the difference used by the stop criteria
    x    <- x + diff
    k    <- k + 1
  }
  f_x <- fun(x)
  return(list(iteration=k, x=x, f_x=f_x, g_x=g_x, h_x=h_x))
}
20 Example: Newton Raphson

> Newton_Raphson(x=2)
$iteration
[1] 5
$x
[1]
$f_x
[1]
$g_x
[1] e-11
$h_x
[1]

> Newton_Raphson(x=1)
$iteration
[1] 5
$x
[1]
$f_x
[1]
$g_x
[1] e-11
$h_x
[1]
21 Survey of basic methods (continued)

3. Modifications of Newton-Raphson
Fisher scoring: replace −∇²l(θ) with E[−∇²l(θ)]
- E[−∇²l(θ)] = E[∇l(θ){∇l(θ)}^T] is always positive semidefinite, which stabilizes the algorithm
- E[−∇²l(θ)] can have a simpler form than −∇²l(θ)
- Newton-Raphson and Fisher scoring are equivalent for parameter estimation in GLMs with canonical links
Quasi-Newton: aka variable metric methods or secant methods
- Approximate −∇²l(θ) in a way that avoids calculating the Hessian and its inverse
- Has convergence properties similar to Newton
22 Fisher Scoring: Example

In the Poisson regression model with n subjects:
- The responses Y_i ~ Poisson(λ_i), i.e., Pr(Y_i) = (Y_i!)⁻¹ λ_i^{Y_i} e^{−λ_i}. We know that λ_i = E(Y_i | X_i).
- We relate the mean of Y_i to X_i by g(λ_i) = X_i β. Taking derivatives on both sides, g'(λ_i) ∂λ_i/∂β = X_i, so ∂λ_i/∂β = X_i / g'(λ_i).
- Log-likelihood: l(β) = Σ_{i=1}^n (Y_i log λ_i − λ_i), where the λ_i satisfy g(λ_i) = X_i β.
- Maximum likelihood estimation: β̂ = arg max_β l(β).
Newton-Raphson needs
l'(β) = Σ_i (Y_i/λ_i − 1) ∂λ_i/∂β = Σ_i (Y_i/λ_i − 1) (1/g'(λ_i)) X_i
l''(β) = −Σ_i (Y_i/λ_i²) (1/g'(λ_i)²) X_i² − Σ_i (Y_i/λ_i − 1) (g''(λ_i)/g'(λ_i)³) X_i²
       = −Σ_i 1/(λ_i g'(λ_i)²) X_i² − Σ_i (Y_i/λ_i − 1) 1/(λ_i g'(λ_i)²) X_i² − Σ_i (Y_i/λ_i − 1) g''(λ_i)/g'(λ_i)³ X_i²
23 Fisher Scoring: Example (continued)

Fisher scoring needs l'(β) and
−E[l''(β)] = Σ_i 1/(λ_i g'(λ_i)²) X_i²,
which is −l''(β) without the extra terms (the terms involving Y_i/λ_i − 1 have expectation zero).
With the canonical link for Poisson regression, g(λ_i) = log λ_i, we have g'(λ_i) = λ_i⁻¹ and g''(λ_i) = −λ_i⁻². So the extra terms equal zero (check this!), and we conclude that Newton-Raphson and Fisher scoring are equivalent.
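The equivalence is easy to check numerically. A sketch in R (our own code, not from the slides) runs Fisher scoring for Poisson regression with the canonical log link, where the score is Σ(Y_i − λ_i)X_i and the expected (= observed) information is Σ λ_i X_i X_i^T, and compares the result with glm():

```r
# Fisher scoring for Poisson regression with the canonical log link
set.seed(7)
n <- 500
X <- cbind(1, rnorm(n))                  # design matrix with intercept
beta_true <- c(0.5, 0.3)
y <- rpois(n, exp(X %*% beta_true))

beta <- c(0, 0)
for (k in 1:25) {
  lambda <- as.vector(exp(X %*% beta))
  score  <- t(X) %*% (y - lambda)        # score vector
  info   <- t(X) %*% (X * lambda)        # expected information matrix
  beta   <- beta + solve(info, score)    # Fisher scoring update
}
beta_glm <- coef(glm(y ~ X[, 2], family = poisson))
max(abs(as.vector(beta) - as.vector(beta_glm)))   # essentially zero
```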
24 Quasi-Newton

1. Davidon-Fletcher-Powell (DFP) QNR algorithm
Let Δ∇l^(k) = ∇l(θ^(k)) − ∇l(θ^(k−1)) and Δθ^(k) = θ^(k) − θ^(k−1). Approximate the inverse negative Hessian by
G^(k+1) = G^(k) + [Δθ^(k) (Δθ^(k))^T] / [(Δθ^(k))^T Δ∇l^(k)] − [G^(k) Δ∇l^(k) (Δ∇l^(k))^T G^(k)] / [(Δ∇l^(k))^T G^(k) Δ∇l^(k)].
Use the starting matrix G^(0) = I.
Theorem: if the starting matrix G^(0) is positive definite, the above formula ensures that every G^(k) during the iteration is positive definite.
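A sketch of the DFP update in R (our own implementation, not slide code), maximizing the concave quadratic l(θ) = −½ θ^T A θ; for maximization we apply the update to the negated gradient differences so the approximation stays positive definite:

```r
# DFP quasi-Newton maximizing l(theta) = -0.5 * t(theta) %*% A %*% theta
A    <- diag(c(1, 4))
grad <- function(theta) as.vector(-A %*% theta)

theta <- c(3, 1)
G <- diag(2)                       # G^(0) = I, approximates [-Hessian]^{-1}
g <- grad(theta)
for (k in 1:30) {
  d      <- as.vector(G %*% g)                  # ascent direction
  alpha  <- sum(g * d) / sum(d * (A %*% d))     # exact line search (quadratic l)
  dtheta <- alpha * d
  theta  <- theta + dtheta
  g_new  <- grad(theta)
  y      <- -(g_new - g)                        # negated gradient change
  Gy     <- as.vector(G %*% y)
  # DFP rank-two update of the inverse negative-Hessian approximation
  G <- G + (dtheta %o% dtheta) / sum(dtheta * y) - (Gy %o% Gy) / sum(y * Gy)
  g <- g_new
  if (sqrt(sum(g^2)) < 1e-10) break
}
theta                              # converges to the maximizer (0, 0)
```

For a quadratic objective with exact line search, DFP reaches the maximizer in at most p steps, and G converges to the true inverse negative Hessian.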
25 Nonlinear Regression Models

Data: (x_i, y_i) for i = 1, ..., n.
Notation and assumptions:
- Model: y_i = h(x_i, β) + ε_i, where the ε_i are i.i.d. N(0, σ²) and h(·) is known
- Residual: e_i(β) = y_i − h(x_i, β)
- Jacobian: {J(β)}_ij = ∂h(x_i, β)/∂β_j = −∂e_i(β)/∂β_j, an n × p matrix
Goal: obtain the MLE β̂ = arg min_β S(β), where S(β) = Σ_i {y_i − h(x_i, β)}² = [e(β)]^T e(β).
We could use the previously discussed Newton-Raphson algorithm.
- Gradient: g_j(β) = ∂S(β)/∂β_j = 2 Σ_i e_i(β) ∂e_i(β)/∂β_j, i.e., g(β) = −2 J(β)^T e(β)
- Hessian: H_jr(β) = ∂²S(β)/∂β_j∂β_r = 2 Σ_i { e_i(β) ∂²e_i(β)/∂β_j∂β_r + (∂e_i(β)/∂β_j)(∂e_i(β)/∂β_r) }
Problem: the Hessian can be hard to obtain.
26 Gauss-Newton algorithm

Recall that in linear regression models, we minimize S(β) = Σ_i {y_i − x_i^T β}². Because S(β) is a quadratic function, it is easy to get the MLE β̂ = (Σ_i x_i x_i^T)⁻¹ Σ_i x_i y_i.
Now in nonlinear regression models, we want to minimize S(β) = Σ_i {y_i − h(x_i, β)}².
Idea: approximate h(x_i, β) by a linear function, iteratively at β^(k). Given β^(k), by Taylor expansion of h(x_i, β) at β^(k), S(β) becomes
S(β) ≈ Σ_i { y_i − h(x_i, β^(k)) − (β − β^(k))^T ∂h(x_i, β^(k))/∂β }².
27 Gauss-Newton algorithm (cont.)

1. Find a good starting point β^(0).
2. At step k + 1,
(a) Form e(β^(k)) and J(β^(k)).
(b) Use a standard linear regression routine to obtain δ^(k) = [J(β^(k))^T J(β^(k))]⁻¹ J(β^(k))^T e(β^(k)).
(c) Obtain the new estimate β^(k+1) = β^(k) + δ^(k).
Notes:
- No need to compute the Hessian matrix.
- Needs good starting values.
- Requires J(β^(k))^T J(β^(k)) to be invertible.
- This is not a general optimization method; it is only applicable to least squares problems.
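The steps above can be sketched in R for a concrete model (our own toy example, not from the slides): fitting y = β₁ e^{β₂ x} + ε by Gauss-Newton:

```r
# Gauss-Newton for the nonlinear model y = b1 * exp(b2 * x) + noise
set.seed(3)
x <- seq(0, 2, length.out = 100)
beta_true <- c(2, -1)
y <- beta_true[1] * exp(beta_true[2] * x) + rnorm(100, sd = 0.05)

h   <- function(beta) beta[1] * exp(beta[2] * x)
Jac <- function(beta) cbind(exp(beta[2] * x),                # dh/db1
                            beta[1] * x * exp(beta[2] * x))  # dh/db2

beta <- c(1, -0.5)                        # starting point
for (k in 1:50) {
  e     <- y - h(beta)                    # residuals e(beta)
  J     <- Jac(beta)
  delta <- solve(t(J) %*% J, t(J) %*% e)  # linear least-squares step
  beta  <- beta + as.vector(delta)
  if (max(abs(delta)) < 1e-10) break
}
beta                                      # close to (2, -1)
```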
28 Example: Generalized linear models (GLM)

Data: (y_i, x_i) for i = 1, ..., n.
Notation and assumptions:
- Mean: E(y | x) = µ
- Link g: g(µ) = x^T β
- Variance function V: Var(y | x) = φ V(µ)
- Log-likelihood (exponential family): l(θ, φ; y) = {yθ − b(θ)}/a(φ) + c(y, φ)
We obtain:
- Score function: l' = {y − b'(θ)}/a(φ)
- Observed information: −l'' = b''(θ)/a(φ)
- Mean (in terms of θ): E(y | x) = a(φ) E(l') + b'(θ) = b'(θ)
- Variance (in terms of θ, φ): Var(y | x) = E{y − b'(θ)}² = a(φ)² E(l' l') = −a(φ)² E(l'') = b''(θ) a(φ)
- Canonical link: the g such that g(µ) = θ, i.e., g⁻¹ = b'
Generally we have a(φ) = φ/w, in which case φ drops out of the following.
29 GLM

Model                 Normal     Poisson    Binomial         Gamma
φ                     σ²         1          1/m              1/ν
b(θ)                  θ²/2       exp(θ)     log(1 + e^θ)     −log(−θ)
µ                     θ          exp(θ)     e^θ/(1 + e^θ)    −1/θ
Canonical link g      identity   log        logit            reciprocal
Variance function V   1          µ          µ(1 − µ)         µ²
30 Towards iteratively reweighted least squares

In linear regression models, E(y_i | x_i) = x_i^T β, so we minimize S(β) = Σ_i {y_i − x_i^T β}². Because S(β) is a quadratic function, it is easy to get the MLE β̂ = (Σ_i x_i x_i^T)⁻¹ Σ_i x_i y_i.
In generalized linear models, consider constructing a similar quadratic function S(β).
Question: can we use S(β) = Σ_i {g(y_i) − x_i^T β}²?
Answer: no, because E{g(y_i) | x_i} ≠ x_i^T β.
Idea: approximate g(y_i) by a linear function with expectation x_i^T β^(k), iteratively at β^(k).
31 Iteratively reweighted least squares

Linearize g(y_i) around µ̂_i^(k) = g⁻¹(x_i^T β^(k)), and denote the linearized value by ỹ_i^(k):
ỹ_i^(k) = g(µ̂_i^(k)) + (y_i − µ̂_i^(k)) g'(µ̂_i^(k)).
Check the variances of the ỹ_i^(k) and use them as weights:
W_i^(k) = {Var(ỹ_i^(k))}⁻¹ = [{g'(µ̂_i^(k))}² V(µ̂_i^(k))]⁻¹.
Given β^(k), we consider minimizing S(β) = Σ_i W_i^(k) {ỹ_i^(k) − x_i^T β}².
IRLS algorithm:
1. Start with initial estimates, generally µ̂_i^(0) = y_i.
2. Form ỹ_i^(k) and W_i^(k).
3. Estimate β^(k+1) by regressing ỹ_i^(k) on x_i with weights W_i^(k).
4. Form µ̂_i^(k+1) = g⁻¹(x_i^T β^(k+1)) and return to step 2.
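The four steps above can be sketched directly in R (our own code) for Poisson regression with the log link, where ỹ_i = log µ̂_i + (y_i − µ̂_i)/µ̂_i and W_i = µ̂_i, and checked against glm():

```r
# IRLS for Poisson regression, log link: g(mu) = log(mu), V(mu) = mu
set.seed(9)
n <- 400
x <- rnorm(n)
y <- rpois(n, exp(0.2 + 0.5 * x))

mu <- y + 0.1                            # initial estimates (kept positive)
for (k in 1:25) {
  z   <- log(mu) + (y - mu) / mu         # linearized response y-tilde
  w   <- mu                              # weights 1 / {g'(mu)^2 * V(mu)}
  fit <- lm(z ~ x, weights = w)          # weighted least squares
  mu  <- exp(fitted(fit))                # mu = g^{-1}(x^T beta)
}
coef(fit)                                # matches glm(y ~ x, family = poisson)
```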
32 Iteratively reweighted least squares (continued)

Model         Poisson    Binomial          Gamma
µ = g⁻¹(η)    e^η        e^η/(1 + e^η)     −1/η
g'(µ)         1/µ        1/[µ(1 − µ)]      1/µ²
V(µ)          µ          µ(1 − µ)          µ²

McCullagh and Nelder (1983) justified IRLS by showing that IRLS is equivalent to Fisher scoring. In the case of the canonical link, IRLS is also equivalent to Newton-Raphson. IRLS is attractive because no special optimization algorithm is required, just a subroutine that computes weighted least squares estimates.
33 Miscellaneous things

Dispersion parameter: when we do not take φ = 1, the usual estimate is via the method of moments:
φ̂ = 1/(n − p) Σ_i (y_i − µ̂_i)² / V(µ̂_i).
Standard errors: Var(β̂) = φ̂ (X^T Ŵ X)⁻¹.
Quasi-likelihood: pick a link and a variance function, and IRLS can proceed without worrying about the full model. In other words, IRLS is a good thing!
34 A quick review

- Optimization methods are important in statistics (e.g., to find the MLE) and in machine learning in general (to minimize some loss function).
- Maximizing/minimizing an objective function is achieved by solving the equation that sets the first derivative to 0 (and checking the second derivative).
- Steepest ascent method: only needs the gradient; slow convergence. In large datasets with ill-behaved objective functions, the stochastic version (SGD) usually works better.
35 A quick review (continued)

- Newton-Raphson (NR) method: quadratic convergence rate, but it can get stuck in a local maximum. In higher dimensions, the problems are to find the direction and the step size in each iteration.
- Fisher scoring: uses the expected information matrix, while NR uses the observed information matrix. The expected information is more stable and often simpler. Fisher scoring and Newton-Raphson are equivalent under the canonical link.
- Gauss-Newton algorithm for nonlinear regression: the Hessian matrix is not needed.
More informationCHAPTER 7 CONSTRAINED OPTIMIZATION 2: SQP AND GRG
Chapter 7: Constraned Optmzaton CHAPER 7 CONSRAINED OPIMIZAION : SQP AND GRG Introducton In the prevous chapter we eamned the necessary and suffcent condtons for a constraned optmum. We dd not, however,
More informationSolutions Homework 4 March 5, 2018
1 Solutons Homework 4 March 5, 018 Soluton to Exercse 5.1.8: Let a IR be a translaton and c > 0 be a re-scalng. ˆb1 (cx + a) cx n + a (cx 1 + a) c x n x 1 cˆb 1 (x), whch shows ˆb 1 s locaton nvarant and
More informationMaximum Likelihood Estimation
Maxmum Lkelhood Estmaton INFO-2301: Quanttatve Reasonng 2 Mchael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quanttatve Reasonng 2 Paul and Boyd-Graber Maxmum Lkelhood Estmaton 1 of 9 Why MLE?
More informationPrimer on High-Order Moment Estimators
Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc
More informationProblem Set 9 Solutions
Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More information1 GSW Iterative Techniques for y = Ax
1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn
More informationVECTORS AND MATRICES:
VECTORS AND MATRICES: Matrx: a rectangular array of elements Dmenson: rxc means r rows by b columns Example: A = [a ], =1,2,3; j=1,2 j In general: A = [a ], =1, r; j=1,...c j Multvarate observatons=vector:
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 16
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 218 LECTURE 16 1 why teratve methods f we have a lnear system Ax = b where A s very, very large but s ether sparse or structured (eg, banded, Toepltz, banded plus
More informationLectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix
Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could
More informationGlobal Sensitivity. Tuesday 20 th February, 2018
Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values
More informationModule 3 LOSSY IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module 3 LOSSY IMAGE COMPRESSION SYSTEMS Verson ECE IIT, Kharagpur Lesson 6 Theory of Quantzaton Verson ECE IIT, Kharagpur Instructonal Objectves At the end of ths lesson, the students should be able to:
More informationTHE CHINESE REMAINDER THEOREM. We should thank the Chinese for their wonderful remainder theorem. Glenn Stevens
THE CHINESE REMAINDER THEOREM KEITH CONRAD We should thank the Chnese for ther wonderful remander theorem. Glenn Stevens 1. Introducton The Chnese remander theorem says we can unquely solve any par of
More informationEcon Statistical Properties of the OLS estimator. Sanjaya DeSilva
Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate
More informationMarkov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement
Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs
More informationLogistic Regression Maximum Likelihood Estimation
Harvard-MIT Dvson of Health Scences and Technology HST.951J: Medcal Decson Support, Fall 2005 Instructors: Professor Lucla Ohno-Machado and Professor Staal Vnterbo 6.873/HST.951 Medcal Decson Support Fall
More informationLinear Regression Introduction to Machine Learning. Matt Gormley Lecture 5 September 14, Readings: Bishop, 3.1
School of Computer Scence 10-601 Introducton to Machne Learnng Lnear Regresson Readngs: Bshop, 3.1 Matt Gormle Lecture 5 September 14, 016 1 Homework : Remnders Extenson: due Frda (9/16) at 5:30pm Rectaton
More informationWeek 5: Neural Networks
Week 5: Neural Networks Instructor: Sergey Levne Neural Networks Summary In the prevous lecture, we saw how we can construct neural networks by extendng logstc regresson. Neural networks consst of multple
More informationThe exam is closed book, closed notes except your one-page cheat sheet.
CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces
More informationLecture 4: Universal Hash Functions/Streaming Cont d
CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected
More informationAn Experiment/Some Intuition (Fall 2006): Lecture 18 The EM Algorithm heads coin 1 tails coin 2 Overview Maximum Likelihood Estimation
An Experment/Some Intuton I have three cons n my pocket, 6.864 (Fall 2006): Lecture 18 The EM Algorthm Con 0 has probablty λ of heads; Con 1 has probablty p 1 of heads; Con 2 has probablty p 2 of heads
More informationMaximal Margin Classifier
CS81B/Stat41B: Advanced Topcs n Learnng & Decson Makng Mamal Margn Classfer Lecturer: Mchael Jordan Scrbes: Jana van Greunen Corrected verson - /1/004 1 References/Recommended Readng 1.1 Webstes www.kernel-machnes.org
More informationEcon107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)
I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes
More informationMACHINE APPLIED MACHINE LEARNING LEARNING. Gaussian Mixture Regression
11 MACHINE APPLIED MACHINE LEARNING LEARNING MACHINE LEARNING Gaussan Mture Regresson 22 MACHINE APPLIED MACHINE LEARNING LEARNING Bref summary of last week s lecture 33 MACHINE APPLIED MACHINE LEARNING
More informationWe present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}.
CS 189 Introducton to Machne Learnng Sprng 2018 Note 26 1 Boostng We have seen that n the case of random forests, combnng many mperfect models can produce a snglodel that works very well. Ths s the dea
More informationMaximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models
ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models
More informationRockefeller College University at Albany
Rockefeller College Unverst at Alban PAD 705 Handout: Maxmum Lkelhood Estmaton Orgnal b Davd A. Wse John F. Kenned School of Government, Harvard Unverst Modfcatons b R. Karl Rethemeer Up to ths pont n
More informationINF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018
INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton
More informationECE559VV Project Report
ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate
More informationThe conjugate prior to a Bernoulli is. A) Bernoulli B) Gaussian C) Beta D) none of the above
The conjugate pror to a Bernoull s A) Bernoull B) Gaussan C) Beta D) none of the above The conjugate pror to a Gaussan s A) Bernoull B) Gaussan C) Beta D) none of the above MAP estmates A) argmax θ p(θ
More informationRandomness and Computation
Randomness and Computaton or, Randomzed Algorthms Mary Cryan School of Informatcs Unversty of Ednburgh RC 208/9) Lecture 0 slde Balls n Bns m balls, n bns, and balls thrown unformly at random nto bns usually
More informationMatrix Approximation via Sampling, Subspace Embedding. 1 Solving Linear Systems Using SVD
Matrx Approxmaton va Samplng, Subspace Embeddng Lecturer: Anup Rao Scrbe: Rashth Sharma, Peng Zhang 0/01/016 1 Solvng Lnear Systems Usng SVD Two applcatons of SVD have been covered so far. Today we loo
More informationSTATIC OPTIMIZATION: BASICS
STATIC OPTIMIZATION: BASICS 7A- Lecture Overvew What s optmzaton? What applcatons? How can optmzaton be mplemented? How can optmzaton problems be solved? Why should optmzaton apply n human movement? How
More informationINTRODUCTION TO DOUBLE GENERALIZED LINEAR MODELS. Edilberto Cepeda-Cuervo
INTRODUCTION TO DOUBLE GENERALIZED LINEAR MODELS SINAPE 2016 Edlberto Cepeda-Cuervo Natonal Unversty of Colomba March 2016 c Natonal Unversty of Colomba Faculty of Scences Department of Statstcs c Edlberto
More information