Distributed and Stochastic Machine Learning on Big Data
1 Distributed and Stochastic Machine Learning on Big Data Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong
2 Introduction Synchronous ADMM Asynchronous ADMM Stochastic ADMM Conclusion Big Data: humans and machines are creating tons of data every day
3 Introduction Synchronous ADMM Asynchronous ADMM Stochastic ADMM Conclusion Big Data: humans and machines are creating tons of data every day; some statistics: Facebook collects more than 12 terabytes of data every day; the world created about 1.8 zettabytes of data in 2011; by 2020, 35 zettabytes of data across the globe
4 Hardware Platform single machine
5 Hardware Platform single machine: data sets can be too large to be processed / stored on one single computer
6 Hardware Platform single machine: data sets can be too large to be processed / stored on one single computer; distributed processing: e.g., Google's Sibyl (100B samples and 100B features), based on MapReduce
7 Hardware Platform single machine: data sets can be too large to be processed / stored on one single computer; distributed processing: e.g., Google's Sibyl (100B samples and 100B features), based on MapReduce; machine learning algorithms are often iterative, so they are not very well suited for MapReduce
8 Distributed Architectures 1 shared memory: variables stored in a shared address space, e.g., servers, high-end workstations, multicore processors
9 Distributed Architectures 1 shared memory: variables stored in a shared address space, e.g., servers, high-end workstations, multicore processors; data can be conveniently accessed, but less scalable
10 Distributed Architectures 1 shared memory: variables stored in a shared address space, e.g., servers, high-end workstations, multicore processors; data can be conveniently accessed, but less scalable; 2 distributed memory: each node has its own memory; nodes connected by a high-speed communication network
11 Distributed Architectures 1 shared memory: variables stored in a shared address space, e.g., servers, high-end workstations, multicore processors; data can be conveniently accessed, but less scalable; 2 distributed memory: each node has its own memory; nodes connected by a high-speed communication network; scalable, but need to distribute/collect information to/from the nodes
12 Example ML Scenario supervised learning data set D = {(x_1, y_1), ..., (x_n, y_n)}; linear model y = w^T x; loss for sample i: l_i(w) = (y_i - w^T x_i)^2; minimize the training error: min_w Σ_{i=1}^n l_i(w)
13 Example ML Scenario supervised learning data set D = {(x_1, y_1), ..., (x_n, y_n)}; linear model y = w^T x; loss for sample i: l_i(w) = (y_i - w^T x_i)^2; minimize the training error: min_w Σ_{i=1}^n l_i(w); n is big: how to solve this with multiple machines?
14 Example ML Scenario ... n is big: how to solve this with multiple machines? split the n samples over N machines: D → D_1, D_2, ..., D_N
15 Example ML Scenario ... split the n samples over N machines: D → D_1, D_2, ..., D_N; then Σ_{i=1}^n l_i(w) = Σ_{i∈D_1} l_i(w) + ... + Σ_{i∈D_N} l_i(w) = f_1(w) + ... + f_N(w); minimize one f_i(w) on each machine
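A minimal sketch of this data-parallel split for the squared loss above, assuming NumPy arrays; the helper names (split_data, local_loss) are illustrative, not from the talk:

```python
import numpy as np

def split_data(X, y, num_workers):
    """Split the n samples evenly into local subsets D_1, ..., D_N."""
    parts = np.array_split(np.arange(len(y)), num_workers)
    return [(X[idx], y[idx]) for idx in parts]

def local_loss(w, X_i, y_i):
    """f_i(w): sum of squared losses over worker i's local subset."""
    r = y_i - X_i @ w
    return float(r @ r)

# The global training error is recovered as the sum of the local pieces:
# sum(local_loss(w, Xi, yi) for Xi, yi in split_data(X, y, N))
```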
16 Distributed Consensus Optimization min_x Σ_i f_i(x) → min_{x_1,...,x_N} Σ_i f_i(x_i); x_i: node i's local copy of the x to be learned
17 Distributed Consensus Optimization min_x Σ_i f_i(x) → min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z; x_i: node i's local copy of the x to be learned; z: consensus variable; how to ensure consensus?
18 Alternating Direction Method of Multipliers (ADMM) min_{x,z} φ(x) + ψ(z) (convex objective) s.t. Ax + Bz = c (constraint between x and z); φ, ψ: convex functions; A, B: constant matrices; c: constant vector
19 Alternating Direction Method of Multipliers (ADMM) min_{x,z} φ(x) + ψ(z) s.t. Ax + Bz = c; augmented Lagrangian: L(x, z, λ) = φ(x) + ψ(z) + λ^T(Ax + Bz - c) + (β/2)||Ax + Bz - c||^2; λ: Lagrangian multipliers; β > 0: penalty parameter
20 Alternating Direction Method of Multipliers (ADMM) augmented Lagrangian: L(x, z, λ) = φ(x) + ψ(z) + λ^T(Ax + Bz - c) + (β/2)||Ax + Bz - c||^2; λ: Lagrangian multipliers; β > 0: penalty parameter; minimize L in an alternating manner: x_{t+1} ← arg min_x L(x, z_t, λ_t); z_{t+1} ← arg min_z L(x_{t+1}, z, λ_t); λ_{t+1} ← λ_t + β(Ax_{t+1} + Bz_{t+1} - c)
21 Alternating Direction Method of Multipliers (ADMM) minimize L in an alternating manner: x_{t+1} ← arg min_x L(x, z_t, λ_t); z_{t+1} ← arg min_z L(x_{t+1}, z, λ_t); λ_{t+1} ← λ_t + β(Ax_{t+1} + Bz_{t+1} - c); the updates of x and z are often easier
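A minimal sketch of these three alternating updates, assuming the two subproblem solvers are passed in as functions (argmin_x and argmin_z are placeholder names; nothing here is specific to the talk's implementation):

```python
import numpy as np

def admm(argmin_x, argmin_z, A, B, c, beta, num_iters, x0, z0):
    """Generic ADMM loop: alternately minimize the augmented Lagrangian
    over x and z, then take a dual ascent step on the multipliers lam."""
    x, z = x0, z0
    lam = np.zeros_like(c)
    for _ in range(num_iters):
        x = argmin_x(z, lam, beta)                # x_{t+1} = argmin_x L(x, z_t, lam_t)
        z = argmin_z(x, lam, beta)                # z_{t+1} = argmin_z L(x_{t+1}, z, lam_t)
        lam = lam + beta * (A @ x + B @ z - c)    # lam_{t+1} = lam_t + beta (A x + B z - c)
    return x, z, lam
```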
22 Consensus Optimization using ADMM min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z; augmented Lagrangian: L = Σ_{i=1}^N [ f_i(x_i) + λ_i^T(x_i - z) + (β/2)||x_i - z||^2 ]; minimize L in an alternating manner: x_i^{k+1} ← arg min_{x_i} L(x_i, z^k, λ_i^k), i = 1, ..., N; z^{k+1} ← arg min_z L(x^{k+1}, z, λ^k); λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1}), i = 1, ..., N
23 Consensus Optimization using ADMM ... this gives a distributed consensus optimization: each worker i updates x_i and λ_i; the master updates z
24 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + λ_i^{k T} x + (β/2)||x - z^k||^2 (difference with the consensus); updated in parallel using the local data subset
25 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + λ_i^{k T} x + (β/2)||x - z^k||^2; worker i sends x_i^{k+1} to the master
26 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + λ_i^{k T} x + (β/2)||x - z^k||^2; master: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1} (recompute the consensus)
27 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + λ_i^{k T} x + (β/2)||x - z^k||^2; master: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1}; the master distributes z^{k+1} back to all the workers
28 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + λ_i^{k T} x + (β/2)||x - z^k||^2; master: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1}; worker i: λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1})
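A minimal single-process sketch of one synchronous round, assuming each worker's subproblem solver solve_local is given; in the actual system each worker runs on its own machine and x_i, z are exchanged over the network:

```python
import numpy as np

def sync_consensus_admm_round(workers, z, beta):
    """One synchronous round of consensus ADMM over a list of worker states.
    Each worker dict holds its local subproblem solver and its (x, lam) state."""
    # each worker solves argmin_x f_i(x) + lam_i' x + (beta/2) ||x - z||^2
    # (in the talk these run in parallel on separate machines)
    for w in workers:
        w["x"] = w["solve_local"](w["lam"], z, beta)
    # the master recomputes the consensus from all N worker copies
    z_new = np.mean([w["x"] for w in workers], axis=0)
    # each worker then takes the dual step on its own multipliers
    for w in workers:
        w["lam"] = w["lam"] + beta * (w["x"] - z_new)
    return z_new
```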
29 Problem the updates have to be synchronized: the master needs to wait for the x_i updates from all workers
30 Problem the updates have to be synchronized: the master needs to wait for the x_i updates from all workers; workers have different delays (processing speeds, network delays, etc.), so the master has to wait for the slowest worker
31 Distributed Asynchronous ADMM: Worker x update: x_i^{k+1} ← arg min_x f_i(x) + λ_i^{k T} x + (β/2)||x - z^k||^2
32 Distributed Asynchronous ADMM: Worker x update: x_i^{k_i+1} ← arg min_x f_i(x) + λ_i^{k_i T} x + (β/2)||x - z^{k_i}||^2; the master and each worker keep independent clocks
33 Distributed Asynchronous ADMM: Worker x update: x_i^{k_i+1} ← arg min_x f_i(x) + λ_i^{k_i T} x + (β/2)||x - z_i||^2; z_i: most recent z value received from the master; in general, the z_i's are different (workers have different speeds)
34 Distributed Asynchronous ADMM: Worker x update: x_i^{k_i+1} ← arg min_x f_i(x) + λ_i^{k_i T} x + (β/2)||x - z_i||^2; λ update (synchronous version): λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1})
35 Distributed Asynchronous ADMM: Worker x update: x_i^{k_i+1} ← arg min_x f_i(x) + λ_i^{k_i T} x + (β/2)||x - z_i||^2; λ update: λ_i^{k_i+1} ← λ_i^{k_i} + β(x_i^{k_i+1} - z_i)
36 Distributed Asynchronous ADMM: Master in synchronous ADMM, the master needs to wait for all N worker updates: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1}
38 Distributed Asynchronous ADMM: Master the master only needs to wait for a minimum of S worker updates (synchronous ADMM: S = N); z^{k+1} ← (1/N) Σ_{i=1}^N ( x_i^k + (1/β) λ_i^k )
39 Distributed Asynchronous ADMM: Master the master only needs to wait for a minimum of S worker updates (synchronous ADMM: S = N); S can be much smaller than N (partial barrier); z^{k+1} ← (1/N) Σ_{i=1}^N ( x̂_i + (1/β) λ̂_i ); (x̂_i, λ̂_i): most recent (x_i, λ_i) received from worker i; the update is still based on all of the {(x̂_i, λ̂_i)}_{i=1}^N (some may not be very fresh)
40 Distributed Asynchronous ADMM: Master ... the master sends the updated z^{k+1} back to (only) those workers whose (x_i, λ_i) have just been processed
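A minimal sketch of the master's partial-barrier step, assuming a blocking receive() that returns one worker's (i, x_i, λ_i) message and a send_z() reply; the communication layer and function names are assumptions for illustration:

```python
def async_master_step(x_hat, lam_hat, receive, send_z, S, beta):
    """Partial-barrier master step: wait for at least S worker messages,
    refresh the cached (x_hat_i, lam_hat_i), recompute z from ALL cached
    values (fresh or not), and reply only to the workers just processed."""
    arrived = set()
    while len(arrived) < S:                 # partial barrier: S can be << N
        i, x_i, lam_i = receive()           # blocking receive of one worker update
        x_hat[i], lam_hat[i] = x_i, lam_i   # keep only the most recent value per worker
        arrived.add(i)
    N = len(x_hat)
    z = sum(x_hat[i] + lam_hat[i] / beta for i in range(N)) / N
    for i in arrived:                       # z goes back only to the processed workers
        send_z(i, z)
    return z
```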
41 Slow Workers some (x̂_i, λ̂_i)'s may not be very fresh: faster workers → more updates; slow workers → fewer updates, which can be very outdated
42 Slow Workers some (x̂_i, λ̂_i)'s may not be very fresh: faster workers → more updates; slow workers → fewer updates, which can be very outdated; how to ensure sufficient freshness of all updates?
44 Bounded Delay every worker has to be updated at least once every τ iterations; the (x_i, λ_i) update can be at most τ clock cycles old; τ = 1 reduces back to synchronous ADMM
45 Example (S = 2, τ = 3) delay counter: how old the worker's update is
46 Example (S = 2, τ = 3) master iteration 1: 2 worker updates arrive
47 Example (S = 2, τ = 3) master iteration 2: 4 worker updates arrive
48 Example (S = 2, τ = 3) master iteration 3: 3 worker updates arrive
49 Example (S = 2, τ = 3) the master waits... until the update from worker 3 arrives
50 Example (S = 2, τ = 3) the master waits... until the update from worker 3 arrives; the partial barrier S speeds up the algorithm; the bounded delay τ guarantees convergence to the globally optimal solution
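One way the bounded delay can be enforced, sketched below under the assumption that the master keeps a per-worker delay counter; the bookkeeping details are illustrative, not taken from the talk:

```python
def can_proceed(delay, arrived, S, tau):
    """Partial barrier + bounded delay: the master proceeds only when at least
    S updates have arrived AND no silent worker's cached update would become
    older than tau master clock cycles."""
    enough = len(arrived) >= S
    fresh = all(delay[i] + 1 <= tau for i in delay if i not in arrived)
    return enough and fresh

def age_counters(delay, arrived):
    """After the master iterates: reset counters of workers just heard from,
    and age every other worker's counter by one clock cycle."""
    for i in delay:
        delay[i] = 0 if i in arrived else delay[i] + 1
```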
51 Convergence min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z; Theorem: after T master iterations, E[ Σ_i f_i(x̄_i) - Σ_i f_i(x*) (difference with the optimal objective) + λ*^T Σ_i (x̄_i - z̄) (difference with the consensus variable) ] ≤ (Nτ / (T S)) { β ||z^0 - z*||^2 + (1/β) Σ_i ||λ_i^0 - λ*||^2 }; x̄_i: average x_i for node i during the iterations; z̄: average z
52 Convergence ... Theorem: after T master iterations, E[ Σ_i f_i(x̄_i) - Σ_i f_i(x*) + λ*^T Σ_i (x̄_i - z̄) ] ≤ (Nτ / (T S)) { β ||z^0 - z*||^2 + (1/β) Σ_i ||λ_i^0 - λ*||^2 }; if the workers and the network are fast, updates from all workers can arrive in each iteration → recover the O(1/T) convergence rate of standard ADMM
53 Experiments: Structured Sparse Models real-world data are often high-dimensional: text, bioinformatics, hyperspectral images
54 Experiments: Structured Sparse Models real-world data are often high-dimensional: text, bioinformatics, hyperspectral images; feature selection via regularized risk minimization: minimize loss l(w) + sparsity-inducing regularizer Ω(w)
55 Experiments: Structured Sparse Models feature selection via regularized risk minimization: minimize loss l(w) + sparsity-inducing regularizer Ω(w); lasso: Ω(w) = ||w||_1 = Σ_i |w_i|
56 Structured Sparsity features often have intrinsic structures → structured sparsity
57 Structured Sparsity features often have intrinsic structures → structured sparsity; group lasso: e.g., a categorical feature is represented by a group of binary features ((0,0,1) for Chinese; (0,1,0) French; (1,0,0) German); graph-guided fused lasso: groups can overlap; Ω(x) = Σ_{(i,j)∈E} w_{ij} |x_i - x_j| (encourages x_i ≈ x_j for (i,j) ∈ E)
58 Experiment: Graph-Guided Fused Lasso min_x (1/L) Σ_{i=1}^L l_i(x) (logistic loss) + λ Σ_{(i,j)∈E} w_{ij} |x_i - x_j|; digits 4 and 9 from the MNIST data set; 1.6 million 784-dimensional samples, divided uniformly among the 64 workers
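A minimal sketch of evaluating the graph-guided fused lasso regularizer for a given edge set; the toy graph and unit weights are illustrative only:

```python
import numpy as np

def graph_guided_fused_lasso(x, edges, weights):
    """Omega(x) = sum over edges (i, j) of w_ij * |x_i - x_j|."""
    return sum(w * abs(x[i] - x[j]) for (i, j), w in zip(edges, weights))

# toy example: a 4-variable chain graph with unit edge weights
x = np.array([0.0, 0.1, 0.1, 1.0])
edges = [(0, 1), (1, 2), (2, 3)]
print(graph_guided_fused_lasso(x, edges, [1.0, 1.0, 1.0]))  # 0.1 + 0.0 + 0.9 = 1.0
```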
59 Cluster Setup 18 computing nodes interconnected with gigabit Ethernet; each node has 4 AMD Opteron 2216 (2.4GHz) processors and 16GB memory; one core for the master and for each worker process; inter-processor communication: MPI
60 Convergence of the Objective with Time async-ADMM is faster than sync-ADMM (S = 64 or τ = 1)
61 Reduces Network Waiting Time [bar chart: total time (in seconds), split into computational time and network waiting time, for (S, τ) combinations (64,1), (2,8), (2,16), (4,16), (4,32)] different (S, τ) combinations have similar computation time; a smaller S and/or larger τ allows a higher degree of asynchrony → less time spent on network waiting
62 More Workers [bar chart: total time (in seconds), computational time and network waiting time of sync-ADMM and async-ADMM vs. number of workers] async-ADMM is again faster than sync-ADMM; with more workers, less time is needed to wait for at least S worker updates → significantly less network waiting time
63 Low-Rank Matrix Factorization ADMM can also be efficiently used on nonconvex problems; low-rank matrix factorization (e.g., in collaborative filtering): min_{L,R} ||M - LR||_F^2 (difference with the original matrix) + λ_1 ||L||_F^2 + λ_2 ||R||_F^2 (standard regularizer)
64 Low-Rank Matrix Factorization ... m = 10000, n = , and rank = 100; partition M evenly across columns and then assign the pieces to the N = 64 workers
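A minimal sketch of evaluating the factorization objective; the toy sizes are much smaller than the experiment's:

```python
import numpy as np

def mf_objective(M, L, R, lam1, lam2):
    """||M - L R||_F^2 + lam1 ||L||_F^2 + lam2 ||R||_F^2."""
    return (np.linalg.norm(M - L @ R, "fro") ** 2
            + lam1 * np.linalg.norm(L, "fro") ** 2
            + lam2 * np.linalg.norm(R, "fro") ** 2)

# toy example, far smaller than the experiment: a 50 x 40 matrix, rank 5
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 40))
L, R = rng.standard_normal((50, 5)), rng.standard_normal((5, 40))
print(mf_objective(M, L, R, lam1=0.1, lam2=0.1))
```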
65 Convergence of Objective with Time [plot: objective value (×10^5) vs. time (in seconds), sync-ADMM and async-ADMM] async-ADMM converges faster than sync-ADMM
66 Reduces Network Waiting Time [bar chart: total time (in seconds), split into computational time and network waiting time, for (S, τ) combinations (64,1) and (2,32)]
67 Problems with Version 1 in cloud computing environments, computing nodes may be dynamically added or removed, and may also fail; MPI has no mechanism to handle faults
68 Version 2: Parameter Server the parameter is divided into shards and stored on multiple masters; each master stores a replica of its neighbors' keys; a master manager / worker manager handles addition, removal, and failure of masters and workers
69 Version 2: Parameter Server the parameter is divided into shards and stored on multiple masters; each master stores a replica of its neighbors' keys; a master manager / worker manager handles addition, removal, and failure of masters and workers → fault tolerance
70 Version 2: Parameter Server ... fault tolerance; spreads the master-worker communication → more scalable
71 Preliminary Results Google Cloud cluster with 32 nodes, each with 2 cores; RCV1 data set; l_1-regularized logistic regression; [bar chart: total time (in seconds), computational time and network waiting time, for sync-ADMM and async-ADMM with 1 master and with 4 masters] the asynchronous algorithm spends less time on network waiting; more masters → further speedup
72 From BIG Data to big data
73 From BIG Data to big data learning subproblem on each worker: min_w (1/n) Σ_{i=1}^n l_i(w) (empirical loss) + Ω(w) (regularizer); l_i(w): sample i's contribution to the loss
74 From BIG Data to big data learning subproblem on each worker: min_w (1/n) Σ_{i=1}^n l_i(w) + Ω(w); it may have a closed-form solution, or may have to be solved iteratively (e.g., gradient descent, ADMM, ...)
75 From BIG Data to big data learning subproblem on each worker: min_w (1/n) Σ_{i=1}^n l_i(w) + Ω(w); the original problem is BIG, and the subproblems can still be big
76 ADMM on Each Worker min_{w,y} (1/n) Σ_{i=1}^n l_i(w) + Ω(y) s.t. Aw + By = c
77 ADMM on Each Worker min_{w,y} (1/n) Σ_{i=1}^n l_i(w) + Ω(y) s.t. Aw + By = c; ADMM's w update: w_{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By_t - c + α_t||^2
78 ADMM on Each Worker ... each iteration needs to visit all the samples (batch learning) → computationally expensive for big data
79 From Batch to Stochastic popularly used in gradient descent: replace the gradient over the whole data set, (1/n) Σ_{i=1}^n ∇l_i(w), by the gradient ∇l_{k(t)}(w) at a single sample k(t) ∈ {1, 2, ..., n}
80 From Batch to Stochastic ... or by the gradient over a small mini-batch S of samples, (1/m) Σ_{i∈S} ∇l_i(w) (where |S| = m ≪ n)
81 From Batch to Stochastic ... the per-iteration complexity is much lower (O(1) instead of O(n)) → can scale to much larger data sets
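A minimal sketch contrasting the three gradient estimators for the squared loss used earlier; the mini-batch size and sampling scheme are illustrative assumptions:

```python
import numpy as np

def full_gradient(w, X, y):
    """(1/n) sum_i grad l_i(w) for the squared loss l_i(w) = (y_i - w'x_i)^2."""
    return -2.0 * X.T @ (y - X @ w) / len(y)

def stochastic_gradient(w, X, y, rng):
    """Gradient at one uniformly sampled index k(t)."""
    k = rng.integers(len(y))
    return -2.0 * (y[k] - X[k] @ w) * X[k]

def minibatch_gradient(w, X, y, rng, m=32):
    """Gradient averaged over a random mini-batch S with |S| = m << n."""
    S = rng.choice(len(y), size=m, replace=False)
    return -2.0 * X[S].T @ (y[S] - X[S] @ w) / m
```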
82 Stochastic ADMM batch update: w_{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By_t - c + α_t||^2
83 Stochastic ADMM batch update: w_{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By_t - c + α_t||^2; stochastic update (learns from only one sample): w_{t+1} ← arg min_w l_{k(t)}(w) + (β/2)||Aw + By_t - c + α_t||^2 + R(w)
84 Stochastic ADMM ... linearize the loss: w_{t+1} ← arg min_w ∇l_{k(t)}(w_t)^T (w - w_t) + (β/2)||Aw + By_t - c + α_t||^2 + R(w)
85 Stochastic ADMM ... slower convergence (T: number of iterations): batch ADMM has convergence rate O(1/T) and iteration cost O(n); stochastic ADMM has convergence rate O(1/√T) and iteration cost O(1)
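A minimal sketch of the linearized stochastic w-update, under the assumption that R(w) is a proximal term ||w - w_t||^2 / (2η) so the subproblem has a closed form; this choice of R and the helper name are illustrative, not taken from the talk:

```python
import numpy as np

def stochastic_admm_w_update(w_t, grad_k, A, B, y_t, c, alpha_t, beta, eta):
    """Linearized stochastic w-step with R(w) = ||w - w_t||^2 / (2*eta).
    Setting the subproblem's gradient to zero gives the linear system
    (beta A'A + I/eta) w = w_t/eta - grad_k - beta A'(B y_t - c + alpha_t)."""
    d = len(w_t)
    lhs = beta * A.T @ A + np.eye(d) / eta
    rhs = w_t / eta - grad_k - beta * A.T @ (B @ y_t - c + alpha_t)
    return np.linalg.solve(lhs, rhs)
```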
86 Stochastic ADMM... w_{t+1} ← arg min_w ∇l_{k(t)}(w_t)^T (w - w_t) + (β/2)||Aw + By_t - c + α_t||^2 + R(w); not using the old gradients seems wasteful
88 Stochastic Average ADMM save the gradients ∇l_1(w_0), ..., ∇l_n(w_0); at iteration t: randomly choose a sample k(t) ∈ {1, ..., n} and update the stored gradient for k(t) only, to ∇l_{k(t)}(w_t)
89 Stochastic Average ADMM save the gradients ∇l_1(w_0), ..., ∇l_n(w_0); at iteration t: randomly choose a sample k(t) ∈ {1, ..., n} and update the stored gradient for k(t) only; stochastic update for comparison: min_w ∇l_{k(t)}(w_t)^T (w - w_t) + (β/2)||Aw + By_t - c + α_t||^2 + R(w)
90 Stochastic Average ADMM ... stochastic average update: min_w (1/n) Σ_{i=1}^n g_i^T (w - w_t) + (β/2)||Aw + By_t - c + α_t||^2 + R(w), where g_i is the stored gradient of l_i (cf. the batch version min_w (1/n) Σ_{i=1}^n ∇l_i(w_t)^T (w - w_t) + (β/2)||Aw + By_t - c + α_t||^2 + R(w)); out of the n stored gradients, only one (the one corresponding to sample k(t)) is based on the current w_t; the others are previously-stored gradient values
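A minimal sketch of the recycled-gradient bookkeeping, assuming the stored gradients live in an n × d table and reusing a hypothetical solve_w helper such as the w-step sketched above; only the freshly sampled row is recomputed each iteration:

```python
import numpy as np

def stochastic_average_admm_step(w_t, grad_table, grad_fn, rng, solve_w):
    """grad_table is an (n x d) array holding the last computed gradient of each
    l_i; refresh only the sampled row, then use the table's average (mostly old
    gradients) in place of the single-sample gradient in the w-update."""
    k = rng.integers(grad_table.shape[0])   # pick sample k(t) uniformly
    grad_table[k] = grad_fn(k, w_t)         # only this entry uses the current w_t
    avg_grad = grad_table.mean(axis=0)      # the rest are previously stored values
    w_next = solve_w(w_t, avg_grad)         # e.g., the linearized w-step above
    return w_next, grad_table
```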
91 Convergence Theorem: for min_{x,y} Φ(x, y) ≡ φ(x) + ψ(y) s.t. Ax + By = c, after T iterations E[ Φ(x̄_T, ȳ_T) - Φ(x*, y*) (difference with the optimal objective) + γ ||A x̄_T + B ȳ_T - c|| (constraint violation) ] ≤ (1/(2T)) { nL ||x* - x_0||^2 + ||y* - y_0||_H^2 + 2β ( γ/β + ||α_0|| )^2 }; x̄_T: average of the x_t's; ȳ_T: average of the y_t's
92 Convergence Theorem ... the rates: batch ADMM has convergence rate O(1/T) and iteration cost O(n); stochastic ADMM has convergence rate O(1/√T) and iteration cost O(1); stochastic average ADMM has convergence rate O(1/T) and iteration cost O(1)
93 Convergence ... caveat: need to store the n gradient values → extra O(n) space
94 Experiment: Graph-Guided Fused Lasso forest cover type: 581,012 samples; drug: 12,678; objective value vs. time; speed: stochastic average > stochastic > batch
95 Experiment: Graph-Guided Fused Lasso... objective value vs. number of passes over the data; batch: one effective pass = one iteration; stochastic: one effective pass = n iterations; speed: stochastic average > stochastic > batch
96 Conclusion BIG data sets → distributed processing: asynchronous ADMM, with a partial barrier and bounded delay to control the asynchrony
97 Conclusion BIG data sets → distributed processing: asynchronous ADMM, with a partial barrier and bounded delay to control the asynchrony; big data sets (processing on each worker): batch → (better) stochastic → recycling old gradients
98 Conclusion ... simple; has convergence guarantees; fast (R. Zhang, J.T. Kwok. ICML-2014) (L.W. Zhong, J.T. Kwok. ICML-2014)
99 Shameless Plug I am looking for research students, research assistants / postdocs / programmers on a ML/DM project related to financial data. If interested, please email me (jamesk@cse.ust.hk). Thank You