Distributed and Stochastic Machine Learning on Big Data

1 Distributed and Stochastic Machine Learning on Big Data Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong

2 Introduction Synchronous ADMM Asynchronous ADMM Stochastic ADMM Conclusion Big Data humans and machines are creating tons of data every day

3 Introduction Synchronous ADMM Asynchronous ADMM Stochastic ADMM Conclusion Big Data humans and machines are creating tons of data every day some statistics Facebook: more than 12 terabytes of data every day the world created about 1.8 zettabytes of data in 2011 by 2020, 35 zettabytes of data across the globe

4 Hardware Platform single machine

5 Hardware Platform single machine data sets can be too large to be processed / stored on one single computer

6 Hardware Platform single machine data sets can be too large to be processed / stored on one single computer distributed processing: e.g., Google's Sibyl 100B samples and 100B features based on MapReduce

7 Hardware Platform single machine data sets can be too large to be processed / stored on one single computer distributed processing: e.g., Google's Sibyl 100B samples and 100B features based on MapReduce machine learning algorithms are often iterative, not very well suited for MapReduce

8 Distributed Architectures 1 shared memory: variables stored in a shared address space e.g., servers, high-end workstations, multicore processors

9 Distributed Architectures 1 shared memory: variables stored in a shared address space e.g., servers, high-end workstations, multicore processors data can be conveniently accessed; less scalable

10 Distributed Architectures 1 shared memory: variables stored in a shared address space e.g., servers, high-end workstations, multicore processors data can be conveniently accessed; less scalable 2 distributed memory: each node has its own memory, nodes connected by a high-speed communication network

11 Distributed Architectures 1 shared memory: variables stored in a shared address space e.g., servers, high-end workstations, multicore processors data can be conveniently accessed; less scalable 2 distributed memory: each node has its own memory, nodes connected by a high-speed communication network scalable; need to distribute/collect information to/from the nodes

12 Example ML Scenario supervised learning data set D = {(x_1, y_1), ..., (x_n, y_n)} linear model y = w^T x loss for sample i: l_i(w) = (y_i - w^T x_i)^2 minimize training error: min_w Σ_{i=1}^n l_i(w)

13 Example ML Scenario supervised learning data set D = {(x_1, y_1), ..., (x_n, y_n)} linear model y = w^T x loss for sample i: l_i(w) = (y_i - w^T x_i)^2 minimize training error: min_w Σ_{i=1}^n l_i(w) n is big, how to solve this with multiple machines?

14 Example ML Scenario supervised learning data set D = {(x_1, y_1), ..., (x_n, y_n)} linear model y = w^T x loss for sample i: l_i(w) = (y_i - w^T x_i)^2 minimize training error: min_w Σ_{i=1}^n l_i(w) n is big, how to solve this with multiple machines? split the n samples over N machines: D → D_1, D_2, ..., D_N

15 Example ML Scenario supervised learning data set D = {(x_1, y_1), ..., (x_n, y_n)} linear model y = w^T x loss for sample i: l_i(w) = (y_i - w^T x_i)^2 minimize training error: min_w Σ_{i=1}^n l_i(w) n is big, how to solve this with multiple machines? split the n samples over N machines: D → D_1, D_2, ..., D_N Σ_{i=1}^n l_i(w) = Σ_{i∈D_1} l_i(w) [= f_1(w)] + ... + Σ_{i∈D_N} l_i(w) [= f_N(w)] minimize one f_i on each machine (a minimal sketch follows below)
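
As a concrete illustration of the split above, here is a minimal NumPy sketch (the array names, sizes and the number of machines are assumptions for illustration, not from the slides) of partitioning the n samples and evaluating the per-machine losses f_i(w):

import numpy as np

def split_data(X, y, N):
    """Split (X, y) into N roughly equal subsets D_1, ..., D_N, one per machine."""
    parts = np.array_split(np.arange(len(y)), N)
    return [(X[idx], y[idx]) for idx in parts]

def f_i(w, X_i, y_i):
    """Per-machine loss: sum of squared errors (y - w^T x)^2 over the local subset."""
    return np.sum((y_i - X_i @ w) ** 2)

# toy usage with assumed sizes
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 20)), rng.normal(size=1000)
subsets = split_data(X, y, N=4)
w = np.zeros(20)
total_loss = sum(f_i(w, Xi, yi) for Xi, yi in subsets)   # equals the full training error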

16 Distributed Consensus Optimization min_x Σ_i f_i(x) → min_{x_1,...,x_N} Σ_i f_i(x_i) x_i: node i's local copy of x to be learned

17 Distributed Consensus Optimization min_x Σ_i f_i(x) → min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z x_i: node i's local copy of x to be learned z: consensus variable how to ensure a consensus?

18 Alternating Direction Method of Multipliers (ADMM) min_{x,z} φ(x) + ψ(z) (both convex) s.t. Ax + Bz = c (constraint linking x and z) φ, ψ: convex functions A, B: constant matrices; c: constant vector

19 Alternating Direction Method of Multipliers (ADMM) augmented Lagrangian min_{x,z} φ(x) + ψ(z) s.t. Ax + Bz = c L(x, z, λ) = φ(x) + ψ(z) + λ^T(Ax + Bz - c) + (β/2)||Ax + Bz - c||^2 λ: Lagrangian multipliers; β > 0: penalty parameter

20 Alternating Direction Method of Multipliers (ADMM) augmented Lagrangian min_{x,z} φ(x) + ψ(z) s.t. Ax + Bz = c L(x, z, λ) = φ(x) + ψ(z) + λ^T(Ax + Bz - c) + (β/2)||Ax + Bz - c||^2 λ: Lagrangian multipliers; β > 0: penalty parameter minimize L in an alternating manner x^{t+1} ← arg min_x L(x, z^t, λ^t) z^{t+1} ← arg min_z L(x^{t+1}, z, λ^t) λ^{t+1} ← λ^t + β(Ax^{t+1} + Bz^{t+1} - c)

21 Alternating Direction Method of Multipliers (ADMM) augmented Lagrangian min_{x,z} φ(x) + ψ(z) s.t. Ax + Bz = c L(x, z, λ) = φ(x) + ψ(z) + λ^T(Ax + Bz - c) + (β/2)||Ax + Bz - c||^2 λ: Lagrangian multipliers; β > 0: penalty parameter minimize L in an alternating manner x^{t+1} ← arg min_x L(x, z^t, λ^t) z^{t+1} ← arg min_z L(x^{t+1}, z, λ^t) λ^{t+1} ← λ^t + β(Ax^{t+1} + Bz^{t+1} - c) updates of x and z are often easier
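
To make the alternating updates concrete, a minimal sketch of the generic ADMM loop is given below; the subproblem solvers argmin_x and argmin_z are user-supplied placeholders (assumptions), while the dual step follows the λ-update on the slide:

import numpy as np

def admm(argmin_x, argmin_z, A, B, c, beta, x0, z0, iters=100):
    """Generic ADMM: alternate the x-, z-, and lambda-updates."""
    x, z = x0, z0
    lam = np.zeros_like(c)
    for _ in range(iters):
        x = argmin_x(z, lam)                    # x^{t+1} = argmin_x L(x, z^t, lam^t)
        z = argmin_z(x, lam)                    # z^{t+1} = argmin_z L(x^{t+1}, z, lam^t)
        lam = lam + beta * (A @ x + B @ z - c)  # lam^{t+1} = lam^t + beta (A x + B z - c)
    return x, z, lam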

22 Consensus Optimization using ADMM min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z augmented Lagrangian L = Σ_{i=1}^N [ f_i(x_i) + λ_i^T(x_i - z) + (β/2)||x_i - z||^2 ] minimize L in an alternating manner x_i^{k+1} ← arg min_{x_i} L(x, z^k, λ^k), i = 1, ..., N z^{k+1} ← arg min_z L(x^{k+1}, z, λ^k) λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1}), i = 1, ..., N

23 Consensus Optimization using ADMM min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z augmented Lagrangian L = Σ_{i=1}^N [ f_i(x_i) + λ_i^T(x_i - z) + (β/2)||x_i - z||^2 ] minimize L in an alternating manner x_i^{k+1} ← arg min_{x_i} L(x, z^k, λ^k), i = 1, ..., N z^{k+1} ← arg min_z L(x^{k+1}, z, λ^k) λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1}), i = 1, ..., N distributed consensus optimization: each worker i updates x_i, λ_i; master updates z

24 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 (last term: difference with the consensus) update in parallel using the local data subset

25 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 sends x_i^{k+1} to the master

26 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 master: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1} recompute the consensus

27 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 master: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1} distributes z^{k+1} back to all the workers

28 Distributed Implementation worker i: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 master: z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1} worker i: λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1})
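
Putting the four steps together, a minimal single-process simulation of synchronous consensus ADMM is sketched below, assuming squared losses so the worker subproblem has a closed form; the data layout and parameter values are illustrative assumptions:

import numpy as np

def consensus_admm(parts, d, beta=1.0, iters=50):
    """Synchronous consensus ADMM for least squares split over N workers."""
    N = len(parts)
    x = [np.zeros(d) for _ in range(N)]
    lam = [np.zeros(d) for _ in range(N)]
    z = np.zeros(d)
    for _ in range(iters):
        # worker i: x_i <- argmin f_i(x) + lam_i^T x + beta/2 ||x - z||^2 (closed form here)
        for i, (Xi, yi) in enumerate(parts):
            H = 2 * Xi.T @ Xi + beta * np.eye(d)
            x[i] = np.linalg.solve(H, 2 * Xi.T @ yi - lam[i] + beta * z)
        # master: z <- average of the x_i, then broadcast
        z = np.mean(x, axis=0)
        # worker i: dual update
        for i in range(N):
            lam[i] = lam[i] + beta * (x[i] - z)
    return z

# toy usage with assumed data
rng = np.random.default_rng(0)
parts = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(4)]
z = consensus_admm(parts, d=5)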

29 Problem updates have to be synchronized master needs to wait for the x_i updates from all workers

30 Problem updates have to be synchronized master needs to wait for the x_i updates from all workers workers have different delays (processing speeds, network delays, etc.) has to wait for the slowest worker

31 Distributed Asynchronous ADMM: Worker x update: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2

32 Distributed Asynchronous ADMM: Worker x update: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 → x_i^{k_i+1} ← arg min_x f_i(x) + (λ_i^{k_i})^T x + (β/2)||x - z^{k_i}||^2 master and each worker keep independent clocks

33 Distributed Asynchronous ADMM: Worker x update: x_i^{k+1} ← arg min_x f_i(x) + (λ_i^k)^T x + (β/2)||x - z^k||^2 → x_i^{k_i+1} ← arg min_x f_i(x) + (λ_i^{k_i})^T x + (β/2)||x - z^{k_i}||^2 → x_i^{k_i+1} = arg min_x f_i(x) + (λ_i^{k_i})^T x + (β/2)||x - z_i||^2 z_i: most recent z value received from the master in general, the z_i's are different (workers have different speeds)

34 Distributed Asynchronous ADMM: Worker x update: x_i^{k_i+1} ← arg min_x f_i(x) + (λ_i^{k_i})^T x + (β/2)||x - z_i||^2 λ update: λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1})

35 Distributed Asynchronous ADMM: Worker x update: x_i^{k_i+1} ← arg min_x f_i(x) + (λ_i^{k_i})^T x + (β/2)||x - z_i||^2 λ update: λ_i^{k+1} ← λ_i^k + β(x_i^{k+1} - z^{k+1}) → λ_i^{k_i+1} ← λ_i^{k_i} + β(x_i^{k_i+1} - z_i)
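
A minimal sketch of one worker's loop in the asynchronous variant is given below; recv_z and send_to_master stand in for whatever messaging layer is used (e.g. MPI) and, together with the inner solver solve_x, are assumptions made for illustration:

def async_worker(solve_x, beta, x, lam, recv_z, send_to_master):
    """One asynchronous worker: use the most recent z received from the master,
    keep its own iteration counter, then push (x_i, lam_i) back to the master."""
    while True:
        z = recv_z()                   # most recent z value from the master
        if z is None:                  # assumed stop signal
            break
        x = solve_x(z, lam)            # x_i <- argmin f_i(x) + lam_i^T x + beta/2 ||x - z_i||^2
        lam = lam + beta * (x - z)     # lam_i <- lam_i + beta (x_i - z_i)
        send_to_master(x, lam)
    return x, lam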

36 Distributed Asynchronous ADMM: Master master needs to wait for all N worker updates z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1}

37 Distributed Asynchronous ADMM: Master master needs to wait for all N worker updates z^{k+1} ← (1/N) Σ_{i=1}^N x_i^{k+1}

38 Distributed Asynchronous ADMM: Master master only needs to wait for a minimum of S worker updates synchronous ADMM: S = N z^{k+1} ← (1/N) Σ_{i=1}^N (x_i^{k+1} + λ_i^k/β)

39 Distributed Asynchronous ADMM: Master master only needs to wait for a minimum of S worker updates synchronous ADMM: S = N z^{k+1} ← (1/N) Σ_{i=1}^N (x_i^{k+1} + λ_i^k/β) S can be much smaller than N (partial barrier) z^{k+1} ← (1/N) Σ_{i=1}^N (x̂_i + λ̂_i/β) (x̂_i, λ̂_i): most recent (x_i, λ_i) received from worker i update is still based on all the {(x̂_i, λ̂_i)}_{i=1}^N (some may not be very fresh)

40 Distributed Asynchronous ADMM: Master master only needs to wait for a minimum of S worker updates synchronous ADMM: S = N z^{k+1} ← (1/N) Σ_{i=1}^N (x_i^{k+1} + λ_i^k/β) S can be much smaller than N (partial barrier) z^{k+1} ← (1/N) Σ_{i=1}^N (x̂_i + λ̂_i/β) (x̂_i, λ̂_i): most recent (x_i, λ_i) received from worker i update is still based on all the {(x̂_i, λ̂_i)}_{i=1}^N (some may not be very fresh) master sends the updated z^{k+1} back to (only) those workers whose (x_i, λ_i) have been processed

41 Slow Workers some (x̂_i, λ̂_i)'s may not be very fresh faster workers → more updates slow workers → fewer updates, can be very outdated

42 Slow Workers some (x̂_i, λ̂_i)'s may not be very fresh faster workers → more updates slow workers → fewer updates, can be very outdated how to ensure sufficient freshness of all updates?

43 Slow Workers some (x̂_i, λ̂_i)'s may not be very fresh faster workers → more updates slow workers → fewer updates, can be very outdated how to ensure sufficient freshness of all updates?

44 Bounded Delay every worker has to be updated at least once every τ iterations (x_i, λ_i) update can be at most τ clock cycles old τ = 1 reduces back to synchronous ADMM
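
A minimal sketch of the master side, combining the partial barrier S with the bounded delay τ, is given below; the messaging helpers recv_updates and send_z, the termination rule, and the exact bookkeeping of the delay counters are assumptions made for illustration:

import numpy as np

def async_master(N, d, beta, S, tau, recv_updates, send_z, max_iter=1000):
    """Asynchronous master: wait for at least S worker updates per iteration, and
    never let any worker's stored (x_i, lam_i) become more than tau iterations old."""
    x_hat = [np.zeros(d) for _ in range(N)]
    lam_hat = [np.zeros(d) for _ in range(N)]
    age = [0] * N                                  # delay counter per worker
    z = np.zeros(d)
    for k in range(max_iter):
        arrived = set()
        # partial barrier: at least S fresh updates, plus any worker about to exceed tau
        while len(arrived) < S or any(age[i] >= tau for i in range(N) if i not in arrived):
            for i, xi, li in recv_updates():       # blocking receive of worker messages
                x_hat[i], lam_hat[i] = xi, li
                arrived.add(i)
        # z-update uses all stored (x_hat_i, lam_hat_i), fresh or not
        z = np.mean([x_hat[i] + lam_hat[i] / beta for i in range(N)], axis=0)
        for i in range(N):
            age[i] = 0 if i in arrived else age[i] + 1
        send_z(z, arrived)                         # send z back only to the processed workers
    return z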

45 Example (S = 2, τ = 3) delay counter: how old the worker update is

46 Example (S = 2, τ = 3) master iteration 1: 2 worker updates arrive

47 Example (S = 2, τ = 3) master iteration 2: 4 worker updates arrive

48 Example (S = 2, τ = 3) master iteration 3: 3 worker updates arrive

49 Example (S = 2, τ = 3) master waits... until update from worker 3 arrives

50 Example (S = 2, τ = 3) master waits... until update from worker 3 arrives partial barrier S speeds up the algorithm bounded delay τ guarantees convergence to the globally optimal solution

51 Convergence min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z Theorem: after T master iterations, E[ Σ_i ( f_i(x̄_i) - f_i(x*) (difference with the optimal objective) + λ*^T (x̄_i - z̄) (difference with the consensus variable) ) ] ≤ (Nτ/(TS)) ( β ||z^0 - z*||^2 + (1/β) ||λ^0 - λ*||^2 ) x̄_i: average x_i for node i during the iterations; z̄: average z

52 Convergence min_{x_1,...,x_N,z} Σ_i f_i(x_i) s.t. x_1 = ... = x_N = z Theorem: after T master iterations, E[ Σ_i ( f_i(x̄_i) - f_i(x*) (difference with the optimal objective) + λ*^T (x̄_i - z̄) (difference with the consensus variable) ) ] ≤ (Nτ/(TS)) ( β ||z^0 - z*||^2 + (1/β) ||λ^0 - λ*||^2 ) x̄_i: average x_i for node i during the iterations; z̄: average z workers and network are fast → in each iteration, updates from all workers can arrive → recover the O(1/T) convergence rate of standard ADMM

53 Experiments: Structured Sparse Models real-world data are often high-dimensional text bioinformatics hyperspectral image

54 Experiments: Structured Sparse Models real-world data are often high-dimensional text bioinformatics hyperspectral image Feature selection via regularized risk minimization minimize loss l(w) + sparsity-inducing regularizer Ω(w)

55 Experiments: Structured Sparse Models real-world data are often high-dimensional text bioinformatics hyperspectral image Feature selection via regularized risk minimization minimize loss l(w) + sparsity-inducing regularizer Ω(w) lasso: Ω(w) = ||w||_1 = Σ_i |w_i|
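
For the lasso regularizer, the proximal step that appears in ADMM's regularizer update has a well-known closed form (elementwise soft-thresholding); a minimal sketch, with the threshold kappa left as an assumed input:

import numpy as np

def soft_threshold(v, kappa):
    """prox of kappa*||.||_1: argmin_w kappa*||w||_1 + 0.5*||w - v||^2 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)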

56 Structured Sparsity features often have intrinsic structures → structured sparsity

57 Structured Sparsity features often have intrinsic structures → structured sparsity group lasso e.g., categorical feature: represented by a group of binary features ((0,0,1) for Chinese; (0,1,0) French; (1,0,0) German) graph-guided fused lasso: groups can overlap Ω(x) = Σ_{(i,j)∈E} w_ij |x_i - x_j| (encourages x_i ≈ x_j for (i, j) ∈ E)
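
A minimal sketch of evaluating the graph-guided fused lasso penalty over an explicit edge list; the edge set and the edge weights w_ij used here are assumptions for illustration:

import numpy as np

def ggfl_penalty(x, edges, w):
    """Omega(x) = sum over edges (i, j) of w_ij * |x_i - x_j|."""
    return sum(w[(i, j)] * abs(x[i] - x[j]) for (i, j) in edges)

# toy usage
x = np.array([1.0, 1.1, -0.5])
edges = [(0, 1), (1, 2)]
w = {(0, 1): 1.0, (1, 2): 0.5}
val = ggfl_penalty(x, edges, w)   # = 1.0*|1.0 - 1.1| + 0.5*|1.1 - (-0.5)|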

58 Experiment: Graph-Guided Fused Lasso min_x (1/L) Σ_{i=1}^L l_i(x) (logistic loss) + λ Σ_{(i,j)∈E} w_ij |x_i - x_j| digits 4 and 9 from the MNIST data set 1.6 million 784-dimensional samples, divided uniformly among the 64 workers

59 Cluster Setup 18 computing nodes interconnected with a gigabit Ethernet each node: 4 AMD Opteron 2216 (2.4GHz) processors, 16GB memory one core for the master and each worker process inter-processor communication: MPI

60 Convergence of the Objective with Time async-ADMM faster than sync-ADMM (S = 64 or τ = 1)

61 Reduces Network Waiting Time [bar chart: total time (seconds), split into computational time and network waiting time, for (S, τ) combinations (64,1), (2,8), (2,16), (4,16), (4,32)] different (S, τ) combinations have similar computation time smaller S and/or larger τ allows for a higher degree of asynchrony → less time on network waiting

62 More Workers [bar chart: total time (seconds), computational time and network waiting time of sync-ADMM vs async-ADMM, against the number of workers] async-ADMM is again faster than sync-ADMM more workers → less time to wait for at least S worker updates → significantly less network waiting time

63 Low-Rank Matrix Factorization ADMM can also be efficiently used on nonconvex problems low-rank matrix factorization (e.g., in collaborative filtering) min_{L,R} ||M - LR||_F^2 (difference with the original matrix) + λ_1 ||L||_F^2 + λ_2 ||R||_F^2 (standard regularizer)

64 Low-Rank Matrix Factorization ADMM can also be efficiently used on nonconvex problems low-rank matrix factorization (e.g., in collaborative filtering) min_{L,R} ||M - LR||_F^2 (difference with the original matrix) + λ_1 ||L||_F^2 + λ_2 ||R||_F^2 (standard regularizer) m = 10000, n = , and rank = 100 partition M evenly across columns and then assign to N = 64 workers
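
A minimal sketch of the (nonconvex) factorization objective above; the matrix sizes and regularization weights are illustrative assumptions:

import numpy as np

def mf_objective(M, L, R, lam1, lam2):
    """||M - L R||_F^2 + lam1*||L||_F^2 + lam2*||R||_F^2."""
    return (np.linalg.norm(M - L @ R, 'fro') ** 2
            + lam1 * np.linalg.norm(L, 'fro') ** 2
            + lam2 * np.linalg.norm(R, 'fro') ** 2)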

65 Convergence of Objective with Time [plot: objective value (×10^5) vs. time (seconds) for sync-ADMM and async-ADMM] async-ADMM converges faster than sync-ADMM

66 Reduces Network Waiting Time [bar chart: total time (seconds), computational time and network waiting time, for (S, τ) combinations (64,1) and (2,32)]

67 Problems with Version 1 in cloud computing environments, computing nodes may be dynamically added, removed and may also fail MPI has no mechanism to handle faults

68 Version 2: Parameter Server parameters divided into shards and stored in multiple masters each master stores a replica of its neighbors' keys master manager / worker manager: handle addition, removal, failure of masters and workers

69 Version 2: Parameter Server parameters divided into shards and stored in multiple masters each master stores a replica of its neighbors' keys master manager / worker manager: handle addition, removal, failure of masters and workers fault tolerance

70 Version 2: Parameter Server parameters divided into shards and stored in multiple masters each master stores a replica of its neighbors' keys master manager / worker manager: handle addition, removal, failure of masters and workers fault tolerance spreads master-worker communication → more scalable

71 Preliminary Results Google Cloud cluster with 32 nodes, each with 2 cores RCV1 data set; l_1-regularized logistic regression [bar chart: total time (seconds), computational time and network waiting time, for sync-ADMM and async-ADMM with 1 master and with 4 masters] the asynchronous algorithm spends less time on network waiting more masters → further speedup

72 From BIG Data to big data

73 From BIG Data to big data learning subproblem on each worker min_w (1/n) Σ_{i=1}^n l_i(w) (empirical loss) + Ω(w) (regularizer) l_i(w): sample i's contribution to the loss

74 From BIG Data to big data learning subproblem on each worker min_w (1/n) Σ_{i=1}^n l_i(w) (empirical loss) + Ω(w) (regularizer) l_i(w): sample i's contribution to the loss may have a closed-form solution may have to be solved iteratively (e.g., gradient descent, ADMM, ...)

75 From BIG Data to big data learning subproblem on each worker min_w (1/n) Σ_{i=1}^n l_i(w) (empirical loss) + Ω(w) (regularizer) l_i(w): sample i's contribution to the loss may have a closed-form solution may have to be solved iteratively (e.g., gradient descent, ADMM, ...) original problem is BIG, subproblems can still be big

76 ADMM on Each Worker min_{w,y} (1/n) Σ_{i=1}^n l_i(w) + Ω(y) s.t. Aw + By = c

77 ADMM on Each Worker min_{w,y} (1/n) Σ_{i=1}^n l_i(w) + Ω(y) s.t. Aw + By = c ADMM's w update: w^{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By^t - c + α^t||^2

78 ADMM on Each Worker min_{w,y} (1/n) Σ_{i=1}^n l_i(w) + Ω(y) s.t. Aw + By = c ADMM's w update: w^{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By^t - c + α^t||^2 each iteration needs to visit all the samples (batch learning) big data → becomes computationally expensive

79 From Batch to Stochastic popularly used in gradient descent replace the gradient over the whole data set, (1/n) Σ_{i=1}^n ∇l_i(w), by the gradient ∇l_{k(t)}(w) at a single sample k(t) ∈ {1, 2, ..., n}

80 From Batch to Stochastic popularly used in gradient descent replace the gradient over the whole data set, (1/n) Σ_{i=1}^n ∇l_i(w), by the gradient ∇l_{k(t)}(w) at a single sample k(t) ∈ {1, 2, ..., n} or over a small mini-batch of samples: (1/m) Σ_{i∈S} ∇l_i(w) (where |S| = m ≪ n)

81 From Batch to Stochastic popularly used in gradient descent replace the gradient over the whole data set, (1/n) Σ_{i=1}^n ∇l_i(w), by the gradient ∇l_{k(t)}(w) at a single sample k(t) ∈ {1, 2, ..., n} or over a small mini-batch of samples: (1/m) Σ_{i∈S} ∇l_i(w) (where |S| = m ≪ n) per-iteration complexity is much lower (O(1) instead of O(n)) can scale to much larger data sets
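
A minimal sketch of replacing the full gradient by a mini-batch estimate, assuming the squared loss from the earlier example; the batch size m and the random generator are assumptions:

import numpy as np

def minibatch_grad(w, X, y, m, rng):
    """Gradient of the average squared loss over a random mini-batch S of size m."""
    S = rng.choice(len(y), size=m, replace=False)
    Xs, ys = X[S], y[S]
    return (2.0 / m) * Xs.T @ (Xs @ w - ys)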

82 Stochastic ADMM w^{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By^t - c + α^t||^2

83 Stochastic ADMM w^{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By^t - c + α^t||^2 → w^{t+1} ← arg min_w l_{k(t)}(w) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) learns from only one sample

84 Stochastic ADMM w^{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By^t - c + α^t||^2 → w^{t+1} ← arg min_w l_{k(t)}(w) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) → w^{t+1} ← arg min_w ∇l_{k(t)}(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w)

85 Stochastic ADMM w^{t+1} ← arg min_w (1/n) Σ_{i=1}^n l_i(w) + (β/2)||Aw + By^t - c + α^t||^2 → w^{t+1} ← arg min_w l_{k(t)}(w) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) → w^{t+1} ← arg min_w ∇l_{k(t)}(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) slower convergence (T: number of iterations): batch ADMM: convergence rate O(1/T), iteration cost O(n); stochastic ADMM: convergence rate O(1/√T), iteration cost O(1)
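
A minimal sketch of the linearized w-update of stochastic ADMM, assuming R(w) = ||w - w^t||^2 / (2η) so that the subproblem is a quadratic with a closed-form solution; the step size η and the gradient oracle grad_k are assumptions:

import numpy as np

def stochastic_admm_w_step(w_t, grad_k, A, B, c, y_t, alpha_t, beta, eta):
    """w^{t+1} = argmin_w grad_k^T (w - w_t) + beta/2 ||A w + B y_t - c + alpha_t||^2
                 + ||w - w_t||^2 / (2 eta),  where grad_k is the gradient of l_{k(t)} at w_t."""
    d = len(w_t)
    H = beta * A.T @ A + np.eye(d) / eta
    rhs = w_t / eta - grad_k - beta * A.T @ (B @ y_t - c + alpha_t)
    return np.linalg.solve(H, rhs)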

86 Stochastic ADMM... w^{t+1} ← arg min_w ∇l_{k(t)}(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) not using the old gradients seems wasteful

87 Stochastic ADMM... w^{t+1} ← arg min_w ∇l_{k(t)}(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) not using the old gradients seems wasteful

88 Stochastic Average ADMM save gradients ∇l_1(w^0), ..., ∇l_n(w^0) at iteration t: randomly choose a sample k(t) ∈ {1, ..., n} update the gradient for k(t) only: ∇_{k(t)} ← ∇l_{k(t)}(w^t)

89 Stochastic Average ADMM save gradients ∇l_1(w^0), ..., ∇l_n(w^0) at iteration t: randomly choose a sample k(t) ∈ {1, ..., n} update the gradient for k(t) only: ∇_{k(t)} ← ∇l_{k(t)}(w^t) min_w ∇l_{k(t)}(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w)

90 Stochastic Average ADMM save gradients ∇l_1(w^0), ..., ∇l_n(w^0) at iteration t: randomly choose a sample k(t) ∈ {1, ..., n} update the gradient for k(t) only: ∇_{k(t)} ← ∇l_{k(t)}(w^t) min_w ∇l_{k(t)}(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) → min_w (1/n) Σ_{i=1}^n ∇_i^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w) (cf. min_w (1/n) Σ_{i=1}^n ∇l_i(w^t)^T (w - w^t) + (β/2)||Aw + By^t - c + α^t||^2 + R(w)) out of the n gradients, only one of them (the one corresponding to sample k(t)) is based on the current w^t; the others are previously-stored gradient values
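
A minimal sketch of the gradient bookkeeping behind stochastic average ADMM: keep one stored gradient per sample, refresh only the chosen sample's slot, and feed the average of all stored gradients into the linearized update. The squared loss and the data layout are assumptions:

import numpy as np

def sa_admm_gradient(w_t, X, y, stored, grad_sum, rng):
    """Pick a random sample k, refresh its stored gradient at w_t, and return the
    average of all n stored gradients; only one slot is recomputed per call."""
    n = len(y)
    k = rng.integers(n)
    g_new = 2.0 * X[k] * (X[k] @ w_t - y[k])   # gradient of l_k at the current w_t
    grad_sum = grad_sum + g_new - stored[k]    # maintain the running sum in O(d) time
    stored[k] = g_new
    return grad_sum / n, stored, grad_sum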

91 Convergence Theorem min_{x,y} Φ(x, y) ≡ φ(x) + ψ(y) s.t. Ax + By = c E[ Φ(x̄_T, ȳ_T) - Φ(x*, y*) (difference with the optimal objective) + γ ||A x̄_T + B ȳ_T - c|| (constraint violation) ] ≤ (1/(2T)) { nL ||x* - x^0||^2 + ||y* - y^0||_H^2 + 2β ( γ^2/β^2 + ||α^0||^2 ) } x̄_T: average of the x^t's; ȳ_T: average of the y^t's

92 Convergence Theorem min_{x,y} Φ(x, y) ≡ φ(x) + ψ(y) s.t. Ax + By = c E[ Φ(x̄_T, ȳ_T) - Φ(x*, y*) (difference with the optimal objective) + γ ||A x̄_T + B ȳ_T - c|| (constraint violation) ] ≤ (1/(2T)) { nL ||x* - x^0||^2 + ||y* - y^0||_H^2 + 2β ( γ^2/β^2 + ||α^0||^2 ) } x̄_T: average of the x^t's; ȳ_T: average of the y^t's batch: convergence rate O(1/T), iteration cost O(n); stochastic: rate O(1/√T), cost O(1); stochastic average: rate O(1/T), cost O(1)

93 Convergence Theorem min_{x,y} Φ(x, y) ≡ φ(x) + ψ(y) s.t. Ax + By = c E[ Φ(x̄_T, ȳ_T) - Φ(x*, y*) (difference with the optimal objective) + γ ||A x̄_T + B ȳ_T - c|| (constraint violation) ] ≤ (1/(2T)) { nL ||x* - x^0||^2 + ||y* - y^0||_H^2 + 2β ( γ^2/β^2 + ||α^0||^2 ) } x̄_T: average of the x^t's; ȳ_T: average of the y^t's batch: convergence rate O(1/T), iteration cost O(n); stochastic: rate O(1/√T), cost O(1); stochastic average: rate O(1/T), cost O(1) caveat: need to store the gradient values → extra O(n) space

94 Experiment: Graph-Guided Fused Lasso forest cover type: 581,012 samples; drug: 12,678 [plots: objective value vs. time] speed: stochastic average > stochastic > batch

95 Experiment: Graph-Guided Fused Lasso... [plots: objective value vs. number of passes over the data] batch: one effective pass = one iteration stochastic: one effective pass = n iterations speed: stochastic average > stochastic > batch

96 Conclusion BIG data sets distributed processing: asynchronous ADMM partial barrier and bounded delay to control asynchrony

97 Conclusion BIG data sets distributed processing: asynchronous ADMM partial barrier and bounded delay to control asynchrony big data sets (processing on each worker) batch → (better) stochastic recycling old gradients

98 Conclusion BIG data sets distributed processing: asynchronous ADMM partial barrier and bounded delay to control asynchrony big data sets (processing on each worker) batch → (better) stochastic recycling old gradients simple; has convergence guarantees; fast (R. Zhang, J.T. Kwok, ICML 2014) (L.W. Zhong, J.T. Kwok, ICML 2014)

99 Shameless Plug I am looking for research students, research assistants / postdocs / programmers on a ML/DM project related to financial data If interested, please email me (jamesk@cse.ust.hk) Thank You
