Chapter 11: Neural Networks


1 Chapter 11: Neural Networks DD3364 December 16, 2012

2 Projection Pursuit Regression

3 Projection Pursuit Regression

Projection Pursuit Regression model:

f(X) = Σ_{m=1}^{M} g_m(w_m^t X)

where X ∈ R^p and the targets are Y ∈ R.
- Additive model in the derived features V_m = w_m^t X.
- g_m(w_m^t X) is a ridge function in R^p: it only varies in the direction of w_m.
- The PPR model can approximate any continuous function in R^p if M is arbitrarily large and the g_m's are chosen appropriately. ⇒ The PPR model is a universal approximator.

4 Example Ridge Functions

[Figure 11.1 from the book: perspective plots of two ridge functions.]

Left graph:
g(V) = 1 / (1 + exp{-5(V - 0.5)}),   V = (X_1 + X_2)/√2,   i.e. ω = (1/√2)(1, 1)^T.

Right graph:
g(V) = (V + 0.1) sin(1/(V/3 + 0.1)),   V = X_1,   i.e. ω = (1, 0)^T.

The ridge function g_m(ω_m^T X) varies only in the direction defined by the vector ω_m. The scalar V_m = ω_m^T X is the projection of X onto the unit vector ω_m, and we seek ω_m so that the model fits well, hence the name projection pursuit.

5 How to fit a PPR model?

Have training data {(x_i, y_i)}_{i=1}^n. Seek to minimize

Σ_{i=1}^n [ y_i - Σ_{m=1}^M g_m(w_m^t x_i) ]^2

over the functions g_m and directions w_m, m = 1, ..., M. How??

General approach
- Build the model in a forward stage-wise manner. Add a pair (w_m, g_m) at each stage.
- At each stage iterate:
  * Fix w_m and update g_m
  * Fix g_m and update w_m


7 How to fit a PPR model?

Fix w and update g
- Must impose complexity constraints on g_m to avoid overfitting.

Fix g and update w
- Use a first-order Taylor expansion g(w^t x_i) ≈ g(w_old^t x_i) + g'(w_old^t x_i)(w - w_old)^t x_i to give

Σ_{i=1}^n [ y_i - g(w^t x_i) ]^2 ≈ Σ_{i=1}^n g'(w_old^t x_i)^2 [ ( w_old^t x_i + (y_i - g(w_old^t x_i)) / g'(w_old^t x_i) ) - w^t x_i ]^2

To minimize the rhs:
- Perform a least squares regression (no intercept (bias) term).
- Input w^t x_i has target w_old^t x_i + (y_i - g(w_old^t x_i)) / g'(w_old^t x_i).
- Weight the errors with g'(w_old^t x_i)^2.
- This produces the updated coefficient vector w_new.

8 How to fit a PPR model? Iterate these two steps until convergence: Fix w and update g. Fix g and update w.
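To make the two alternating steps concrete, here is a minimal Python sketch of fitting a single PPR term (M = 1). All names are hypothetical; a crude kernel smoother stands in for the flexible smoother used to update g, and the weighted least-squares step above is used to update w.

```python
import numpy as np

def fit_ppr_term(X, y, n_iter=20, eps=1e-8):
    """Sketch: fit one PPR term g(w^t x) by alternating updates of g and w."""
    n, p = X.shape
    w = np.random.randn(p)
    w /= np.linalg.norm(w)

    def smooth(v, target, width=0.2):
        # crude kernel smoother standing in for a flexible scatterplot smoother
        return np.array([np.average(target, weights=np.exp(-0.5 * ((v - vj) / width) ** 2))
                         for vj in v])

    for _ in range(n_iter):
        v = X @ w                          # projections w^t x_i
        g = smooth(v, y)                   # fix w, update g
        order = np.argsort(v)
        dg = np.gradient(g[order], v[order])[np.argsort(order)]   # numerical g'(v)
        dg = np.where(np.abs(dg) < eps, eps, dg)
        target = v + (y - g) / dg          # working response from the Taylor expansion
        weights = dg ** 2                  # weights g'(w_old^t x_i)^2
        W = np.diag(weights)
        w = np.linalg.solve(X.T @ W @ X, X.T @ W @ target)        # fix g, update w
        w /= np.linalg.norm(w)
    return w, smooth(X @ w, y)
```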

9 Neural Networks

10 Single hidden layer, feed-forward network

[Schematic from the book: a single hidden layer, feed-forward network with inputs X_1, ..., X_p, hidden units Z_1, ..., Z_M, and outputs Y_1, ..., Y_K.]

11 K-classification

Input: X = (X_1, X_2, ..., X_p), and say it belongs to class k.
Ideal output: Y_1, ..., Y_K where Y_i = 0 if i ≠ k and Y_i = 1 if i = k.

The 2-layer neural network estimates the outputs by
- deriving features Z_1, ..., Z_M (the hidden units) from linear combinations of X,
- modeling the target Y_k as a function of linear combinations of Z_1, ..., Z_M.

In maths...

12 K-classification

Computation of the kth output:

Y_k = f_k(X) = g_k(T_1, ..., T_K)

where

T_k = β_{k0} + Σ_{m=1}^M β_{km} Z_m,   Z_m = σ(α_{m0} + Σ_{l=1}^p α_{ml} X_l).

The activation function σ can be defined as the sigmoid function

σ(v) = 1 / (1 + exp{-v})

and the output function g_k as the softmax function

g_k(T_1, ..., T_K) = exp{T_k} / Σ_{l=1}^K exp{T_l}.
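As an illustration, a minimal numpy sketch of this forward computation (the variable names are illustrative, not from the book):

```python
import numpy as np

def forward(X, alpha0, alpha, beta0, beta):
    """Forward pass of a single-hidden-layer network with sigmoid hidden units
    and softmax outputs (sketch).

    X: (n, p) inputs; alpha0: (M,), alpha: (M, p) hidden-layer weights;
    beta0: (K,), beta: (K, M) output-layer weights. Returns (n, K) probabilities.
    """
    Z = 1.0 / (1.0 + np.exp(-(X @ alpha.T + alpha0)))   # Z_m = sigma(alpha_m0 + alpha_m^t X)
    T = Z @ beta.T + beta0                              # T_k = beta_k0 + beta_k^t Z
    T -= T.max(axis=1, keepdims=True)                   # for numerical stability
    expT = np.exp(T)
    return expT / expT.sum(axis=1, keepdims=True)       # softmax g_k(T_1, ..., T_K)
```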

13 The Activation Function

Shown is σ(sv) for s = 0.5, 1, 10.

[Figure 11.3 from the book: plot of the sigmoid function σ(v) = 1/(1 + exp(-v)) (red curve), commonly used in the hidden layer of a neural network, together with σ(sv) for other values of s. The scale parameter s controls the activation rate; large s amounts to a hard activation at v = 0, and σ(s(v - v_0)) shifts the activation threshold from 0 to v_0.]

If σ is the identity ⇒ each T_k = w_{k0} + Σ_{l=1}^p w_{kl} X_l is linear in the inputs.
⇒ Can think of neural networks as a non-linear generalization of the linear model.

The rate of activation of the sigmoid depends on the norm of α_m, where Z_m = σ(α_{m0} + Σ_{l=1}^p α_{ml} X_l).
When ‖α_m‖ is small ⇒ the unit operates in the linear part of its activation function.

14 Neural Network is a universal approximator

A NN with one hidden layer can approximate arbitrarily well any continuous functional mapping from one finite-dimensional space to another, provided the number of hidden units is sufficiently large.

15 Fitting Neural Networks

16 Error measure

This 2-layer neural network has unknown parameters θ:

{α_{m0}, α_{m1}, ..., α_{mp} ; m = 1, ..., M}   M(p + 1) weights
{β_{k0}, β_{k1}, ..., β_{kM} ; k = 1, ..., K}   K(M + 1) weights

Aim: Estimate the parameters θ from labeled training data {x_i, g_i}_{i=1}^n with each x_i ∈ R^p and g_i ∈ {1, ..., K}.

Do this by minimizing a measure-of-fit such as

R(θ) = Σ_{i=1}^n Σ_{k=1}^K (y_{ik} - f_k(x_i))^2   (sum-of-squared error)

or

R(θ) = - Σ_{i=1}^n Σ_{k=1}^K y_{ik} log f_k(x_i)   (cross-entropy error)
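For concreteness, both measures of fit are a few lines of numpy. This is a sketch: Y is the n × K matrix of one-hot targets y_{ik} and F the matrix of network outputs f_k(x_i).

```python
import numpy as np

def sum_of_squares_error(Y, F):
    """R(theta) = sum_i sum_k (y_ik - f_k(x_i))^2."""
    return np.sum((Y - F) ** 2)

def cross_entropy_error(Y, F, eps=1e-12):
    """R(theta) = -sum_i sum_k y_ik * log f_k(x_i); eps guards against log(0)."""
    return -np.sum(Y * np.log(F + eps))
```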

17 Minimizing R(θ)

Typically we do not want θ̂ = arg min_θ R(θ) ⇒ an overfit solution. Some form of regularization is required; we will come back to this.

The generic approach to minimizing R(θ) is gradient descent, a.k.a. back-propagation. This amounts to an implementation of the chain rule for differentiation.

18 Back-propagation for squared-error loss

Let z_i = (z_{1i}, ..., z_{Mi}) with z_{im} = σ(α_{m0} + α_m^t x_i), where α_m = (α_{m1}, ..., α_{mp}).

Have

R(θ) = Σ_{i=1}^n R_i = Σ_{i=1}^n Σ_{k=1}^K (y_{ik} - f_k(x_i))^2

with derivatives

∂R_i(θ)/∂β_{km} = -2 (y_{ik} - f_k(x_i)) g_k'(β_{10} + β_1^t z_i, ..., β_{K0} + β_K^t z_i) z_{im} = δ_{ki} z_{im}

∂R_i(θ)/∂α_{ml} = - Σ_{k=1}^K 2 (y_{ik} - f_k(x_i)) g_k'(...) β_{km} σ'(α_{m0} + α_m^t x_i) x_{il}
                = x_{il} σ'(α_{m0} + α_m^t x_i) Σ_{k=1}^K δ_{ki} β_{km}
                = x_{il} s_{mi}

19 Back-propagation for squared-error loss

Given these derivatives, the update at the (r + 1)st iteration is

β_{km}^{(r+1)} = β_{km}^{(r)} - γ_r Σ_{i=1}^n ∂R_i(θ)/∂β_{km} |_{β_{km} = β_{km}^{(r)}}

α_{ml}^{(r+1)} = α_{ml}^{(r)} - γ_r Σ_{i=1}^n ∂R_i(θ)/∂α_{ml} |_{α_{ml} = α_{ml}^{(r)}}

where γ_r is the learning rate.

Remember the derivatives satisfy

∂R_i(θ)/∂β_{km} = δ_{ki} z_{im},   ∂R_i(θ)/∂α_{ml} = x_{il} s_{mi}

The quantities δ_{ki} and s_{mi} are errors from the current model at the output and hidden layer units respectively, and they satisfy

s_{mi} = σ'(α_{m0} + α_m^t x_i) Σ_{k=1}^K δ_{ki} β_{km}

20 Back-propagation update equations

The updates can be implemented in a two-pass algorithm:

Forward pass: the current weights are fixed and the fitted values f̂_k(x_i) are computed.

Backward pass: the errors δ_{ki} are computed and then back-propagated via

s_{mi} = σ'(α_{m0} + α_m^t x_i) Σ_{k=1}^K δ_{ki} β_{km}

to give the errors s_{mi}.

Use both sets of errors to compute the gradients for the updates.
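A minimal sketch of one batch gradient-descent step implementing the two passes. To keep the deltas simple it assumes identity output units g_k(T) = T_k (so g_k' = 1) and squared-error loss; all names are illustrative, and the weight shapes match the forward-pass sketch above.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(X, Y, alpha0, alpha, beta0, beta, lr=0.01):
    """One batch update for a single-hidden-layer network (sketch)."""
    # forward pass: fix the weights, compute the fitted values
    Z = sigmoid(X @ alpha.T + alpha0)         # z_im
    F = Z @ beta.T + beta0                    # f_k(x_i), identity output units

    # backward pass: output errors delta_ki, back-propagated to s_mi
    delta = -2.0 * (Y - F)                    # (n, K)
    s = (delta @ beta) * Z * (1.0 - Z)        # sigma'(a) = sigma(a)(1 - sigma(a))

    # gradients summed over the training cases
    grad_beta, grad_beta0 = delta.T @ Z, delta.sum(axis=0)   # sum_i delta_ki z_im
    grad_alpha, grad_alpha0 = s.T @ X, s.sum(axis=0)         # sum_i s_mi x_il

    # gradient descent step with learning rate gamma_r = lr
    return (alpha0 - lr * grad_alpha0, alpha - lr * grad_alpha,
            beta0 - lr * grad_beta0, beta - lr * grad_beta)
```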

21 Details of back-propagation

Can do the updates with batch learning: parameters are updated by summing over all training examples.

Can do the updates with online learning: parameters are updated after each training example. ⇒ can train a network with very large training datasets.

Training epoch: one sweep through the entire training set.

Learning rate γ_r:
- Batch learning: usually taken to be constant; can be optimized by a line search.
- Online learning: γ_r → 0 as r → ∞.

Note: Back-prop is very slow.

22 Some Issues in Training Neural Networks

23 Training Neural Networks

Training a neural network is non-trivial! Why?
- The model is overparametrized.
- The optimization problem is nonconvex and unstable.

The book summarizes some of the important issues...

24 Starting Values

If the weights are near zero ⇒ σ(·) is roughly linear ⇒ the neural network collapses into an approximately linear model.

Usually start with random values close to zero. ⇒ the model starts out linear and becomes non-linear as the weights increase.

Use of exact zero weights gives zero derivatives and perfect symmetry, and the algorithm never moves.

Starting with large weights often leads to poor solutions.

25 Combating Overfitting

Neural networks will overfit at the global minimum of R. Therefore different approaches to regularization have been adopted:

Early stopping
- Only train the model for a while.
- Stop before converging to a minimum of R(θ).
- As the initial weights are close to 0 ⇒ we initially have a highly regularized linear solution ⇒ early stopping shrinks the model towards a linear model.
- Can use a validation dataset to determine when to stop.

Weight decay
- Add a penalty to the error function: R(θ) + λJ(θ), where J(θ) = Σ_{km} β_{km}^2 + Σ_{ml} α_{ml}^2 and λ ≥ 0 is a tuning parameter.
- Larger values of λ tend to shrink the weights towards zero.

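Weight decay plugs directly into the gradients from the back-propagation sketch above: the penalty λJ(θ) adds 2λ times each weight to its gradient. A sketch (lam is an illustrative value of λ; the bias terms are left unpenalized here):

```python
def add_weight_decay(grad_alpha, grad_beta, alpha, beta, lam=0.02):
    """Add the gradient of lam * (sum alpha_ml^2 + sum beta_km^2) to the gradients."""
    return grad_alpha + 2.0 * lam * alpha, grad_beta + 2.0 * lam * beta
```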

27 Effect of Weight Decay

[Figure from the book: a neural network on the mixture example of Chapter 2, with panels "Neural Network - 10 Units, No Weight Decay" and "Neural Network - 10 Units, Weight Decay = 0.02", each annotated with its training, test and Bayes error. The upper panel uses no weight decay and overfits the training data; the lower panel uses weight decay and achieves close to the Bayes error rate (broken purple boundary).]

Both use the softmax g_k and cross-entropy error.
The Bayes optimal decision boundary is the purple curve.

28 Weights learnt

[Figure from the book: heat maps of the estimated weights from the training of the neural networks above, with no weight decay (left) and weight decay (right). The display ranges from bright green (negative) to bright red (positive).]

Both use softmax g_k and cross-entropy error.

29 Combating Overfitting

Scaling the Inputs
- The scale of the inputs determines the scale of the bottom-layer weights.
- At the outset it is best to standardize all inputs to have mean 0 and standard deviation 1.
- This ensures all inputs are treated equally in the regularization process.

Number of Hidden Units and Layers
- Generally better to have too many than too few hidden units.
- Fewer hidden units ⇒ less flexibility in the model.
- Proper regularization should shrink unnecessary hidden-unit weights to zero.
- Multiple hidden layers allow the construction of hierarchical features at different resolutions.
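A sketch of the recommended standardization (computing the statistics on the training set and reusing them on the test set is an assumption here, not something the slides spell out):

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale each input to mean 0 and standard deviation 1 (sketch)."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)      # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd
```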

30 Combating Overfitting

Multiple Minima
- R(θ) is non-convex ⇒ the final solution depends on the initial weights.
- Option 1: Learn different networks from different random initial weights. Choose the network with the lowest penalized error.
- Option 2: Learn different networks from different random initial weights. For a test example, average the predictions of the networks.
- Option 3 (bagging): Learn different networks from random subsets of the training data. For a test example, average the predictions of the networks.

31 Example: Simulated Data

32 Example 1: Underlying model

Generated data from this additive model:

Y = σ(a_1^t X) + σ(a_2^t X) + ε

where X = (X_1, X_2)^t with X_i ~ N(0, 1) for i = 1, 2,

a_1^t = (3, 3),   a_2^t = (3, -3),   ε ~ N(0, σ²),

and σ² is chosen so the signal-to-noise ratio is 4, that is Var{f(X)} = 4σ².

n_train = 100 and n_test = 10000.
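A minimal sketch of this data-generating process (names are illustrative; noise_sd stands for the σ chosen to give signal-to-noise ratio 4):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def make_example1(n, noise_sd, seed=0):
    """Generate (X, Y) from Y = sigma(a1^t X) + sigma(a2^t X) + eps (sketch)."""
    rng = np.random.default_rng(seed)
    a1, a2 = np.array([3.0, 3.0]), np.array([3.0, -3.0])
    X = rng.standard_normal((n, 2))                     # X_i ~ N(0, 1)
    y = sigmoid(X @ a1) + sigmoid(X @ a2) + noise_sd * rng.standard_normal(n)
    return X, y
```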

33 Example 1: Neural network fit

Fit a neural network with weight decay and various numbers of hidden units. Recorded the average test error for 10 random starting weights.

The zero hidden unit model refers to linear least squares regression.

[Figure from the book: boxplots of test error, quoted relative to the Bayes error, versus the number of hidden units for the simulated "Sum of Sigmoids" and "Radial" examples.]

34 Example 1: Effect of weight decay on test error

[Figure from the book: boxplots of test error, for the simulated data example, relative to the Bayes error. The true function is a sum of two sigmoids. The test error is displayed for ten different starting weights, for a single-hidden-layer neural network with the number of units as indicated. The two panels represent no weight decay (left) and strong weight decay λ = 0.1 (right).]

35 Example 1: Fixed number of hidden units, vary λ

[Figure from the book ("Sum of Sigmoids, 10 Hidden Unit Model"): boxplots of test error, for the simulated data example, versus the weight decay parameter. The true function is a sum of two sigmoids. The test error is displayed for ten different starting weights, for a single-hidden-layer neural network with ten hidden units and the weight decay parameter value as indicated.]

36 Example 2: Underlying model

Generated data from this additive model:

Y = Σ_{j=1}^{10} φ(X_j) + ε

where X = (X_1, ..., X_{10})^t with X_i ~ N(0, 1) for i = 1, ..., 10,

φ(v) = exp{-v²/2} / √(2π),   ε ~ N(0, σ²),

and σ² is chosen so the signal-to-noise ratio is 4, that is Var{f(X)} = 4σ².

n_train = 100 and n_test = 10000.

37 Example 2: Neural network does not produce a good fit

Fit a neural network with weight decay and various numbers of hidden units. Recorded the average test error for 10 random starting weights.

The zero hidden unit model refers to linear least squares regression.

[Figure from the book: boxplots of test error, quoted relative to the Bayes error, versus the number of hidden units for the "Sum of Sigmoids" and "Radial" examples.]

38 Convolutional Neural Networks

39 Neural Nets and image recognition tasks

A black-box neural network applied to pixel intensity data does not perform well for image pattern recognition tasks. Why?
- Because the pixel representation of the images lacks certain invariances (such as small rotations of the image).
- Huge number of parameters.

Solution: introduce constraints on the network to allow for more complex connectivity but fewer parameters.

Prime example: Convolutional Neural Networks.


41 Convolutional Neural Networks, Property 1: Sparse Connectivity

CNNs exploit spatially local correlation by enforcing a local connectivity pattern between units of adjacent layers. The hidden units in the m-th layer are connected to a local subset of units in the (m-1)-th layer, which have spatially contiguous receptive fields.

[Example of a convolutional layer.]

42 Convolutional Neural Networks, Property 2: Shared Weights

In CNNs, each sparse filter is replicated across the image. These replicated units form a feature map.

[Example of a convolutional layer.]

The k-th feature map h^k, whose filters are defined by the weights W^k and bias b_k, is (with tanh used for the non-linearities):

h^k_{ij} = tanh{(W^k * x)_{ij} + b_k}
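A sketch of computing one such feature map with a shared filter. It uses 'valid' cross-correlation (the kernel is not flipped), a common convention in CNN code; the variable names are illustrative.

```python
import numpy as np

def feature_map(x, W, b):
    """h_ij = tanh((W * x)_ij + b) for one feature map with shared weights W (sketch)."""
    H, Wd = x.shape
    kh, kw = W.shape
    out = np.empty((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(W * x[i:i + kh, j:j + kw]) + b   # same W and b everywhere
    return np.tanh(out)
```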

43 Convolutional Neural Networks, Property 3: Max Pooling

Max-pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.
- Reduces the computational complexity for the upper layers.
- Provides a form of translation invariance.
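A sketch of non-overlapping max-pooling over 2 × 2 blocks (it assumes the input dimensions are divisible by the pool size):

```python
import numpy as np

def max_pool(x, ph=2, pw=2):
    """Output the maximum value of each non-overlapping ph x pw block of x (sketch)."""
    H, W = x.shape
    return x.reshape(H // ph, ph, W // pw, pw).max(axis=(1, 3))
```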

44 The Full Model

Sparse, convolutional layers and max-pooling are at the heart of the LeNet family of models.

[Graphical depiction of a LeNet model.]

45 Example: ZIP Code Data

46 The data

[Figure 11.9 from the book: examples of training cases from the ZIP code data.]

Each image is a 16 × 16 8-bit grayscale representation of a handwritten digit.

For the experiments in the book: n_train = 320 and n_test = 160.

47 Five networks fit to the data

Net-1: No hidden layer, equivalent to multinomial logistic regression.

Net-2: One hidden layer, 12 hidden units fully connected.

Net-3: Two hidden layers, locally connected.
- 1st hidden layer (8 × 8 array): each unit takes inputs from a 3 × 3 patch of the input layer after subsampling by 2.
- 2nd hidden layer: inputs are from a 5 × 5 patch of the layer below after subsampling by 2.
- Local connectivity makes each unit responsible for extracting local features from the layer below.

[Figure: architectures of Net-1, Net-2 and Net-3 (Local Connectivity), each with a 16 × 16 input and 10 outputs.]

48 Five networks fit to the data

Net-4 (convolutional neural network): Two hidden layers, locally connected with weight sharing.
- 1st hidden layer has two 8 × 8 arrays. Each unit takes inputs from a 3 × 3 patch of the input layer after subsampling by 2. The units in a feature map share the same set of nine weights (but have their own bias parameter).
- 2nd hidden layer: inputs are from a volume of the two layers below after subsampling by 2. It has no weight sharing.
- Local connectivity makes each unit responsible for extracting local features from the layer below.

[Figure: architecture of Net-4 (16 × 16 input, 8 × 8 × 2 shared-weight layer, 10 outputs).]

49 Five networks fit to the data

Net-5 (convolutional neural network): Two hidden layers, locally connected, two levels of weight sharing.
- 1st hidden layer has two 8 × 8 arrays. Each unit takes inputs from a 3 × 3 patch of the input layer after subsampling by 2. The units in a feature map share the same set of nine weights (but have their own bias parameter).
- 2nd hidden layer has four 4 × 4 feature maps. Inputs are from a volume of the two layers below after subsampling by 2. The units in a feature map share the same set of 50 weights (but have their own bias parameter).
- Local connectivity makes each unit responsible for extracting local features from the layer below.

[Figure from the book: architectures of the five networks (Net-1 to Net-5) used in the ZIP code example.]

50 Number of parameters

With many more hidden units than Net-2, Net-3 has fewer links and hence weights (1226 vs. 3214), and achieves similar performance. Net-4 and Net-5 have local connectivity with shared weights: all units in a local feature map perform the same operation on different parts of the image, achieved by sharing the same weights, which reduces considerably the total number of weights.

[Table from the book: test set performance of the five different neural networks on a handwritten digit classification example (Le Cun, 1989), listing for each of Net-1 (single layer network), Net-2 (two layer network), Net-3 (locally connected), Net-4 (constrained network) and Net-5 (constrained network) its number of links, number of weights and % correct on the test set.]

51 Results

[Figure from the book: test performance curves (% correct on test data versus training epochs) for the five networks of Table 11.1 applied to the ZIP code data (Le Cun, 1989).]

The networks all have sigmoidal output units, and were all fit with the sum-of-squares error function.


53 Overview of results on the full MNIST dataset: n_train = 60,000 and n_test = 10,000.


57 Bayesian Neural Nets & the NIPS 2003 Challenge

58 NIPS 2003 Classification Challenge

[Table from the book: the NIPS 2003 challenge data sets. The column labeled p is the number of features; for the Dorothea dataset the features are binary. N_tr, N_val and N_te are the number of training, validation and test cases. Arcene: mass spectrometry, dense, p = 10,000. Dexter: text classification, sparse, p = 20,000. Dorothea: drug discovery, sparse, p = 100,000. Gisette: digit recognition, dense, p = 5,000. Madelon: artificial, dense.]

Each dataset represents a two-class classification problem.

Emphasis on feature extraction/selection.

Artificial probes (noise features) were added to the data.

Winning method: Neal and Zhang (2006) used a series of
- preprocessing feature-selection steps,
- followed by Bayesian neural networks,
- Dirichlet diffusion trees, and
- combinations of these methods.


60 Overview of Neal & Zhang's approach

Have training data X_tr, y_tr.

Build a two-hidden-layer network with parameters θ. The output nodes model P(Y = -1 | X, θ) and P(Y = +1 | X, θ).

Given a prior distribution p(θ), then

p(θ | X_tr, y_tr) = p(θ) P(y_tr | X_tr, θ) / ∫ p(θ') P(y_tr | X_tr, θ') dθ'

For a test case X_new the predictive distribution for Y_new is

P(Y_new | X_new, X_tr, y_tr) = ∫ P(Y_new | X_new, θ) p(θ | X_tr, y_tr) dθ

MCMC methods are used to sample from p(θ | X_tr, y_tr) and hence to approximate the integral.
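A sketch of the Monte Carlo approximation of this predictive integral: given parameter draws θ_l from the posterior (produced, for example, by an MCMC sampler not shown here) and a hypothetical predict_fn(X, θ) returning P(Y = +1 | X, θ), the integral becomes an average over the draws.

```python
import numpy as np

def predictive_probability(theta_samples, predict_fn, X_new):
    """Approximate P(Y_new = +1 | X_new, X_tr, y_tr) by averaging over posterior draws (sketch)."""
    probs = np.array([predict_fn(X_new, theta) for theta in theta_samples])
    return probs.mean(axis=0)
```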

61 Overview of Neal & Zhang's approach

Also tried different forms of pre-processing the features:
- univariate screening using t-tests,
- automatic relevance determination.

Potentially 3 main features important for its success:
- the feature selection and pre-processing,
- the neural network model, and
- the Bayesian inference for the model using MCMC.

The authors of the book wanted to understand the reasons for the success of the Bayesian method...

62 Authors' opinion

The power of modern Bayesian methods does not lie in their use as a formal inference procedure; most people would not believe that the priors in a high-dimensional, complex neural network model are actually correct. Rather, the Bayesian/MCMC approach gives an efficient way of sampling the relevant parts of model space, and then averaging the predictions for the high-probability models.

63 Bagging & Boosting also average models

Bagging
- Perturbs the data in an i.i.d. fashion and then
- re-estimates the model to give a new set of model parameters.
- The output is a simple average of the model predictions from the different bagged samples.

Boosting
- Fits a model that is additive in the models of each individual base learner.
- Base learners are fit using non-i.i.d. samples.

Bayesian approach
- Fixes the data and
- perturbs the parameters according to the current estimate of the posterior distribution.

64 Methods compared

- Bayesian Neural Nets (2 hidden layers of 20 and 8 units)
- Boosted trees
- Boosted Neural Nets
- Random forests
- Bagged Neural Networks

65 Results

[Figure from the book: test error (%) of Bayesian neural nets, boosted trees, boosted neural nets, random forests and bagged neural networks on the five NIPS 2003 datasets (Arcene, Dexter, Dorothea, Gisette, Madelon), shown separately for univariate screened features and ARD reduced features.]

66 Results

[Table from the book: performance of the different methods. Values are the average rank of test error across the five problems (low is good), and the mean computation time and standard error of the mean, in minutes, under both the screened features and the ARD reduced features, for Bayesian neural networks, boosted trees, boosted neural networks, random forests and bagged neural networks.]

67 Why do Bayesian Neural Networks work best?

The authors conjecture that the reasons are perhaps:
- the neural network model is well suited to these five problems,
- the MCMC approach provides an efficient way of exploring the important part of the parameter space, and then averaging the resulting models according to their quality.


Review Problems 3. Four FIR Filter Types Review Prblems 3 Fur FIR Filter Types Fur types f FIR linear phase digital filters have cefficients h(n fr 0 n M. They are defined as fllws: Type I: h(n = h(m-n and M even. Type II: h(n = h(m-n and M dd.

More information

Time, Synchronization, and Wireless Sensor Networks

Time, Synchronization, and Wireless Sensor Networks Time, Synchrnizatin, and Wireless Sensr Netwrks Part II Ted Herman University f Iwa Ted Herman/March 2005 1 Presentatin: Part II metrics and techniques single-hp beacns reginal time znes ruting-structure

More information

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India

CHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce

More information

EDA Engineering Design & Analysis Ltd

EDA Engineering Design & Analysis Ltd EDA Engineering Design & Analysis Ltd THE FINITE ELEMENT METHOD A shrt tutrial giving an verview f the histry, thery and applicatin f the finite element methd. Intrductin Value f FEM Applicatins Elements

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

Statistical Learning. 2.1 What Is Statistical Learning?

Statistical Learning. 2.1 What Is Statistical Learning? 2 Statistical Learning 2.1 What Is Statistical Learning? In rder t mtivate ur study f statistical learning, we begin with a simple example. Suppse that we are statistical cnsultants hired by a client t

More information

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern

Methods for Determination of Mean Speckle Size in Simulated Speckle Pattern 0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy

More information

Computational modeling techniques

Computational modeling techniques Cmputatinal mdeling techniques Lecture 4: Mdel checing fr ODE mdels In Petre Department f IT, Åb Aademi http://www.users.ab.fi/ipetre/cmpmd/ Cntent Stichimetric matrix Calculating the mass cnservatin relatins

More information

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall

Stats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall Stats 415 - Classificatin Ji Zhu, Michigan Statistics 1 Classificatin Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Classificatin Ji Zhu, Michigan Statistics 2 Examples f Classificatin

More information

SPH3U1 Lesson 06 Kinematics

SPH3U1 Lesson 06 Kinematics PROJECTILE MOTION LEARNING GOALS Students will: Describe the mtin f an bject thrwn at arbitrary angles thrugh the air. Describe the hrizntal and vertical mtins f a prjectile. Slve prjectile mtin prblems.

More information

Contents. This is page i Printer: Opaque this

Contents. This is page i Printer: Opaque this Cntents This is page i Printer: Opaque this Supprt Vectr Machines and Flexible Discriminants. Intrductin............. The Supprt Vectr Classifier.... Cmputing the Supprt Vectr Classifier........ Mixture

More information

Support-Vector Machines

Support-Vector Machines Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material

More information

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:

and the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are: Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track

More information

Interference is when two (or more) sets of waves meet and combine to produce a new pattern.

Interference is when two (or more) sets of waves meet and combine to produce a new pattern. Interference Interference is when tw (r mre) sets f waves meet and cmbine t prduce a new pattern. This pattern can vary depending n the riginal wave directin, wavelength, amplitude, etc. The tw mst extreme

More information

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp

1996 Engineering Systems Design and Analysis Conference, Montpellier, France, July 1-4, 1996, Vol. 7, pp THE POWER AND LIMIT OF NEURAL NETWORKS T. Y. Lin Department f Mathematics and Cmputer Science San Jse State University San Jse, Califrnia 959-003 tylin@cs.ssu.edu and Bereley Initiative in Sft Cmputing*

More information

Reinforcement Learning" CMPSCI 383 Nov 29, 2011!

Reinforcement Learning CMPSCI 383 Nov 29, 2011! Reinfrcement Learning" CMPSCI 383 Nv 29, 2011! 1 Tdayʼs lecture" Review f Chapter 17: Making Cmple Decisins! Sequential decisin prblems! The mtivatin and advantages f reinfrcement learning.! Passive learning!

More information

Department of Electrical Engineering, University of Waterloo. Introduction

Department of Electrical Engineering, University of Waterloo. Introduction Sectin 4: Sequential Circuits Majr Tpics Types f sequential circuits Flip-flps Analysis f clcked sequential circuits Mre and Mealy machines Design f clcked sequential circuits State transitin design methd

More information