VI. Backpropagation Neural Networks (BPNN)


1 VI. Backpropagation Neural Networks (BPNN)
Outline:
- Review of Adaline
- Newton's method
- Backpropagation algorithm: definition; derivative computation; weight/bias computation; function approximation example; network generalization issues
- Potential problems with the BPNN: momentum filter; iteration schemes review
- Generalization: regularization; early stopping
- Implementation issues
References: [Hagan], [Mathworks], NN FAQ at ftp://ftp.sas.com/pub/neural/FAQ.html, Pattern Classification, Duda & Hart, Wiley, 2001

2 Recall the Adaline (LMS) network:
Input p (R x 1); linear neuron: a = Wp + b, with W (S x R) and b (S x 1).
Restriction of the Adaline (LMS): linear activation function.
Problem solved with Adaline/LMS: given a set {p_i, t_i}, find the weights and bias which minimize the mean square error. With x = [w; b] and z = [p; 1], so that a = x^T z:
F(x) = E[e^2] = E[(t - x^T z)^2] = c - 2 x^T h + x^T R x
where R = E[z z^T], h = E[t z], and c = E[t^2].

3 Practical application: solving for the minimum of F(x) requires computing R and h, and R^{-1}.
Alternative: solve the problem iteratively using steepest descent only:
x_{k+1} = x_k - α ∇F(x_k) = x_k + 2 α e_k z_k,   e_k = t_k - a_k
LMS iteration: pick x(0); a = x_k^T z_k; e = t - a; x_{k+1} = x_k + 2 α e z_k; k = k + 1.
Extensions ==> multilayer perceptron
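A minimal Python/NumPy sketch of the LMS iteration above; the data, learning rate, and epoch count are illustrative assumptions, not values from the slides:

```python
import numpy as np

def lms_train(P, T, alpha=0.05, epochs=100):
    """Adaline/LMS: x_{k+1} = x_k + 2*alpha*e_k*z_k, with x = [w; b], z = [p; 1]."""
    R, N = P.shape
    x = np.zeros(R + 1)                     # start from x(0) = 0
    for _ in range(epochs):
        for k in range(N):
            z = np.append(P[:, k], 1.0)     # augmented input z = [p; 1]
            a = x @ z                       # network output a = x^T z
            e = T[k] - a                    # error e = t - a
            x = x + 2 * alpha * e * z       # LMS update
    return x[:-1], x[-1]                    # weights, bias

# Toy usage: learn a = 2*p1 - p2 + 0.5 from noisy samples
rng = np.random.default_rng(0)
P = rng.uniform(-1, 1, size=(2, 200))
T = 2 * P[0] - P[1] + 0.5 + 0.01 * rng.standard_normal(200)
print(lms_train(P, T))
```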

4 Why use multi-layer structures?
Example: classes made up of subclasses (figure: Class 1 and Class 2 regions; K = 9 subclasses, M = 2 classes).

5 Example: pattern classification: the XOR gate
p_1 = [0; 0], t_1 = 0;  p_2 = [0; 1], t_2 = 1;  p_3 = [1; 0], t_3 = 1;  p_4 = [1; 1], t_4 = 0
Can it be solved with a single-layer perceptron?

6 NN block diagram (figure).

7 Note: the final network space partitioning varies as a function of the number of neurons in the hidden layer.

8 Example (figure: three first-layer neurons with outputs y_1, y_2, y_3 and biases b_1, b_2, b_3):
Assume b_1 = 0.5 and given values for b_2 and b_3.
- Plot the decision boundaries obtained assuming hard-limit (HL) activation functions are used.
- Derive the weight matrix and bias vector used for this network.
- Design the NN second layer (following the given in-class guidelines, i.e., identify its weight matrix and bias).


11 Example: multilayer perceptron (classification). Assume dark = 1 (figure).


13 Example: function approximation with a 1-2-1 network (log-sigmoid hidden layer, linear output layer):
a^1 = logsig(W^1 p + b^1),   a^2 = purelin(W^2 a^1 + b^2)
f^1(n) = 1 / (1 + e^{-n}),   f^2(n) = n
Nominal parameter values: w^1_{1,1} = 10, w^1_{2,1} = 10, b^1_1 = -10, b^1_2 = 10, w^2_{1,1} = 1, w^2_{1,2} = 1, b^2 = 0.
Figures: Example Function Approximation Network; Nominal Response of the Network of the Figure Above; Effect of Parameter Changes on Network Response.
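A small Python sketch of this 1-2-1 network's forward pass; the nominal parameter values below are the ones reconstructed above from the textbook example and should be treated as assumptions:

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def net_1_2_1(p, W1, b1, W2, b2):
    """a1 = logsig(W1*p + b1), a2 = purelin(W2*a1 + b2), scalar input p."""
    a1 = logsig(W1 * p + b1)              # hidden layer: 2 log-sigmoid neurons
    return (W2 @ a1 + b2).item()          # linear output layer

# Nominal parameter values (assumed reconstruction of the textbook example)
W1 = np.array([[10.0], [10.0]]); b1 = np.array([[-10.0], [10.0]])
W2 = np.array([[1.0, 1.0]]);     b2 = np.array([[0.0]])

for p in np.linspace(-2, 2, 5):
    print(p, net_1_2_1(p, W1, b1, W2, b2))
```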

14 Backpropagation algorithm:
Three-layer feedforward network (layer sizes S^1, S^2, S^3):
a^1 = f^1(W^1 p + b^1)
a^2 = f^2(W^2 a^1 + b^2)
a^3 = f^3(W^3 a^2 + b^3)
so a^3 = f^3(W^3 f^2(W^2 f^1(W^1 p + b^1) + b^2) + b^3).
Goal: given a set of {p_i, t_i}, find the weights and biases which minimize the mean square error (performance surface) F(x) = E[(t - a)^2].
Discard the expectation operator and approximate, at the k-th sample,
F(x) ≈ (t_k - a_k)^T (t_k - a_k).

15 For a one-layer network with a purelin activation function (LMS case):
W_{k+1} = W_k + 2 α e_k p_k^T
b_{k+1} = b_k + 2 α e_k

16 How to compute the derivatives? Use SD (steepest descent). Recall:
w^m_{i,j}(k+1) = w^m_{i,j}(k) - α ∂F/∂w^m_{i,j}
b^m_i(k+1) = b^m_i(k) - α ∂F/∂b^m_i
Note: F(x) may not be expressed directly in terms of w^1_{i,j}, w^2_{i,j}, etc. We need to use the chain rule:
df(n(w))/dw = (df(n)/dn) (dn(w)/dw)
Example: f(n) = e^{3n} with n = 5w + 35, so f(n(w)) = e^{3(5w + 35)} and df/dw = 3 e^{3n} · 5.

17 Applying the chain rule:
∂F/∂w^m_{i,j} = (∂F/∂n^m_i)(∂n^m_i/∂w^m_{i,j})
∂F/∂b^m_i = (∂F/∂n^m_i)(∂n^m_i/∂b^m_i)
with n^m_i = Σ_j w^m_{i,j} a^{m-1}_j + b^m_i, so that ∂n^m_i/∂w^m_{i,j} = a^{m-1}_j and ∂n^m_i/∂b^m_i = 1.
Notation: w^m_{i,j} is the layer-m weight connecting the j-th input (output of layer m-1) to the i-th neuron; b^m_i is associated with the i-th neuron of layer m.
Updates:
w^m_{i,j}(k+1) = w^m_{i,j}(k) - α s^m_i a^{m-1}_j
b^m_i(k+1) = b^m_i(k) - α s^m_i
where s^m_i = ∂F/∂n^m_i is the sensitivity of F(·) to changes in the i-th element of the net input at layer m.

18 Expressing the weight/bias updates in matrix form. Element form:
w^m_{i,j}(k+1) = w^m_{i,j}(k) - α (∂F/∂n^m_i) a^{m-1}_j
b^m_i(k+1) = b^m_i(k) - α ∂F/∂n^m_i
Collecting the elements of layer m (weights w^m_{1,1}, ..., w^m_{S^m,R}, one row per neuron) gives
W^m_{k+1} = W^m_k - α s^m (a^{m-1})^T
where s^m = ∂F/∂n^m = [∂F/∂n^m_1, ∂F/∂n^m_2, ...]^T.

19 In matrix form:
W^m_{k+1} = W^m_k - α s^m (a^{m-1})^T
b^m_{k+1} = b^m_k - α s^m
with s^m = ∂F/∂n^m.

20 To obtain ∂F/∂n^m_i we again need the chain rule; it will involve terms of the form (∂F/∂n^{m+1}_j)(∂n^{m+1}_j/∂n^m_i).
Define the Jacobian matrix ∂n^{m+1}/∂n^m, with elements
∂n^{m+1}_i/∂n^m_j = ∂(Σ_l w^{m+1}_{i,l} a^m_l + b^{m+1}_i)/∂n^m_j = w^{m+1}_{i,j} ∂a^m_j/∂n^m_j = w^{m+1}_{i,j} f^{m'}(n^m_j)
so that
∂n^{m+1}/∂n^m = W^{m+1} F^{m'}(n^m),   where F^{m'}(n^m) = diag(f^{m'}(n^m_1), ..., f^{m'}(n^m_{S^m})).

21 s^m = ∂F/∂n^m = [∂F/∂n^m_1, ∂F/∂n^m_2, ...]^T (sensitivity of F to changes in each element of the net input at layer m).
Next, apply the chain rule for vectors:
s^m = ∂F/∂n^m = (∂n^{m+1}/∂n^m)^T ∂F/∂n^{m+1} = (W^{m+1} F^{m'}(n^m))^T s^{m+1} = F^{m'}(n^m) (W^{m+1})^T s^{m+1}.

22 We need to compute s^M at the output layer:
s^M_i = ∂F/∂n^M_i = ∂[(t - a)^T(t - a)]/∂n^M_i = ∂[Σ_j (t_j - a_j)^2]/∂n^M_i = -2 (t_i - a_i) ∂a^M_i/∂n^M_i = -2 (t_i - a_i) f^{M'}(n^M_i)
so, in matrix form,
s^M = -2 F^{M'}(n^M) (t - a).
Note: a = f(n).

23 Summary:
Start: a^0 = p
Propagate forward: a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), m = 0, 1, ..., M-1; a = a^M
Compute the output-layer sensitivity: s^M = -2 F^{M'}(n^M)(t - a)
Backpropagate the sensitivities: s^m = F^{m'}(n^m)(W^{m+1})^T s^{m+1}, m = M-1, ..., 2, 1
Update the weights and biases:
W^m(k+1) = W^m(k) - α s^m (a^{m-1})^T
b^m(k+1) = b^m(k) - α s^m
Note: we will need the derivatives of all activation functions.
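A compact Python sketch of the summary above for a 1-S^1-1 logsig/purelin network; the target function, hidden-layer size, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_1_s_1(P, T, S1=10, alpha=0.05, epochs=2000):
    """Incremental SDBP for a 1-S1-1 network (logsig hidden layer, purelin output)."""
    W1 = rng.uniform(-0.5, 0.5, (S1, 1)); b1 = rng.uniform(-0.5, 0.5, (S1, 1))
    W2 = rng.uniform(-0.5, 0.5, (1, S1)); b2 = rng.uniform(-0.5, 0.5, (1, 1))
    for _ in range(epochs):
        for p, t in zip(P, T):
            a0 = np.array([[p]])                    # a^0 = p
            n1 = W1 @ a0 + b1; a1 = logsig(n1)      # forward: layer 1
            a2 = W2 @ a1 + b2                       # forward: layer 2 (purelin)
            s2 = -2.0 * (t - a2)                    # s^M = -2 F'^M(n^M)(t - a), f2' = 1
            F1 = np.diagflat(a1 * (1.0 - a1))       # F'^1(n^1): d(logsig)/dn = a1(1 - a1)
            s1 = F1 @ W2.T @ s2                     # s^1 = F'^1 (W^2)^T s^2
            W2 -= alpha * s2 @ a1.T; b2 -= alpha * s2
            W1 -= alpha * s1 @ a0.T; b1 -= alpha * s1
    return W1, b1, W2, b2

P = np.linspace(-2, 2, 21)
T = 1 + np.sin(np.pi * P / 4)                       # assumed target g(p) = 1 + sin(pi*p/4)
W1, b1, W2, b2 = train_1_s_1(P, T)
fit = np.array([(W2 @ logsig(W1 * p + b1) + b2).item() for p in P])
print(np.max(np.abs(fit - T)))                      # rough fit check
```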

24 Example: function approximation.
Target: g(p) = 1 + sin(kπp/4) for a given k.
The input p drives both g(·) (producing the target t) and the 1-2-1 network (producing a); the error e = t - a is used for training.

25 The 1-2-1 network: log-sigmoid hidden layer, linear output layer:
a^1 = logsig(W^1 p + b^1),   a^2 = purelin(W^2 a^1 + b^2).

26 Initial conditions: initial values are assigned to W^1(0), b^1(0), W^2(0), b^2(0) of the 1-2-1 network.
Figure: network response for the initial values compared to the sine wave.
Sine-wave example: see the textbook.

27 What does the 1-2-1 network look like? (figure)


29 Example: function approximation:
g(p) = 1 + sin(iπp/4), p ∈ [-2, 2], for i = 1, 2, 4, 8
Figure: Function Approximation Using a 1-3-1 Network, with f^1(n) = 1/(1 + e^{-n}) and f^2(n) = n.
g(p) = 1 + sin(6πp/4), p ∈ [-2, 2]
Figure: Effect of Increasing the Number of Hidden Neurons.
Convergence issues:
g(p) = 1 + sin(πp), p ∈ [-2, 2]
Figure: Function Approximation Using a 1-3-1 Network.

30 g(p) = 1 + sin(πp), p ∈ [-2, 2]
Figure: Convergence to a Local Minimum.
Network generalization issues:
Figure: 1-2-1 Network Approximation of g(p); Figure: 1-9-1 Network Approximation of g(p).


32 Potential problems with backpropagation:
- activation functions may be nonlinear
- the performance surface is not unimodal
- convergence may be sped up with a variable learning rate: increase the step size when the performance index is flat, decrease the step size when the performance index is steep.
Possible strategy:
- If the error increases by more than a pre-defined value (typically 4-5%): the new weights are discarded and the learning rate is decreased (x 0.7).
- If the error increases by less than 4-5%: keep the new weights.
- If the error decreases: the learning rate is increased by 5%.

33 Recall:
W^m_{k+1} = W^m_k - α s^m (a^{m-1})^T
b^m_{k+1} = b^m_k - α s^m
Convergence may be sped up with the momentum filter: introduce memory and low-pass (LP) filter behavior.
Define the filtered quantity used to update W, b:
x_k = γ x_{k-1} + (1 - γ) s_k
Filter response: X(z) = γ z^{-1} X(z) + (1 - γ) S(z)  ==>  H(z) = X(z)/S(z) = (1 - γ)/(1 - γ z^{-1})
Apply the above concept to the iteration equations:
ΔW^m_k = γ ΔW^m_{k-1} - (1 - γ) α s^m (a^{m-1})^T
Δb^m_k = γ Δb^m_{k-1} - (1 - γ) α s^m
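A minimal sketch of the momentum-filtered update, assuming the increment ΔW is stored between iterations (the shapes and numbers are made up for illustration):

```python
import numpy as np

def momentum_step(dW_prev, grad_term, gamma=0.8, alpha=0.1):
    """MOBP increment: dW_k = gamma*dW_{k-1} - (1-gamma)*alpha*grad_term,
    where grad_term = s @ a_prev.T for a weight matrix (or just s for a bias)."""
    return gamma * dW_prev - (1 - gamma) * alpha * grad_term

# usage inside a training loop (illustrative shapes only)
s = np.array([[0.3], [-0.1]])          # layer sensitivities s^m
a_prev = np.array([[0.5]])             # previous-layer output a^{m-1}
dW = np.zeros((2, 1))                  # running increment, initialized to 0
for _ in range(3):
    dW = momentum_step(dW, s @ a_prev.T)
    # W = W + dW  would be applied here
    print(dW.ravel())
```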

34 Iteration Techniques
x_{k+1} = x_k + α_k p_k, where p_k is selected so that F(x_{k+1}) < F(x_k). Use a Taylor series expansion.
1. First-order expansion:
F(x_{k+1}) = F(x_k + Δx_k) ≈ F(x_k) + ∇F(x)^T|_{x=x_k} Δx_k
==> x_{k+1} = x_k - α ∇F(x)|_{x=x_k}   (SD scheme)
2. Second-order expansion:
F(x_{k+1}) ≈ F(x_k) + ∇F(x)^T|_{x=x_k} Δx_k + (1/2) Δx_k^T A_k Δx_k
==> Δx_k = -A_k^{-1} ∇F(x)|_{x=x_k},   A_k = ∇^2 F(x)|_{x=x_k}
Leads to Newton's scheme.
Recall: potential problems with the Newton scheme (Hessian, gradient, convergence).

35 Levenberg-Marquardt Algorithm
Designed to speed up the convergence of Newton's method while reducing the computational load.
Assume F(x) = V(x)^T V(x) = Σ_i v_i^2(x). Then:
1. ∇F(x) = 2 J^T(x) V(x)
2. ∇^2 F(x) = 2 J^T(x) J(x) + 2 S(x)
Recall (Newton): x_{k+1} = x_k - [∇^2 F(x)]^{-1} ∇F(x)|_{x=x_k}
LM: x_{k+1} = x_k - [J^T(x_k) J(x_k) + μ_k I]^{-1} J^T(x_k) V(x_k)
General guidelines for μ_k: start with μ_k = 0.01; if F(x) doesn't decrease, repeat with μ_k = 10 μ_k.
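A minimal Levenberg-Marquardt sketch for a generic sum-of-squares problem F(x) = V(x)^T V(x). The finite-difference Jacobian, the toy residual function, and the step of dividing μ by 10 after a successful step are assumptions added to make the example self-contained (the slide only states the increase rule):

```python
import numpy as np

def lm_minimize(v, x0, mu=0.01, mu_scale=10.0, iters=50, eps=1e-6):
    """Levenberg-Marquardt: x <- x - (J^T J + mu I)^(-1) J^T V(x)."""
    x = np.asarray(x0, dtype=float)
    def jac(x):                                    # forward-difference Jacobian (assumption)
        v0 = v(x); J = np.zeros((v0.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x); dx[j] = eps
            J[:, j] = (v(x + dx) - v0) / eps
        return J
    for _ in range(iters):
        V = v(x); J = jac(x); F = V @ V
        while True:
            step = np.linalg.solve(J.T @ J + mu * np.eye(x.size), J.T @ V)
            x_new = x - step
            if v(x_new) @ v(x_new) < F:            # F decreased: accept, relax mu
                x, mu = x_new, mu / mu_scale
                break
            mu *= mu_scale                         # F did not decrease: increase mu
            if mu > 1e10:
                return x
    return x

# toy usage: fit c in y = exp(c*t) from a few samples
t = np.array([0.0, 1.0, 2.0, 3.0]); y = np.exp(0.5 * t)
print(lm_minimize(lambda c: np.exp(c[0] * t) - y, x0=[0.0]))   # approaches [0.5]
```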

36 Figure: squared error surface as a function of the weight values w^1_{1,1} and w^2_{1,1}.

37 Figure: squared error surface as a function of the weight values w^1_{1,1} and w^2_{1,1}.

38 Figure: two SDBP (batch mode) trajectories on the squared error surface (w^1_{1,1} vs. w^2_{1,1}).

39 Figure: trajectory with the learning rate too large (w^1_{1,1} vs. w^2_{1,1}).

40 Momentum Backpropagation
Steepest Descent Backpropagation (SDBP):
W^m_{k+1} = W^m_k - α s^m (a^{m-1})^T
b^m_{k+1} = b^m_k - α s^m
Momentum Backpropagation (MOBP):
ΔW^m(k) = γ ΔW^m(k-1) - (1 - γ) α s^m (a^{m-1})^T
Δb^m(k) = γ Δb^m(k-1) - (1 - γ) α s^m
Figure: trajectory with γ = 0.8.

41 Variable Learning Rate
- If the squared error (over the entire training set) increases by more than some set percentage ζ after a weight update, then the weight update is discarded, the learning rate is multiplied by some factor ρ (1 > ρ > 0), and the momentum coefficient γ is set to zero.
- If the squared error decreases after a weight update, then the weight update is accepted and the learning rate is multiplied by some factor η > 1. If γ has been previously set to zero, it is reset to its original value.
- If the squared error increases by less than ζ, then the weight update is accepted, but the learning rate and the momentum coefficient are unchanged.
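A sketch of the three rules above as a single decision function; the default ζ, ρ, and η values are taken from the trajectory slide that follows:

```python
def variable_lr_step(err_old, err_new, alpha, gamma, gamma0,
                     zeta=0.04, rho=0.7, eta=1.05):
    """Returns (accept_update, alpha, gamma) according to the three rules."""
    if err_new > err_old * (1 + zeta):      # error grew by more than zeta: reject
        return False, alpha * rho, 0.0      # shrink alpha, zero the momentum
    if err_new < err_old:                   # error decreased: accept, grow alpha
        return True, alpha * eta, (gamma if gamma != 0.0 else gamma0)
    return True, alpha, gamma               # small increase: accept, keep settings

# usage: one decision after a trial weight update
print(variable_lr_step(0.50, 0.48, alpha=0.1, gamma=0.8, gamma0=0.8))
```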

42 Variable learning rate trajectory (figure: w^1_{1,1} vs. w^2_{1,1}).
Parameters: η = 1.05 (weight selection threshold), ρ = 0.7 (damping factor for the learning rate), ζ = 4% (error threshold).
Figure: squared error and learning rate vs. iteration number.

43 Conjugate Gradient
1. The first search direction is the steepest descent direction:
p_0 = -g_0,   g_k = ∇F(x)|_{x=x_k}
2. Take a step, choosing the learning rate α_k to minimize the function along the search direction:
x_{k+1} = x_k + α_k p_k
3. Select the next search direction according to p_k = -g_k + β_k p_{k-1}, where (with Δg_{k-1} = g_k - g_{k-1}):
β_k = (Δg_{k-1}^T g_k) / (Δg_{k-1}^T p_{k-1})   (Hestenes-Stiefel update)
β_k = (Δg_{k-1}^T g_k) / (g_{k-1}^T g_{k-1})    (Polak-Ribière update)
β_k = (g_k^T g_k) / (g_{k-1}^T g_{k-1})         (Fletcher-Reeves update)
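A small sketch of the three β updates and the resulting search direction; the toy gradient vectors are made up for illustration:

```python
import numpy as np

def beta_update(g_new, g_old, p_old, method="fletcher-reeves"):
    """Beta for nonlinear conjugate gradient, per the three formulas above."""
    dg = g_new - g_old
    if method == "hestenes-stiefel":
        return (dg @ g_new) / (dg @ p_old)
    if method == "polak-ribiere":
        return (dg @ g_new) / (g_old @ g_old)
    return (g_new @ g_new) / (g_old @ g_old)      # Fletcher-Reeves

def next_direction(g_new, g_old, p_old, method="fletcher-reeves"):
    return -g_new + beta_update(g_new, g_old, p_old, method) * p_old

g0 = np.array([1.0, -2.0]); p0 = -g0              # first direction: steepest descent
g1 = np.array([0.4, 0.3])                         # gradient after the line search
print(next_direction(g1, g0, p0, "polak-ribiere"))
```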

44 Figure: conjugate gradient trajectory (w^1_{1,1} vs. w^2_{1,1}).

45 Figure: Levenberg-Marquardt trajectory (w^1_{1,1} vs. w^2_{1,1}).

46 Resilient Backpropagation
BPNNs usually use sigmoid functions (tansig, logsig) as activation functions to introduce nonlinear behavior. These can cause the network to have very small gradients and the iterations to (almost) stall.
Resilient BPNN uses only the signs of the gradient components to determine the direction of the weight update; the weight change magnitudes are determined by separate update values.
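A sketch of a sign-based (Rprop-style) update; the increase/decrease factors 1.2 and 0.5 and the step-size bounds are commonly used defaults, not values from the slide:

```python
import numpy as np

def rprop_update(W, grad, grad_prev, delta, eta_plus=1.2, eta_minus=0.5,
                 delta_max=50.0, delta_min=1e-6):
    """Per-weight step sizes grow when the gradient sign is stable and shrink when it flips;
    the weight moves by -sign(gradient) * step."""
    sign_change = grad * grad_prev
    delta = np.where(sign_change > 0, np.minimum(delta * eta_plus, delta_max), delta)
    delta = np.where(sign_change < 0, np.maximum(delta * eta_minus, delta_min), delta)
    grad = np.where(sign_change < 0, 0.0, grad)        # skip the update after a sign flip
    W = W - np.sign(grad) * delta
    return W, grad, delta

W = np.array([0.5, -0.3]); delta = np.full(2, 0.1); g_prev = np.zeros(2)
for g in [np.array([0.2, -0.1]), np.array([0.15, 0.05])]:
    W, g_prev, delta = rprop_update(W, g, g_prev, delta)
print(W, delta)
```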

47 Algorithm Comparisons
It is very difficult to know which training algorithm will be the fastest for a given problem. Convergence speed depends on many factors: complexity of the problem, number of data points in the training set, number of weights and biases in the network, error goal, whether the network is being used for pattern recognition (discriminant analysis) or function approximation (regression), etc.

48 Toy Example 1: sinusoid function approximation
Network set-up: 1-5-1; activation functions: (tansig, purelin)
Number of trials: 30 with random initial weights and biases
Error threshold: MSE < 0.002
Algorithms compared (the table reports mean time (s), ratio, min. time (s), max. time (s), and std. (s) for each, measured on a Sun Sparc workstation): LM, BFG, RP, SCG, CGB, CGF, CGP, OSS, GDX.
Algorithm acronyms:
LM (trainlm) - Levenberg-Marquardt
BFG (trainbfg) - BFGS Quasi-Newton
RP (trainrp) - Resilient Backpropagation
SCG (trainscg) - Scaled Conjugate Gradient
CGB (traincgb) - Conjugate Gradient with Powell/Beale Restarts
CGF (traincgf) - Fletcher-Powell Conjugate Gradient
CGP (traincgp) - Polak-Ribière Conjugate Gradient
OSS (trainoss) - One-Step Secant
GDX (traingdx) - Variable Learning Rate Backpropagation


50 Example 2: function approximation (nonlinear regression) - Engine data set
Network set-up: 2-30-2
Network inputs: engine speed and fueling levels; network outputs: torque and emission levels.
Activation functions: (tansig, purelin)
Number of trials: 30 with random initial weights and biases
Error threshold: MSE < 0.005
Algorithms compared (the table reports mean time (s), ratio, min. time (s), max. time (s), and std. (s) for each, measured on a Sun Enterprise 4000 workstation): LM, BFG, RP, SCG, CGB, CGF, CGP, OSS, GDX.
Algorithm acronyms: same as for Toy Example 1.


52 Example 3: pattern recognition - Cancer data set
Network set-up: multilayer network with tansig activation functions in all layers.
Network inputs: clump thickness, uniformity of cell size and cell shape, amount of marginal adhesion, frequency of bare nuclei.
Network outputs: benign or malignant tumor.
Number of trials: 30 with random initial weights and biases
Error threshold: MSE < 0.012
Algorithms compared (the table reports mean time (s), ratio, min. time (s), max. time (s), and std. (s) for each, measured on a Sun Sparc workstation): CGB, RP, SCG, CGP, CGF, LM, BFG, GDX, OSS.
Algorithm acronyms: same as for Toy Example 1.


54 Other examples are available in the MATLAB Neural Network Toolbox documentation (x/nnet/backpr4.shtl).

55 EXPERIMENT CONCLUSIONS
Several algorithm characteristics can be deduced from the experiments:
- In general, on function approximation problems, for networks that contain up to a few hundred weights, the LM algorithm will have the fastest convergence. This advantage is especially noticeable if very accurate training is required. In many cases, trainlm is able to obtain lower mean square errors than any of the other algorithms tested. However, as the number of weights in the network increases, the advantage of trainlm decreases. In addition, trainlm performance is relatively poor on pattern recognition problems. The storage requirements of trainlm are larger than those of the other algorithms tested. By adjusting the mem_reduc parameter, discussed earlier, the storage requirements can be reduced, but at the cost of increased execution time.
- The trainrp function is the fastest algorithm on pattern recognition problems. However, it does not perform well on function approximation problems. Its performance also degrades as the error goal is reduced. The memory requirements for this algorithm are relatively small in comparison to the other algorithms considered.
- The conjugate gradient algorithms, in particular trainscg, seem to perform well over a wide variety of problems, particularly for networks with a large number of weights. The SCG algorithm is almost as fast as the LM algorithm on function approximation problems (faster for large networks) and is almost as fast as trainrp on pattern recognition problems. Its performance does not degrade as quickly as trainrp's performance does when the error is reduced. The conjugate gradient algorithms have relatively modest memory requirements.
- The trainbfg performance is similar to that of trainlm. It does not require as much storage as trainlm, but the computation required does increase geometrically with the size of the network, since the equivalent of a matrix inverse must be computed at each iteration.
- The variable learning rate algorithm traingdx is usually much slower than the other methods, and has about the same storage requirements as trainrp, but it can still be useful for some problems. There are certain situations in which it is better to converge more slowly. For example, when using early stopping, you may have inconsistent results if you use an algorithm that converges too quickly. You may overshoot the point at which the error on the validation set is minimized.

56 Generalization Issues
The network may be overtrained (overfitting issues) when the MSE goal on the training set is set too low.
Potential risk: the network memorizes the training examples, but doesn't learn to generalize to similar but new situations.
Consequences: very good performance on the training set, very poor performance on the testing set. (Figure: a 1-20-1 net fit to a noisy sine.)
How to prevent overfitting?
- Use a network that is not too large for the problem (the appropriate network size is difficult to guess a priori).
- Increase the training set size if possible.
- Apply regularization or early stopping.

57 Regularization
Recall that the basic performance (MSE) function is defined as
MSE = (1/N) Σ_{i=1}^{N} e_i^2 = (1/N) Σ_{i=1}^{N} (t_i - a_i)^2
The performance function is modified as
MSE_reg = γ MSE + (1 - γ) MSW,   where MSW = (1/P) Σ_{i=1}^{P} w_i^2
γ is the performance ratio and P is the number of network weights.
Consequences: MSE_reg forces the network to have smaller weights and biases, producing a smoother response that is less likely to overfit.
Drawbacks: it is difficult to estimate γ; γ too large -> overfitting problem; γ too small -> no good fit of the training data.
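A one-function sketch of the regularized performance index; the example errors, weights, and γ below are made up:

```python
import numpy as np

def msereg(errors, weights, gamma=0.9):
    """Regularized performance index: gamma*MSE + (1 - gamma)*MSW."""
    mse = np.mean(np.square(errors))          # (1/N) sum e_i^2
    msw = np.mean(np.square(weights))         # mean of the squared weights
    return gamma * mse + (1 - gamma) * msw

e = np.array([0.1, -0.3, 0.2])
w = np.array([1.5, -0.7, 0.2, 0.9])
print(msereg(e, w, gamma=0.9))
```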

58 Automated Regularization (MATLAB: trainbr)
Definition: assume the weights and biases are random variables with specific distributions. Define the new performance function
MSE_aut = α MSE + β MSW
and apply statistical concepts (Bayes' rule) to find optimum values for α and β (iterative procedure).
Figures: fit obtained with the basic MSE vs. fit obtained with MSE_aut.

59 Early Stopping (MATLAB: train with option 'val')
Definition: the training set is split into two sets:
- training subset: used to compute the network weights and biases
- validation subset: the error on the validation set is monitored during training; the validation error goes down at training onset and goes back up when the network starts to overfit the data.
Training is continued until the validation error increases for a specified number of iterations; the final weights & biases are those obtained for the minimum validation error.
Figures: fit obtained with the basic MSE vs. fit obtained with early stopping.
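A generic early-stopping loop sketch; the patience-based stopping criterion and the toy "training" below are illustrative assumptions:

```python
import numpy as np

def train_with_early_stopping(step_fn, val_error_fn, max_epochs=1000, patience=10):
    """step_fn() runs one training epoch and returns the current parameters;
    val_error_fn(params) returns the validation error. Keeps the best parameters."""
    best_params, best_err, worse = None, np.inf, 0
    for _ in range(max_epochs):
        params = step_fn()
        err = val_error_fn(params)
        if err < best_err:
            best_params, best_err, worse = params, err, 0
        else:
            worse += 1
            if worse >= patience:      # validation error rose for `patience` epochs
                break
    return best_params, best_err

# toy usage: a "network" whose validation error is minimized at epoch 30
state = {"epoch": 0}
def step():   state["epoch"] += 1; return state["epoch"]
def val(ep):  return (ep - 30) ** 2 / 900 + 0.1
print(train_with_early_stopping(step, val))
```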

60 (MATHWORKS) CONCLUSIONS
Both regularization and early stopping can ensure network generalization when properly applied.
When using Bayesian regularization, it is important to train the network until it reaches convergence. The MSE, MSW, and the effective number of parameters should reach constant values when the network has converged.
For early stopping, be careful not to use an algorithm that converges too rapidly. If you are using a fast algorithm (like trainlm), set the training parameters so that the convergence is relatively slow (e.g., set mu to a relatively large value, such as 1, and set mu_dec and mu_inc to values close to 1, such as 0.8 and 1.5, respectively). The training functions trainscg and trainrp usually work well with early stopping.
With early stopping, the choice of the validation set is also important. The validation set should be representative of all points in the training set.
With both regularization and early stopping, it is a good idea to train the network starting from several different initial conditions. It is possible for either method to fail in certain circumstances. By testing several different initial conditions, you can verify robust network performance.
Based on our (MATHWORKS) experience, Bayesian regularization generally provides better generalization performance than early stopping when training function approximation networks. This is because Bayesian regularization does not require that a validation data set be separated out of the training data set; it uses all of the data. This advantage is especially noticeable when the size of the data set is small.

61 Early Stopping / Validation discussions
Data sets:
- SINE (5% N): single-cycle sine wave with Gaussian noise at the 5% level.
- SINE (2% N): single-cycle sine wave with Gaussian noise at the 2% level.
- ENGINE (ALL): engine sensor data, full data set.
- ENGINE (1/4): engine sensor data, 1/4 of the data set.
Mean squared test-set errors are compared for ES (early stopping), BR (Bayesian regularization), and ES/BR on the Engine (All), Engine (1/4), Sine (5% N), and Sine (2% N) data sets; the BR errors are on the order of 1e-3, while the ES errors are on the order of 1e-1 to 1e-2.

62 Some general design principles (from the NN FAQ)
- Data encoding issues
- Number of layers issues
- Number of neurons per layer issues
- Input variable standardization issues
- Output variable standardization issues
- Generalization error evaluation issues

63 Data encoding issues (from the NN FAQ) (figure/table of encoding examples).

64 Number of layers issues [from the NN FAQ]
You may not need any hidden layers at all. Linear and generalized linear models are useful in a wide variety of applications. And even if the function you want to learn is mildly nonlinear, you may get better generalization with a simple linear model than with a complicated nonlinear model if there is too little data or too much noise to estimate the nonlinearities accurately.
In MLPs with step/threshold/Heaviside activation functions, you need two hidden layers for full generality.
In MLPs with any of a wide variety of continuous nonlinear hidden-layer activation functions, one hidden layer with an arbitrarily large number of units suffices for the universal approximation property. But there is no theory yet to tell you how many hidden units are needed to approximate any given function.

65 Number of neurons per layer issues [NN FAQ]
The best number of hidden units depends in a complex way on:
- the numbers of input and output units
- the number of training cases
- the amount of noise in the targets
- the complexity of the function or classification to be learned
- the architecture
- the type of hidden unit activation function
- the training algorithm
- regularization
In most situations, there is no way to determine the best number of hidden units without training several networks and estimating the generalization error of each. If you have too few hidden units, you will get high training error and high generalization error due to underfitting and high statistical bias. If you have too many hidden units, you may get low training error but still have high generalization error due to overfitting and high variance.

66 Input variable standardization issues [NN FAQ]
An input's contribution depends on its variability relative to the other inputs.
Example: Input 1 in the range [-1, 1], Input 2 in the range [0, 10,000]: Input 1's contribution will be swamped by Input 2.
Scale the inputs so that their variability reflects their importance:
* If importance is not known: scale all inputs to the same variability or the same range.
* If importance is known: scale the more important inputs so that they have larger variances/ranges (see the sketch after this slide).
Standardizing input variables has different effects on different training algorithms for MLPs. For example:
1) Steepest descent is very sensitive to scaling. The more ill-conditioned the Hessian is, the slower the convergence. Hence, scaling is an important consideration for gradient descent methods such as standard backpropagation.
2) Quasi-Newton and conjugate gradient methods begin with a steepest descent step and are therefore scale sensitive. However, they accumulate second-order information as training proceeds and hence are less scale sensitive than pure gradient descent.
3) Newton-Raphson and Gauss-Newton, if implemented correctly, are theoretically invariant under scale changes as long as none of the scaling is so extreme as to produce underflow or overflow.
4) Levenberg-Marquardt is scale invariant as long as no ridging is required. There are several different ways to implement ridging; some are scale invariant and some are not. Performance under bad scaling will depend on details of the implementation.
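A small standardization sketch (z-scoring each input, with an optional importance-based rescaling); the importance vector is a user-supplied assumption:

```python
import numpy as np

def standardize(X, importance=None):
    """Z-score each input row of X (inputs x samples); optionally rescale so that the
    standard deviation of each input reflects its relative importance."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    Z = (X - mu) / np.where(sd == 0, 1.0, sd)            # equal variability for all inputs
    if importance is not None:
        Z = Z * np.asarray(importance).reshape(-1, 1)    # more important inputs get larger spread
    return Z

X = np.vstack([np.random.uniform(-1, 1, 100), np.random.uniform(0, 10000, 100)])
print(standardize(X).std(axis=1))     # both inputs now have unit standard deviation
```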

67 Output variable standardization issues [NN FAQ]
Target output value ranges should reflect the possible neural network output values. If the target variable does not have known upper and lower bounds, do not use an output activation function with a bounded range.
Standardizing target variables is typically more a convenience for getting good initial weights than a necessity. However, if you have two or more target variables and your error function is scale-sensitive, like the usual least (mean) squares error function, then the variability of each target relative to the others can affect how well the net learns that target. If one target has a range of 0 to 1, while another target has a range of 0 to 10^6, the net will expend most of its effort learning the second target, to the possible exclusion of the first. So it is essential to rescale the targets so that their variability reflects their importance, or at least is not in inverse relation to their importance. If the targets are of equal importance, they should typically be standardized to the same range or the same standard deviation.

68 Generalization error evaluation issues [NN FAQ]
Three basic necessary (not sufficient!) conditions for generalization:
1) The network inputs contain sufficient information pertaining to the target, so that there exists a mathematical function relating correct outputs to inputs with the desired degree of accuracy (neural nets are not clairvoyant!).
2) The function which relates inputs to correct outputs must be, in some sense, smooth, i.e., a small change in the inputs should, most of the time, produce a small change in the outputs. For continuous inputs and targets, smoothness of the function implies continuity and restrictions on the first derivative over most of the input space. Some neural nets can learn discontinuities as long as the function consists of a finite number of continuous pieces. Very nonsmooth functions, such as those produced by pseudo-random number generators and encryption algorithms, cannot be generalized by neural nets. Often a nonlinear transformation of the input space can increase the smoothness of the function and improve generalization.
3) The training set must be a sufficiently large and representative subset of the set of all cases that you want to generalize to. The importance of this condition is related to the fact that there are, loosely speaking, two different types of generalization: interpolation and extrapolation. Interpolation applies to cases that are more or less surrounded by nearby training cases; everything else is extrapolation. In particular, cases that are outside the range of the training data require extrapolation. Cases inside large "holes" in the training data may also effectively require extrapolation. Interpolation can often be done reliably, but extrapolation is notoriously unreliable. Hence it is important to have sufficient training data to avoid the need for extrapolation.

69 Cross-validation and bootstrapping: schemes to evaluate generalization errors (and compare implementations). These schemes are sometimes called permutation tests because they are based on data resampling.
1) Cross-validation (resampling without replacement)
- Recommended for small datasets.
- Can be used to estimate model error or to compare different NN set-ups.
How does this work? Split the data into k (~10) subsets of equal size. Train the NN k times; each time:
- leave one of the subsets out of the training
- test the NN on the omitted subset.
When k = sample size: leave-one-out cross-validation.
The overall accuracy is the mean of all testing set accuracies (see the sketch below).
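A k-fold cross-validation sketch; the train_fn/test_fn callables and the toy majority-class "model" are placeholders, not anything from the slides:

```python
import numpy as np

def k_fold_accuracy(X, y, train_fn, test_fn, k=10, seed=0):
    """k-fold cross-validation: overall accuracy = mean of the k test accuracies."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.hstack([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train_idx], y[train_idx])       # train with fold i left out
        accs.append(test_fn(model, X[test_idx], y[test_idx]))
    return float(np.mean(accs))

# toy usage with a trivial majority-class "model"
X = np.arange(20).reshape(-1, 1); y = (X.ravel() > 9).astype(int)
train = lambda Xt, yt: int(round(yt.mean()))
test = lambda m, Xs, ys: float(np.mean(ys == m))
print(k_fold_accuracy(X, y, train, test, k=5))
```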

70 2) Jackknife estimation
- Special case of cross-validation.
- Recommended for small datasets.
- Can be used to estimate model error or to compare different NN set-ups.
How does this work? Split the data into subsets of size equal to M-1 (for M data samples available). Train the NN on each set; each time, test the NN on the left-out sample (i.e., each testing set has only one sample). The overall accuracy is the mean of all testing set accuracies.

71 3) Bootstrapping (resampling with replacement)
[Bootstrap Methods and Permutation Tests, Hesterberg et al., W.H. Freeman and Company]

72 Bootstrapping
- Recommended for small datasets.
- Expensive to implement.
- Seems to work better than cross-validation in many cases, but not always; in such cases it is not worth the investment.
- Can be used to estimate model error or to compare different NN set-ups.
How does this work? Select k (from 50 to 2000) subsets of the data, sampled with replacement. Train the NN k times; each time:
- train on one subset
- test on another subset.
The overall accuracy is the mean of all testing set accuracies (see the sketch below).
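A bootstrap sketch; testing each replicate on the out-of-bag points is one common choice for the "other subset" mentioned above, and the toy model is a placeholder:

```python
import numpy as np

def bootstrap_accuracy(X, y, train_fn, test_fn, k=200, seed=0):
    """Resampling with replacement: train on each bootstrap sample, test on the
    held-out (out-of-bag) points, and average the k test accuracies."""
    rng = np.random.default_rng(seed)
    n, accs = len(y), []
    for _ in range(k):
        boot = rng.integers(0, n, size=n)              # sample with replacement
        oob = np.setdiff1d(np.arange(n), boot)         # points not drawn into the sample
        if oob.size == 0:
            continue
        model = train_fn(X[boot], y[boot])
        accs.append(test_fn(model, X[oob], y[oob]))
    return float(np.mean(accs))

X = np.arange(30).reshape(-1, 1); y = (X.ravel() % 2).astype(int)
train = lambda Xt, yt: int(round(yt.mean()))
test = lambda m, Xs, ys: float(np.mean(ys == m))
print(bootstrap_accuracy(X, y, train, test, k=100))
```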

73 Performance Comparison
Which technique is best? Which is more accurate?
Classifier performance assessment allows us to evaluate how well a scheme does and how it compares with other schemes. It is useful when combining decisions/outputs from several classifiers/detectors (in data fusion applications). Set it up as a hypothesis test: given two algorithms A and B,
Hypothesis H_0: for a randomly drawn set of fixed size, algorithms A and B have the same error rate.
Hypothesis H_1: for a randomly drawn set of fixed size, algorithms A and B do not have the same error rate.

74 Need to define:
Type 1 error rate: probability of incorrectly rejecting the true null hypothesis.
Type 2 error rate: probability of incorrectly accepting a false null hypothesis.
Applied to this problem, the Type 1 error rate is the probability of incorrectly detecting a difference between classifier performances when no difference exists.
Significance level α: α represents how selective (i.e., restrictive) the user wants the decision between H_0 and H_1 to be; i.e., for α = 0.05, the user is willing to accept the fact that there is a 5% chance of deciding H_0 is incorrect (or false) when it is in fact correct (or true).

75 Thus:
The larger α is, the more likely the user is to decide the claim (H_0) is incorrect when, in fact, it is correct; i.e., the user becomes more selective, as the user rejects more and more claims even though they are correct.
The smaller α is, the less likely the user is to decide the claim is incorrect when it is, in fact, correct; i.e., the user becomes less selective, as the user will reject fewer claims; however, the user will accept more and more claims which are, in fact, incorrect.

76 McNemar's Test
Define the following quantities:
n_00: number of test cases misclassified by both A and B
n_01: number of test cases misclassified by A but not by B
n_10: number of test cases misclassified by B but not by A
n_11: number of test cases misclassified by neither A nor B
Note: the total number of test cases is n = n_00 + n_01 + n_10 + n_11.
Under H_0, A and B have the same error rate, so n_01 = n_10 theoretically; the expected number of errors made by only one of the two algorithms is (n_01 + n_10)/2.

77 McNemar's test compares the observed number of errors obtained with one of the two algorithms to the expected number. Compute
z = (|n_01 - n_10| - 1)^2 / (n_01 + n_10)
It turns out that z follows a χ^2 distribution with 1 degree of freedom.
H_0 (the hypothesis that algorithms A and B have the same error rate) is rejected with significance level α (i.e., assuming we accept the α% chance of deciding H_0 is incorrect when it is, in fact, correct) when
z > χ^2_{1,1-α}
How to read the χ^2 table: χ^2_{1,0.95} = 3.84.
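A direct transcription of the statistic into a small Python helper (the 3.84 critical value is χ^2 with 1 degree of freedom at the 0.95 level, as above):

```python
def mcnemar_z(n01, n10):
    """McNemar statistic: z = (|n01 - n10| - 1)^2 / (n01 + n10).
    Compare against the chi-square(1) critical value, e.g. 3.84 at alpha = 0.05."""
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

def same_error_rate(n01, n10, chi2_crit=3.84):
    """True if H0 (same error rate) is NOT rejected at the chosen level."""
    return mcnemar_z(n01, n10) <= chi2_crit

print(mcnemar_z(5, 15), same_error_rate(5, 15))   # made-up counts for illustration
```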

78 Example: assume we have a problem with 9 classes and 60 test samples. The results give
n_00 = 11, n_01 = 1, n_10 = 4, n_11 = 44.
Algorithm A gives 48 correct decisions; algorithm B gives 45 correct decisions.
Are the two algorithms to be considered as having the same performance?
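Applying the statistic with counts reconstructed from this slide's totals (60 samples, 44 misclassified by neither, and 48 vs. 45 correct decisions imply n_01 = 1, n_10 = 4, n_00 = 11; treat these as a reconstruction):

```python
def mcnemar_z(n01, n10):
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

z = mcnemar_z(n01=1, n10=4)    # counts reconstructed from the slide's totals
print(z)                       # (|1 - 4| - 1)^2 / 5 = 0.8, below 3.84,
print(z > 3.84)                # so H0 (same error rate) is not rejected
```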


More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

arxiv: v2 [math.co] 3 Dec 2008

arxiv: v2 [math.co] 3 Dec 2008 arxiv:0805.2814v2 [ath.co] 3 Dec 2008 Connectivity of the Unifor Rando Intersection Graph Sion R. Blacburn and Stefanie Gere Departent of Matheatics Royal Holloway, University of London Egha, Surrey TW20

More information

Pattern Classification using Simplified Neural Networks with Pruning Algorithm

Pattern Classification using Simplified Neural Networks with Pruning Algorithm Pattern Classification using Siplified Neural Networks with Pruning Algorith S. M. Karuzzaan 1 Ahed Ryadh Hasan 2 Abstract: In recent years, any neural network odels have been proposed for pattern classification,

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

An Improved Particle Filter with Applications in Ballistic Target Tracking

An Improved Particle Filter with Applications in Ballistic Target Tracking Sensors & ransducers Vol. 72 Issue 6 June 204 pp. 96-20 Sensors & ransducers 204 by IFSA Publishing S. L. http://www.sensorsportal.co An Iproved Particle Filter with Applications in Ballistic arget racing

More information

An Introduction to Meta-Analysis

An Introduction to Meta-Analysis An Introduction to Meta-Analysis Douglas G. Bonett University of California, Santa Cruz How to cite this work: Bonett, D.G. (2016) An Introduction to Meta-analysis. Retrieved fro http://people.ucsc.edu/~dgbonett/eta.htl

More information

Efficient Filter Banks And Interpolators

Efficient Filter Banks And Interpolators Efficient Filter Banks And Interpolators A. G. DEMPSTER AND N. P. MURPHY Departent of Electronic Systes University of Westinster 115 New Cavendish St, London W1M 8JS United Kingdo Abstract: - Graphical

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

Machine Learning: Fisher s Linear Discriminant. Lecture 05

Machine Learning: Fisher s Linear Discriminant. Lecture 05 Machine Learning: Fisher s Linear Discriinant Lecture 05 Razvan C. Bunescu chool of Electrical Engineering and Coputer cience bunescu@ohio.edu Lecture 05 upervised Learning ask learn an (unkon) function

More information

6.2 Grid Search of Chi-Square Space

6.2 Grid Search of Chi-Square Space 6.2 Grid Search of Chi-Square Space exaple data fro a Gaussian-shaped peak are given and plotted initial coefficient guesses are ade the basic grid search strateg is outlined an actual anual search is

More information

ACTIVE VIBRATION CONTROL FOR STRUCTURE HAVING NON- LINEAR BEHAVIOR UNDER EARTHQUAKE EXCITATION

ACTIVE VIBRATION CONTROL FOR STRUCTURE HAVING NON- LINEAR BEHAVIOR UNDER EARTHQUAKE EXCITATION International onference on Earthquae Engineering and Disaster itigation, Jaarta, April 14-15, 8 ATIVE VIBRATION ONTROL FOR TRUTURE HAVING NON- LINEAR BEHAVIOR UNDER EARTHQUAE EXITATION Herlien D. etio

More information

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are,

are equal to zero, where, q = p 1. For each gene j, the pairwise null and alternative hypotheses are, Page of 8 Suppleentary Materials: A ultiple testing procedure for ulti-diensional pairwise coparisons with application to gene expression studies Anjana Grandhi, Wenge Guo, Shyaal D. Peddada S Notations

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information

Chapter 1: Basics of Vibrations for Simple Mechanical Systems

Chapter 1: Basics of Vibrations for Simple Mechanical Systems Chapter 1: Basics of Vibrations for Siple Mechanical Systes Introduction: The fundaentals of Sound and Vibrations are part of the broader field of echanics, with strong connections to classical echanics,

More information

N-Point. DFTs of Two Length-N Real Sequences

N-Point. DFTs of Two Length-N Real Sequences Coputation of the DFT of In ost practical applications, sequences of interest are real In such cases, the syetry properties of the DFT given in Table 5. can be exploited to ake the DFT coputations ore

More information

Testing equality of variances for multiple univariate normal populations

Testing equality of variances for multiple univariate normal populations University of Wollongong Research Online Centre for Statistical & Survey Methodology Working Paper Series Faculty of Engineering and Inforation Sciences 0 esting equality of variances for ultiple univariate

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

DETECTION OF NONLINEARITY IN VIBRATIONAL SYSTEMS USING THE SECOND TIME DERIVATIVE OF ABSOLUTE ACCELERATION

DETECTION OF NONLINEARITY IN VIBRATIONAL SYSTEMS USING THE SECOND TIME DERIVATIVE OF ABSOLUTE ACCELERATION DETECTION OF NONLINEARITY IN VIBRATIONAL SYSTEMS USING THE SECOND TIME DERIVATIVE OF ABSOLUTE ACCELERATION Masaki WAKUI 1 and Jun IYAMA and Tsuyoshi KOYAMA 3 ABSTRACT This paper shows a criteria to detect

More information

Kinematics and dynamics, a computational approach

Kinematics and dynamics, a computational approach Kineatics and dynaics, a coputational approach We begin the discussion of nuerical approaches to echanics with the definition for the velocity r r ( t t) r ( t) v( t) li li or r( t t) r( t) v( t) t for

More information

FITTING FUNCTIONS AND THEIR DERIVATIVES WITH NEURAL NETWORKS ARJPOLSON PUKRITTAYAKAMEE

FITTING FUNCTIONS AND THEIR DERIVATIVES WITH NEURAL NETWORKS ARJPOLSON PUKRITTAYAKAMEE FITTING FUNCTIONS AND THEIR DERIVATIVES WITH NEURAL NETWORKS By ARJPOLSON PUKRITTAYAKAMEE Bachelor of Engineering Chulalongkorn University Bangkok, Thailand 997 Master of Sciences Oklahoa State University

More information