VI. Backpropagation Neural Networks (BPNN)
- Darrell Campbell
1 VI. Backpropagation Neural Networks (BPNN)
- Review of Adaline
- Newton's method
- Backpropagation algorithm
  - definition
  - derivative computation
  - weight/bias computation
  - function approximation example
  - network generalization issues
  - potential problems with the BPNN
  - momentum filter
  - iteration schemes review
- Generalization
  - regularization
  - early stopping
- Implementation issues
References: [Hagan]; [Mathworks]; NN FAQ at ftp://ftp.sas.com/pub/neural/faq.html; Pattern Classification, Duda & Hart, Wiley, 2001
07/0/06 EC4460.SuFy06/MPF
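2 Recall the Adaline (LMS) network:
[Figure: Adaline network. Input $p$ ($R \times 1$), weight matrix $W$ ($S \times R$), bias $b$ ($S \times 1$), net input $n$ ($S \times 1$), linear neuron layer with output $a = Wp + b$ ($S \times 1$)]
Restriction of the Adaline (LMS)? Linear activation function.
Problem solved with Adaline/LMS: given a set $\{p_i, t_i\}$, find the weights and bias which minimize the mean square error $E[(t - a)^2]$.
With $x = \begin{bmatrix} w \\ b \end{bmatrix}$, $z = \begin{bmatrix} p \\ 1 \end{bmatrix}$ and $a = w^T p + b = x^T z$:
$$F(x) = E[e^2] = E[(t - x^T z)^2] = x^T R x - 2 x^T h + c$$
where $R = E[z z^T]$, $h = E[t z]$ and $c = E[t^2]$.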
3 Practical application: minimizing $F(x)$ directly requires computing $R$, $h$, and $R^{-1}$.
Alternative: solve the problem iteratively using steepest descent only:
$$x_{k+1} = x_k - \alpha \nabla F(x_k) = x_k - \alpha(-2 e_k z_k) = x_k + 2\alpha e_k z_k, \quad e_k = t_k - a_k$$
LMS iteration:
- pick $x(0)$
- $a = x_k^T z_k$
- $e = t - a$
- $x_{k+1} = x_k + 2\alpha e z_k$
- $k = k + 1$
Extensions ==> Multilayer perceptron
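The LMS iteration above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function name `lms_train`, the learning rate, the number of passes, and the four training pairs (generated from the hypothetical line t = 2p - 1) are all assumptions made for the example.

```python
def lms_train(samples, alpha=0.05, epochs=200):
    """LMS for one linear neuron: a = x^T z, with z = [p, 1] and x = [w, b]."""
    n_inputs = len(samples[0][0])
    x = [0.0] * (n_inputs + 1)            # x(0): weights followed by the bias
    for _ in range(epochs):
        for p, t in samples:
            z = list(p) + [1.0]           # augmented input z = [p, 1]
            a = sum(xi * zi for xi, zi in zip(x, z))
            e = t - a
            # x_{k+1} = x_k + 2 * alpha * e * z_k
            x = [xi + 2.0 * alpha * e * zi for xi, zi in zip(x, z)]
    return x

# Hypothetical training set generated by t = 2 p - 1
data = [((-1.0,), -3.0), ((0.0,), -1.0), ((1.0,), 1.0), ((2.0,), 3.0)]
w, b = lms_train(data)
```

Because the targets are exactly linear in the input, the iteration settles on the generating weight and bias; with noisy targets it would hover around the minimum mean square error solution instead.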
4 Why use multi-layer structures?
[Figure: K = 9 subclasses grouped into M = 2 classes (Class 1, Class 2)]
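5 Example: pattern classification: the XOR gate
$$p_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, t_1 = 0; \quad p_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, t_2 = 1; \quad p_3 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, t_3 = 1; \quad p_4 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, t_4 = 0$$
Can it be solved with a single-layer perceptron?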
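The XOR patterns are not linearly separable, so a single hard-limit neuron cannot classify them, but two layers can. The sketch below is one possible hand-designed solution; the weights and biases are illustrative assumptions (an OR neuron and an AND neuron feeding one output neuron), not values taken from the slides.

```python
def hardlim(n):
    """Hard-limit activation: 1 if n >= 0, else 0."""
    return 1 if n >= 0 else 0

def xor_net(p1, p2):
    # Hidden layer: two decision boundaries in the input plane
    h_or = hardlim(p1 + p2 - 0.5)     # fires when at least one input is 1
    h_and = hardlim(p1 + p2 - 1.5)    # fires only when both inputs are 1
    # Output layer: "OR but not AND"
    return hardlim(h_or - h_and - 0.5)

outputs = [xor_net(0, 0), xor_net(0, 1), xor_net(1, 0), xor_net(1, 1)]
```

Each hidden neuron draws one line in the input plane, and the output neuron combines the two half-planes into the band that contains the two "1" patterns.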
6 NN block diagram:
7 Note: The final partitioning of the network space varies as a function of the number of neurons in the hidden layer.
8 Example:
[Figure: three decision boundaries with outputs $y_1$, $y_2$, $y_3$ and biases $b_1$, $b_2$, $b_3$]
Assume $b_1 = 0.5$, $b_2 = \ldots$, $b_3 = \ldots$
- Plot the decision boundaries obtained assuming hardlim (HL) is used as the activation function.
- Derive the weight matrix and bias vector used for this network.
- Design the NN second layer following the given in-class guidelines, i.e., identify its weight matrix and bias.
11 Example: multilayer perceptron (classification)
Assume dark = …
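13 Example: function approximation
[Figure: Example Function Approximation Network. 1-2-1 network: input $p$, log-sigmoid hidden layer (weights $w^1_{1,1}$, $w^1_{2,1}$, biases $b^1_1$, $b^1_2$), linear output layer (weights $w^2_{1,1}$, $w^2_{1,2}$, bias $b^2$)]
$$a^1 = \mathrm{logsig}(W^1 p + b^1), \quad a^2 = \mathrm{purelin}(W^2 a^1 + b^2)$$
$$f^1(n) = \frac{1}{1 + e^{-n}}, \quad f^2(n) = n$$
Nominal parameter values:
$$w^1_{1,1} = 10, \; w^1_{2,1} = 10, \; b^1_1 = -10, \; b^1_2 = 10, \; w^2_{1,1} = 1, \; w^2_{1,2} = 1, \; b^2 = 0$$
[Figure: Nominal Response of Network of Figure Above]
[Figure: Effect of Parameter Changes on Network Response]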
14 Backpropagation algorithm:
[Figure: three-layer network. Input $p$ ($R \times 1$); first layer $W^1$ ($S^1 \times R$), $b^1$ ($S^1 \times 1$), $f^1$; second layer $W^2$ ($S^2 \times S^1$), $b^2$ ($S^2 \times 1$), $f^2$; third layer $W^3$ ($S^3 \times S^2$), $b^3$ ($S^3 \times 1$), $f^3$]
$$a^1 = f^1(W^1 p + b^1), \quad a^2 = f^2(W^2 a^1 + b^2), \quad a^3 = f^3(W^3 a^2 + b^3)$$
$$a^3 = f^3(W^3 f^2(W^2 f^1(W^1 p + b^1) + b^2) + b^3)$$
Goal: given a set $\{p_i, t_i\}$, find the weights and biases which minimize the mean square error (performance surface):
$$F(x) = E[(t - a)^T (t - a)]$$
Discard the expectation operator and use the instantaneous estimate:
$$\hat{F}(x) = (t_k - a_k)^T (t_k - a_k)$$
15 For a 1-layer network only, with purelin activation function:
$$W_{k+1} = W_k + 2\alpha e_k p_k^T$$
$$b_{k+1} = b_k + 2\alpha e_k$$
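16 How to compute the derivatives? Use steepest descent (SD). Recall:
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \frac{\partial F}{\partial w^m_{i,j}}, \quad b^m_i(k+1) = b^m_i(k) - \alpha \frac{\partial F}{\partial b^m_i}$$
Note: $F(x)$ may not be expressed directly in terms of $w^1_{i,j}$, $w^2_{i,j}$, etc. We need to use the chain rule:
$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \cdot \frac{d n(w)}{dw}$$
Example: $f(n) = e^{3n}$ and $n = 5w + c$ for a constant $c$, so $f(n(w)) = e^{3(5w + c)}$ and
$$\frac{d f(n(w))}{dw} = 3 e^{3n} \cdot 5 = 15 e^{3(5w + c)}$$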
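17 Applying the chain rule at layer $m$:
$$\frac{\partial F}{\partial w^m_{i,j}} = \frac{\partial F}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial w^m_{i,j}}, \quad \frac{\partial F}{\partial b^m_i} = \frac{\partial F}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial b^m_i}$$
$$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j} a^{m-1}_j + b^m_i \;\Rightarrow\; \frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j, \quad \frac{\partial n^m_i}{\partial b^m_i} = 1$$
Notation:
- $w^m_{i,j}$: weight at layer $m$ connecting the $j$-th input to the $i$-th neuron
- $b^m_i$, $n^m_i$: bias and net input associated with the $i$-th neuron at layer $m$
Define $s^m_i = \partial F / \partial n^m_i$: sensitivity of $F(\cdot)$ to changes in the $i$-th element of the net input at layer $m$. Then:
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha s^m_i a^{m-1}_j, \quad b^m_i(k+1) = b^m_i(k) - \alpha s^m_i$$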
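18 Expressing the weight/bias updates in matrix form
Element form:
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \frac{\partial F}{\partial n^m_i} a^{m-1}_j, \quad b^m_i(k+1) = b^m_i(k) - \alpha \frac{\partial F}{\partial n^m_i}$$
Illustration for a layer with two inputs and two neurons (layer superscripts omitted inside the matrices):
$$\begin{bmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{bmatrix}_{k+1} = \begin{bmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{bmatrix}_{k} - \alpha \begin{bmatrix} \frac{\partial F}{\partial n_1} a_1 & \frac{\partial F}{\partial n_1} a_2 \\ \frac{\partial F}{\partial n_2} a_1 & \frac{\partial F}{\partial n_2} a_2 \end{bmatrix} = \begin{bmatrix} w_{1,1} & w_{1,2} \\ w_{2,1} & w_{2,2} \end{bmatrix}_{k} - \alpha \begin{bmatrix} \frac{\partial F}{\partial n_1} \\ \frac{\partial F}{\partial n_2} \end{bmatrix} \begin{bmatrix} a_1 & a_2 \end{bmatrix}$$
Matrix form:
$$W^m_{k+1} = W^m_k - \alpha s^m (a^{m-1})^T, \quad b^m_{k+1} = b^m_k - \alpha s^m$$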
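20 Computing the sensitivities $\partial F / \partial n^m_i$ also requires the chain rule, and involves terms of the form $\partial n^{m+1}_i / \partial n^m_j$.
Define the Jacobian matrix:
$$\frac{\partial n^{m+1}}{\partial n^m} = \begin{bmatrix} \frac{\partial n^{m+1}_1}{\partial n^m_1} & \cdots & \frac{\partial n^{m+1}_1}{\partial n^m_{S^m}} \\ \vdots & & \vdots \\ \frac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \cdots & \frac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}} \end{bmatrix}$$
Each entry is
$$\frac{\partial n^{m+1}_i}{\partial n^m_j} = \frac{\partial \left( \sum_l w^{m+1}_{i,l} a^m_l + b^{m+1}_i \right)}{\partial n^m_j} = w^{m+1}_{i,j} \frac{\partial a^m_j}{\partial n^m_j} = w^{m+1}_{i,j} \dot{f}^m(n^m_j)$$
so that
$$\frac{\partial n^{m+1}}{\partial n^m} = W^{m+1} \dot{F}^m(n^m), \quad \dot{F}^m(n^m) = \mathrm{diag}\left( \dot{f}^m(n^m_1), \ldots, \dot{f}^m(n^m_{S^m}) \right)$$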
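21 Sensitivity vector (shown for two neurons per layer; $n^m_1$: first neuron, $n^m_2$: second neuron):
$$s^m = \frac{\partial F}{\partial n^m} = \begin{bmatrix} \frac{\partial F}{\partial n^m_1} \\ \frac{\partial F}{\partial n^m_2} \end{bmatrix}$$
Next, apply the chain rule for vectors:
$$\frac{\partial F}{\partial n^m_1} = \frac{\partial F}{\partial n^{m+1}_1} \frac{\partial n^{m+1}_1}{\partial n^m_1} + \frac{\partial F}{\partial n^{m+1}_2} \frac{\partial n^{m+1}_2}{\partial n^m_1} \quad (1)$$
(1): sensitivity of $F$ to a change in the 1st element of the net input at layer $m$.
$$\frac{\partial F}{\partial n^m_2} = \frac{\partial F}{\partial n^{m+1}_1} \frac{\partial n^{m+1}_1}{\partial n^m_2} + \frac{\partial F}{\partial n^{m+1}_2} \frac{\partial n^{m+1}_2}{\partial n^m_2} \quad (2)$$
(2): sensitivity of $F$ to a change in the 2nd element of the net input at layer $m$.
Combining (1) and (2):
$$s^m = \left( \frac{\partial n^{m+1}}{\partial n^m} \right)^T \frac{\partial F}{\partial n^{m+1}} = \left( W^{m+1} \dot{F}^m(n^m) \right)^T s^{m+1} = \dot{F}^m(n^m) (W^{m+1})^T s^{m+1}$$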
22 We need to compute $s^M$:
$$s^M_i = \frac{\partial F}{\partial n^M_i} = \frac{\partial (t - a)^T (t - a)}{\partial n^M_i} = \frac{\partial \sum_{j=1}^{S^M} (t_j - a_j)^2}{\partial n^M_i} = -2 (t_i - a_i) \frac{\partial a_i}{\partial n^M_i}$$
Note: $a = f(n)$, so
$$\frac{\partial a_i}{\partial n^M_i} = \frac{\partial f^M(n^M_i)}{\partial n^M_i} = \dot{f}^M(n^M_i) \;\Rightarrow\; s^M_i = -2 (t_i - a_i) \dot{f}^M(n^M_i)$$
In matrix form:
$$s^M = -2 \dot{F}^M(n^M)(t - a)$$
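23 Summary:
1. Start: propagate the input forward through the network:
$$a^0 = p; \quad a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), \; m = 0, 1, \ldots, M-1; \quad a = a^M$$
2. Compute the sensitivities backward through the network:
$$s^M = -2 \dot{F}^M(n^M)(t - a); \quad s^m = \dot{F}^m(n^m)(W^{m+1})^T s^{m+1}, \; m = M-1, \ldots, 2, 1$$
3. Update the weights and biases:
$$W^m(k+1) = W^m(k) - \alpha s^m (a^{m-1})^T; \quad b^m(k+1) = b^m(k) - \alpha s^m$$
Note: We will need derivatives for all activation functions.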
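As a rough illustration of the three steps in the summary (forward pass, backward sensitivities, update), here is a pure-Python sketch for a 1-2-1 network with a log-sigmoid hidden layer and a linear output. The initial parameter values, the learning rate, the number of passes, and the target function 1 + sin(pi p / 4) sampled on [-2, 2] are assumptions chosen for the example, not values confirmed by the slides.

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def forward(params, p):
    W1, b1, W2, b2 = params
    a1 = [logsig(W1[i] * p + b1[i]) for i in range(2)]   # a1 = logsig(W1 p + b1)
    a2 = W2[0] * a1[0] + W2[1] * a1[1] + b2              # a2 = purelin(W2 a1 + b2)
    return a1, a2

def backprop_step(params, p, t, alpha):
    W1, b1, W2, b2 = params
    a1, a2 = forward(params, p)
    s2 = -2.0 * (t - a2)                                 # s^M = -2 F'(n^M)(t - a); purelin' = 1
    # s^1 = F'(n^1) (W2)^T s^2, with logsig'(n) = (1 - a) a
    s1 = [(1.0 - a1[i]) * a1[i] * W2[i] * s2 for i in range(2)]
    W2 = [W2[i] - alpha * s2 * a1[i] for i in range(2)]  # W2 - alpha s2 (a1)^T
    b2 = b2 - alpha * s2
    W1 = [W1[i] - alpha * s1[i] * p for i in range(2)]   # W1 - alpha s1 (a0)^T, a0 = p
    b1 = [b1[i] - alpha * s1[i] for i in range(2)]
    return W1, b1, W2, b2

def mse(params, samples):
    return sum((t - forward(params, p)[1]) ** 2 for p, t in samples) / len(samples)

# Hypothetical training set and initial parameters
inputs = [-2.0 + 0.2 * i for i in range(21)]
samples = [(p, 1.0 + math.sin(math.pi / 4.0 * p)) for p in inputs]
params = ([-0.3, 0.4], [-0.5, 0.1], [0.1, -0.2], 0.5)

mse_before = mse(params, samples)
for _ in range(500):                      # incremental (pattern-by-pattern) training
    for p, t in samples:
        params = backprop_step(params, p, t, alpha=0.1)
mse_after = mse(params, samples)
```

The sensitivities are computed with the weights from before the update, which matches the order of the summary: all sensitivities first, then all updates.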
24 Example: Function Approximation
$$g(p) = 1 + \sin\left( \frac{k\pi}{4} p \right), \quad k = 1$$
[Figure: the input $p$ drives both the target function $t = g(p)$ and the 1-2-1 network with output $a$; the error is $e = t - a$]
25 [Figure: 1-2-1 network. Input $p$, log-sigmoid layer, linear layer]
$$a^1 = \mathrm{logsig}(W^1 p + b^1), \quad a^2 = \mathrm{purelin}(W^2 a^1 + b^2)$$
26 Initial conditions: $W^1(0)$, $b^1(0)$, $W^2(0)$, $b^2(0)$ [values illegible]
[Figure: Network Response for initial values, compared with the Sine Wave]
Example: Textbook [Hagan]
27 What does the 1-2-1 network look like?
29 Example: function approximation
$$g(p) = 1 + \sin\left( \frac{i\pi}{4} p \right), \quad p \in [-2, 2]$$
[Figure: Function Approximation Using a 1-3-1 Network, four panels for different values of $i$]
$$f^1(n) = \frac{1}{1 + e^{-n}}, \quad f^2(n) = n$$
$$g(p) = 1 + \sin\left( \frac{6\pi}{4} p \right), \quad p \in [-2, 2]$$
[Figure: Effect of Increasing the Number of Hidden Neurons]
Convergence issues:
$$g(p) = 1 + \sin(\pi p), \quad p \in [-2, 2]$$
[Figure: Function Approximation Using a 1-3-1 Network]
30 $$g(p) = 1 + \sin(\pi p), \quad p \in [-2, 2]$$
[Figure: Convergence to a Local Minimum]
Network generalization issues
[Figure: 1-2-1 Network Approximation of g(p)]
[Figure: 1-9-1 Network Approximation of g(p)]
32 Potential problems with backpropagation:
- activation functions may be nonlinear
- performance surface is not unimodal
- convergence may be sped up with a variable learning rate:
  - increase step size when performance index is flat
  - decrease step size when performance index is steep
Possible strategy:
- If the error increases by more than a pre-defined value (typically 4-5%): the new weights are discarded and the learning rate is decreased (multiplied by 0.7).
- If the error increases by less than 4-5%: keep the new weights.
- If the error decreases: the learning rate is increased by 5%.
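33 Recall: convergence may be sped up with the momentum filter.
Steepest descent updates:
$$W^m_{k+1} = W^m_k - \alpha s^m (a^{m-1})^T, \quad b^m_{k+1} = b^m_k - \alpha s^m$$
Introduce memory and low-pass (LP) filter behavior. Define a filter with input $s_k$ and response $x_k$:
$$x_k = \gamma x_{k-1} + (1 - \gamma) s_k$$
$$X(z) = \gamma z^{-1} X(z) + (1 - \gamma) S(z) \;\Rightarrow\; H(z) = \frac{X(z)}{S(z)} = \frac{1 - \gamma}{1 - \gamma z^{-1}}$$
Apply the above concept to the iteration equations, with $\Delta W^m_{k-1} = W^m_k - W^m_{k-1}$ and $\Delta b^m_{k-1} = b^m_k - b^m_{k-1}$:
$$W^m_{k+1} = W^m_k + \gamma \Delta W^m_{k-1} - (1 - \gamma) \alpha s^m (a^{m-1})^T$$
$$b^m_{k+1} = b^m_k + \gamma \Delta b^m_{k-1} - (1 - \gamma) \alpha s^m$$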
34 Iteration Techniques
General iteration:
$$x_{k+1} = x_k + \alpha_k p_k$$
where $p_k$ is selected so that $F(x_{k+1}) < F(x_k)$. Use a Taylor series expansion.
1. First order expansion, with $\Delta x_k = x_{k+1} - x_k$:
$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + \nabla F(x)^T \big|_{x = x_k} \Delta x_k$$
$$x_{k+1} = x_k - \alpha \nabla F(x) \big|_{x = x_k}$$
This is the SD scheme.
2. Second order expansion:
$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + \nabla F(x)^T \big|_{x = x_k} \Delta x_k + \frac{1}{2} \Delta x_k^T A_k \Delta x_k$$
$$\Delta x_k = -A_k^{-1} \nabla F(x) \big|_{x = x_k}, \quad A_k = \nabla^2 F(x) \big|_{x = x_k}$$
This leads to Newton's scheme.
Recall: potential problems with Newton's scheme (Hessian, gradient, convergence).
35 Levenberg-Marquardt Algorithm
Designed to speed up the convergence of Newton's method by reducing the computational load.
Recall Newton's method:
$$x_{k+1} = x_k - \left[ \nabla^2 F(x) \right]^{-1} \nabla F(x) \Big|_{x = x_k}$$
Assume $F$ is a sum of squares:
$$F(x) = v^T(x) v(x) = \sum_i v_i^2(x)$$
Then:
1. $\nabla F(x) = 2 J^T(x) v(x)$
2. $\nabla^2 F(x) = 2 J^T(x) J(x) + 2 S(x)$
Neglecting $S(x)$ and adding $\mu_k I$ gives the Levenberg-Marquardt step:
$$\Delta x_k = -\left[ J^T(x_k) J(x_k) + \mu_k I \right]^{-1} J^T(x_k) v(x_k)$$
General guidelines for $\mu_k$: start with $\mu_k = 0.01$; if $F(x)$ doesn't decrease, repeat the step with $\mu_k$ multiplied by 10.
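The Levenberg-Marquardt step can be tried on a small least-squares problem. The sketch below fits a hypothetical two-parameter model y = x0 exp(x1 t), so the 2x2 system can be solved by hand; the model, the data, and the rule of dividing mu by the same factor after a successful step are assumptions for illustration (the guideline above only states the increase).

```python
import math

def lm_fit(ts, ys, x0, mu=0.01, factor=10.0, iters=50):
    """Fit y = x[0] * exp(x[1] * t) with Levenberg-Marquardt steps."""
    def residuals(x):
        return [y - x[0] * math.exp(x[1] * t) for t, y in zip(ts, ys)]

    def sse(v):
        return sum(vi * vi for vi in v)

    x = list(x0)
    v = residuals(x)
    for _ in range(iters):
        # Jacobian of the residuals: row i is (dv_i/dx0, dv_i/dx1)
        J = [(-math.exp(x[1] * t), -x[0] * t * math.exp(x[1] * t)) for t in ts]
        g0 = sum(j[0] * vi for j, vi in zip(J, v))       # J^T v
        g1 = sum(j[1] * vi for j, vi in zip(J, v))
        h00 = sum(j[0] * j[0] for j in J)                # J^T J
        h01 = sum(j[0] * j[1] for j in J)
        h11 = sum(j[1] * j[1] for j in J)
        while True:
            a, d = h00 + mu, h11 + mu                    # J^T J + mu I
            det = a * d - h01 * h01
            dx0 = -(d * g0 - h01 * g1) / det             # dx = -(J^T J + mu I)^-1 J^T v
            dx1 = -(a * g1 - h01 * g0) / det
            x_new = [x[0] + dx0, x[1] + dx1]
            v_new = residuals(x_new)
            if sse(v_new) < sse(v):                      # F decreased: accept, relax mu
                x, v, mu = x_new, v_new, mu / factor
                break
            mu *= factor                                 # F did not decrease: retry
            if mu > 1e12:                                # no further progress possible
                return x
    return x

ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-t) for t in ts]                    # hypothetical noise-free data
fit = lm_fit(ts, ys, [1.0, 0.0])
```

A small mu makes the step close to Gauss-Newton, while a large mu turns it into a short steepest-descent step, which is why mu is raised only when a step fails to reduce F.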
36 [Figure: Squared Error Surface as a function of the weight values (two weights on the axes)]
37 [Figure: Squared Error Surface as a function of the weight values, second view]
38 [Figure: Two SDBP (batch mode) trajectories]
39 [Figure: Trajectory with learning rate too large]
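40 Momentum Backpropagation
Steepest Descent Backpropagation (SDBP):
$$W^m_{k+1} = W^m_k - \alpha s^m (a^{m-1})^T, \quad b^m_{k+1} = b^m_k - \alpha s^m$$
Momentum Backpropagation (MOBP), with $\Delta W^m_{k-1} = W^m_k - W^m_{k-1}$ and $\Delta b^m_{k-1} = b^m_k - b^m_{k-1}$:
$$W^m_{k+1} = W^m_k + \gamma \Delta W^m_{k-1} - (1 - \gamma) \alpha s^m (a^{m-1})^T$$
$$b^m_{k+1} = b^m_k + \gamma \Delta b^m_{k-1} - (1 - \gamma) \alpha s^m$$
[Figure: MOBP trajectory for $\gamma = 0.8$]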
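The effect of the momentum term can be seen on a toy problem. The sketch below applies the update "new step = gamma * previous step - (1 - gamma) * alpha * gradient" to a hypothetical quadratic F(x) = x0^2 + 25 x1^2; the test function, starting point, and learning rate are assumptions chosen so that plain steepest descent (gamma = 0) is unstable while the momentum version is not.

```python
def grad(x):
    # Gradient of the hypothetical test function F(x) = x0^2 + 25 * x1^2
    return [2.0 * x[0], 50.0 * x[1]]

def descend(x0, alpha, gamma, steps):
    """gamma = 0 gives plain steepest descent; 0 < gamma < 1 adds momentum."""
    x = list(x0)
    dx = [0.0] * len(x)                  # previous step, Delta x_{k-1}
    for _ in range(steps):
        g = grad(x)
        dx = [gamma * d - (1.0 - gamma) * alpha * gi for d, gi in zip(dx, g)]
        x = [xi + d for xi, d in zip(x, dx)]
    return x

x_sd = descend([1.0, 1.0], alpha=0.05, gamma=0.0, steps=50)    # diverges along x1
x_mo = descend([1.0, 1.0], alpha=0.05, gamma=0.8, steps=500)   # converges to (0, 0)
```

With this learning rate each plain steepest-descent step multiplies x1 by -1.5, so the iterate oscillates with growing amplitude; the low-pass filtering of the steps damps that oscillation and lets the same learning rate be used.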
41 Variable Learning Rate
1. If the squared error (over the entire training set) increases by more than some set percentage $\zeta$ after a weight update, then the weight update is discarded, the learning rate is multiplied by some factor $\rho$ ($1 > \rho > 0$), and the momentum coefficient $\gamma$ is set to zero.
2. If the squared error decreases after a weight update, then the weight update is accepted and the learning rate is multiplied by some factor $\eta > 1$. If $\gamma$ has been previously set to zero, it is reset to its original value.
3. If the squared error increases by less than $\zeta$, then the weight update is accepted, but the learning rate and the momentum coefficient are unchanged.
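The three rules translate directly into a small decision function. This is a sketch only: the function name `vlr_step` and its argument layout are hypothetical, and the default values zeta = 4%, rho = 0.7 and eta = 1.05 are the ones quoted on the variable learning rate slides.

```python
def vlr_step(err_old, err_new, alpha, gamma, gamma0, zeta=0.04, rho=0.7, eta=1.05):
    """Return (accept_update, new_alpha, new_gamma) for one tentative weight update.

    err_old, err_new: squared error over the training set before/after the update
    gamma0: original momentum coefficient, restored after a successful step
    """
    if err_new > err_old * (1.0 + zeta):
        # Rule 1: error grew by more than zeta -> discard update, shrink alpha, no momentum
        return False, alpha * rho, 0.0
    if err_new < err_old:
        # Rule 2: error decreased -> accept update, grow alpha, restore momentum
        return True, alpha * eta, gamma0
    # Rule 3: error grew by less than zeta -> accept update, leave alpha and gamma alone
    return True, alpha, gamma

rejected = vlr_step(1.0, 1.10, alpha=0.1, gamma=0.8, gamma0=0.8)
improved = vlr_step(1.0, 0.90, alpha=0.1, gamma=0.0, gamma0=0.8)
tolerated = vlr_step(1.0, 1.02, alpha=0.1, gamma=0.8, gamma0=0.8)
```

The caller is expected to keep the old weights when `accept_update` is false and to recompute the tentative update with the reduced learning rate.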
42 Variable Learning Rate Trajectory
- $\eta = 1.05$: weight selection threshold
- $\rho = 0.7$: damping factor for learning rate
- $\zeta = 4\%$: error threshold
[Figure: variable learning rate trajectory in the weight plane; squared error versus iteration number; learning rate versus iteration number]
43 Conjugate Gradient
1. The first search direction is the steepest descent direction:
$$p_0 = -g_0, \quad g_k \equiv \nabla F(x) \big|_{x = x_k}$$
2. Take a step and choose the learning rate to minimize the function along the search direction:
$$x_{k+1} = x_k + \alpha_k p_k$$
3. Select the next search direction according to:
$$p_k = -g_k + \beta_k p_{k-1}$$
where, with $\Delta g_{k-1} = g_k - g_{k-1}$:
$$\beta_k = \frac{\Delta g_{k-1}^T g_k}{\Delta g_{k-1}^T p_{k-1}} \quad \text{(Hestenes-Stiefel update)}$$
or
$$\beta_k = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}} \quad \text{(Fletcher-Reeves update)}$$
or
$$\beta_k = \frac{\Delta g_{k-1}^T g_k}{g_{k-1}^T g_{k-1}} \quad \text{(Polak-Ribiere update)}$$
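For a quadratic function the line minimization in step 2 has a closed form, which makes the three steps easy to sketch. The example below uses the Fletcher-Reeves update on a hypothetical quadratic F(x) = 0.5 x^T A x + d^T x; the matrix, the starting point, and the function name `cg_quadratic` are assumptions for illustration.

```python
def cg_quadratic(A, d, x, iters):
    """Minimize F(x) = 0.5 x^T A x + d^T x by Fletcher-Reeves conjugate gradient."""
    def matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    g = [gi + di for gi, di in zip(matvec(A, x), d)]     # gradient g = A x + d
    p = [-gi for gi in g]                                # step 1: p0 = -g0
    for _ in range(iters):
        if dot(g, g) < 1e-30:                            # already at the minimum
            break
        Ap = matvec(A, p)
        alpha = -dot(g, p) / dot(p, Ap)                  # step 2: exact line minimization
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        g_new = [gi + di for gi, di in zip(matvec(A, x), d)]
        beta = dot(g_new, g_new) / dot(g, g)             # Fletcher-Reeves beta
        p = [-gn + beta * pi for gn, pi in zip(g_new, p)]   # step 3: new direction
        g = g_new
    return x

A = [[2.0, 1.0], [1.0, 2.0]]
x_min = cg_quadratic(A, [0.0, 0.0], [0.8, -0.25], iters=2)
```

On an n-dimensional quadratic the method reaches the minimum in at most n line searches, here two; for a neural network the error surface is not quadratic, so the line search is done numerically and the direction is periodically reset.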
44 [Figure: Conjugate Gradient Trajectory]
45 [Figure: Levenberg-Marquardt Trajectory]
46 Resilient Backpropagation Network
- BPNN usually use sigmoid functions (tansig, logsig) as activation functions to introduce nonlinear behavior.
- This can cause the network to have very small gradients and the iterations to (almost) stall.
- Resilient BPNN uses only the signs of the gradient components to determine the direction of the weight update.
- The weight change magnitudes are determined by a separate update value.
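A simplified sign-based update can be sketched as follows. It is one common variant of resilient backpropagation without weight backtracking, not necessarily the exact rule used by trainrp; the increase/decrease factors, the step-size bounds, and the deliberately tiny test gradient are assumptions for the example.

```python
def sign(v):
    return (v > 0) - (v < 0)

def rprop_minimize(grad, x, steps=200, delta0=0.1, inc=1.2, dec=0.5,
                   delta_min=1e-6, delta_max=50.0):
    """Sign-based minimization: only the sign of each gradient component is used."""
    x = list(x)
    delta = [delta0] * len(x)            # separate update value per weight
    prev = [0.0] * len(x)                # previous gradient
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            if g[i] * prev[i] > 0:       # same sign twice: speed up
                delta[i] = min(delta[i] * inc, delta_max)
            elif g[i] * prev[i] < 0:     # sign change: the minimum was passed, slow down
                delta[i] = max(delta[i] * dec, delta_min)
            x[i] -= sign(g[i]) * delta[i]
            prev[i] = g[i]
    return x

def tiny_grad(x):
    # Hypothetical gradient scaled by 1e-6 to mimic a saturated sigmoid network
    return [2e-6 * x[0], 50e-6 * x[1]]

x_rp = rprop_minimize(tiny_grad, [1.0, 1.0])
```

Because the gradient magnitude never enters the update, scaling the gradient by 1e-6 does not slow the iteration down, which is the point of the method.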
47 Algorithm Comparisons
It is very difficult to know which training algorithm will be the fastest for a given problem. Convergence speed depends on many factors:
- complexity of the problem,
- number of data points in the training set,
- number of weights and biases in the network,
- error goal,
- whether the network is being used for pattern recognition (discriminant analysis) or function approximation (regression),
- etc.
48 Toy Example 1: Sinusoid function approximation
- Network set-up: 1-5-1; activation functions (tansig, purelin)
- Number of trials: 30 with random initial weights and biases
- Error threshold: MSE<0.00
- Platform: Sun Sparc workstation
Table columns: Algorithm, Mean Time (s), Ratio, Min. Time (s), Max. Time (s), Std. (s). Rows: LM, BFG, RP, SCG, CGB, CGF, CGP, OSS, GDX.
Algorithm acronyms:
- LM (trainlm): Levenberg-Marquardt
- BFG (trainbfg): BFGS Quasi-Newton
- RP (trainrp): Resilient Backpropagation
- SCG (trainscg): Scaled Conjugate Gradient
- CGB (traincgb): Conjugate Gradient with Powell/Beale Restarts
- CGF (traincgf): Fletcher-Powell Conjugate Gradient
- CGP (traincgp): Polak-Ribiére Conjugate Gradient
- OSS (trainoss): One-Step Secant
- GDX (traingdx): Variable Learning Rate Backpropagation
50 Example 2: function approximation (nonlinear regression) - Engine data set
- Network set-up: 2-30-2
- Network inputs: engine speed and fueling levels
- Network outputs: torque and emission levels
- Activation functions (tansig, purelin)
- Number of trials: 30 with random initial weights and biases
- Error threshold: MSE < …
- Platform: Sun Enterprise 4000 workstation
Table columns: Algorithm, Mean Time (s), Ratio, Min. Time (s), Max. Time (s), Std. (s). Rows: LM, BFG, RP, SCG, CGB, CGF, CGP, OSS, GDX.
52 Example 3: Pattern recognition - Cancer data set
- Network set-up: …
- Network inputs: clump thickness, uniformity of cell size and cell shape, amount of marginal adhesion, frequency of bare nuclei
- Network outputs: benign or malignant tumor
- Activation functions (tansig in all layers)
- Number of trials: 30 with random initial weights and biases
- Error threshold: MSE < 0.0
- Platform: Sun Sparc workstation
Table columns: Algorithm, Mean Time (s), Ratio, Min. Time (s), Max. Time (s), Std. (s). Rows: CGB, RP, SCG, CGP, CGF, LM, BFG, GDX, OSS.
54 Other examples available at x/nnet/backpr4.shtl
55 EXPERIMENT CONCLUSIONS
Several algorithm characteristics can be deduced from the experiments:

In general, on function approximation problems, for networks that contain up to a few hundred weights, the LM algorithm will have the fastest convergence. This advantage is especially noticeable if very accurate training is required. In many cases, trainlm is able to obtain lower mean square errors than any of the other algorithms tested. However, as the number of weights in the network increases, the advantage of trainlm decreases. In addition, trainlm performance is relatively poor on pattern recognition problems. The storage requirements of trainlm are larger than those of the other algorithms tested. By adjusting the mem_reduc parameter, discussed earlier, the storage requirements can be reduced, but at a cost of increased execution time.

The trainrp function is the fastest algorithm on pattern recognition problems. However, it does not perform well on function approximation problems. Its performance also degrades as the error goal is reduced. The memory requirements for this algorithm are relatively small in comparison to the other algorithms considered.

The conjugate gradient algorithms, in particular trainscg, seem to perform well over a wide variety of problems, particularly for networks with a large number of weights. The SCG algorithm is almost as fast as the LM algorithm on function approximation problems (faster for large networks) and is almost as fast as trainrp on pattern recognition problems. Its performance does not degrade as quickly as trainrp performance does when the error is reduced. The conjugate gradient algorithms have relatively modest memory requirements.

The trainbfg performance is similar to that of trainlm. It does not require as much storage as trainlm, but the computation required does increase geometrically with the size of the network, since the equivalent of a matrix inverse must be computed at each iteration.

The variable learning rate algorithm traingdx is usually much slower than the other methods, and has about the same storage requirements as trainrp, but it can still be useful for some problems. There are certain situations in which it is better to converge more slowly. For example, when using early stopping, you may have inconsistent results if you use an algorithm that converges too quickly. You may overshoot the point at which the error on the validation set is minimized.
56 Generalization Issues
The network may be overtrained (overfitting issues) when the MSE goal on the training set is set too low.
- Potential risk: the network memorizes the training examples, but doesn't learn to generalize to similar but new situations.
- Consequences: very good performance on the training set, very poor performance on the testing set.
[Figure: network response on noisy sine data]
How to prevent overfitting?
- Use a network that is not too large for the problem (the network size is difficult to guess a priori).
- Increase the training set size if possible.
- Apply regularization.
- Apply early stopping.
57 Regularization
Recall that the basic performance (MSE) function is defined as:
$$MSE = \frac{1}{N} \sum_{i=1}^{N} e_i^2 = \frac{1}{N} \sum_{i=1}^{N} (t_i - a_i)^2$$
The performance function is modified as:
$$MSE_{reg} = \gamma \, MSE + (1 - \gamma) \, MSW$$
where
$$MSW = \frac{1}{N} \sum_{i=1}^{P} w_i^2$$
and $\gamma$ is the performance ratio.
Consequences: $MSE_{reg}$ forces the network
- to have smaller weights and biases,
- to produce a smoother response,
- to be less likely to overfit.
Drawbacks: $\gamma$ is difficult to estimate:
- $\gamma$ too large: overfitting problem
- $\gamma$ too small: no good fit of the training data
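The modified performance function is simple to compute once the errors and the weights are available. In the sketch below the function names are hypothetical and MSW is taken as the plain mean of the squared weights (divided by the number of weights), which is an assumption about the intended normalization.

```python
def mse(targets, outputs):
    """Mean of the squared errors e_i = t_i - a_i."""
    return sum((t - a) ** 2 for t, a in zip(targets, outputs)) / len(targets)

def msw(weights):
    """Mean of the squared weights and biases (assumed normalization)."""
    return sum(w * w for w in weights) / len(weights)

def mse_reg(targets, outputs, weights, gamma):
    """Regularized performance: gamma * MSE + (1 - gamma) * MSW."""
    return gamma * mse(targets, outputs) + (1.0 - gamma) * msw(weights)

# Hypothetical values: two training errors of 0.5 and three weights
value = mse_reg([1.0, 2.0], [1.5, 1.5], [1.0, -1.0, 2.0], gamma=0.5)
```

With gamma = 1 the function reduces to the plain MSE, and as gamma decreases the penalty on large weights dominates, which is the trade-off described in the drawbacks.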
58 Automated Regularization (MATLAB: trainbr)
Definition:
- Assume the weights and biases are random variables with specific distributions.
- Define a new performance function as:
$$MSE_{aut} = \alpha \, MSE + \beta \, MSW$$
- Apply statistical concepts (Bayes' rule) to find optimum values for $\alpha$ and $\beta$ (iterative procedure).
[Figure: network response with basic MSE versus $MSE_{aut}$]
59 Early Stopping (MATLAB: train with option val)
Definition: the training set is split into two sets:
- training subset: used to compute the network weights and biases
- validation subset: the error on the validation subset is monitored during training
Validation error:
- goes down at training onset
- goes back up when the network starts to overfit the data
Training is continued until the validation error increases for a specified number of iterations. The final weights and biases are those obtained for the minimum validation error.
[Figure: network response with basic MSE versus early stopping MSE]
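The early stopping procedure can be written as a generic loop around any training algorithm. This is a sketch under stated assumptions: the callables `step`, `val_error` and `get_weights` are hypothetical hooks supplied by the caller, and `patience` stands for the "specified number of iterations" during which the validation error is allowed to rise.

```python
def train_with_early_stopping(step, val_error, get_weights, max_epochs=1000, patience=5):
    """Run step() until the validation error has not improved for `patience` epochs.

    Returns the weights and the error recorded at the minimum validation error.
    """
    best_err = float("inf")
    best_weights = get_weights()
    fails = 0
    for _ in range(max_epochs):
        step()                                   # one training iteration on the training subset
        err = val_error()                        # error on the validation subset
        if err < best_err:
            best_err, best_weights, fails = err, get_weights(), 0
        else:
            fails += 1
            if fails >= patience:
                break
    return best_weights, best_err

# Toy stand-in for a network: the "weights" are just the epoch counter, and the
# validation error goes down until epoch 4 and then back up.
state = {"epoch": 0}

def toy_step():
    state["epoch"] += 1

best_w, best_e = train_with_early_stopping(
    toy_step,
    lambda: (state["epoch"] - 4) ** 2 + 1,
    lambda: state["epoch"],
)
```

In the toy run the loop keeps going for five epochs past the minimum before stopping, and the returned weights are the ones saved at the minimum, not the last ones computed.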
60 (MATHWORKS) CONCLUSIONS
Both regularization and early stopping can ensure network generalization when properly applied.

When using Bayesian regularization, it is important to train the network until it reaches convergence. The MSE, MSW, and the effective number of parameters should reach constant values when the network has converged.

For early stopping, be careful not to use an algorithm that converges too rapidly. If you are using a fast algorithm (like trainlm), you want to set the training parameters so that the convergence is relatively slow (e.g., set mu to a relatively large value, such as 1, and set mu_dec and mu_inc to values close to 1, such as 0.8 and 1.5, respectively). The training functions trainscg and trainrp usually work well with early stopping.

With early stopping, the choice of the validation set is also important. The validation set should be representative of all points in the training set.

With both regularization and early stopping, it is a good idea to train the network starting from several different initial conditions. It is possible for either method to fail in certain circumstances. By testing several different initial conditions, you can verify robust network performance.

Based on our (MathWorks) experience, Bayesian regularization generally provides better generalization performance than early stopping when training function approximation networks. This is because Bayesian regularization does not require that a validation data set be separated out of the training data set. It uses all of the data. This advantage is especially noticeable when the size of the data set is small.
61 Early Stopping/Validation discussions
Data sets (columns: Data Set Title, No. pts., Network, Description):
- SINE (5% N), 4-5-: Single-cycle sine wave with Gaussian noise at 5% level.
- SINE (% N), 4-5-: Single-cycle sine wave with Gaussian noise at % level.
- ENGINE (ALL): Engine sensor, full data set.
- ENGINE (/4): Engine sensor, ¼ of data set.
Mean Squared Test Set Error (columns: Method, Engine (All), Engine (/4), Sine (5% N), Sine (% N)):
- ES (early stopping): .3e-, .9e-, .7e-, .3e-
- BR (Bayesian regularization): .6e-3, 4.7e-3, 3.0e-, 6.3e-3
- ES/BR
62 Some general design principles (from NN FAQ)
- Data encoding issues
- Number of layers issues
- Number of neurons per layer issues
- Input variable standardization issues
- Output variable standardization issues
- Generalization error evaluation issues
63 Data encoding issues (from NN FAQ)
64 Number of layers issues [from NN FAQ]
- You may not need any hidden layers at all. Linear and generalized linear models are useful in a wide variety of applications. And even if the function you want to learn is mildly nonlinear, you may get better generalization with a simple linear model than with a complicated nonlinear model if there is too little data or too much noise to estimate the nonlinearities accurately.
- In MLPs with step/threshold/Heaviside activation functions, you need two hidden layers for full generality.
- In MLPs with any of a wide variety of continuous nonlinear hidden-layer activation functions, one hidden layer with an arbitrarily large number of units suffices for the universal approximation property.
- But there is no theory yet to tell you how many hidden units are needed to approximate any given function.
65 Number of neurons per layer issues [NN FAQ]
The best number of hidden units depends in a complex way on:
- the numbers of input and output units
- the number of training cases
- the amount of noise in the targets
- the complexity of the function or classification to be learned
- the architecture
- the type of hidden unit activation function
- the training algorithm
- regularization
In most situations, there is no way to determine the best number of hidden units without training several networks and estimating the generalization error of each.
If you have too few hidden units, you will get high training error and high generalization error due to underfitting and high statistical bias.
If you have too many hidden units, you may get low training error but still have high generalization error due to overfitting and high variance.
66 Input variable standardization issues [NN FAQ]
The contribution of an input depends on its variability relative to the other inputs.
Example: Input 1 in range [-1, 1], Input 2 in range [0, 10,000]: the contribution of Input 1 will be swamped by Input 2.
Scale the inputs so that their variability reflects their importance:
* If importance is not known: scale all inputs to the same variability or the same range.
* If importance is known: scale the more important inputs so that they have larger variances/ranges.
Standardizing input variables has different effects on different training algorithms for MLPs. For example:
1) Steepest descent is very sensitive to scaling. The more ill-conditioned the Hessian is, the slower the convergence. Hence, scaling is an important consideration for gradient descent methods such as standard backpropagation.
2) Quasi-Newton and conjugate gradient methods begin with a steepest descent step and therefore are scale sensitive. However, they accumulate second-order information as training proceeds and hence are less scale sensitive than pure gradient descent.
3) Newton-Raphson and Gauss-Newton, if implemented correctly, are theoretically invariant under scale changes as long as none of the scaling is so extreme as to produce underflow or overflow.
4) Levenberg-Marquardt is scale invariant as long as no ridging is required. There are several different ways to implement ridging; some are scale invariant and some are not. Performance under bad scaling will depend on details of the implementation.
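When the relative importance of the inputs is not known, the two scaling options mentioned above (same variability or same range) can be sketched as follows. The function names are hypothetical, and each function works on one input variable at a time, given as a list of its values over the training set.

```python
def standardize(column):
    """Scale one input variable to zero mean and unit (sample) standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
    return [(v - mean) / std for v in column]

def rescale(column, lo=-1.0, hi=1.0):
    """Map one input variable linearly onto the range [lo, hi]."""
    cmin, cmax = min(column), max(column)
    return [lo + (hi - lo) * (v - cmin) / (cmax - cmin) for v in column]

# An input spanning [0, 10000] is brought to the same range as one spanning [-1, 1]
scaled = rescale([0.0, 5000.0, 10000.0])
zscores = standardize([1.0, 2.0, 3.0])
```

The mean, standard deviation, minimum and maximum should be computed on the training set only and then reused unchanged on validation and test data, so that all subsets see the same transformation.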
67 Output variable standardization issues [NN FAQ]
- Target output value ranges should reflect the possible neural network output values.
- If the target variable does not have known upper and lower bounds, do not use an output activation function with a bounded range.
- Standardizing target variables is typically more a convenience for getting good initial weights than a necessity.
- However, if you have two or more target variables and your error function is scale-sensitive, like the usual least (mean) squares error function, then the variability of each target relative to the others can affect how well the net learns that target. If one target has a range of 0 to 1, while another target has a range of 0 to $10^6$, the net will expend most of its effort learning the second target, to the possible exclusion of the first. So it is essential to rescale the targets so that their variability reflects their importance, or at least is not in inverse relation to their importance. If the targets are of equal importance, they should typically be standardized to the same range or the same standard deviation.
68 Generalization error evaluation issues [NN FAQ]
3 basic necessary (not sufficient!) conditions for generalization:
1) The network inputs contain sufficient information pertaining to the target, so that there exists a mathematical function relating correct outputs to inputs with the desired degree of accuracy (neural nets are not clairvoyant!).
2) The function which relates inputs to correct outputs must be, in some sense, smooth, i.e., a small change in the inputs should, most of the time, produce a small change in the outputs. For continuous inputs and targets, smoothness of the function implies continuity and restrictions on the first derivative over most of the input space. Some neural nets can learn discontinuities as long as the function consists of a finite number of continuous pieces. Very nonsmooth functions, such as those produced by pseudo-random number generators and encryption algorithms, cannot be generalized by neural nets. Often a nonlinear transformation of the input space can increase the smoothness of the function and improve generalization.
3) The training set must be a sufficiently large and representative subset of the set of all cases that you want to generalize to. The importance of this condition is related to the fact that there are, loosely speaking, two different types of generalization: interpolation and extrapolation. Interpolation applies to cases that are more or less surrounded by nearby training cases; everything else is extrapolation. In particular, cases that are outside the range of the training data require extrapolation. Cases inside large "holes" in the training data may also effectively require extrapolation. Interpolation can often be done reliably, but extrapolation is notoriously unreliable. Hence it is important to have sufficient training data to avoid the need for extrapolation.
69 Cross-validation and bootstrapping schemes to evaluate generalization errors (and compare implementations)
These schemes are called permutation tests because they are based on data resampling.
1) Cross-validation (resampling without replacement)
- Recommended for small datasets.
- Can be used to estimate model error or to compare different NN set-ups.
How does this work?
- Split the data into k (~10) subsets of equal size.
- Train the NN k times; each time, leave one of the subsets out of the training and test the NN on the omitted subset.
- When k = sample size, this is leave-one-out cross-validation.
- The overall accuracy is the mean of all testing set accuracies.
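The splitting procedure can be sketched with index lists. The function names and the `train_and_score` callback (which is assumed to train a network on the training indices and return its accuracy on the test indices) are hypothetical; setting k equal to the number of samples gives leave-one-out cross-validation.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)           # resampling without replacement
    folds = [indices[i::k] for i in range(k)]      # k subsets of (nearly) equal size
    for i in range(k):
        train = [j for m in range(k) if m != i for j in folds[m]]
        yield train, folds[i]                      # fold i is left out of the training

def cross_validate(n_samples, k, train_and_score):
    """Mean of the k test-set accuracies."""
    scores = [train_and_score(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

splits = list(k_fold_splits(20, 10))
# Dummy scorer standing in for "train the NN, test on the omitted subset"
mean_acc = cross_validate(20, 10, lambda train, test: 1.0)
```

Every sample is used for testing exactly once and for training k - 1 times, which is why the scheme suits small data sets.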
70 2) Jackknife estimation
- Special case of cross-validation.
- Recommended for small datasets.
- Can be used to estimate model error or to compare different NN set-ups.
How does this work?
- Split the data into subsets of size M-1 (for M data samples available).
- Train the NN on each subset.
- Each time, test the NN on the omitted (leave-one-out) sample, i.e., each testing set has only one sample.
- The overall accuracy is the mean of all testing set accuracies.
71 3) Bootstrapping (resampling with replacement)
[Bootstrap Methods and Permutation Tests, Hesterberg et al., W.H. Freeman and Company]
72 Bootstrapping
- Recommended for small datasets.
- Is expensive to implement.
- Seems to work better than cross-validation in many cases, but not always; in such cases it is not worth the investment.
- Can be used to estimate model error or to compare different NN set-ups.
How does this work?
- Select k (from 50 to 2000) subsets of the data with replacement.
- Train the NN k times; each time, train on one subset and test on another subset.
- The overall accuracy is the mean of all testing set accuracies.
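Resampling with replacement can be sketched in the same index-based style. One assumption is made loudly here: the test set for each round is taken to be the samples that were not drawn into the training set (the "out-of-bag" samples), which is a common choice but only one way to read "test on another subset". The function name is hypothetical.

```python
import random

def bootstrap_splits(n_samples, n_rounds, seed=0):
    """Yield (train_indices, test_indices) for each bootstrap round."""
    rng = random.Random(seed)
    for _ in range(n_rounds):
        # Draw n_samples indices with replacement: some repeat, some never appear
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        drawn = set(train)
        # Assumption: test on the samples left out of this round's training set
        test = [i for i in range(n_samples) if i not in drawn]
        yield train, test

rounds = list(bootstrap_splits(30, 5))
```

The overall accuracy is then the mean of the per-round test accuracies, exactly as in cross-validation; the cost is that the network is retrained once per round.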
73 Performance Comparison
Which technique is best? Which is more accurate?
- Classifier performance assessment makes it possible to evaluate how well a classifier does and how it compares with other schemes.
- It is useful when combining decisions/outputs from several classifiers/detectors (in data fusion applications).
- The evaluation is set up as a hypothesis test.
Given two algorithms A and B:
- Hypothesis $H_0$: for a randomly drawn set of fixed size, algorithms A and B have the same error rate.
- Hypothesis $H_1$: for a randomly drawn set of fixed size, algorithms A and B do not have the same error rate.
74 Need to define:
- Type I error rate: probability of incorrectly rejecting the true null hypothesis.
- Type II error rate: probability of incorrectly accepting a false null hypothesis.
Applied to this problem:
- Type I error rate: probability of incorrectly detecting a difference between classifier performances when no difference exists.
- Significance level $\alpha$: $\alpha$ represents how selective (i.e., restrictive) the user wants the decision between $H_0$ and $H_1$ to be, i.e., for $\alpha = 0.05$, the user is willing to accept the fact that there is a 5% chance of deciding $H_0$ is incorrect (or false) when it is in fact correct (or true).
75 Thus:
- The larger $\alpha$ is, the more likely the user is to decide the claim ($H_0$) is incorrect when, in fact, it is correct, i.e., the user becomes more selective, as the user rejects more and more claims even though they are correct.
- The smaller $\alpha$ is, the less likely the user is to decide the claim is incorrect when it is, in fact, correct, i.e., the user becomes less selective, as the user will reject fewer claims; however, the user will accept more and more claims which are, in fact, incorrect.
76 McNemar's Test
Define the following quantities:
- $n_{00}$: number of test cases misclassified by both A and B
- $n_{01}$: number of test cases misclassified by A but not by B
- $n_{10}$: number of test cases misclassified by B but not by A
- $n_{11}$: number of test cases misclassified by neither A nor B
Note: total number of test cases $n = n_{00} + n_{01} + n_{10} + n_{11}$.
Under $H_0$, A and B have the same error rate, so theoretically $n_{01} = n_{10}$, and the expected number of errors made by only one of the two algorithms is
$$E_{01} = E_{10} = \frac{n_{01} + n_{10}}{2}$$
77 McNemar's test compares the observed number of errors made by only one of the two algorithms with the expected number. Compute
$$z = \frac{\left( |n_{01} - n_{10}| - 1 \right)^2}{n_{01} + n_{10}}$$
It turns out that $z$ is approximately $\chi^2$ distributed with 1 degree of freedom.
$H_0$ (the hypothesis that algorithms A and B have the same error rate) is rejected with a significance level $\alpha$ (i.e., assuming we accept a probability $\alpha$ of deciding $H_0$ is incorrect when it is, in fact, correct) when
$$z > \chi^2_{1, 1-\alpha}$$
How to read the $\chi^2$ table: $\chi^2_{1, 0.95} = 3.84$
78 Example: Assume we have a problem with 9 classes and 60 test samples.
Results give: $n_{00} = 11$, $n_{01} = 1$, $n_{10} = 4$, $n_{11} = 44$.
Algorithm A gives 48 correct decisions.
Algorithm B gives 45 correct decisions.
Should the two algorithms be considered to have the same performance?
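The example can be worked through as follows. This is a sketch under stated assumptions: the counts are recovered from the totals on the slide (60 test samples, 44 classified correctly by both, 48 correct for A, 45 correct for B), the statistic uses the continuity-corrected form, and the threshold is the tabulated value 3.841 for α = 0.05. The helper name `disagreement_counts` is hypothetical.

```python
def disagreement_counts(n_total, n11, correct_a, correct_b):
    """Recover n00, n01, n10 from the totals given in the example."""
    errors_a = n_total - correct_a   # n00 + n01
    errors_b = n_total - correct_b   # n00 + n10
    any_error = n_total - n11        # n00 + n01 + n10
    n00 = errors_a + errors_b - any_error
    n01 = errors_a - n00             # misclassified by A but not by B
    n10 = errors_b - n00             # misclassified by B but not by A
    return n00, n01, n10


n00, n01, n10 = disagreement_counts(n_total=60, n11=44,
                                    correct_a=48, correct_b=45)
# Continuity-corrected McNemar statistic.
z = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
print(n00, n01, n10, z)
print("reject H0" if z > 3.841 else "do not reject H0")
```

With these numbers z = 0.8, which is below 3.841, so at α = 0.05 the hypothesis that A and B have the same error rate is not rejected.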