Using Noise to Speed Up Video Classification with Recurrent Backpropagation


Olaoluwa Adigun, Department of Electrical Engineering, University of Southern California, Los Angeles, CA

Bart Kosko, Department of Electrical Engineering, University of Southern California, Los Angeles, CA

Abstract—Carefully injected noise can speed the convergence and accuracy of video classification with recurrent backpropagation (RBP). This noise boost uses the recent results that backpropagation is a special case of the generalized expectation-maximization (EM) algorithm and that careful noise injection can always speed the average convergence of the EM algorithm to a local maximum of the log-likelihood surface. We extend this result to the time-varying case of recurrent backpropagation and prove sufficient noise-benefit conditions for both classification and regression. Injecting noise that satisfies the noisy-EM positivity condition (NEM noise) speeds up RBP training. The classification simulations used eleven categories of sports videos based on standard UCF YouTube sports-action video clips. Training RBP with NEM noise in just the output neurons led to 60% fewer iterations in training as compared with noiseless training. This corresponded to a 20.6% maximum decrease in training cross entropy. NEM noise injection also outperformed simple blind noise injection: RBP training with NEM noise gave a 5.6% maximum decrease in training cross entropy compared with RBP training with blind noise. Injecting NEM noise also improved the relative classification accuracy by 5% over noiseless RBP training. NEM noise improved the classification accuracy from 81% to 83% on the test set.

I. NOISE BOOSTING RECURRENT BACKPROPAGATION

We show how carefully injected noise can speed up the convergence of recurrent backpropagation (RBP) for classifying time-varying patterns such as videos. Simulations used the 11 categories of sampled UCF YouTube sports videos [1], [2]. We prove a related result for speeding up RBP for regression. These new noise-benefit results turn on the recent result that backpropagation (BP) is a special case of the generalized expectation-maximization (EM) algorithm [3]. BP is the benchmark algorithm for training deep neural networks [4]–[6]. EM is the standard algorithm for finding maximum-likelihood parameters in the case of missing data or hidden parameters [7]. BP finds the network weights and parameters that minimize some error or other performance measure. It equivalently maximizes the network likelihood, and this permits the EM connection. The error function is usually cross entropy for classification and squared error for regression.

The Noisy EM (NEM) Theorem [8]–[10] gives a sufficient condition for noise to speed up the average convergence of the EM algorithm. It holds for additive or multiplicative or any other type of noise injection [10]. The NEM Theorem ensures that NEM noise improves training with the EM algorithm by ensuring that at each iteration NEM takes a larger step on average up the nearest hill of probability or log-likelihood. The BP algorithm is a form of maximum-likelihood estimation [11] and a special case of the generalized EM algorithm. So NEM noise also improves the convergence of BP [3]. It also tends to improve classification accuracy because the likelihood lower-bounds the accuracy [12]. The NEM noise benefit is a type of stochastic resonance where a small amount of noise improves the performance of a nonlinear system while too much noise harms the performance [13]–[21].
Figure 2 shows that noise-boosted RBP took only 8 iterations to reach the same cross-entropy value that noiseless RBP reached after 34 iterations. The noise benefit became more pronounced when we added more internal memory neurons. Figure 3 shows that the benefit reached diminishing returns with about 200 memory neurons. This 200-memory-neuron case produced a 5% improvement in the relative classification accuracy.

A recurrent neural network (RNN) is a neural network whose directed graph contains cycles or feedback. This feedback allows an RNN to sustain an internal memory state that can hold and reveal information about recent inputs. So the RNN output depends not only on the input but also on the recent output and on the internal memory state. This internal dynamical structure makes RNNs suitable for time-varying tasks such as speech processing [5], [22] and video recognition [23], [24]. RBP is the most common method for training RNNs [25], [26]. We apply the NEM-based RBP algorithm below to video recognition. The NEM noise boost applies in principle to any time-varying task of classification or regression.

The next section reviews the recent result that BP is a special case of generalized EM. Then Section III presents the new noisy RBP training theorems for classification and for regression with RNNs. We also present the noisy RBP algorithm for training long short-term memory (LSTM) networks [26] or recurrent learning from time-series data [27]. Section IV presents the results of NEM-noise training of an LSTM RNN for video classification.

II. BACKPROPAGATION AS GENERALIZED EXPECTATION MAXIMIZATION

This section states the recent result that backpropagation is a special case of the generalized expectation-maximization (EM) algorithm and that careful noise injection can always speed up EM convergence.

The next section states the companion result that carefully chosen noise can speed convergence of the EM algorithm on average and thereby noise-boost BP.

The BP-as-EM theorem casts BP in terms of maximum likelihood. Both classification and regression have the same BP (EM) learning laws but they differ in the type of neurons in their output layer. The output neurons of a classifier network are softmax or Gibbs neurons. The output neurons of a regression network are identity neurons. But the regression case assumes that the output training vectors are the vector population means of a multidimensional normal probability density with an identity population covariance matrix. We now restate the recent theorem that BP is a special case of the generalized EM (GEM) algorithm [3].

Theorem 1: Backpropagation as Generalized Expectation Maximization

The backpropagation update equation for a differentiable likelihood function $p(\mathbf{y}|\mathbf{x},\Theta)$ at epoch $n$

$$\Theta^{n+1} = \Theta^{n} + \eta\, \nabla_{\Theta} \ln p(\mathbf{y}|\mathbf{x},\Theta)\Big|_{\Theta=\Theta^{n}} \tag{1}$$

equals the GEM update equation at epoch $n$

$$\Theta^{n+1} = \Theta^{n} + \eta\, \nabla_{\Theta} Q(\Theta|\Theta^{n})\Big|_{\Theta=\Theta^{n}} \tag{2}$$

where GEM uses the differentiable Q-function

$$Q(\Theta|\Theta^{n}) = \mathbb{E}_{p(\mathbf{h}|\mathbf{y},\mathbf{x},\Theta^{n})}\big\{ \ln p(\mathbf{h},\mathbf{y}\,|\,\mathbf{x},\Theta) \big\}.$$

The next section shows how selective noise injection can speed up the average convergence of the EM algorithm.

A. Noise-boosting Backpropagation via EM

The Noisy EM Theorem shows that noise injection can only speed up the EM algorithm on average if the noise obeys the NEM positivity condition [8], [10]. The Noisy EM Theorem for additive noise states that a noise benefit holds at each iteration $n$ if the following positivity condition holds:

$$\mathbb{E}_{\mathbf{x},\mathbf{h},\mathbf{n}|\Theta^{*}}\left[ \ln \frac{p(\mathbf{x}+\mathbf{n},\mathbf{h}\,|\,\Theta^{n})}{p(\mathbf{x},\mathbf{h}\,|\,\Theta^{n})} \right] \geq 0. \tag{3}$$

Then the EM noise benefit

$$Q(\Theta^{n}|\Theta^{*}) \leq Q_{N}(\Theta^{n}|\Theta^{*}) \tag{4}$$

holds on average at iteration $n$:

$$\mathbb{E}_{\mathbf{x},\mathbf{n}|\Theta^{n}}\big[ Q(\Theta^{*}|\Theta^{*}) - Q_{N}(\Theta^{n}|\Theta^{*}) \big] \leq \mathbb{E}_{\mathbf{x}|\Theta^{n}}\big[ Q(\Theta^{*}|\Theta^{*}) - Q(\Theta^{n}|\Theta^{*}) \big]$$

where $\Theta^{*}$ denotes the maximum-likelihood vector of parameters. The NEM positivity condition (3) has a simple form for Gaussian mixture models [28] and for classification and regression networks [3].

The intuition behind the NEM sufficient condition (3) is that some noise realizations $\mathbf{n}$ make a signal $\mathbf{x}$ more probable: $f(\mathbf{x}+\mathbf{n}|\Theta) \geq f(\mathbf{x}|\Theta)$. Taking logarithms gives $\ln \frac{f(\mathbf{x}+\mathbf{n}|\Theta)}{f(\mathbf{x}|\Theta)} \geq 0$. Taking expectations gives a NEM-like positivity condition. The actual proof of the NEM Theorem uses Kullback-Leibler divergence to show that the noise-boosted likelihood is closer on average at each iteration to the optimal likelihood function than is the noiseless likelihood [10].

The NEM positivity inequality (3) is not vacuous because the expectation conditions on the converged parameter vector $\Theta^{*}$. Vacuity would result in the usual case of averaging a log-likelihood ratio. Take the expectation of the log-likelihood ratio $\ln \frac{f(\mathbf{x}|\Theta)}{g(\mathbf{x}|\Theta)}$ with respect to the probability density function $g(\mathbf{x}|\Theta)$ to give $\mathbb{E}_{g}\big[\ln \frac{f(\mathbf{x}|\Theta)}{g(\mathbf{x}|\Theta)}\big]$. Then Jensen's inequality and the concavity of the logarithm give

$$\mathbb{E}_{g}\Big[\ln \frac{f(\mathbf{x}|\Theta)}{g(\mathbf{x}|\Theta)}\Big] \leq \ln \mathbb{E}_{g}\Big[\frac{f(\mathbf{x}|\Theta)}{g(\mathbf{x}|\Theta)}\Big] = \ln \int_{X} \frac{f(\mathbf{x}|\Theta)}{g(\mathbf{x}|\Theta)}\, g(\mathbf{x}|\Theta)\, d\mathbf{x} = \ln \int_{X} f(\mathbf{x}|\Theta)\, d\mathbf{x} = \ln 1 = 0.$$

So $\mathbb{E}_{g}\big[\ln \frac{f(\mathbf{x}|\Theta)}{g(\mathbf{x}|\Theta)}\big] \leq 0$ and thus in this case strict positivity is impossible [10]. But the expectation in (3) does not in general lead to this cancellation of probability densities because the integrating density in (3) depends on the optimal maximum-likelihood parameter $\Theta^{*}$ rather than on just $\Theta^{n}$. So density cancellation occurs only when the NEM algorithm has converged to a local likelihood maximum because then $\Theta^{n} = \Theta^{*}$.

The NEM Theorem simplifies for a classifier network with softmax output neurons. Then the additive noise must lie above the defining NEM hyperplane [3].
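This hyperplane picture admits a short numerical sketch. The Python snippet below is not from the paper: the zero-mean uniform noise source, its 0.1 scale, and the reject-to-zero fallback are illustrative assumptions. It screens a candidate additive noise vector for one softmax output activation and keeps it only if it lies on the nonnegative side of the hyperplane whose normal is the vector of log output activations.

```python
import numpy as np

rng = np.random.default_rng(0)

def nem_noise_softmax(a_y, scale=0.1):
    """Draw additive output noise for one softmax activation vector and keep it
    only if it satisfies the NEM hyperplane condition n . log(a_y) >= 0.
    Otherwise fall back to zero (noiseless) noise."""
    n = rng.uniform(-scale, scale, size=a_y.shape)   # zero-mean uniform: an illustrative choice
    log_a = np.log(a_y + 1e-12)                      # log activations give the hyperplane normal
    return n if float(n @ log_a) >= 0.0 else np.zeros_like(n)

# toy usage with a 3-class softmax activation vector
a_y = np.array([0.7, 0.2, 0.1])
print(nem_noise_softmax(a_y))
```

Because the noise is symmetric about zero, roughly half of the candidate draws pass the screen; the rest default to a noiseless step.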
A similar NEM result holds for regression except that the noise-benefit region is a hypersphere. NEM noise can also inject into the hidden-neuron layers in accord with their likelihood functions [12]. The next section extends these results to recurrent BP for classification and regression.

III. NOISE-BOOSTED RECURRENT BACKPROPAGATION

We next develop the recurrent backpropagation (RBP) algorithm and then show how to noise-boost it using the NEM Theorem. NEM noise can inject in all neurons [12]. This paper focuses only on noise injection into the output neurons.

A. Recurrent Backpropagation

This section presents the learning algorithm for training a Long Short-Term Memory (LSTM) network with RBP. The LSTM is a deep RNN architecture suitable for problems with long time-lag dependencies [26]. The LSTM network consists of $I$ input neurons, $J$ hidden neurons, and $K$ output neurons. We also have control gates for the input, hidden, and output layers. The input gate controls the input activation. The forget gate controls the hidden activation. And the output gate controls the output activation. These gates each have $J$ neurons. The LSTM network captures the dependency of the input data with respect to time. The time $t$ can take on any value in $\{1, 2, \ldots, T\}$.

The $J \times I$ matrix $\mathbf{W}^{x}$ connects the hidden neurons to the input neurons. The $K \times J$ matrix $\mathbf{U}^{y}$ connects the output neurons to the hidden neurons. Superscripts $x$, $h$, and $y$ denote the respective input, hidden, and output layers. The $J \times I$ matrix $\mathbf{W}^{\gamma}$ connects the input gates to the input neurons and the $J \times K$ matrix $\mathbf{V}^{\gamma}$ connects the input gates to the output neurons.

The $J \times I$ matrix $\mathbf{W}^{\phi}$ connects the forget gates to the input neurons and the $J \times K$ matrix $\mathbf{V}^{\phi}$ connects the forget gates to the output neurons. The $J \times I$ matrix $\mathbf{W}^{\omega}$ connects the output gates to the input neurons and the $J \times K$ matrix $\mathbf{V}^{\omega}$ connects the output gates to the output neurons. Superscripts $\gamma$, $\phi$, and $\omega$ denote the respective input, forget, and output gates.

The training involves the forward pass of the input vector and the backward pass of the error. The forward pass propagates the input vector from the input layer to the output layer that has output activation $\mathbf{a}^{y}$. The backward pass propagates the error from the output and hidden layers back to the input layer and updates the network's weights. We alternate these two passes starting with the forward pass until the network converges. RBP seeks those weights that minimize the error function $E(\Theta)$ or maximize the likelihood $L(\Theta)$.

The forward pass initializes the internal memory $h_{j}^{(0)}$ to 0 for all values of $j$. The output activation $a_{k}^{y(0)}$ is 0 for all values of $k$. We then compute $a_{k}^{y(t)}$ for $t = 1, 2, \ldots, T$. The input vector $\mathbf{x}^{(t)}$ has $I$ input neuron values. The input neurons have identity activations: $a_{i}^{x(t)} = x_{i}^{(t)}$ where $a_{i}^{x(t)}$ denotes the activation of the $i$-th input neuron due to the input vector $\mathbf{x}^{(t)}$ at time $t$. The term $o_{j}^{x(t)}$ denotes the $j$-th value of the transformation from the input space to the hidden state:

$$o_{j}^{x(t)} = \sum_{i=1}^{I} w_{ji}^{x} a_{i}^{x(t)} + b_{j}^{x} \tag{5}$$

where $b_{j}^{x}$ is the bias for the $j$-th hidden neuron due to the input activation. This transformation uses the logistic sigmoid activation function. The term $a_{j}^{x(t)}$ denotes the activation of the $j$-th hidden neuron with respect to the input-space transformation of $\mathbf{x}^{(t)}$:

$$a_{j}^{x(t)} = \frac{1}{1 + e^{-o_{j}^{x(t)}}}. \tag{6}$$

The term $o_{j}^{\gamma(t)}$ is the input to the $j$-th input gate at time $t$:

$$o_{j}^{\gamma(t)} = \sum_{i=1}^{I} w_{ji}^{\gamma} a_{i}^{x(t)} + \sum_{k=1}^{K} v_{jk}^{\gamma} a_{k}^{y(t-1)} + b_{j}^{\gamma} \tag{7}$$

where $b_{j}^{\gamma}$ is the bias for the $j$-th input gate and $a_{k}^{y(t-1)}$ is the output activation for time $t-1$. The activation $a_{j}^{\gamma(t)}$ for the $j$-th input gate at time $t$ also uses the logistic sigmoid:

$$a_{j}^{\gamma(t)} = \frac{1}{1 + e^{-o_{j}^{\gamma(t)}}}. \tag{8}$$

The term $o_{j}^{h(t)}$ denotes the input to the $j$-th hidden neuron at time $t$:

$$o_{j}^{h(t)} = a_{j}^{x(t)} \circ a_{j}^{\gamma(t)} \tag{9}$$

where $\circ$ denotes the Hadamard product. The term $o_{j}^{\phi(t)}$ denotes the input to the $j$-th forget gate. It has the form

$$o_{j}^{\phi(t)} = \sum_{i=1}^{I} w_{ji}^{\phi} a_{i}^{x(t)} + \sum_{k=1}^{K} v_{jk}^{\phi} a_{k}^{y(t-1)} + b_{j}^{\phi} \tag{10}$$

where $b_{j}^{\phi}$ is the bias for the $j$-th forget gate. The term $s_{j}^{(t)}$ denotes the state of the $j$-th internal memory at time $t$:

$$s_{j}^{(t)} = o_{j}^{h(t)} + s_{j}^{(t-1)} \circ a_{j}^{\phi(t)} \tag{11}$$

where $s_{j}^{(t-1)}$ is the state of the $j$-th internal memory at time $t-1$. The activation $a_{j}^{s(t)}$ of the $j$-th internal memory is also logistic:

$$a_{j}^{s(t)} = \frac{1}{1 + e^{-s_{j}^{(t)}}} \tag{12}$$

where the superscript $s$ denotes the internal memory. The term $o_{j}^{\omega(t)}$ denotes the input to the $j$-th output gate. It has the form

$$o_{j}^{\omega(t)} = \sum_{i=1}^{I} w_{ji}^{\omega} a_{i}^{x(t)} + \sum_{k=1}^{K} v_{jk}^{\omega} a_{k}^{y(t-1)} + b_{j}^{\omega} \tag{13}$$

where $b_{j}^{\omega}$ is the bias for the $j$-th output gate. The term $a_{j}^{\omega(t)}$ is the activation of the $j$-th output gate at time $t$. The output gate also uses the logistic sigmoid:

$$a_{j}^{\omega(t)} = \frac{1}{1 + e^{-o_{j}^{\omega(t)}}}. \tag{14}$$

The activation $a_{j}^{h(t)}$ of the $j$-th hidden neuron at time $t$ obeys

$$a_{j}^{h(t)} = a_{j}^{s(t)} \circ a_{j}^{\omega(t)} \tag{15}$$

where $a_{j}^{s(t)}$ and $a_{j}^{\omega(t)}$ are from (12) and (14). The term $o_{k}^{y(t)}$ is the input to the $k$-th output neuron at time $t$:

$$o_{k}^{y(t)} = \sum_{j=1}^{J} u_{kj}^{y} a_{j}^{h(t)} + b_{k}^{y} \tag{16}$$

where $b_{k}^{y}$ is the bias for the $k$-th output neuron. The output-layer neurons use softmax activations because this is a classification network [3], [11]. The $k$-th output activation $a_{k}^{y(t)}$ has the softmax form

$$a_{k}^{y(t)} = \frac{\exp\big(o_{k}^{y(t)}\big)}{\sum_{l=1}^{K} \exp\big(o_{l}^{y(t)}\big)}. \tag{17}$$

The backward pass computes the error and its derivatives with respect to the weights. The error function $E^{(t)}$ is the cross entropy because this is a classifier network [11]:

$$E(\Theta) = \sum_{t=1}^{T} E^{(t)} = -\sum_{t=1}^{T} \sum_{k=1}^{K} y_{k}^{(t)} \ln a_{k}^{y(t)}. \tag{18}$$
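As a concreteness check, here is a minimal numpy sketch of one forward time step that follows equations (5)–(17) as reconstructed above. It is not the authors' code: the parameter names, the small random initialization, and the toy sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def init_params(I, J, K):
    """Weights named after the text: W* are J x I, V* are J x K, U_y is K x J."""
    p = {}
    for name in ("x", "gamma", "phi", "omega"):
        p["W_" + name] = 0.1 * rng.standard_normal((J, I))
        p["b_" + name] = np.zeros(J)
    for name in ("gamma", "phi", "omega"):
        p["V_" + name] = 0.1 * rng.standard_normal((J, K))
    p["U_y"] = 0.1 * rng.standard_normal((K, J))
    p["b_y"] = np.zeros(K)
    return p

def lstm_step(p, x_t, a_y_prev, s_prev):
    """One forward pass through equations (5)-(17)."""
    a_x = sigmoid(p["W_x"] @ x_t + p["b_x"])                                      # (5)-(6)
    a_gam = sigmoid(p["W_gamma"] @ x_t + p["V_gamma"] @ a_y_prev + p["b_gamma"])  # (7)-(8) input gate
    o_h = a_x * a_gam                                                             # (9)
    a_phi = sigmoid(p["W_phi"] @ x_t + p["V_phi"] @ a_y_prev + p["b_phi"])        # (10) forget gate (logistic)
    s = o_h + s_prev * a_phi                                                      # (11) internal memory
    a_s = sigmoid(s)                                                              # (12)
    a_om = sigmoid(p["W_omega"] @ x_t + p["V_omega"] @ a_y_prev + p["b_omega"])   # (13)-(14) output gate
    a_h = a_s * a_om                                                              # (15)
    o_y = p["U_y"] @ a_h + p["b_y"]                                               # (16)
    a_y = np.exp(o_y - o_y.max()); a_y /= a_y.sum()                               # (17) numerically stable softmax
    return a_y, s, a_h

# toy usage: I = 4 inputs, J = 3 hidden neurons, K = 2 output classes
p = init_params(4, 3, 2)
a_y, s, a_h = lstm_step(p, rng.random(4), np.zeros(2), np.zeros(3))
```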
The derivative of the error term $E^{(t)}$ with respect to $u_{kj}^{y}$ for $t = T, T-1, \ldots, 1$ is

$$\frac{\partial E^{(t)}}{\partial u_{kj}^{y}} = \frac{\partial E^{(t)}}{\partial a_{k}^{y(t)}} \frac{\partial a_{k}^{y(t)}}{\partial o_{k}^{y(t)}} \frac{\partial o_{k}^{y(t)}}{\partial u_{kj}^{y}} = \big(a_{k}^{y(t)} - y_{k}^{(t)}\big)\, a_{j}^{h(t)}. \tag{19}$$

The partial derivative of $E^{(t)}$ with respect to $b_{k}^{y}$ is

$$\frac{\partial E^{(t)}}{\partial b_{k}^{y}} = \frac{\partial E^{(t)}}{\partial a_{k}^{y(t)}} \frac{\partial a_{k}^{y(t)}}{\partial o_{k}^{y(t)}} \frac{\partial o_{k}^{y(t)}}{\partial b_{k}^{y}} = a_{k}^{y(t)} - y_{k}^{(t)}. \tag{20}$$

Let $E_{j}'^{(t)}$ denote the derivative of $E^{(t)}$ with respect to $a_{j}^{h(t)}$. Then

$$E_{j}'^{(t)} = \frac{\partial E^{(t)}}{\partial a_{j}^{h(t)}} = \sum_{k=1}^{K} \big(a_{k}^{y(t)} - y_{k}^{(t)}\big)\, u_{kj}^{y}. \tag{21}$$

We now list the many other partial derivatives in the RBP algorithm. The derivative of $E^{(t)}$ with respect to $w_{ji}^{\omega}$ is

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{\omega}} = \frac{\partial E^{(t)}}{\partial a_{j}^{h(t)}} \frac{\partial a_{j}^{h(t)}}{\partial a_{j}^{\omega(t)}} \frac{\partial a_{j}^{\omega(t)}}{\partial o_{j}^{\omega(t)}} \frac{\partial o_{j}^{\omega(t)}}{\partial w_{ji}^{\omega}} = E_{j}'^{(t)}\, a_{j}^{s(t)}\, a_{j}^{\omega(t)}\big(1 - a_{j}^{\omega(t)}\big)\, a_{i}^{x(t)}. \tag{22}$$

The derivative of $E^{(t)}$ with respect to $v_{jk}^{\omega}$ is

$$\frac{\partial E^{(t)}}{\partial v_{jk}^{\omega}} = E_{j}'^{(t)}\, a_{j}^{s(t)}\, a_{j}^{\omega(t)}\big(1 - a_{j}^{\omega(t)}\big)\, a_{k}^{y(t-1)}. \tag{23}$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{\omega}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{\omega}} = E_{j}'^{(t)}\, a_{j}^{s(t)}\, a_{j}^{\omega(t)}\big(1 - a_{j}^{\omega(t)}\big). \tag{24}$$

The derivative of $E^{(t)}$ with respect to $s_{j}^{(t)}$ is

$$\frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} = \frac{\partial E^{(t)}}{\partial a_{j}^{h(t)}} \frac{\partial a_{j}^{h(t)}}{\partial a_{j}^{s(t)}} \frac{\partial a_{j}^{s(t)}}{\partial s_{j}^{(t)}} = E_{j}'^{(t)}\, a_{j}^{\omega(t)}\, a_{j}^{s(t)}\big(1 - a_{j}^{s(t)}\big). \tag{25}$$

Let $a_{j}'^{\phi(t)}$ denote the derivative of $a_{j}^{\phi(t)}$ with respect to $o_{j}^{\phi(t)}$. The derivative of $a_{j}^{\phi(t)}$ with respect to $o_{j}^{\phi(t)}$ is

$$a_{j}'^{\phi(t)} = \frac{\partial a_{j}^{\phi(t)}}{\partial o_{j}^{\phi(t)}} = a_{j}^{\phi(t)}\big(1 - a_{j}^{\phi(t)}\big). \tag{26}$$

The derivative of $E^{(t)}$ with respect to $w_{ji}^{\phi}$ unrolls the memory recursion (11) back through time:

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{\phi}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \sum_{n=1}^{t} \Big( \prod_{m=n+1}^{t} a_{j}^{\phi(m)} \Big)\, s_{j}^{(n-1)}\, a_{j}'^{\phi(n)}\, a_{i}^{x(n)}. \tag{27}$$

The derivative of $E^{(t)}$ with respect to $v_{jk}^{\phi}$ is

$$\frac{\partial E^{(t)}}{\partial v_{jk}^{\phi}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \sum_{n=1}^{t} \Big( \prod_{m=n+1}^{t} a_{j}^{\phi(m)} \Big)\, s_{j}^{(n-1)}\, a_{j}'^{\phi(n)}\, a_{k}^{y(n-1)}. \tag{28}$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{\phi}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{\phi}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \sum_{n=1}^{t} \Big( \prod_{m=n+1}^{t} a_{j}^{\phi(m)} \Big)\, s_{j}^{(n-1)}\, a_{j}'^{\phi(n)}. \tag{29}$$

So the derivative of $E^{(t)}$ with respect to $a_{j}^{\gamma(t)}$ is

$$\frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \frac{\partial s_{j}^{(t)}}{\partial o_{j}^{h(t)}} \frac{\partial o_{j}^{h(t)}}{\partial a_{j}^{\gamma(t)}} = E_{j}'^{(t)}\, a_{j}^{\omega(t)}\, a_{j}^{s(t)}\big(1 - a_{j}^{s(t)}\big)\, a_{j}^{x(t)}. \tag{30}$$

The derivative of $E^{(t)}$ with respect to $w_{ji}^{\gamma}$ is

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}} \frac{\partial a_{j}^{\gamma(t)}}{\partial o_{j}^{\gamma(t)}} \frac{\partial o_{j}^{\gamma(t)}}{\partial w_{ji}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}}\, a_{j}^{\gamma(t)}\big(1 - a_{j}^{\gamma(t)}\big)\, a_{i}^{x(t)}. \tag{31}$$

The derivative of $E^{(t)}$ with respect to $v_{jk}^{\gamma}$ is

$$\frac{\partial E^{(t)}}{\partial v_{jk}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}}\, a_{j}^{\gamma(t)}\big(1 - a_{j}^{\gamma(t)}\big)\, a_{k}^{y(t-1)}. \tag{32}$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{\gamma}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}}\, a_{j}^{\gamma(t)}\big(1 - a_{j}^{\gamma(t)}\big). \tag{33}$$

The derivative of $E^{(t)}$ with respect to $a_{j}^{x(t)}$ is

$$\frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \frac{\partial s_{j}^{(t)}}{\partial o_{j}^{h(t)}} \frac{\partial o_{j}^{h(t)}}{\partial a_{j}^{x(t)}} = E_{j}'^{(t)}\, a_{j}^{\omega(t)}\, a_{j}^{s(t)}\big(1 - a_{j}^{s(t)}\big)\, a_{j}^{\gamma(t)}. \tag{34}$$

The derivative of $E^{(t)}$ with respect to $w_{ji}^{x}$ is

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{x}} = \frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}} \frac{\partial a_{j}^{x(t)}}{\partial o_{j}^{x(t)}} \frac{\partial o_{j}^{x(t)}}{\partial w_{ji}^{x}} = \frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}}\, a_{j}^{x(t)}\big(1 - a_{j}^{x(t)}\big)\, a_{i}^{x(t)}. \tag{35}$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{x}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{x}} = \frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}}\, a_{j}^{x(t)}\big(1 - a_{j}^{x(t)}\big). \tag{36}$$

These partial derivatives form the update rules for training the RNN with the RBP algorithm.
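The output-layer and output-gate derivatives translate directly into vectorized code. The numpy sketch below is illustrative only: it reuses the array names from the earlier forward-pass sketch and covers equations (19)–(25); the forget-gate and input-gate gradients (26)–(36) extend the same pattern with the unrolled products of (27)–(29).

```python
import numpy as np

def output_and_gate_grads(a_y, y, a_h, a_s, a_om, x_t, a_y_prev, U_y):
    """Gradients from equations (19)-(25): the softmax output layer, the
    output gate, and the memory-state derivative."""
    d_y = a_y - y                              # softmax cross-entropy error (a_y - y)
    grad_U = np.outer(d_y, a_h)                # (19) dE/dU_y
    grad_b_y = d_y                             # (20) dE/db_y
    E_prime = U_y.T @ d_y                      # (21) dE/da_h
    d_om = E_prime * a_s * a_om * (1.0 - a_om) # common factor in (22)-(24)
    grad_W_om = np.outer(d_om, x_t)            # (22) dE/dW_omega
    grad_V_om = np.outer(d_om, a_y_prev)       # (23) dE/dV_omega
    grad_b_om = d_om                           # (24) dE/db_omega
    dE_ds = E_prime * a_om * a_s * (1.0 - a_s) # (25) dE/ds
    return grad_U, grad_b_y, E_prime, grad_W_om, grad_V_om, grad_b_om, dE_ds
```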

B. Noisy EM Theorems for RBP

We can now state and prove the new NEM RBP theorems for a classifier RNN and for a regression RNN. We start with the classifier result.

Theorem 2: RBP Noise Benefit for RNN Classification

The NEM positivity condition (3) holds for the maximum-likelihood training of a classifier recurrent neural network with noise injection in the output softmax neurons if the following hyperplane condition holds:

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \mathbf{n}^{(t)T} \ln \mathbf{a}^{y(t)} \Big\} \geq 0. \tag{37}$$

Proof: We first show that the cross entropy $E(\Theta)$ equals the sum of the negative log-likelihood functions over time $t = 1, 2, \ldots, T$. The error $E(\Theta)$ is the cross entropy for a classifier network:

$$E(\Theta) = \sum_{t=1}^{T} E^{(t)} \tag{38}$$
$$= -\sum_{t=1}^{T} \sum_{k=1}^{K} y_{k}^{(t)} \ln a_{k}^{y(t)} \tag{39}$$
$$= -\sum_{t=1}^{T} \ln \Big[ \prod_{k=1}^{K} \big(a_{k}^{y(t)}\big)^{y_{k}^{(t)}} \Big] \tag{40}$$
$$= -\sum_{t=1}^{T} \ln \Big[ \prod_{k=1}^{K} p_{k}\big(y_{k}^{(t)}\,\big|\,\mathbf{x},\Theta\big) \Big] \tag{41}$$
$$= -\sum_{t=1}^{T} \ln p\big(\mathbf{y}^{(t)}\,\big|\,\mathbf{x},\Theta\big) \tag{42}$$
$$= -\ln \prod_{t=1}^{T} p\big(\mathbf{y}^{(t)}\,\big|\,\mathbf{x},\Theta\big) \tag{43}$$
$$= -L(\Theta). \tag{44}$$

So minimizing the cross entropy $E(\Theta)$ maximizes the log-likelihood function $L(\Theta)$: $p(\mathbf{y}|\mathbf{x},\Theta) = e^{-E(\Theta)}$.

The NEM sufficient condition (3) simplifies to a product of exponentiated output activations over the time steps $t$, each with $K$ output neurons. Then additive noise injection gives

$$\frac{p(\mathbf{n}+\mathbf{y},\mathbf{h}\,|\,\mathbf{x},\Theta)}{p(\mathbf{y},\mathbf{h}\,|\,\mathbf{x},\Theta)} = \prod_{t=1}^{T} \frac{p\big(\mathbf{n}^{(t)}+\mathbf{y}^{(t)},\mathbf{h}\,|\,\mathbf{x},\Theta\big)\, p(\mathbf{h}\,|\,\mathbf{x},\Theta)}{p(\mathbf{h}\,|\,\mathbf{x},\Theta)\, p\big(\mathbf{y}^{(t)},\mathbf{h}\,|\,\mathbf{x},\Theta\big)} = \prod_{t=1}^{T} \frac{p\big(\mathbf{n}^{(t)}+\mathbf{y}^{(t)}\,|\,\mathbf{h},\mathbf{x},\Theta\big)}{p\big(\mathbf{y}^{(t)}\,|\,\mathbf{h},\mathbf{x},\Theta\big)} \tag{45}$$
$$= \prod_{t=1}^{T} \Bigg[ \frac{\prod_{k=1}^{K} \big(a_{k}^{y(t)}\big)^{n_{k}^{(t)}+y_{k}^{(t)}}}{\prod_{k=1}^{K} \big(a_{k}^{y(t)}\big)^{y_{k}^{(t)}}} \Bigg] \tag{46}$$
$$= \prod_{t=1}^{T} \prod_{k=1}^{K} \big(a_{k}^{y(t)}\big)^{n_{k}^{(t)}}. \tag{47}$$

Then the NEM positivity condition in (3) becomes

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \ln \prod_{t=1}^{T} \prod_{k=1}^{K} \big(a_{k}^{y(t)}\big)^{n_{k}^{(t)}} \Big\} \geq 0. \tag{48}$$

This simplifies to the inequality

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \sum_{k=1}^{K} n_{k}^{(t)} \ln a_{k}^{y(t)} \Big\} \geq 0. \tag{49}$$

The inner summation represents the inner product of $\mathbf{n}^{(t)}$ with $\ln \mathbf{a}^{y(t)}$. Then we can rewrite (49) as the hyperplane inequality

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \mathbf{n}^{(t)T} \ln \mathbf{a}^{y(t)} \Big\} \geq 0. \tag{50}$$
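Condition (50) suggests a simple screening step over a whole output sequence. The sketch below is a hedged illustration, not the paper's implementation: it draws one zero-mean uniform noise vector per time step and keeps the sequence only if the time-summed hyperplane condition holds, reducing to the earlier single-step check when T = 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def nem_sequence_noise(a_y_seq, scale=0.1):
    """Additive target noise for a length-T softmax output sequence (shape (T, K)).
    Keep the whole noise sequence only if it satisfies the Theorem-2 condition
    sum_t n(t) . log a_y(t) >= 0; otherwise fall back to zeros."""
    n = rng.uniform(-scale, scale, size=a_y_seq.shape)
    if float(np.sum(n * np.log(a_y_seq + 1e-12))) >= 0.0:
        return n
    return np.zeros_like(n)

# toy usage: T = 3 time steps, K = 2 output neurons
a_y_seq = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
print(nem_sequence_noise(a_y_seq))
```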

We next state and prove the NEM RBP theorem for regression networks.

[Fig. 1: The first 5 frames from a UCF YouTube video sample of juggling a soccer ball. Such soccer videos formed the 6th of the 11 categories in the classification experiment. So the target output for this sampled video clip was the 1-in-11-coded bit vector with a 1 in the 6th slot and 0s elsewhere.]

Theorem 3: RBP Noise Benefit for RNN Regression

The NEM positivity condition (3) holds for the maximum-likelihood training of a regression RNN with Gaussian target vector $\mathbf{y}^{(t)} \sim N\big(\mathbf{y}^{(t)}\,\big|\,\mathbf{a}^{y(t)}, \mathbf{I}\big)$ if the following inequality condition holds:

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \big\| \mathbf{y}^{(t)} + \mathbf{n}^{(t)} - \mathbf{a}^{y(t)} \big\|^{2} \Big\} - \mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \big\| \mathbf{y}^{(t)} - \mathbf{a}^{y(t)} \big\|^{2} \Big\} \leq 0. \tag{51}$$

Proof: The error function $E(\Theta)$ is the squared error for a regression network [3]. We first show that minimizing the squared error $E(\Theta)$ is the same as maximizing the likelihood function $L(\Theta)$:

$$E(\Theta) = \sum_{t=1}^{T} E^{(t)} \tag{52}$$
$$= \sum_{t=1}^{T} \tfrac{1}{2} \big\| \mathbf{y}^{(t)} - \mathbf{a}^{y(t)} \big\|^{2} \tag{53}$$
$$= -\sum_{t=1}^{T} \ln \Big[ \exp\Big\{ -\tfrac{1}{2} \big\| \mathbf{y}^{(t)} - \mathbf{a}^{y(t)} \big\|^{2} \Big\} \Big] \tag{54}$$
$$= -\Big[ \sum_{t=1}^{T} \ln p\big(\mathbf{y}^{(t)}\,\big|\,\mathbf{x},\Theta\big) \Big] - \frac{TK}{2}\ln(2\pi) \tag{55}$$
$$= -L(\Theta) - \frac{TK}{2}\ln(2\pi) \tag{56}$$

where the network's output probability density function $p(\mathbf{y}^{(t)}\,|\,\mathbf{x},\Theta)$ is the vector normal density $N(\mathbf{y}^{(t)}\,|\,\mathbf{a}^{y(t)}, \mathbf{I})$. So minimizing the squared error $E(\Theta)$ maximizes the likelihood $L(\Theta)$ because the last term in (56) does not depend on $\Theta$.

Injecting additive noise into the output neurons gives

$$\frac{p(\mathbf{n}+\mathbf{y},\mathbf{h}\,|\,\mathbf{x},\Theta)}{p(\mathbf{y},\mathbf{h}\,|\,\mathbf{x},\Theta)} = \prod_{t=1}^{T} \frac{p\big(\mathbf{n}^{(t)}+\mathbf{y}^{(t)},\mathbf{h}\,|\,\mathbf{x},\Theta\big)\, p(\mathbf{h}\,|\,\mathbf{x},\Theta)}{p(\mathbf{h}\,|\,\mathbf{x},\Theta)\, p\big(\mathbf{y}^{(t)},\mathbf{h}\,|\,\mathbf{x},\Theta\big)} = \prod_{t=1}^{T} \frac{p\big(\mathbf{n}^{(t)}+\mathbf{y}^{(t)}\,|\,\mathbf{h},\mathbf{x},\Theta\big)}{p\big(\mathbf{y}^{(t)}\,|\,\mathbf{h},\mathbf{x},\Theta\big)} \tag{57}$$
$$= \prod_{t=1}^{T} \frac{\exp\big( -\tfrac{1}{2} \| \mathbf{y}^{(t)} + \mathbf{n}^{(t)} - \mathbf{a}^{y(t)} \|^{2} \big)}{\exp\big( -\tfrac{1}{2} \| \mathbf{y}^{(t)} - \mathbf{a}^{y(t)} \|^{2} \big)}. \tag{58}$$

Then the NEM positivity condition in (3) simplifies to

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Bigg\{ \ln \prod_{t=1}^{T} \frac{\exp\big( -\tfrac{1}{2} \| \mathbf{y}^{(t)} + \mathbf{n}^{(t)} - \mathbf{a}^{y(t)} \|^{2} \big)}{\exp\big( -\tfrac{1}{2} \| \mathbf{y}^{(t)} - \mathbf{a}^{y(t)} \|^{2} \big)} \Bigg\} \geq 0. \tag{59}$$

This positivity condition simplifies to the hyperspherical inequality

$$\mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \big\| \mathbf{y}^{(t)} + \mathbf{n}^{(t)} - \mathbf{a}^{y(t)} \big\|^{2} \Big\} - \mathbb{E}_{\mathbf{y},\mathbf{h},\mathbf{n}|\mathbf{x},\Theta^{*}}\Big\{ \sum_{t=1}^{T} \big\| \mathbf{y}^{(t)} - \mathbf{a}^{y(t)} \big\|^{2} \Big\} \leq 0. \tag{60}$$

These NEM theorems extend directly to other RNN architectures such as those with gated recurrent units and Elman RNNs [29]. Equation (61) below gives the update rule for training an RNN with BP through time. It applies to both classification and regression networks. The update rule for the $n$-th training iteration is

$$\Theta^{(n+1)} = \Theta^{(n)} + \eta\, \nabla_{\Theta} L(\Theta)\Big|_{\Theta=\Theta^{(n)}} \tag{61}$$
$$= \Theta^{(n)} - \eta\, \nabla_{\Theta} E(\Theta)\Big|_{\Theta=\Theta^{(n)}} \tag{62}$$

for $\Theta \in \{u_{kj}^{y}, b_{k}^{y}, w_{ji}^{\omega}, v_{jk}^{\omega}, b_{j}^{\omega}, w_{ji}^{\phi}, v_{jk}^{\phi}, b_{j}^{\phi}, w_{ji}^{\gamma}, v_{jk}^{\gamma}, b_{j}^{\gamma}, w_{ji}^{x}, b_{j}^{x}\}$ where $\eta$ is the learning rate. The partial derivatives in the gradient of (62) come from (19), (20), (22)–(24), (27)–(29), (31)–(33), (35), and (36).

Algorithm 1 states the details of the noisy RBP algorithm with an LSTM architecture and with softmax output activations. A related algorithm injects noise into the hidden units as well [3]. This produces further speed-ups in general.
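The hypersphere condition of Theorem 3 also reduces to a short screening step. The numpy sketch below is illustrative only: the Gaussian noise source and the reject-to-zero fallback are assumptions rather than the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

def nem_regression_noise(y_seq, a_y_seq, scale=0.1):
    """Additive target noise for a regression RNN. Keep the noise sequence only
    if it satisfies the Theorem-3 hypersphere condition
    sum_t ||y(t)+n(t)-a(t)||^2 <= sum_t ||y(t)-a(t)||^2; otherwise use zeros."""
    n = rng.normal(0.0, scale, size=y_seq.shape)      # illustrative Gaussian noise
    noisy = np.sum((y_seq + n - a_y_seq) ** 2)
    clean = np.sum((y_seq - a_y_seq) ** 2)
    return n if noisy <= clean else np.zeros_like(n)

# toy usage: T = 2 time steps, K = 3 output neurons
y_seq = np.array([[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]])
a_y_seq = y_seq + 0.3                                  # stand-in network outputs
print(nem_regression_noise(y_seq, a_y_seq))
```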

IV. VIDEO CLASSIFICATION SIMULATION RESULTS

We injected additive NEM noise during the RBP training of an LSTM-RNN video classifier. We used 80:20 splits for training and test sets. We extracted 26,000 training instances from the standard UCF sports-action YouTube videos [1], [2]. We used 6,700 video-clip samples for testing the network after training. Each training instance had 2 image frames sampled uniformly from a sports-action video at the rate of 6 frames per second. Each image frame in the dataset had a fixed pixel resolution. Each pixel value was between 0 and 1. We fed the pixel values into the input neurons of the LSTM-RNN and tried different numbers of hidden neurons. The input, forget, and output gates all had the same number of neurons. The output layer had 11 neurons that represented the 11 categories of sports actions in the dataset. These sports actions were basketball shooting, biking, diving, golf swinging, horseback riding, soccer juggling, swinging, tennis swinging, trampoline jumping, volleyball spiking, and walking. Figure 1 shows 5 sampled consecutive frames from a video in the soccer-juggling category. We fed the frames into the LSTM network as a sequence of data over time. The RNN used softmax neurons at the output layer because the LSTM-RNN was a classification network. The other layers and the gates used logistic neurons as discussed above.

We used uniform noise from $U(0, 0.1)$ with an annealing factor that scaled down the noise as the iteration number $n$ grew. The Adaptive Moment Estimation (Adam) optimizer picked the learning rate as $n$ grew from an initial learning rate $\eta$. We compared the training performance over 200 iterations and found that the best noise benefit occurred with 200 internal memory neurons. Then the noise benefit tapered off.

[Figure 2 plots training cross entropy versus training iterations for RBP with no noise, blind noise, and NEM noise.]

Fig. 2: Noise-benefit comparisons. NEM noise benefit from noise injection into the output layer of an LSTM RNN. Adding NEM noise to the output neurons of the LSTM RNN improved the performance of RBP training. Training RBP with NEM noise in just the output softmax neurons led to 60% fewer iterations to reach a common low value of cross entropy. Noise-boosted RBP took 8 iterations to reach the cross-entropy value 2.7 while ordinary noiseless RBP took 34 iterations to reach the same cross-entropy value. There was a 20.6% maximum decrease in cross entropy. NEM noise can also inject into the hidden neurons. Blind noise gave little improvement.

Figure 2 shows the cross entropy for different RBP training regimens and the resulting NEM-RBP speed-up of convergence. Simply injecting faint blind (uniform) noise into the output neurons performed slightly better than no noise injection but not nearly as well as NEM noise injection. The figure also shows that noiseless RBP took 26 more iterations to reach the same cross-entropy value of 2.7. So NEM-boosted RBP needed 60% fewer training iterations to converge to the same low cross-entropy value as did noiseless RBP.

The injection of NEM noise also improved the classification accuracy on the video-clip test set. This apparently occurred because the network likelihood lower-bounds the classification accuracy as we have shown elsewhere [12]. We compared the classification accuracy of NEM-noise RBP training with the accuracy for noiseless RBP training.

Algorithm 1: Noisy RBP algorithm for training a classifier RNN with an LSTM architecture.
  Data: N input sets {X_1, ..., X_N} where each input set X_n = {x_n^(1), ..., x_n^(T)}; N target sets {Y_1, ..., Y_N} where each target set Y_n = {y_n^(1), ..., y_n^(T)}; the number of epochs R for running backpropagation through time.
  Result: Trained RNN weights.
  Initialize the network weights Θ.
  for epoch r = 1, ..., R:
    for training input set n = 1, ..., N:
      FEEDFORWARD
      for t = 1, ..., T:
        - Compute the activations of the input, forget, and output gates from x_n^(t) and a_n^{y(t-1)}.
        - Compute the memory-cell activation a_n^{s(t)} and the output activation a_n^{y(t)}.
      ADD NOISE
      for t = 1, ..., T:
        - Generate a noise vector n^(t).
        - If the NEM condition holds, add the noise: y_n^(t) <- y_n^(t) + n^(t).
        - Else do nothing.
      BACKPROPAGATE ERROR THROUGH TIME
      Initialize ∇E(Θ) = 0.
      for t = T, ..., 1:
        - Compute the cross entropy E^(t)(Θ).
        - Compute the cross-entropy gradient ∇E^(t)(Θ).
        - Update the gradient: ∇E(Θ) <- ∇E(Θ) + ∇E^(t)(Θ).
      - Update the parameters Θ using gradient descent: Θ <- Θ - η∇E(Θ).
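Algorithm 1 maps naturally onto a modern autodiff framework if we let the framework handle the backward pass of Section III-A. The PyTorch sketch below is only a hedged illustration of that mapping: the LSTMClassifier module, the 0.1 noise scale, the 1/epoch annealing, and the Adam learning rate are assumptions, and the NEM screen adds noise to the one-hot targets per Theorem 2 rather than reproducing the paper's exact noise distribution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMClassifier(nn.Module):
    def __init__(self, n_in, n_hidden, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                      # x: (batch, T, n_in)
        h, _ = self.lstm(x)                    # h: (batch, T, n_hidden)
        return self.out(h)                     # logits: (batch, T, n_classes)

def nem_target_noise(log_a, scale):
    """Per-sequence additive target noise kept only if it satisfies the
    Theorem-2 hyperplane condition sum_t n(t) . log a(t) >= 0."""
    n = torch.empty_like(log_a).uniform_(-scale, scale)
    keep = (n * log_a).sum(dim=(1, 2), keepdim=True) >= 0.0   # one test per sequence
    return n * keep                            # zero out rejected noise

def train(model, loader, n_classes, epochs=10, scale=0.1):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for epoch in range(1, epochs + 1):
        for x, labels in loader:               # labels: (batch,) class indices
            logits = model(x)
            log_a = F.log_softmax(logits, dim=-1)
            y = F.one_hot(labels, n_classes).float()
            y = y.unsqueeze(1).expand_as(log_a)                  # same target at every time step
            n = nem_target_noise(log_a.detach(), scale / epoch)  # annealed NEM noise
            loss = -((y + n) * log_a).sum(dim=-1).mean()         # noisy cross entropy
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```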
The accuracy measure counted all true positives. NEM noise produced a 2% relative gain in classification accuracy. The noiseless RBP training gave 81% classification accuracy on the test set. The injection of NEM noise gave 83% classification accuracy on the test set. This represents 134 more correct classifications than we found for noiseless RBP training. We ran the simulation on a Google cloud instance with 8 vCPUs, 30 GB RAM, and a 50 GB system disk.

[Figure 3 shows three panels of training cross entropy versus training iterations for RBP with no noise, blind noise, and NEM noise at increasing numbers of hidden memory neurons.]

Fig. 3: Hidden-neuron memory-size effects on injecting noise into the output neurons of the RNN. The NEM noise benefit became more pronounced as the number of hidden neurons increased. The best noise effect occurred with 200 memory neurons and tapered off thereafter.

V. CONCLUSION

Careful noise injection speeded up convergence of the recurrent backpropagation (RBP) algorithm for video samples. The key insight in the noise boost is that ordinary BP is a special case of the generalized EM algorithm. Then noise-boosting EM leads to noise-boosting BP and its recurrent versions for both classification and regression. Simulations showed that noise-boosted RBP training substantially outperformed noiseless RBP for classifying the UCF sports-action YouTube video clips. Further noise benefits should result from noise injection into the hidden neurons and the use of convolutional masks [3].

REFERENCES

[1] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[2] K. Soomro and A. R. Zamir, "Action recognition in realistic sports videos," in Computer Vision in Sports. Springer, 2014.
[3] K. Audhkhasi, O. Osoba, and B. Kosko, "Noise-enhanced convolutional neural networks," Neural Networks, vol. 78, pp. 15–23, 2016.
[4] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[5] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, 2015.
[6] M. Jordan and T. Mitchell, "Machine learning: trends, perspectives, and prospects," Science, vol. 349, pp. 255–260, 2015.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), pp. 1–38, 1977.
[8] O. Osoba, S. Mitaim, and B. Kosko, "The noisy expectation maximization algorithm," Fluctuation and Noise Letters, vol. 12, no. 3, 2013.
[9] O. Osoba and B. Kosko, "Noise-enhanced clustering and competitive learning algorithms," Neural Networks, Jan. 2013.
[10] ——, "The noisy expectation-maximization algorithm for multiplicative noise injection," Fluctuation and Noise Letters, 2016.
[11] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[12] B. Kosko, K. Audhkhasi, and O. Osoba, "Noise can speed backpropagation learning and deep bidirectional pretraining," in review.
[13] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice Hall, 1991.
[14] P. Hänggi, "Stochastic resonance in biology," ChemPhysChem, vol. 3, no. 3, 2002.
[15] B. Kosko, Noise. Penguin, 2006.
[16] A. Patel and B. Kosko, "Stochastic resonance in continuous and spiking neuron models with Levy noise," IEEE Transactions on Neural Networks, vol. 19, no. 12, 2008.
[17] M. McDonnell, N. Stocks, C. Pearce, and D. Abbott, Stochastic Resonance: From Suprathreshold Stochastic Resonance to Stochastic Signal Quantization. Cambridge University Press, 2008.
[18] M. Wilde and B. Kosko, "Quantum forbidden-interval theorems for stochastic resonance," Journal of Physics A: Mathematical and Theoretical, vol. 42, no. 46, 2009.
[19] A. Patel and B. Kosko, "Error-probability noise benefits in threshold neural signal detection," Neural Networks, vol. 22, no. 5, 2009.
[20] B. Franzke and B. Kosko, "Using noise to speed up Markov chain Monte Carlo estimation," Procedia Computer Science, vol. 53, pp. 113–120, 2015.
[21] L. Gammaitoni, P. Hänggi, P. Jung, and F. Marchesoni, "Stochastic resonance," Reviews of Modern Physics, vol. 70, no. 1, pp. 223–287, January 1998.
[22] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
[23] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[24] J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond short snippets: Deep networks for video classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[25] P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[26] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[27] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 2013.
[28] K. Audhkhasi, O. Osoba, and B. Kosko, "Noisy hidden Markov models for speech recognition," in Proceedings of the 2013 International Joint Conference on Neural Networks. IEEE, 2013, pp. 1–6.
[29] P. Rodriguez, J. Wiles, and J. L. Elman, "A recurrent neural network that learns to count," Connection Science, vol. 11, no. 1, pp. 5–40, 1999.


Lecture 13 Back-propagation

Lecture 13 Back-propagation Lecture 13 Bac-propagation 02 March 2016 Taylor B. Arnold Yale Statistics STAT 365/665 1/21 Notes: Problem set 4 is due this Friday Problem set 5 is due a wee from Monday (for those of you with a midterm

More information

Neural Networks. Intro to AI Bert Huang Virginia Tech

Neural Networks. Intro to AI Bert Huang Virginia Tech Neural Networks Intro to AI Bert Huang Virginia Tech Outline Biological inspiration for artificial neural networks Linear vs. nonlinear functions Learning with neural networks: back propagation https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg

More information

A Novel Activity Detection Method

A Novel Activity Detection Method A Novel Activity Detection Method Gismy George P.G. Student, Department of ECE, Ilahia College of,muvattupuzha, Kerala, India ABSTRACT: This paper presents an approach for activity state recognition of

More information

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check

More information

Course 10. Kernel methods. Classical and deep neural networks.

Course 10. Kernel methods. Classical and deep neural networks. Course 10 Kernel methods. Classical and deep neural networks. Kernel methods in similarity-based learning Following (Ionescu, 2018) The Vector Space Model ò The representation of a set of objects as vectors

More information

Artificial Neural Networks. Historical description

Artificial Neural Networks. Historical description Artificial Neural Networks Historical description Victor G. Lopez 1 / 23 Artificial Neural Networks (ANN) An artificial neural network is a computational model that attempts to emulate the functions of

More information

The Noisy Expectation-Maximization Algorithm for Multiplicative Noise Injection

The Noisy Expectation-Maximization Algorithm for Multiplicative Noise Injection The Noisy Expectation-Maximization Algorithm for Multiplicative Noise Injection Osonde Osoba, Bart Kosko We generalize the noisy expectation-maximization (NEM) algorithm to allow arbitrary modes of noise

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement

More information

Topic 3: Neural Networks

Topic 3: Neural Networks CS 4850/6850: Introduction to Machine Learning Fall 2018 Topic 3: Neural Networks Instructor: Daniel L. Pimentel-Alarcón c Copyright 2018 3.1 Introduction Neural networks are arguably the main reason why

More information

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses

More information