Using Noise to Speed Up Video Classification with Recurrent Backpropagation
Olaoluwa Adigun
Department of Electrical Engineering
University of Southern California
Los Angeles, CA

Bart Kosko
Department of Electrical Engineering
University of Southern California
Los Angeles, CA

Abstract—Carefully injected noise can speed the convergence and accuracy of video classification with recurrent backpropagation (RBP). This noise boost uses the recent results that backpropagation is a special case of the generalized expectation-maximization (EM) algorithm and that careful noise injection can always speed the average convergence of the EM algorithm to a local maximum of the log-likelihood surface. We extend this result to the time-varying case of recurrent backpropagation and prove sufficient noise-benefit conditions for both classification and regression. Injecting noise that satisfies the noisy-EM positivity condition (NEM noise) speeds up RBP training. The classification simulations used eleven categories of sports videos based on standard UCF YouTube sports-action video clips. Training RBP with NEM noise in just the output neurons led to 60% fewer iterations in training as compared with noiseless training. This corresponded to a 20.6% maximum decrease in training cross entropy. NEM noise injection also outperformed simple blind noise injection: RBP training with NEM noise gave a 5.6% maximum decrease in training cross entropy compared with RBP training with blind noise. Injecting NEM noise also improved the relative classification accuracy by 5% over noiseless RBP training. NEM noise improved the classification accuracy from 81% to 83% on the test set.

I. NOISE BOOSTING RECURRENT BACKPROPAGATION

We show how carefully injected noise can speed up the convergence of recurrent backpropagation (RBP) for classifying time-varying patterns such as videos. Simulations used the eleven categories of sampled UCF YouTube sports videos [1], [2]. We prove a related result for speeding up RBP for regression.

These new noise-benefit results turn on the recent result that backpropagation (BP) is a special case of the generalized expectation-maximization (EM) algorithm [3]. BP is the benchmark algorithm for training deep neural networks [4]–[6]. EM is the standard algorithm for finding maximum-likelihood parameters in the case of missing data or hidden parameters [7]. BP finds the network weights and parameters that minimize some error or other performance measure. It equivalently maximizes the network likelihood and this permits the EM connection. The error function is usually cross entropy for classification and squared error for regression.

The Noisy EM (NEM) Theorem [8]–[10] gives a sufficient condition for noise to speed up the average convergence of the EM algorithm. It holds for additive or multiplicative or any other type of noise injection [10]. The NEM Theorem ensures that NEM noise improves training with the EM algorithm by ensuring that at each iteration NEM takes a larger step on average up the nearest hill of probability or log-likelihood. The BP algorithm is a form of maximum-likelihood estimation [11] and a special case of the generalized EM algorithm. So NEM noise also improves the convergence of BP [3]. It also tends to improve classification accuracy because the likelihood lower-bounds the accuracy [12]. The NEM noise benefit is a type of stochastic resonance where a small amount of noise improves the performance of a nonlinear system while too much noise harms the performance [13]–[21].
Figure 2 shows that noise-boosted RBP took only 8 iterations to reach the same cross-entropy value that noiseless RBP reached after 34 iterations. The noise benefit became more pronounced when we added more internal memory neurons. Figure 3 shows that the benefit reached diminishing returns with about 200 memory neurons. This 200-memory-neuron case produced a 5% improvement in the relative classification accuracy.

A recurrent neural network (RNN) is a neural network whose directed graph contains cycles or feedback. This feedback allows an RNN to sustain an internal memory state that can hold and reveal information about recent inputs. So the RNN output depends not only on the input but on the recent output and on the internal memory state. This internal dynamical structure makes RNNs suitable for time-varying tasks such as speech processing [5], [22] and video recognition [23], [24]. RBP is the most common method for training RNNs [25], [26]. We apply the NEM-based RBP algorithm below to video recognition. The NEM noise boost applies in principle to any time-varying task of classification or regression.

The next section reviews the recent result that BP is a special case of generalized EM. Then Section III presents the new noisy RBP training theorems for classification and for regression with RNNs. We also present the noisy RBP algorithm for training long short-term memory (LSTM) [26] or recurrent learning from time-series data [27]. Section IV presents the results of NEM-noise training of an LSTM RNN for video classification.

II. BACKPROPAGATION AS GENERALIZED EXPECTATION MAXIMIZATION
This section states the recent result that backpropagation is a special case of the generalized Expectation-Maximization (EM) algorithm and that careful noise injection can always speed up EM convergence. The next section states the companion result that carefully chosen noise can speed convergence of the EM algorithm on average and thereby noise-boost BP.

The BP-as-EM theorem casts BP in terms of maximum likelihood. Both classification and regression have the same BP (EM) learning laws but they differ in the type of neurons in their output layer. The output neurons of a classifier network are softmax or Gibbs neurons. The output neurons of a regression network are identity neurons. But the regression case assumes that the output training vectors are the vector population means of a multidimensional normal probability density with an identity population covariance matrix.

We now restate the recent theorem that BP is a special case of the generalized EM (GEM) algorithm [3].

Theorem 1: Backpropagation as Generalized Expectation Maximization. The backpropagation update equation for a differentiable likelihood function $p(y \mid x, \Theta)$ at epoch $n$

$$\Theta^{n+1} = \Theta^{n} + \eta \, \nabla_{\Theta} \ln p(y \mid x, \Theta) \Big|_{\Theta = \Theta^{n}} \qquad (1)$$

equals the GEM update equation at epoch $n$

$$\Theta^{n+1} = \Theta^{n} + \eta \, \nabla_{\Theta} Q(\Theta \mid \Theta^{n}) \Big|_{\Theta = \Theta^{n}} \qquad (2)$$

where GEM uses the differentiable Q-function

$$Q(\Theta \mid \Theta^{n}) = E_{p(h \mid y, x, \Theta^{n})} \big\{ \ln p(h, y \mid x, \Theta) \big\}.$$

The next section shows how selective noise injection can speed up the average convergence of the EM algorithm.

A. Noise-boosting Backpropagation via EM

The Noisy EM Theorem shows that noise injection can only speed up the EM algorithm on average if the noise obeys the NEM positivity condition [8], [10]. The Noisy EM Theorem for additive noise states that a noise benefit holds at each iteration $n$ if the following positivity condition holds:

$$E_{x, h, N \mid \Theta^{*}} \left[ \ln \frac{p(x + N, h \mid \Theta^{n})}{p(x, h \mid \Theta^{n})} \right] \geq 0. \qquad (3)$$

Then the EM noise benefit

$$Q(\Theta^{n} \mid \Theta^{*}) \leq Q_{N}(\Theta^{n} \mid \Theta^{*}) \qquad (4)$$

holds on average at iteration $n$:

$$E_{x, N \mid \Theta^{n}} \Big[ Q(\Theta^{*} \mid \Theta^{*}) - Q_{N}(\Theta^{n} \mid \Theta^{*}) \Big] \leq E_{x \mid \Theta^{n}} \Big[ Q(\Theta^{*} \mid \Theta^{*}) - Q(\Theta^{n} \mid \Theta^{*}) \Big]$$

where $\Theta^{*}$ denotes the maximum-likelihood vector of parameters. The NEM positivity condition (3) has a simple form for Gaussian mixture models [28] and for classification and regression networks [3].

The intuition behind the NEM sufficient condition (3) is that some noise realizations $n$ make a signal $x$ more probable: $f(x + n \mid \Theta) \geq f(x \mid \Theta)$. Taking logarithms gives $\ln \frac{f(x + n \mid \Theta)}{f(x \mid \Theta)} \geq 0$. Taking expectations gives a NEM-like positivity condition. The actual proof of the NEM Theorem uses Kullback-Leibler divergence to show that the noise-boosted likelihood is closer on average at each iteration to the optimal likelihood function than is the noiseless likelihood [10].

The NEM positivity inequality (3) is not vacuous because the expectation conditions on the converged parameter vector $\Theta^{*}$. Vacuity would result in the usual case of averaging a log-likelihood ratio. Take the expectation of the log-likelihood ratio $\ln \frac{f(x \mid \Theta)}{g(x \mid \Theta)}$ with respect to the probability density function $g(x \mid \Theta)$ to give $E_{g}\left[\ln \frac{f(x \mid \Theta)}{g(x \mid \Theta)}\right]$. Then Jensen's inequality and the concavity of the logarithm give

$$E_{g}\left[\ln \frac{f(x \mid \Theta)}{g(x \mid \Theta)}\right] \leq \ln E_{g}\left[\frac{f(x \mid \Theta)}{g(x \mid \Theta)}\right] = \ln \int_{X} \frac{f(x \mid \Theta)}{g(x \mid \Theta)} \, g(x \mid \Theta) \, dx = \ln \int_{X} f(x \mid \Theta) \, dx = \ln 1 = 0.$$

So $E_{g}\left[\ln \frac{f(x \mid \Theta)}{g(x \mid \Theta)}\right] \leq 0$ and thus in this case strict positivity is impossible [10]. But the expectation in (3) does not in general lead to this cancellation of probability densities because the integrating density in (3) depends on the optimal maximum-likelihood parameter $\Theta^{*}$ rather than on just $\Theta^{n}$. So density cancellation occurs only when the NEM algorithm has converged to a local likelihood maximum because then $\Theta^{n} = \Theta^{*}$.

The NEM Theorem simplifies for a classifier network with softmax output neurons. Then the additive noise must lie above the defining NEM hyperplane [3].
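The positivity condition (3) is easy to test numerically for simple likelihoods. The sketch below checks the log-likelihood-ratio condition for a spherical Gaussian model; the model, the function name, and the noise scale are illustrative assumptions rather than anything from the paper.

```python
import numpy as np

def nem_accepts(x, noise, mu, sigma=1.0):
    """Test the NEM positivity condition ln[p(x+n | theta) / p(x | theta)] >= 0
    for an assumed spherical Gaussian likelihood with mean mu and scale sigma."""
    log_p_noisy = -0.5 * np.sum((x + noise - mu) ** 2) / sigma**2
    log_p_clean = -0.5 * np.sum((x - mu) ** 2) / sigma**2
    return (log_p_noisy - log_p_clean) >= 0.0   # keep the noise sample only if this holds

rng = np.random.default_rng(0)
x, mu = rng.normal(size=5), np.zeros(5)
noise = 0.1 * rng.normal(size=5)
print(nem_accepts(x, noise, mu))   # keep this noise draw only if it prints True
```

For this Gaussian case the condition reduces to requiring that the noisy sample lie no farther from the mean than the clean sample does, which previews the hypersphere condition of Theorem 3 below.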
A similar NEM result holds for regression except that the noise-benefit region is a hypersphere. NEM noise can also inject into the hidden-neuron layers in accord with their likelihood functions [12]. The next section extends these results to recurrent BP for classification and regression.

III. NOISE-BOOSTED RECURRENT BACKPROPAGATION

We next develop the recurrent backpropagation (RBP) algorithm and then show how to noise-boost it using the NEM Theorem. NEM noise can inject in all neurons [12]. This paper focuses only on noise injection into the output neurons.

A. Recurrent Backpropagation

This section presents the learning algorithm for training Long Short-Term Memory (LSTM) with RBP. The LSTM is a deep RNN architecture suitable for problems with long time-lag dependencies [26]. The LSTM network consists of $I$ input neurons, $J$ hidden neurons, and $K$ output neurons. We also have control gates for the input, hidden, and output layers. The input gate controls the input activation. The forget gate controls the hidden activation. And the output gate controls the output activation. These gates each have $J$ neurons. The LSTM network captures the dependency of the input data with respect to time. The time $t$ can take on any value in $\{1, 2, \ldots, T\}$.

The $J \times I$ matrix $W^{x}$ connects the hidden neurons to the input neurons. The $K \times J$ matrix $U^{y}$ connects the output neurons to the hidden neurons. Superscripts $x$, $h$, and $y$ denote the respective input, hidden, and output layers. The $J \times I$ matrix $W^{\gamma}$ connects the input gates to the input neurons and the $J \times K$ matrix $V^{\gamma}$ connects the input gates to the output neurons.
The $J \times I$ matrix $W^{\phi}$ connects the forget gates to the input neurons and the $J \times K$ matrix $V^{\phi}$ connects the forget gates to the output neurons. The $J \times I$ matrix $W^{\omega}$ connects the output gates to the input neurons and the $J \times K$ matrix $V^{\omega}$ connects the output gates to the output neurons. Superscripts $\gamma$, $\phi$, and $\omega$ denote the respective input, forget, and output gates.

The training involves the forward pass of the input vector and the backward pass of the error. The forward pass propagates the input vector from the input layer to the output layer that has output activation $a^{y}$. The backward pass propagates the error from the output and hidden layers back to the input layer and updates the network's weights. We alternate these two passes starting with the forward pass until the network converges. RBP seeks those weights that minimize the error function $E(\Theta)$ or maximize the likelihood $L(\Theta)$.

The forward pass initializes the internal memory $h_{j}^{(0)}$ to 0 for all values of $j$. The output activation $a_{k}^{y(0)}$ is 0 for all values of $k$. We then compute $a_{k}^{y(t)}$ for $t = 1, 2, \ldots, T$. The input vector $x^{(t)}$ has $I$ input neuron values. The input neurons have identity activations: $a_{i}^{x(t)} = x_{i}^{(t)}$ where $a_{i}^{x(t)}$ denotes the activation of the $i$th input neuron due to the input vector $x^{(t)}$ at time $t$.

The term $o_{j}^{x(t)}$ denotes the $j$th value of the transformation from the input space to the hidden state:

$$o_{j}^{x(t)} = \sum_{i=1}^{I} w_{ji}^{x} a_{i}^{x(t)} + b_{j}^{x} \qquad (5)$$

where $b_{j}^{x}$ is the bias for the $j$th hidden neuron due to the input activation. This transformation uses the logistic sigmoidal activation function. The term $a_{j}^{x(t)}$ denotes the activation of the $j$th hidden neuron with respect to the input-space transformation of $x^{(t)}$:

$$a_{j}^{x(t)} = \frac{1}{1 + e^{-o_{j}^{x(t)}}}. \qquad (6)$$

The term $o_{j}^{\gamma(t)}$ is the input to the $j$th input gate at time $t$:

$$o_{j}^{\gamma(t)} = \sum_{i=1}^{I} w_{ji}^{\gamma} a_{i}^{x(t)} + \sum_{k=1}^{K} v_{jk}^{\gamma} a_{k}^{y(t-1)} + b_{j}^{\gamma} \qquad (7)$$

where $b_{j}^{\gamma}$ is the bias for the $j$th input gate and $a_{k}^{y(t-1)}$ is the output activation from time $t-1$. The activation $a_{j}^{\gamma(t)}$ of the $j$th input gate at time $t$ also uses the logistic sigmoid:

$$a_{j}^{\gamma(t)} = \frac{1}{1 + e^{-o_{j}^{\gamma(t)}}}. \qquad (8)$$

The term $o_{j}^{h(t)}$ denotes the input to the $j$th hidden neuron at time $t$:

$$o_{j}^{h(t)} = a_{j}^{x(t)} \circ a_{j}^{\gamma(t)} \qquad (9)$$

where $\circ$ denotes the Hadamard product. The term $o_{j}^{\phi(t)}$ denotes the input to the $j$th forget gate. It has the form

$$o_{j}^{\phi(t)} = \sum_{i=1}^{I} w_{ji}^{\phi} a_{i}^{x(t)} + \sum_{k=1}^{K} v_{jk}^{\phi} a_{k}^{y(t-1)} + b_{j}^{\phi} \qquad (10)$$

where $b_{j}^{\phi}$ is the bias for the $j$th forget gate and the forget-gate activation $a_{j}^{\phi(t)}$ is also logistic. The term $s_{j}^{(t)}$ denotes the state of the $j$th internal memory at time $t$:

$$s_{j}^{(t)} = o_{j}^{h(t)} + s_{j}^{(t-1)} \circ a_{j}^{\phi(t)} \qquad (11)$$

where $s_{j}^{(t-1)}$ is the state of the $j$th internal memory at time $t-1$. The activation $a_{j}^{s(t)}$ of the $j$th internal memory is also logistic:

$$a_{j}^{s(t)} = \frac{1}{1 + e^{-s_{j}^{(t)}}} \qquad (12)$$

where the superscript $s$ denotes the internal memory. The term $o_{j}^{\omega(t)}$ denotes the input to the $j$th output gate. It has the form

$$o_{j}^{\omega(t)} = \sum_{i=1}^{I} w_{ji}^{\omega} a_{i}^{x(t)} + \sum_{k=1}^{K} v_{jk}^{\omega} a_{k}^{y(t-1)} + b_{j}^{\omega} \qquad (13)$$

where $b_{j}^{\omega}$ is the bias for the $j$th output gate. The term $a_{j}^{\omega(t)}$ is the activation of the $j$th output gate at time $t$. The output gate also uses the logistic sigmoid:

$$a_{j}^{\omega(t)} = \frac{1}{1 + e^{-o_{j}^{\omega(t)}}}. \qquad (14)$$

The output activation $a_{j}^{h(t)}$ of the $j$th hidden neuron at time $t$ obeys

$$a_{j}^{h(t)} = a_{j}^{s(t)} \circ a_{j}^{\omega(t)} \qquad (15)$$

where $a_{j}^{s(t)}$ and $a_{j}^{\omega(t)}$ come from (12) and (14). The term $o_{k}^{y(t)}$ is the input to the $k$th output neuron at time $t$:

$$o_{k}^{y(t)} = \sum_{j=1}^{J} u_{kj}^{y} a_{j}^{h(t)} + b_{k}^{y} \qquad (16)$$

where $b_{k}^{y}$ is the bias for the $k$th output neuron. The output-layer neurons use softmax activations because this is a classification network [3], [11]. The $k$th output activation $a_{k}^{y(t)}$ has the softmax form

$$a_{k}^{y(t)} = \frac{\exp\big(o_{k}^{y(t)}\big)}{\sum_{l} \exp\big(o_{l}^{y(t)}\big)}. \qquad (17)$$

The backward pass computes the error and its derivative with respect to the weights. The error function is the cross entropy because this is a classifier network [11]:

$$E(\Theta) = \sum_{t=1}^{T} E^{(t)} = -\sum_{t=1}^{T} \sum_{k=1}^{K} y_{k}^{(t)} \ln a_{k}^{y(t)}. \qquad (18)$$
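The sketch below runs one forward pass through the recurrence (5)-(17) as reconstructed above. The dictionary of weight matrices, the toy sizes, and the random initialization are assumptions for illustration; note that this variant squashes the memory state with a logistic sigmoid in (12), unlike the standard LSTM cell.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lstm_forward(X, P):
    """Forward pass per equations (5)-(17). X: T x I input sequence.
    P: dict of weights with assumed shapes (see the toy example below)."""
    T, I = X.shape
    J, K = P["Wx"].shape[0], P["Uy"].shape[0]
    s = np.zeros(J)              # internal memory state s^(0) = 0
    a_y = np.zeros(K)            # output activation a^{y(0)} = 0
    outputs = []
    for t in range(T):
        x = X[t]
        a_x = sigmoid(P["Wx"] @ x + P["bx"])                     # (5)-(6)
        a_g = sigmoid(P["Wg"] @ x + P["Vg"] @ a_y + P["bg"])     # input gate (7)-(8)
        a_f = sigmoid(P["Wf"] @ x + P["Vf"] @ a_y + P["bf"])     # forget gate (10)
        a_o = sigmoid(P["Wo"] @ x + P["Vo"] @ a_y + P["bo"])     # output gate (13)-(14)
        s = a_x * a_g + s * a_f                                  # (9) and (11)
        a_s = sigmoid(s)                                         # (12)
        a_h = a_s * a_o                                          # (15)
        a_y = softmax(P["Uy"] @ a_h + P["by"])                   # (16)-(17)
        outputs.append(a_y)
    return np.array(outputs)

# toy shapes: I = 4 inputs, J = 3 hidden/memory units, K = 2 classes, T = 5 steps
rng = np.random.default_rng(0)
I, J, K, T = 4, 3, 2, 5
P = {"Wx": rng.normal(size=(J, I)), "bx": np.zeros(J),
     "Wg": rng.normal(size=(J, I)), "Vg": rng.normal(size=(J, K)), "bg": np.zeros(J),
     "Wf": rng.normal(size=(J, I)), "Vf": rng.normal(size=(J, K)), "bf": np.zeros(J),
     "Wo": rng.normal(size=(J, I)), "Vo": rng.normal(size=(J, K)), "bo": np.zeros(J),
     "Uy": rng.normal(size=(K, J)), "by": np.zeros(K)}
print(lstm_forward(rng.normal(size=(T, I)), P).shape)   # (T, K) softmax outputs
```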
The derivative of the error term $E^{(t)}$ with respect to $u_{kj}^{y}$ for $t = T, T-1, \ldots, 1$ is

$$\frac{\partial E^{(t)}}{\partial u_{kj}^{y}} = \frac{\partial E^{(t)}}{\partial o_{k}^{y(t)}} \, \frac{\partial o_{k}^{y(t)}}{\partial u_{kj}^{y}} = \big( a_{k}^{y(t)} - y_{k}^{(t)} \big) \, a_{j}^{h(t)}. \qquad (19)$$
The partial derivative of $E^{(t)}$ with respect to $b_{k}^{y}$ is

$$\frac{\partial E^{(t)}}{\partial b_{k}^{y}} = a_{k}^{y(t)} - y_{k}^{(t)}. \qquad (20)$$

Let $E_{j}^{\prime(t)}$ denote the derivative of $E^{(t)}$ with respect to $a_{j}^{h(t)}$. Then

$$E_{j}^{\prime(t)} = \frac{\partial E^{(t)}}{\partial a_{j}^{h(t)}} = \sum_{k=1}^{K} \big( a_{k}^{y(t)} - y_{k}^{(t)} \big) \, u_{kj}^{y}. \qquad (21)$$

We now list the many other partial derivatives in the RBP algorithm. The derivative of $E^{(t)}$ with respect to $w_{ji}^{\omega}$ is

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{\omega}} = E_{j}^{\prime(t)} \, \frac{\partial a_{j}^{h(t)}}{\partial a_{j}^{\omega(t)}} \, \frac{\partial a_{j}^{\omega(t)}}{\partial o_{j}^{\omega(t)}} \, \frac{\partial o_{j}^{\omega(t)}}{\partial w_{ji}^{\omega}} = E_{j}^{\prime(t)} \, a_{j}^{s(t)} \, a_{j}^{\omega(t)} \big( 1 - a_{j}^{\omega(t)} \big) \, a_{i}^{x(t)}. \qquad (22)$$

The derivative of $E^{(t)}$ with respect to $v_{jk}^{\omega}$ is

$$\frac{\partial E^{(t)}}{\partial v_{jk}^{\omega}} = E_{j}^{\prime(t)} \, a_{j}^{s(t)} \, a_{j}^{\omega(t)} \big( 1 - a_{j}^{\omega(t)} \big) \, a_{k}^{y(t-1)}. \qquad (23)$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{\omega}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{\omega}} = E_{j}^{\prime(t)} \, a_{j}^{s(t)} \, a_{j}^{\omega(t)} \big( 1 - a_{j}^{\omega(t)} \big). \qquad (24)$$

The derivative of $E^{(t)}$ with respect to $s_{j}^{(t)}$ is

$$\frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} = E_{j}^{\prime(t)} \, \frac{\partial a_{j}^{h(t)}}{\partial s_{j}^{(t)}} = E_{j}^{\prime(t)} \, a_{j}^{\omega(t)} \, a_{j}^{s(t)} \big( 1 - a_{j}^{s(t)} \big). \qquad (25)$$

Let $a_{j}^{\prime\phi(t)}$ denote the derivative of $a_{j}^{\phi(t)}$ with respect to $o_{j}^{\phi(t)}$:

$$a_{j}^{\prime\phi(t)} = \frac{\partial a_{j}^{\phi(t)}}{\partial o_{j}^{\phi(t)}} = a_{j}^{\phi(t)} \big( 1 - a_{j}^{\phi(t)} \big). \qquad (26)$$

The derivative of $E^{(t)}$ with respect to $w_{ji}^{\phi}$ unfolds the memory recursion (11) through time:

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{\phi}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \sum_{n=1}^{t} \Big( \prod_{m=n+1}^{t} a_{j}^{\phi(m)} \Big) \, s_{j}^{(n-1)} \, a_{j}^{\prime\phi(n)} \, a_{i}^{x(n)} \qquad (27)$$

where the empty product (for $n = t$) equals 1. The derivative of $E^{(t)}$ with respect to $v_{jk}^{\phi}$ is

$$\frac{\partial E^{(t)}}{\partial v_{jk}^{\phi}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \sum_{n=1}^{t} \Big( \prod_{m=n+1}^{t} a_{j}^{\phi(m)} \Big) \, s_{j}^{(n-1)} \, a_{j}^{\prime\phi(n)} \, a_{k}^{y(n-1)}. \qquad (28)$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{\phi}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{\phi}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \sum_{n=1}^{t} \Big( \prod_{m=n+1}^{t} a_{j}^{\phi(m)} \Big) \, s_{j}^{(n-1)} \, a_{j}^{\prime\phi(n)}. \qquad (29)$$

So the derivative of $E^{(t)}$ with respect to $a_{j}^{\gamma(t)}$ is

$$\frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \, \frac{\partial s_{j}^{(t)}}{\partial o_{j}^{h(t)}} \, \frac{\partial o_{j}^{h(t)}}{\partial a_{j}^{\gamma(t)}} = E_{j}^{\prime(t)} \, a_{j}^{\omega(t)} \, a_{j}^{s(t)} \big( 1 - a_{j}^{s(t)} \big) \, a_{j}^{x(t)}. \qquad (30)$$

The derivative of $E^{(t)}$ with respect to $w_{ji}^{\gamma}$ is

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}} \, a_{j}^{\gamma(t)} \big( 1 - a_{j}^{\gamma(t)} \big) \, a_{i}^{x(t)}. \qquad (31)$$

The derivative of $E^{(t)}$ with respect to $v_{jk}^{\gamma}$ is

$$\frac{\partial E^{(t)}}{\partial v_{jk}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}} \, a_{j}^{\gamma(t)} \big( 1 - a_{j}^{\gamma(t)} \big) \, a_{k}^{y(t-1)}. \qquad (32)$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{\gamma}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{\gamma}} = \frac{\partial E^{(t)}}{\partial a_{j}^{\gamma(t)}} \, a_{j}^{\gamma(t)} \big( 1 - a_{j}^{\gamma(t)} \big). \qquad (33)$$

The derivative of $E^{(t)}$ with respect to $a_{j}^{x(t)}$ is

$$\frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}} = \frac{\partial E^{(t)}}{\partial s_{j}^{(t)}} \, \frac{\partial s_{j}^{(t)}}{\partial o_{j}^{h(t)}} \, \frac{\partial o_{j}^{h(t)}}{\partial a_{j}^{x(t)}} = E_{j}^{\prime(t)} \, a_{j}^{\omega(t)} \, a_{j}^{s(t)} \big( 1 - a_{j}^{s(t)} \big) \, a_{j}^{\gamma(t)}. \qquad (34)$$

The derivative of $E^{(t)}$ with respect to $w_{ji}^{x}$ is

$$\frac{\partial E^{(t)}}{\partial w_{ji}^{x}} = \frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}} \, a_{j}^{x(t)} \big( 1 - a_{j}^{x(t)} \big) \, a_{i}^{x(t)}. \qquad (35)$$

The derivative of $E^{(t)}$ with respect to $b_{j}^{x}$ is

$$\frac{\partial E^{(t)}}{\partial b_{j}^{x}} = \frac{\partial E^{(t)}}{\partial a_{j}^{x(t)}} \, a_{j}^{x(t)} \big( 1 - a_{j}^{x(t)} \big). \qquad (36)$$

These partial derivatives form the update rules for training the RNN with the RBP algorithm.
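The output-layer derivatives (19)-(21) are easy to check numerically. The sketch below evaluates them for a single time step and compares (19) with a centered finite-difference estimate; the toy sizes and random values are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(Uy, by, a_h, y):
    return -np.sum(y * np.log(softmax(Uy @ a_h + by)))

rng = np.random.default_rng(1)
J, K = 3, 4
Uy, by = rng.normal(size=(K, J)), rng.normal(size=K)
a_h = rng.uniform(size=J)
y = np.eye(K)[2]                       # 1-in-K target

a_y = softmax(Uy @ a_h + by)
dE_dUy = np.outer(a_y - y, a_h)        # eq. (19)
dE_dby = a_y - y                       # eq. (20)
dE_dah = Uy.T @ (a_y - y)              # eq. (21)

# centered finite-difference check of eq. (19)
eps, num = 1e-6, np.zeros_like(Uy)
for k in range(K):
    for j in range(J):
        Up, Um = Uy.copy(), Uy.copy()
        Up[k, j] += eps
        Um[k, j] -= eps
        num[k, j] = (cross_entropy(Up, by, a_h, y)
                     - cross_entropy(Um, by, a_h, y)) / (2 * eps)
print(np.max(np.abs(num - dE_dUy)))    # tiny: analytic and numeric gradients agree
```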
B. Noisy EM Theorems for RBP

We can now state and prove the new NEM RBP theorems for a classifier RNN and for a regression RNN. We start with the classifier result.

Theorem 2: RBP Noise Benefit for RNN Classification. The NEM positivity condition (3) holds for the maximum-likelihood training of a classifier recurrent neural network with noise injection in the output softmax neurons if the following hyperplane condition holds:

$$E_{y, h, N \mid x, \Theta^{*}} \Big\{ \sum_{t=1}^{T} \big( n^{(t)} \big)^{T} \ln a^{y(t)} \Big\} \geq 0. \qquad (37)$$

Proof: We first show that the cross entropy $E(\Theta)$ equals the sum of the negative log-likelihood functions over time $t = 1, 2, \ldots, T$. The error $E(\Theta)$ is the cross entropy for a classifier network:

$$E(\Theta) = \sum_{t=1}^{T} E^{(t)} \qquad (38)$$
$$= -\sum_{t=1}^{T} \sum_{k=1}^{K} y_{k}^{(t)} \ln a_{k}^{y(t)} \qquad (39)$$
$$= -\sum_{t=1}^{T} \ln \prod_{k=1}^{K} \big( a_{k}^{y(t)} \big)^{y_{k}^{(t)}} \qquad (40)$$
$$= -\sum_{t=1}^{T} \ln \prod_{k=1}^{K} p\big( y_{k}^{(t)} \mid x, \Theta \big) \qquad (41)$$
$$= -\sum_{t=1}^{T} \ln p\big( y^{(t)} \mid x, \Theta \big) \qquad (42)$$
$$= -\ln \prod_{t=1}^{T} p\big( y^{(t)} \mid x, \Theta \big) \qquad (43)$$
$$= -L(\Theta). \qquad (44)$$

So minimizing the cross entropy $E(\Theta)$ maximizes the log-likelihood function $L(\Theta)$: $p(y \mid x, \Theta) = e^{-E(\Theta)}$.

The NEM sufficient condition (3) simplifies to a product of exponentiated output activations over time $t$ and over the $K$ output neurons. Additive noise injection gives

$$\frac{p(N + y, h \mid x, \Theta)}{p(y, h \mid x, \Theta)} = \prod_{t=1}^{T} \frac{p\big( N^{(t)} + y^{(t)}, h \mid x, \Theta \big)}{p\big( y^{(t)}, h \mid x, \Theta \big)} \qquad (45)$$
$$= \prod_{t=1}^{T} \frac{p\big( N^{(t)} + y^{(t)} \mid h, x, \Theta \big) \, p(h \mid x, \Theta)}{p\big( y^{(t)} \mid h, x, \Theta \big) \, p(h \mid x, \Theta)} = \prod_{t=1}^{T} \prod_{k=1}^{K} \frac{\big( a_{k}^{y(t)} \big)^{n_{k}^{(t)} + y_{k}^{(t)}}}{\big( a_{k}^{y(t)} \big)^{y_{k}^{(t)}}} \qquad (46)$$
$$= \prod_{t=1}^{T} \prod_{k=1}^{K} \big( a_{k}^{y(t)} \big)^{n_{k}^{(t)}}. \qquad (47)$$

Then the NEM positivity condition in (3) becomes

$$E \Big\{ \ln \prod_{t=1}^{T} \prod_{k=1}^{K} \big( a_{k}^{y(t)} \big)^{n_{k}^{(t)}} \Big\} \geq 0. \qquad (48)$$

This simplifies to the inequality

$$E \Big\{ \sum_{t=1}^{T} \sum_{k=1}^{K} n_{k}^{(t)} \ln a_{k}^{y(t)} \Big\} \geq 0. \qquad (49)$$

The inner summation is the inner product of $n^{(t)}$ with $\ln a^{y(t)}$. So we can rewrite (49) as the hyperplane inequality

$$E \Big\{ \sum_{t=1}^{T} \big( n^{(t)} \big)^{T} \ln a^{y(t)} \Big\} \geq 0. \qquad (50)$$
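One simple way to produce noise that satisfies the hyperplane condition (50) is to draw a symmetric candidate vector and reflect it onto the benefit side of the hyperplane whenever its inner product with the log-activations is negative. The symmetric candidate distribution, the reflection step, and the noise scale below are illustrative assumptions rather than the authors' stated recipe.

```python
import numpy as np

def nem_noise_classifier(a_y_seq, scale=0.1, rng=None):
    """Return additive output-layer noise n^(t) that satisfies the NEM
    hyperplane condition sum_t n^(t) . ln a^{y(t)} >= 0 of Theorem 2."""
    rng = rng or np.random.default_rng()
    log_a = np.log(a_y_seq)                               # T x K log softmax activations
    n = rng.uniform(-scale, scale, size=a_y_seq.shape)    # symmetric candidate noise (assumed)
    if np.sum(n * log_a) < 0.0:                           # candidate lies below the hyperplane,
        n = -n                                            # so reflect it to the benefit side
    return n

a_y_seq = np.array([[0.7, 0.2, 0.1],                      # toy softmax outputs with T=2, K=3
                    [0.1, 0.8, 0.1]])
n = nem_noise_classifier(a_y_seq, rng=np.random.default_rng(0))
print(np.sum(n * np.log(a_y_seq)) >= 0.0)                 # True: the hyperplane condition holds
```

An equally valid alternative is to reject candidates that violate (50) and inject no noise on those draws, which is what the "else do nothing" branch of Algorithm 1 below describes.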
We next state and prove the NEM RBP theorem for regression networks.

Theorem 3: RBP Noise Benefit for RNN Regression. The NEM positivity condition (3) holds for the maximum-likelihood training of a regression RNN with Gaussian target vector $y \sim N(y \mid a^{y}, I)$ if the following inequality condition holds:

$$E_{y, h, N \mid x, \Theta^{*}} \Big\{ \sum_{t=1}^{T} \Big( \big\| y^{(t)} + n^{(t)} - a^{y(t)} \big\|^{2} - \big\| y^{(t)} - a^{y(t)} \big\|^{2} \Big) \Big\} \leq 0. \qquad (51)$$

Proof: The error function $E(\Theta)$ is the squared error for a regression network [3]. We first show that minimizing the squared error $E(\Theta)$ is the same as maximizing the likelihood function $L(\Theta)$:

$$E(\Theta) = \sum_{t=1}^{T} E^{(t)} \qquad (52)$$
$$= \sum_{t=1}^{T} \frac{1}{2} \big\| y^{(t)} - a^{y(t)} \big\|^{2} \qquad (53)$$
$$= -\sum_{t=1}^{T} \ln \Big[ \exp \Big( -\frac{1}{2} \big\| y^{(t)} - a^{y(t)} \big\|^{2} \Big) \Big] \qquad (54)$$
$$= -\sum_{t=1}^{T} \ln \Big[ (2\pi)^{K/2} \, p\big( y^{(t)} \mid x, \Theta \big) \Big] \qquad (55)$$
$$= -\sum_{t=1}^{T} \ln p\big( y^{(t)} \mid x, \Theta \big) - \frac{TK}{2} \ln (2\pi) = -L(\Theta) - \frac{TK}{2} \ln (2\pi) \qquad (56)$$

where the network's output probability density function $p(y^{(t)} \mid x, \Theta)$ is the vector normal density $N(y^{(t)} \mid a^{y(t)}, I)$. So minimizing the squared error $E(\Theta)$ maximizes the likelihood $L(\Theta)$ because the last term in (56) does not depend on $\Theta$.

Injecting additive noise into the output neurons gives

$$\frac{p(N + y, h \mid x, \Theta)}{p(y, h \mid x, \Theta)} = \prod_{t=1}^{T} \frac{p\big( N^{(t)} + y^{(t)}, h \mid x, \Theta \big)}{p\big( y^{(t)}, h \mid x, \Theta \big)} = \prod_{t=1}^{T} \frac{p\big( N^{(t)} + y^{(t)} \mid h, x, \Theta \big)}{p\big( y^{(t)} \mid h, x, \Theta \big)} \qquad (57)$$
$$= \prod_{t=1}^{T} \frac{\exp\big( -\tfrac{1}{2} \| y^{(t)} + n^{(t)} - a^{y(t)} \|^{2} \big)}{\exp\big( -\tfrac{1}{2} \| y^{(t)} - a^{y(t)} \|^{2} \big)}. \qquad (58)$$

Then the NEM positivity condition in (3) simplifies to

$$E \Big\{ \ln \prod_{t=1}^{T} \frac{\exp\big( -\tfrac{1}{2} \| y^{(t)} + n^{(t)} - a^{y(t)} \|^{2} \big)}{\exp\big( -\tfrac{1}{2} \| y^{(t)} - a^{y(t)} \|^{2} \big)} \Big\} \geq 0. \qquad (59)$$

This positivity condition simplifies to the hyperspherical inequality

$$E \Big\{ \sum_{t=1}^{T} \Big( \big\| y^{(t)} + n^{(t)} - a^{y(t)} \big\|^{2} - \big\| y^{(t)} - a^{y(t)} \big\|^{2} \Big) \Big\} \leq 0. \qquad (60)$$

These NEM theorems extend directly to other RNN architectures such as those with gated recurrent units and Elman RNNs [29].

Equation (61) below gives the update rule for training an RNN with BP through time. It applies to both classification and regression networks. The update rule for the $n$th training iteration is

$$\Theta^{(n+1)} = \Theta^{(n)} + \eta \, \nabla_{\Theta} L(\Theta) \Big|_{\Theta = \Theta^{(n)}} \qquad (61)$$
$$= \Theta^{(n)} - \eta \, \nabla_{\Theta} E(\Theta) \Big|_{\Theta = \Theta^{(n)}} \qquad (62)$$

for $\Theta \in \{ u_{kj}^{y}, b_{k}^{y}, w_{ji}^{\omega}, v_{jk}^{\omega}, b_{j}^{\omega}, w_{ji}^{\phi}, v_{jk}^{\phi}, b_{j}^{\phi}, w_{ji}^{\gamma}, v_{jk}^{\gamma}, b_{j}^{\gamma}, w_{ji}^{x}, b_{j}^{x} \}$ where $\eta$ is the learning rate. The partial derivatives in the gradient of (62) come from (19), (20), (22)–(24), (27)–(29), (31)–(33), (35), and (36).

Algorithm 1 states the details of the noisy RBP algorithm with an LSTM architecture and softmax output activation. A related algorithm injects noise into the hidden units as well [3]. This produces further speed-ups in general.
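For the regression case the benefit region of Theorem 3 is a hypersphere: the noisy target must lie no farther from the output activation than the clean target does. The rejection-sampling approach and the Gaussian candidate noise in this sketch are illustrative assumptions.

```python
import numpy as np

def nem_noise_regression(y_seq, a_y_seq, scale=0.1, rng=None, tries=20):
    """Return additive target noise satisfying the hypersphere condition
    sum_t ||y+n-a||^2 <= sum_t ||y-a||^2 of Theorem 3 (zero noise if no
    candidate passes within `tries` draws)."""
    rng = rng or np.random.default_rng()
    base = np.sum((y_seq - a_y_seq) ** 2)
    for _ in range(tries):
        n = scale * rng.normal(size=y_seq.shape)          # candidate noise (assumed Gaussian)
        if np.sum((y_seq + n - a_y_seq) ** 2) <= base:    # inside the NEM hypersphere
            return n
    return np.zeros_like(y_seq)

y = np.array([[1.0, 0.0], [0.5, 0.5]])       # toy targets, T=2, K=2
a = np.array([[0.8, 0.1], [0.4, 0.6]])       # toy network outputs
print(nem_noise_regression(y, a, rng=np.random.default_rng(0)))
# prints a noise draw inside the hypersphere, or zeros if no candidate passed
```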
IV. VIDEO CLASSIFICATION SIMULATION RESULTS

We injected additive NEM noise during the RBP training of an LSTM-RNN video classifier. We used 80:20 splits for training and test sets. We extracted 26,000 training instances from the standard UCF sports-action YouTube videos [1], [2]. We used 6,700 video-clip samples for testing the network after training. Each training instance had 12 image frames sampled uniformly from a sports-action video at the rate of 6 frames per second. Each image frame was a fixed-size pixel array and each pixel value was between 0 and 1. We fed the pixel values into the input neurons of the LSTM-RNN and tried different numbers of hidden neurons. The input, forget, and output gates all had the same number of neurons. The output layer had 11 neurons that represented the 11 categories of sports actions in the dataset. These sport actions were basketball shooting, biking, diving, golf swinging, horseback riding, soccer juggling, swinging, tennis swinging, trampoline jumping, volleyball spiking, and walking.

Figure 1 shows 5 sampled consecutive frames from a video in the soccer-juggling category. We fed the frames into the LSTM network as a sequence of data over time.

Fig. 1: The first 5 frames from a UCF YouTube video sample of juggling a soccer ball. Such soccer videos came from the 6th of the 11 categories in the classification experiment. So the target output for this sampled video clip was the 1-in-K-coded bit vector with a 1 in the sixth slot and 0s elsewhere.

The RNN used softmax neurons at the output layer because the LSTM-RNN was a classification network. The other layers and the gates used logistic neurons as discussed above. We used uniform noise from $U(0, 0.1)$ with an annealing factor that scaled down the noise as the iteration number $n$ grew. The Adaptive Moment Estimation (Adam) optimizer adapted the learning rate from its initial value $\eta$ as training proceeded. We compared the training performance over 200 iterations and found that the best noise benefit occurred with 200 internal memory neurons. Then the noise benefit tapered off.

Figure 2 shows the cross entropy for different RBP training regimens and the resulting NEM-RBP speed-up of convergence. Simply injecting faint blind (uniform) noise into the output neurons performed slightly better than no noise injection but not nearly as well as NEM noise injection. The figure also shows that noiseless RBP took 26 more iterations to reach the same cross-entropy value of 2.7. So NEM-boosted RBP needed 60% fewer training iterations to converge to the same low cross-entropy value as did noiseless RBP. The injection of NEM noise also improved the classification accuracy on the video-clip test set. This apparently occurred because the network likelihood lower-bounds the classification accuracy as we have shown elsewhere [12].

Fig. 2: Noise benefit comparisons. NEM noise benefit from noise injection into the output layer of an LSTM RNN. Adding NEM noise to the output neurons of the LSTM RNN improved the performance of RBP training. Training RBP with NEM noise in just the output softmax neurons led to 60% fewer iterations to reach a common low value of cross entropy. Noise-boosted RBP took 8 iterations to reach the cross-entropy value 2.7 while ordinary noiseless RBP took 34 iterations to reach the same cross-entropy value. There was a 20.6% maximum decrease in cross entropy. NEM noise can also inject into the hidden neurons. Blind noise gave little improvement. (The plot shows training cross entropy versus training iterations for no noise, blind noise, and NEM noise.)

Data: N input sets {X_1, ..., X_N} where each input set X_n = {x_n^(1), ..., x_n^(T)}; N target sets {Y_1, ..., Y_N} where each target set Y_n = {y_n^(1), ..., y_n^(T)}; number of epochs R for running backpropagation through time.
Result: Trained RNN weights.
Initialize the network weights Θ.
while epoch r: 1 → R
    while training input set n: 1 → N
        FEEDFORWARD:
        while t: 1 → T
            • Compute the activations of the input, forget, and output gates from x_n^(t) and a_n^{y(t-1)}.
            • Compute the memory-cell activation a_n^{s(t)} and the output activation a_n^{y(t)}.
        ADD NOISE:
        while t: 1 → T
            • Generate a noise vector n^(t).
            • If the NEM condition of Theorem 2 holds, (n^(t))^T ln a_n^{y(t)} ≥ 0: add the noise y_n^(t) ← y_n^(t) + n^(t).
            • Else: do nothing.
        BACKPROPAGATE ERROR THROUGH TIME:
        Initialize ∇E(Θ) = 0.
        while t: T → 1
            • Compute the cross entropy E^(t)(Θ).
            • Compute the cross-entropy gradient ∇E^(t)(Θ).
            • Update the gradient: ∇E(Θ) ← ∇E(Θ) + ∇E^(t)(Θ).
        Update the parameters Θ using gradient descent: Θ ← Θ − η ∇E(Θ).

Algorithm 1: Noisy RBP algorithm for training a classifier RNN with an LSTM architecture.
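A compact, self-contained sketch of the noisy training loop follows. A plain per-frame softmax classifier stands in for the LSTM so the code runs on its own; the symmetric candidate noise, the 1/n annealing schedule, the learning rate, and the plain gradient-descent step are simplifying assumptions (the paper anneals uniform noise and uses the Adam optimizer).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def train_noisy_rbp_sketch(data, I, K, epochs=5, lr=0.05, noise_scale=0.1, seed=0):
    """Training loop in the spirit of Algorithm 1: forward pass, annealed NEM
    noise on the targets, cross-entropy gradient, gradient-descent update."""
    rng = np.random.default_rng(seed)
    W, b = 0.01 * rng.normal(size=(K, I)), np.zeros(K)
    step = 0
    for _ in range(epochs):
        for X, Y in data:                      # X: T x I frames, Y: T x K 1-in-K targets
            step += 1
            A = softmax(X @ W.T + b)           # output activations a^{y(t)} for every frame
            n = rng.uniform(-noise_scale, noise_scale, size=Y.shape) / step  # annealed noise
            if np.sum(n * np.log(A)) < 0.0:    # NEM hyperplane condition (50) fails:
                n = np.zeros_like(n)           # inject no noise on this draw
            G = A - (Y + n)                    # d(cross entropy)/d(o^y) with noisy targets
            W -= lr * (G.T @ X) / len(X)       # gradient-descent step (the paper used Adam)
            b -= lr * G.mean(axis=0)
    return W, b

# toy data: 8 "clips" of T=6 frames with I=10 features and K=3 classes
rng = np.random.default_rng(1)
data = [(rng.normal(size=(6, 10)), np.eye(3)[rng.integers(0, 3, size=6)])
        for _ in range(8)]
W, b = train_noisy_rbp_sketch(data, I=10, K=3)
print(W.shape, b.shape)                        # (3, 10) (3,)
```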
We compared the classification accuracy of NEM-noise RBP training with the accuracy for noiseless RBP training. The accuracy measure counted all true positives. NEM noise produced a 2% relative gain in classification accuracy. The noiseless RBP training gave 81% classification accuracy on the test set. The injection of NEM noise gave 83% classification accuracy on the test set. This represents 134 more correct classifications than we found for noiseless RBP training. We ran the simulation on a Google cloud instance with 8 vCPUs, 30 GB RAM, and a 50 GB system disk.
Fig. 3: Hidden-neuron memory-size effects on injecting noise into the output neurons of the RNN. The NEM noise benefit became more pronounced as the number of hidden neurons increased. The best noise effect occurred with 200 memory neurons and tapered off thereafter. (Each panel plots training cross entropy against training iterations for no noise, blind noise, and NEM noise at a different number of hidden memory neurons, up to 200.)

V. CONCLUSION

Careful noise injection sped up convergence of the recurrent backpropagation (RBP) algorithm for video samples. The key insight in the noise boost is that ordinary BP is a special case of the generalized EM algorithm. Then noise-boosting EM leads to noise-boosting BP and its recurrent versions for both classification and regression. Simulations showed that noise-boosted RBP training substantially outperformed noiseless RBP for classifying the UCF sports-action YouTube video clips. Further noise benefits should result for noise injection into the hidden neurons and the use of convolutional masks [3].

REFERENCES

[1] M. D. Rodriguez, J. Ahmed, and M. Shah, "Action MACH: a spatio-temporal maximum average correlation height filter for action recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[2] K. Soomro and A. R. Zamir, "Action recognition in realistic sports videos," in Computer Vision in Sports. Springer, 2014.
[3] K. Audhkhasi, O. Osoba, and B. Kosko, "Noise-enhanced convolutional neural networks," Neural Networks, vol. 78, pp. 15–23, 2016.
[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[5] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, 2015.
[6] M. Jordan and T. Mitchell, "Machine learning: trends, perspectives, and prospects," Science, vol. 349, pp. 255–260, 2015.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, pp. 1–38, 1977.
[8] O. Osoba, S. Mitaim, and B. Kosko, "The noisy expectation-maximization algorithm," Fluctuation and Noise Letters, vol. 12, no. 3, 2013.
[9] O. Osoba and B. Kosko, "Noise-enhanced clustering and competitive learning algorithms," Neural Networks, Jan. 2013.
[10] O. Osoba and B. Kosko, "The noisy expectation-maximization algorithm for multiplicative noise injection," Fluctuation and Noise Letters, 2016.
[11] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[12] B. Kosko, K. Audhkhasi, and O. Osoba, "Noise can speed backpropagation learning and deep bidirectional pretraining," in review.
[13] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice Hall, 1991.
[14] P. Hänggi, "Stochastic resonance in biology," ChemPhysChem, vol. 3, no. 3, pp. 285–290, 2002.
[15] B. Kosko, Noise. Penguin, 2006.
[16] A. Patel and B. Kosko, "Stochastic resonance in continuous and spiking neuron models with Levy noise," IEEE Transactions on Neural Networks, vol. 19, no. 12, pp. 1993–2008, 2008.
[17] M. McDonnell, N. Stocks, C. Pearce, and D. Abbott, Stochastic Resonance: From Suprathreshold Stochastic Resonance to Stochastic Signal Quantization. Cambridge University Press, 2008.
[18] M. Wilde and B. Kosko, "Quantum forbidden-interval theorems for stochastic resonance," Journal of Physics A: Mathematical and Theoretical, vol. 42, no. 46, 2009.
[19] A. Patel and B. Kosko, "Error-probability noise benefits in threshold neural signal detection," Neural Networks, vol. 22, no. 5, pp. 697–706, 2009.
[20] B. Franzke and B. Kosko, "Using noise to speed up Markov chain Monte Carlo estimation," Procedia Computer Science, vol. 53, pp. 113–120, 2015.
[21] L. Gammaitoni, P. Hänggi, P. Jung, and F. Marchesoni, "Stochastic resonance," Reviews of Modern Physics, vol. 70, no. 1, pp. 223–287, January 1998.
[22] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 6645–6649.
[23] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625–2634.
[24] J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond short snippets: Deep networks for video classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4694–4702.
[25] P. J. Werbos, "Backpropagation through time: what it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[26] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[27] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649, 2013.
[28] K. Audhkhasi, O. Osoba, and B. Kosko, "Noisy hidden Markov models for speech recognition," in Proceedings of the 2013 International Joint Conference on Neural Networks. IEEE, 2013, pp. 1–6.
[29] P. Rodriguez, J. Wiles, and J. L. Elman, "A recurrent neural network that learns to count," Connection Science, vol. 11, no. 1, pp. 5–40, 1999.