A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

A Generalized Recurrent Neural Architecture for Text Classification with Multi-Task Learning

Honglun Zhang 1, Liqiang Xiao 1, Yongkun Wang 2, Yaohui Jin 1,2
1 State Key Lab of Advanced Optical Communication System and Network
2 Network and Information Center
Shanghai Jiao Tong University
{ykw}@sjtu.edu.cn

Abstract

Multi-task learning leverages potential correlations among related tasks to extract common features and yield performance gains. However, most previous works only consider simple or weak interactions, thereby failing to model complex correlations among three or more tasks. In this paper, we propose a multi-task learning architecture with four types of recurrent neural layers to fuse information across multiple related tasks. The architecture is structurally flexible and considers various interactions among tasks, which can be regarded as a generalized case of many previous works. Extensive experiments on five benchmark datasets for text classification show that our model can significantly improve the performances of related tasks with additional information from others.

1 Introduction

Neural network based models have been widely exploited with the prosperity of Deep Learning [Bengio et al., 2013] and have achieved inspiring performances on many NLP tasks, such as text classification [Chen et al., 2015; Liu et al., 2015a], semantic matching [Liu et al., 2016d; 2016a] and machine translation [Sutskever et al., 2014]. These models require little manual feature engineering and can represent words, sentences and documents as fixed-length vectors, which contain rich semantic information and are ideal for subsequent NLP tasks.

One formidable constraint of deep neural networks (DNN) is their strong reliance on large amounts of annotated corpus due to the substantial number of parameters to train. A DNN trained on limited data is prone to overfitting and incapable of generalizing well. However, the construction of large-scale high-quality labeled datasets is extremely labor-intensive. To mitigate the problem, these models usually employ a pre-trained lookup table, also known as Word Embedding [Mikolov et al., 2013b], to map words into vectors with semantic implications. However, this method just introduces extra knowledge and does not directly optimize the targeted task. The problem of insufficient annotated resources is not solved either.

Multi-task learning leverages potential correlations among related tasks to extract common features, implicitly increase corpus size and yield classification improvements. Inspired by [Caruana, 1997], there is a large literature dedicated to multi-task learning with neural network based models [Collobert and Weston, 2008; Liu et al., 2015b; 2016b; 2016c]. These models basically share some lower layers to capture common features and further feed them to subsequent task-specific layers, and can be classified into three types:

Type-I One dataset annotated with multiple labels, and one input with multiple outputs.

Type-II Multiple datasets with respective labels, and one input with multiple outputs, where samples from different tasks are fed into the models one by one, sequentially.

Type-III Multiple datasets with respective labels, and multiple inputs with multiple outputs, where samples from different tasks are jointly learned in parallel.

In this paper, we propose a generalized multi-task learning architecture with four types of recurrent neural layers for text classification. The architecture focuses on Type-III, which involves more complicated interactions but has not been researched yet. All the related tasks are jointly integrated into a single system and samples from different tasks are trained in parallel.
In our model, every two tasks can directly interact with each other and selectively absorb useful information, or communicate indirectly via a shared intermediate layer. We also design a global memory storage to share common features and collect interactions among all tasks. We conduct extensive experiments on five benchmark datasets for text classification. Compared to learning separately, jointly learning multiple related tasks in our model yields significant performance gains for each task.

Our contributions are three-fold:

Our model is structurally flexible and considers various interactions, so it can be regarded as a generalized case of many previous works with deliberate designs.

Our model allows for interactions among three or more tasks simultaneously, and samples from different tasks are trained in parallel with multiple inputs.

We consider different scenarios of multi-task learning and demonstrate strong results on several benchmark classification datasets. Our model outperforms most state-of-the-art baselines.

2 Problem Statements

2.1 Single-Task Learning

For a single supervised text classification task, the input is a word sequence denoted by x = {x_1, x_2, ..., x_T}, and the output is the corresponding class label y or class distribution y. A lookup layer is used first to get the vector representation x_i ∈ R^d of each word x_i. A classification model f is trained to transform each x = {x_1, x_2, ..., x_T} into a predicted distribution ŷ:

f(x_1, x_2, ..., x_T) = ŷ

and the training objective is to minimize the total cross-entropy of the predicted and true distributions over all samples:

L = - Σ_{i=1}^{N} Σ_{j=1}^{C} y_{ij} log ŷ_{ij}

where N denotes the number of training samples and C is the class number.

2.2 Multi-Task Learning

Given K supervised text classification tasks T_1, T_2, ..., T_K, a jointly learning model F is trained to transform multiple inputs into a combination of predicted distributions in parallel:

F(x^(1), x^(2), ..., x^(K)) = (ŷ^(1), ŷ^(2), ..., ŷ^(K))

where the x^(k) are sequences from each task and the ŷ^(k) are the corresponding predictions. The overall training objective of F is to minimize the weighted linear combination of the costs of all tasks:

L = - Σ_{i=1}^{N} Σ_{k=1}^{K} λ_k Σ_{j=1}^{C_k} y^(k)_{ij} log ŷ^(k)_{ij}  (4)

where N denotes the number of sample collections, and C_k and λ_k are the class number and weight of each task T_k respectively.

2.3 Three Perspectives of Multi-Task Learning

Different tasks may differ in the characteristics of the word sequences x or the labels y. We compare many benchmark tasks for text classification and identify three different perspectives of multi-task learning.

Multi-Cardinality Tasks are similar except for cardinality parameters, for example, movie review datasets with different average sequence lengths and class numbers.

Multi-Domain Tasks involve contents of different domains, for example, product review datasets on books, DVDs, electronics and kitchen appliances.

Multi-Objective Tasks are designed for different objectives, for example, sentiment analysis, topic classification and question type judgment.

The simplest multi-task learning scenario is that all tasks share the same cardinality, domain and objective while coming from different sources, so it is intuitive that they can obtain useful information from each other. However, in the most complex scenario, tasks may vary in cardinality, domain and even objective, where the interactions among different tasks can be quite complicated and implicit. We evaluate our model on different scenarios in the Experiment section.

3 Methodology

Recently, neural network based models have attracted substantial interest in many natural language processing tasks for their capability to represent variable-length text sequences as fixed-length vectors, for example, Neural Bag-of-Words (NBOW), Recurrent Neural Networks (RNN), Recursive Neural Networks (RecNN) and Convolutional Neural Networks (CNN). Most of them first map sequences of words, n-grams or other semantic units into embedding representations with a pre-trained lookup table, then fuse these vectors with different neural network architectures, and finally utilize a softmax layer to predict the categorical distribution for a specific classification task. In a recurrent neural network, input vectors are absorbed one by one in a recurrent way, which makes RNNs particularly suitable for natural language processing tasks.

3.1 Recurrent Neural Network

A recurrent neural network maintains an internal hidden state vector h_t that is recurrently updated by a transition function f. At each time step t, the hidden state h_t is updated according to the current input vector x_t and the previous hidden state h_{t-1}:
h_t = 0 if t = 0, and h_t = f(h_{t-1}, x_t) otherwise  (5)

where f is usually a composition of an element-wise nonlinearity with an affine transformation of both x_t and h_{t-1}. In this way, a recurrent neural network can encode a sequence of arbitrary length into a fixed-length vector and feed it to a softmax layer for text classification or other NLP tasks. However, the gradient of f can grow or decay exponentially over long sequences during training, known as the gradient exploding or vanishing problem, which makes it difficult for RNNs to learn long-term dependencies and correlations.

[Hochreiter and Schmidhuber, 1997] proposed the Long Short-Term Memory network (LSTM) to tackle the above problems. Apart from the internal hidden state h_t, an LSTM also maintains an internal memory cell and three gating mechanisms. While there are numerous variants of the standard LSTM, here we follow the implementation of [Graves, 2013]. At each time step t, the state of the LSTM can be fully represented by five vectors in R^n, an input gate i_t, a forget gate f_t, an output gate o_t, the hidden state h_t and the memory cell c_t, which adhere to the following transition functions:

i_t = σ(W_i x_t + U_i h_{t-1} + V_i c_{t-1} + b_i)  (6)
f_t = σ(W_f x_t + U_f h_{t-1} + V_f c_{t-1} + b_f)  (7)
o_t = σ(W_o x_t + U_o h_{t-1} + V_o c_{t-1} + b_o)  (8)
c̃_t = tanh(W_c x_t + U_c h_{t-1})  (9)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t  (10)
h_t = o_t ⊙ tanh(c_t)  (11)

where x_t is the current input, σ denotes the logistic sigmoid function and ⊙ denotes element-wise multiplication. By selectively controlling which portions of the memory cell c_t to update, erase and forget at each time step, an LSTM can better comprehend long-term dependencies with respect to the labels of whole sequences.
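To make the transition functions concrete, the following is a minimal NumPy sketch of Eqs. (6)-(11). The function name, the dictionary layout and the toy usage at the end are our own illustration, not the authors' code; the weight shapes follow the hyperparameters reported later (d = 300, n = 100).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Eqs. (6)-(11); p holds the weight matrices and biases."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["V_i"] @ c_prev + p["b_i"])  # Eq. (6)
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["V_f"] @ c_prev + p["b_f"])  # Eq. (7)
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["V_o"] @ c_prev + p["b_o"])  # Eq. (8)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev)                             # Eq. (9)
    c_t = f_t * c_prev + i_t * c_tilde                                                # Eq. (10)
    h_t = o_t * np.tanh(c_t)                                                          # Eq. (11)
    return h_t, c_t

# Toy usage with the sizes of Table 2 (d = 300, n = 100) and random weights.
rng = np.random.default_rng(0)
d, n = 300, 100
shapes = [("W", (n, d)), ("U", (n, n)), ("V", (n, n)), ("b", (n,))]
p = {f"{m}_{g}": rng.normal(0, 0.1, s) for g in "ifo" for m, s in shapes}
p.update({"W_c": rng.normal(0, 0.1, (n, d)), "U_c": rng.normal(0, 0.1, (n, n))})
h, c = np.zeros(n), np.zeros(n)
for x_t in rng.normal(0, 1, (19, d)):  # a 19-word sequence, the average SST length in Table 1
    h, c = lstm_step(x_t, h, c, p)
```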

3.2 A Generalized Architecture

Based on the LSTM implementation of [Graves, 2013], we propose a generalized multi-task learning architecture for text classification with four types of recurrent neural layers to convey information inside and among tasks. Figure 1 illustrates the structural design and information flows of our model, where three tasks are jointly learned in parallel. As Figure 1a shows, each task owns an LSTM-based Single Layer for intra-task learning. The Pair-wise Coupling Layer and Local Fusion Layer are designed for direct and indirect inter-task interactions, and we further utilize a Global Fusion Layer to maintain a global memory for information shared among all tasks.

Single Layer

Each task owns an LSTM-based Single Layer with a collection of parameters Φ^(k); taking Eq. (9) as an example:

c̃_t^(k) = tanh(W_c^(k) x_t^(k) + U_c^(k) h_{t-1}^(k))  (12)

Input sequences of each task are transformed into vector representations (x^(1), x^(2), ..., x^(K)), which are then recurrently fed into the corresponding Single Layers. The hidden state at the last time step, h_T^(k), of each Single Layer can be regarded as a fixed-length representation of the whole sequence, and is followed by a fully connected layer and a softmax non-linear layer to produce the class distribution:

ŷ^(k) = softmax(W^(k) h_T^(k) + b^(k))  (13)

where ŷ^(k) is the predicted class distribution for x^(k).

Coupling Layer

Besides Single Layers, we design Coupling Layers to model direct pair-wise interactions between tasks. For each pair of tasks, the hidden states and memory cells of the Single Layers can obtain extra information directly from each other, as shown in Figure 1b. We re-define Eq. (12) and utilize a gating mechanism to control the portion of information flows from one task to another. The memory content of each Single Layer is updated on the leverage of pair-wise couplings:

c̃_t^(k) = tanh(W_c^(k) x_t^(k) + Σ_{j=1}^{K} g_t^(j→k) U_c^(j→k) h_{t-1}^(j))  (14)
g_t^(j→k) = σ(W_gc^(k) x_t^(k) + U_gc^(j) h_{t-1}^(j))  (15)

where g_t^(j→k) controls the portion of the information flow from T_j to T_k, based on the correlation strength between x_t^(k) and h_{t-1}^(j) at the current time step. In this way, the hidden states and memory cells of each Single Layer can obtain extra information from other tasks, and stronger relevance results in higher chances of reception.

[Figure 1: A generalized recurrent neural architecture for modeling text with multi-task learning. (a) Overall architecture with Single Layers, Coupling Layers, Local Fusion Layers and the Global Fusion Layer; (b) details of the Coupling Layer between T_1 and T_2; (c) details of the Local Fusion Layer between T_1 and T_2.]

Local Fusion Layer

Different from Coupling Layers, Local Fusion Layers introduce a shared bi-directional LSTM layer to model indirect pair-wise interactions between tasks. For each pair of tasks, we feed the Local Fusion Layer with the concatenation of both inputs, x_t^(j,k) = x_t^(j) ⊕ x_t^(k), as shown in Figure 1c.

We denote the output of the Local Fusion Layer as h_t^(j,k) = h→_t^(j,k) ⊕ h←_t^(j,k), the concatenation of the hidden states of the forward and backward LSTMs at each time step. Similar to the Coupling Layers, the hidden states and memory cells of the Single Layers can selectively decide how much information to accept from the pair-wise Local Fusion Layers. We re-define Eq. (14) by considering the interactions between the memory content and the outputs of the Local Fusion Layers as follows:

c̃_t^(k) = tanh(W_c^(k) x_t^(k) + C_t^(k) + LF_t^(k))  (16)
LF_t^(k) = Σ_{j=1, j≠k}^{K} g_t^(j,k) U_c^(j,k) h_t^(j,k)  (17)
g_t^(j,k) = σ(W_gf^(k) x_t^(k) + U_gf^(j) h_t^(j,k))  (18)

where C_t^(k) denotes the coupling term of Eq. (14) and LF_t^(k) represents the local fusion term. Again, we employ a gating mechanism g_t^(j,k) to control the portion of the information flow from the Local Fusion Layers to T_k.

Global Fusion Layer

Indirect interactions between Single Layers can be pair-wise or global, so we further propose the Global Fusion Layer as a shared memory storage among all tasks. The Global Fusion Layer consists of a bi-directional LSTM layer with inputs x_t^(g) = x_t^(1) ⊕ ... ⊕ x_t^(K) and outputs h_t^(g) = h→_t^(g) ⊕ h←_t^(g). We denote the global fusion term as GF_t^(k), and the memory content is calculated as follows:

c̃_t^(k) = tanh(W_c^(k) x_t^(k) + C_t^(k) + LF_t^(k) + GF_t^(k))  (19)
GF_t^(k) = σ(W_gg^(k) x_t^(k) + U_gg^(k) h_t^(g)) ⊙ U_c^(g) h_t^(g)  (20)

As a result, our architecture covers complicated interactions among different tasks. It is capable of mapping a collection of input sequences from different tasks into a combination of predicted class distributions in parallel, as formulated in Section 2.2.

3.3 Sampling & Training

Most previous multi-task learning models [Collobert and Weston, 2008; Liu et al., 2015b; 2016b; 2016c] belong to Type-I or Type-II. The total number of input samples is N = Σ_{k=1}^{K} N_k, where N_k is the number of samples of each task. However, our model focuses on Type-III and requires a 4-D tensor of shape N × K × T × d as input, where N, K, T, d are the total number of input collections, the number of tasks, the sequence length and the embedding size respectively. Samples from different tasks are jointly learned in parallel, so the total number of all possible input collections is N_max = Π_{k=1}^{K} N_k. We propose a Task Oriented Sampling (TOS) algorithm to generate sample collections that improve a specific task T_k.

Algorithm 1 Task Oriented Sampling
Input: N_i samples from each task T_i; k, the oriented task index; n_0, the upsampling coefficient such that N = n_0 N_k
Output: sequence collections X and label combinations Y
1: for each i ∈ [1, K] do
2:   generate a set S_i with N samples for each task:
3:   if i = k then
4:     repeat each sample n_0 times
5:   else if N_i ≥ N then
6:     randomly select N samples without replacement
7:   else
8:     randomly select N samples with replacement
9:   end if
10: end for
11: for each j ∈ [1, N] do
12:   randomly select a sample from each S_i without replacement
13:   combine their features and labels as X_j and Y_j
14: end for
15: merge all X_j and Y_j to produce the sequence collections X and label combinations Y

Given the generated sequence collections X and label combinations Y, the overall loss function can be calculated based on Eqs. (4) and (13). The training process is conducted in a stochastic manner until convergence: in each loop, we randomly select a collection from the N candidates and update the parameters by taking a gradient step.
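A runnable reading of Algorithm 1 is sketched below, under the assumption that each task's data is given as a list of (features, label) pairs; the function name and container layout are ours, not the paper's.

```python
import random

def task_oriented_sampling(tasks, k, n0):
    """Sketch of Algorithm 1 (Task Oriented Sampling).
    tasks: list of K lists, each holding (features, label) pairs for one task;
    k: index of the oriented task; n0: upsampling coefficient.
    Returns sequence collections X and label combinations Y."""
    N = n0 * len(tasks[k])
    S = []
    for i, samples in enumerate(tasks):
        if i == k:
            S_i = [s for s in samples for _ in range(n0)]      # repeat each sample n0 times
        elif len(samples) >= N:
            S_i = random.sample(samples, N)                    # N samples without replacement
        else:
            S_i = [random.choice(samples) for _ in range(N)]   # N samples with replacement
        random.shuffle(S_i)
        S.append(S_i)
    X, Y = [], []
    for j in range(N):
        collection = [S_i[j] for S_i in S]                     # one sample per task, no reuse
        X.append([features for features, _ in collection])
        Y.append([label for _, label in collection])
    return X, Y
```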
4 Experiment

In this section, we design three different scenarios of multi-task learning based on five benchmark datasets for text classification. We investigate the empirical performances of our model and compare it to existing state-of-the-art models.

4.1 Datasets

As Table 1 shows, we select five benchmark datasets for text classification and design three experiment scenarios to evaluate the performances of our model.

Multi-Cardinality Movie review datasets with different average lengths and class numbers, including SST-1 [Socher et al., 2013], SST-2 and IMDB [Maas et al., 2011].

Multi-Domain Product review datasets on different domains from the Multi-Domain Sentiment Dataset [Blitzer et al., 2007], including Books, DVDs, Electronics and Kitchen.

Multi-Objective Classification datasets with different objectives, including IMDB, RN [Apté et al., 1994] and QC [Li and Roth, 2002].

4.2 Hyperparameters and Training

The whole network is trained through back propagation with stochastic gradient descent [Amari, 1993]. We obtain a pre-trained lookup table by applying Word2Vec [Mikolov et al., 2013a] to the Google News corpus, which contains more than 100B words with a vocabulary size of about 3M. All involved parameters are randomly initialized from a truncated normal distribution with zero mean and a fixed standard deviation.
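As a rough illustration of these training mechanics, the sketch below draws a parameter matrix from a truncated normal (re-drawing entries beyond two standard deviations, one common reading of the term) and applies a plain SGD update with the initial learning rate of Table 2; the helper names and the standard deviation of 0.01 are assumptions.

```python
import numpy as np

def truncated_normal(shape, std, rng):
    """Sample from N(0, std^2), re-drawing entries beyond two standard deviations."""
    w = rng.normal(0.0, std, size=shape)
    out_of_range = np.abs(w) > 2 * std
    while out_of_range.any():
        w[out_of_range] = rng.normal(0.0, std, size=out_of_range.sum())
        out_of_range = np.abs(w) > 2 * std
    return w

def sgd_step(param, grad, lr=0.1):
    """One stochastic gradient descent update (initial learning rate from Table 2)."""
    return param - lr * grad

rng = np.random.default_rng(0)
W = truncated_normal((100, 300), std=0.01, rng=rng)  # e.g. an n x d LSTM input matrix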

Table 1: Five benchmark classification datasets: SST, IMDB, MDSD, RN and QC.

Dataset  Description                                                              Type      Length                Class  Objective
SST      Movie reviews in Stanford Sentiment Treebank, including SST-1 and SST-2  Sentence  19 / 19               5 / 2  Sentiment
IMDB     Internet Movie Database                                                  Document                               Sentiment
MDSD     Product reviews on books, DVDs, electronics and kitchen appliances       Document  176 / 189 / 115 / 97  2      Sentiment
RN       Reuters Newswire topics classification                                   Document                               Topics
QC       Question Classification                                                  Sentence  10                    6      Question Types

For each task T_k, we conduct TOS with n_0 = 2 to improve its performance. After training our model on the generated sample collections, we evaluate the performance of task T_k by comparing ŷ^(k) and y^(k) on the test set. We apply 10-fold cross-validation and investigate different combinations of hyperparameters, of which the best one, as shown in Table 2, is reserved for comparisons with state-of-the-art models.

Table 2: Hyperparameter settings
Embedding size               d = 300
Hidden layer size of LSTM    n = 100
Initial learning rate        η = 0.1
Regularization weight        λ =

4.3 Results

We compare the performance of our model with the implementation of [Graves, 2013], and the results are shown in Table 3. Our model obtains better performances in the Multi-Domain scenario, with an average improvement of 4.5%, where the datasets are product reviews on different domains with similar sequence lengths and the same class number, thus producing stronger correlations. The Multi-Cardinality scenario also achieves a significant improvement of 2.77% on average, where the datasets are movie reviews with different cardinalities. However, the Multi-Objective scenario benefits less from multi-task learning due to the lack of salient correlations among sentiment, topic and question type. The QC dataset aims to classify each question into six categories, and its performance even gets worse, which may be caused by potential noise introduced by other tasks. In practice, the structure of our model is flexible, as couplings and fusions between empirically unrelated tasks can be removed to alleviate computation costs.

Influences of n_0 in TOS

We further explore the influence of n_0 in TOS on our model, which can be any positive integer. A higher value means larger and more varied sample combinations, but requires higher computation costs. Figure 2 shows the performances of the datasets in the Multi-Domain scenario with different n_0. Compared to n_0 = 1, our model achieves considerable improvements when n_0 = 2, as more sample combinations are available. However, there are no further salient gains as n_0 gets larger, and potential noise from other tasks may lead to performance degradation. As a trade-off between efficiency and effectiveness, we determine n_0 = 2 to be the optimal value for our experiments.

[Figure 2: Influences of n_0 in TOS on different datasets.]

Pair-wise Performance Gain

In order to measure the correlation strength between two tasks T_i and T_j, we learn them jointly with our model and define the Pair-wise Performance Gain as PPG_ij = (P'_i P'_j) / (P_i P_j), where P_i, P_j and P'_i, P'_j are the performances of tasks T_i and T_j when learned individually and jointly, respectively. We calculate PPGs for every two tasks in Table 1 and illustrate the results in Figure 3, where the darkness of the colors indicates the strength of correlation. It is intuitive that the datasets of the Multi-Domain scenario obtain relatively higher PPGs with each other, as they share similar cardinalities and abundant low-level linguistic characteristics. Sentences of the QC dataset are much shorter and convey characteristics distinct from the other tasks, thus resulting in quite lower PPGs.
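As a small worked example of the PPG definition above (the accuracy values are made up for illustration, and the formula follows the reconstruction given in the text):

```python
def pairwise_performance_gain(p_i, p_j, p_i_joint, p_j_joint):
    """PPG_ij per the definition above: product of the jointly-learned performances
    of T_i and T_j divided by the product of their individually-learned performances."""
    return (p_i_joint * p_j_joint) / (p_i * p_j)

# Hypothetical accuracies for two domains learned alone vs. jointly.
print(pairwise_performance_gain(0.80, 0.81, 0.84, 0.85))  # ~1.10: joint learning helped both tasks
```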
4.4 Comparisons with State-of-the-art Models

We apply the optimal hyperparameter settings and compare our model against the following state-of-the-art models:

NBOW Neural Bag-of-Words, which simply sums up the embedding vectors of all words.

PV Paragraph Vectors followed by logistic regression [Le and Mikolov, 2014].

MT-RNN Multi-Task learning with Recurrent Neural Networks through a shared-layer architecture [Liu et al., 2016c].

MT-CNN Multi-Task learning with Convolutional Neural Networks [Collobert and Weston, 2008] in which the lookup tables are partially shared.

MT-DNN Multi-Task learning with Deep Neural Networks [Liu et al., 2015b], which utilizes bag-of-word representations and a shared hidden layer.

GRNN Gated Recursive Neural Network for sentence modeling [Chen et al., 2015].

[Table 3: Results of our model on different scenarios, reporting Single Task and Our Model for Multi-Cardinality (SST-1, SST-2, IMDB), Multi-Domain (Books, DVDs, Electronics, Kitchen) and Multi-Objective (IMDB, RN, QC).]

[Table 4: Comparisons with state-of-the-art models (NBOW, PV, MT-RNN, MT-CNN, MT-DNN, GRNN and Our Model) on SST-1, SST-2, IMDB, Books, DVDs, Electronics, Kitchen and QC.]

[Figure 3: Visualization of Pair-wise Performance Gains among SST-1, SST-2, IMDB, Books, DVDs, Electronics, Kitchen and QC.]

As Table 4 shows, our model obtains competitive or better performances on all tasks except for the QC dataset, as it has poor correlations with the other tasks. MT-RNN slightly outperforms our model on SST, as sentences from this dataset are much shorter than those from IMDB and MDSD; another possible reason may be that our model is more complex and requires more data for training. Our model proposes the designs of various interactions, including coupling, local fusion and global fusion, which can be further implemented by other state-of-the-art models to produce better performances.

5 Related Work

There is a large body of literature related to multi-task learning with neural networks in NLP [Collobert and Weston, 2008; Liu et al., 2015b; 2016b; 2016c].

[Collobert and Weston, 2008] belongs to Type-I and utilizes shared lookup tables for common features, followed by task-specific neural layers, for several traditional NLP tasks such as part-of-speech tagging and semantic parsing. They use a fixed-size window to cope with variable-length texts, which can be better handled by recurrent neural networks.

[Liu et al., 2015b; 2016b; 2016c] all belong to Type-II, where samples from different tasks are learned sequentially. [Liu et al., 2015b] applies bag-of-word representations, so information about word order is lost. [Liu et al., 2016b] introduces an external memory for information sharing with a reading/writing mechanism for communication, and [Liu et al., 2016c] proposes three different models for multi-task learning with recurrent neural networks. However, the models of these two papers only involve pair-wise interactions, which can be regarded as specific implementations of the Coupling Layer and Fusion Layers in our model.

Different from the above models, our model focuses on Type-III and utilizes recurrent neural networks to comprehensively capture various interactions among tasks, both direct and indirect, local and global. Three or more tasks are learned simultaneously, and samples from different tasks are trained in parallel, benefiting from each other and thus obtaining better sentence representations.

6 Conclusion and Future Work

In this paper, we propose a multi-task learning architecture for text classification with four types of recurrent neural layers. The architecture is structurally flexible and can be regarded as a generalized case of many previous works with deliberate designs. We explore three different scenarios of multi-task learning, and in all scenarios our model can improve the performances of most tasks with additional related information from others. In future work, we would like to investigate further implementations of couplings and fusions, and identify more multi-task learning perspectives.

References

[Amari, 1993] Shun-ichi Amari. Backpropagation and stochastic gradient descent method. Neurocomputing, 5, 1993.
[Apté et al., 1994] Chidanand Apté, Fred Damerau, and Sholom M. Weiss. Automated Learning of Decision Rules for Text Categorization. ACM Transactions on Information Systems, 12, 1994.
[Bengio et al., 2013] Yoshua Bengio, Aaron C. Courville, and Pascal Vincent. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798-1828, 2013.
[Blitzer et al., 2007] John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In ACL, 2007.
[Caruana, 1997] Rich Caruana. Multitask Learning. Machine Learning, 28:41-75, 1997.
[Chen et al., 2015] Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu, and Xuanjing Huang. Sentence Modeling with Gated Recursive Neural Network. In EMNLP, 2015.
[Collobert and Weston, 2008] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, pages 160-167, 2008.
[Graves, 2013] Alex Graves. Generating Sequences With Recurrent Neural Networks. CoRR, abs/1308.0850, 2013.
[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735-1780, 1997.
[Le and Mikolov, 2014] Quoc V. Le and Tomas Mikolov. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning, ICML 2014, Beijing, China, June 2014.
[Li and Roth, 2002] Xin Li and Dan Roth. Learning Question Classifiers. In COLING, 2002.
[Liu et al., 2015a] Pengfei Liu, Xipeng Qiu, Xinchi Chen, Shiyu Wu, and Xuanjing Huang. Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents. In EMNLP, 2015.
[Liu et al., 2015b] Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-Yi Wang. Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval. In NAACL HLT, 2015.
[Liu et al., 2016a] Pengfei Liu, Xipeng Qiu, Jifan Chen, and Xuanjing Huang. Deep Fusion LSTMs for Text Semantic Matching. In ACL, 2016.
[Liu et al., 2016b] Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. Deep Multi-Task Learning with Shared Memory for Text Classification. In EMNLP, 2016.
[Liu et al., 2016c] Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. Recurrent Neural Network for Text Classification with Multi-Task Learning. In IJCAI, 2016.
[Liu et al., 2016d] Pengfei Liu, Xipeng Qiu, Yaqian Zhou, Jifan Chen, and Xuanjing Huang. Modelling Interaction of Sentence Pair with Coupled-LSTMs. In EMNLP, 2016.
[Maas et al., 2011] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning Word Vectors for Sentiment Analysis. In NAACL HLT. Association for Computational Linguistics, June 2011.
[Mikolov et al., 2013a] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. CoRR, abs/1301.3781, 2013.
[Mikolov et al., 2013b] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems, 2013.
[Socher et al., 2013] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In EMNLP, Stroudsburg, PA, October 2013. Association for Computational Linguistics.
[Sutskever et al., 2014] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems, pages 3104-3112, 2014.


More information

A unit root test based on smooth transitions and nonlinear adjustment

A unit root test based on smooth transitions and nonlinear adjustment MPRA Munich Personal RePEc Archive A uni roo es based on smooh ransiions and nonlinear adjusmen Aycan Hepsag Isanbul Universiy 5 Ocober 2017 Online a hps://mpra.ub.uni-muenchen.de/81788/ MPRA Paper No.

More information

Tensorial Recurrent Neural Networks for Longitudinal Data Analysis

Tensorial Recurrent Neural Networks for Longitudinal Data Analysis 1 Tensorial Recurren Neural Neworks for Longiudinal Daa Analysis Mingyuan Bai Boyan Zhang and Junbin Gao arxiv:1708.00185v1 [cs.lg] 1 Aug 2017 Absrac Tradiional Recurren Neural Neworks assume vecorized

More information

EE100 Lab 3 Experiment Guide: RC Circuits

EE100 Lab 3 Experiment Guide: RC Circuits I. Inroducion EE100 Lab 3 Experimen Guide: A. apaciors A capacior is a passive elecronic componen ha sores energy in he form of an elecrosaic field. The uni of capaciance is he farad (coulomb/vol). Pracical

More information

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions

Inventory Analysis and Management. Multi-Period Stochastic Models: Optimality of (s, S) Policy for K-Convex Objective Functions Muli-Period Sochasic Models: Opimali of (s, S) Polic for -Convex Objecive Funcions Consider a seing similar o he N-sage newsvendor problem excep ha now here is a fixed re-ordering cos (> 0) for each (re-)order.

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

arxiv: v1 [cs.lg] 18 Jul 2018

arxiv: v1 [cs.lg] 18 Jul 2018 General Value Funcion Neworks Mahew Schlegel Universiy of Albera mkschleg@ualbera.ca Adam Whie Universiy of Albera amw8@ualbera.ca Andrew Paerson Indiana Universiy andnpa@indiana.edu arxiv:1807.06763v1

More information

Pattern Classification (VI) 杜俊

Pattern Classification (VI) 杜俊 Paern lassificaion VI 杜俊 jundu@usc.edu.cn Ouline Bayesian Decision Theory How o make he oimal decision? Maximum a oserior MAP decision rule Generaive Models Join disribuion of observaion and label sequences

More information

Presentation Overview

Presentation Overview Acion Refinemen in Reinforcemen Learning by Probabiliy Smoohing By Thomas G. Dieerich & Didac Busques Speaer: Kai Xu Presenaion Overview Bacground The Probabiliy Smoohing Mehod Experimenal Sudy of Acion

More information

5 The fitting methods used in the normalization of DSD

5 The fitting methods used in the normalization of DSD The fiing mehods used in he normalizaion of DSD.1 Inroducion Sempere-Torres e al. 1994 presened a general formulaion for he DSD ha was able o reproduce and inerpre all previous sudies of DSD. The mehodology

More information

Fusion and Inference from Multiple Data Sources in a Commensurate Space

Fusion and Inference from Multiple Data Sources in a Commensurate Space Fusion and Inference from Muliple Daa Sources in a Commensurae Space Zhiliang Ma 1, David. Marchee 2 and Carey E. Priebe 1 1 Applied Mahemaics & Saisics, ohns Hopkins Universiy, Balimore, MD, USA 2 Naval

More information

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate.

Introduction D P. r = constant discount rate, g = Gordon Model (1962): constant dividend growth rate. Inroducion Gordon Model (1962): D P = r g r = consan discoun rae, g = consan dividend growh rae. If raional expecaions of fuure discoun raes and dividend growh vary over ime, so should he D/P raio. Since

More information

References are appeared in the last slide. Last update: (1393/08/19)

References are appeared in the last slide. Last update: (1393/08/19) SYSEM IDEIFICAIO Ali Karimpour Associae Professor Ferdowsi Universi of Mashhad References are appeared in he las slide. Las updae: 0..204 393/08/9 Lecure 5 lecure 5 Parameer Esimaion Mehods opics o be

More information

Temporal Abstraction in Temporal-difference Networks

Temporal Abstraction in Temporal-difference Networks Temporal Absracion in Temporal-difference Neworks Richard S. Suon, Eddie J. Rafols, Anna Koop Deparmen of Compuing Science Universiy of Albera Edmonon, AB, Canada T6G 2E8 {suon,erafols,anna}@cs.ualbera.ca

More information

1 Review of Zero-Sum Games

1 Review of Zero-Sum Games COS 5: heoreical Machine Learning Lecurer: Rob Schapire Lecure #23 Scribe: Eugene Brevdo April 30, 2008 Review of Zero-Sum Games Las ime we inroduced a mahemaical model for wo player zero-sum games. Any

More information