ON THE COMPRESSION OF RECURRENT NEURAL NETWORKS WITH AN APPLICATION TO LVCSR ACOUSTIC MODELING FOR EMBEDDED SPEECH RECOGNITION


Rohit Prabhavalkar*, Ouais Alsharif*, Antoine Bruguier, Ian McGraw

Google Inc.

ABSTRACT

We study the problem of compressing recurrent neural networks (RNNs). In particular, we focus on the compression of RNN acoustic models, which are motivated by the goal of building compact and accurate speech recognition systems which can be run efficiently on mobile devices. In this work, we present a technique for general recurrent model compression that jointly compresses both recurrent and non-recurrent inter-layer weight matrices. We find that the proposed technique allows us to reduce the size of our Long Short-Term Memory (LSTM) acoustic model to a third of its original size with negligible loss in accuracy.

Index Terms: model compression, LSTM, RNN, SVD, embedded speech recognition

1. INTRODUCTION

Neural networks (NNs) with multiple feed-forward [1, 2] or recurrent hidden layers [3, 4] have emerged as state-of-the-art acoustic models (AMs) for automatic speech recognition (ASR) tasks. Advances in computational capabilities, coupled with the availability of large annotated speech corpora, have made it possible to train NN-based AMs with a large number of parameters [5] with great success. As speech recognition technologies continue to improve, they are becoming increasingly ubiquitous on mobile devices: voice assistants such as Apple's Siri, Microsoft's Cortana, Amazon's Alexa and Google Now [6] enable users to search for information using their voice. Although the traditional mode for these applications has been to recognize speech remotely on large servers, there has been growing interest in developing ASR technologies that can recognize the input speech directly on-device [7]. This has the promise of reducing latency while enabling user interaction even in cases where a mobile data connection is either unavailable, slow or unreliable. Some of the main challenges in this regard are the disk, memory and computational constraints imposed by these devices. Since the number of operations in neural networks is proportional to the number of model parameters, compressing the model is desirable from the point of view of reducing memory usage and power consumption.

* Equal contribution. The authors would like to thank Haşim Sak and Raziel Alvarez for helpful comments and suggestions on this work, and Chris Thornton and Yu-hsin Chen for comments on an earlier draft.

In this paper, we study techniques for compressing recurrent neural networks (RNNs), specifically RNN acoustic models. We demonstrate how a generalization of conventional inter-layer matrix factorization techniques (e.g., [8, 9]), where we jointly compress both recurrent and inter-layer weight matrices, allows us to compress acoustic models to up to a third of their original size with negligible loss in accuracy. While we focus on acoustic modeling, the techniques presented here can be applied to RNNs in other domains, e.g., handwriting recognition [10] and machine translation [11], inter alia. The technique presented in this paper encompasses both traditional recurrent neural networks (RNNs) as well as Long Short-Term Memory (LSTM) neural networks.

In Section 2, we review previous work that has focused on techniques for compressing neural networks. Our proposed compression technique is presented in Section 3. We examine the effectiveness of the proposed techniques in Sections 4 and 5. Finally, we conclude with a discussion of our findings in Section 6.

2. RELATED WORK

There have been a number of previous proposals to compress neural networks, both in the context of ASR as well as in the broader field of machine learning.
We summarize a number of proposed approaches in this section. It has been noted in previous work that there is a large amount of redundancy in the parameters of a neural network. For example, Denil et al. [12] show that the entire neural network can be reconstructed given the values of a small number of its parameters. Caruana and colleagues show that the output distribution learned by a larger neural network can be approximated by a neural network with fewer parameters, by training the smaller network to directly predict the outputs of the larger network [13, 14]. This approach, termed model compression [13], is closely related to the recent distillation approach proposed by Hinton et al. [15]. The redundancy in a neural network has also been exploited in the HashedNets approach of Chen et al. [16], which imposes parameter tying in the network based on a set of hash functions.

In the context of ASR, previous approaches to acoustic model compression have focused mainly on the case of feed-forward DNNs. One popular technique is based on sparsifying the weight matrices in the neural network, for example by setting weights whose magnitude falls below a certain threshold to zero [1], or based on the second derivative of the loss function, as in the optimal brain damage procedure [17]. In fact, Seide et al. [1] demonstrate that up to two-thirds of the weights of the feed-forward network can be set to zero without incurring any loss in performance. Although techniques based on sparsification do decrease the number of effective weights, encoding the subset of weights which can be zeroed out requires additional memory. Further, if the weight matrices are represented as dense matrices for efficient computation, then the parameter savings on disk will not translate into savings of runtime memory. Other techniques reduce the number of model parameters by changing the neural network architecture, e.g., by introducing bottleneck layers [18] or through a low-rank matrix factorization layer [19]. We also note recent work by Wang et al. [20], which uses a combination of singular value decomposition (SVD) and vector quantization to compress acoustic models.

The methods investigated in our work are most similar to previous work that has examined using SVD to reduce the number of parameters in the network in the context of feed-forward DNNs [8, 9, 21]. As we describe in Section 3, our methods can be thought of as an extension of the techniques proposed by Xue et al. [8], wherein we jointly factorize both recurrent and (non-recurrent) inter-layer weight matrices in the network.

3. MODEL COMPRESSION

In this section, we present a general technique for compressing individual recurrent layers in a recurrent neural network, thus generalizing the methods proposed by Xue et al. [8]. We describe our approach in the most general setting of a standard RNN. We denote the activations of the l-th hidden layer, consisting of N_l nodes, at time t by h_t^l ∈ R^{N_l}. The inputs to this layer at time t (which are in turn the activations from the previous layer, or the input features) are denoted by h_t^{l-1} ∈ R^{N_{l-1}}. We can then write the following equations, which define the output activations of the l-th and (l+1)-th layers in a standard RNN:

  h_t^l = σ(W_x^{l-1} h_t^{l-1} + W_h^l h_{t-1}^l + b^l)                    (1)
  h_t^{l+1} = σ(W_x^l h_t^l + W_h^{l+1} h_{t-1}^{l+1} + b^{l+1})            (2)

where b^l ∈ R^{N_l} and b^{l+1} ∈ R^{N_{l+1}} represent bias vectors, σ(·) denotes a non-linear activation function, and W_x^l ∈ R^{N_{l+1} × N_l} and W_h^l ∈ R^{N_l × N_l} denote weight matrices that we refer to as the inter-layer and the recurrent weight matrices, respectively.¹

¹ The equations are slightly more complicated when using LSTM cells in the recurrent layer, but the basic form remains the same. See Section 3.1.

Fig. 1. The initial model (Figure (a)) is compressed by jointly factorizing recurrent (W_h^l) and inter-layer (W_x^l) matrices, using a shared recurrent projection matrix (P^l) [3] (Figure (b)).

Since our proposed approach can be applied independently for each recurrent hidden layer, we only describe the compression operations for a particular layer. We jointly compress the recurrent and inter-layer matrices corresponding to a specific layer by determining a suitable recurrent projection matrix [3], denoted by P^l ∈ R^{r_l × N_l}, of rank r_l < N_l, such that W_h^l = Z_h^l P^l and W_x^l = Z_x^l P^l, thus allowing us to re-write (1) and (2) as:

  h_t^l = σ(W_x^{l-1} h_t^{l-1} + Z_h^l P^l h_{t-1}^l + b^l)                (3)
  h_t^{l+1} = σ(Z_x^l P^l h_t^l + W_h^{l+1} h_{t-1}^{l+1} + b^{l+1})        (4)

where Z_h^l ∈ R^{N_l × r_l} and Z_x^l ∈ R^{N_{l+1} × r_l}. This compression process is depicted graphically in Figure 1. We note that sharing P^l across the recurrent and inter-layer matrices allows for a more efficient parameterization of the weight matrices; as shown in Section 5, this does not result in a significant loss of performance.
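Before turning to how P^l is determined, the parameter accounting behind (1)-(4) can be made concrete in code. The sketch below is our own illustration rather than the authors' implementation; the layer width, the rank, and the use of tanh for σ are hypothetical choices:

```python
import numpy as np

N, r = 500, 100                 # hypothetical layer width N_l and rank r_l
rng = np.random.default_rng(0)

h_in = rng.standard_normal(N)   # h_t^{l-1}: activations from the layer below
h_rec = rng.standard_normal(N)  # h_{t-1}^l: this layer's previous time step
b = np.zeros(N)

# Uncompressed step, eq. (1): a full N x N recurrent matrix W_h^l.
W_x_in = rng.standard_normal((N, N))  # W_x^{l-1}
W_h = rng.standard_normal((N, N))     # W_h^l
h_t = np.tanh(W_x_in @ h_in + W_h @ h_rec + b)

# Compressed step, eq. (3): W_h^l = Z_h^l P^l, with P^l also shared by this
# layer's inter-layer matrix W_x^l = Z_x^l P^l (eq. (4), not shown).
Z_h = rng.standard_normal((N, r))     # Z_h^l
P = rng.standard_normal((r, N))       # P^l
h_t_c = np.tanh(W_x_in @ h_in + Z_h @ (P @ h_rec) + b)

# Recurrent parameters: N*N = 250000 versus 2*N*r = 100000 at r = 100;
# compressing W_x^l with the same P^l saves a further N_{l+1}*(N_l - r_l).
print(W_h.size, Z_h.size + P.size)
```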
Thus, the degree of compression in the model can be controlled by setting the ranks r_l of the projection matrices in each of the layers of the network. We determine the recurrent projection matrix P^l by first computing an SVD of the recurrent weight matrix, which we then truncate, retaining only the top r_l singular values (denoted by Σ̃^l) and the corresponding singular vectors from U^l and V^l (denoted by Ũ^l and Ṽ^l, respectively):

  W_h^l = U^l Σ^l (V^l)^T ≈ (Ũ^l Σ̃^l) (Ṽ^l)^T = Z_h^l P^l                  (5)

where Z_h^l = Ũ^l Σ̃^l and P^l = (Ṽ^l)^T. Finally, we determine Z_x^l as the solution to the following least-squares problem:

  Z_x^l = argmin_Y ||Y P^l - W_x^l||_F^2                                    (6)

where ||X||_F denotes the Frobenius norm of the matrix X.
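A minimal sketch of the compression itself (again ours, not code released with the paper): NumPy returns singular values in non-increasing order, so the truncation in (5) is a slice, and since P^l has orthonormal rows, the least-squares problem (6) has the closed-form solution W_x^l (P^l)^T:

```python
import numpy as np

def compress_layer(W_h, W_x, r):
    """Jointly factorize the recurrent matrix W_h (N x N) and the inter-layer
    matrix W_x (N_next x N) through a shared rank-r projection P, per (5)-(6).
    For an LSTM layer, W_h would first be built by stacking the four gate
    matrices vertically, e.g. np.vstack([W_im, W_om, W_fm, W_cm])."""
    # Eq. (5): truncated SVD; NumPy returns singular values non-increasing.
    U, s, Vt = np.linalg.svd(W_h, full_matrices=False)
    Z_h = U[:, :r] * s[:r]   # Ũ Σ̃, shape (N, r)
    P = Vt[:r, :]            # Ṽ^T, shape (r, N)
    # Eq. (6): Z_x = argmin_Y ||Y P - W_x||_F^2. Because P P^T = I (the rows
    # of V^T are orthonormal), the minimizer has the closed form W_x P^T.
    Z_x = W_x @ P.T          # shape (N_next, r)
    return Z_h, P, Z_x

rng = np.random.default_rng(0)
W_h, W_x = rng.standard_normal((500, 500)), rng.standard_normal((500, 500))
Z_h, P, Z_x = compress_layer(W_h, W_x, r=100)
# Relative reconstruction error of the recurrent matrix at rank 100:
print(np.linalg.norm(Z_h @ P - W_h) / np.linalg.norm(W_h))
```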

In pilot experiments, we found that the proposed SVD-based initialization performed better than training a model with recurrent projection matrices (i.e., the same model architecture) but with random initialization of the network weights.

3.1. Applying our technique to LSTM RNNs

Generalizing the procedure described above in the context of standard RNNs to the case of LSTM RNNs [3, 22, 23] is straightforward. Using the notation in [3], note that the recurrent weight matrix W_h^l in the case of the LSTM is the concatenation of the four gate weight matrices, obtained by stacking them vertically: [W_im, W_om, W_fm, W_cm]^T, which represent, respectively, the recurrent connections to the input gate, the output gate, the forget gate and the cell state. Similarly, the inter-layer matrix W_x^l is the concatenation of the matrices [W_ix, W_fx, W_ox, W_cx]^T, which correspond to the input gate, the forget gate, the output gate and the cell state (of the next layer). With these definitions, compression can be applied as described in Section 3. Note that we do not compress the peephole weights, since they are already narrow, single-column matrices and do not contribute significantly to the total number of parameters in the network.

4. EXPERIMENTAL SETUP

In order to determine the effectiveness of the proposed RNN compression technique, we conduct experiments on an open-ended large-vocabulary dictation task. As mentioned in Section 1, one of our primary motivations for investigating acoustic model compression is to build compact acoustic models that can be deployed on mobile devices.

In recent work, Sak et al. have demonstrated that deep LSTM-based AMs trained to predict either context-independent (CI) phoneme targets [22] or context-dependent (CD) phoneme targets [23] approach state-of-the-art performance on speech tasks. These systems have two important characteristics: in addition to the CI or CD phoneme labels, the system can also hypothesize a blank label if it is unsure of the identity of the current phoneme, and the systems are trained to optimize the connectionist temporal classification (CTC) criterion [24], which maximizes the total probability of the correct label sequence conditioned on the input sequence. More details can be found in [22, 23]. Following [22], our baseline model is thus a CTC model: a five-hidden-layer RNN with 500 LSTM cells in each layer, which predicts 41 CI phonemes (plus blank). As a point of comparison, we also present results obtained using a much larger state-of-the-art server-sized model, which is too large to deploy on embedded devices but nonetheless serves as an upper bound on performance for our models on this dataset. This model consists of five hidden layers with 600 LSTM cells per layer, and is trained to predict one of 9287 context-dependent phonemes (plus blank).

Our systems are trained using distributed asynchronous stochastic gradient descent with a parameter server [25]. The systems are first trained to convergence to optimize the CTC criterion, after which they are discriminatively sequence trained to optimize the state-level minimum Bayes risk (sMBR) criterion [26, 27]. As discussed in Section 5, after applying the proposed compression scheme, we further fine-tune the network: first with the CTC criterion, followed by sequence discriminative training with the sMBR criterion. This additional fine-tuning step was found to be necessary to achieve good performance, particularly as the amount of compression was increased.
The language model used in this work is a 5-gram model trained on 100M sentences of in-domain data, with entropy-based pruning applied to reduce the size of the LM down to roughly 1.5M n-grams (mainly bigrams) with a 64K vocabulary. Since our goal is to build a recognizer that runs efficiently on mobile devices, we minimize the size of the decoder graph used for recognition, following the approach outlined in [7]: we perform an additional pruning step to generate a much smaller first-pass language model (69.5K n-grams; mainly unigrams), which is composed with the lexicon transducer to construct the decoder graph. We then perform on-the-fly rescoring with the larger LM. The resulting models, when compressed for use on-device, total about 20.3 MB, thus enabling them to be run many times faster than real-time on recent mobile devices [28].

We parameterize the input acoustics by computing 40-dimensional log mel-filterbank energies over the 8kHz range, which are computed every 10ms over 25ms windowed speech segments. The server-sized system uses 80-dimensional features computed over the same range, since this resulted in slightly improved performance. Following [23], we stabilize CTC training by stacking together 8 consecutive speech frames (7 right-context frames); only every third stacked frame is presented as an input to the network.
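A short sketch of this frame-stacking step as we read it (our illustration; the feature shapes follow the text, but the end-of-utterance padding policy is an assumption the paper does not specify):

```python
import numpy as np

def stack_frames(feats, right_context=7, stride=3):
    """feats: (T, 40) array of log mel-filterbank features.
    Stacks each frame with 7 frames of right context (8 frames, 320 dims)
    and keeps only every third stacked frame, as described above."""
    T, _ = feats.shape
    # Assumed padding: repeat the final frame so late frames keep full context.
    padded = np.concatenate([feats, np.repeat(feats[-1:], right_context, axis=0)])
    stacked = np.stack([padded[t:t + right_context + 1].ravel() for t in range(T)])
    return stacked[::stride]

utt = np.random.randn(100, 40)   # a 1-second utterance at 10 ms frame rate
print(stack_frames(utt).shape)   # (34, 320)
```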

4.1. Training and Evaluation Data

Our systems are trained on 3M hand-transcribed anonymized utterances extracted from Google voice search traffic (~2000 hours). We create multi-style training data by synthetically distorting utterances to simulate background noise and reverberation, using a room simulator with noise samples extracted from YouTube videos and environmental recordings of everyday events; 20 distorted examples are created for each utterance in the training set. Systems are additionally adapted using the sMBR criterion [26, 27] on a set of 1M anonymized hand-transcribed (in-domain) dictation utterances extracted from Google traffic, processed to generate multi-style training data as described above, which improves performance on our dictation task. All results are reported on a set of 13.3K hand-transcribed anonymized utterances extracted from Google traffic from an open-ended dictation domain.

5. RESULTS

In our experiments, we seek to determine the impact of the proposed joint SVD-based compression technique on system performance. In particular, we are interested in determining how system performance varies as a function of the degree of compression, which is controlled by setting the ranks r_l of the recurrent projection matrices as described in Section 3. Notice that since the proposed compression scheme is applied to all hidden layers of the baseline system, there are numerous settings of the ranks r_l for the projection matrices in each layer which result in the same number of total parameters in the compressed network. In order to avoid this ambiguity, we set the various projection ranks using the following criterion: given a threshold τ, for each layer we set the rank r_l of the corresponding projection matrix such that it corresponds to retaining a fraction of at most τ of the explained variance after the truncated SVD of W_h^l. More specifically, if the singular values in Σ^l in (5) are sorted in non-increasing order as σ_1^l ≥ σ_2^l ≥ ... ≥ σ_{N_l}^l, we set each r_l as:

  r_l = argmax_{1 ≤ k ≤ N_l} { k : ( Σ_{j=1}^{k} (σ_j^l)^2 ) / ( Σ_{j=1}^{N_l} (σ_j^l)^2 ) ≤ τ }    (7)

Choosing the projection ranks using (7) allows us to control the degree of compression, and thus the compressed model size, by varying a single parameter, τ. In pilot experiments, we found that this scheme performed better than setting the ranks to be equal for all layers (given the same total parameter budget). Once the projection ranks r_l have been determined for the various projection matrices, we fine-tune the compressed models by first optimizing the CTC criterion, followed by sequence training with the sMBR criterion and adaptation on in-domain data as described in Section 4.1.

The results of our experiments are presented in Table 1. As can be seen in Table 1, the baseline system which predicts CI phoneme targets is only 10% relative worse than the larger server-sized system, although it has half as many parameters. Since the ranks r_l are all chosen to retain a given fraction of the explained variance in the SVD operation, we also note that earlier hidden layers in the network appear to have lower ranks than later layers, since most of the variance is accounted for by a smaller number of singular values. It can be seen from Table 1 that word error rates increase as the amount of compression is increased, although the performance of the compressed systems is close to the baseline for moderate compression (τ ≥ 0.7). Using a value of τ = 0.6 enables the model to be compressed to a third of its original size, with only a small degradation in accuracy. However, performance begins to degrade significantly for τ ≤ 0.5. Future work will consider alternative techniques for setting the projection ranks r_l in order to examine their impact on system performance.

  System      Projection ranks, r_l       Params   WER
  server      -                           ...M     11.3
  baseline    -                           9.7M     12.4
  τ = 0.9     ..., 375, 395, 405, ...     ...M     12.3
  τ = 0.8     ..., 305, 335, 345, ...     ...M     12.5
  τ = 0.7     ..., 215, 245, 260, ...     ...M     12.5
  τ = 0.6     ..., 150, 180, 195, ...     ...M     12.6
  τ = 0.5     ..., 105, 130, 145, ...     ...M     12.9
  τ = 0.4     ..., 70, 90, 100, ...       ...M     13.2
  τ = 0.3     ..., 45, 55, 65, ...        ...M     14.4

Table 1. Word error rates (%) on the test set as a function of the fraction of explained variance retained (τ) after the SVDs of the recurrent weight matrices W_h^l in the hidden layers of the RNN.
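Equation (7) reduces to a cumulative sum over squared singular values; a sketch (ours) of how the per-layer rank could be selected for a given τ:

```python
import numpy as np

def choose_rank(s, tau):
    """Per eq. (7): the largest k whose top-k squared singular values retain
    at most a fraction tau of the total explained variance.
    s: singular values sorted in non-increasing order."""
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)  # non-decreasing in k
    return max(1, int(np.searchsorted(frac, tau, side='right')))

rng = np.random.default_rng(0)
W_h = rng.standard_normal((500, 500))
s = np.linalg.svd(W_h, compute_uv=False)       # sorted non-increasing
for tau in (0.9, 0.7, 0.5):
    print(tau, choose_rank(s, tau))            # rank shrinks with tau
```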
6. CONCLUSIONS

We presented a technique to compress RNNs using a joint factorization of recurrent and inter-layer weight matrices, generalizing previous work [8]. The proposed technique was applied to the task of compressing LSTM RNN acoustic models for embedded speech recognition, where we found that we could compress our baseline acoustic model to a third of its original size with negligible loss in accuracy. The proposed techniques, in combination with weight quantization, allow us to build a small and efficient speech recognizer that runs many times faster than real-time on recent mobile devices [28].

7. REFERENCES

[1] F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. of Interspeech, 2011.

[2] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, 2012.

[3] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Proc. of Interspeech, 2014.

[4] T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Proc. of ICASSP, 2015.

[5] L. Deng and D. Yu, "Deep learning: Methods and applications," Foundations and Trends in Signal Processing, vol. 7, no. 3-4, 2014.

[6] J. Schalkwyk, D. Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, "'Your Word is my Command': Google search by voice: A case study," in Advances in Speech Recognition, Springer US, 2010.

[7] X. Lei, A. Senior, A. Gruenstein, and J. Sorensen, "Accurate and compact large vocabulary speech recognition on mobile devices," in Proc. of Interspeech, 2013.

[8] J. Xue, J. Li, and Y. Gong, "Restructuring of deep neural network acoustic models with singular value decomposition," in Proc. of Interspeech, 2013.

[9] J. Xue, J. Li, D. Yu, M. Seltzer, and Y. Gong, "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network," in Proc. of ICASSP, 2014.

[10] A. Graves, M. Liwicki, S. Fernández, R. Bertolami, H. Bunke, and J. Schmidhuber, "A novel connectionist system for unconstrained handwriting recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009.

[11] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Proc. of NIPS, 2014.

[12] M. Denil, B. Shakibi, L. Dinh, M. Ranzato, and N. de Freitas, "Predicting parameters in deep learning," in Proc. of NIPS, 2013.

[13] C. Buciluă, R. Caruana, and A. Niculescu-Mizil, "Model compression," in Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.

[14] L. J. Ba and R. Caruana, "Do deep nets really need to be deep?," in Proc. of NIPS, 2014.

[15] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint, 2015.

[16] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen, "Compressing neural networks with the hashing trick," in Proc. of ICML, 2015.

[17] Y. LeCun, J. S. Denker, and S. A. Solla, "Optimal brain damage," in Proc. of NIPS, 1989.

[18] F. Grézl and P. Fousek, "Optimizing bottle-neck features for LVCSR," in Proc. of ICASSP, March 2008.

[19] T. N. Sainath, B. Kingsbury, V. Sindhwani, E. Arisoy, and B. Ramabhadran, "Low-rank matrix factorization for deep neural network training with high-dimensional output targets," in Proc. of ICASSP, 2013.

[20] Y. Wang, J. Li, and Y. Gong, "Small-footprint high-performance deep neural network-based speech recognition using split-VQ," in Proc. of ICASSP, 2015.

[21] P. Nakkiran, R. Alvarez, R. Prabhavalkar, and C. Parada, "Compressing deep neural networks using a rank-constrained topology," in Proc. of Interspeech, 2015.

[22] H. Sak, A. Senior, K. Rao, O. İrsoy, A. Graves, F. Beaufays, and J. Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," in Proc. of ICASSP, 2015.

[23] H. Sak, A. Senior, K. Rao, and F. Beaufays, "Fast and accurate recurrent neural network acoustic models for speech recognition," in Proc. of Interspeech, 2015.

[24] A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proc. of ICML, 2006.

[25] J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng, "Large scale distributed deep networks," in Proc. of NIPS, 2012.

[26] B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," in Proc. of ICASSP, 2009.
[27] H. Sak, O. Vinyals, G. Heigold, A. Senior, E. McDermott, R. Monga, and M. Mao, "Sequence discriminative distributed training of long short-term memory recurrent neural networks," in Proc. of Interspeech, 2014.

[28] I. McGraw, R. Prabhavalkar, R. Alvarez, M. Gonzalez Arenas, K. Rao, D. Rybach, O. Alsharif, H. Sak, A. Gruenstein, F. Beaufays, and C. Parada, "Personalized speech recognition on mobile devices," in Proc. of ICASSP, 2016.
