Richer Interpolative Smoothing Based on Modified Kneser-Ney Language Modeling
Ehsan Shareghi, Trevor Cohn and Gholamreza Haffari
Faculty of Information Technology, Monash University
Computing and Information Systems, The University of Melbourne

Abstract

In this work we present a generalisation of the Modified Kneser-Ney interpolative smoothing for richer smoothing via additional discount parameters. We provide mathematical underpinning for the estimator of the new discount parameters, and showcase the utility of our rich MKN language models on several European languages. We further explore the interdependency among the training data size, language model order, and number of discount parameters. Our empirical results illustrate that a larger number of discount parameters i) allows for better allocation of mass in the smoothing process, particularly in the small data regime where statistical sparsity is severe, and ii) leads to significant reduction in perplexity, particularly for out-of-domain test sets which introduce a higher ratio of out-of-vocabulary words. [1]

[1] For the implementation see: eehsan/cstl

1 Introduction

Probabilistic language models (LMs) are the core of many natural language processing tasks, such as machine translation and automatic speech recognition. m-gram models, the cornerstone of language modeling, decompose the probability of an utterance into conditional probabilities of words given a fixed-length context. Due to sparsity of the events in natural language, smoothing techniques are critical for generalisation beyond the training text when estimating the parameters of m-gram LMs. This is particularly important when the training text is small, e.g. building language models for translation or speech recognition in low-resource languages.

A widely used and successful smoothing method is interpolated Modified Kneser-Ney (MKN) (Chen and Goodman, 1999). This method uses a linear interpolation of higher and lower order m-gram probabilities by preserving probability mass via absolute discounting.
In this paper, we extend MKN by introducing additional discount parameters, leading to a richer smoothing scheme. This is particularly important when statistical sparsity is more severe, i.e., in building high-order LMs on small data, or when out-of-domain test sets are used.

Previous research in MKN language modeling, and more generally m-gram models, has mainly dedicated efforts to make them faster and more compact (Stolcke et al., 2011; Heafield, 2011; Shareghi et al., 2015) using advanced data structures such as succinct suffix trees. An exception is Hierarchical Pitman-Yor Process LMs (Teh, 2006a; Teh, 2006b) providing a rich Bayesian smoothing scheme, for which Kneser-Ney smoothing corresponds to an approximate inference method. Inspired by this work, we directly enrich MKN smoothing realising some of the reductions while remaining more efficient in learning and inference.

We provide estimators for our additional discount parameters by extending the discount bounds in MKN. We empirically analyze our enriched MKN LMs on several European languages in in- and out-of-domain settings. The results show that our discounting mechanism significantly improves the perplexity compared to MKN and offers a more elegant way of dealing with out-of-vocabulary (OOV) words and domain mismatch.

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 944-949, Austin, Texas, November 1-5, 2016. © 2016 Association for Computational Linguistics

2 Enriched Modified Kneser-Ney

Interpolative Modified Kneser-Ney (MKN) (Chen and Goodman, 1999) smoothing is widely accepted as a state-of-the-art technique and is implemented in leading LM toolkits, e.g., SRILM (Stolcke, 2002) and KenLM (Heafield, 2011).

MKN uses lower order k-gram probabilities to smooth higher order probabilities. $P(w \mid u)$ is defined as,

$$P(w \mid u) = \frac{c(uw) - D_m(c(uw))}{c(u)} + \frac{\gamma(u)}{c(u)} \, P(w \mid \pi(u))$$

where $c(u)$ is the frequency of the pattern $u$ [2], $\gamma(\cdot)$ is a constant ensuring the distribution sums to one, and $P(w \mid \pi(u))$ is the smoothed probability computed recursively based on a similar formula conditioned on the suffix of the pattern $u$, denoted by $\pi(u)$. Of particular interest are the discount parameters $D_m(\cdot)$, which remove some probability mass from the maximum likelihood estimate for each event; this mass is redistributed over the smoothing distribution. The discounts are estimated as

$$D_m(i) = \begin{cases}
0, & \text{if } i = 0 \\
1 - 2\,\dfrac{n_2[m]}{n_1[m]} \cdot \dfrac{n_1[m]}{n_1[m] + 2 n_2[m]}, & \text{if } i = 1 \\
2 - 3\,\dfrac{n_3[m]}{n_2[m]} \cdot \dfrac{n_1[m]}{n_1[m] + 2 n_2[m]}, & \text{if } i = 2 \\
3 - 4\,\dfrac{n_4[m]}{n_3[m]} \cdot \dfrac{n_1[m]}{n_1[m] + 2 n_2[m]}, & \text{if } i \geq 3
\end{cases}$$

where $n_i[m]$ is the number of unique m-grams of frequency $i$ [3]. This effectively leads to three discount parameters $\{D_m(1), D_m(2), D_m(3+)\}$ for the distributions on a particular context length, $m$.

[2] Note that in all but the top layer of the hierarchy, continuation counts, which count the number of unique contexts, are used in place of the frequency counts (Chen and Goodman, 1999).
[3] Continuation counts are used for the lower layers.

2.1 Generalised MKN

Ney et al. (1994) characterized the data sparsity using the following empirical inequalities,

$$3 n_3[m] < 2 n_2[m] < n_1[m] \quad \text{for } m \leq 3$$

It can be shown (see Appendix A) that these empirical inequalities can be extended to higher frequencies and larger contexts $m > 3$,

$$(N - m)\, n_{N-m}[m] < \dots < 2 n_2[m] < n_1[m] < \sum_{i>0} n_i[m] \ll n_0[m] < \sigma^m$$

where $\sigma^m$ is the possible number of m-grams over a vocabulary of size $\sigma$, $n_0[m]$ is the number of m-grams that never occurred, and $\sum_{i>0} n_i[m]$ is the number of m-grams observed in the training data.
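The closed-form MKN discounts and the sparsity inequalities above can be checked on any table of m-gram counts. The following is a minimal sketch, assuming the counts for one order m are held in a plain dictionary; the function names (`count_of_counts`, `mkn_discounts`, `sparsity_chain_holds`) are illustrative and are not taken from the authors' implementation or from any toolkit.

```python
from collections import Counter


def count_of_counts(ngram_counts):
    """Return n_i: the number of distinct m-grams that occur exactly i times."""
    return Counter(ngram_counts.values())


def mkn_discounts(n):
    """Closed-form MKN discounts D(1), D(2), D(3+) for one m-gram order.

    `n` maps a frequency i to n_i; n_1 .. n_4 must all be non-zero.
    """
    y = n[1] / (n[1] + 2 * n[2])  # shared factor n_1 / (n_1 + 2 n_2)
    return {
        1: 1 - 2 * y * n[2] / n[1],
        2: 2 - 3 * y * n[3] / n[2],
        3: 3 - 4 * y * n[4] / n[3],  # applied to every count >= 3
    }


def sparsity_chain_holds(n, up_to=3):
    """Check i * n_i > (i + 1) * n_{i+1} for i = 1 .. up_to - 1."""
    return all(i * n[i] > (i + 1) * n[i + 1] for i in range(1, up_to))


if __name__ == "__main__":
    # Toy count table (hypothetical): 100 singletons, 30 doubletons, ...
    toy = {(freq, j): freq
           for freq, how_many in [(1, 100), (2, 30), (3, 12), (4, 6)]
           for j in range(how_many)}
    n = count_of_counts(toy)
    print(mkn_discounts(n), sparsity_chain_holds(n))
```

In a real model the lower orders would use continuation counts in place of raw frequencies, as the footnotes note; the sketch leaves that choice to whoever builds the count table.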
We use these inequalities to extend the discount depth of MKN, resulting in new discount parameters. The additional discount parameters increase the flexibility of the model in altering a wider range of raw counts, resulting in a more elegant way of assigning the mass in the smoothing process. In our experiments, we set the number of discounts to 10 for all the levels of the hierarchy (compare this to the three in MKN) [4]. This results in the following estimators for the discounts,

$$D_m(i) = \begin{cases}
0, & \text{if } i = 0 \\
i - (i + 1)\,\dfrac{n_{i+1}[m]}{n_i[m]} \cdot \dfrac{n_1[m]}{n_1[m] + 2 n_2[m]}, & \text{if } i < 10 \\
10 - 11\,\dfrac{n_{11}[m]}{n_{10}[m]} \cdot \dfrac{n_1[m]}{n_1[m] + 2 n_2[m]}, & \text{if } i \geq 10
\end{cases}$$

It can be shown that the above estimators for our discount parameters are derived by maximizing a lower bound on the leave-one-out likelihood of the training set, following (Ney et al., 1994; Chen and Goodman, 1999) (see Appendix B for the proof sketch).

[4] We have selected the value of 10 arbitrarily; however our approach can be used with a larger number of discount parameters, with the caveat that we would need to handle sparse counts in the higher orders.

3 Experiments

We compare the effect of using different numbers of discount parameters on perplexity using the Finnish (FI), Spanish (ES), German (DE), English (EN) portions of the Europarl v7 (Koehn, 2005) corpus. For each language we excluded the first K sentences and used it as the in-domain test set (denoted as EU), skipped the second K sentences, and used the rest as the training set. The data was tokenized, sentence split, and the XML markup discarded. We tested the effect of domain mismatch under two settings for out-of-domain test sets: i) mild, using the Spanish section of news-test 01, the German and English sections of news-test 01, and the Finnish section of news-test 2015 (all denoted as NT), and ii) extreme, using a hour period of streamed Finnish and Spanish tweets [6] (denoted as TW), and the German and English sections of the patent description of the medical translation task (denoted as MED). See Table 1 for statistics of the training and test sets.

[6] Streamed via Twitter API on 17/05/

Table 1: Perplexity for various m-gram orders $m$ and training languages from Europarl, using different numbers of discount parameters for MKN. Columns give the training size in M (tokens, sentences), the test size in K (tokens, sentences), OOV%, and the perplexity of each MKN variant at each m-gram order. Rows are grouped by training language: FI and ES with test sets NT, EU, TW; DE and EN with test sets NT, EU, MED. MKN ($D_{[1 \ldots 3]}$), MKN ($D_{[1 \ldots 4]}$), MKN ($D_{[1 \ldots 10]}$) represent vanilla MKN, MKN with 1 more discount, and MKN with 7 more discount parameters, respectively. Test set sources EU, NT, TW, MED are Europarl, news-test, Twitter, and medical patent descriptions, respectively. OOV is reported as the ratio $|\{\text{OOV} \in \text{test-set}\}| \,/\, |\{w \in \text{test-set}\}|$.

Figure 1: Percentage of perplexity reduction for $\text{pplx}_{D[1 \ldots 4]}$ and $\text{pplx}_{D[1 \ldots 10]}$ compared with $\text{pplx}_{D[1 \ldots 3]}$ on different training corpora (Europarl CZ, FI, ES, DE, EN, FR) and on news-test sets (NT), for two m-gram orders (left and right panels). Axes: corpus (x-axis), % reduction (y-axis).

3.1 Perplexity

Table 1 shows substantial reduction in perplexity on all languages for out-of-domain test sets when expanding the number of discount parameters from 3 in vanilla MKN to 4 and 10. Consider the English news-test (NT), in which a single extra discount parameter ($D_{[1 \ldots 4]}$) already improves the perplexity by 18 points, and this improvement quadruples to 77 points when using 10 discounts ($D_{[1 \ldots 10]}$). This effect is consistent across the Europarl corpora and for all LM orders (see Figure 1). On the medical test set, which has a 9 times higher OOV ratio, the perplexity reduction shows a similar trend.
However, these reductions vanish when an in-domain test set is used. Note that we use the same treatment of OOV words for computing the perplexities as is used in KenLM (Heafield, 2013).

3.2 Analysis

Out-of-domain and Out-of-vocabulary. We selected the Finnish language, for which the number and ratio of OOVs are close on its out-of-domain and in-domain test sets (NT and EU), while showing substantial reduction in perplexity on the out-of-domain test set; see the FI bars in Figure 1. Figure 2 (left) shows the full perplexity results for Finnish for vanilla MKN and our extensions when tested on in-domain (EU) and out-of-domain (NT) test sets. The discount plot, Figure 2 (middle), illustrates the behaviour of the various discount parameters. We also measured the average hit length for queries by varying $m$ on in-domain and out-of-domain test sets. As illustrated in Figure 2 (right), the in-domain test set allows for longer matches to the training data as $m$ grows. This indicates that having more discount parameters is not only useful for test sets with an extremely high number of OOVs, but also allows for a more elegant way of assigning mass in the smoothing process when there is a domain mismatch.

Figure 2: Statistics for the Finnish section of Europarl. The left plot illustrates the perplexity when tested on out-of-domain (NT) and in-domain (EU) test sets, varying LM order $m$. The middle plot shows the discount parameters $D_i$, $i \in [1 \ldots 10]$, for different m-gram orders. The right plot corresponds to average hit length on EU and NT test sets.

Interdependency of m, data size, and discounts. To explore the correlation between these factors we selected German and investigated this correlation on two different training data sizes: Europarl (61M words), and CommonCrawl 01 (98M words). Figure 3 illustrates the correlation between these factors using the same test set but with small and large training sets. The slopes of the surfaces indicate that the small training data regime (left), which has higher sparsity and more OOVs at test time, benefits substantially from the more accurate discounting compared to the large training set (right), in which the gain from discounting is slight. [8]

[8] Nonetheless, the improvement in perplexity consistently grows with introducing more discount parameters even under the large training data regime, which suggests that more discount parameters, reaching frequencies in the tens, may be required for a larger training corpus, to reflect the fact that even an event with such a frequency might be considered rare in a corpus of nearly 1 billion words.

Figure 3: Perplexity (z-axis) vs. m-gram order $m$ (x-axis) vs. number of discounts $D_i$ (y-axis) for German trained on Europarl (left) and CommonCrawl 01 (right), and tested on news-test. Arrows show the direction of the increase.

4 Conclusions

In this work we proposed a generalisation of Modified Kneser-Ney interpolative language modeling by introducing new discount parameters. We provide the mathematical proof for the discount bounds used in Modified Kneser-Ney, extend it further, and illustrate the impact of our extension empirically on different Europarl languages using in-domain and out-of-domain test sets.

The empirical results on various training and test sets show that our proposed approach allows for a more elegant way of treating OOVs and mass assignments in interpolative smoothing. In future work, we will integrate our language model into the Moses machine translation pipeline to intrinsically measure its impact on translation qualities, which is of particular use for out-of-domain scenarios.

Acknowledgements

This research was supported by the National ICT Australia (NICTA) and Australian Research Council Future Fellowship (project number FT15). This work was done when Ehsan Shareghi was an intern at IBM Research Australia.

A. Inequalities

We prove that these inequalities hold in expectation by making the reasonable assumption that events in natural language follow the power law (Clauset et al., 2009),

$$p\big(C(u) = f\big) \propto f^{-1 - \frac{1}{s}},$$

where $s$ is the parameter of the distribution, and $C(u)$ is the random variable denoting the frequency of the m-gram pattern $u$. We now compute the expected number of unique patterns having a specific frequency, $E[n_i[m]]$. Corresponding to each m-gram pattern $u$, let us define a random variable $X_u$ which is 1 if the frequency of $u$ is $i$ and zero otherwise. It is not hard to see that $n_i[m] = \sum_u X_u$, and

$$E\big[n_i[m]\big] = E\Big[\sum_u X_u\Big] = \sum_u E[X_u] = \sigma^m E[X_u] = \sigma^m \Big( p\big(C(u) = i\big) \cdot 1 + p\big(C(u) \neq i\big) \cdot 0 \Big) \propto \sigma^m \, i^{-1 - \frac{1}{s}}.$$

We can verify that

$$(i + 1)\, E\big[n_{i+1}[m]\big] < i\, E\big[n_i[m]\big] \;\Leftrightarrow\; (i + 1)\, \sigma^m (i + 1)^{-1 - \frac{1}{s}} < i\, \sigma^m i^{-1 - \frac{1}{s}} \;\Leftrightarrow\; i^{\frac{1}{s}} < (i + 1)^{\frac{1}{s}},$$

which completes the proof of the inequalities.

B. Discount bounds proof sketch

The leave-one-out (leaving those m-grams which occurred only once) log-likelihood function of the interpolative smoothing is lower bounded by that of the backoff model (Ney et al., 1994), hence the estimated discounts for the latter can be considered as an approximation for the discounts of the former. Consider a backoff model with absolute discounting parameter $D$, where $P(w_i \mid w_{i-m+1}^{i-1})$ is defined as:

$$P(w_i \mid w_{i-m+1}^{i-1}) = \begin{cases}
\dfrac{c(w_{i-m+1}^{i}) - D}{c(w_{i-m+1}^{i-1})}, & \text{if } c(w_{i-m+1}^{i}) > 0 \\[2ex]
\dfrac{D\, n_{1+}(w_{i-m+1}^{i-1}\,\cdot)}{c(w_{i-m+1}^{i-1})}\, \bar{P}(w_i \mid w_{i-m+2}^{i-1}), & \text{if } c(w_{i-m+1}^{i}) = 0
\end{cases}$$

where $n_{1+}(w_{i-m+1}^{i-1}\,\cdot)$ is the number of unique right contexts for the $w_{i-m+1}^{i-1}$ pattern. Assume that for any choice of $0 < D < 1$ we can define $\bar{P}$ such that $P(w_i \mid w_{i-m+1}^{i-1})$ sums to 1. For readability we use the replacement

$$\lambda(w_{i-m+1}^{i-1}) = \frac{n_{1+}(w_{i-m+1}^{i-1}\,\cdot)}{c(w_{i-m+1}^{i-1}) - 1}.$$

Following (Chen and Goodman, 1999), rewriting the leave-one-out log-likelihood for KN (Ney et al., 1994) to include more discounts (in this proof up to $D_4$) results in:

$$\sum_{\substack{w_{i-m+1}^{i} \\ c(w_{i-m+1}^{i}) > 4}} c(w_{i-m+1}^{i}) \log \frac{c(w_{i-m+1}^{i}) - 1 - D_4}{c(w_{i-m+1}^{i-1}) - 1}
+ \sum_{j=2}^{4} \sum_{\substack{w_{i-m+1}^{i} \\ c(w_{i-m+1}^{i}) = j}} c(w_{i-m+1}^{i}) \log \frac{c(w_{i-m+1}^{i}) - 1 - D_{j-1}}{c(w_{i-m+1}^{i-1}) - 1}
+ \sum_{\substack{w_{i-m+1}^{i} \\ c(w_{i-m+1}^{i}) = 1}} c(w_{i-m+1}^{i}) \log \Big( \big( \textstyle\sum_{j=1}^{4} n_j[m] D_j \big)\, \lambda(w_{i-m+1}^{i-1})\, \bar{P} \Big)$$

which can be simplified to,

$$\sum_{\substack{w_{i-m+1}^{i} \\ c(w_{i-m+1}^{i}) > 4}} c(w_{i-m+1}^{i}) \log\big(c(w_{i-m+1}^{i}) - 1 - D_4\big)
+ \sum_{j=2}^{4} j\, n_j[m] \log(j - 1 - D_{j-1})
+ n_1[m] \log\Big( \sum_{j=1}^{4} n_j[m] D_j \Big) + \text{const}$$

To find the optimal $D_1, D_2, D_3, D_4$ we set the partial derivatives to zero. For $D_3$,

$$\frac{\partial}{\partial D_3} = -\frac{4 n_4[m]}{3 - D_3} + n_1[m] \frac{n_3[m]}{\sum_{j=1}^{4} n_j[m] D_j} = 0$$
$$\Rightarrow\; n_1[m]\, n_3[m]\, (3 - D_3) = 4 n_4[m] \sum_{j=1}^{4} n_j[m] D_j$$
$$\Rightarrow\; 3 n_1[m] n_3[m] - D_3\, n_1[m] n_3[m] - 4 n_4[m] n_1[m] D_1 > 0$$
$$\Rightarrow\; 3 - 4 \frac{n_4[m]}{n_3[m]} D_1 > D_3$$

And after taking $c(w_{i-m+1}^{i}) = 5$ out of the summation, for $D_4$:

$$\frac{\partial}{\partial D_4} = -\sum_{c(w_{i-m+1}^{i}) > 5} \frac{c(w_{i-m+1}^{i})}{c(w_{i-m+1}^{i}) - 1 - D_4} - \frac{5 n_5[m]}{4 - D_4} + n_1[m] \frac{n_4[m]}{\sum_{j=1}^{4} n_j[m] D_j} = 0$$
$$\Rightarrow\; -\frac{5 n_5[m]}{4 - D_4} + n_1[m] \frac{n_4[m]}{\sum_{j=1}^{4} n_j[m] D_j} > 0$$
$$\Rightarrow\; n_1[m]\, n_4[m]\, (4 - D_4) > 5 n_5[m] \sum_{j=1}^{4} n_j[m] D_j$$
$$\Rightarrow\; 4 - 5 \frac{n_5[m]}{n_4[m]} D_1 > D_4$$
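As a numerical illustration of the generalised estimators whose bounds are sketched above, the snippet below evaluates them on synthetic count-of-counts that follow the power law of Appendix A. This is a hedged sketch rather than the authors' code: the helper names, the exponent `s = 1.0`, and the scale `1e6` are arbitrary assumptions chosen so the result is easy to inspect.

```python
def power_law_counts(scale, s, max_freq):
    """Synthetic n_i proportional to i^(-1 - 1/s), as assumed in Appendix A."""
    return {i: scale * i ** (-1.0 - 1.0 / s) for i in range(1, max_freq + 1)}


def generalised_discounts(n, k=10):
    """Return [D(0), D(1), ..., D(k)] with
    D(i) = i - (i + 1) * (n_{i+1} / n_i) * n_1 / (n_1 + 2 n_2).

    `n` maps a frequency i to n_i and must cover 1 .. k + 1.
    """
    y = n[1] / (n[1] + 2 * n[2])
    return [0.0] + [i - (i + 1) * y * n[i + 1] / n[i] for i in range(1, k + 1)]


def discount_for(discounts, count):
    """Look up the discount for a raw count; counts >= k share D(k)."""
    return discounts[min(count, len(discounts) - 1)]


if __name__ == "__main__":
    n = power_law_counts(scale=1e6, s=1.0, max_freq=11)
    d = generalised_discounts(n, k=10)
    for i, value in enumerate(d):
        print(i, round(value, 3))
```

With these synthetic counts every discount stays strictly between 0 and its raw count, which is the property the bounds are meant to guarantee; with real, sparse high-frequency counts that is not automatic, which is the caveat raised about handling sparse counts in the higher orders.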
References

Stanley F. Chen and Joshua Goodman. 1999. An empirical study of smoothing techniques for language modeling. Computer Speech & Language, 13(4):359-394.

Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. 2009. Power-law distributions in empirical data. SIAM Review, 51(4):661-703.

Kenneth Heafield. 2011. KenLM: Faster and smaller language model queries. In Proceedings of the Workshop on Statistical Machine Translation.

Kenneth Heafield. 2013. Efficient Language Modeling Algorithms with Applications to Statistical Machine Translation. Ph.D. thesis, Carnegie Mellon University.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Machine Translation Summit.

Hermann Ney, Ute Essen, and Reinhard Kneser. 1994. On structuring probabilistic dependences in stochastic language modelling. Computer Speech & Language, 8(1):1-38.

Ehsan Shareghi, Matthias Petri, Gholamreza Haffari, and Trevor Cohn. 2015. Compact, efficient and unlimited capacity: Language modeling with compressed suffix trees. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Andreas Stolcke, Jing Zheng, Wen Wang, and Victor Abrash. 2011. SRILM at sixteen: Update and outlook. In Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop.

Andreas Stolcke. 2002. SRILM - an extensible language modeling toolkit. In Proceedings of the International Conference of Spoken Language Processing.

Yee Whye Teh. 2006a. A Bayesian interpretation of interpolated Kneser-Ney. Technical report, NUS School of Computing.

Yee Whye Teh. 2006b. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
More informationRandomized Recovery for Boolean Compressed Sensing
Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch
More informationIn this chapter, we consider several graph-theoretic and probabilistic models
THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions
More informationOptimizing energy potentials for success in protein tertiary structure prediction Ting-Lan Chiu 1 and Richard A Goldstein 1,2
Research Paper 223 Optiizing energy potentials for success in protein tertiary structure prediction Ting-Lan Chiu 1 and Richard A Goldstein 1,2 Background: Success in solving the protein structure prediction
More informationOn the Analysis of the Quantum-inspired Evolutionary Algorithm with a Single Individual
6 IEEE Congress on Evolutionary Coputation Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-1, 6 On the Analysis of the Quantu-inspired Evolutionary Algorith with a Single Individual
More informationWeek 13: Language Modeling II Smoothing in Language Modeling. Irina Sergienya
Week 13: Language Modeling II Smoothing in Language Modeling Irina Sergienya 07.07.2015 Couple of words first... There are much more smoothing techniques, [e.g. Katz back-off, Jelinek-Mercer,...] and techniques
More informationAppendix. Bayesian Information Criterion (BIC) E Proof of Proposition 1. Compositional Kernel Search Algorithm. Base Kernels.
Hyunjik Ki and Yee Whye Teh Appendix A Bayesian Inforation Criterion (BIC) The BIC is a odel selection criterion that is the arginal likelihood with a odel coplexity penalty: BIC = log p(y ˆθ) 1 2 p log(n)
More informationThe accelerated expansion of the universe is explained by quantum field theory.
The accelerated expansion of the universe is explained by quantu field theory. Abstract. Forulas describing interactions, in fact, use the liiting speed of inforation transfer, and not the speed of light.
More informationResearch in Area of Longevity of Sylphon Scraies
IOP Conference Series: Earth and Environental Science PAPER OPEN ACCESS Research in Area of Longevity of Sylphon Scraies To cite this article: Natalia Y Golovina and Svetlana Y Krivosheeva 2018 IOP Conf.
More informationA Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)
1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu
More informationLanguage Modelling. Marcello Federico FBK-irst Trento, Italy. MT Marathon, Edinburgh, M. Federico SLM MT Marathon, Edinburgh, 2012
Language Modelling Marcello Federico FBK-irst Trento, Italy MT Marathon, Edinburgh, 2012 Outline 1 Role of LM in ASR and MT N-gram Language Models Evaluation of Language Models Smoothing Schemes Discounting
More informationDeflation of the I-O Series Some Technical Aspects. Giorgio Rampa University of Genoa April 2007
Deflation of the I-O Series 1959-2. Soe Technical Aspects Giorgio Rapa University of Genoa g.rapa@unige.it April 27 1. Introduction The nuber of sectors is 42 for the period 1965-2 and 38 for the initial
More informationC na (1) a=l. c = CO + Clm + CZ TWO-STAGE SAMPLE DESIGN WITH SMALL CLUSTERS. 1. Introduction
TWO-STGE SMPLE DESIGN WITH SMLL CLUSTERS Robert G. Clark and David G. Steel School of Matheatics and pplied Statistics, University of Wollongong, NSW 5 ustralia. (robert.clark@abs.gov.au) Key Words: saple
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October
More informationLogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting
LogLog-Beta and More: A New Algorith for Cardinality Estiation Based on LogLog Counting Jason Qin, Denys Ki, Yuei Tung The AOLP Core Data Service, AOL, 22000 AOL Way Dulles, VA 20163 E-ail: jasonqin@teaaolco
More informationThe Transactional Nature of Quantum Information
The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.
More informationEnsemble Based on Data Envelopment Analysis
Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807
More informationEstimating Entropy and Entropy Norm on Data Streams
Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer
More informationWhat to Expect from Expected Kneser-Ney Smoothing
What to Expect from Expected Kneser-Ney Smoothing Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang Microsoft, USA {mlevit sarangp shchang}@microsoft.com Abstract Kneser-Ney smoothing on expected
More informationŞtefan ŞTEFĂNESCU * is the minimum global value for the function h (x)
7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not
More informationOBJECTIVES INTRODUCTION
M7 Chapter 3 Section 1 OBJECTIVES Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and
More informationHierarchical Bayesian Nonparametric Models of Language and Text
Hierarchical Bayesian Nonparametric Models of Language and Text Gatsby Computational Neuroscience Unit, UCL Joint work with Frank Wood *, Jan Gasthaus *, Cedric Archambeau, Lancelot James SIGIR Workshop
More informationExperimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis
City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna
More informationMeasures of average are called measures of central tendency and include the mean, median, mode, and midrange.
CHAPTER 3 Data Description Objectives Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance,
More informationOptimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks
1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster
More informationAn Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period
An Approxiate Model for the Theoretical Prediction of the Velocity... 77 Central European Journal of Energetic Materials, 205, 2(), 77-88 ISSN 2353-843 An Approxiate Model for the Theoretical Prediction
More informationA Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay
A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer
More informationPULSE-TRAIN BASED TIME-DELAY ESTIMATION IMPROVES RESILIENCY TO NOISE
PULSE-TRAIN BASED TIME-DELAY ESTIMATION IMPROVES RESILIENCY TO NOISE 1 Nicola Neretti, 1 Nathan Intrator and 1,2 Leon N Cooper 1 Institute for Brain and Neural Systes, Brown University, Providence RI 02912.
More informationStochastic Subgradient Methods
Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods
More informationMathematical Model and Algorithm for the Task Allocation Problem of Robots in the Smart Warehouse
Aerican Journal of Operations Research, 205, 5, 493-502 Published Online Noveber 205 in SciRes. http://www.scirp.org/journal/ajor http://dx.doi.org/0.4236/ajor.205.56038 Matheatical Model and Algorith
More informationSharp Time Data Tradeoffs for Linear Inverse Problems
Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used
More informationA remark on a success rate model for DPA and CPA
A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance
More informationRecovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)
Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains
More informationRANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION
RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed
More informationUsing a De-Convolution Window for Operating Modal Analysis
Using a De-Convolution Window for Operating Modal Analysis Brian Schwarz Vibrant Technology, Inc. Scotts Valley, CA Mark Richardson Vibrant Technology, Inc. Scotts Valley, CA Abstract Operating Modal Analysis
More informationLanguage Model. Introduction to N-grams
Language Model Introduction to N-grams Probabilistic Language Model Goal: assign a probability to a sentence Application: Machine Translation P(high winds tonight) > P(large winds tonight) Spelling Correction
More informationMSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE
Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL
More informationANALYTICAL INVESTIGATION AND PARAMETRIC STUDY OF LATERAL IMPACT BEHAVIOR OF PRESSURIZED PIPELINES AND INFLUENCE OF INTERNAL PRESSURE
DRAFT Proceedings of the ASME 014 International Mechanical Engineering Congress & Exposition IMECE014 Noveber 14-0, 014, Montreal, Quebec, Canada IMECE014-36371 ANALYTICAL INVESTIGATION AND PARAMETRIC
More informationBlock designs and statistics
Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent
More informationA Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness
A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,
More informationVariational Bayes for generic topic models
Variational Bayes for generic topic odels Gregor Heinrich 1 and Michael Goesele 2 1 Fraunhofer IGD and University of Leipzig 2 TU Darstadt Abstract. The article contributes a derivation of variational
More informationQuantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search
Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths
More informationOn Constant Power Water-filling
On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives
More information