Collocation Map for Overcoming Data Sparseness
Moonjoo Kim, Young S. Han, and Key-Sun Choi
Department of Computer Science, Korea Advanced Institute of Science and Technology, Taejon, Korea
mj0712@eve.kaist.ac.kr, yshan@csking.kaist.ac.kr, kschoi@csking.kaist.ac.kr

Abstract

Statistical language models are useful because they can provide probabilistic information for decision making under uncertainty. The most common statistic is the n-gram, which measures word co-occurrences in texts. The method suffers from the data shortage problem, however. In this paper, we suggest that Bayesian networks be used to approximate, with graceful degradation, the statistics of infrequent co-occurrences and of those that do not occur in the sample texts. Collocation map is a sigmoid belief network that can be constructed from bigrams. We compared the conditional probabilities and mutual information computed from bigrams and from Collocation map. The results show that, for the infrequent pairs, the variance of the values from Collocation map is 48% smaller than that from the frequency measure. The predictive power of Collocation map for arbitrary associations not observed in the sample texts is also demonstrated.

1 Introduction

In statistical language processing, n-grams are basic to many probabilistic models, including Hidden Markov models, that work on the limited dependency of linguistic events. In this regard, Bayesian models (Bayesian networks, belief networks, and influence diagrams, to name a few) are not very different from HMMs. Bayesian models capture the conditional independence among probabilistic variables and can compute the conditional distribution of the variables, which is known as probabilistic inference. The pure n-gram statistic, however, is somewhat crude in that it cannot say anything about unobserved events, and its approximation of infrequent events can be unreliable. In this paper we show by way of extensive experiments that the Bayesian method, which can also be composed from bigrams, can overcome the data sparseness problem that is inherent in frequency counting methods.
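To make the limitation concrete, here is a minimal Python sketch (not the authors' code) of the maximum-likelihood bigram estimate P(w2 | w1) = count(w1, w2) / count(w1). The toy token list and the function name are invented for illustration; the point is only that any pair absent from the sample receives probability exactly 0.

```python
from collections import Counter


def bigram_conditional(tokens):
    """Return a function giving the maximum-likelihood estimate P(w2 | w1).

    P(w2 | w1) = count(w1, w2) / count(w1 in the left position of a bigram).
    Pairs never seen in the sample get exactly 0.
    """
    pair_counts = Counter(zip(tokens, tokens[1:]))
    left_counts = Counter(tokens[:-1])

    def prob(w1, w2):
        if left_counts[w1] == 0:
            return 0.0  # w1 never appears on the left: no information at all
        return pair_counts[(w1, w2)] / left_counts[w1]

    return prob


# Toy sample, invented for illustration only.
tokens = "high income stock high income portfolio blue chip stock".split()
p = bigram_conditional(tokens)
print(p("high", "income"))   # 1.0
print(p("income", "stock"))  # 0.5
print(p("blue", "income"))   # 0.0, although the pair is perfectly plausible
```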
According to the empirical results, Collocation map, a Bayesian model for lexical variables, induced graceful approximation over unobserved and infrequent events.

There are two known approaches to the data sparseness problem: smoothing and class-based methods (Dagan 1992). Smoothing methods (Church and Gale 1991) readjust the distribution of frequencies of word occurrences obtained from sample texts and verify the distribution on held-out texts. As Dagan (1992) pointed out, however, the values from the smoothing methods closely agree with the probability of a bigram consisting of two independent words. Class-based methods (Pereira et al. 1993) approximate the likelihood of unobserved words based on similar words. Dagan et al. (1992) proposed a non-hierarchical class-based method. The two approaches report limited successes of a purely experimental nature, because they are based on strong assumptions. In the case of smoothing methods, frequency readjustment is somewhat arbitrary and will not work well for heavily dependent bigrams. As for the class-based methods, the notion of similar words differs across methods, and tying probabilistic dependency to the similarity (class) of words is too strong an assumption in general.

Collocation map, first suggested in (Han 1993), is a sigmoid belief network with words as probabilistic variables. The sigmoid belief network was studied extensively by Neal (1992) and has an efficient inference algorithm. Unlike other Bayesian models, inference on a sigmoid belief network is not NP-hard, and inference methods based on reducing the network and on sampling are discussed in (Han 1995). Bayesian models constructed from local dependencies provide a formal approximation among the variables, so using Collocation map does not require strong assumptions or intuition to justify the associations among words produced by the map. The results of inference on Collocation map are probabilities among any combinations of words represented in the map, which is not found in other models. One significant shortcoming of Bayesian models lies in the heavy cost of inference. Our implementation of Collocation map includes 988 nodes and takes 2 to 3 minutes to compute an association between words.

The purpose of the experiments is to find out how gracefully Collocation map deals with unobserved co-occurrences in comparison with a naive bigram statistic. In the next section, Collocation map is reviewed following the definition in (Han 1993). In Section 3, mutual information and conditional probabilities computed using bigrams and Collocation map are compared. Section 4 concludes the paper by summarizing the good and bad points of Collocation map and other methods.

2 Collocation Map

In this section, we give a brief introduction to Collocation map, and refer to (Han 1993) for more discussion of the definition and to (Han 1995) for inference methods. A Bayesian model consists of a network and probability tables defined on the nodes of the network. The nodes in the network represent probabilistic variables of a problem domain. The network can compute the probabilistic dependency between any combination of the variables. The model is well documented in subjective probability theory (Pearl 1988). Collocation map is an application of the sigmoid belief network (Neal 1992), which is a kind of belief network, itself a type of Bayesian model. Unlike belief networks in general, Collocation map has no deterministic variables and thus consists only of probabilistic variables, which here correspond to words. The sigmoid belief network differs from other belief networks in that it has no probability distribution table at each node, but weights on the edges between the nodes. A node takes binary outcomes (1, -1), and the probability that a node takes an outcome given the vector of outcomes of its preceding nodes is a sigmoid function of those outcomes and the weights of the associated edges. In this regard, the sigmoid belief network resembles an artificial neural network.
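The node-level computation can be sketched as follows. This assumes the common ±1 parameterization P(S_i = s | parents) = σ(s · Σ_j w_ij s_j), which matches the binary outcomes and edge weights described above; the optional `bias` term and the function names are illustrative assumptions, not part of the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def node_prob(outcome, parent_outcomes, weights, bias=0.0):
    """P(S_i = outcome | parents) for one node of a sigmoid belief network.

    outcome and every entry of parent_outcomes are +1 or -1; weights[j] is the
    weight on the edge from parent j. bias is an assumed per-node offset
    (0 by default). Because sigmoid(z) + sigmoid(-z) = 1, the probabilities
    of +1 and -1 always sum to 1, so no table is needed at the node.
    """
    z = bias + sum(w * s for w, s in zip(weights, parent_outcomes))
    return sigmoid(outcome * z)


# Hypothetical node with two parents observed as +1 and -1.
p_up = node_prob(1, [1, -1], [2.0, 0.5])
p_down = node_prob(-1, [1, -1], [2.0, 0.5])
```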
In ordinary Bayesian models such probabilities are stored at the nodes, and this makes inference very difficult because the probability tables can be very big. The sigmoid belief network does away with the NP-hard complexity by avoiding the tables, at the cost of the expressive generality of the probability distributions that tables can encode.

One who works with Collocation map has to deal with two problems: how to construct the network, and how to compute the probabilities on the network. The network can be constructed directly from a set of bigrams obtained from a training sample. Because Collocation map is a directed acyclic graph, cycles are avoided by making an additional node for a word when adding an edge to that word would create a cycle. No more than two nodes per word are needed to avoid cycles in any case (Han 1993). Once the network is set up, the edges are assigned weights equal to the normalized frequency of the edges at a node.

[Figure 1: Example Collocation map and example inferences, including P(profit | investment), P(risk-taking | investment), P(stock | investment), P(high-income | investment), P(investment | high-income), P(high-income | risk-taking, profit), P(investment | portfolio, high-income, risk-taking), P(portfolio | blue-chip), and P(portfolio, stock | portfolio, stock). The graph reduction method (Han 1995) is used in computing the probabilities.]

Inference on Collocation map is no different from inference on a sigmoid belief network. The time complexity of inference by graph reduction on sigmoid belief networks is $O(N^a)$ given $N$ nodes (Han 1995). It turned out that inference on networks containing more than a few hundred nodes was not practical using either the node reduction method or the sampling method, so we adopted the hybrid inference method, which first reduces the network and then applies Gibbs sampling (Han 1995).
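The construction step (bigram edges, cycle avoidance with at most two nodes per word, normalized edge weights) might look like the sketch below. Two details are assumptions because the text does not spell them out: the duplicate node, named here with a hypothetical `#2` suffix, only ever receives edges, and "normalized frequency at a node" is read as the counts of a node's incoming edges divided by their sum. The bigram counts are invented for illustration.

```python
from collections import defaultdict


def has_path(adj, src, dst):
    """Depth-first search: is dst reachable from src along directed edges?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return False


def build_collocation_dag(bigram_counts):
    """Build a DAG with normalized edge weights from {(w1, w2): count}.

    Hypothetical simplification: if the edge w1 -> w2 would close a cycle,
    it is redirected to a second copy of w2 named "w2#2". Copies only ever
    receive edges, so they are sinks and can never lie on a cycle, and at
    most two nodes per word are created.
    """
    adj = defaultdict(list)       # node -> list of children
    incoming = defaultdict(dict)  # child -> {parent: count}
    for (w1, w2), count in sorted(bigram_counts.items()):
        target = w2
        if has_path(adj, w2, w1):  # w2 already reaches w1 (or w1 == w2)
            target = w2 + "#2"
        if target not in adj[w1]:
            adj[w1].append(target)
        incoming[target][w1] = incoming[target].get(w1, 0) + count
    # Assumed normalization: the weights of a node's incoming edges sum to 1.
    weights = {}
    for child, parents in incoming.items():
        total = sum(parents.values())
        for parent, c in parents.items():
            weights[(parent, child)] = c / total
    return adj, weights


# Invented counts; profit -> investment would close a cycle.
counts = {
    ("investment", "profit"): 3,
    ("investment", "stock"): 1,
    ("profit", "investment"): 2,
    ("stock", "profit"): 1,
}
adj, weights = build_collocation_dag(counts)
```

With these counts the edge from "profit" is redirected to "investment#2", and the two edges into "profit" receive weights 0.75 and 0.25.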
Using the hybrid inference method, computing a conditional probability took less than a second for a network with 50 nodes, two seconds for a network with 100 nodes, about nine seconds for a network with 200 nodes, and about two minutes for a network with about 1000 nodes. Conditional and marginal probabilities can be approximated by Gibbs sampling. Some conditional probabilities computed from a small network are shown in Figure 1. Though the network may not be big enough to model the domain of finance, the values from this small network of 9 dependencies seem useful and intuitive.
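Gibbs sampling, the second stage of the hybrid method, can be sketched on its own as below. This is plain Gibbs sampling without the graph-reduction stage of (Han 1995), and it assumes a ±1 sigmoid parameterization with no bias terms; the network format, function names, and parameter defaults are hypothetical. Each free node is resampled from its conditional given its Markov blanket, i.e. its own sigmoid term times the sigmoid terms of its children. For the two-node network in the usage lines, P(A=+1 | B=+1) works out to σ(w) exactly, which gives a check on the estimate.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_prob(node, value, state, parents):
    """P(node = value | current values of its parents), values in {+1, -1}."""
    z = sum(w * state[p] for p, w in parents[node])
    return sigmoid(value * z)


def gibbs_conditional(parents, query, evidence, sweeps=20000, burn_in=1000, seed=0):
    """Estimate P(query = +1 | evidence) by Gibbs sampling.

    parents: {node: [(parent, weight), ...]} with an entry for every node.
    evidence: {node: +1 or -1}; these nodes stay fixed during sampling.
    """
    rng = random.Random(seed)
    children = {n: [] for n in parents}
    for n, plist in parents.items():
        for p, _ in plist:
            children[p].append(n)
    state = {n: evidence.get(n, rng.choice((1, -1))) for n in parents}
    free = [n for n in parents if n not in evidence]
    hits = 0
    for sweep in range(burn_in + sweeps):
        for n in free:
            score = {}
            for v in (1, -1):
                state[n] = v
                s = local_prob(n, v, state, parents)
                for c in children[n]:  # children's terms complete the Markov blanket
                    s *= local_prob(c, state[c], state, parents)
                score[v] = s
            p_plus = score[1] / (score[1] + score[-1])
            state[n] = 1 if rng.random() < p_plus else -1
        if sweep >= burn_in and state[query] == 1:
            hits += 1
    return hits / sweeps


# Hypothetical two-node network A -> B with edge weight 1.5.
net = {"A": [], "B": [("A", 1.5)]}
estimate = gibbs_conditional(net, query="A", evidence={"B": 1})
exact = sigmoid(1.5)  # P(A=+1 | B=+1) = sigmoid(w) for this network
```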
[Figure 2: Average MIs and variances. The 378,888 unique bigrams are classified according to frequency. x-axis: frequency of bigrams; y-axis: mutual information (average MI and variance).]
The computation in Figure 1 was done using the graph reduction method. As the example inferences show, the association between any combination of variables can be measured.

3 Experiments

The goal of our experiments is first to find how data sparseness is related to frequency-based statistics, and then to show that the Collocation map based method gives more reliable approximations. In particular, we observed in the experiments that the variances of statistics might indicate the level of data sparseness. The less frequent data tended to have higher variances, though the values of the statistics (mutual information, for instance) did not distinguish the level of occurrences. The predictive power of Collocation map is demonstrated by observing the variances of its approximations on the infrequent events.

We used the tagged Wall Street Journal articles of the Penn Treebank corpus, which contain about 2.6 million word units; about 1.2 million of them were used in the experiments. Programs were written in C and run on a Sun SPARC 10 workstation. For the first 1.2 million words, the bigrams consisting of four types of categories (NN, NNS, IN, JJ) were obtained, and the mutual information of each bigram (order-insensitive) was computed. The bigrams were classified into 200 sets according to their occurrences. Figure 2 summarizes the average MI value and the variance of each frequency range. Figure 3, which shows the occurrence distribution of the 378,888 unique bigrams, indicates that about 70% of them occur only once. One interesting and important observation is that the bigrams in the 1 to 3 frequency range, which make up about 90% of the population, have very high MI values. This result also agrees with Dunning's argument about the overestimation of infrequent occurrences, in which many infrequent pairs tend to get higher estimates (Dunning 1993). According to Dunning (1993), the problem is due to the assumption of normality in naive frequency-based statistics. The approximated values, therefore, do not indicate the level of data quality. Figure 3 shows that variances can indicate the level of data sufficiency.
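The per-bigram mutual information and the per-frequency-class averages and variances can be computed along the lines of the sketch below. It assumes pointwise MI, log2(P(x, y) / (P(x) P(y))), with relative-frequency estimates, and treats each bigram as an unordered pair as the text describes. It groups pairs by their exact frequency rather than into the 200 sets used in the experiments, omits the filtering to NN, NNS, IN, and JJ tags, and uses population variance; all of these are simplifying assumptions, and the token list is invented.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance


def mi_by_frequency_class(tokens):
    """Pointwise MI of each unordered bigram, grouped by bigram frequency.

    MI(x, y) = log2(P(x, y) / (P(x) * P(y))), with all probabilities estimated
    by relative frequency. Returns {frequency: (average MI, variance, #pairs)}.
    """
    n = len(tokens)
    n_pairs = n - 1
    word_counts = Counter(tokens)
    # Sorting each pair makes (x, y) and (y, x) the same bigram.
    pair_counts = Counter(tuple(sorted(p)) for p in zip(tokens, tokens[1:]))
    classes = defaultdict(list)
    for (x, y), c in pair_counts.items():
        p_xy = c / n_pairs
        mi = math.log2(p_xy / ((word_counts[x] / n) * (word_counts[y] / n)))
        classes[c].append(mi)
    return {f: (mean(v), pvariance(v), len(v)) for f, v in sorted(classes.items())}


tokens = "stock price rose stock price fell bond price rose".split()
result = mi_by_frequency_class(tokens)
for freq, (avg_mi, var_mi, count) in result.items():
    print(freq, round(avg_mi, 3), round(var_mi, 3), count)
```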
From this observation we propose the following definition of data sparseness. A set of units belonging to a sample of ordered word units (texts) is $\alpha$-data-sparse if and only if the variance of measurements on the set is greater than $\alpha$. The definition places the concept of sparseness within the context of a focused set of linguistic units. For a set of units unobserved in a sample, the given sample text is certainly data-sparse. The above definition then gives a way to judge sparseness with respect to observed units. How to measure data sparseness is a worthwhile issue to study, since it may depend on the context of the research. Here we suggest a simple method, perhaps for the first time in the literature.

Figure 4 compares the results from Collocation map and from the simple frequency statistic. The variances are smaller, and the pairs in the frequency-1 class have nonzero approximations. Because the cost of computation on Collocation map is very high, we chose 2000 unique pairs at random. The network consists of 988 nodes, and computing one approximation (inference) took about 3 minutes. A test size of 2000 pairs may not be sufficient, but it showed a consistent tendency of graceful degradation of variances. The overestimation problem was not significant in the approximations by Collocation map. The average value of the zero-frequency class, to which 50 unobserved pairs belong, was also in line with the smooth degradation; Figure 4 shows only the variance. Table 1 summarizes the details of the performance gain from using Collocation map.

4 Conclusion

Corpus-based natural language processing has become one of the central subjects of rapidly growing attention in the research community. The major virtue of statistical approaches lies in evaluating linguistic events and determining their relative importance in order to resolve ambiguities. In many cases, however, the evaluation of the events (mostly co-occurrences) has been unreliable because of the lack of data. Data sparseness refers to this shortage of data in estimating probabilistic parameters.
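The $\alpha$-data-sparse test itself is a one-line variance comparison. The sketch below assumes population variance (the text does not say which estimator was used) and adds a helper for a relative variance reduction of the kind summarized in Table 1. The function names are hypothetical and the sample values are made up for illustration; they are not results from the paper.

```python
from statistics import pvariance


def is_data_sparse(measurements, alpha):
    """True if the set is alpha-data-sparse: variance of measurements > alpha."""
    return pvariance(measurements) > alpha


def relative_variance_reduction(baseline, improved):
    """Fractional drop in variance, e.g. 0.48 means a 48% smaller variance."""
    return 1.0 - pvariance(improved) / pvariance(baseline)


# Made-up MI values for one frequency class, for illustration only.
freq_mi = [2.0, 6.5, 9.0, 1.5]
cmap_mi = [3.0, 4.5, 5.5, 2.5]
print(is_data_sparse(freq_mi, alpha=4.0), is_data_sparse(cmap_mi, alpha=4.0))
print(relative_variance_reduction(freq_mi, cmap_mi))
```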
As a result, too many events go unobserved, and even when events have been found, their occurrences are not frequent enough for the estimation to be reliable. In contrast with existing methods that are based on strong assumptions, the method using Collocation map promises a principled approximation, since it is built on the thorough formal argument of Bayesian probability theory. The powerful feature of the framework is its ability to exploit the conditional independence among word units and to infer associations about unseen co-occurrences from observed ones. This naturally provides the properties required to deal with data sparseness. Our experiments confirm that Collocation map makes predictive approximations and avoids overestimating infrequent occurrences. One critical drawback of Collocation map is its time complexity, but it can be useful for applications of limited scope.
[Figure 3: The distribution of the 378,888 unique bigrams; the first ten frequency classes are shown. x-axis: frequency of bigrams; y-axis: percentage.]

[Table 1: Comparison of variances between frequency-based and Collocation map based MI computations, by frequency class and on average (percentages).]
[Figure 4: Variances by frequency-based and Collocation map based MI computations for 2000 unique bigrams. x-axis: frequency of bigrams; y-axis: MI variance.]
References

Kenneth W. Church and William A. Gale. 1991. A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. Computer Speech and Language.
Ido Dagan, Shaul Marcus, and Shaul Markovitch. 1992. Contextual word similarity and estimation from sparse data. In Proceedings of the AAAI Fall Symposium, Cambridge, MA.
Ted Dunning. 1993. Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1).
Young S. Han, Young G. Han, and Key-Sun Choi. 1993. Recursive Markov chain as a stochastic grammar. In Proceedings of a SIGLEX Workshop, Columbus, Ohio.
Young S. Han, Young C. Park, and Key-Sun Choi. 1995. Efficient inferencing for sigmoid Bayesian networks. To appear in Applied Intelligence.
Radford M. Neal. 1992. Connectionist learning of belief networks. Artificial Intelligence.
Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers.
Fernando Pereira, Naftali Tishby, and Lillian Lee. 1993. Distributional clustering of English words. In Proceedings of the Annual Meeting of the ACL.
More informationCombining Dialectical Optimization and Gradient Descent Methods for Improving the Accuracy of Straight Line Segment Classifiers
Cmbining Dialectical Optimizatin and Gradient Descent Methds fr Imprving the Accuracy f Straight Line Segment Classifiers Rsari A. Medina Rdriguez and Rnald Fumi Hashimt University f Sa Paul Institute
More informationMethods for Determination of Mean Speckle Size in Simulated Speckle Pattern
0.478/msr-04-004 MEASUREMENT SCENCE REVEW, Vlume 4, N. 3, 04 Methds fr Determinatin f Mean Speckle Size in Simulated Speckle Pattern. Hamarvá, P. Šmíd, P. Hrváth, M. Hrabvský nstitute f Physics f the Academy
More informationTechnical Bulletin. Generation Interconnection Procedures. Revisions to Cluster 4, Phase 1 Study Methodology
Technical Bulletin Generatin Intercnnectin Prcedures Revisins t Cluster 4, Phase 1 Study Methdlgy Release Date: Octber 20, 2011 (Finalizatin f the Draft Technical Bulletin released n September 19, 2011)
More informationCONSTRUCTING STATECHART DIAGRAMS
CONSTRUCTING STATECHART DIAGRAMS The fllwing checklist shws the necessary steps fr cnstructing the statechart diagrams f a class. Subsequently, we will explain the individual steps further. Checklist 4.6
More informationA mathematical model for complete stress-strain curve prediction of permeable concrete
A mathematical mdel fr cmplete stress-strain curve predictin f permeable cncrete M. K. Hussin Y. Zhuge F. Bullen W. P. Lkuge Faculty f Engineering and Surveying, University f Suthern Queensland, Twmba,
More informationNAME: Prof. Ruiz. 1. [5 points] What is the difference between simple random sampling and stratified random sampling?
CS4445 ata Mining and Kwledge iscery in atabases. B Term 2014 Exam 1 Nember 24, 2014 Prf. Carlina Ruiz epartment f Cmputer Science Wrcester Plytechnic Institute NAME: Prf. Ruiz Prblem I: Prblem II: Prblem
More informationWhat is Statistical Learning?
What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,
More informationWe can see from the graph above that the intersection is, i.e., [ ).
MTH 111 Cllege Algebra Lecture Ntes July 2, 2014 Functin Arithmetic: With nt t much difficulty, we ntice that inputs f functins are numbers, and utputs f functins are numbers. S whatever we can d with
More informationLeast Squares Optimal Filtering with Multirate Observations
Prc. 36th Asilmar Cnf. n Signals, Systems, and Cmputers, Pacific Grve, CA, Nvember 2002 Least Squares Optimal Filtering with Multirate Observatins Charles W. herrien and Anthny H. Hawes Department f Electrical
More informationModule 4: General Formulation of Electric Circuit Theory
Mdule 4: General Frmulatin f Electric Circuit Thery 4. General Frmulatin f Electric Circuit Thery All electrmagnetic phenmena are described at a fundamental level by Maxwell's equatins and the assciated
More informationA Regression Solution to the Problem of Criterion Score Comparability
A Regressin Slutin t the Prblem f Criterin Scre Cmparability William M. Pugh Naval Health Research Center When the criterin measure in a study is the accumulatin f respnses r behavirs fr an individual
More informationBLAST / HIDDEN MARKOV MODELS
CS262 (Winter 2015) Lecture 5 (January 20) Scribe: Kat Gregry BLAST / HIDDEN MARKOV MODELS BLAST CONTINUED HEURISTIC LOCAL ALIGNMENT Use Cmmnly used t search vast bilgical databases (n the rder f terabases/tetrabases)
More informationSubject description processes
Subject representatin 6.1.2. Subject descriptin prcesses Overview Fur majr prcesses r areas f practice fr representing subjects are classificatin, subject catalging, indexing, and abstracting. The prcesses
More informationTHE LIFE OF AN OBJECT IT SYSTEMS
THE LIFE OF AN OBJECT IT SYSTEMS Persns, bjects, r cncepts frm the real wrld, which we mdel as bjects in the IT system, have "lives". Actually, they have tw lives; the riginal in the real wrld has a life,
More informationTree Structured Classifier
Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients
More informationAN INTERMITTENTLY USED SYSTEM WITH PREVENTIVE MAINTENANCE
J. Operatins Research Sc. f Japan V!. 15, N. 2, June 1972. 1972 The Operatins Research Sciety f Japan AN INTERMITTENTLY USED SYSTEM WITH PREVENTIVE MAINTENANCE SHUNJI OSAKI University f Suthern Califrnia
More informationHomology groups of disks with holes
Hmlgy grups f disks with hles THEOREM. Let p 1,, p k } be a sequence f distinct pints in the interir unit disk D n where n 2, and suppse that fr all j the sets E j Int D n are clsed, pairwise disjint subdisks.
More information1 The limitations of Hartree Fock approximation
Chapter: Pst-Hartree Fck Methds - I The limitatins f Hartree Fck apprximatin The n electrn single determinant Hartree Fck wave functin is the variatinal best amng all pssible n electrn single determinants
More informationEarly detection of mining truck failure by modelling its operation with neural networks classification algorithms
RU, Rand GOLOSINSKI, T.S. Early detectin f mining truck failure by mdelling its peratin with neural netwrks classificatin algrithms. Applicatin f Cmputers and Operatins Research ill the Minerals Industries,
More informationDocument for ENES5 meeting
HARMONISATION OF EXPOSURE SCENARIO SHORT TITLES Dcument fr ENES5 meeting Paper jintly prepared by ECHA Cefic DUCC ESCOM ES Shrt Titles Grup 13 Nvember 2013 OBJECTIVES FOR ENES5 The bjective f this dcument
More informationBOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS. Christopher Costello, Andrew Solow, Michael Neubert, and Stephen Polasky
BOUNDED UNCERTAINTY AND CLIMATE CHANGE ECONOMICS Christpher Cstell, Andrew Slw, Michael Neubert, and Stephen Plasky Intrductin The central questin in the ecnmic analysis f climate change plicy cncerns
More information15-381/781 Bayesian Nets & Probabilistic Inference
15-381/781 Bayesian Nets & Prbabilistic Inference Emma Brunskill (this time) Ariel Prcaccia With thanks t Dan Klein (Berkeley), Percy Liang (Stanfrd) and Past 15-381 Instructrs fr sme slide cntent, and
More informationA Quick Overview of the. Framework for K 12 Science Education
A Quick Overview f the NGSS EQuIP MODULE 1 Framewrk fr K 12 Science Educatin Mdule 1: A Quick Overview f the Framewrk fr K 12 Science Educatin This mdule prvides a brief backgrund n the Framewrk fr K-12
More informationYou need to be able to define the following terms and answer basic questions about them:
CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f
More informationDead-beat controller design
J. Hetthéssy, A. Barta, R. Bars: Dead beat cntrller design Nvember, 4 Dead-beat cntrller design In sampled data cntrl systems the cntrller is realised by an intelligent device, typically by a PLC (Prgrammable
More informationPure adaptive search for finite global optimization*
Mathematical Prgramming 69 (1995) 443-448 Pure adaptive search fr finite glbal ptimizatin* Z.B. Zabinskya.*, G.R. Wd b, M.A. Steel c, W.P. Baritmpa c a Industrial Engineering Prgram, FU-20. University
More informationCHAPTER 3 INEQUALITIES. Copyright -The Institute of Chartered Accountants of India
CHAPTER 3 INEQUALITIES Cpyright -The Institute f Chartered Accuntants f India INEQUALITIES LEARNING OBJECTIVES One f the widely used decisin making prblems, nwadays, is t decide n the ptimal mix f scarce
More informationApplications of latent trait theory to the development of norm-referenced tests.
University f Massachusetts Amherst SchlarWrks@UMass Amherst Dctral Dissertatins 1896 - February 2014 1-1-1979 Applicatins f latent trait thery t the develpment f nrm-referenced tests. Linda L. Ck University
More informationo o IMPORTANT REMINDERS Reports will be graded largely on their ability to clearly communicate results and important conclusions.
BASD High Schl Frmal Lab Reprt GENERAL INFORMATION 12 pt Times New Rman fnt Duble-spaced, if required by yur teacher 1 inch margins n all sides (tp, bttm, left, and right) Always write in third persn (avid
More informationNGSS High School Physics Domain Model
NGSS High Schl Physics Dmain Mdel Mtin and Stability: Frces and Interactins HS-PS2-1: Students will be able t analyze data t supprt the claim that Newtn s secnd law f mtin describes the mathematical relatinship
More informationAPPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL
JP2.11 APPLICATION OF THE BRATSETH SCHEME FOR HIGH LATITUDE INTERMITTENT DATA ASSIMILATION USING THE PSU/NCAR MM5 MESOSCALE MODEL Xingang Fan * and Jeffrey S. Tilley University f Alaska Fairbanks, Fairbanks,
More informationMidwest Big Data Summer School: Machine Learning I: Introduction. Kris De Brabanter
Midwest Big Data Summer Schl: Machine Learning I: Intrductin Kris De Brabanter kbrabant@iastate.edu Iwa State University Department f Statistics Department f Cmputer Science June 24, 2016 1/24 Outline
More informationNAME TEMPERATURE AND HUMIDITY. I. Introduction
NAME TEMPERATURE AND HUMIDITY I. Intrductin Temperature is the single mst imprtant factr in determining atmspheric cnditins because it greatly influences: 1. The amunt f water vapr in the air 2. The pssibility
More informationMACE For Conformation Traits
MACE Fr Cnfrmatin raits L. Klei and. J. Lawlr Hlstein Assciatin USA, Inc., Brattlebr, Vermnt, USA Intrductin Multiple acrss cuntry evaluatins (MACE) fr prductin traits are nw rutinely cmputed and used
More information