An Improved Stemming Approach Using HMM for a Highly Inflectional Language
Navanath Saharia¹, Kishori M. Konwar², Utpal Sharma¹, and Jugal K. Kalita³

¹ Department of CSE, Tezpur University, India, {nava tu,utpal}@tezu.ernet.in
² Department of MI, University of British Columbia, Canada, kishori82@yahoo.com
³ Department of CS, University of Colorado at Colorado Springs, USA, jkalita@uccs.edu

A. Gelbukh (Ed.): CICLing 2013, Part I, LNCS 7816. © Springer-Verlag Berlin Heidelberg 2013
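A sentence is a sequence of words $w_0 w_1 \ldots w_{n-1}$. Each word is written as $w_i = p_i s_i$, where $p_i$ is its stem and $s_i \in S \cup \{\epsilon\}$ is its suffix, $S$ being the set of suffixes and $\epsilon$ the empty string. A word $w = ps$ with $s = \epsilon$ carries no suffix, so $w = p$.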
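As an illustration of the decomposition $w = ps$, the following is a minimal sketch that assumes greedy longest-match stripping against a known suffix list, plus a minimum stem length. It is not the authors' procedure. The suffixes -hatar, -ar and -at are read off the paper's example sentence, and both the list and the `min_root` threshold are hypothetical.

```python
from typing import Iterable, Tuple

EPSILON = ""  # the empty suffix, written ε in the text


def split_word(word: str, suffixes: Iterable[str], min_root: int = 3) -> Tuple[str, str]:
    """Split word into (stem p, suffix s) so that word == p + s.

    Suffixes are tried longest-first; the first one that leaves a stem of at
    least `min_root` characters wins (a hypothetical over-stemming guard).
    Returns (word, EPSILON) when no suffix applies.
    """
    for s in sorted(set(suffixes), key=len, reverse=True):
        if s and word.endswith(s) and len(word) - len(s) >= min_root:
            return word[: -len(s)], s
    return word, EPSILON


if __name__ == "__main__":
    toy_suffixes = ["hatar", "ar", "at"]  # hypothetical suffix list
    for w in "nabinhatar ghar aamar gharar para durat".split():
        print(w, split_word(w, toy_suffixes))
```

With this rule, aamar is split into aam + ar, even though the paper's example keeps aamar whole in state N. Purely local suffix matching cannot tell these cases apart, and a context model over the states M and N is one way to address such over-stemming.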
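If $w = ps$ with $s \in S$ and $s \neq \epsilon$, the word $w$ carries the suffix $s$; if $s = \epsilon$, then $w = p$. A sentence of $l$ words $w_0, w_1, \ldots, w_{l-1}$ is assigned a hidden state sequence $q_0, q_1, \ldots, q_{l-1}$ with $q_i \in Q = \{N, M\}$, and the model $G$ is defined over the states $N$ and $M$.

[Figure (a), (b): the model $G$ with the suffix classes $(\epsilon)$, $(S_1)$ and $(S_m)$; image not recoverable.]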
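Example sentence with words $w_0, \ldots, w_6$, stems $p$, suffixes $s$ and states $q$:

w:  nabinhatar  ghar  aamar  gharar  para  man  durat
p:  nabin       ghar  aamar  ghar    para       dur
s:  hatar       ε     ε      ar      ε          at
q:  M           N     N      M       N     M    M

The suffix classes $S_1$ and $S_m$ together with $\{\epsilon\}$ give the observation alphabet $S = \{\epsilon, s_1, s_m\}$, where $s_1$ and $s_m$ stand for the classes $S_1$ and $S_m$. When $s_i \in S_m$, the word $w_i$ is in state $q_i = M$; for $q_i = N$, $e_{q_i}(s) = 0$ for $s \in S_m$.

The model $G$ over $\{\epsilon\} \cup S_1 \cup S_m$ is parameterised by transition probabilities $a_{kl}$ and emission probabilities $e_k(b)$. From the transition counts $A_{kl}$ and emission counts $E_k(b)$ in the training data, these are estimated as

$$\hat{a}_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'} + \delta} \quad \text{and} \quad \hat{e}_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b') + \delta},$$

where $\delta > 0$ is a small constant.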
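The count-based estimates of $a_{kl}$ and $e_k(b)$, with the small constant $\delta$ added to each denominator, can be sketched as follows. The sketch assumes that each training sentence is supplied as (state, observation) pairs and that paths begin in a start state `S0`. The observation labels `"eps"`, `"s1"` and `"sm"` are hypothetical stand-ins for $\epsilon$, $s_1$ and $s_m$.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

START = "S0"  # assumed start state, written S_0 in the text
Table = Dict[str, Dict[str, float]]


def estimate(
    sequences: Sequence[Sequence[Tuple[str, str]]], delta: float = 1e-6
) -> Tuple[Table, Table]:
    """Estimate transition probabilities a_kl and emission probabilities e_k(b).

    Transition counts A_kl (including S0 -> first state) and emission counts
    E_k(b) are divided by their row total plus delta.
    """
    A: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    E: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        prev = START
        for state, obs in seq:
            A[prev][state] += 1  # count transition prev -> state
            E[state][obs] += 1   # count emission of obs from state
            prev = state

    def normalise(counts: Dict[str, Dict[str, float]]) -> Table:
        probs: Table = {}
        for k, row in counts.items():
            total = sum(row.values()) + delta
            probs[k] = {x: c / total for x, c in row.items()}
        return probs

    return normalise(A), normalise(E)


if __name__ == "__main__":
    # Hypothetical tagged sentence: (state, suffix-class observation) pairs.
    train = [[("M", "s1"), ("N", "eps"), ("N", "eps"), ("M", "sm")]]
    trans, emit = estimate(train, delta=0.0)
    print(trans["N"], emit["M"])
```

Because $\delta$ appears only in the denominator, each estimated row sums to slightly less than one when $\delta > 0$. Setting `delta=0.0` recovers plain relative frequencies.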
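[Figure: expanded model with the states $M_{s_m}$, $M_{s_1}$, $N_\epsilon$ and $N_{s_1}$ emitting the suffix classes $(S_1)$, $(S_m)$ and $(\epsilon)$ across successive time steps $t$; image not recoverable.]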
[Tables: model parameters over the states $S_0$, $M$ and $N$ (transitions between states, and emissions of $\epsilon$, $s_1$ and $s_m$), used to score a word sequence $W$ against a state sequence $T$; values not recoverable.]
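To tag a new sentence, a model of this shape can be decoded with the standard Viterbi algorithm, which returns the state sequence $T$ that maximises the joint probability of $W$ and $T$. The surviving text does not say which decoder the authors use, so Viterbi is an assumption here. The parameters in the demo are made up, because the paper's tables did not survive. Their zero entries (no $s_m$ from $N$, no $\epsilon$ from $M$) loosely follow the expanded states $M_{s_m}$, $M_{s_1}$, $N_\epsilon$ and $N_{s_1}$ in the paper's figure, but that reading is also an assumption.

```python
import math
from typing import Dict, List, Sequence

START = "S0"  # assumed start state, as in the S_0 row of the parameter tables
Table = Dict[str, Dict[str, float]]


def _log(p: float) -> float:
    """Log-probability, mapping zero to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs: Sequence[str], states: Sequence[str], trans: Table, emit: Table) -> List[str]:
    """Return the state sequence maximising P(obs, states) under the HMM (log space)."""
    if not obs:
        return []
    # best[k]: highest log-probability of any path ending in state k at the current step
    best = {k: _log(trans[START].get(k, 0.0)) + _log(emit[k].get(obs[0], 0.0)) for k in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for k in states:
            # Best predecessor of state k at this step.
            j = max(states, key=lambda c: best[c] + _log(trans[c].get(k, 0.0)))
            new_best[k] = best[j] + _log(trans[j].get(k, 0.0)) + _log(emit[k].get(o, 0.0))
            ptr[k] = j
        best = new_best
        backptrs.append(ptr)
    # Trace back from the most probable final state.
    path = [max(states, key=lambda k: best[k])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Made-up parameters; "eps", "s1", "sm" stand for epsilon, s_1 and s_m.
    trans = {"S0": {"M": 0.5, "N": 0.5}, "M": {"M": 0.3, "N": 0.7}, "N": {"M": 0.6, "N": 0.4}}
    emit = {"M": {"s1": 0.5, "sm": 0.5}, "N": {"eps": 0.8, "s1": 0.2}}
    print(viterbi(["s1", "eps", "eps", "sm"], ["M", "N"], trans, emit))  # ['M', 'N', 'N', 'M']
```

Working in log space avoids numerical underflow on long sentences, and impossible transitions or emissions simply score as negative infinity.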