Additive Regularization for Hierarchical Multimodal Topic Modeling
1 Additive Regularization for Hierarchical Multimodal Topic Modeling
N. A. Chirkova (1,2), K. V. Vorontsov (3)
(1) JSC Antiplagiat, (2) Lomonosov Moscow State University, (3) Federal Research Center "Computer Science and Control" of RAS
October 14, 2016
N. A. Chirkova, October 14, 2016, 31 slides
2 Topic hierarchies for automatic text categorization
How to get an overview of a large text collection in a few minutes?
Topic hierarchy: a soft hierarchical clustering of documents into topics; each topic is described by its specific terminology.
[Figure: a fragment of the English Wikipedia topic hierarchy.]
3-13 Topic hierarchies for automatic text categorization
[Figure slides: fragments of the English Wikipedia topic hierarchy.]
Topic articles (slide 4): Toccata and Fugue, F major, E minor, Carl Friedrich Abel, List of compositions by Frédéric Chopin by genre, Piano quintet, F minor...
Topic articles (slide 6): Filmfare Award for Best Actor, Filmfare Award for Best Film, Karisma Kapoor, Rishi Kapoor, Arjun Rampal, Shammi Kapoor...
Topic articles (slide 12): Functional (C++), SQL/CLI, SQL/JRT, Constructor (object-oriented programming), Static cast, Copy constructor, C++/CX, Java Persistence Query Language...
14 Applications of topic hierarchies
- Navigation through a large text collection
- Harmonization of existing categorizations: duplicate category detection; splitting of miscellaneous topics
- Search for semantically similar documents
- News filtering
Hence the need for automatic learning of topic hierarchies.
15 Applications of topic hierarchies: real-world tasks
- Navigation through a large multilingual, multisource, multimodal text collection
- Harmonization of existing categorizations: duplicate category detection; splitting of miscellaneous categories; detection of relations between categories
- Personalized search for semantically similar documents
- News filtering with respect to geography and time
Hence the need for automatic learning of flexible topic hierarchies.
16 Topic hierarchies in ARTM
Additive Regularization of Topic Models (ARTM):
- Models a fixed number of topics from a set of multimodal documents: text, tags, authors, categories, geotags and timestamps, commenting users, etc. (flexibility)
- Regularization to satisfy additional requirements: topic sparsity, decorrelation, interpretability; consistency with partial markup, etc. (flexibility)
- Scalable open-source implementation: BigARTM.org
The goal of this research: extend ARTM to learn topic hierarchies and implement the approach in BigARTM.
17 Topic hierarchies in ARTM: key features
A topic hierarchy is a multipartite (multilevel) graph of topics.
Flexibility of the hierarchical structure:
- multiple inheritance (a topic may have several parent topics);
- control over hierarchy sparsity;
- automatic determination of the number of child topics.
18 Topic hierarchies in ARTM: approach
1. Each level (except the root) is a flat topic model with its own regularizers.
2. When learning the topics of the l-th level, we use a specific regularizer to find parent topics on the (l-1)-th level.
3. We propose a regularizer to control hierarchy sparsity.
19 ARTM: a flat topic model
Given: a document set d ∈ D; modalities m ∈ M with disjoint dictionaries W = ⋃_{m∈M} W^m of tokens w ∈ W; a document-token counter matrix n_dw, used to estimate p(w|d):

    p(w|d) = n_dw / Σ_{w'∈W^m} n_dw'

Flat topic model, for each modality m:

    p(w|d) ≈ Σ_{t∈T} p(w|t) p(t|d) = Σ_{t∈T} φ_wt θ_td,   d ∈ D, w ∈ W^m,

with topic set T and model parameters Φ^m = {φ_wt} of size |W^m| × |T| holding the p(w|t) values, and Θ = {θ_td} of size |T| × |D| holding the p(t|d) values; Φ stacks the Φ^m over m ∈ M.

Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-Bayesian additive regularization for multimodal topic modeling of large collections.
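As a toy illustration of the factorization above (a numpy sketch with assumed toy dimensions, not the BigARTM implementation), the empirical p(w|d) and the model's Φ Θ approximation look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy counter matrix n_dw for one modality: |D| = 4 documents, |W| = 6 tokens.
n_dw = rng.integers(1, 10, size=(4, 6)).astype(float)

# Empirical estimate p(w|d) = n_dw / sum_{w'} n_dw'.
p_wd = n_dw / n_dw.sum(axis=1, keepdims=True)

# Model parameters for |T| = 2 topics: Phi holds p(w|t), Theta holds p(t|d);
# both are column-stochastic.
Phi = rng.random((6, 2)); Phi /= Phi.sum(axis=0, keepdims=True)
Theta = rng.random((2, 4)); Theta /= Theta.sum(axis=0, keepdims=True)

# Model approximation p(w|d) = sum_t phi_wt theta_td, arranged as (d, w).
p_model = (Phi @ Theta).T

# Both are proper distributions over W for each document.
assert np.allclose(p_wd.sum(axis=1), 1.0)
assert np.allclose(p_model.sum(axis=1), 1.0)
```

Learning then amounts to choosing Φ and Θ so that `p_model` approximates `p_wd`.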
20 ARTM: flat model learning
Optimization task:

    Σ_{m∈M} κ_m Σ_{d∈D} Σ_{w∈W^m} n_dw ln Σ_{t∈T} φ_wt θ_td + Σ_i τ_i R_i(Φ, Θ) → max_{Φ,Θ}

(log-likelihood plus weighted regularizers), subject to Σ_{w∈W^m} φ_wt = 1, φ_wt ≥ 0 for each m; Σ_{t∈T} θ_td = 1, θ_td ≥ 0.

EM-algorithm for topic model training:

    E-step: p(t|d,w) = norm_{t∈T}[φ_wt θ_td]
    M-step: φ_wt = norm_{w∈W^m}[n_wt + φ_wt ∂R/∂φ_wt],   n_wt = Σ_{d∈D} n_dw p(t|d,w)
            θ_td = norm_{t∈T}[n_td + θ_td ∂R/∂θ_td],   n_td = Σ_{w∈W} n_dw p(t|d,w)

where norm_{i∈I}[y_i] = max{y_i, 0} / Σ_{i'∈I} max{y_{i'}, 0}.
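A minimal numpy sketch of this EM iteration for a single modality (illustrative names such as `em_step`; this is not the BigARTM code, and regularizer gradients are passed in as optional precomputed arrays):

```python
import numpy as np

def norm(y, axis):
    """norm_i[y_i] = max(y_i, 0) / sum_i' max(y_i', 0), along `axis`."""
    y = np.maximum(y, 0.0)
    s = y.sum(axis=axis, keepdims=True)
    s[s == 0] = 1.0          # avoid 0/0 for fully zeroed columns
    return y / s

def em_step(n_dw, Phi, Theta, dR_dPhi=None, dR_dTheta=None):
    """One rational EM step of regularized PLSA.

    n_dw: (D, W) counters; Phi: (W, T) with p(w|t); Theta: (T, D) with p(t|d).
    dR_dPhi / dR_dTheta: optional regularizer gradients of the same shapes.
    """
    # E-step folded in: p(t|d,w) = phi_wt * theta_td / Z_dw,
    # where Z[d, w] = sum_t phi_wt theta_td is the model's p(w|d).
    Z = (Phi @ Theta).T
    Z[Z == 0] = 1.0
    A = n_dw / Z                       # (D, W)
    n_wt = Phi * (A.T @ Theta.T)       # n_wt = sum_d n_dw p(t|d,w), shape (W, T)
    n_td = Theta * (Phi.T @ A.T)       # n_td = sum_w n_dw p(t|d,w), shape (T, D)
    # M-step with the regularizer terms phi*dR/dphi and theta*dR/dtheta.
    if dR_dPhi is not None:
        n_wt = n_wt + Phi * dR_dPhi
    if dR_dTheta is not None:
        n_td = n_td + Theta * dR_dTheta
    return norm(n_wt, axis=0), norm(n_td, axis=0)

# Usage: a few iterations on random counters (no regularizers => plain PLSA).
rng = np.random.default_rng(1)
n_dw = rng.integers(1, 5, size=(8, 12)).astype(float)
Phi = norm(rng.random((12, 3)), axis=0)
Theta = norm(rng.random((3, 8)), axis=0)

def loglik(n_dw, Phi, Theta):
    return float((n_dw * np.log((Phi @ Theta).T)).sum())

ll0 = loglik(n_dw, Phi, Theta)
for _ in range(20):
    Phi, Theta = em_step(n_dw, Phi, Theta)
ll1 = loglik(n_dw, Phi, Theta)
```

With the regularizer gradients set to zero this reduces to the plain PLSA EM updates, and the log-likelihood is non-decreasing across iterations.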
21 ARTM: regularizer examples
The goal: the distributions p(w|t) and p(t|d) should be sparse.

Θ sparsing: R_1(Θ) = -(1/|T|) Σ_{d∈D} Σ_{t∈T} ln θ_td
Updated M-step: θ_td = norm_{t∈T}[n_td - τ_1/|T|]

Φ sparsing: R_2(Φ^m) = -(1/|W^m|) Σ_{t∈T} Σ_{w∈W^m} ln φ_wt
Updated M-step: φ_wt = norm_{w∈W^m}[n_wt - τ_2/|W^m|]
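A hypothetical numeric example of the Θ sparsing update (toy counters and an assumed τ_1 = 2): counters that fall below τ_1/|T| are clipped to exact zeros by the norm operator.

```python
import numpy as np

def norm(y, axis=0):
    """max(y, 0) normalized to sum to 1 along `axis`."""
    y = np.maximum(y, 0.0)
    s = y.sum(axis=axis, keepdims=True)
    s[s == 0] = 1.0
    return y / s

# Topic counters n_td for a single document, |T| = 4 topics.
n_td = np.array([[5.0], [0.3], [2.0], [0.1]])
T = n_td.shape[0]

theta_plain = norm(n_td)                 # ordinary M-step: dense p(t|d)
tau1 = 2.0
theta_sparse = norm(n_td - tau1 / T)     # sparsing M-step: norm[n_td - tau1/|T|]
# n_td - 0.5 = [4.5, -0.2, 1.5, -0.4]  ->  theta_sparse = [0.75, 0, 0.25, 0]
```

The two small counters (0.3 and 0.1) end up as exact zeros, which is how the regularizer produces genuinely sparse distributions rather than merely small values.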
22 hARTM: Φ interlevel regularizer
Already learned: levels 1, ..., l; the l-th level has topic set a ∈ A and parameters Φ_l ∈ R^{W×A}, Θ_l ∈ R^{A×D}.
Level to learn: topic set t ∈ T, parameters Φ ∈ R^{W×T}, Θ ∈ R^{T×D}.
The goal: to establish parent-child relations "t is a child of a".
Hypothesis: a parent topic is a mixture of its children topics:

    p(w|a) = Σ_{t∈T} p(w|t) p(t|a),   w ∈ W^m, a ∈ A.

Φ regularization criterion (Φ_l ≈ ΦΨ) with new parameters Ψ = {ψ_ta} of size |T| × |A|, ψ_ta = p(t|a):

    R_3(Φ, Ψ) = Σ_{m∈M} Σ_{a∈A} Σ_{w∈W^m} n_wa ln Σ_{t∈T} φ_wt ψ_ta → max

Implementation: |A| pseudodocuments with counters n_wa (computed on the M-step).
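The pseudodocument trick can be sketched as follows (assumed toy sizes and an assumed per-parent scale n_a; in BigARTM the counters n_wa are maintained on the M-step rather than fixed up front):

```python
import numpy as np

rng = np.random.default_rng(2)
W, A, D = 10, 3, 6

# Parent level already learned: columns of Phi_l are p(w|a).
Phi_l = rng.random((W, A))
Phi_l /= Phi_l.sum(axis=0, keepdims=True)

# Pseudodocument counters: n_wa proportional to p(w|a), scaled by an assumed
# parent-topic weight n_a (here a constant 100 tokens per parent topic).
n_a = np.full(A, 100.0)
n_wa = Phi_l * n_a                       # shape (W, A)

# Append |A| pseudodocuments to the real counter matrix n_dw.
n_dw = rng.integers(1, 5, size=(D, W)).astype(float)
n_dw_aug = np.vstack([n_dw, n_wa.T])     # last |A| rows are the parent topics

# After training the child level on n_dw_aug, the Theta columns of the
# pseudodocuments contain Psi with psi_ta = p(t|a).
```

This is why no new inference machinery is needed: the interlevel regularizer is just extra likelihood terms coming from |A| extra "documents".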
23 hARTM: Θ interlevel regularizer
Already learned: levels 1, ..., l; the l-th level has topic set a ∈ A and parameters Φ_l ∈ R^{W×A}, Θ_l ∈ R^{A×D}.
Level to learn: topic set t ∈ T, parameters Φ ∈ R^{W×T}, Θ ∈ R^{T×D}.
The goal: to establish parent-child relations "t is a child of a".
Hypothesis:

    p(a|d) = Σ_{t∈T} p(a|t) p(t|d),   a ∈ A, d ∈ D.

Θ regularization criterion (Θ_l ≈ Ψ̃Θ) with new parameters Ψ̃ = {ψ̃_at} of size |A| × |T|, ψ̃_at = p(a|t):

    R_4(Θ, Ψ̃) = Σ_{a∈A} Σ_{d∈D} n_ad ln Σ_{t∈T} ψ̃_at θ_td → max

Implementation: a new modality with tokens corresponding to the parent topics a ∈ A.
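Symmetrically, the new-modality trick can be sketched like this (assumed toy sizes and an assumed per-document token budget n_d for the new modality; illustrative, not the BigARTM batch format):

```python
import numpy as np

rng = np.random.default_rng(3)
A, D, W = 3, 6, 10

# Parent level already learned: columns of Theta_l are p(a|d).
Theta_l = rng.random((A, D))
Theta_l /= Theta_l.sum(axis=0, keepdims=True)

# Extra-modality counters: n_ad proportional to p(a|d), scaled by an assumed
# per-document token budget n_d of the new modality.
n_d = np.full(D, 50.0)
n_ad = Theta_l * n_d                     # shape (A, D)

# Each document's counter row gains |A| extra "tokens" (one per parent topic).
n_dw = rng.integers(1, 5, size=(D, W)).astype(float)
n_dw_aug = np.hstack([n_dw, n_ad.T])     # last |A| columns: the new modality

# After training the child level, the Phi rows of the extra modality hold the
# p(a|t) values, i.e. the matrix denoted Psi~ above.
```

Here the parent topics behave like any other modality (tags, authors, ...), so multimodal ARTM machinery covers the hierarchical case for free.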
24 hARTM: interlevel regularizers, illustration
[Diagram comparing the matrix factorizations: PLSA/ARTM factorizes F ≈ ΦΘ; hARTM with the Φ regularizer additionally factorizes Φ_l ≈ ΦΨ; hARTM with the Θ regularizer additionally factorizes Θ_l ≈ Ψ̃Θ.]
Here F = {F^m}_{m∈M}, F^m = {f_dw} of size |W^m| × |D|, f_dw = norm_{w∈W^m}[n_dw].
25 hARTM: hierarchy sparsing with the Θ interlevel regularizer
The goal: each topic should have a small number of parent topics, i.e. p(a|t) is sparse.

Entropy sparsing regularizer: R_5(Ψ̃) = -(1/|A|) Σ_{t∈T} Σ_{a∈A} ln ψ̃_at
Updated M-step: ψ̃_at = norm_{a∈A}[n_at - τ_5/|A|]
Drawback: the possibility of p(a|t) = 0 for all a.

Power sparsing regularizer: R_5(Ψ̃) = (1/q) Σ_{t∈T} Σ_{a∈A} ψ̃_at^q,   q > 1
Updated M-step: ψ̃_at = norm_{a∈A}[n_at + τ_5 ψ̃_at^q]
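The two sparsing updates can be compared on toy counters (assumed τ_5 and q values; the power update is shown as a few fixed-point iterations):

```python
import numpy as np

def norm(y, axis=0):
    y = np.maximum(y, 0.0)
    s = y.sum(axis=axis, keepdims=True)
    s[s == 0] = 1.0
    return y / s

# Counters n_at for one child topic t over |A| = 4 candidate parents.
n_at = np.array([[6.0], [1.5], [0.8], [0.2]])
A = n_at.shape[0]
tau5 = 4.0

# Entropy sparsing: psi_at = norm[n_at - tau5/|A|]; weak parents are zeroed
# (and with too large tau5 the whole column can vanish -- the noted drawback).
psi_entropy = norm(n_at - tau5 / A)

# Power sparsing, q > 1: psi_at = norm[n_at + tau5 * psi_at**q]; the strongest
# parents are reinforced while all entries stay positive.
q = 2.0
psi_power = norm(n_at)
for _ in range(10):
    psi_power = norm(n_at + tau5 * psi_power ** q)
```

With these numbers the entropy update zeroes the two weakest parents outright, while the power update merely shifts mass toward the strongest one.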
26 hARTM: hierarchy sparsing with the Φ interlevel regularizer
The goal: each topic should have a small number of parent topics, i.e. p(a|t) is sparse.
With the Φ regularizer the parameters are ψ_ta = p(t|a), so p(a|t) is expressed through Ψ by Bayes' rule:

    p(a|t) = ψ_ta p(a) / Σ_{a'∈A} ψ_ta' p(a')

Entropy sparsing regularizer: R_5(Ψ) = -(1/|A|) Σ_{t∈T} Σ_{a∈A} ln p(a|t)
Updated M-step: ψ_ta = norm_{t∈T}[n_ta - τ_5 (1/|A| - p(a|t))]
Since Σ_{a∈A} p(a|t) = 1, every topic t retains at least one parent a with p(a|t) > 0.
27 hARTM in BigARTM
Key BigARTM concepts:
- The document set is split into batches stored on disk.
- One EM-step = a pass through the batches, iterating over each batch.
- Φ is stored permanently; Θ is retrained for each loaded batch.

Φ interlevel regularizer implementation:
1. Learn levels l = 1, 2, ...
2. For levels l > 1, add one extra batch composed from the (l-1)-th level's Φ.
3. Extract Ψ as the part of Θ corresponding to the extra batch.

Θ interlevel regularizer implementation:
1. Learn levels l = 1, 2, ...
2. For levels l > 1, modify all batches: add an extra modality composed from the (l-1)-th level's Θ.
3. Extract Ψ̃ as the part of Φ corresponding to the extra modality.
28 Experiments: comparison of Φ and Θ interlevel regularizers
Wikipedia: |D| = , |W| = .
Learning the 2nd level, |A| = 50, |T| = 250, varying the number of batches.
Measuring the quality of the approximations Φ_l ≈ ΦΨ and Θ_l ≈ Ψ̃Θ.
[Plots: approximation errors ρ(Φ_l, ΦΨ) and ρ(Θ_l, Ψ̃Θ) for the Φ and Θ regularizers.]
Approximation quality is nearly the same with both regularizers; the Φ regularizer is better.
29 Experiments: study of the number of children
Postnauka: |D| = 1728, |W| = .
Learning the 2nd level with the Φ regularizer, |A| = 10, |T| = 30, varying the hierarchy sparsing coefficient τ_5.
Measuring the mean and standard deviation of the estimated subtopic count over 10 restarts; t is a child of a if p(t|a) > threshold.
[Plots: estimated subtopic count vs. log₁₀ τ_5.]
The bigger τ_5, the sparser the hierarchy. For large τ_5 the subtopic count estimate is robust (std < 1).
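The child-counting rule used above (t is a child of a when p(t|a) > threshold) is a one-liner; the Ψ values below are made up purely for illustration:

```python
import numpy as np

# Illustrative Psi with psi_ta = p(t|a): |T| = 5 children x |A| = 2 parents.
Psi = np.array([[0.50, 0.02],
                [0.30, 0.03],
                [0.10, 0.45],
                [0.08, 0.40],
                [0.02, 0.10]])

threshold = 0.05
# t is declared a child of a iff p(t|a) > threshold.
children = {a: np.flatnonzero(Psi[:, a] > threshold).tolist()
            for a in range(Psi.shape[1])}
subtopic_counts = {a: len(ts) for a, ts in children.items()}
# subtopic_counts == {0: 4, 1: 3}
```

Note that a topic can pass the threshold under several parents at once, which is exactly the multiple inheritance the hierarchy allows.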
30 Experiments: study of parent-child relations
Postnauka: |D| = 1728, |W| = .
Learning a topic hierarchy with the Φ regularizer.
Generating 100 topic-subtopic pairs and asking an expert to mark each pair as "relation exists" or not.
[Plots: expert judgments vs. p(a|t), without sparsing and with Ψ sparsing.]
When hierarchy sparsing is used, a threshold can be imposed with minimal errors.
31 Summary
Contributions:
- An approach to learning topic hierarchies from multimodal data under additional requirements.
- A method to control hierarchy sparsity.
- An open-source implementation in BigARTM with a friendly interface.
Ongoing projects with hARTM:
- A user-friendly navigator through Postnauka.ru materials.
- A system for online news-flow filtering.
More informationp L yi z n m x N n xi
y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationUndirected Graphical Models
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional
More informationUnsupervised machine learning
Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels
More informationLanguage Information Processing, Advanced. Topic Models
Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:
More informationDT2118 Speech and Speaker Recognition
DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language
More informationLogic-based probabilistic modeling language. Syntax: Prolog + msw/2 (random choice) Pragmatics:(very) high level modeling language
1 Logic-based probabilistic modeling language Turing machine with statistically learnable state transitions Syntax: Prolog + msw/2 (random choice) Variables, terms, predicates, etc available for p.-modeling
More informationFactor Analysis (10/2/13)
STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.
More informationMatrix Factorization Techniques for Recommender Systems
Matrix Factorization Techniques for Recommender Systems Patrick Seemann, December 16 th, 2014 16.12.2014 Fachbereich Informatik Recommender Systems Seminar Patrick Seemann Topics Intro New-User / New-Item
More informationA Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister
A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model
More informationSum-Product Networks: A New Deep Architecture
Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network
More informationNaïve Bayes Classifiers
Naïve Bayes Classifiers Example: PlayTennis (6.9.1) Given a new instance, e.g. (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong ), we want to compute the most likely hypothesis: v NB
More informationAdaptive Crowdsourcing via EM with Prior
Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and
More information