Summarizing Creative Content


1 Summarizing Creative Content. Olivier Toubia, Columbia Business School. Behavioral Insights from Text Conference, January. 1 / 49

2 Background and Motivation. More than 40 million Americans (about one third of the employed population) belong to the Creative Class (Florida 2014): science and engineering, education, arts, entertainment, etc. Their primary economic function is to create new ideas/content, and their output often takes the form of creative documents (e.g., academic papers, books, scripts, business models). Creative documents usually come with summaries (e.g., abstracts, synopses, executive summaries). These are indispensable, given that the average American spends approx. 12 hours a day consuming media (Statista, 2017). 2 / 49

3 Paper Overview. Objectives: quantify how humans summarize creative documents; support computer-assisted writing of summaries of creative documents. Natural Language Processing model: inspired by the creativity literature and based on Poisson Factorization; captures both inside the cone (based on common topics) and outside the cone (residual) content in documents; captures writing norms that govern the summarization process. Empirical applications: marketing academic papers and their abstracts; movie scripts and their synopses. Online interactive tool (publicly available at creativesummary.org). 3 / 49

4 Outline Relevant Literatures Model Empirical Applications Practical Application 4 / 49

5 Relevant Literatures. Creativity: creativity lies in the balance between novelty and familiarity (Giora 2003; Uzzi et al., 2013; Toubia and Netzer, 2017); summaries should capture both the familiar and the novel aspects of the creative document, possibly with different weights; novelty and familiarity should be measured by combinations of words rather than individual words (Mednick 1962; Finke, Ward and Smith 1992; Toubia and Netzer 2017). Text Summarization (e.g., Radev et al. 2002; Nenkova and McKeown, 2012): focused primarily on automatic text summarization; this project focuses on how humans summarize creative documents and uses computers to assist humans. 5 / 49

6 Relevant Literatures. Poisson Factorization (e.g., Gopalan, Hofman and Blei, 2013; Gopalan, Charlin and Blei, 2014): a topic model with offset variables (e.g., used to explain academics' choices of articles). This project: leverage offset variables to capture changes in topic weights in documents vs. summaries; introduce residual topics that capture outside the cone content. 6 / 49

7 Traditional Content-Based Poisson Factorization (Gopalan, Charlin and Blei, 2014). Documents d = 1,...,D (e.g., academic article, movie script, book, pitch, product description); words v = 1,...,V. Document d has w_dv occurrences of word v. There are K topics; each topic k has weight β_kv on word v, and document d has topic intensity θ_dk on topic k. 7 / 49

8 Traditional Content-Based Poisson Factorization: Data Generating Process
1. For each topic k = 1,...,K: for each word v, draw β_kv ~ Gamma(a, b)
2. For each document d = 1,...,D: for each topic k, draw topic intensity θ_dk ~ Gamma(c, d); for each word v, draw word count w_dv ~ Poisson(Σ_k θ_dk β_kv)
8 / 49
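
A minimal NumPy sketch of this generative process (hyperparameter values and dimensions are illustrative, not those used in the paper):

```python
import numpy as np

def simulate_poisson_factorization(D=500, V=1000, K=20,
                                   a=0.3, b=0.3, c=0.3, d=0.3, seed=0):
    """Simulate word counts from traditional Poisson factorization."""
    rng = np.random.default_rng(seed)
    # Topic-word weights beta_kv ~ Gamma(a, b), with b treated as a rate parameter
    beta = rng.gamma(a, 1.0 / b, size=(K, V))
    # Per-document topic intensities theta_dk ~ Gamma(c, d)
    theta = rng.gamma(c, 1.0 / d, size=(D, K))
    # Word counts w_dv ~ Poisson(sum_k theta_dk * beta_kv)
    w = rng.poisson(theta @ beta)
    return w, theta, beta

counts, theta, beta = simulate_poisson_factorization()
print(counts.shape)  # (500, 1000) document-by-word count matrix
```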

9 Outline Relevant Literatures Model Empirical Applications Practical Application 9 / 49

10 Geometric Interpretation of Poisson Factorization. Traditional Poisson Factorization approximates the frequency of words in document d as a weighted combination of topics: w_d ~ Poisson(Σ_k θ_dk β_k), so E(w_d) = Σ_k θ_dk β_k. E(w_d) is a point in the cone defined by the topics {β_k}, in the Euclidean space defined by the words in the vocabulary. The observed word frequency w_d is E(w_d) (the projection on the cone, i.e., the "inside the cone" part) plus a residual (the "outside the cone" part). The residual should help explain content in the summary and may reflect some novel aspects of the document. 10 / 49
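
A toy illustration of this decomposition, using nonnegative least squares to project an observed word-frequency vector onto the cone spanned by the topics; the topic matrix and counts below are made up for illustration:

```python
import numpy as np
from scipy.optimize import nnls

topics = np.array([[5.0, 1.0, 0.2],    # beta_1 over a 3-word vocabulary
                   [0.5, 4.0, 0.5],    # beta_2
                   [0.1, 0.5, 3.0]])   # beta_3
w_d = np.array([6.0, 3.0, 9.0])        # observed word frequencies

# Projection onto the cone {sum_k theta_k * beta_k, theta_k >= 0}
theta_hat, _ = nnls(topics.T, w_d)
inside_cone = topics.T @ theta_hat      # "inside the cone" part
residual = w_d - inside_cone            # "outside the cone" part
print(theta_hat, inside_cone, residual)
```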

11 Geometric Interpretation of Poisson Factorization three words, three topics 11 / 49

12 Proposed Model: Regular vs. Residual Topics. K regular topics: similar to traditional topics in Content-Based Poisson Factorization and other topic models; each regular topic k has weight β^reg_kv on word v, and document d has topic intensity θ^reg_dk on regular topic k. D residual topics: one per document; each residual topic d has weight β^res_dv on word v. 12 / 49

13 Proposed Model: Offset Parameters. Capture writing norms that govern the appearance of topics in summaries vs. full documents (e.g., abstracts of marketing academic papers rarely mention limitations). Each regular topic k has an offset parameter ε^reg_k that captures the relation between occurrences of this topic in full documents vs. summaries. Residual topics have their own offset parameter (common across residual topics): ε^res. 13 / 49

14 Proposed Model: Data Generating Process
1. For each regular topic k = 1,...,K: for each word v, draw β^reg_kv ~ Gamma(a, b); draw offset parameter ε^reg_k ~ Gamma(g, h)
2. For each residual topic d = 1,...,D: for each word v, draw β^res_dv ~ Gamma(a, b)
3. Draw the (single) offset parameter for residual topics ε^res ~ Gamma(g, h)
4. For each document d = 1,...,D: for each regular topic k, draw topic intensity θ^reg_dk ~ Gamma(c, d); for each word v, draw word count w_dv ~ Poisson(Σ_k θ^reg_dk β^reg_kv + β^res_dv)
5. For each document summary d = 1,...,D: for each word v, draw word count w^summary_dv ~ Poisson(Σ_k θ^reg_dk β^reg_kv ε^reg_k + β^res_dv ε^res)
14 / 49
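
A minimal NumPy sketch of the proposed generative process, drawing documents and summaries from the two Poisson rates above (hyperparameters and dimensions are illustrative):

```python
import numpy as np

def simulate_proposed_model(D=200, V=1000, K=20,
                            a=0.3, b=0.3, c=0.3, d=0.3, g=0.3, h=0.3, seed=0):
    """Simulate documents and summaries from the proposed generative process."""
    rng = np.random.default_rng(seed)
    beta_reg = rng.gamma(a, 1.0 / b, size=(K, V))   # regular topics
    eps_reg = rng.gamma(g, 1.0 / h, size=K)         # per-topic offset parameters
    beta_res = rng.gamma(a, 1.0 / b, size=(D, V))   # one residual topic per document
    eps_res = rng.gamma(g, 1.0 / h)                 # single residual offset
    theta = rng.gamma(c, 1.0 / d, size=(D, K))      # topic intensities
    # Documents: Poisson(sum_k theta_dk beta^reg_kv + beta^res_dv)
    doc_rates = theta @ beta_reg + beta_res
    # Summaries: Poisson(sum_k theta_dk beta^reg_kv eps^reg_k + beta^res_dv eps^res)
    sum_rates = (theta * eps_reg) @ beta_reg + beta_res * eps_res
    return rng.poisson(doc_rates), rng.poisson(sum_rates)

docs, summaries = simulate_proposed_model()
```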

19 Inference: Auxiliary Variables (Gopalan, Hofman and Blei, 2013; Gopalan, Charlin and Blei, 2014). Assign occurrences of word v in document d across the various topics (latent variables):
z^reg_dv,k ~ Poisson(θ^reg_dk β^reg_kv); z^res_dv ~ Poisson(β^res_dv), such that w_dv = Σ_k z^reg_dv,k + z^res_dv
z^sum,reg_dv,k ~ Poisson(θ^reg_dk β^reg_kv ε^reg_k); z^sum,res_dv ~ Poisson(β^res_dv ε^res), such that w^summary_dv = Σ_k z^sum,reg_dv,k + z^sum,res_dv
15 / 49

20 Inference: Posterior Distributions - ALL CONDITIONALLY CONJUGATE! (Gopalan, Hofman and Blei, 2013; Gopalan, Charlin and Blei, 2014)
β^reg_kv | · ~ Gamma(a + Σ_d (z^reg_dv,k + z^sum,reg_dv,k), b + Σ_d θ^reg_dk (1 + ε^reg_k))
β^res_dv | · ~ Gamma(a + z^res_dv + z^sum,res_dv, b + 1 + ε^res)
ε^reg_k | · ~ Gamma(g + Σ_{d,v} z^sum,reg_dv,k, h + Σ_{d,v} θ^reg_dk β^reg_kv)
ε^res | · ~ Gamma(g + Σ_{d,v} z^sum,res_dv, h + Σ_{d,v} β^res_dv)
θ^reg_dk | · ~ Gamma(c + Σ_v (z^reg_dv,k + z^sum,reg_dv,k), d + Σ_v β^reg_kv (1 + ε^reg_k))
[{z^reg_dv,k}_k, z^res_dv] | · ~ Mult(w_dv; ∝ [{θ^reg_dk β^reg_kv}_k, β^res_dv])
[{z^sum,reg_dv,k}_k, z^sum,res_dv] | · ~ Mult(w^summary_dv; ∝ [{θ^reg_dk β^reg_kv ε^reg_k}_k, β^res_dv ε^res])
16 / 49
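
A small sketch of the multinomial step implied by the last two lines: the observed count of a word is split across the K regular topics and the residual topic in proportion to each component of the Poisson rate (all numeric values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

w_dv = 7                                   # observed count of word v in document d
theta_d = np.array([0.8, 0.1, 0.4])        # theta^reg_dk for K = 3
beta_v = np.array([2.0, 0.5, 1.0])         # beta^reg_kv for this word
beta_res_dv = 0.3                          # residual-topic weight for this word

# Probabilities proportional to the K regular rate components plus the residual component
components = np.append(theta_d * beta_v, beta_res_dv)
probs = components / components.sum()
z = rng.multinomial(w_dv, probs)           # [z^reg_dv1, ..., z^reg_dvK, z^res_dv]
print(z, z.sum() == w_dv)
```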

21 Inference Using Variational Inference (Blei et al., 2003, 2016). Approximate the posterior distribution of each parameter with a member of a family of distributions. Identify the member of the family that minimizes the distance (KL divergence) between the true and the approximating distribution. Coordinate Ascent Mean-Field Variational Inference algorithm: iteratively minimize the distance between the posterior distribution of each model parameter and its approximating distribution. Order of magnitude faster than MCMC. 17 / 49

22 Variational Inference (tildes denote variational parameters)
P(θ^reg_dk | ·) approximated by Gamma(θ̃^reg_dk)
P(β^reg_kv | ·) approximated by Gamma(β̃^reg_kv); P(β^res_dv | ·) approximated by Gamma(β̃^res_dv)
P(ε^reg_k | ·) approximated by Gamma(ε̃^reg_k); P(ε^res | ·) approximated by Gamma(ε̃^res)
P({z^reg_dv,k}_k, z^res_dv | ·) = Mult({φ^reg_dv,k}_k, φ^res_dv), where φ^reg_dv,k ∝ θ^reg_dk β^reg_kv = exp(log(θ^reg_dk) + log(β^reg_kv)) and φ^res_dv ∝ β^res_dv, each approximated by its variational counterpart
P({z^sum,reg_dv,k}_k, z^sum,res_dv | ·) = Mult({φ^sum,reg_dv,k}_k, φ^sum,res_dv), where φ^sum,reg_dv,k ∝ θ^reg_dk β^reg_kv ε^reg_k = exp(log(θ^reg_dk) + log(β^reg_kv) + log(ε^reg_k)) and φ^sum,res_dv ∝ β^res_dv ε^res, each approximated by its variational counterpart
18 / 49

23 Coordinate Ascent Mean-Field Variational Inference Algorithm (Blei et al., 2016)
θ̃^reg_dk: SHAPE = c + Σ_v (w_dv φ^reg_dv,k + w^summary_dv φ^sum,reg_dv,k); RATE = d + Σ_v E[β^reg_kv] (1 + E[ε^reg_k])
β̃^reg_kv: SHAPE = a + Σ_d (w_dv φ^reg_dv,k + w^summary_dv φ^sum,reg_dv,k); RATE = b + Σ_d E[θ^reg_dk] (1 + E[ε^reg_k])
β̃^res_dv: SHAPE = a + w_dv φ^res_dv + w^summary_dv φ^sum,res_dv; RATE = b + 1 + E[ε^res]
ε̃^reg_k: SHAPE = g + Σ_{d,v} w^summary_dv φ^sum,reg_dv,k; RATE = h + Σ_{d,v} E[θ^reg_dk] E[β^reg_kv]
ε̃^res: SHAPE = g + Σ_{d,v} w^summary_dv φ^sum,res_dv; RATE = h + Σ_{d,v} E[β^res_dv]
[{φ^reg_dv,k}_k, φ^res_dv] ∝ exp({Ψ(θ̃^reg,SHAPE_dk) − log(θ̃^reg,RATE_dk) + Ψ(β̃^reg,SHAPE_kv) − log(β̃^reg,RATE_kv)}_k, Ψ(β̃^res,SHAPE_dv) − log(β̃^res,RATE_dv))
[{φ^sum,reg_dv,k}_k, φ^sum,res_dv] ∝ exp({Ψ(θ̃^reg,SHAPE_dk) − log(θ̃^reg,RATE_dk) + Ψ(β̃^reg,SHAPE_kv) − log(β̃^reg,RATE_kv) + Ψ(ε̃^reg,SHAPE_k) − log(ε̃^reg,RATE_k)}_k, Ψ(β̃^res,SHAPE_dv) − log(β̃^res,RATE_dv) + Ψ(ε̃^res,SHAPE) − log(ε̃^res,RATE))
where Ψ is the digamma function and E[·] denotes the expectation under the variational Gamma distribution (shape/rate).
19 / 49
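
A dense NumPy sketch of one coordinate-ascent iteration reconstructed from these updates; the dictionary `q` of variational shape/rate arrays, the array names, and the surrounding initialization and training loop are assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.special import digamma

def cavi_iteration(w, w_sum, q, a, b, c, d, g, h):
    """One coordinate-ascent update over all variational parameters.

    w, w_sum: (D, V) word-count matrices for documents and summaries.
    q: dict of variational Gamma shape/rate arrays --
       theta (D, K), beta_reg (K, V), beta_res (D, V), eps_reg (K,), eps_res (scalar).
    """
    K = q["beta_reg_shape"].shape[0]
    e = lambda s, r: s / r                        # E[Gamma(s, r)]
    e_log = lambda s, r: digamma(s) - np.log(r)   # E[log Gamma(s, r)]

    log_theta = e_log(q["theta_shape"], q["theta_rate"])        # (D, K)
    log_breg = e_log(q["beta_reg_shape"], q["beta_reg_rate"])   # (K, V)
    log_bres = e_log(q["beta_res_shape"], q["beta_res_rate"])   # (D, V)
    log_ereg = e_log(q["eps_reg_shape"], q["eps_reg_rate"])     # (K,)
    log_eres = e_log(q["eps_res_shape"], q["eps_res_rate"])     # scalar

    # Multinomial responsibilities: K regular components plus 1 residual component.
    phi = np.exp(log_theta[:, :, None] + log_breg[None, :, :])               # (D, K, V)
    phi = np.concatenate([phi, np.exp(log_bres)[:, None, :]], axis=1)
    phi /= phi.sum(axis=1, keepdims=True)
    phi_s = np.exp(log_theta[:, :, None] + log_breg[None, :, :] + log_ereg[None, :, None])
    phi_s = np.concatenate([phi_s, np.exp(log_bres + log_eres)[:, None, :]], axis=1)
    phi_s /= phi_s.sum(axis=1, keepdims=True)

    zw = w[:, None, :] * phi        # expected allocated counts, documents
    zs = w_sum[:, None, :] * phi_s  # expected allocated counts, summaries

    e_theta = e(q["theta_shape"], q["theta_rate"])
    e_breg = e(q["beta_reg_shape"], q["beta_reg_rate"])
    e_bres = e(q["beta_res_shape"], q["beta_res_rate"])
    e_ereg = e(q["eps_reg_shape"], q["eps_reg_rate"])
    e_eres = e(q["eps_res_shape"], q["eps_res_rate"])

    q["theta_shape"] = c + (zw[:, :K] + zs[:, :K]).sum(axis=2)
    q["theta_rate"] = d + (e_breg * (1.0 + e_ereg)[:, None]).sum(axis=1)      # (K,), broadcasts
    q["beta_reg_shape"] = a + (zw[:, :K] + zs[:, :K]).sum(axis=0)
    q["beta_reg_rate"] = b + (e_theta * (1.0 + e_ereg)).sum(axis=0)[:, None]  # (K, 1), broadcasts
    q["beta_res_shape"] = a + zw[:, K] + zs[:, K]
    q["beta_res_rate"] = b + 1.0 + e_eres
    q["eps_reg_shape"] = g + zs[:, :K].sum(axis=(0, 2))
    q["eps_reg_rate"] = h + e_theta.sum(axis=0) * e_breg.sum(axis=1)
    q["eps_res_shape"] = g + zs[:, K].sum()
    q["eps_res_rate"] = h + e_bres.sum()
    return q
```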

24 Predicting the Summary of an Out-of-Sample Document Based on its Full Text. Input: parameter estimates based on in-sample documents ({β^reg_k}_k, {ε^reg_k}_k, ε^res) and the full text of the out-of-sample document d_out. Estimate topic intensities {θ^reg_d_out,k}_k and the residual topic β^res_d_out for the out-of-sample document (using Variational Inference). Predict word occurrences in the summary of the out-of-sample document: λ^summary_d_out,v = Σ_k θ^reg_d_out,k β^reg_kv ε^reg_k + β^res_d_out,v ε^res. 20 / 49
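
A one-line sketch of the prediction step, assuming the topic intensities and residual topic of the out-of-sample document have already been estimated (function and argument names are illustrative):

```python
import numpy as np

def predict_summary_rates(theta_out, beta_reg, eps_reg, beta_res_out, eps_res):
    """Predicted Poisson rate for each word in the out-of-sample summary:
    lambda^summary_v = sum_k theta_k * beta^reg_kv * eps^reg_k + beta^res_v * eps^res.
    theta_out: (K,), beta_reg: (K, V), eps_reg: (K,), beta_res_out: (V,), eps_res: scalar."""
    return (theta_out * eps_reg) @ beta_reg + beta_res_out * eps_res
```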

25 Outline Relevant Literatures Model Empirical Applications Practical Application 21 / 49

26 Application I: Marketing Academic Papers and their Abstracts. Abstracts and full texts of all 1,333 research papers published in Marketing Science, Journal of Marketing, Journal of Marketing Research, and Journal of Consumer Research between 2010 and 2015. Preprocessing: spelling corrector (Python); eliminate non-English characters and words, numbers, and punctuation; tokenize text; remove stopwords and words that contain only one character; no stemming or lemmatization. Randomly split documents between calibration (75%) and validation (25%). 22 / 49

27 Vocabulary of Words (based on the calibration set of documents). Term frequency (tf) of word v: total number of occurrences of the word across all documents; remove words with tf < 100. Document frequency (df) of word v: number of documents with at least one occurrence of the word. Term frequency–inverse document frequency: tf-idf(v) = tf(v) · log(#documents / df(v)). Keep the 1,000 words with the highest tf-idf; this removes words that appear in too many documents or that appear too infrequently. 23 / 49
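
A short Python sketch of this vocabulary-selection rule (the tokenized input and the helper name are assumptions):

```python
from collections import Counter
import math

def select_vocabulary(tokenized_docs, min_tf=100, vocab_size=1000):
    """Drop words with total term frequency below min_tf, then keep the
    `vocab_size` words with the highest tf-idf over the calibration documents."""
    tf, df = Counter(), Counter()
    for tokens in tokenized_docs:
        tf.update(tokens)            # total occurrences across documents
        df.update(set(tokens))       # number of documents containing the word
    n_docs = len(tokenized_docs)
    scores = {w: tf[w] * math.log(n_docs / df[w])
              for w in tf if tf[w] >= min_tf}
    return sorted(scores, key=scores.get, reverse=True)[:vocab_size]
```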

28 Descriptive Statistics (table). Statistics reported (mean, st. dev., min, max): occurrences of vocabulary words, per full text (N=1,333) and per abstract (N=1,333); number of vocabulary words with at least one occurrence, per full text and per abstract; number of occurrences across full texts and across abstracts, per word in the vocabulary (N=1,000); number of full texts and abstracts with at least one occurrence, per word in the vocabulary. 24 / 49

29 Number of Topics. Could be determined using cross-validation; instead, simply set the number of regular topics K to 100 (Gopalan, Charlin and Blei, 2014). The Gamma prior induces sparsity: if K = 100 is more than enough, some topics can be flat. 25 / 49

30 Results - Distribution of Offset Parameters Among 29 non-flat topics 26 / 49

31 Results - Examples of Regular Topics with Small Offset Parameters (relatively weaker representation in summaries vs. documents). Topics are visualized as word clouds of content simulated from the topic distribution. 27 / 49

32 Results - Regular Topic with the highest Offset Parameter (relatively stronger representation in summaries vs. documents) 28 / 49

33 Results - Example Iyengar, Van den Bulte, Valente. Opinion leadership and social contagion in new product diffusion. Marketing Science 30.2 (2011) Content of actual paper Inside the cone content Outside the cone content 29 / 49

34 Outside the Cone Content Novelty? OLS regression with DV = log(1 + #citations). Covariates: journal fixed effects, publication year fixed effects, intensities on non-flat regular topics, and the proportion of outside the cone content. Number of parameters: 40; number of observations: 1,000. The proportion of outside the cone content is the proportion of words in the paper assigned to the residual topic, standardized across papers for interpretability. 30 / 49

35 Nested Benchmarks Full model with residual topics and offset parameters No residual topic Offset parameter constant across all topics (assume relative topic intensities are the same in summaries vs. documents) No residual topic and constant offset parameter traditional Poisson Factorization Residual topic only (each document is unique, no learning across documents) 31 / 49

36 Non-Nested Benchmarks Latent Dirichlet Allocation (Blei et al., 2003) Models probability of occurrence of each word conditional on total number of words Each document in calibration sample is merged with its summary No residual topic, no offset parameter Also estimated using Variational Inference (Blei et al., 2003) 32 / 49

37 Fit Criterion: Perplexity. Model output: fitted Poisson rate for each word, in each document λ_dv = Σ_k θ^reg_dk β^reg_kv + β^res_dv, and in each summary λ^summary_dv = Σ_k θ^reg_dk β^reg_kv ε^reg_k + β^res_dv ε^res. Transform Poisson rates into probability weights for each word: φ_dv = probability that a random word in document d is word v. Fit measured by perplexity: Perplexity = exp(−(Σ_d Σ_{obs ∈ d} log(φ_d,obs)) / N), where N is the total number of word occurrences. Inversely related to the geometric mean of the likelihood function. 33 / 49
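
A small NumPy sketch of this perplexity computation from a document-by-word count matrix and the corresponding fitted Poisson rates (function name is illustrative):

```python
import numpy as np

def perplexity(counts, rates):
    """Perplexity from fitted Poisson rates, as defined on the slide.

    counts: (D, V) observed word counts; rates: (D, V) fitted Poisson rates.
    Rates are normalized within each document into word probabilities phi_dv,
    and perplexity = exp(-(sum of log phi over all observed tokens) / N)."""
    probs = rates / rates.sum(axis=1, keepdims=True)
    log_lik = (counts * np.log(probs)).sum()   # each token contributes log phi_{d,v}
    return float(np.exp(-log_lik / counts.sum()))
```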

38 Results - Fit Measures on Calibration Documents. Train the model on calibration documents and their summaries: estimate regular topics β^reg_kv, topic intensities θ^reg_dk, offset parameters ε^reg_k and ε^res, and residual topics β^res_dv. Report perplexity of calibration documents and summaries (in-sample). 34 / 49

39 Results - Fit Measures on Validation Documents. Based on parameter estimates from calibration documents plus the full texts of validation documents: estimate topic intensities of validation documents θ^reg_d_out,k and residual topics of validation documents β^res_d_out,v; report perplexity of validation documents (in-sample). Predict the content of validation summaries, λ^summary_d_out,v = Σ_k θ^reg_d_out,k β^reg_kv ε^reg_k + β^res_d_out,v ε^res; report perplexity of validation summaries (out-of-sample). 35 / 49

40 Benchmark Comparisons (Perplexity - less is better). Rows: Full Model; No residual topic; ε constant; No residual topic and ε constant; Residual topic only; LDA. Columns: fit (calibration documents, calibration summaries) and predictive performance (validation documents, validation summaries). 36 / 49

41 Application II: Movie Scripts and their Synopses. Scripts (documents) and synopses (summaries) of 858 movies. Scripts from the Internet Movie Script Database (imsdb.com); synopses from the Internet Movie Database (imdb.com). Same pre-processing as with the marketing papers. Calibration sample (75%) and validation sample (25%). 1,000 words selected based on tf-idf (tf cutoff of 65 vs. 100 because there are fewer documents). 37 / 49

42 Descriptive Statistics (table). Statistics reported (mean, st. dev., min, max): occurrences of vocabulary words, per script and per synopsis; number of vocabulary words with at least one occurrence, per script and per synopsis; number of occurrences across scripts and across synopses, per word in the vocabulary; number of scripts and synopses with at least one occurrence, per word in the vocabulary. 38 / 49

43 Results - Distribution of Offset Parameters Among 29 non-flat topics 39 / 49

44 Results - Examples of Regular Topics with Small Offset Parameters (relatively weaker representation in summaries vs. documents) 40 / 49

45 Results - Examples of Regular Topics with Large Offset Parameters (relatively stronger representation in summaries vs. documents) 41 / 49

46 Results - Example: Forrest Gump Content of actual script Inside the cone content Outside the cone content 42 / 49

47 Outside the Cone Content Novelty? Two OLS regressions, one with DV = movie rating and one with DV = log(ROI). Covariates: MPAA rating fixed effects, genre fixed effects, intensities on non-flat regular topics, log(inflation-adjusted production budget), movie rating and (movie rating)² (ROI regression only), and the proportion of outside the cone content. The proportion of outside the cone content is the proportion of words in the script assigned to the residual topic, standardized across movies for interpretability. Movie rating is also standardized across movies for interpretability. ROI is the ratio of box office to production budget; box office performance and/or production budget was not available for all movies. 43 / 49

48 Benchmark Comparisons (Perplexity). Rows: Full Model; No residual topic; ε constant; No residual topic and ε constant; Residual topics only; LDA. Columns: fit (calibration documents, calibration summaries) and predictive performance (validation documents, validation summaries). 44 / 49

49 Outline Relevant Literatures Model Empirical Applications Practical Application 45 / 49

50 Practical Application: creativesummary.org. Domain specific (marketing academic papers and movie scripts for now). The user uploads a document and, optionally, a summary. Based on the previously calibrated model, the tool estimates on the fly (using php, up to 100 iterations of Variational Inference) the topic intensities and the residual topic of the new document. It then predicts occurrences of words in the summary of the new document and simulates summary content. If the user provided a summary, it compares predicted and observed occurrences in the summary. 46 / 49

51 Paper Overview. Objectives: quantify how humans summarize creative documents; support computer-assisted writing of summaries of creative documents. Natural Language Processing model: inspired by the creativity literature and based on Poisson Factorization; captures both inside the cone (based on common topics) and outside the cone (residual) content in documents; captures writing norms that govern the summarization process. Empirical applications: marketing academic papers and their abstracts; movie scripts and their synopses. Online interactive tool (publicly available at creativesummary.org). 47 / 49

52 Examples of Other Ongoing Projects Liu, Jia, and Olivier Toubia, How do Consumers Form Online Search Queries? The Importance of Activation Probabilities between Queries and Results Liu, Jia, and Olivier Toubia, A Semantic Approach for Estimating Consumer Content Preferences from Online Search Queries Liu, Jia, Olivier Toubia, and Shawndra Hill, Content-Based Dynamic Model of Web Search Behavior: An Application to TV Show Search Dew, Ryan, Asim Ansari, and Olivier Toubia, Letting Logos Speak: Deep Probabilistic Models for Logo Design 48 / 49

53 THANK YOU! 49 / 49
