Summarizing Creative Content
1 Summarizing Creative Content. Olivier Toubia, Columbia Business School. Behavioral Insights from Text Conference, January
2 Background and Motivation: More than 40 million Americans (one third of the employed population) belong to the Creative Class (Florida 2014): science and engineering, education, arts, entertainment, etc. Their primary economic function is to create new ideas and content. Their output often takes the form of creative documents (e.g., academic papers, books, scripts, business models). Creative documents usually come with summaries (e.g., abstracts, synopses, executive summaries). Summaries are indispensable, given that the average American spends approximately 12 hours a day consuming media (Statista, 2017).
3 Paper Overview. Objectives: quantify how humans summarize creative documents; support computer-assisted writing of summaries of creative documents. Natural Language Processing model: inspired by the creativity literature and based on Poisson Factorization; captures both "inside the cone" content (based on common topics) and "outside the cone" (residual) content in documents; captures the writing norms that govern the summarization process. Empirical applications: marketing academic papers and their abstracts; movie scripts and their synopses. Online interactive tool (publicly available at …).
4 Outline: Relevant Literatures; Model; Empirical Applications; Practical Application
5 Relevant Literatures. Creativity: creativity lies in a balance between novelty and familiarity (Giora 2003; Uzzi et al., 2013; Toubia and Netzer, 2017). Summaries should therefore capture both the familiar and the novel aspects of a creative document, possibly with different weights. Novelty and familiarity should be measured by combinations of words rather than by individual words (Mednick 1962; Finke, Ward and Smith 1992; Toubia and Netzer 2017). Text summarization (e.g., Radev et al. 2002; Nenkova and McKeown, 2012): this literature has focused primarily on automatic text summarization. This project instead focuses on how humans summarize creative documents, and uses computers to assist humans.
6 Relevant Literatures. Poisson Factorization (e.g., Gopalan, Hofman and Blei, 2013; Gopalan, Charlin and Blei, 2014): a topic model with offset variables (e.g., used to explain academics' choices of articles). This project leverages offset variables to capture changes in topic weights in documents vs. summaries, and introduces residual topics that capture "outside the cone" content.
7 Traditional Content-Based Poisson Factorization (Gopalan, Charlin and Blei, 2014). Document $d$ (e.g., academic article, movie script, book, pitch, product description). Words $v = 1, \dots, V$. Document $d$ has $w_{dv}$ occurrences of word $v$. There are $K$ topics: each topic $k$ has weight $\beta_{kv}$ on word $v$, and document $d$ has topic intensity $\theta_{dk}$ on topic $k$.
8 Traditional Content-Based Poisson Factorization: Data Generating Process
1. For each topic $k = 1, \dots, K$: for each word $v$, draw $\beta_{kv} \sim \text{Gamma}(a, b)$.
2. For each document $d = 1, \dots, D$: for each topic $k$, draw topic intensity $\theta_{dk} \sim \text{Gamma}(c, d)$; for each word $v$, draw word count $w_{dv} \sim \text{Poisson}(\sum_k \theta_{dk}\beta_{kv})$.
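The two-step generating process can be simulated directly. This is a minimal sketch using only the Python standard library, assuming that Gamma(a, b) denotes shape a and rate b, and using illustrative hyperparameter values of 0.3 (an assumption, not the paper's setting). The function names and the `d_rate` argument (renamed so it does not clash with the document index $d$) are hypothetical.

```python
import math
import random


def sample_poisson(rate: float, rng: random.Random) -> int:
    """Draw one Poisson(rate) variate with Knuth's method (adequate for small rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_poisson_factorization(D, K, V, a=0.3, b=0.3, c=0.3, d_rate=0.3, seed=0):
    """Simulate the traditional Poisson Factorization data generating process.

    Assumes Gamma(shape, rate); Python's gammavariate takes (shape, scale = 1 / rate).
    Returns (beta, counts) with beta[k][v] and counts[d][v] = w_dv.
    """
    rng = random.Random(seed)
    # Step 1: topic weights beta_kv ~ Gamma(a, b)
    beta = [[rng.gammavariate(a, 1.0 / b) for _ in range(V)] for _ in range(K)]
    counts = []
    for _ in range(D):
        # Step 2: topic intensities theta_dk ~ Gamma(c, d)
        theta = [rng.gammavariate(c, 1.0 / d_rate) for _ in range(K)]
        # Word counts w_dv ~ Poisson(sum_k theta_dk * beta_kv)
        rates = [sum(theta[k] * beta[k][v] for k in range(K)) for v in range(V)]
        counts.append([sample_poisson(r, rng) for r in rates])
    return beta, counts
```

For example, `simulate_poisson_factorization(D=4, K=3, V=5)` returns a 3 x 5 topic matrix and a 4 x 5 matrix of word counts. Knuth's sampler is chosen only because the standard library has no Poisson generator; it becomes slow for large rates.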
10 Geometric Interpretation of Poisson Factorization. Traditional Poisson Factorization approximates the frequency of words in document $d$ as a nonnegative weighted sum of topics: $w_d \sim \text{Poisson}(\sum_k \theta_{dk}\beta_k)$, so $E(w_d) = \sum_k \theta_{dk}\beta_k$. $E(w_d)$ is a point in the cone defined by the topics $\{\beta_k\}$, in the Euclidean space defined by the words in the vocabulary. The observed word frequency $w_d$ is therefore $E(w_d)$ (the projection on the cone: "inside the cone") plus a residual ("outside the cone"). The residual should help explain content in the summary, since it may reflect some novel aspects of the document.
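To make the inside/outside split concrete, the sketch below holds the topics fixed, finds nonnegative intensities that maximize the Poisson likelihood of one document's counts, and reports the fitted point in the cone and the residual. It swaps in a plain maximum-likelihood fit using multiplicative (EM-style, Lee and Seung KL) updates rather than the Bayesian inference used in the model; the function name and iteration count are assumptions.

```python
def project_on_cone(w, beta, iters=500):
    """Split count vector w into an inside-the-cone part and a residual.

    beta[k][v] are fixed topic weights (each topic must have positive total weight).
    theta is fitted by multiplicative updates that increase the Poisson likelihood.
    Returns (theta, inside, residual) with inside = sum_k theta[k] * beta[k].
    """
    K, V = len(beta), len(w)
    theta = [1.0] * K
    topic_mass = [sum(beta[k]) for k in range(K)]
    for _ in range(iters):
        rate = [sum(theta[k] * beta[k][v] for k in range(K)) for v in range(V)]
        theta = [
            theta[k]
            * sum(beta[k][v] * w[v] / rate[v] for v in range(V) if rate[v] > 0)
            / topic_mass[k]
            for k in range(K)
        ]
    inside = [sum(theta[k] * beta[k][v] for k in range(K)) for v in range(V)]
    # The residual is what the topics cannot explain; its entries may be negative.
    residual = [w[v] - inside[v] for v in range(V)]
    return theta, inside, residual
```

With two topics spanning only the first two words, `project_on_cone([3, 5, 4], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])` puts the counts of the first two words inside the cone and leaves the 4 occurrences of the third word entirely in the residual.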
11 Geometric Interpretation of Poisson Factorization [Figure: illustration with three words and three topics]
12 Proposed Model: Regular vs. Residual Topics. $K$ regular topics, similar to the traditional topics in Content-Based Poisson Factorization and other topic models: each regular topic $k$ has weight $\beta^{reg}_{kv}$ on word $v$, and document $d$ has topic intensity $\theta^{reg}_{dk}$ on regular topic $k$. $D$ residual topics, one per document: each residual topic $d$ has weight $\beta^{res}_{dv}$ on word $v$.
13 Proposed Model: Offset Parameters. Offset parameters capture the writing norms that govern the appearance of topics in summaries vs. full documents (e.g., abstracts of marketing academic papers rarely mention limitations). Each regular topic $k$ has an offset parameter $\epsilon^{reg}_k$ that captures the relation between occurrences of this topic in full documents vs. summaries. Residual topics have their own offset parameter, common across all residual topics: $\epsilon^{res}$.
18 Proposed Model: Data Generating Process
1. For each regular topic $k = 1, \dots, K$: for each word $v$, draw $\beta^{reg}_{kv} \sim \text{Gamma}(a, b)$; draw offset parameter $\epsilon^{reg}_k \sim \text{Gamma}(g, h)$.
2. For each residual topic $d = 1, \dots, D$: for each word $v$, draw $\beta^{res}_{dv} \sim \text{Gamma}(a, b)$.
3. Draw a single offset parameter for residual topics, $\epsilon^{res} \sim \text{Gamma}(g, h)$.
4. For each document $d = 1, \dots, D$: for each regular topic $k$, draw topic intensity $\theta^{reg}_{dk} \sim \text{Gamma}(c, d)$; for each word $v$, draw word count $w_{dv} \sim \text{Poisson}(\sum_k \theta^{reg}_{dk}\beta^{reg}_{kv} + \beta^{res}_{dv})$.
5. For each document summary $d = 1, \dots, D$: for each word $v$, draw word count $w^{summary}_{dv} \sim \text{Poisson}(\sum_k \theta^{reg}_{dk}\beta^{reg}_{kv}\epsilon^{reg}_k + \beta^{res}_{dv}\epsilon^{res})$.
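The five steps can be simulated end to end, which shows how one set of topic intensities drives both a document and its summary, with the offsets rescaling each topic in the summary. This is a minimal standard-library sketch; it assumes Gamma(shape, rate), and the hyperparameter defaults (a = b = c = d = 0.3, g = 1, h = 10, so offsets average about 0.1 and summaries come out shorter) and all function names are illustrative assumptions.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's Poisson sampler; adequate for the small rates used here."""
    limit, count, product = math.exp(-rate), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_documents_and_summaries(D, K, V, a=0.3, b=0.3, c=0.3, d_rate=0.3,
                                     g=1.0, h=10.0, seed=0):
    """Return (docs, summaries) as D x V lists of word counts."""
    rng = random.Random(seed)

    def draw_gamma(shape, rate):
        return rng.gammavariate(shape, 1.0 / rate)  # shape/rate -> shape/scale

    beta_reg = [[draw_gamma(a, b) for _ in range(V)] for _ in range(K)]  # step 1
    eps_reg = [draw_gamma(g, h) for _ in range(K)]                       # step 1
    beta_res = [[draw_gamma(a, b) for _ in range(V)] for _ in range(D)]  # step 2
    eps_res = draw_gamma(g, h)                                           # step 3
    docs, summaries = [], []
    for d in range(D):
        theta = [draw_gamma(c, d_rate) for _ in range(K)]                # step 4
        doc_rate = [sum(theta[k] * beta_reg[k][v] for k in range(K)) + beta_res[d][v]
                    for v in range(V)]
        sum_rate = [sum(theta[k] * beta_reg[k][v] * eps_reg[k] for k in range(K))
                    + beta_res[d][v] * eps_res for v in range(V)]        # step 5
        docs.append([sample_poisson(r, rng) for r in doc_rate])
        summaries.append([sample_poisson(r, rng) for r in sum_rate])
    return docs, summaries
```

Because the same `theta` and `beta_res[d]` enter both rates, a topic with a large offset is over-represented in the summary relative to the document, which is exactly the writing norm the offsets are meant to capture.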
19 Inference: Auxiliary Variables (Gopalan, Hofman and Blei, 2013; Gopalan, Charlin and Blei, 2014)
Assign the occurrences of word $v$ in document $d$ across the various topics (latent variables):
$z^{reg}_{dv,k} \sim \text{Poisson}(\theta^{reg}_{dk}\beta^{reg}_{kv})$; $z^{res}_{dv} \sim \text{Poisson}(\beta^{res}_{dv})$, such that $w_{dv} = \sum_k z^{reg}_{dv,k} + z^{res}_{dv}$
$z^{sum,reg}_{dv,k} \sim \text{Poisson}(\theta^{reg}_{dk}\beta^{reg}_{kv}\epsilon^{reg}_k)$; $z^{sum,res}_{dv} \sim \text{Poisson}(\beta^{res}_{dv}\epsilon^{res})$, such that $w^{summary}_{dv} = \sum_k z^{sum,reg}_{dv,k} + z^{sum,res}_{dv}$
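20 Inference: Posterior Distributions, all conditionally conjugate (Gopalan, Hofman and Blei, 2013; Gopalan, Charlin and Blei, 2014)
$\beta^{reg}_{kv} \mid \cdot \sim \text{Gamma}\big(a + \sum_d (z^{reg}_{dv,k} + z^{sum,reg}_{dv,k}),\ b + \sum_d \theta^{reg}_{dk}(1 + \epsilon^{reg}_k)\big)$
$\beta^{res}_{dv} \mid \cdot \sim \text{Gamma}\big(a + z^{res}_{dv} + z^{sum,res}_{dv},\ b + 1 + \epsilon^{res}\big)$
$\epsilon^{reg}_k \mid \cdot \sim \text{Gamma}\big(g + \sum_{d,v} z^{sum,reg}_{dv,k},\ h + \sum_{d,v} \theta^{reg}_{dk}\beta^{reg}_{kv}\big)$
$\epsilon^{res} \mid \cdot \sim \text{Gamma}\big(g + \sum_{d,v} z^{sum,res}_{dv},\ h + \sum_{d,v} \beta^{res}_{dv}\big)$
$\theta^{reg}_{dk} \mid \cdot \sim \text{Gamma}\big(c + \sum_v (z^{reg}_{dv,k} + z^{sum,reg}_{dv,k}),\ d + \sum_v \beta^{reg}_{kv}(1 + \epsilon^{reg}_k)\big)$
$[\{z^{reg}_{dv,k}\}_k, z^{res}_{dv}] \mid \cdot \sim \text{Mult}\big(w_{dv}, [\{\theta^{reg}_{dk}\beta^{reg}_{kv}\}_k, \beta^{res}_{dv}]\big)$
$[\{z^{sum,reg}_{dv,k}\}_k, z^{sum,res}_{dv}] \mid \cdot \sim \text{Mult}\big(w^{summary}_{dv}, [\{\theta^{reg}_{dk}\beta^{reg}_{kv}\epsilon^{reg}_k\}_k, \beta^{res}_{dv}\epsilon^{res}]\big)$
The multinomial probabilities are proportional to the listed rates.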
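Two pieces of these conditionals are easy to get wrong in code: the multinomial split of observed counts across topics, and the rate of the residual-topic conditional, which picks up one unit from the document and $\epsilon^{res}$ from the summary. The sketch below covers those pieces plus the $\epsilon^{res}$ conditional; it is a standard-library illustration under the assumption that Gamma is parameterized by (shape, rate), and the function names are hypothetical.

```python
import random
from collections import Counter


def allocate_counts(count, rates, rng):
    """Multinomial split of `count` occurrences with probabilities proportional to `rates`.

    For a document, pass [theta_d1*beta_1v, ..., theta_dK*beta_Kv, beta_res_dv];
    for a summary, multiply each regular rate by eps_k and the residual rate by eps_res.
    """
    if count == 0:
        return [0] * len(rates)
    draws = Counter(rng.choices(range(len(rates)), weights=rates, k=count))
    return [draws[i] for i in range(len(rates))]


def beta_res_conditional(z_res, z_sum_res, eps_res, a, b):
    """(shape, rate) of the Gamma conditional of beta_res_dv."""
    return a + z_res + z_sum_res, b + 1.0 + eps_res


def eps_res_conditional(z_sum_res, beta_res, g, h):
    """(shape, rate) of the Gamma conditional of eps_res; inputs are D x V nested lists."""
    shape = g + sum(sum(row) for row in z_sum_res)
    rate = h + sum(sum(row) for row in beta_res)
    return shape, rate
```

A Gibbs sampler would alternate `allocate_counts` for every (d, v) pair with Gamma draws such as `rng.gammavariate(shape, 1.0 / rate)` from these conditionals; the slides instead use the same conjugacy to derive variational updates.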
21 Inference Using Variational Inference (Blei et al., 2003, 2016). Approximate the posterior distribution of each parameter with a member of a family of distributions, and identify the member of the family that minimizes the distance (KL divergence) between the true and the approximating distribution. Coordinate Ascent Mean-Field Variational Inference algorithm: iteratively minimize the distance between the posterior distribution of each model parameter and its approximating distribution. An order of magnitude faster than MCMC.
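22 Variational Inference
$P(\theta^{reg}_{dk} \mid \dots)$ is approximated by $\text{Gamma}(\tilde\theta^{reg}_{dk})$.
$P(\beta^{reg}_{kv} \mid \dots)$ is approximated by $\text{Gamma}(\tilde\beta^{reg}_{kv})$; $P(\beta^{res}_{dv} \mid \dots)$ by $\text{Gamma}(\tilde\beta^{res}_{dv})$.
$P(\epsilon^{reg}_k \mid \dots)$ is approximated by $\text{Gamma}(\tilde\epsilon^{reg}_k)$; $P(\epsilon^{res} \mid \dots)$ by $\text{Gamma}(\tilde\epsilon^{res})$.
$P(\{z^{reg}_{dv,k}\}_k, z^{res}_{dv} \mid \dots) = \text{Mult}(\{\phi^{reg}_{dv,k}\}_k, \phi^{res}_{dv})$, where $\phi^{reg}_{dv,k} \propto \theta^{reg}_{dk}\beta^{reg}_{kv} = \exp(\log\theta^{reg}_{dk} + \log\beta^{reg}_{kv})$ is approximated by $\exp(E[\log\theta^{reg}_{dk}] + E[\log\beta^{reg}_{kv}])$, and $\phi^{res}_{dv} \propto \beta^{res}_{dv}$ is approximated by $\exp(E[\log\beta^{res}_{dv}])$.
$P(\{z^{sum,reg}_{dv,k}\}_k, z^{sum,res}_{dv} \mid \dots) = \text{Mult}(\{\phi^{sum,reg}_{dv,k}\}_k, \phi^{sum,res}_{dv})$, where $\phi^{sum,reg}_{dv,k} \propto \theta^{reg}_{dk}\beta^{reg}_{kv}\epsilon^{reg}_k = \exp(\log\theta^{reg}_{dk} + \log\beta^{reg}_{kv} + \log\epsilon^{reg}_k)$ is approximated by $\exp(E[\log\theta^{reg}_{dk}] + E[\log\beta^{reg}_{kv}] + E[\log\epsilon^{reg}_k])$, and $\phi^{sum,res}_{dv} \propto \beta^{res}_{dv}\epsilon^{res}$ is approximated by $\exp(E[\log\beta^{res}_{dv}] + E[\log\epsilon^{res}])$.
Expectations are taken under the variational Gamma distributions.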
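23 Coordinate Ascent Mean-Field Variational Inference Algorithm (Blei et al., 2016). Each variational Gamma factor is written as $\langle \text{shape}, \text{rate} \rangle$; its mean is shape/rate.
$\tilde\theta^{reg}_{dk} = \big\langle c + \sum_v (w_{dv}\phi^{reg}_{dv,k} + w^{summary}_{dv}\phi^{sum,reg}_{dv,k}),\ d + \sum_v \frac{\tilde\beta^{reg,shape}_{kv}}{\tilde\beta^{reg,rate}_{kv}}\big(1 + \frac{\tilde\epsilon^{reg,shape}_k}{\tilde\epsilon^{reg,rate}_k}\big) \big\rangle$
$\tilde\beta^{reg}_{kv} = \big\langle a + \sum_d (w_{dv}\phi^{reg}_{dv,k} + w^{summary}_{dv}\phi^{sum,reg}_{dv,k}),\ b + \sum_d \frac{\tilde\theta^{reg,shape}_{dk}}{\tilde\theta^{reg,rate}_{dk}}\big(1 + \frac{\tilde\epsilon^{reg,shape}_k}{\tilde\epsilon^{reg,rate}_k}\big) \big\rangle$
$\tilde\beta^{res}_{dv} = \big\langle a + w_{dv}\phi^{res}_{dv} + w^{summary}_{dv}\phi^{sum,res}_{dv},\ b + 1 + \frac{\tilde\epsilon^{res,shape}}{\tilde\epsilon^{res,rate}} \big\rangle$
$\tilde\epsilon^{reg}_k = \big\langle g + \sum_{d,v} w^{summary}_{dv}\phi^{sum,reg}_{dv,k},\ h + \sum_{d,v} \frac{\tilde\theta^{reg,shape}_{dk}}{\tilde\theta^{reg,rate}_{dk}}\frac{\tilde\beta^{reg,shape}_{kv}}{\tilde\beta^{reg,rate}_{kv}} \big\rangle$
$\tilde\epsilon^{res} = \big\langle g + \sum_{d,v} w^{summary}_{dv}\phi^{sum,res}_{dv},\ h + \sum_{d,v} \frac{\tilde\beta^{res,shape}_{dv}}{\tilde\beta^{res,rate}_{dv}} \big\rangle$
$[\{\phi^{reg}_{dv,k}\}_k, \phi^{res}_{dv}] \propto \exp\big(\{\Psi(\tilde\theta^{reg,shape}_{dk}) - \log\tilde\theta^{reg,rate}_{dk} + \Psi(\tilde\beta^{reg,shape}_{kv}) - \log\tilde\beta^{reg,rate}_{kv}\}_k,\ \Psi(\tilde\beta^{res,shape}_{dv}) - \log\tilde\beta^{res,rate}_{dv}\big)$
$[\{\phi^{sum,reg}_{dv,k}\}_k, \phi^{sum,res}_{dv}] \propto \exp\big(\{\Psi(\tilde\theta^{reg,shape}_{dk}) - \log\tilde\theta^{reg,rate}_{dk} + \Psi(\tilde\beta^{reg,shape}_{kv}) - \log\tilde\beta^{reg,rate}_{kv} + \Psi(\tilde\epsilon^{reg,shape}_k) - \log\tilde\epsilon^{reg,rate}_k\}_k,\ \Psi(\tilde\beta^{res,shape}_{dv}) - \log\tilde\beta^{res,rate}_{dv} + \Psi(\tilde\epsilon^{res,shape}) - \log\tilde\epsilon^{res,rate}\big)$
where $\Psi$ is the digamma function.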
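The $\phi$ updates rely on $E[\log X] = \Psi(\text{shape}) - \log(\text{rate})$ for $X \sim \text{Gamma}(\text{shape}, \text{rate})$. The sketch below computes the normalized $\phi$ vector for one (document, word) pair. The standard library has no digamma function, so one is implemented with the usual recurrence plus asymptotic series; that implementation, the function names, and the tuple layout of the variational parameters are all assumptions made for illustration.

```python
import math


def digamma(x):
    """Digamma for x > 0: shift x above 6 with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def expected_log_gamma(shape, rate):
    """E[log X] for X ~ Gamma(shape, rate)."""
    return digamma(shape) - math.log(rate)


def update_phi(theta_q, beta_q, beta_res_q, eps_q=None, eps_res_q=None):
    """Normalized [phi_1, ..., phi_K, phi_res] for one (d, v) pair.

    theta_q[k], beta_q[k], beta_res_q, eps_q[k], eps_res_q are (shape, rate) tuples.
    Omit eps_* for the document update; pass them for the summary update.
    """
    logits = []
    for k in range(len(theta_q)):
        value = expected_log_gamma(*theta_q[k]) + expected_log_gamma(*beta_q[k])
        if eps_q is not None:
            value += expected_log_gamma(*eps_q[k])
        logits.append(value)
    residual = expected_log_gamma(*beta_res_q)
    if eps_res_q is not None:
        residual += expected_log_gamma(*eps_res_q)
    logits.append(residual)
    top = max(logits)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [x / total for x in weights]
```

The shape and rate updates listed above then consume these $\phi$ values weighted by the observed counts $w_{dv}$ and $w^{summary}_{dv}$.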
24 Predicting the Summary of an Out-of-Sample Document Based on its Full Text. Input: parameter estimates based on in-sample documents, $\{\beta^{reg}_k\}_k$, $\{\epsilon^{reg}_k\}_k$ and $\epsilon^{res}$, plus the full text of out-of-sample document $d_{out}$. Estimate the topic intensities $\{\theta^{reg}_{d_{out}k}\}_k$ and the residual topic $\beta^{res}_{d_{out}}$ for the out-of-sample document (using Variational Inference). Predict word occurrences in the summary of the out-of-sample document: $\lambda^{summary}_{d_{out}v} = \sum_k \theta^{reg}_{d_{out}k}\beta^{reg}_{kv}\epsilon^{reg}_k + \beta^{res}_{d_{out}v}\epsilon^{res}$.
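Once point estimates are available (for example, the variational means shape/rate, which is an assumption here), the prediction formula is a direct computation. The function name below is hypothetical.

```python
def predict_summary_rates(theta_out, beta_reg, eps_reg, beta_res_out, eps_res):
    """lambda^summary_v = sum_k theta_k * beta_kv * eps_k + beta^res_v * eps^res, for every word v."""
    K, V = len(beta_reg), len(beta_res_out)
    return [
        sum(theta_out[k] * beta_reg[k][v] * eps_reg[k] for k in range(K))
        + beta_res_out[v] * eps_res
        for v in range(V)
    ]
```

For instance, with `theta_out = [2.0, 1.0]`, `beta_reg = [[1.0, 0.0, 0.5], [0.0, 2.0, 0.5]]`, `eps_reg = [0.5, 0.1]`, `beta_res_out = [0.0, 0.0, 4.0]` and `eps_res = 0.25`, the predicted rates are about 1.0, 0.2 and 1.55: the third word gets most of its predicted summary mass from the document's residual topic.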
26 Application I: Marketing Academic Papers and their Abstracts. Abstracts and full texts of all 1,333 research papers published in Marketing Science, Journal of Marketing, Journal of Marketing Research, and Journal of Consumer Research between 2010 and 2015. Preprocessing: spelling corrector (Python); eliminate non-English characters and words, numbers, and punctuation; tokenize text; remove stopwords and words that contain only one character; no stemming or lemmatization. Documents are randomly split between calibration (75%) and validation (25%).
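The cleaning steps can be sketched with the standard library. This is an illustration, not the authors' pipeline: the spelling corrector is omitted, "non-English" is approximated as "contains non-ASCII characters", and the short `STOPWORDS` set is a hypothetical stand-in for a full stopword list.

```python
import re

# Hypothetical, abbreviated stopword list; a real pipeline would use a complete one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "are",
             "we", "that", "this", "for", "with"}


def preprocess(text):
    """Tokenize text, dropping non-ASCII words, digits, punctuation, stopwords, and 1-letter words.

    No stemming or lemmatization is applied.
    """
    tokens = []
    for raw in text.lower().split():
        if not raw.isascii():
            continue  # drop words containing non-English characters
        # Replace digits and punctuation with spaces, so "consumer-choice" -> "consumer", "choice"
        for word in re.sub(r"[^a-z]+", " ", raw).split():
            if len(word) > 1 and word not in STOPWORDS:
                tokens.append(word)
    return tokens
```

For example, `preprocess("The 2 models of consumer-choice, in 2015!")` yields `["models", "consumer", "choice"]`.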
27 Vocabulary of Words (based on the calibration set of documents). Term frequency $tf(v)$ of word $v$: total number of occurrences of the word across all documents; remove words with $tf(v) < 100$. Document frequency $df(v)$ of word $v$: number of documents with at least one occurrence of the word. Term frequency-inverse document frequency: $\text{tf-idf}(v) = tf(v) \log\big(\frac{\#\text{documents}}{df(v)}\big)$. Keep the 1,000 words with the highest tf-idf. This removes words that appear in too many documents or that appear too infrequently.
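The vocabulary rule translates into a few lines. This sketch follows the slide's definitions (tf as a corpus-wide count, idf as the natural log of #documents over df); the function name, the tie-breaking rule, and the input format (token lists, such as those produced by a tokenizer) are assumptions.

```python
import math
from collections import Counter


def build_vocabulary(docs, min_tf=100, size=1000):
    """Keep the `size` words with the highest tf-idf among words with tf >= min_tf.

    docs: list of token lists (calibration documents only).
    """
    tf = Counter(tok for doc in docs for tok in doc)       # total occurrences
    df = Counter(tok for doc in docs for tok in set(doc))  # documents containing the word
    n_docs = len(docs)
    scores = {v: tf[v] * math.log(n_docs / df[v]) for v in tf if tf[v] >= min_tf}
    ranked = sorted(scores, key=lambda v: (-scores[v], v))  # ties broken alphabetically
    return ranked[:size]
```

In a toy corpus of three documents where "the" appears in every document, its idf is log(1) = 0, so it ranks last regardless of how frequent it is; this is how the rule removes words that appear in too many documents, while `min_tf` removes words that appear too infrequently.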
28 Descriptive Statistics
Statistic | Unit of analysis | Mean | St. dev. | Min | Max
Occurrences of words from vocabulary | Full text of paper (N=1,333) | | | |
Occurrences of words from vocabulary | Abstract of paper (N=1,333) | | | |
Number of words from vocabulary with at least one occurrence | Full text of paper (N=1,333) | | | |
Number of words from vocabulary with at least one occurrence | Abstract of paper (N=1,333) | | | |
Number of occurrences across full texts | Word in vocabulary (N=1,000) | | | |
Number of occurrences across abstracts | Word in vocabulary (N=1,000) | | | |
Number of full texts with at least one occurrence | Word in vocabulary (N=1,000) | | | |
Number of abstracts with at least one occurrence | Word in vocabulary (N=1,000) | | | |
29 Number of Topics. The number of regular topics could be determined using cross-validation. Instead, we simply set the number of regular topics $K$ to 100 (Gopalan, Charlin and Blei, 2014). The Gamma prior induces sparsity, so if $K = 100$ is more than enough, some topics can be flat.
30 Results: Distribution of Offset Parameters (among 29 non-flat topics)
31 Results: Examples of Regular Topics with Small Offset Parameters (relatively weaker representation in summaries vs. documents). Each topic is visualized as a word cloud of content simulated from the topic's distribution over words.
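Simulating content from a topic amounts to drawing words with probability proportional to the topic's weights $\beta_{kv}$. The sketch below produces the word counts a word-cloud renderer could consume; the function name, the sample size, and the choice to draw a fixed number of words are assumptions, and the rendering step itself is not shown.

```python
import random
from collections import Counter


def simulate_topic_content(topic_weights, vocabulary, n_words=200, seed=0):
    """Draw n_words words with probability proportional to a topic's weights beta_k."""
    rng = random.Random(seed)
    return Counter(rng.choices(vocabulary, weights=topic_weights, k=n_words))
```

Words with zero weight in the topic never appear, and heavily weighted words dominate the counts, which is what makes the resulting word cloud a readable summary of the topic.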
32 Results: Regular Topic with the Highest Offset Parameter (relatively stronger representation in summaries vs. documents)
33 Results: Example. Iyengar, Van den Bulte, and Valente, "Opinion leadership and social contagion in new product diffusion," Marketing Science 30.2 (2011). Panels: content of actual paper; inside the cone content; outside the cone content.
34 Outside the Cone Content: Novelty? DV = log(1 + #citations). Covariates: journal fixed effects; publication year fixed effects; intensities on non-flat regular topics; proportion of outside the cone content. Number of parameters: 40. Number of observations: 1,000. The coefficient on the proportion of outside the cone content and the R² are reported, with two significance levels marked. The regression is estimated using OLS. The proportion of outside the cone content ("residual") is the proportion of words in the paper assigned to the residual topic, standardized across papers for interpretability.
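The regressor can be computed from the fitted allocations. This sketch assumes the share of a paper's words assigned to the residual topic is taken as $\sum_v w_{dv}\phi^{res}_{dv} / \sum_v w_{dv}$ using the variational $\phi^{res}$, and that standardization uses the population standard deviation; both are assumptions, as are the function names.

```python
import statistics


def outside_cone_share(w, phi_res):
    """Share of a document's words assigned to its residual topic."""
    total = sum(w)
    return sum(wv * pv for wv, pv in zip(w, phi_res)) / total if total else 0.0


def standardize(values):
    """z-scores across documents (assumes the values are not all equal)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]
```

Standardizing means the OLS coefficient reads as the change in log(1 + #citations) associated with a one-standard-deviation increase in the share of outside the cone content.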
35 Nested Benchmarks: full model with residual topics and offset parameters; no residual topic; offset parameter constant across all topics (assumes relative topic intensities are the same in summaries vs. documents); no residual topic and constant offset parameter (i.e., traditional Poisson Factorization); residual topic only (each document is unique, with no learning across documents).
36 Non-Nested Benchmark: Latent Dirichlet Allocation (Blei et al., 2003). LDA models the probability of occurrence of each word conditional on the total number of words. Each document in the calibration sample is merged with its summary. No residual topic and no offset parameter. Also estimated using Variational Inference (Blei et al., 2003).
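37 Fit Criterion: Perplexity. Model output: the fitted Poisson rate for each word in each document, $\lambda_{dv} = \sum_k \theta^{reg}_{dk}\beta^{reg}_{kv} + \beta^{res}_{dv}$, and in each summary, $\lambda^{summary}_{dv} = \sum_k \theta^{reg}_{dk}\beta^{reg}_{kv}\epsilon^{reg}_k + \beta^{res}_{dv}\epsilon^{res}$. Transform the Poisson rates into probability weights for each word: $\phi_{dv} = \lambda_{dv} / \sum_{v'} \lambda_{dv'}$ is the probability that a random word in document $d$ is word $v$. Fit is measured by perplexity: $\text{Perplexity} = \exp\big(-\frac{\sum_d \sum_{obs \in d} \log \phi_{d,obs}}{N}\big)$, where $N$ is the total number of observed words. Perplexity is the inverse of the geometric mean of the per-word probabilities, so lower is better.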
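The perplexity computation normalizes each document's rates into word probabilities and averages the log probability over all observed occurrences. This is a minimal sketch with a hypothetical function name; summing $\log\phi$ over observations equals $\sum_v w_{dv}\log\phi_{dv}$, and it assumes every observed word has a strictly positive fitted rate.

```python
import math


def perplexity(rates, counts):
    """exp(-sum_d sum_v w_dv * log(phi_dv) / N), with phi_dv = rate_dv / sum_v' rate_dv'.

    rates[d][v]: fitted Poisson rate; counts[d][v]: observed occurrences w_dv.
    """
    log_lik, n_words = 0.0, 0
    for lam, w in zip(rates, counts):
        total = sum(lam)
        for lam_v, w_v in zip(lam, w):
            if w_v:
                log_lik += w_v * math.log(lam_v / total)
                n_words += w_v
    return math.exp(-log_lik / n_words)
```

Two sanity checks follow from the definition: uniform rates over a V-word vocabulary give a perplexity of V for any counts, and a model that puts all its mass on the only observed word gives a perplexity of 1, the minimum.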
38 Results: Fit Measures on Calibration Documents. Train the model on calibration documents and their summaries: estimate regular topics $\beta^{reg}_{kv}$, topic intensities $\theta^{reg}_{dk}$, offset parameters $\epsilon^{reg}_k$ and $\epsilon^{res}$, and residual topics $\beta^{res}_{dv}$. Compute the perplexity of calibration documents and summaries (in-sample).
39 Results: Fit Measures on Validation Documents. Based on parameter estimates from calibration documents plus the full texts of validation documents: estimate topic intensities of validation documents, $\theta^{reg}_{d_{out}k}$; estimate residual topics of validation documents, $\beta^{res}_{d_{out}v}$; compute the perplexity of validation documents (in-sample). Predict the content of validation summaries, $\lambda^{summary}_{d_{out}v} = \sum_k \theta^{reg}_{d_{out}k}\beta^{reg}_{kv}\epsilon^{reg}_k + \beta^{res}_{d_{out}v}\epsilon^{res}$, and compute the perplexity of validation summaries (out-of-sample).
40 Benchmark Comparisons (Perplexity; lower is better)
Approach | Fit: calibration documents | Fit: calibration summaries | Fit: validation documents | Predictive performance: validation summaries
Full model | | | |
No residual topic | | | |
$\epsilon$ constant | | | |
No residual topic and $\epsilon$ constant | | | |
Residual topic only | | | |
LDA | | | |
41 Application II: Movie Scripts and their Synopses. Scripts (documents) and synopses (summaries) of 858 movies. Scripts are from the Internet Movie Script Database (imsdb.com); synopses are from the Internet Movie Database (imdb.com). Same preprocessing as with the marketing papers. Calibration sample (75%) and validation sample (25%). 1,000 words selected based on tf-idf (tf cutoff of 65 instead of 100, because there are fewer documents).
42 Descriptive Statistics
Statistic | Unit of analysis | Mean | St. dev. | Min | Max
Number of occurrences of words from vocabulary | Script | | | |
Number of occurrences of words from vocabulary | Synopsis | | | |
Number of words from vocabulary with at least one occurrence | Script | | | |
Number of words from vocabulary with at least one occurrence | Synopsis | | | |
Number of occurrences across scripts | Word in vocabulary | | | |
Number of occurrences across synopses | Word in vocabulary | | | |
Number of scripts with at least one occurrence | Word in vocabulary | | | |
Number of synopses with at least one occurrence | Word in vocabulary | | | |
43 Results: Distribution of Offset Parameters (among 29 non-flat topics)
44 Results: Examples of Regular Topics with Small Offset Parameters (relatively weaker representation in summaries vs. documents)
45 Results: Examples of Regular Topics with Large Offset Parameters (relatively stronger representation in summaries vs. documents)
46 Results: Example: Forrest Gump. Panels: content of actual script; inside the cone content; outside the cone content.
47 Outside the Cone Content: Novelty? Two models, each estimated separately using OLS: DV = movie rating and DV = log(ROI). Covariates: MPAA rating fixed effects; genre fixed effects; intensities on non-flat regular topics; log(inflation-adjusted production budget); movie rating and (movie rating)² (log(ROI) model only; N/A when movie rating is the DV); proportion of outside the cone content. For each model, the coefficient on the proportion of outside the cone content, the number of parameters, the number of observations, and the R² are reported, with two significance levels marked. The proportion of outside the cone content is the proportion of words in the script assigned to the residual topic, standardized across movies for interpretability. Movie rating is also standardized across movies for interpretability. ROI is the ratio of box office to production budget. Box office performance and/or production budget was not available for all movies.
48 Benchmark Comparisons (Perplexity; lower is better)
Approach | Fit: calibration documents | Fit: calibration summaries | Fit: validation documents | Predictive performance: validation summaries
Full model | | | |
No residual topic | | | |
$\epsilon$ constant | | | |
No residual topic and $\epsilon$ constant | | | |
Residual topics only | | | |
LDA | | | |
50 Practical Application: creativesummary.org. Domain specific (marketing academic papers and movie scripts for now). The user uploads a document and, optionally, its summary. Based on the previously calibrated model, the tool estimates on the fly (using PHP, with up to 100 iterations of Variational Inference) the topic intensities and the residual topic of the new document. It then predicts occurrences of words in the summary of the new document and simulates summary content. If the user provided a summary, the tool compares predicted and observed occurrences in the summary.
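One simple way to compare predicted and observed occurrences is to rank words by the gap between the user's counts and the model's predicted summary rates. The tool's actual comparison method is not described in the slides, so this is a hypothetical sketch in Python (the live tool uses PHP), and the function name and output format are assumptions.

```python
def compare_summary(predicted, observed, vocabulary):
    """Return (word, observed - predicted) pairs sorted from most over-used to most under-used.

    predicted: Poisson rates lambda^summary_v; observed: counts in the user's summary.
    """
    gaps = [(word, obs - pred) for word, pred, obs in zip(vocabulary, predicted, observed)]
    return sorted(gaps, key=lambda item: item[1], reverse=True)
```

Words at the top of the list appear in the user's summary more often than the model expects; words at the bottom are ones the model expects a summary of this document to mention more often.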
52 Examples of Other Ongoing Projects
- Liu, Jia, and Olivier Toubia, "How do Consumers Form Online Search Queries? The Importance of Activation Probabilities between Queries and Results"
- Liu, Jia, and Olivier Toubia, "A Semantic Approach for Estimating Consumer Content Preferences from Online Search Queries"
- Liu, Jia, Olivier Toubia, and Shawndra Hill, "Content-Based Dynamic Model of Web Search Behavior: An Application to TV Show Search"
- Dew, Ryan, Asim Ansari, and Olivier Toubia, "Letting Logos Speak: Deep Probabilistic Models for Logo Design"
53 Thank you!
More informationProbabilistic Matrix Factorization
Probabilistic Matrix Factorization David M. Blei Columbia University November 25, 2015 1 Dyadic data One important type of modern data is dyadic data. Dyadic data are measurements on pairs. The idea is
More informationApplying Latent Dirichlet Allocation to Group Discovery in Large Graphs
Lawrence Livermore National Laboratory Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs Keith Henderson and Tina Eliassi-Rad keith@llnl.gov and eliassi@llnl.gov This work was performed
More informationFast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine
Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu
More informationTopic Models. Charles Elkan November 20, 2008
Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One
More informationOptimization Number of Topic Latent Dirichlet Allocation
Optimization Number of Topic Latent Dirichlet Allocation Bambang Subeno Magister of Information System Universitas Diponegoro Semarang, Indonesian bambang.subeno.if@gmail.com Farikhin Department of Mathematics
More informationKernel Density Topic Models: Visual Topics Without Visual Words
Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationBoolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).
Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language
More informationMeasuring Topic Quality in Latent Dirichlet Allocation
Measuring Topic Quality in Sergei Koltsov Olessia Koltsova Steklov Institute of Mathematics at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St.
More informationNatural Language Processing. Topics in Information Retrieval. Updated 5/10
Natural Language Processing Topics in Information Retrieval Updated 5/10 Outline Introduction to IR Design features of IR systems Evaluation measures The vector space model Latent semantic indexing Background
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationOther Noninformative Priors
Other Noninformative Priors Other methods for noninformative priors include Bernardo s reference prior, which seeks a prior that will maximize the discrepancy between the prior and the posterior and minimize
More informationTwo Useful Bounds for Variational Inference
Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the
More informationTopic Models. Material adapted from David Mimno University of Maryland INTRODUCTION. Material adapted from David Mimno UMD Topic Models 1 / 51
Topic Models Material adapted from David Mimno University of Maryland INTRODUCTION Material adapted from David Mimno UMD Topic Models 1 / 51 Why topic models? Suppose you have a huge number of documents
More informationInformation Retrieval and Web Search
Information Retrieval and Web Search IR models: Vector Space Model IR Models Set Theoretic Classic Models Fuzzy Extended Boolean U s e r T a s k Retrieval: Adhoc Filtering Brosing boolean vector probabilistic
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationLatent Dirichlet Alloca/on
Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which
More informationMachine Learning Summer School, Austin, TX January 08, 2015
Parametric Department of Information, Risk, and Operations Management Department of Statistics and Data Sciences The University of Texas at Austin Machine Learning Summer School, Austin, TX January 08,
More informationModeling User Rating Profiles For Collaborative Filtering
Modeling User Rating Profiles For Collaborative Filtering Benjamin Marlin Department of Computer Science University of Toronto Toronto, ON, M5S 3H5, CANADA marlin@cs.toronto.edu Abstract In this paper
More informationProbabilistic Topic Models in Natural Language Processing
Probabilistic Topic Models in Natural Language Processing Bachelor s Thesis submitted to Prof. Dr. Wolfgang K. Härdle and Prof. Dr. Cathy Y. Chen Humboldt-Universität zu Berlin School of Business and Economics
More informationHow small wars become big(er) wars: latent dynamics of conflict and the role of peacekeeping
How small wars become big(er) wars: latent dynamics of conflict and the role of peacekeeping Gudmund Horn Hermansen 1,2 Håvard Mokleiv Nygård 3 1 Department of Economics, Norwegian Business School 2 Department
More informationParallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability
Parallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability Ramesh Nallapati, William Cohen and John Lafferty Machine Learning Department Carnegie Mellon
More informationNote for plsa and LDA-Version 1.1
Note for plsa and LDA-Version 1.1 Wayne Xin Zhao March 2, 2011 1 Disclaimer In this part of PLSA, I refer to [4, 5, 1]. In LDA part, I refer to [3, 2]. Due to the limit of my English ability, in some place,
More informationAutoencoders and Score Matching. Based Models. Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas
On for Energy Based Models Kevin Swersky Marc Aurelio Ranzato David Buchman Benjamin M. Marlin Nando de Freitas Toronto Machine Learning Group Meeting, 2011 Motivation Models Learning Goal: Unsupervised
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationExpectation maximization tutorial
Expectation maximization tutorial Octavian Ganea November 18, 2016 1/1 Today Expectation - maximization algorithm Topic modelling 2/1 ML & MAP Observed data: X = {x 1, x 2... x N } 3/1 ML & MAP Observed
More informationPenalized Loss functions for Bayesian Model Choice
Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationBayesian Nonparametrics for Speech and Signal Processing
Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationHomework 1 Solutions Probability, Maximum Likelihood Estimation (MLE), Bayes Rule, knn
Homework 1 Solutions Probability, Maximum Likelihood Estimation (MLE), Bayes Rule, knn CMU 10-701: Machine Learning (Fall 2016) https://piazza.com/class/is95mzbrvpn63d OUT: September 13th DUE: September
More informationarxiv: v2 [cs.ir] 14 May 2018
A Probabilistic Model for the Cold-Start Problem in Rating Prediction using Click Data ThaiBinh Nguyen 1 and Atsuhiro Takasu 1, 1 Department of Informatics, SOKENDAI (The Graduate University for Advanced
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationRetrieval by Content. Part 2: Text Retrieval Term Frequency and Inverse Document Frequency. Srihari: CSE 626 1
Retrieval by Content Part 2: Text Retrieval Term Frequency and Inverse Document Frequency Srihari: CSE 626 1 Text Retrieval Retrieval of text-based information is referred to as Information Retrieval (IR)
More informationThe supervised hierarchical Dirichlet process
1 The supervised hierarchical Dirichlet process Andrew M. Dai and Amos J. Storkey Abstract We propose the supervised hierarchical Dirichlet process (shdp), a nonparametric generative model for the joint
More informationLanguage Information Processing, Advanced. Topic Models
Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:
More informationTopic Modeling: Beyond Bag-of-Words
University of Cambridge hmw26@cam.ac.uk June 26, 2006 Generative Probabilistic Models of Text Used in text compression, predictive text entry, information retrieval Estimate probability of a word in a
More informationCrouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation
Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation Taesun Moon Katrin Erk and Jason Baldridge Department of Linguistics University of Texas at Austin 1
More informationTerm Filtering with Bounded Error
Term Filtering with Bounded Error Zi Yang, Wei Li, Jie Tang, and Juanzi Li Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University, China {yangzi, tangjie, ljz}@keg.cs.tsinghua.edu.cn
More informationDirichlet Enhanced Latent Semantic Analysis
Dirichlet Enhanced Latent Semantic Analysis Kai Yu Siemens Corporate Technology D-81730 Munich, Germany Kai.Yu@siemens.com Shipeng Yu Institute for Computer Science University of Munich D-80538 Munich,
More informationPachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
: DAG-Structured Mixture Models of Topic Correlations Wei Li and Andrew McCallum University of Massachusetts, Dept. of Computer Science {weili,mccallum}@cs.umass.edu Abstract Latent Dirichlet allocation
More informationGaussian Mixture Model
Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationWeb-Mining Agents Topic Analysis: plsi and LDA. Tanya Braun Ralf Möller Universität zu Lübeck Institut für Informationssysteme
Web-Mining Agents Topic Analysis: plsi and LDA Tanya Braun Ralf Möller Universität zu Lübeck Institut für Informationssysteme Acknowledgments Pilfered from: Ramesh M. Nallapati Machine Learning applied
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 4: Probabilistic Retrieval Models April 29, 2010 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig
More informationApplying hlda to Practical Topic Modeling
Joseph Heng lengerfulluse@gmail.com CIST Lab of BUPT March 17, 2013 Outline 1 HLDA Discussion 2 the nested CRP GEM Distribution Dirichlet Distribution Posterior Inference Outline 1 HLDA Discussion 2 the
More information