Welcome to CAMCOS Reports Day Fall 2011


1 Welcome
Welcome to CAMCOS Reports Day Fall 2011

2 CAMCOS: Text Mining and ...
Damien Adams, Neeti Mittal, Joanna Spencer, Huan Trinh, Annie Vu, Orvin Weng, Rachel Zadok
December 9, 2011

3 Outline

6 What is Text Mining?
Our work deals with topic modeling and detecting emerging topics in documents using text mining. So what exactly is text mining?

7 What is Text Mining?
Text mining is the act of getting a computer to read a document and identify its topics.

11 Why a Computer?
You may ask: why do we want a computer to do this seemingly simple task? It is easy to read a paper that is only a few pages long and identify the topics. We learned how to do this in English class in grade school. But what about dozens, hundreds, or thousands of documents? The goal of text mining is to tackle collections of documents too large for a person to read.

12 The DeLorean Motor Company
In 2013, the DeLorean Motor Company will be producing DeLoreans again.

13 Flux Capacitor
Suppose they need to recall certain DeLoreans due to flux capacitor issues.

18 Without Text Mining
Without text mining, DMC would have to:
- Spend days reading all reports
- Manually identify topics, including the flux capacitor issue
- Recall DeLoreans
This could take days or even weeks!

23 With Text Mining
On the other hand, DMC could use text mining to:
- Spend 10 minutes inserting all reports into the computer
- Read topics found from text mining
- Recall DeLoreans
This could take less than an hour!

27 Topic Modeling
The idea behind text mining is topic modeling. Given a document (a "bag of words"), we wish to identify the topics. In the previous example, the document would be a collection of incident reports, and the topic would be the flux capacitor issues.

28 Topic Modeling
[Diagram labels: Document, Words]

31 What is a Topic?
What exactly is a topic? When we read a paper, the topic is the main idea.

33 Topics
How can a topic be defined?
Definition: A topic is a distribution of words in a document over a predetermined vocabulary.

35 Topic Modeling
What is topic modeling? We talked about it before, but here is a formal definition.
Definition: Topic modeling is the use of methods to automatically assign words in documents to topics.

37 LDA
We focus on topic modeling using LDA, Latent Dirichlet Allocation. LDA (2002) is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael Jordan.

38 LDA
Definition: Latent Dirichlet Allocation (LDA) is a generative process that defines a joint probability distribution over both the observed and hidden random variables. Simply put, LDA uncovers the thematic structure hidden in a document. It generates the main ideas of a set of documents, which we call topics.

39 Latent Dirichlet Allocation
Examining what the words stand for:
- Latent: We observe the words in the documents, but the topics are hidden (latent)
- Dirichlet: LDA uses the Dirichlet distribution (next slide)
- Allocation: We will allocate topics to documents

41 Dirichlet Distribution
The Dirichlet distribution, with parameters $\alpha_1, \ldots, \alpha_K$, is a multivariate distribution of $K$ random variables $x_1, \ldots, x_K$. Its density is
$$\mathrm{Dir}(\alpha_1, \ldots, \alpha_K) \propto \prod_{i=1}^{K} x_i^{\alpha_i - 1}.$$

42 Example of Dirichlet Distribution (from Wikipedia)
Consider an urn containing balls of $K$ different colors. Initially, the urn contains $\alpha_1$ balls of color 1, $\alpha_2$ balls of color 2, and so on. Now perform $N$ draws from the urn, where after each draw, the ball is placed back into the urn with an additional ball of the same color. In the limit as $N$ approaches infinity, the proportions of different colored balls in the urn will be distributed as $\mathrm{Dir}(\alpha_1, \ldots, \alpha_K)$. Jumping ahead, in our case $\alpha_i$ will be the importance of topic $i$ among $K$ topics.
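
The urn scheme can be tried out directly. What follows is a minimal Python sketch and not part of the original slides: the function name `polya_urn_proportions` is hypothetical, and it assumes whole-number initial ball counts. Each run with a large number of draws gives one approximate sample of the color proportions.

```python
import random


def polya_urn_proportions(initial_counts, n_draws, seed=None):
    """Simulate the urn: draw a ball, put it back with one more of the same color.

    initial_counts -- starting number of balls of each color (illustrative input)
    n_draws        -- number of draws N
    Returns the final proportion of each color; the proportions sum to 1.
    """
    rng = random.Random(seed)
    counts = list(initial_counts)
    colors = list(range(len(counts)))
    for _ in range(n_draws):
        # A color is drawn with probability proportional to its current count.
        color = rng.choices(colors, weights=counts, k=1)[0]
        counts[color] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

Repeating the simulation with different seeds and a large `n_draws` gives a feel for how the proportions vary from run to run.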

44 LDA Assumptions
- The topics are Dirichlet distributed over the words
- The documents are Dirichlet distributed over the topics
- Order of the documents does not matter (this is a deficiency, and exactly what we address)
- Order of the words in the documents does not matter
- The number of topics is assumed known and fixed

48 Additional Assumptions
- LDA automatically removes all the common words such as with, the, and, etc.
- It also removes all the numeric values, commas, parentheses, etc.
- It eliminates all the words that are repeated many times in the document
- We only considered the words with 3 or more letters
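
These filtering rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the preprocessing used in the project: the stop-word list, the function name `preprocess`, and the `max_count` threshold are all hypothetical, and only ASCII letters are treated as word characters.

```python
import re

# Illustrative stop-word list; the slides do not give the actual list used.
STOP_WORDS = {"with", "the", "and", "from", "that", "this", "for", "are", "was"}


def preprocess(text, stop_words=STOP_WORDS, min_length=3, max_count=None):
    """Return the tokens that survive the filtering rules described above."""
    # Keep runs of letters only: this drops numbers, commas, parentheses, etc.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop common words and words shorter than min_length letters.
    tokens = [t for t in tokens if len(t) >= min_length and t not in stop_words]
    if max_count is not None:
        # Optionally drop words repeated more than max_count times
        # (hypothetical threshold; the slides only say "many times").
        counts = {}
        for t in tokens:
            counts[t] = counts.get(t, 0) + 1
        tokens = [t for t in tokens if counts[t] <= max_count]
    return tokens
```

The resulting token list is the "bag of words" that a topic model would then work with.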

49 Example
We ran LDA over a document about habitats. We looked for five topics.

50 Output
Topic 1            Topic 2           Topic 3           Topic 4           Topic 5
species      .028  plant       .019  habitat     .038  species     .021  species     .076
environment  .021  botanical   .019  population  .019  additional  .021  particular  .052
effect       .014  physical    .019  species     .019  cycle       .021  analysis    .041
references   .014  zoological  .014  natural     .019  effect      .021  population  .030
ecosystem    .014  habitat     .013  trophic     .019  help        .020  abundance
- Topics are distributed over a list of words
- This list consists of the entire vocabulary, with varying probabilities
- In other words, the sum of the probabilities of all of the words under any given topic is 1
- Under each topic, as the probability of a word decreases, its position drops
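
To make the last points concrete, here is a small Python sketch, offered as an illustration rather than as the project's code. It assumes a topic is given as a mapping from words to non-negative weights (a hypothetical input format) and returns both the normalized distribution and its most probable words.

```python
def top_words(topic_weights, n=5):
    """Normalize one topic's word weights and list its n most probable words."""
    total = sum(topic_weights.values())
    # After normalization the probabilities over the vocabulary sum to 1.
    probs = {word: weight / total for word, weight in topic_weights.items()}
    # Most probable words first, as in the output listing above.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n], probs
```

Only the first few words of each topic are normally displayed, which is why the probabilities shown for a topic add up to much less than 1.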

51 Problems with LDA
The assumption that the order of the documents does not matter prevents us from differentiating between new information and prior knowledge. LDA fails to infer time-sensitive information.

52 Questions?
Q&A?

53 Time for a Break...
We will now take a five-minute break.

55 Our Contribution
Our contribution to this field is the ability to automatically detect emerging topics.

59 Definition
An emerging topic is a topic that is more prominent now than it was before. We implemented a variable that measures that prominence: $\alpha_i$ is the prominence of topic $i$ over, say, the last month's worth of data. Now if $\alpha'_i$ is the prominence of topic $i$ over, say, the last week, then we mathematically define:
Definition: Topic $i$ is an emerging topic if $\alpha'_i / \alpha_i > 1$.

62 Definition
David Blei devised the LDA algorithm. Our team implemented LDA with a new feature: emerging topic detection. Topic $i$ is an emerging topic if $\alpha'_i / \alpha_i > 1$.

64 Another Approach to Defining an Emerging Topic
Ranking is the change in position of importance of a topic.
Alternate Definition: Topic $i$ is an emerging topic if there is a positive change in position.
We did not use this definition, as we would have had to assume that LDA gives topic output in some order.
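
For comparison, the rank-based definition could look like the following Python sketch. It is an illustration only: the function names are hypothetical, topic importances are assumed to be available as two lists (overall and recent), and a "positive change in position" is read as moving closer to the top of the ranking.

```python
def rank_positions(importance):
    """Map each topic index to its position, where 0 is the most important."""
    order = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return {topic: position for position, topic in enumerate(order)}


def emerging_by_rank(overall, recent):
    """Return topics whose position improves in the recent data."""
    before = rank_positions(overall)
    after = rank_positions(recent)
    return [i for i in range(len(overall)) if after[i] < before[i]]
```

Unlike a ratio test, this version says nothing about how much a topic has grown, only that it overtook at least one other topic.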

65 Goal
- We want to use text mining to detect emerging topics relative to two different documents
- We then want to observe only the topics with the greatest relative importance

66 Algorithm

67 Algorithm
- Written in the statistical language R
- Uses the package topicmodels by Grün and Hornik

68 The Input
Our algorithm takes three inputs:
- A document (which we suspect of having an emerging topic)
- An estimated number of topics ($K$)
- The percentage of recent data, e.g.,
  14% = 1/7, if document = week and recent = last day
  23% = 7/30, if document = month and recent = last week
  17% = 2/12, if document = year and recent = last two months

69 How the Algorithm Works
1. Preprocessing: Common words (of, the, is, from, ...), special characters ($, %, ...), and numbers are discarded
2. Using LDA, we discover the $K$ topics in the entire document as well as their importances $\alpha_1, \alpha_2, \ldots, \alpha_K$
3. For each topic $i$, we compute its importance $\alpha'_i$ in the recent part of the document
4. The topics are sorted in decreasing order according to $\alpha'_i / \alpha_i$
5. The topics for which $\alpha'_i / \alpha_i > 1$ are displayed as suspected emerging topics
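
The ranking and thresholding steps can be sketched as follows. This is a Python illustration under explicit assumptions, not the team's R implementation: it assumes a fitted topic model has already assigned one topic index to every word of the document, and it measures a topic's importance as its share of those assignments. The function name `emerging_topics` and its parameters are hypothetical, and the document is assumed to be non-empty.

```python
def emerging_topics(assignments, n_topics, recent_fraction):
    """Rank topics by recent/overall importance and keep those with ratio > 1.

    assignments     -- topic index of each word, in document order
                       (assumed output of an already fitted topic model)
    n_topics        -- number of topics K
    recent_fraction -- share of the document treated as recent, e.g. 7 / 30
    """
    def importances(part):
        # Importance of a topic = share of the words assigned to it (assumption).
        counts = [0] * n_topics
        for topic in part:
            counts[topic] += 1
        return [c / len(part) for c in counts]

    n_recent = max(1, round(len(assignments) * recent_fraction))
    overall = importances(assignments)             # importance in the whole document
    recent = importances(assignments[-n_recent:])  # importance in the recent part
    ratios = [(i, recent[i] / overall[i]) for i in range(n_topics) if overall[i] > 0]
    ratios.sort(key=lambda pair: pair[1], reverse=True)  # decreasing ratio
    return [(i, r) for i, r in ratios if r > 1]          # suspected emerging topics
```

Topics that never occur in the document are skipped so that the ratio is always defined.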

70 The Output
- A list of words grouped by topics
- A plot of relative importance of topics in the document

74 Setup
To test our algorithm, we created a document with an emerging topic in it. We took the Wikipedia entry for Habitat and introduced an emerging topic by appending an article from the EPA on Climate Change. The emerging topic represented about 10% of the size of the entire document.

75 Why an Emerging Topic Should Take Up About 10% of a Document
We will examine three different situations. Consider a company that turns in daily reports. From these reports you want to discover emerging topics.

76 Why an Emerging Topic Should Take Up About 10% of a Document
Consider comparing the reports from today against the last week of reports. If half of the reports from today are about emerging topics, then $\frac{1}{2} \cdot \frac{1}{7} \approx 0.07$. That is, about 7% of the last week's reports are about emerging topics.

77 Why an Emerging Topic Should Take Up About 10% of a Document
Now consider comparing the last week of reports against the past month of reports. If half of the last week of reports are about emerging topics, then $\frac{1}{2} \cdot \frac{7}{30} \approx 0.12$. That is, about 12% of the past month's reports are about emerging topics.

78 Why an Emerging Topic Should Take Up About 10% of a Document
Now consider comparing the last two months' worth of reports against the past year's worth of reports. If half of the last two months' worth of reports are about emerging topics, then $\frac{1}{2} \cdot \frac{2}{12} \approx 0.08$. That is, about 8% of the past year's worth of reports are about emerging topics.

79 Why an Emerging Topic Should Take Up About 10% of a Document
So now let's recap what we have: about 0.07, 0.12, and 0.08. As you can see, these values are pretty close. In fact, their average is about 0.09. Thus, 10%, give or take, is a good estimate.
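
The arithmetic behind these three estimates can be checked with a short Python sketch. The fractions used here are a reconstruction, since the equations did not survive on the slides: each case multiplies one half by the share of the document that counts as recent, and the yearly case is read as "the last two months", which matches both the 2/12 example given for the algorithm's input and the stated 8%.

```python
# (length of the recent window, length of the whole document), same units.
# These pairs are assumptions reconstructed from the percentages on the slides.
scenarios = {
    "day vs. week": (1, 7),
    "week vs. month": (7, 30),
    "two months vs. year": (2, 12),
}


def emerging_share(recent, total, emerging_fraction=0.5):
    """Share of the whole document made up of emerging-topic material."""
    return emerging_fraction * recent / total


shares = {name: emerging_share(r, t) for name, (r, t) in scenarios.items()}
average = sum(shares.values()) / len(shares)
```

Rounded to two decimals, the three shares come out at 0.07, 0.12, and 0.08, and their average at 0.09.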

80 Results
We were looking for 10 topics. Here are our results.

81 Relative Importance of Topics
[Figure: "Emerging Topics from Changes in Relative Importance"; change in relative importance $\alpha'_i / \alpha_i$ plotted against topic number]

82 Emerging Topics
And here are the program's suggested emerging topics. In particular, the topic
Greenhouse Atmosphere Breeding Optional Relative
which is clearly attributable to climate change, is correctly discovered and identified.

83 Example 2
Our next example is based on the character merchandise sales reports from Star Wars and Disney.

84 Setup
We ran our program over these reports. We looked for 10 topics.

85 Results
Here are our results. Clearly, there are topics from both the Star Wars and the Disney reports.

86 Relative Importance of Topics
[Figure: relative importance $\alpha'_i / \alpha_i$ of each topic]

87 Emerging Topics
Since our potential emerging topics are from both sales reports, we don't know which topic is our emerging topic.

88 How Can We Determine the Emerging Topics?
In order to find the emerging topics, we took the first 90% of the sales reports and ran LDA over it.

89 90% Results
Here are the topics from the first 90% of the reports. As you can see, all of the topics are Star Wars topics. Now we compare these topics against the potential emerging topics from before, and we can discard the Star Wars topics as emerging topics.

90 Emerging Topics
Here are the program's suggested emerging topics. Since we have discarded the Star Wars topics as emerging topics, Topics 10, 6, and 3 are the emerging topics. That is, the Disney topics are our emerging topics.
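
The comparison against the first 90% of the reports was done by inspection. One way it might be automated is sketched below in Python. This is purely an assumption for illustration: topics are represented by lists of their top words, two topics are treated as the same when their word sets overlap strongly (Jaccard similarity), and the names `jaccard`, `new_topics`, and the 0.5 cutoff are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two word collections: size of intersection over union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def new_topics(candidates, baseline, threshold=0.5):
    """Return ids of candidate topics that match no baseline topic.

    candidates -- {topic id: top words} for the suspected emerging topics
    baseline   -- list of top-word lists found in the first 90% of the reports
    threshold  -- hypothetical similarity cutoff for "the same topic"
    """
    return [
        topic_id
        for topic_id, words in candidates.items()
        if all(jaccard(words, old) < threshold for old in baseline)
    ]
```

Candidates that closely resemble an older topic are discarded, leaving only topics that appear for the first time in the most recent reports.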

92 Conclusions
- We have developed an algorithm that automatically detects emerging topics
- It performs well in our experiments
- Our original purpose was to find emerging topics in NASA air traffic control incident reports. We are in the process of examining NASA data.

93 Conclusions
- Future work: Gain a better understanding of the relationship between emerging and old topics (i.e., what is the mathematical meaning of the value of $\alpha'_i / \alpha_i$?)
- We have made our software (in R) and test data publicly available

94 Acknowledgments and References
We would like to thank:
- All of you for coming
- David Blei for his LDA and DTM implementations and paper Introduction to Probabilistic Topic Models
- Bettina Grün and Kurt Hornik for their paper topicmodels: An R Package for Fitting Topic Models and their R package and script

95 Additional Thanks
We would also like to thank:
- Our sponsor NASA
- CAMCOS
- Professor Hsu
- Dr. Ginger Koev
- Professor Koev for supervising our team
We would like to extend our gratitude to our friends and families for their support.

96 Questions?
Q&A?

97 Thanks!
Thank You For Coming To CAMCOS Reports Day Fall 2011

98 Directions to Lunch
Please join us for lunch at Flames!
[Map labels: 4th St., Flames, San Fernando, King Library, SJSU Campus, P, San Salvador, Student Union]


More information

Name Class Date. You can use the properties of equality to solve equations. Subtraction is the inverse of addition.

Name Class Date. You can use the properties of equality to solve equations. Subtraction is the inverse of addition. 2-1 Reteaching Solving One-Step Equations You can use the properties of equality to solve equations. Subtraction is the inverse of addition. What is the solution of + 5 =? In the equation, + 5 =, 5 is

More information

Machine Learning

Machine Learning Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University April 5, 2011 Today: Latent Dirichlet Allocation topic models Social network analysis based on latent probabilistic

More information

Analyzing Lines of Fit

Analyzing Lines of Fit 4.5 Analyzing Lines of Fit Essential Question How can you analytically find a line of best fit for a scatter plot? Finding a Line of Best Fit Work with a partner. The scatter plot shows the median ages

More information

Understanding and Using Variables

Understanding and Using Variables Algebra is a powerful tool for understanding the world. You can represent ideas and relationships using symbols, tables and graphs. In this section you will learn about Understanding and Using Variables

More information

Physics Fundamentals of Astronomy

Physics Fundamentals of Astronomy Physics 1303.010 Fundamentals of Astronomy Course Information Meeting Place & Time ASU Planetarium (VIN P-02) TR 09:30-10:45 AM Spring 2018 Instructor Dr. Kenneth Carrell Office: VIN 119 Phone: (325) 942-2136

More information

EE595A Submodular functions, their optimization and applications Spring 2011

EE595A Submodular functions, their optimization and applications Spring 2011 EE595A Submodular functions, their optimization and applications Spring 2011 Prof. Jeff Bilmes University of Washington, Seattle Department of Electrical Engineering Winter Quarter, 2011 http://ee.washington.edu/class/235/2011wtr/index.html

More information

Mathematics I Resources for EOC Remediation

Mathematics I Resources for EOC Remediation Mathematics I Resources for EOC Remediation CED Creating Equations Cluster: HSA CED.A.1 HSA CED.A.2 HSA CED.A.3 HSA CED.A.4 The information in this document is intended to demonstrate the depth and rigor

More information

Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models.

Hidden Markov Models, I. Examples. Steven R. Dunbar. Toy Models. Standard Mathematical Models. Realistic Hidden Markov Models. , I. Toy Markov, I. February 17, 2017 1 / 39 Outline, I. Toy Markov 1 Toy 2 3 Markov 2 / 39 , I. Toy Markov A good stack of examples, as large as possible, is indispensable for a thorough understanding

More information

(Refer Slide Time: 00:10)

(Refer Slide Time: 00:10) Chemical Reaction Engineering 1 (Homogeneous Reactors) Professor R. Krishnaiah Department of Chemical Engineering Indian Institute of Technology Madras Lecture No 10 Design of Batch Reactors Part 1 (Refer

More information

COS 424: Interacting with Data. Lecturer: Dave Blei Lecture #11 Scribe: Andrew Ferguson March 13, 2007

COS 424: Interacting with Data. Lecturer: Dave Blei Lecture #11 Scribe: Andrew Ferguson March 13, 2007 COS 424: Interacting with ata Lecturer: ave Blei Lecture #11 Scribe: Andrew Ferguson March 13, 2007 1 Graphical Models Wrap-up We began the lecture with some final words on graphical models. Choosing a

More information

GLAD: Group Anomaly Detection in Social Media Analysis

GLAD: Group Anomaly Detection in Social Media Analysis GLAD: Group Anomaly Detection in Social Media Analysis Poster #: 1150 Rose Yu, Xinran He and Yan Liu University of Southern California Group Anomaly Detection Anomalous phenomenon in social media data

More information

Bayesian Nonparametrics for Speech and Signal Processing

Bayesian Nonparametrics for Speech and Signal Processing Bayesian Nonparametrics for Speech and Signal Processing Michael I. Jordan University of California, Berkeley June 28, 2011 Acknowledgments: Emily Fox, Erik Sudderth, Yee Whye Teh, and Romain Thibaux Computer

More information

Background: Comment [1]: Comment [2]: Comment [3]: Comment [4]: mass spectrometry

Background: Comment [1]: Comment [2]: Comment [3]: Comment [4]: mass spectrometry Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Montana s Rocky Mountains. As you look up, you see what

More information

GCSE style questions arranged by topic

GCSE style questions arranged by topic Write your name here Surname Other names In the style of: Pearson Edexcel GCSE Centre Number Candidate Number Mathematics Histograms GCSE style questions arranged by topic Higher Tier Paper Reference 1MA0/1H

More information

Johns Hopkins Math Tournament Proof Round: Automata

Johns Hopkins Math Tournament Proof Round: Automata Johns Hopkins Math Tournament 2018 Proof Round: Automata February 9, 2019 Problem Points Score 1 10 2 5 3 10 4 20 5 20 6 15 7 20 Total 100 Instructions The exam is worth 100 points; each part s point value

More information

Algebra I. Systems of Linear Equations and Inequalities. Slide 1 / 179. Slide 2 / 179. Slide 3 / 179. Table of Contents

Algebra I. Systems of Linear Equations and Inequalities. Slide 1 / 179. Slide 2 / 179. Slide 3 / 179. Table of Contents Slide 1 / 179 Algebra I Slide 2 / 179 Systems of Linear Equations and Inequalities 2015-04-23 www.njctl.org Table of Contents Slide 3 / 179 Click on the topic to go to that section 8th Grade Review of

More information

Document and Topic Models: plsa and LDA

Document and Topic Models: plsa and LDA Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis

More information

GRE Workshop Quantitative Reasoning. February 13 and 20, 2018

GRE Workshop Quantitative Reasoning. February 13 and 20, 2018 GRE Workshop Quantitative Reasoning February 13 and 20, 2018 Overview Welcome and introduction Tonight: arithmetic and algebra 6-7:15 arithmetic 7:15 break 7:30-8:45 algebra Time permitting, we ll start

More information

Simulating the Solar System

Simulating the Solar System Simulating the Solar System Classroom Activity Simulating the Solar System Objectives The primary objective of this activity is to increase the students understanding of the appearance and movements of

More information

Released Assessment Questions, 20 16

Released Assessment Questions, 20 16 Released Assessment Questions, 20 16 Grade 9 Assessment of Mathematics, Academic For Use with Assistive Technology: Listen as your teacher reads the instructions. Some key points are listed below. Make

More information

Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter

Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Daichi Koike Yusuke Takahashi Takehito Utsuro Grad. Sch. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573,

More information

GIS Institute Center for Geographic Analysis

GIS Institute Center for Geographic Analysis GIS Institute Center for Geographic Analysis Welcome Intensive training in the application of GIS to research Collection, management, analysis, and communication of spatial data Topics include: data collection,

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Generative Clustering, Topic Modeling, & Bayesian Inference

Generative Clustering, Topic Modeling, & Bayesian Inference Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week

More information

Lesson 2: Introduction to Variables

Lesson 2: Introduction to Variables In this lesson we begin our study of algebra by introducing the concept of a variable as an unknown or varying quantity in an algebraic expression. We then take a closer look at algebraic expressions to

More information

Agile Mind Mathematics 6 Scope and Sequence, Common Core State Standards for Mathematics

Agile Mind Mathematics 6 Scope and Sequence, Common Core State Standards for Mathematics In the three years preceding Grade 6, students have acquired a strong foundation in numbers and operations, geometry, measurement, and data. They are fluent in multiplication of multi- digit whole numbers

More information

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).

Boolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval

More information

INSTRUCTIONAL PLANNING GUIDE FOR CHARACTERISTICS OF THE EARTH, MOON, AND SUN

INSTRUCTIONAL PLANNING GUIDE FOR CHARACTERISTICS OF THE EARTH, MOON, AND SUN INSTRUCTIONAL PLANNING GUIDE FOR CHARACTERISTICS OF THE EARTH, MOON, AND SUN TEKS: 5.8D Earth and space. The student knows that there are recognizable patterns in the natural world and among the Sun, Earth,

More information

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) Latent Dirichlet Allocation (LDA) D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Following slides borrowed ant then heavily modified from: Jonathan Huang

More information

Math 440 Project Assignment

Math 440 Project Assignment Math 440 Project Assignment 1. Overview The goal of your project assignment is to explore an aspect of topology beyond the topics covered in class. It will be necessary to use the tools and properties

More information

SYLLABUS FORM WESTCHESTER COMMUNITY COLLEGE Valhalla, NY lo CURRENT DATE: Please indicate whether this is a NEW COURSE or a REVISION:

SYLLABUS FORM WESTCHESTER COMMUNITY COLLEGE Valhalla, NY lo CURRENT DATE: Please indicate whether this is a NEW COURSE or a REVISION: SYLLABUS FORM WESTCHESTER COMMUNITY COLLEGE Valhalla, NY lo595 l. Course #: 2. NAME OF ORIGINATOR /REVISOR: PHYSC 143 Laurel Senft, Rob Applebaum, Eryn Klosko NAME OF COURSE Earth Science 3. CURRENT DATE:

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Dan Oneaţă 1 Introduction Probabilistic Latent Semantic Analysis (plsa) is a technique from the category of topic models. Its main goal is to model cooccurrence information

More information

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham

Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham Last name (family name): First name (given name): Student ID

More information

Moon. Grade Level: 1-3. pages 1 2 pages 3 4 pages 5 page 6 page 7 page 8 9

Moon. Grade Level: 1-3. pages 1 2 pages 3 4 pages 5 page 6 page 7 page 8 9 Moon Grade Level: 1-3 Teacher Guidelines Instructional Pages Activity Page Practice Page Homework Page Answer Key pages 1 2 pages 3 4 pages 5 page 6 page 7 page 8 9 Classroom Procedure: Approximate Grade

More information

Morpheus: Neo, sooner or later you re going to realize, just as I did, that there s a difference between knowing the path, and walking the path.

Morpheus: Neo, sooner or later you re going to realize, just as I did, that there s a difference between knowing the path, and walking the path. Morpheus: Neo, sooner or later you re going to realize, just as I did, that there s a difference between knowing the path, and walking the path. Why CS 53? Making linear algebra more concrete. Making it

More information

Paterson Public Schools

Paterson Public Schools A. Concepts About Print Understand how print is organized and read. (See LAL Curriculum Framework Grade KR Page 1 of 12) Understand that all print materials in English follow similar patterns. (See LAL

More information

Estadística I Exercises Chapter 4 Academic year 2015/16

Estadística I Exercises Chapter 4 Academic year 2015/16 Estadística I Exercises Chapter 4 Academic year 2015/16 1. An urn contains 15 balls numbered from 2 to 16. One ball is drawn at random and its number is reported. (a) Define the following events by listing

More information

Topic Modeling Using Latent Dirichlet Allocation (LDA)

Topic Modeling Using Latent Dirichlet Allocation (LDA) Topic Modeling Using Latent Dirichlet Allocation (LDA) Porter Jenkins and Mimi Brinberg Penn State University prj3@psu.edu mjb6504@psu.edu October 23, 2017 Porter Jenkins and Mimi Brinberg (PSU) LDA October

More information

Estimating Latent Variable Graphical Models with Moments and Likelihoods

Estimating Latent Variable Graphical Models with Moments and Likelihoods Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods

More information

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet

More information

Calculator Review. Ti-30xs Multiview Calculator. Name: Date: Session:

Calculator Review. Ti-30xs Multiview Calculator. Name: Date: Session: Calculator Review Ti-30xs Multiview Calculator Name: Date: Session: GED Express Review Calculator Review Page 2 Topics What type of Calculator do I use?... 3 What is with the non-calculator questions?...

More information

PBL: Colonial Life. Create a Brochure Attracting People to Come to your Region

PBL: Colonial Life. Create a Brochure Attracting People to Come to your Region PBL: Colonial Life Create a Brochure Attracting People to Come to your Region Project Idea: Brochure Working in partners, students will create a brochure attracting people to one of the three regions:

More information

Lecture 3a: Dirichlet processes

Lecture 3a: Dirichlet processes Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics

More information

Solving and Graphing a Linear Inequality of a Single Variable

Solving and Graphing a Linear Inequality of a Single Variable Chapter 3 Graphing Fundamentals Section 3.1 Solving and Graphing a Linear Inequality of a Single Variable TERMINOLOGY 3.1 Previously Used: Isolate a Variable Simplifying Expressions Prerequisite Terms:

More information

(Sessions I and II)* BROWARD COUNTY ELEMENTARY SCIENCE BENCHMARK PLAN FOR PERSONAL USE

(Sessions I and II)* BROWARD COUNTY ELEMENTARY SCIENCE BENCHMARK PLAN FOR PERSONAL USE activities 19&20 What Do Plants Need? (Sessions I and II)* BROWARD COUNTY ELEMENTARY SCIENCE BENCHMARK PLAN Grade 1 Quarter 2 Activities 19 & 20 SC.A.1.1.1 The student knows that objects can be described,

More information

Latent Dirichlet Allocation Based Multi-Document Summarization

Latent Dirichlet Allocation Based Multi-Document Summarization Latent Dirichlet Allocation Based Multi-Document Summarization Rachit Arora Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India. rachitar@cse.iitm.ernet.in

More information

RHS Libraries. A guide for researchers. RHS Libraries. rhs.org.uk/libraries

RHS Libraries. A guide for researchers. RHS Libraries. rhs.org.uk/libraries A guide for researchers Lindley Library 020 7821 3050 library.london@rhs.org.uk Wisley Library 01483 212428 library.wisley@rhs.org.uk Harlow Carr Library 01423 724 686 library.harlowcarr@rhs.org.uk rhs.org.uk/libraries

More information