Welcome to CAMCOS Reports Day Fall 2011
- Lambert Randall Webb
Transcription
1 Welcome: Welcome to CAMCOS Reports Day Fall 2011
2 CAMCOS: Text Mining and Damien Adams, Neeti Mittal, Joanna Spencer, Huan Trinh, Annie Vu, Orvin Weng, Rachel Zadok December 9, 2011
6 What is Text Mining? Our work deals with Topic Modeling and Detecting Emerging Topics in documents using Text Mining. So what exactly is text mining?
7 What is Text Mining? Text mining is the act of getting a computer to read a document and identify its topics.
11 Why a Computer? You may ask, why do we want to get a computer to do this seemingly simple task? It is easy to read a paper that is only a few pages long and identify the topics. We learned how to do this in English class in grade school. But what about dozens, hundreds, or thousands of documents? The goal of text mining is to tackle document collections too large for a person to read.
12 The DeLorean Motor Company: In 2013, the DeLorean Motor Company will be producing DeLoreans again.
13 Flux Capacitor: Suppose they need to recall certain DeLoreans due to flux capacitor issues.
18 Without Text Mining: Without text mining, DMC would have to (1) spend days reading all reports, (2) manually identify topics, including the flux capacitor issue, and (3) recall DeLoreans. This could take days or even weeks!
23 With Text Mining: On the other hand, DMC could use text mining to (1) spend 10 minutes inserting all reports into the computer, (2) read the topics found by text mining, and (3) recall DeLoreans. This could take less than an hour!
27 Topic Modeling: The idea behind text mining is topic modeling. Given a document (a "bag of words"), we wish to identify the topics. In the previous example, the document would be a collection of incident reports, and the topic would be the flux capacitor issues.
28 Topic Modeling: [Diagram relating Document and Words]
31 What is a Topic? What exactly is a topic? When we read a paper, the topic is the main idea.
33 How can a topic be defined? Definition: A topic is a distribution of words in a document over a predetermined vocabulary.
35 Topic Modeling: What is topic modeling? We talked about it before, but here is a formal definition. Definition: Topic modeling is the use of methods to automatically assign words in documents to topics.
37 We focus on topic modeling using LDA, Latent Dirichlet Allocation. LDA (2002) is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael Jordan.
38 Definition: Latent Dirichlet Allocation (LDA) is a generative process that defines a joint probability distribution over both the observed and hidden random variables. Simply put, LDA uncovers the thematic structure hidden in a document. It generates the main ideas of a set of documents, which we call topics.
39 Latent Dirichlet Allocation: Examining what the words stand for. Latent: we observe the words in the documents, but the topics are hidden (latent). Dirichlet: LDA uses the Dirichlet distribution (next slide). Allocation: we will allocate topics to documents.
41 Dirichlet Distribution: The Dirichlet distribution, with parameters $\alpha_1, \dots, \alpha_K$, is a multivariate distribution of $K$ random variables, $x_1, \dots, x_K$. Its density is $\mathrm{Dir}(\alpha_1, \dots, \alpha_K) \propto \prod_{i=1}^{K} x_i^{\alpha_i - 1}$.
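The density on this slide is stated only up to a constant. For reference, the standard normalized form is given below; it is background material rather than something taken from the slides, and $\Gamma$ denotes the gamma function.

```latex
f(x_1,\dots,x_K;\alpha_1,\dots,\alpha_K)
  = \frac{\Gamma\!\left(\sum_{i=1}^{K}\alpha_i\right)}{\prod_{i=1}^{K}\Gamma(\alpha_i)}
    \prod_{i=1}^{K} x_i^{\alpha_i-1},
\qquad x_i \ge 0,\quad \sum_{i=1}^{K} x_i = 1 .
```

The constraint on the right is what makes each draw a vector of proportions.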
42 Example of Dirichlet Distribution (from Wikipedia): Consider an urn containing balls of $K$ different colors. Initially, the urn contains $\alpha_1$ balls of color 1, $\alpha_2$ balls of color 2, and so on. Now perform $N$ draws from the urn, where after each draw, the ball is placed back into the urn with an additional ball of the same color. In the limit as $N$ approaches infinity, the proportions of different colored balls in the urn will be distributed as $\mathrm{Dir}(\alpha_1, \dots, \alpha_K)$. Jumping ahead, in our case $\alpha_i$ will be the importance of topic $i$ among $K$ topics.
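The urn scheme on this slide can be simulated directly. The following is a minimal Python sketch (the team's own software is in R; this is only an illustration). The function name `polya_urn_proportions`, the starting counts, and the number of draws are all hypothetical choices.

```python
import random

def polya_urn_proportions(alpha, draws, seed=None):
    """Simulate the urn: draw a ball, put it back with one extra ball of that color."""
    rng = random.Random(seed)
    counts = [float(a) for a in alpha]      # initial number of balls per color
    colors = list(range(len(counts)))
    for _ in range(draws):
        # Pick a color with probability proportional to its current count.
        color = rng.choices(colors, weights=counts)[0]
        counts[color] += 1.0
    total = sum(counts)
    return [c / total for c in counts]

# One run with three colors and a finite number of draws.
proportions = polya_urn_proportions([1, 2, 3], draws=5000, seed=0)
```

Each run gives one vector of proportions. Repeating the run with different seeds gives a spread of such vectors, which for a large number of draws approximates the Dirichlet distribution with the starting counts as parameters.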
44 Assumptions: (1) The topics are Dirichlet distributed over the words. (2) The documents are Dirichlet distributed over the topics. (3) Order of the documents does not matter (this is a deficiency, exactly what we address). (4) Order of the words in the documents does not matter. (5) The number of topics is assumed known and fixed.
48 Additional Assumptions: LDA automatically removes all the common words such as "with", "the", "and", etc. It also removes all the numeric values, commas, parentheses, etc. It eliminates all the words that are repeated many times in the document. We only considered words with 3 or more letters.
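The filtering described on this slide can be sketched in a few lines of Python. This is an illustration only, not the preprocessing from the team's R code. The stop-word list is a hypothetical short sample, and the rule about words repeated many times is left out because the slide gives no threshold for it.

```python
import re

# Hypothetical stop-word sample for illustration; a real run would use a fuller list.
STOP_WORDS = {"with", "the", "and", "for", "from", "that", "this", "are", "was"}

def preprocess(text, min_length=3, stop_words=STOP_WORDS):
    """Lowercase the text, keep alphabetic tokens, drop stop words and short words."""
    # Keeping only runs of letters discards numbers, commas, parentheses, etc.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_length and t not in stop_words]

words = preprocess("The habitat of 3 species (and the ecosystem) is at risk, with 12% lost.")
```

Here `words` keeps only "habitat", "species", "ecosystem", "risk", and "lost".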
49 Example: We ran LDA over a document about habitats. We looked for five topics.
50 Output:
Topic 1             Topic 2             Topic 3             Topic 4             Topic 5
species      .028   plant        .019   habitat      .038   species      .021   species      .076
environment  .021   botanical    .019   population   .019   additional   .021   particular   .052
effect       .014   physical     .019   species      .019   cycle        .021   analysis     .041
references   .014   zoological   .014   natural      .019   effect       .021   population   .030
ecosystem    .014   habitat      .013   trophic      .019   help         .020   abundance
Topics are distributed over a list of words. This list consists of the entire vocabulary with varying probabilities. In other words, the sum of the probabilities of all of the words under any given topic is 1. Under each topic, as the probability of each word decreases, its position drops.
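To make the last two points concrete, here is a small Python sketch that treats a topic as a probability distribution over a vocabulary and lists its words in decreasing order of probability. The six-word vocabulary and its probabilities are hypothetical toy values, not output from the talk.

```python
def top_words(topic, n=5):
    """Return the n most probable (word, probability) pairs, highest first."""
    return sorted(topic.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical toy topic over a six-word vocabulary.
toy_topic = {"species": 0.40, "habitat": 0.25, "population": 0.15,
             "plant": 0.10, "cycle": 0.06, "help": 0.04}

# The probabilities over the whole vocabulary sum to 1 (up to rounding).
total_probability = sum(toy_topic.values())

ranked = top_words(toy_topic, n=3)
```

A printed topic, as on this slide, is just the first few rows of such a ranking, which is why the listed probabilities do not add up to 1 on their own.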
51 The assumption that the order of the documents does not matter prevents us from differentiating between new information and prior knowledge. LDA fails to infer time-sensitive information.
52 Questions? Q&A
53 Time for a Break: We will now take a five-minute break.
55 Contribution: Our contribution to this field is the ability to automatically detect emerging topics.
59 Definition: An emerging topic is a topic that is more prominent now than it was before. We implemented a variable that measures that prominence. $\alpha_i$ is the prominence of topic $i$ over, say, the last month's worth of data. Now if $\alpha'_i$ is the prominence of topic $i$ over, say, the last week, then we mathematically define: topic $i$ is an emerging topic if $\alpha'_i / \alpha_i > 1$.
62 Definition: David Blei devised the LDA algorithm. Our team implemented LDA with a new feature of emerging topic detection. Topic $i$ is an emerging topic if $\alpha'_i / \alpha_i > 1$.
64 Another Approach to Defining an Emerging Topic: Ranking is the change in position of importance of a topic. Alternate definition: topic $i$ is an emerging topic if there is a positive change in position. We did not use this definition, as we would have had to assume that LDA gives topic output in some order.
65 Goal: We want to use text mining to detect emerging topics relative to two different documents. We then want to observe only the topics with the greatest relative importance.
67 Algorithm: Written in the statistical language R. Uses the package topicmodels by Grün and Hornik.
68 The Input: Our algorithm takes three inputs.
1. A document (which we suspect of having an emerging topic)
2. An estimated number of topics ($K$)
3. The percentage of recent data, e.g., 14% = 1/7 if document = week and recent = last day; 23% = 7/30 if document = month and recent = last week; 17% = 2/12 if document = year and recent = last two months
69 How the Algorithm Works:
1. Preprocessing: common words (of, the, is, from, ...), special characters ($, %, ...), and numbers are discarded.
2. Using LDA, we discover the $K$ topics in the entire document as well as their importances $\alpha_1, \alpha_2, \dots, \alpha_K$.
3. For each topic $i$, we compute its importance $\alpha'_i$ in the recent part of the document.
4. The topics are sorted in decreasing order according to $\alpha'_i / \alpha_i$.
5. The topics for which $\alpha'_i / \alpha_i > 1$ are displayed as suspected emerging topics.
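Steps 4 and 5 amount to a ratio, a sort, and a threshold. The Python sketch below illustrates only that part and assumes the topic importances from steps 2 and 3 are already available. It is not the team's R implementation; the function name `rank_emerging_topics` and the importance values are hypothetical.

```python
def rank_emerging_topics(overall, recent):
    """Return (topic, ratio) pairs with ratio > 1, sorted by decreasing ratio."""
    # Ratio of importance in the recent part to importance in the whole document.
    ratios = {topic: recent[topic] / overall[topic] for topic in overall}
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return [(topic, ratio) for topic, ratio in ranked if ratio > 1]

# Hypothetical importances for K = 4 topics (not results from the talk).
overall = {1: 0.40, 2: 0.30, 3: 0.20, 4: 0.10}   # whole document
recent = {1: 0.20, 2: 0.30, 3: 0.15, 4: 0.35}    # recent part only

suspected = rank_emerging_topics(overall, recent)
```

In this toy input only topic 4 is flagged. Topic 2 has a ratio of exactly 1, so it is no more prominent recently than overall and is not reported.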
70 The Output: A list of words grouped by topics. A plot of relative importance of topics in the document.
74 Setup: To test our algorithm, we created a document with an emerging topic in it. We took the Wikipedia entry for Habitat and introduced an emerging topic by appending an article from the EPA on Climate Change. The emerging topic represented about 10% of the size of the entire document.
75 Why an Emerging Topic Should Take Up About 10% of a Document: We will examine three different situations. Consider a company that turns in daily reports. From these reports you want to discover emerging topics.
76 Why an Emerging Topic Should Take Up About 10% of a Document: Consider comparing the reports from today against the last week of reports. If half of the reports from today are about emerging topics, then about 7% of the last week's reports are about emerging topics.
77 Why an Emerging Topic Should Take Up About 10% of a Document: Now consider comparing the last week of reports against the past month of reports. If half of the last week of reports are about emerging topics, then about 12% of the past month's reports are about emerging topics.
78 Why an Emerging Topic Should Take Up About 10% of a Document: Now consider comparing the last month's worth of reports against the past year's worth of reports. If half of the last month's worth of reports are about emerging topics, then about 8% of the past year's worth of reports are about emerging topics.
79 Why an Emerging Topic Should Take Up About 10% of a Document: So now let's recap what we have: about 7%, about 12%, and about 8%. As you can see, these values are pretty close. In fact, their average is about 9%. Thus, 10%, give or take, is a good estimate.
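The three percentages can be checked with a short calculation: the share of the whole document that is about emerging topics is the recent fraction times the emerging share within the recent part (one half here). The Python sketch below uses the recent fractions from the input slide (1/7, 7/30, 2/12). Using 2/12 for the yearly case is an assumption made because it reproduces the quoted 8%; the slide's own formulas did not survive in this transcript.

```python
def emerging_share(recent_fraction, emerging_within_recent=0.5):
    """Share of the whole document that is about emerging topics."""
    return recent_fraction * emerging_within_recent

shares = {
    "day within week": emerging_share(1 / 7),
    "week within month": emerging_share(7 / 30),
    # Assumption: recent window of two months out of twelve (see lead-in).
    "two months within year": emerging_share(2 / 12),
}

average = sum(shares.values()) / len(shares)
percentages = {name: round(100 * value) for name, value in shares.items()}
```

Rounded to whole percentages, the three cases come out to 7, 12, and 8, with an average of about 9.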
80 Results: We were looking for 10 topics. Here are our results.
81 Relative Importance of Topics: [Plot: changes in relative importance $\alpha'_i / \alpha_i$ by topic number]
82 And here are the program's suggested emerging topics. In particular, the topic (Greenhouse, Atmosphere, Breeding, Optional, Relative), which is clearly attributable to climate change, is correctly discovered and identified.
83 Example 2: Our next example is based on the character merchandise sales reports from Star Wars and Disney.
84 Setup: We ran our program over these reports. We looked for 10 topics.
85 Results: Here are our results. Clearly, there are topics from both the Star Wars and the Disney reports.
86 Relative Importance of Topics: [Plot: relative importance $\alpha'_i / \alpha_i$ by topic]
87 Since our potential emerging topics are from both sales reports, we don't know which topic is our emerging topic.
88 How Can We Determine the Emerging Topics? In order to find the emerging topics, we took the first 90% of the sales reports and ran LDA over it.
89 90% Results: Here are the topics from the first 90% of the reports. As you can see, all of the topics are Star Wars topics. Now we compare these topics against the potential emerging topics from before, and we can discard the Star Wars topics as emerging topics.
90 Here are the program's suggested emerging topics. Since we have discarded Star Wars as emerging topics, Topics 10, 6, and 3 are the emerging topics. That is, the Disney topics are our emerging topics.
92 We have developed an algorithm that automatically detects emerging topics. It performs well in our experiments. Our original purpose was to find emerging topics in NASA air traffic control incident reports. We are in the process of examining NASA data.
93 Future work: Gain a better understanding of the relationship between emerging and old topics (i.e., what is the mathematical meaning of the value of $\alpha'_i / \alpha_i$?). We have made our software (in R) and test data publicly available at
94 Acknowledgments and References: We would like to thank: all of you for coming; David Blei for his LDA and DTM implementations and his paper Introduction to Probabilistic Topic Models; Bettina Grün and Kurt Hornik for their paper topicmodels: An R Package for Fitting Topic Models and their R package and script.
95 Additional Thanks: We would also like to thank our sponsor NASA, CAMCOS, Professor Hsu, Dr. Ginger Koev, and Professor Koev for supervising our team. We would like to extend our gratitude to our friends and families for their support.
96 Questions? Q&A
97 Thanks! Thank You For Coming To CAMCOS Reports Day Fall 2011
98 Directions to Lunch: Please join us for lunch at Flames! [Map labels: 4th St., Flames, San Fernando, King Library, SJSU Campus, P, San Salvador, Student Union]
Background: Imagine it is time for your lunch break, you take your sandwich outside and you sit down to enjoy your lunch with a beautiful view of Montana s Rocky Mountains. As you look up, you see what
More informationGCSE style questions arranged by topic
Write your name here Surname Other names In the style of: Pearson Edexcel GCSE Centre Number Candidate Number Mathematics Histograms GCSE style questions arranged by topic Higher Tier Paper Reference 1MA0/1H
More informationJohns Hopkins Math Tournament Proof Round: Automata
Johns Hopkins Math Tournament 2018 Proof Round: Automata February 9, 2019 Problem Points Score 1 10 2 5 3 10 4 20 5 20 6 15 7 20 Total 100 Instructions The exam is worth 100 points; each part s point value
More informationAlgebra I. Systems of Linear Equations and Inequalities. Slide 1 / 179. Slide 2 / 179. Slide 3 / 179. Table of Contents
Slide 1 / 179 Algebra I Slide 2 / 179 Systems of Linear Equations and Inequalities 2015-04-23 www.njctl.org Table of Contents Slide 3 / 179 Click on the topic to go to that section 8th Grade Review of
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationGRE Workshop Quantitative Reasoning. February 13 and 20, 2018
GRE Workshop Quantitative Reasoning February 13 and 20, 2018 Overview Welcome and introduction Tonight: arithmetic and algebra 6-7:15 arithmetic 7:15 break 7:30-8:45 algebra Time permitting, we ll start
More informationSimulating the Solar System
Simulating the Solar System Classroom Activity Simulating the Solar System Objectives The primary objective of this activity is to increase the students understanding of the appearance and movements of
More informationReleased Assessment Questions, 20 16
Released Assessment Questions, 20 16 Grade 9 Assessment of Mathematics, Academic For Use with Assistive Technology: Listen as your teacher reads the instructions. Some key points are listed below. Make
More informationTime Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter
Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter Daichi Koike Yusuke Takahashi Takehito Utsuro Grad. Sch. Sys. & Inf. Eng., University of Tsukuba, Tsukuba, 305-8573,
More informationGIS Institute Center for Geographic Analysis
GIS Institute Center for Geographic Analysis Welcome Intensive training in the application of GIS to research Collection, management, analysis, and communication of spatial data Topics include: data collection,
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationLesson 2: Introduction to Variables
In this lesson we begin our study of algebra by introducing the concept of a variable as an unknown or varying quantity in an algebraic expression. We then take a closer look at algebraic expressions to
More informationAgile Mind Mathematics 6 Scope and Sequence, Common Core State Standards for Mathematics
In the three years preceding Grade 6, students have acquired a strong foundation in numbers and operations, geometry, measurement, and data. They are fluent in multiplication of multi- digit whole numbers
More informationBoolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).
Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval
More informationINSTRUCTIONAL PLANNING GUIDE FOR CHARACTERISTICS OF THE EARTH, MOON, AND SUN
INSTRUCTIONAL PLANNING GUIDE FOR CHARACTERISTICS OF THE EARTH, MOON, AND SUN TEKS: 5.8D Earth and space. The student knows that there are recognizable patterns in the natural world and among the Sun, Earth,
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Following slides borrowed ant then heavily modified from: Jonathan Huang
More informationMath 440 Project Assignment
Math 440 Project Assignment 1. Overview The goal of your project assignment is to explore an aspect of topology beyond the topics covered in class. It will be necessary to use the tools and properties
More informationSYLLABUS FORM WESTCHESTER COMMUNITY COLLEGE Valhalla, NY lo CURRENT DATE: Please indicate whether this is a NEW COURSE or a REVISION:
SYLLABUS FORM WESTCHESTER COMMUNITY COLLEGE Valhalla, NY lo595 l. Course #: 2. NAME OF ORIGINATOR /REVISOR: PHYSC 143 Laurel Senft, Rob Applebaum, Eryn Klosko NAME OF COURSE Earth Science 3. CURRENT DATE:
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Dan Oneaţă 1 Introduction Probabilistic Latent Semantic Analysis (plsa) is a technique from the category of topic models. Its main goal is to model cooccurrence information
More informationEconomics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham
Economics 345: Applied Econometrics Section A01 University of Victoria Midterm Examination #2 Version 2 Fall 2016 Instructor: Martin Farnham Last name (family name): First name (given name): Student ID
More informationMoon. Grade Level: 1-3. pages 1 2 pages 3 4 pages 5 page 6 page 7 page 8 9
Moon Grade Level: 1-3 Teacher Guidelines Instructional Pages Activity Page Practice Page Homework Page Answer Key pages 1 2 pages 3 4 pages 5 page 6 page 7 page 8 9 Classroom Procedure: Approximate Grade
More informationMorpheus: Neo, sooner or later you re going to realize, just as I did, that there s a difference between knowing the path, and walking the path.
Morpheus: Neo, sooner or later you re going to realize, just as I did, that there s a difference between knowing the path, and walking the path. Why CS 53? Making linear algebra more concrete. Making it
More informationPaterson Public Schools
A. Concepts About Print Understand how print is organized and read. (See LAL Curriculum Framework Grade KR Page 1 of 12) Understand that all print materials in English follow similar patterns. (See LAL
More informationEstadística I Exercises Chapter 4 Academic year 2015/16
Estadística I Exercises Chapter 4 Academic year 2015/16 1. An urn contains 15 balls numbered from 2 to 16. One ball is drawn at random and its number is reported. (a) Define the following events by listing
More informationTopic Modeling Using Latent Dirichlet Allocation (LDA)
Topic Modeling Using Latent Dirichlet Allocation (LDA) Porter Jenkins and Mimi Brinberg Penn State University prj3@psu.edu mjb6504@psu.edu October 23, 2017 Porter Jenkins and Mimi Brinberg (PSU) LDA October
More informationEstimating Latent Variable Graphical Models with Moments and Likelihoods
Estimating Latent Variable Graphical Models with Moments and Likelihoods Arun Tejasvi Chaganty Percy Liang Stanford University June 18, 2014 Chaganty, Liang (Stanford University) Moments and Likelihoods
More informationCS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I
X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet
More informationCalculator Review. Ti-30xs Multiview Calculator. Name: Date: Session:
Calculator Review Ti-30xs Multiview Calculator Name: Date: Session: GED Express Review Calculator Review Page 2 Topics What type of Calculator do I use?... 3 What is with the non-calculator questions?...
More informationPBL: Colonial Life. Create a Brochure Attracting People to Come to your Region
PBL: Colonial Life Create a Brochure Attracting People to Come to your Region Project Idea: Brochure Working in partners, students will create a brochure attracting people to one of the three regions:
More informationLecture 3a: Dirichlet processes
Lecture 3a: Dirichlet processes Cédric Archambeau Centre for Computational Statistics and Machine Learning Department of Computer Science University College London c.archambeau@cs.ucl.ac.uk Advanced Topics
More informationSolving and Graphing a Linear Inequality of a Single Variable
Chapter 3 Graphing Fundamentals Section 3.1 Solving and Graphing a Linear Inequality of a Single Variable TERMINOLOGY 3.1 Previously Used: Isolate a Variable Simplifying Expressions Prerequisite Terms:
More information(Sessions I and II)* BROWARD COUNTY ELEMENTARY SCIENCE BENCHMARK PLAN FOR PERSONAL USE
activities 19&20 What Do Plants Need? (Sessions I and II)* BROWARD COUNTY ELEMENTARY SCIENCE BENCHMARK PLAN Grade 1 Quarter 2 Activities 19 & 20 SC.A.1.1.1 The student knows that objects can be described,
More informationLatent Dirichlet Allocation Based Multi-Document Summarization
Latent Dirichlet Allocation Based Multi-Document Summarization Rachit Arora Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India. rachitar@cse.iitm.ernet.in
More informationRHS Libraries. A guide for researchers. RHS Libraries. rhs.org.uk/libraries
A guide for researchers Lindley Library 020 7821 3050 library.london@rhs.org.uk Wisley Library 01483 212428 library.wisley@rhs.org.uk Harlow Carr Library 01423 724 686 library.harlowcarr@rhs.org.uk rhs.org.uk/libraries
More information