Social Media & Text Analysis
1 Social Media & Text Analysis, Lecture 5 - Paraphrase Identification and Logistic Regression. CSE, Ohio State University. Instructor: Wei Xu. Website: socialmedia-class.org
2 In-class Presentation pick your topic and sign up for a 10-minute presentation (20 points) - A Social Media Platform - Or an NLP Researcher
3 Reading #6 & Quiz #3 socialmedia-class.org
5 Mini Research Proposal propose/explore NLP problems in a GitHub dataset GitHub: - a social network for programmers (sharing, collaboration, bug tracking, etc.) - hosts Git repositories (a version control system that tracks changes to files, usually source code) - contains potentially interesting text fields in natural language (comments, issues, etc.)
6-9 (Recap) What is Paraphrase? sentences or phrases that convey approximately the same meaning using different words (Bhagat & Hovy, 2012) - word level: wealthy / rich - phrase level: the king's speech / His Majesty's address - sentence level: the forced resignation of the CEO of Boeing, Harry Stonecipher, for ... / after Boeing Co. Chief Executive Harry Stonecipher was ousted from ...
10 The Ideal
11 (Recap) Paraphrase Research [timeline figure of paraphrase data sources, from WordNet in the '80s through Web, Novels, News, Bi-Text, Video, Style, Twitter, and Simple English corpora, with the associated researchers; * my research] Wei Xu. Data-driven Approaches for Paraphrasing Across Language Variations. PhD Thesis (2014)
12 DIRT (Discovery of Inference Rules from Text) Lin and Pantel (2001) operationalize the Distributional Hypothesis, using dependency relationships to define similar environments. Duty and responsibility share a similar set of dependency contexts in large volumes of text: modified by adjectives (additional, administrative, assigned, assumed, collective, congressional, constitutional...); objects of verbs (assert, assign, assume, attend to, avoid, become, breach...). Dekang Lin and Patrick Pantel. DIRT - Discovery of Inference Rules from Text. In KDD (2001)
13 Bilingual Pivoting word alignment... 5 farmers were thrown into jail in Ireland fünf Landwirte festgenommen, weil oder wurden festgenommen, gefoltert or have been imprisoned, tortured... Source: Chris Callison-Burch
18 Quiz #2 Key Limitations of PPDB?
19 Quiz #2 word sense: the word "bug" has many senses - microbe, virus, bacterium, germ, parasite - insect, beetle, pest, mosquito, fly - bother, annoy, pester - microphone, tracker, mic, wire, earpiece, cookie - glitch, error, malfunction, fault, failure - squealer, snitch, rat, mole Source: Chris Callison-Burch
20 Another Key Limitation [same paraphrase-resources timeline figure] only paraphrases, no non-paraphrases. * my research Wei Xu. Data-driven Approaches for Paraphrasing Across Language Variations. PhD Thesis (2014)
21 Paraphrase Identification obtain sentential paraphrases automatically Mancini has been sacked by Manchester City / Mancini gets the boot from Man City → Yes! WORLD OF JENKS IS ON AT 11 / World of Jenks is my favorite show on tv → No! (meaningful) non-paraphrases are needed to train classifiers! Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. Extracting Lexically Divergent Paraphrases from Twitter. In TACL (2014)
22 Also Non-Paraphrases [same paraphrase-resources timeline figure] (meaningful) non-paraphrases are needed to train classifiers! * my research Wei Xu. Data-driven Approaches for Paraphrasing Across Language Variations. PhD Thesis (2014)
23 News Paraphrase Corpus Microsoft Research Paraphrase Corpus also contains some non-paraphrases (Dolan, Quirk and Brockett, 2004; Dolan and Brockett, 2005; Brockett and Dolan, 2005)
24 Twitter Paraphrase Corpus also contains a lot of non-paraphrases Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. Extracting Lexically Divergent Paraphrases from Twitter In TACL (2014)
25-28 Paraphrase Identification: A Binary Classification Problem Input: - a sentence pair x - a fixed set of binary classes Y = {0, 1}, where 0 = negative (non-paraphrases) and 1 = positive (paraphrases) Output: - a predicted class y ∈ Y (y = 0 or y = 1)
29-30 Classification Method: Supervised Machine Learning Input: - a sentence pair x (represented by features) - a fixed set of binary classes Y = {0, 1} - a training set of m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) Output: - a learned classifier γ: x → y ∈ Y (y = 0 or y = 1)
31 (Recap Week #3) Classification Method: Supervised Machine Learning Naïve Bayes Logistic Regression Support Vector Machines (SVM)
32 (Recap Week #3) Cons of Naïve Bayes: features t_i are assumed independent given the class y: P(t_1, t_2, ..., t_n | y) = P(t_1 | y) · P(t_2 | y) · ... · P(t_n | y) This causes problems: - correlated features double-count evidence - because parameters are estimated independently - which hurts the classifier's accuracy
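The double-counting problem on this slide can be made concrete with a tiny sketch (the likelihood values and the uniform-prior setup are my own illustrative assumptions, not from the slides): if the same feature accidentally appears twice, Naïve Bayes multiplies its likelihood in twice, doubling the evidence without adding any information.

```python
import math

# Assumed likelihoods for a single feature t under each class
# (illustrative numbers, not from the lecture).
p_t_pos, p_t_neg = 0.9, 0.5  # P(t | y=1) and P(t | y=0)

def log_odds(copies):
    """log [P(y=1|x) / P(y=0|x)] with a uniform prior,
    counting feature t `copies` times."""
    return copies * (math.log(p_t_pos) - math.log(p_t_neg))

print(log_odds(1))  # evidence from t counted once
print(log_odds(2))  # duplicated feature: twice the evidence, same information
```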
34 Logistic Regression One of the most useful supervised machine learning algorithms for classification! Generally high performance on many problems. Much more robust than Naïve Bayes (better performance across various datasets).
35 Before Logistic Regression Let's start with something simpler!
36 Paraphrase Identification: Simplified Features We use only one feature: - the number of words the two sentences share in common
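The single feature above can be sketched in a few lines (whitespace tokenization and lowercasing are my own simplifying assumptions; the sentence pair is the one from the earlier Twitter example):

```python
# Count the distinct words two sentences share in common,
# using naive whitespace tokenization and lowercasing.
def words_in_common(sent1, sent2):
    return len(set(sent1.lower().split()) & set(sent2.lower().split()))

x = words_in_common("Mancini has been sacked by Manchester City",
                    "Mancini gets the boot from Man City")
# shared words: {"mancini", "city"} -> feature value 2
```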
37 A closely related problem to Paraphrase Identification: Semantic Textual Similarity How similar (close in meaning) are two sentences? 5: completely equivalent in meaning 4: mostly equivalent, but some unimportant details differ 3: roughly equivalent, but some important information differs/missing 2: not equivalent, but share some details 1: not equivalent, but on the same topic 0: completely dissimilar
38-42 A Simpler Model: Linear Regression [scatter plot: sentence similarity (0-5, rated by humans) vs. #words in common (feature); a threshold on the predicted similarity splits pairs into paraphrase and non-paraphrase for classification] also supervised learning (learn from annotated data), but for Regression: predict real-valued output (Classification: predict discrete-valued output)
43 Training Set [table: #words in common (x) vs. Sentence Similarity (y)] m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) - x's: input variable / features - y's: output / target variable
44 (Recap Week #3) Supervised Machine Learning [supervised learning pipeline diagram] Source: NLTK Book
45-47 Supervised Machine Learning [diagram: a learning algorithm takes the training set and produces a function, (also called) the hypothesis, mapping x (#words in common) to an (estimated) y (sentence similarity)]
48 Linear Regression: Model Representation How to represent the hypothesis h that maps x to an (estimated) y? h_θ(x) = θ_0 + θ_1·x (Linear Regression w/ one variable)
49-50 Linear Regression w/ one variable: Model Representation h_θ(x) = θ_0 + θ_1·x m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) - θ's: parameters How to choose θ? Source: many following slides are adapted from Andrew Ng
51 Linear Regression w/ one variable: Model Representation [figure: three example hypothesis lines for different choices of θ_0 and θ_1]
52-54 Linear Regression w/ one variable: Cost Function [scatter plot: sentence similarity (rated by human) vs. #words in common (feature)] Idea: choose θ_0, θ_1 so that h_θ(x) is close to y for the training examples (x, y) squared error function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1)
55 Linear Regression Hypothesis: h_θ(x) = θ_0 + θ_1·x Parameters: θ_0, θ_1 Cost Function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1)
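The cost function on this slide translates directly into code (a minimal sketch; the function and variable names are my own):

```python
# Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2),
# with hypothesis h(x) = theta0 + theta1 * x.
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2
               for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that fits the data exactly has zero cost:
print(cost(0.0, 1.0, [1, 2, 3], [1, 2, 3]))  # 0.0
```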
56 Linear Regression Simplified Hypothesis: h_θ(x) = θ_0 + θ_1·x → h_θ(x) = θ_1·x Parameters: θ_0, θ_1 → θ_1 Cost Function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² → J(θ_1) = (1/2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1) → minimize_{θ_1} J(θ_1)
57 h_θ(x) (for fixed θ_1, this is a function of x) vs. J(θ_1) (a function of the parameter θ_1) [plots, with training data (1,1), (2,2), (3,3)] For θ_1 = 1: J(1) = (1/(2·3))·[(1−1)² + (2−2)² + (3−3)²] = 0
58 h_θ(x) (for fixed θ_1, this is a function of x) vs. J(θ_1) (a function of the parameter θ_1) For θ_1 = 0.5: J(0.5) = (1/(2·3))·[(0.5−1)² + (1−2)² + (1.5−3)²] = 3.5/6 ≈ 0.58
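The two worked examples above can be checked directly, using the training set (1,1), (2,2), (3,3) and the simplified hypothesis h_θ(x) = θ_1·x:

```python
# Cost J(theta1) for the simplified one-parameter hypothesis h(x) = theta1 * x.
def J(theta1, data=((1, 1), (2, 2), (3, 3))):
    m = len(data)
    return sum((theta1 * x - y) ** 2 for x, y in data) / (2 * m)

print(J(1.0))  # perfect fit: cost 0
print(J(0.5))  # (1/6) * (0.25 + 1 + 2.25) = 3.5/6, about 0.58
```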
59 h_θ(x) (for fixed θ_1, this is a function of x) vs. J(θ_1) (a function of the parameter θ_1) [plot of J(θ_1), minimized at θ_1 = 1] Goal: minimize_{θ_1} J(θ_1)
60 h_θ(x) (for fixed θ_0, θ_1, this is a function of x) vs. J(θ_0, θ_1) (a function of the parameters θ_0, θ_1) [3D surface plot of J(θ_0, θ_1)]
62 h_θ(x) (for fixed θ_0, θ_1, this is a function of x) vs. J(θ_0, θ_1) (a function of the parameters θ_0, θ_1) [contour plot of the cost surface]
63-64 h_θ(x) (for fixed θ_0, θ_1, this is a function of x) vs. J(θ_0, θ_1) (a function of the parameters θ_0, θ_1) [figures: different hypothesis lines and the corresponding points on the contour plot]
65 Parameter Learning Have some function J(θ_0, θ_1) Want min_{θ_0, θ_1} J(θ_0, θ_1) Outline: - Start with some θ_0, θ_1 - Keep changing θ_0, θ_1 to reduce J(θ_0, θ_1), until we hopefully end up at a minimum
66 Gradient Descent [plot of J(θ_1)] Goal: minimize_{θ_1} J(θ_1)
67 Gradient Descent [plot of J(θ_1)] θ_1 := θ_1 − α·(d/dθ_1) J(θ_1), where α is the learning rate Goal: minimize_{θ_1} J(θ_1)
68-69 Gradient Descent [3D surface plots of J(θ_0, θ_1) with gradient descent paths from different starting points] Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1)
70 Gradient Descent repeat until convergence { θ_j := θ_j − α·(∂/∂θ_j) J(θ_0, θ_1) (simultaneous update for j = 0 and j = 1) }, where α is the learning rate
71 Linear Regression w/ one variable: Gradient Descent h_θ(x) = θ_0 + θ_1·x Cost Function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² (∂/∂θ_0) J(θ_0, θ_1) = ? (∂/∂θ_1) J(θ_0, θ_1) = ?
72 Linear Regression w/ one variable: Gradient Descent repeat until convergence { θ_0 := θ_0 − α·(1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i)) θ_1 := θ_1 − α·(1/m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))·x^(i) } (simultaneously update θ_0, θ_1)
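The simultaneous update rule on this slide can be turned into a short runnable sketch (the training data, learning rate, and fixed iteration count are my own illustrative choices; the slides use "repeat until convergence"):

```python
# Batch gradient descent for one-variable linear regression,
# h(x) = theta0 + theta1 * x, minimizing the squared-error cost.
def gradient_descent(data, alpha=0.1, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(data)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in data]
        grad0 = sum(errors) / m                                   # dJ/dtheta0
        grad1 = sum(e * x for e, (x, _) in zip(errors, data)) / m # dJ/dtheta1
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

theta0, theta1 = gradient_descent([(1, 1), (2, 2), (3, 3)])
# on this data the minimum is at theta0 = 0, theta1 = 1
```

Note that both gradients are computed before either parameter is changed; updating θ_0 first and then using the new value inside the θ_1 gradient would not be the simultaneous update the slide specifies.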
73 Linear Regression cost function is convex [bowl-shaped 3D surface plot of J(θ_0, θ_1)]
74-82 h_θ(x) (for fixed θ_0, θ_1, this is a function of x) vs. J(θ_0, θ_1) (a function of the parameters θ_0, θ_1) [animated sequence: successive gradient descent steps traced on the contour plot of J(θ_0, θ_1), with the corresponding fitted line improving at each step]
83 Batch Update Each step of gradient descent uses all the training examples
84-86 (Recap) Linear Regression [scatter plot: sentence similarity vs. #words in common (feature); a threshold on the predicted similarity splits pairs into paraphrase and non-paraphrase for classification] also supervised learning (learn from annotated data), but for Regression: predict real-valued output (Classification: predict discrete-valued output)
87 (Recap) Linear Regression Hypothesis: h_θ(x) = θ_0 + θ_1·x Parameters: θ_0, θ_1 Cost Function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^m (h_θ(x^(i)) − y^(i))² Goal: minimize_{θ_0, θ_1} J(θ_0, θ_1) [3D surface plot of J(θ_0, θ_1)]
88 (Recap) Gradient Descent repeat until convergence { θ_j := θ_j − α·(∂/∂θ_j) J(θ_0, θ_1) (simultaneous update for j = 0 and j = 1) }, where α is the learning rate
89 Next Class: - Logistic Regression (cont'd) socialmedia-class.org
More informationLogistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor
Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationMulti-theme Sentiment Analysis using Quantified Contextual
Multi-theme Sentiment Analysis using Quantified Contextual Valence Shifters Hongkun Yu, Jingbo Shang, MeichunHsu, Malú Castellanos, Jiawei Han Presented by Jingbo Shang University of Illinois at Urbana-Champaign
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationNatural Language Processing
SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University October 9, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationBinary Classification / Perceptron
Binary Classification / Perceptron Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Supervised Learning Input: x 1, y 1,, (x n, y n ) x i is the i th data
More informationtotal 58
CS687 Exam, J. Košecká Name: HONOR SYSTEM: This examination is strictly individual. You are not allowed to talk, discuss, exchange solutions, etc., with other fellow students. Furthermore, you are only
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationDepartment of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling
Department of Computer Science and Engineering Indian Institute of Technology, Kanpur CS 365 Artificial Intelligence Project Report Spatial Role Labeling Submitted by Satvik Gupta (12633) and Garvit Pahal
More informationStochastic Gradient Descent
Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationLinear and Logistic Regression. Dr. Xiaowei Huang
Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationMidterm, Fall 2003
5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are
More informationInformation Extraction from Text
Information Extraction from Text Jing Jiang Chapter 2 from Mining Text Data (2012) Presented by Andrew Landgraf, September 13, 2013 1 What is Information Extraction? Goal is to discover structured information
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationBayesian Paragraph Vectors
Bayesian Paragraph Vectors Geng Ji 1, Robert Bamler 2, Erik B. Sudderth 1, and Stephan Mandt 2 1 Department of Computer Science, UC Irvine, {gji1, sudderth}@uci.edu 2 Disney Research, firstname.lastname@disneyresearch.com
More informationHidden Markov Models
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework
More informationLogistic Regression Trained with Different Loss Functions. Discussion
Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationNaïve Bayes, Maxent and Neural Models
Naïve Bayes, Maxent and Neural Models CMSC 473/673 UMBC Some slides adapted from 3SLP Outline Recap: classification (MAP vs. noisy channel) & evaluation Naïve Bayes (NB) classification Terminology: bag-of-words
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationlecture 6: modeling sequences (final part)
Natural Language Processing 1 lecture 6: modeling sequences (final part) Ivan Titov Institute for Logic, Language and Computation Outline After a recap: } Few more words about unsupervised estimation of
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More informationLecture 4 Logistic Regression
Lecture 4 Logistic Regression Dr.Ammar Mohammed Normal Equation Hypothesis hθ(x)=θ0 x0+ θ x+ θ2 x2 +... + θd xd Normal Equation is a method to find the values of θ operations x0 x x2.. xd y x x2... xd
More informationExperiment 1: Linear Regression
Experiment 1: Linear Regression August 27, 2018 1 Description This first exercise will give you practice with linear regression. These exercises have been extensively tested with Matlab, but they should
More informationConditional Random Field
Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For
More informationOutline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation
More information