Social Media & Text Analysis

1 Social Media & Text Analysis lecture 5 - Paraphrase Identification and Logistic Regression CSE Ohio State University Instructor: Wei Xu Website: socialmedia-class.org

2 In-class Presentation pick your topic and sign up for a 10-minute presentation (20 points) - A Social Media Platform - Or an NLP Researcher

3 Reading #6 & Quiz #3 socialmedia-class.org

4 Mini Research Proposal propose/explore NLP problems in GitHub dataset

5 Mini Research Proposal propose/explore NLP problems in GitHub dataset GitHub: - a social network for programmers (sharing, collaboration, bug tracking, etc.) - hosting Git repositories (a version control system that tracks changes to files, usually source code) - containing potentially interesting text fields in natural language (comments, issues, etc.)

6 (Recap) what is Paraphrase? sentences or phrases that convey approximately the same meaning using different words (Bhagat & Hovy, 2012)

7 (Recap) what is Paraphrase? sentences or phrases that convey approximately the same meaning using different words (Bhagat & Hovy, 2012) wealthy word rich

8 (Recap) what is Paraphrase? sentences or phrases that convey approximately the same meaning using different words (Bhagat & Hovy, 2012) wealthy the king's speech word phrase rich His Majesty's address

9 (Recap) what is Paraphrase? sentences or phrases that convey approximately the same meaning using different words (Bhagat & Hovy, 2012) wealthy the king's speech the forced resignation of the CEO of Boeing, Harry Stonecipher, for word phrase sentence rich His Majesty's address after Boeing Co. Chief Executive Harry Stonecipher was ousted from

10 The Ideal

11 (Recap) Paraphrase Research [timeline figure of paraphrase data sources from the 80s (WordNet) through Web '01, Novels '04, News '05-'13, Bi-Text '11, Video '12*, Style '13*-'14*, Twitter '15*-'16*, and Simple; researchers: Xu, Ritter, Dolan, Grishman, Cherry, Callison-Burch, Ji, Napoles; * my research] Wei Xu. Data-driven Approaches for Paraphrasing Across Language Variations. PhD Thesis (2014)

12 DIRT (Discovery of Inference Rules from Text) Lin and Pantel (2001) operationalize the Distributional Hypothesis using dependency relationships to define similar environments. Duty and responsibility share a similar set of dependency contexts in large volumes of text: modified by adjectives additional, administrative, assigned, assumed, collective, congressional, constitutional... objects of verbs assert, assign, assume, attend to, avoid, become, breach... Dekang Lin and Patrick Pantel. DIRT - Discovery of Inference Rules from Text. In KDD (2001)
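The idea of comparing words by their shared dependency contexts can be sketched in a few lines. This is an illustrative toy (Jaccard overlap over hand-picked context sets), not the actual DIRT algorithm, which uses a mutual-information-weighted similarity over paths extracted from a parsed corpus.

```python
# Illustrative sketch of the Distributional Hypothesis (NOT the DIRT algorithm):
# two words are similar if they occur in similar dependency contexts.
# The context sets below are toy examples, not extracted from a real corpus.

def context_similarity(contexts_a, contexts_b):
    """Jaccard similarity between two sets of dependency contexts."""
    shared = contexts_a & contexts_b
    union = contexts_a | contexts_b
    return len(shared) / len(union) if union else 0.0

duty = {"amod:additional", "amod:administrative", "dobj-of:assume", "dobj-of:avoid"}
responsibility = {"amod:additional", "amod:collective", "dobj-of:assume", "dobj-of:breach"}

print(context_similarity(duty, responsibility))  # 2 shared contexts out of 6 total
```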

13 Bilingual Pivoting word alignment... 5 farmers were thrown into jail in Ireland fünf Landwirte festgenommen, weil oder wurden festgenommen, gefoltert or have been imprisoned, tortured... Source: Chris Callison-Burch

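Bilingual pivoting scores an English paraphrase pair by summing over shared foreign translations, p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f), following Bannard & Callison-Burch (2005). A minimal sketch, assuming toy phrase tables; the probabilities below are made-up values, not real alignment statistics.

```python
# Sketch of bilingual pivoting: paraphrase probability via shared translations,
# p(e2 | e1) = sum over f of p(f | e1) * p(e2 | f).
# The phrase tables below are toy values for illustration only.

def paraphrase_prob(e1, e2, p_f_given_e, p_e_given_f):
    return sum(p_f * p_e_given_f.get(f, {}).get(e2, 0.0)
               for f, p_f in p_f_given_e.get(e1, {}).items())

p_f_given_e = {"thrown into jail": {"festgenommen": 0.6,
                                    "ins Gefängnis geworfen": 0.4}}
p_e_given_f = {"festgenommen": {"imprisoned": 0.5, "arrested": 0.5},
               "ins Gefängnis geworfen": {"imprisoned": 0.7,
                                          "thrown into jail": 0.3}}

print(paraphrase_prob("thrown into jail", "imprisoned",
                      p_f_given_e, p_e_given_f))  # 0.6*0.5 + 0.4*0.7 = 0.58
```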

18 Quiz #2 Key Limitations of PPDB?

19 Quiz #2 word sense microbe, virus, bacterium, germ, parasite insect, beetle, pest, mosquito, fly bother, annoy, pester bug microphone, tracker, mic, wire, earpiece, cookie glitch, error, malfunction, fault, failure squealer, snitch, rat, mole Source: Chris Callison-Burch

20 Another Key Limitation [timeline figure: paraphrase data sources from the 80s (WordNet) to Twitter '15*-'16*; * my research] only paraphrases, no non-paraphrases Wei Xu. Data-driven Approaches for Paraphrasing Across Language Variations. PhD Thesis (2014)

21 Paraphrase Identification obtain sentential paraphrases automatically Mancini has been sacked by Manchester City Mancini gets the boot from Man City Yes! WORLD OF JENKS IS ON AT 11 No! World of Jenks is my favorite show on tv (meaningful) non-paraphrases are needed to train classifiers! Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. Extracting Lexically Divergent Paraphrases from Twitter. In TACL (2014)

22 Also Non-Paraphrases [timeline figure: paraphrase data sources from the 80s (WordNet) to Twitter '15*-'16*; * my research] (meaningful) non-paraphrases are needed to train classifiers! Wei Xu. Data-driven Approaches for Paraphrasing Across Language Variations. PhD Thesis (2014)

23 News Paraphrase Corpus Microsoft Research Paraphrase Corpus also contains some non-paraphrases (Dolan, Quirk and Brockett, 2004; Dolan and Brockett, 2005; Brockett and Dolan, 2005)

24 Twitter Paraphrase Corpus also contains a lot of non-paraphrases Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. Extracting Lexically Divergent Paraphrases from Twitter In TACL (2014)

25 Paraphrase Identification: A Binary Classification Problem Input: - a sentence pair x - a fixed set of binary classes Y = {0, 1} Output: - a predicted class y ∈ Y (y = 0 or y = 1)

26 Paraphrase Identification: A Binary Classification Problem Input: - a sentence pair x negative (non-paraphrases) - a fixed set of binary classes Y = {0, 1} Output: - a predicted class y ∈ Y (y = 0 or y = 1)

27 Paraphrase Identification: A Binary Classification Problem Input: - a sentence pair x negative (non-paraphrases) - a fixed set of binary classes Y = {0, 1} positive (paraphrases) Output: - a predicted class y ∈ Y (y = 0 or y = 1)

28 Paraphrase Identification: A Binary Classification Problem Input: - a sentence pair x - a fixed set of binary classes Y = {0, 1} Output: - a predicted class y ∈ Y (y = 0 or y = 1)

29 Classification Method: Supervised Machine Learning Input: - a sentence pair x - a fixed set of binary classes Y = {0, 1} - a training set of m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) Output: - a learned classifier γ: x → y ∈ Y (y = 0 or y = 1)

30 Classification Method: Supervised Machine Learning Input: - a sentence pair x (represented by features) - a fixed set of binary classes Y = {0, 1} - a training set of m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) Output: - a learned classifier γ: x → y ∈ Y (y = 0 or y = 1)

31 (Recap Week #3) Classification Method: Supervised Machine Learning Naïve Bayes Logistic Regression Support Vector Machines (SVM)

32 (Recap Week #3) Cons: Naïve Bayes features t_i are assumed independent given the class y: P(t_1, t_2, ..., t_n | y) = P(t_1 | y) · P(t_2 | y) · ... · P(t_n | y) This causes problems: - correlated features double-count evidence - because parameters are estimated independently - hurting the classifier's accuracy
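To see the double-counting concretely: under the independence assumption, a duplicated (perfectly correlated) feature multiplies the same likelihood ratio in twice. The conditional probabilities below are toy values for a hypothetical word feature, not estimates from real data.

```python
# Sketch of Naive Bayes double-counting: duplicating a perfectly correlated
# feature doubles its contribution to the log-odds. Toy probabilities.
import math

def nb_log_odds(features, p_t_given_pos, p_t_given_neg):
    # log P(y=1 | t) - log P(y=0 | t), assuming equal class priors
    return sum(math.log(p_t_given_pos[t]) - math.log(p_t_given_neg[t])
               for t in features)

p_pos = {"sacked": 0.30}  # P(t | y=1), toy value
p_neg = {"sacked": 0.05}  # P(t | y=0), toy value

once = nb_log_odds(["sacked"], p_pos, p_neg)
twice = nb_log_odds(["sacked", "sacked"], p_pos, p_neg)  # correlated copy
print(twice / once)  # the same evidence is counted twice
```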

33 Classification Method: Supervised Machine Learning Naïve Bayes Logistic Regression Support Vector Machines (SVM)

34 Logistic Regression One of the most useful supervised machine learning algorithms for classification! Generally high performance on a wide range of problems. Much more robust than Naïve Bayes (better performance across various datasets).

35 Before Logistic Regression Let's start with something simpler!

36 Paraphrase Identification: Simplified Features We use only one feature: - the number of words that the two sentences share in common
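A minimal sketch of this single feature. The lecture does not specify a tokenizer, so lowercased whitespace splitting is an assumption here.

```python
# The one feature used in this simplified setup: how many (lowercased) word
# types two sentences share. Naive whitespace tokenization, for illustration.

def words_in_common(s1, s2):
    return len(set(s1.lower().split()) & set(s2.lower().split()))

print(words_in_common("Mancini has been sacked by Manchester City",
                      "Mancini gets the boot from Man City"))  # 2: mancini, city
```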

37 A closely related problem to Paraphrase Identification: Semantic Textual Similarity How similar (close in meaning) are two sentences? 5: completely equivalent in meaning 4: mostly equivalent, but some unimportant details differ 3: roughly equivalent, but some important information differs/is missing 2: not equivalent, but share some details 1: not equivalent, but on the same topic 0: completely dissimilar

38-42 A Simpler Model: Linear Regression [scatter plot, built up over slides 38-42: Sentence Similarity (rated by human, 0-5) vs. #words in common (feature); a threshold on the fitted line splits pairs into paraphrase vs. non-paraphrase for Classification] also supervised learning (learn from annotated data) but for Regression: predict real-valued output (Classification: predict discrete-valued output)

43 Training Set #words in common (x) Sentence Similarity (y) m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) - x's: input variable / features - y's: output / target variable

44-47 (Recap Week #3) Supervised Machine Learning [diagram, built up over slides 44-47: training set → learning algorithm → h (also called the hypothesis); input x (#words in common) → h → (estimated) y (Sentence Similarity)] Source: NLTK Book

48 Linear Regression: Model Representation How to represent h? [diagram: training set → learning algorithm → hypothesis h; x → h → (estimated) y] Linear Regression w/ one variable: hθ(x) = θ0 + θ1x

49 Linear Regression w/ one variable: Model Representation #words in common (x), Sentence Similarity (y) hθ(x) = θ0 + θ1x m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) - θ's: parameters Source: many of the following slides are adapted from Andrew Ng

50 Linear Regression w/ one variable: Model Representation #words in common (x), Sentence Similarity (y) hθ(x) = θ0 + θ1x m hand-labeled sentence pairs (x^(1), y^(1)), ..., (x^(m), y^(m)) - θ's: parameters How to choose θ?

51 Linear Regression w/ one variable: Model Representation [figure: three example plots of the line hθ(x) = θ0 + θ1x for different choices of θ0 and θ1]

52 Linear Regression w/ one variable: Cost Function [scatter plot: Sentence Similarity (rated by human) vs. #words in common (feature)] Idea: choose θ0, θ1 so that hθ(x) is close to y for the training examples (x, y)

54 Linear Regression w/ one variable: Cost Function [scatter plot: Sentence Similarity (rated by human) vs. #words in common (feature)] squared error function: J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))² Idea: choose θ0, θ1 so that hθ(x) is close to y for the training examples (x, y): minimize J(θ0, θ1) over θ0, θ1

55 Linear Regression Hypothesis: hθ(x) = θ0 + θ1x Parameters: θ0, θ1 Cost Function: J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))² Goal: minimize J(θ0, θ1) over θ0, θ1

56 Linear Regression, Simplified Hypothesis: hθ(x) = θ0 + θ1x, simplified to hθ(x) = θ1x Parameters: θ0, θ1, simplified to θ1 Cost Function: J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))², simplified to J(θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))² Goal: minimize J(θ0, θ1) over θ0, θ1, simplified to minimize J(θ1) over θ1

57 hθ(x) (for fixed θ1, this is a function of x) and J(θ1) (a function of the parameter θ1) [plots: data points (1,1), (2,2), (3,3); the line hθ(x) = θ1x with θ1 = 1; the J(θ1) curve] J(1) = 1/(2·3) · [(1−1)² + (2−2)² + (3−3)²] = 0

58 hθ(x) (for fixed θ1, this is a function of x) and J(θ1) (a function of the parameter θ1) [plots: the line with θ1 = 0.5 and the J(θ1) curve] J(0.5) = 1/(2·3) · [(0.5−1)² + (1−2)² + (1.5−3)²] ≈ 0.58
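The two worked values above can be reproduced directly from the definition of J; the data points (1,1), (2,2), (3,3) are the toy example on the slides.

```python
# Squared-error cost J(θ0, θ1) = 1/(2m) · Σ (hθ(x^(i)) − y^(i))²,
# evaluated on the slides' toy data points (1,1), (2,2), (3,3).

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]
print(cost(0.0, 1.0, xs, ys))  # θ1 = 1 fits perfectly: J(1) = 0
print(cost(0.0, 0.5, xs, ys))  # θ1 = 0.5: J(0.5) = 3.5/6 ≈ 0.58
```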

59 hθ(x) (for fixed θ1, this is a function of x) and J(θ1) (a function of the parameter θ1) [plot: the full J(θ1) curve] minimize J(θ1) over θ1

60 hθ(x) (for fixed θ0, θ1, this is a function of x) and J(θ0, θ1) (a function of the parameters θ0, θ1) [figure: 3D surface plot of J over θ0 and θ1]

61 Linear Regression, Simplified Hypothesis: hθ(x) = θ0 + θ1x, simplified to hθ(x) = θ1x Parameters: θ0, θ1, simplified to θ1 Cost Function: J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))², simplified to J(θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))² Goal: minimize J(θ0, θ1) over θ0, θ1, simplified to minimize J(θ1) over θ1

62-64 hθ(x) (for fixed θ0, θ1, this is a function of x) and J(θ0, θ1) (a function of the parameters θ0, θ1) [figures, slides 62-64: the fitted line for several parameter choices, alongside J(θ0, θ1) shown as a contour plot]

65 Parameter Learning Have some function J(θ0, θ1) Want min J(θ0, θ1) over θ0, θ1 Outline: - Start with some θ0, θ1 - Keep changing θ0, θ1 to reduce J(θ0, θ1) until we hopefully end up at a minimum

66 Gradient Descent [plot: the J(θ1) curve] minimize J(θ1) over θ1

67 Gradient Descent [plot: the J(θ1) curve] θ1 := θ1 − α · d/dθ1 J(θ1) (α is the learning rate) minimize J(θ1) over θ1

68-69 Gradient Descent [figures: surface plots of J(θ0, θ1) with descent paths from different starting points] minimize J(θ0, θ1) over θ0, θ1

70 Gradient Descent repeat until convergence { θ_j := θ_j − α · ∂/∂θ_j J(θ0, θ1) (simultaneous update for j = 0 and j = 1) } (α is the learning rate)

71 Linear Regression w/ one variable: Gradient Descent Cost Function hθ(x) = θ0 + θ1x J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))² ∂/∂θ0 J(θ0, θ1) = ? ∂/∂θ1 J(θ0, θ1) = ?

72 Linear Regression w/ one variable: Gradient Descent repeat until convergence { θ0 := θ0 − α · 1/m · Σ_{i=1..m} (hθ(x^(i)) − y^(i)) θ1 := θ1 − α · 1/m · Σ_{i=1..m} (hθ(x^(i)) − y^(i)) · x^(i) } (simultaneously update θ0, θ1)
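The update rule above, written out as a batch gradient descent loop. The toy data, learning rate α, and iteration count are arbitrary choices for illustration, not values from the lecture.

```python
# Batch gradient descent for one-variable linear regression, following the
# update rules θ0 := θ0 − α·(1/m)·Σ(hθ(x)−y) and θ1 := θ1 − α·(1/m)·Σ(hθ(x)−y)·x,
# with simultaneous updates. Toy data; α and step count are arbitrary.

def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    m = len(xs)
    t0, t1 = 0.0, 0.0
    for _ in range(steps):
        errs = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        g0 = sum(errs) / m
        g1 = sum(e * x for e, x in zip(errs, xs)) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1  # simultaneous update
    return t0, t1

t0, t1 = gradient_descent([1, 2, 3], [1, 2, 3])
print(t0, t1)  # converges near θ0 = 0, θ1 = 1 for this toy data
```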

73 Linear Regression cost function is convex [figure: bowl-shaped surface plot of J over θ0 and θ1]

74-82 hθ(x) (for fixed θ0, θ1, this is a function of x) and J(θ0, θ1) (a function of the parameters θ0, θ1) [figure sequence, slides 74-82: successive gradient descent steps, with the fitted line improving as (θ0, θ1) moves toward the minimum of the J(θ0, θ1) contour plot]

83 Batch Update Each step of gradient descent uses all the training examples

84-86 (Recap) Linear Regression [scatter plot, built up over slides 84-86: Sentence Similarity vs. #words in common (feature); a threshold on the fitted line splits pairs into paraphrase vs. non-paraphrase for Classification] also supervised learning (learn from annotated data) but for Regression: predict real-valued output (Classification: predict discrete-valued output)

87 (Recap) Linear Regression Hypothesis: hθ(x) = θ0 + θ1x Parameters: θ0, θ1 Cost Function: J(θ0, θ1) = 1/(2m) · Σ_{i=1..m} (hθ(x^(i)) − y^(i))² Goal: minimize J(θ0, θ1) over θ0, θ1 [figure: surface plot of J over θ0 and θ1]

88 (Recap) Gradient Descent repeat until convergence { θ_j := θ_j − α · ∂/∂θ_j J(θ0, θ1) (simultaneous update for j = 0 and j = 1) } (α is the learning rate)

89 Next Class: - Logistic Regression (cont'd) socialmedia-class.org


More information

Logistic Regression. COMP 527 Danushka Bollegala

Logistic Regression. COMP 527 Danushka Bollegala Logistic Regression COMP 527 Danushka Bollegala Binary Classification Given an instance x we must classify it to either positive (1) or negative (0) class We can use {1,-1} instead of {1,0} but we will

More information

Machine Learning: Assignment 1

Machine Learning: Assignment 1 10-701 Machine Learning: Assignment 1 Due on Februrary 0, 014 at 1 noon Barnabas Poczos, Aarti Singh Instructions: Failure to follow these directions may result in loss of points. Your solutions for this

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 7: Testing (Feb 12, 2019) David Bamman, UC Berkeley Significance in NLP You develop a new method for text classification; is it better than what comes

More information

Spatial Role Labeling CS365 Course Project

Spatial Role Labeling CS365 Course Project Spatial Role Labeling CS365 Course Project Amit Kumar, akkumar@iitk.ac.in Chandra Sekhar, gchandra@iitk.ac.in Supervisor : Dr.Amitabha Mukerjee ABSTRACT In natural language processing one of the important

More information

Warm up: risk prediction with logistic regression

Warm up: risk prediction with logistic regression Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T

More information

Support Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI

Support Vector Machines. CAP 5610: Machine Learning Instructor: Guo-Jun QI Support Vector Machines CAP 5610: Machine Learning Instructor: Guo-Jun QI 1 Linear Classifier Naive Bayes Assume each attribute is drawn from Gaussian distribution with the same variance Generative model:

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression

Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer

More information

Machine Learning (CS 567) Lecture 3

Machine Learning (CS 567) Lecture 3 Machine Learning (CS 567) Lecture 3 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol

More information

FINAL: CS 6375 (Machine Learning) Fall 2014

FINAL: CS 6375 (Machine Learning) Fall 2014 FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for

More information

CS 188: Artificial Intelligence. Outline

CS 188: Artificial Intelligence. Outline CS 188: Artificial Intelligence Lecture 21: Perceptrons Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron Multi-class

More information

Instance-based Domain Adaptation

Instance-based Domain Adaptation Instance-based Domain Adaptation Rui Xia School of Computer Science and Engineering Nanjing University of Science and Technology 1 Problem Background Training data Test data Movie Domain Sentiment Classifier

More information

Logistic Regression. William Cohen

Logistic Regression. William Cohen Logistic Regression William Cohen 1 Outline Quick review classi5ication, naïve Bayes, perceptrons new result for naïve Bayes Learning as optimization Logistic regression via gradient ascent Over5itting

More information

Logistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor

Logistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1

More information

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.

Statistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima. http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT

More information

Multi-theme Sentiment Analysis using Quantified Contextual

Multi-theme Sentiment Analysis using Quantified Contextual Multi-theme Sentiment Analysis using Quantified Contextual Valence Shifters Hongkun Yu, Jingbo Shang, MeichunHsu, Malú Castellanos, Jiawei Han Presented by Jingbo Shang University of Illinois at Urbana-Champaign

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University October 9, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18 CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

Binary Classification / Perceptron

Binary Classification / Perceptron Binary Classification / Perceptron Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Supervised Learning Input: x 1, y 1,, (x n, y n ) x i is the i th data

More information

total 58

total 58 CS687 Exam, J. Košecká Name: HONOR SYSTEM: This examination is strictly individual. You are not allowed to talk, discuss, exchange solutions, etc., with other fellow students. Furthermore, you are only

More information

Neural Networks: Backpropagation

Neural Networks: Backpropagation Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others

More information

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling

Department of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling Department of Computer Science and Engineering Indian Institute of Technology, Kanpur CS 365 Artificial Intelligence Project Report Spatial Role Labeling Submitted by Satvik Gupta (12633) and Garvit Pahal

More information

Stochastic Gradient Descent

Stochastic Gradient Descent Stochastic Gradient Descent Machine Learning CSE546 Carlos Guestrin University of Washington October 9, 2013 1 Logistic Regression Logistic function (or Sigmoid): Learn P(Y X) directly Assume a particular

More information

Introduction to Machine Learning Midterm Exam

Introduction to Machine Learning Midterm Exam 10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but

More information

ML4NLP Multiclass Classification

ML4NLP Multiclass Classification ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we

More information

Linear and Logistic Regression. Dr. Xiaowei Huang

Linear and Logistic Regression. Dr. Xiaowei Huang Linear and Logistic Regression Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Two Classical Machine Learning Algorithms Decision tree learning K-nearest neighbor Model Evaluation Metrics

More information

Recent Advances in Bayesian Inference Techniques

Recent Advances in Bayesian Inference Techniques Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian

More information

Midterm, Fall 2003

Midterm, Fall 2003 5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are

More information

Information Extraction from Text

Information Extraction from Text Information Extraction from Text Jing Jiang Chapter 2 from Mining Text Data (2012) Presented by Andrew Landgraf, September 13, 2013 1 What is Information Extraction? Goal is to discover structured information

More information

COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection

COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:

More information

Bayesian Paragraph Vectors

Bayesian Paragraph Vectors Bayesian Paragraph Vectors Geng Ji 1, Robert Bamler 2, Erik B. Sudderth 1, and Stephan Mandt 2 1 Department of Computer Science, UC Irvine, {gji1, sudderth}@uci.edu 2 Disney Research, firstname.lastname@disneyresearch.com

More information

Hidden Markov Models

Hidden Markov Models 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework

More information

Logistic Regression Trained with Different Loss Functions. Discussion

Logistic Regression Trained with Different Loss Functions. Discussion Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Naïve Bayes, Maxent and Neural Models

Naïve Bayes, Maxent and Neural Models Naïve Bayes, Maxent and Neural Models CMSC 473/673 UMBC Some slides adapted from 3SLP Outline Recap: classification (MAP vs. noisy channel) & evaluation Naïve Bayes (NB) classification Terminology: bag-of-words

More information

6.036 midterm review. Wednesday, March 18, 15

6.036 midterm review. Wednesday, March 18, 15 6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

lecture 6: modeling sequences (final part)

lecture 6: modeling sequences (final part) Natural Language Processing 1 lecture 6: modeling sequences (final part) Ivan Titov Institute for Logic, Language and Computation Outline After a recap: } Few more words about unsupervised estimation of

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)

More information

The Perceptron. Volker Tresp Summer 2014

The Perceptron. Volker Tresp Summer 2014 The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a

More information

Lecture 4 Logistic Regression

Lecture 4 Logistic Regression Lecture 4 Logistic Regression Dr.Ammar Mohammed Normal Equation Hypothesis hθ(x)=θ0 x0+ θ x+ θ2 x2 +... + θd xd Normal Equation is a method to find the values of θ operations x0 x x2.. xd y x x2... xd

More information

Experiment 1: Linear Regression

Experiment 1: Linear Regression Experiment 1: Linear Regression August 27, 2018 1 Description This first exercise will give you practice with linear regression. These exercises have been extensively tested with Matlab, but they should

More information

Conditional Random Field

Conditional Random Field Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

More information

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5

Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 5 Slides adapted from Jordan Boyd-Graber, Tom Mitchell, Ziv Bar-Joseph Machine Learning: Chenhao Tan Boulder 1 of 27 Quiz question For

More information

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012)

Outline. Supervised Learning. Hong Chang. Institute of Computing Technology, Chinese Academy of Sciences. Machine Learning Methods (Fall 2012) Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Linear Models for Regression Linear Regression Probabilistic Interpretation

More information