CS 175, Project in Artificial Intelligence Lecture 3: Document Classification


1 CS 175, Project in Artificial Intelligence. Lecture 3: Document Classification. Padhraic Smyth, Department of Computer Science, Bren School of Information and Computer Sciences, University of California, Irvine.

2 Announcements
Assignment 1: completed.
Assignment 2: Text Classification. Due by 5pm Wednesday next week.
Project proposals: due 2 weeks from Friday. Will discuss project ideas and proposals over the next 2 weeks.
Today's lecture: possible project topics; document classification.

3 Assignment 2
Use scikit-learn (Python library) to investigate document classification:
- create bag-of-words representations
- generate training and test data sets
- experiment with different classifiers (naïve Bayes, logistic regression, kNN)
Submit the modified assignment2.py file (Wed next week, 5pm).
Please read the assignment and submit any questions via Piazza.
Office hours over the next week:
- Eric: Thursday 1 to 3, and next Tuesday 1 to 3 (new)
- Instructor: Friday 9:30 to 11:30


5 Background Reading: Useful for Project Ideas
Links on the class Web site: tutorial articles; software and demos for text analysis; data sets.
Reference books on text/natural language: Introduction to Information Retrieval; Speech and Language Processing; Mining the Social Web (2nd Edition), by Matthew Russell, O'Reilly Media. (O'Reilly books are available for free online via the UCI Library's subscription to Safari Books Online.)
Reference books on machine learning: Hands-On Machine Learning with Scikit-Learn and TensorFlow; A Course in Machine Learning; Deep Learning.


7 Text Analysis Techniques
Classification: automatically assign a document to 1 or more categories, e.g., is an email spam or non-spam? is a review positive or negative?
Clustering/Topic Discovery: group a set of documents into clusters, discover themes/topics in documents, e.g., automatically group documents in search results.
Word prediction: e.g., predict the next word for typing on a mobile device.
Chatbots and Text Synthesis: automatically generate new text, e.g., in response to human dialog.
Information Extraction: extract mentions of entities from documents, e.g., tag news articles with mentions of companies and products.

8 Figure 5.1 from the NLTK book, showing the results of matching strings to a geographic dictionary. Illustrates clearly why dictionary look-up is not sufficient for entity recognition!

9 Project Topic: Sentiment or Emotion Prediction from Text
Problem: given text from short documents (e.g., Tweets), investigate classification methods to predict sentiment (e.g., positive/negative) or emotions (e.g., anger).
Possible data sources: labeled Tweet data sets (positive/negative or other emotion).
Evaluation: accuracy, precision/recall, etc., on a test data set or using cross-validation.
Comments: add additional aspects so that the project is not too simple; it needs to be interesting.
Additional reading: see links to tutorial articles, data sets, and software demos on the class Web page.

10 Project Topic: Predicting Review Scores from Text
Problem: given text from a review (e.g., product, movie, restaurant), investigate machine learning methods to predict the numerical score of the review (e.g., 1, 2, 3, 4, 5).
Possible data sources: Yelp Challenge data sets, Amazon product review data sets, etc.
Evaluation: accuracy, precision/recall, etc., on a test data set or using cross-validation.
Comments: could start with binary classification: {1 or 2} versus {4 or 5}.
Additional reading: see links to tutorial articles, data sets, and software demos on the class Web page.


12 Histogram of Review Lengths

13 Project Topic: Summarizing Aspects of Review Text
Problem: given text from a set of reviews (e.g., product, movie, restaurant), investigate information extraction methods to automatically extract and summarize sentiment for different aspects of the reviews, e.g., price, service, quality of food, etc.
Possible data sources: Yelp Challenge data sets, Amazon product review data sets, etc.
Evaluation: user studies (is the output of method A better than method B?); evaluation techniques for text summarization, such as BLEU scores.
Comments: evaluation is difficult for a problem like this.
Additional reading: see links to tutorial articles, data sets, and software demos on the class Web page.


15 Project Topic: Chatbot
Problem: given a sentence, generate an appropriate response sentence.
Possible data sources: transcripts of spoken or written dialog (e.g., the Switchboard or Ubuntu corpus).
Evaluation: user studies, e.g., human judges rate responses of algorithm A vs. algorithm B.
Comments: this is a difficult problem to do well on; partial success would be fine.
Additional reading: chapter on chatbots in the Jurafsky text book (class Web site); see also the Proceedings of the Amazon Alexa Prize competition in 2017 (online).


17 Project Topic: Text Generation/Simulation
Problem: given text from a particular author or source, generate new simulated text with the same style as the author or source.
Possible data sources: fiction from different authors, speeches, songs, poetry, movie scripts.
Evaluation: user studies, e.g., human judges the quality of output of algorithm A vs. algorithm B.
Comments: this is easier than a chatbot (more "open loop"). Evaluation is tricky: how do you avoid generating text very similar to the original?
Additional reading: see links to tutorial articles and software demos on the class Web page.

18 Output from a Model Learned on Source Code. Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks," Andrej Karpathy, blog.

19 Output from a Model Learned on Mathematics Papers. Examples from "The Unreasonable Effectiveness of Recurrent Neural Networks," Andrej Karpathy, blog.

20 Important Components of Projects
Clear definition of the problem: think of inputs, outputs, pipelines.
Data: make sure you will be able to get the data you need, e.g., labeled data for classification.
Self-written components: which parts of the code will you write, and what will be existing code?
Evaluation: how will you evaluate the quality of your system? Think of ways to compare version A versus version B.
Run-time: do you want a system/demo that can run in real-time, or one that operates offline? Different design decisions for each.

21 Project Tips: Plan in Stages
Plan your project in stages so that the overall project is not dependent on the riskier elements working. Example:
PHASE 1: Original Documents → Standard Bag of Words → Standard Logistic Regression → Cross-Validation Experiments

22 Project Tips: Plan in Stages
Plan your project in stages so that the overall project is not dependent on the riskier elements working. Example:
PHASE 1: Original Documents → Standard Bag of Words → Standard Logistic Regression → Cross-Validation Experiments
PHASE 2: Bag of Phrases (n-grams)

23 Project Tips: Plan in Stages
Plan your project in stages so that the overall project is not dependent on the riskier elements working. Example:
PHASE 1: Original Documents → Standard Bag of Words → Standard Logistic Regression → Cross-Validation Experiments
PHASE 2: Bag of Phrases (n-grams)
PHASE 3: Deep Neural Network

24 Overview of Text Classification

25 Text Classification
Text classification has many applications: spam detection; classifying news articles, e.g., Google News; classifying Web pages into categories.

26 Text Classification
Text classification has many applications: spam detection; classifying news articles, e.g., Google News; classifying Web pages into categories.
Data representation: bag of words/terms is most commonly used, with either counts or binary values. Can also use other weightings and additional features (e.g., metadata).

27 Text Classification
Text classification has many applications: spam detection; classifying news articles, e.g., Google News; classifying Web pages into categories.
Data representation: bag of words/terms is most commonly used, with either counts or binary values. Can also use other weightings and additional features (e.g., metadata).
Classification methods (see the sketch below):
- Naïve Bayes: widely used baseline; fast and reasonably accurate.
- Logistic regression: widely used in industry; accurate; an excellent baseline.
- Neural networks and deep learning: can be very accurate, but can require very large amounts of labeled training data and are more complex than the other methods.
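As a concrete illustration of the two baseline methods, here is a minimal, hedged sketch using scikit-learn; the four-document corpus and its labels are made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

# Toy corpus (illustrative only): two "finance" and two "sports" documents.
docs = ["stocks fall as finance outlook dims",
        "team wins with a late goal",
        "analysts predict stocks will rally",
        "score tied until the final goal"]
labels = ["finance", "sports", "finance", "sports"]

X = CountVectorizer().fit_transform(docs)  # bag-of-words count features

for clf in (MultinomialNB(), LogisticRegression()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))  # predictions on the training docs
```

Both classifiers share the same bag-of-words input, so swapping methods is a one-line change.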

28 Example of Document-by-Term Matrix
[Table: rows d1–d10 (documents); columns: predict, finance, stocks, goal, score, team, plus a Class Label column. The numeric cell values were not preserved in this transcription.]

29 Example of Document-by-Term Matrix
Note: we use "term" to allow for multi-word terms, i.e., n-grams, e.g., "Santa Barbara" and "New York."
[Same document-by-term matrix as on the previous slide.]

30 Real Example from Yelp Data
Yelp Dataset:
- Number of reviews: 706,693
- Number of reviews w/o neutral rating: 595,468
- Number of tokens: 85,392,376
- Vocabulary size w/o stopwords: 176,114
- Array dimensions: (595468, 176114)
- Number of cells in the array: 104,870,251,352
- Non-zero entries: 28,357,001
- Density: ≈ 0.00027

31 Training and Prediction
Labeled documents (terms plus known class labels) form the training data, used to learn the model.
Unlabeled documents (class labels unknown) are the future data, on which the model is used to make predictions.

32 Example of a Pipeline for Document Classification
Training documents (corpus) → Tokenization → Lists of tokens → Stopword and rare word removal → Vocabulary → Bag-of-words frequency counts → Machine learning algorithm → Document classifier

33 Example of a Pipeline for Document Classification
Training: training documents (corpus) → Tokenization → Lists of tokens → Stopword and rare word removal → Vocabulary → Bag-of-words frequency counts → Machine learning algorithm → Document classifier
Prediction: new document → Tokenization → Lists of tokens → Bag of words → Document classifier → Label prediction

34 Key Steps in Document Analysis Pipelines (for Bag of Words)
Tokenization: various options (e.g., how to handle punctuation, non-alphanumeric symbols, etc).
Vocabulary definition: n-grams, stopword removal, rare word removal, stemming.
Feature definition: binary (term present or not?), counts, or weighted counts, e.g., TF-IDF (see later in the slides).
Classifier selection: naïve Bayes, logistic, SVMs, neural networks, etc.
A sketch wiring these steps together appears below.
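The hedged sketch below chains these steps with scikit-learn's Pipeline. The specific choices (English stopword list, unigrams plus bigrams, a rare-word cutoff via min_df, counts rather than binary features) are one illustrative configuration, not required settings; train_docs/train_labels/new_docs are assumed variables.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    # Tokenization, stopword/rare-word removal, vocabulary, and counts in one step:
    ("bow", CountVectorizer(stop_words="english",  # stopword removal
                            ngram_range=(1, 2),    # unigrams + bigrams
                            min_df=2,              # drop rare terms
                            binary=False)),        # counts, not binary features
    # Classifier trained on the bag-of-words features:
    ("clf", LogisticRegression()),
])

# pipeline.fit(train_docs, train_labels)   # learn the vocabulary and the weights
# predictions = pipeline.predict(new_docs) # label predictions for new documents
```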

35 Example of Document-by-Term Matrix (count version)
[Same matrix layout as before: documents d1–d10 by terms predict, finance, stocks, goal, score, team, plus class labels; cell values not preserved.]

36 Example of Document-by-Term Matrix (binary version)
[Same matrix layout as before, with binary (0/1) entries; cell values not preserved.]

37 TF-IDF Weighting of Features
In practice the inputs can be weighted: it can be helpful to use TF-IDF weights instead of counts.
TF(t,d) = term frequency = count = number of times term t occurs in doc d
IDF(t) = inverse document frequency = log( N / number of docs with term t ), where N = total number of docs in the corpus
TF-IDF(t,d) = TF(t,d) * IDF(t)
The IDF term has the effect of upweighting terms that occur in few docs.

38 TF-IDF Example
N = 1000 in a corpus of news articles.
Term 1: t = "city", appears in 500 documents: IDF(t) = log(1000/500) = log(2) = 1 (log is base 2 here; the base is not important).
Term 2: t = "freeway", appears in 10 documents: IDF(t) = log(1000/10) = log(100) = 6.64.
So occurrences of "freeway" will get upweighted by a factor of 6.64 compared to occurrences of "city".
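A minimal sketch reproducing the arithmetic above (base-2 log, as in the example):

```python
import math

N = 1000  # total documents in the corpus

def idf(n_docs_with_term, n_total=N):
    # Inverse document frequency, log base 2 as in the slide's example.
    return math.log2(n_total / n_docs_with_term)

print(idf(500))  # "city":    log2(1000/500) = 1.0
print(idf(10))   # "freeway": log2(1000/10) ≈ 6.64
```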

39 Comparing True Labels and Predictions
Classification accuracy = percentage of correct predictions = 70% in the example below.
[Document-by-term matrix as before, with two extra columns: the true Class Label and the Algorithm's Predictions; cell values not preserved.]

40 Confusion Matrix and Accuracy
[Confusion matrix: rows = True Class 1 / True Class 2, columns = Predicted Class 1 / Predicted Class 2; cell counts not preserved.]
Accuracy = fraction of examples classified correctly = 280/400 = 70%
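Both quantities can be computed directly with scikit-learn; the short label vectors below are toy stand-ins, not the slide's 400 examples.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Toy true and predicted labels (stand-ins for a real test set).
y_true = [1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [1, 2, 1, 2, 2, 1, 2, 2]

print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
print(accuracy_score(y_true, y_pred))    # fraction of examples classified correctly
```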

41 Training and Test Data
Classification accuracy on our training data will tend to be optimistic: the classifier can memorize the training data.
Test set performance: a more accurate estimate of accuracy can be obtained by reserving some of our data as an independent/unseen/holdout test data set. Train the model on the training data, then evaluate the model's true accuracy on the data it did not see (the test set).
Cross-validation: we can repeat the process of splitting our data into train-test sets multiple times to get an even more robust estimate of accuracy. V-fold cross-validation makes V train-test splits of the data: train V models and evaluate on V test sets. Our final accuracy estimate is the average over the V test folds.
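A hedged sketch of V-fold cross-validation in scikit-learn (here V = 5); the synthetic data is a stand-in for a real document-feature matrix and label vector.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for bag-of-words features X and class labels y.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# V = 5 train-test splits: train 5 models, evaluate on 5 held-out folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores)         # test accuracy on each fold
print(scores.mean())  # final accuracy estimate = average over the V test folds
```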

42 Linear Classifiers

43 Notation
Data: N documents, T features (e.g., term counts); an N x T array of features.
Variables: c is a class label, taking one of M possible values (M > 1). x is a T-dimensional vector of features for a document (e.g., x could be a binary vector indicating which terms are in the document or not). P(c | x) is the probability of class label c given x.

44 Linear Classifiers for 2-Class Problems
A linear classifier computes a linear weighted sum of the inputs, e.g., in 2 dimensions, with 2 classes:
f = classifier output = w_0 + w_1 x_1 + w_2 x_2
Why do we need this extra constant weight?

45 Geometric Interpretation of a Linear Classifier
[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2), with Decision Region 1, Decision Region 2, and the Decision Boundary marked.]
Note that the decision boundary corresponds to the points where f = 0, i.e., w_0 + w_1 x_1 + w_2 x_2 = 0, which is the equation of a line in 2 dimensions. The w_0 weight allows us to have lines with non-zero intercept, i.e., lines that don't need to go through the origin.

46 Linear Classifier with Overlapping Class Distributions
[Figure: two-class data in a two-dimensional feature space (Feature 1 vs. Feature 2) with overlapping class distributions; Decision Region 1, Decision Region 2, and the Decision Boundary marked.]

47 A Linear Classifier (with 2 Features)
[Diagram: inputs x_1 and x_2 (plus a constant input 1) feed, via weights w_1 and w_2 (plus the intercept weight w_0), into a weighted sum, followed by a threshold function whose output is the class prediction.]
f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2; is f > 0? output class 1 or class 2.

48 Linear Classifiers for 2-Class Problems
A linear classifier computes a linear weighted sum of the inputs, e.g., in 2 dimensions:
f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2
and more generally in T dimensions:
f(x) = f(x_1, ..., x_T) = Σ_j w_j x_j = w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T

49 Linear Classifiers for 2-Class Problems
A linear classifier computes a linear weighted sum of the inputs, e.g., in 2 dimensions:
f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2
and more generally in T dimensions:
f(x) = f(x_1, ..., x_T) = Σ_j w_j x_j = w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T
Sidenote: this can also be written as the inner product of a weight vector and the feature vector, i.e., f(x) = Σ_j w_j x_j = w^t x, where w = (w_0, w_1, ..., w_T) and w^t is the transpose of w.
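The inner-product form is one line of NumPy; the weight and feature values below are arbitrary illustrative numbers.

```python
import numpy as np

w = np.array([0.5, -1.2, 2.0, 0.7])  # (w_0, w_1, w_2, w_3): intercept + T = 3 weights
x = np.array([0.0, 1.0, 3.0])        # T = 3 feature values for one document

x_aug = np.concatenate(([1.0], x))   # prepend a constant 1 so w_0 acts as the intercept
f = np.dot(w, x_aug)                 # f(x) = w^t x = w_0 + w_1 x_1 + ... + w_T x_T
print(f, "-> class 1" if f > 0 else "-> class 2")
```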

50 Linear Classifiers for Text Documents
Linear classifiers use a weighted sum of the inputs. With T features we have T + 1 weights (one per feature plus one "intercept").
Examples of linear classifiers:
- Linear classifier (perceptron)
- Logistic regression <- widely used in practice; what we will focus on
- Naïve Bayes (it is effectively linear)
- Support vector machines
Example of a non-linear classifier:
- Neural networks <- will be reviewed in future lectures
For further discussion of linear and non-linear classifiers, see:

51 Possible Weights for a Linear Classifier with Documents
[Document-by-term matrix as before (terms: predict, finance, stocks, goal, score, team; documents d1–d10 with class labels), with an additional row of learned weights, one per term; values not preserved.]

52 Logistic Regression Classification

53 Notation
Data: N documents, T features (e.g., term counts); an N x T array of features.
Variables: c is a class label, taking one of M possible values (M > 1). x is a T-dimensional vector of features for a document (e.g., x could be a binary vector indicating which terms are in the document or not). P(c | x) is the probability of class label c given x.

54 Getting Class Probabilities
Estimates of class probabilities P(c | x) are very useful in practice, e.g., for ranking documents to show to a human user.

55 Getting Class Probabilities
Estimates of class probabilities P(c | x) are very useful in practice, e.g., for ranking documents to show to a human user.
Assume for simplicity we have a 2-class binary classification problem. Say we tried to get a probability of a class with a linear model:
P(c | x) = f(x) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T
There is a problem: f(x) could be negative, could be > 1, etc.

56 A Better Approach
P(c | x) = f(x) = g( w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T ), where g(z) = 1 / (1 + e^{-z})
As z -> positive infinity, g(z) -> 1, so P(class) -> 1. As z -> negative infinity, g(z) -> 0, so P(class) -> 0.
This is the logistic regression model. In effect: a linear (weighted sum) model where the sum is transformed to lie between 0 and 1, so we can interpret f(x) directly as a probability between 0 and 1.

57 What Does the Logistic Function Look Like?
[Figure: shape of the logistic function, an S-shaped curve rising from 0 to 1.]
g(z) = 1 / (1 + e^{-z}), where z = weighted sum = Σ_j w_j x_j
As z -> positive infinity, g(z) -> 1, P(class) -> 1. As z -> negative infinity, g(z) -> 0, P(class) -> 0.
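A minimal sketch of the logistic function and its limiting behavior:

```python
import numpy as np

def g(z):
    # Logistic (sigmoid) function: maps any real z into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, g(z))  # g(-10) ≈ 0, g(0) = 0.5, g(10) ≈ 1
```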

58 Logistic Regression as a Neural Network
Logistic regression can be viewed as a simple artificial neuron.
[Diagram: inputs x_1, x_2, x_3 and a constant +1 feed into a single output unit f(x); each edge in the network has an associated weight or parameter, w_j.]

59 A Neural Network with 1 Hidden Layer
Here the model learns 3 different logistic functions, each one a "hidden unit," and then combines the outputs of the 3 to make a prediction.
[Diagram: inputs x_1, x_2, x_3 and a constant +1 feed into 3 hidden units, whose outputs feed into f(x).]
This model is representationally more powerful than a single logistic function, but it has many more parameters (it can overfit unless we are careful). The model can be trained using gradient methods, but local minima are a problem.

60 Deep Learning: Models with 2 or More Hidden Layers
We can build on this idea to create deep models with many hidden layers.
[Diagram: inputs x_1, x_2, x_3 and a constant +1 feed through multiple hidden layers into f(x).]
The model f(x) is now a very flexible, highly non-linear function. There is significant current interest in deep learning (e.g., 5, 10, 20 layers); a quick way to experiment appears below.
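The slide does not prescribe a library, but scikit-learn's MLPClassifier is one accessible way to try hidden layers; the synthetic data and layer sizes below are illustrative, not a tuned model.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for document features and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Two hidden layers: more layers/units = more flexible, but easier to overfit.
clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy (optimistic; use held-out data in practice)
```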

61 Explaining Decisions by an AI Algorithm

62 Explaining an Algorithm's Decisions
Generating human-interpretable explanations of decisions made by AI systems is very important to the human users of these systems, e.g., in autonomous driving, medical diagnosis, product recommendations, and so on.
For linear classifiers, where we have 1 weight per input, this is straightforward: for each class, look at the most positive weights and the most negative weights. This tells us which features/terms (if present) have the most impact (Assignment 2). For documents, note that some terms might be rare, so we could measure how much impact they have on average rather than only when they are present. We can also tell the user which terms in a particular document contributed most to a decision. A sketch of this weight inspection follows below.
For non-linear classifiers (such as neural networks), explaining decisions is much more complicated.
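A hedged sketch of the "most positive / most negative weights" inspection for a scikit-learn logistic regression (the toy reviews are made up, and get_feature_names_out assumes a recent scikit-learn version):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["great food great service", "terrible slow service",
        "great value", "terrible food"]   # toy reviews (illustrative only)
y = [1, 0, 1, 0]                          # 1 = positive, 0 = negative

vect = CountVectorizer()
X = vect.fit_transform(docs)
clf = LogisticRegression().fit(X, y)

terms = np.array(vect.get_feature_names_out())
order = np.argsort(clf.coef_[0])          # one learned weight per term (2-class case)
print("most negative:", terms[order[:3]])   # strongest evidence for the negative class
print("most positive:", terms[order[-3:]])  # strongest evidence for the positive class
```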


67 Assignment 2
Use scikit-learn (Python library) to investigate document classification:
- create bag-of-words representations
- generate training and test data sets
- experiment with different classifiers (naïve Bayes, logistic regression, kNN)
Submit the modified assignment2.py file (Wed next week, 5pm).
Please read the assignment and submit any questions via Piazza.
Office hours over the next week:
- Eric: Thursday 1 to 3, and next Tuesday 1 to 3 (new)
- Instructor: Friday 9:30 to 11:30

68 Course Schedule (Week: Monday / Wednesday)
Jan 8: Lecture: Introduction and course outline / Lecture: Basic concepts in text analysis
Jan 15: No class (university holiday) / Lecture: Text classification, part 1; Assignment 1 due, 5pm
Jan 22: Lecture: Text classification, part 2 / Lecture: Neural networks for text, part 1; Assignment 2 due, 5pm
Jan 29: Lecture: Neural networks for text, part 2 / Lecture: Neural networks for text, part 2; Project proposal due, Friday 6pm
Feb 5: Office hours (no lecture) / Lecture: Algorithm evaluation methods
Feb 12: Office hours (no lecture) / Lecture: Unsupervised learning algorithms
Feb 19: No class (university holiday) / Lecture: Discussion of progress reports; Progress report due, Friday 6pm
Feb 26: Office hours (no lecture) / Office hours (no lecture)
Mar 5: Office hours (no lecture) / Lecture: Discussion of final reports
Mar 12: Project Presentations (in class); upload slides by 4pm / Project Presentations (in class); upload slides by 4pm
Mar 19: Final project reports due (day/time TBD)

69 Example (in Python) of Classifying Yelp Reviews (code from Dimitris Kotzias, PhD student, Computer Science Department, UCI)


71 Real Example from Yelp Data
Simple pipeline for classification of Yelp reviews (a hedged sketch follows below):
- Extract the restaurant reviews
- Convert them to a tf-idf array
- Split the data into training and testing sets
- Train on the training data, and test
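A minimal sketch of those four steps, assuming yelp_reviews and yelp_labels are lists built from the dataset; this is not the original course code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# yelp_reviews, yelp_labels assumed: restaurant review texts and binary ratings
# (neutral reviews removed), e.g., loaded from the Yelp Challenge data.

# vect = TfidfVectorizer(stop_words="english")           # convert text to a tf-idf array
# X = vect.fit_transform(yelp_reviews)
# X_tr, X_te, y_tr, y_te = train_test_split(X, yelp_labels, test_size=0.2)
# clf = LogisticRegression().fit(X_tr, y_tr)             # train on the training data
# print(clf.score(X_te, y_te))                           # accuracy on the test data
```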

72 Real Example from Yelp Data
Yelp Dataset:
- Number of reviews: 706,693
- Number of reviews w/o neutral rating: 595,468
- Number of tokens: 85,392,376
- Vocabulary size w/o stopwords: 176,114
- Array dimensions: (595468, 176114)
- Number of cells in the array: 104,870,251,352
- Non-zero entries: 28,357,001
- Density: ≈ 0.00027

73 Histogram of Review Lengths

74 Real Example from Yelp Data: number of restaurants: 14,308; a total of 706,693 reviews.

75 Real Example from Yelp Data: data shape: (595468, 176114).

76 Real Example from Yelp Data: training size: …; testing size: …

77 Real Example from Yelp Data: training acc: …; testing acc: …; auc: …. Overall takes about … mins to run (may produce some warnings).

78 Other Aspects of Document Classification

79 Examples of Labels/Categories/Classes
Labels for documents or web pages:
- Labels are often general categories, e.g., for news articles: "finance," "sports," "news>world>asia>business"; e.g., for biomedical articles: gene expression, microarray, lung cancer.
- Labels may be genres: "editorials," "movie-reviews," "news."
- Labels may be opinion on a person/product: like, hate, neutral.
- Labels may be domain-specific: "interesting-to-me" vs. "not-interesting-to-me"; "contains adult language" vs. "doesn't"; language identification: English, French, Chinese, ...; "link spam" vs. "not link spam."

80 Where Do Document Labels Come From?
Manually assigned (expensive): a predefined dictionary of labels; human labelers read all or part of the article and assign the most likely label.
Who are the labelers? Domain experts; librarians/editors (e.g., for the New York Times); low-paid labelers, e.g., via Amazon Mechanical Turk.
This is a subjective process: even domain experts will disagree on some labels, and in many cases there is no absolute right or wrong labeling.
Semi-automated process: e.g., domain experts define selected keywords for each label; keyword matching is used to return the documents with the most keyword matches for each label; experts then label these returned documents; a classifier is trained on these labeled documents.

81 Other Aspects of Document Labels
Large numbers of label values: many applications have a very large number of possible class labels (thousands), and the distribution of labels is often highly skewed: some labels are very common, others very rare.
Multi-label versus single-label documents: multi-label means each document can have multiple labels; single-label means each document is assigned a single label. The multi-label problem is more complex to handle, e.g., the model needs to decide how many labels to assign to each document. (We will assume single-label for now and return to multi-label later.)
Hierarchical labels: it is common in real-world applications that labels are related hierarchically in a tree, e.g., "news>world>asia>business". Classifiers that use this hierarchy will generally perform better than classifiers that ignore it.

82 Feature Selection
The performance of text classification algorithms can often be improved by selecting only a subset of the terms.
Greedy search: start from the empty set or the full set and add/delete one term at a time.
Heuristics for adding/deleting: information gain (mutual information of term with class); chi-square; other ideas.
Methods tend not to be particularly sensitive to the specific heuristic used for feature selection, but some form of feature selection often improves performance.

83 Feature Selection using Mutual Information
Average mutual information between (a) C, the class label, and (b) F_t, the presence or absence of a term in a document, defined (following McCallum and Nigam, 1998) as:
I(C; F_t) = Σ_c Σ_{f_t ∈ {0,1}} P(c, f_t) log [ P(c, f_t) / ( P(c) P(f_t) ) ]
where c is the class and f_t indicates the presence or absence of term t.
Typical approach: compute this for all terms, include the top K terms in the classifier, and optimize the value of K via cross-validation (next lecture). A sketch appears below.
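scikit-learn's mutual_info_classif is one closely related way to score terms by mutual information with the class; this hedged sketch (toy corpus, arbitrary K) selects the top-K features, with K to be tuned by cross-validation as noted above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

docs = ["stocks fall", "goal scored", "stocks rally", "team wins the goal"]
y = [0, 1, 0, 1]  # toy class labels

X = CountVectorizer().fit_transform(docs)  # sparse counts (treated as discrete features)

selector = SelectKBest(score_func=mutual_info_classif, k=2)  # keep top-K terms by MI
X_top = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # column indices of the selected terms
# In practice, tune K via cross-validation rather than fixing it by hand.
```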

84 Generating Multi-Word Terms
Consider multi-word terms like "New York": we would rather treat this as one term, "New York", than as "New" and "York". We can extend our vocabulary to include multi-word terms (or n-grams), with n = 1, 2, 3, 4, ..., e.g., "University of California Irvine" (n = 4).
Finding candidate n-grams: the space of possible multi-word combinations is huge. With W word tokens there are W^2 bigrams, W^3 trigrams, etc. (W on the order of 10^5).
General approach: select n-grams that occur frequently. Keep track of all k-frequent n-grams in the corpus (e.g., k = 10), then use feature selection (e.g., mutual information) to select the best; see the sketch below.
We can also use other filters to find good terms, e.g., use a parser to automatically extract noun phrases: "The big dog jumped over the lazy brown cat."
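CountVectorizer can generate and frequency-filter n-gram candidates in one step; in this hedged sketch the min_df cutoff plays the role of the k-frequent filter (corpus and values illustrative).

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["new york is big", "i love new york",
        "york university", "new day in new york"]

# n-grams with n = 1..2; min_df=2 keeps only n-grams appearing in >= 2 documents.
vect = CountVectorizer(ngram_range=(1, 2), min_df=2)
vect.fit(docs)
print(vect.get_feature_names_out())  # includes the bigram "new york"
```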

85 Sentiment Lexicons
Basic analysis of text: overall count and percentage of words in various categories (a minimal sketch follows below).
Example lexicons:
- General Inquirer: words categorized according to positive/negative, strong vs. weak, active vs. passive, etc.
- SentiWordNet: synsets in WordNet 3.0 annotated for degrees of positivity, negativity, and neutrality/objectiveness.
- Linguistic Inquiry and Word Count (LIWC): next slide.
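A minimal sketch of the basic count-and-percentage analysis, with a tiny made-up lexicon standing in for a real resource such as the General Inquirer or LIWC:

```python
# Toy lexicon (hypothetical stand-in for General Inquirer / SentiWordNet / LIWC).
LEXICON = {"positive": {"love", "nice", "sweet"},
           "negative": {"bad", "hate", "problem"}}

def category_percentages(text):
    # Percentage of tokens falling in each lexicon category.
    tokens = text.lower().split()
    return {cat: 100.0 * sum(t in words for t in tokens) / len(tokens)
            for cat, words in LEXICON.items()}

print(category_percentages("I love this place but the service was a problem"))
```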

86 LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC. Austin, TX.
A dictionary of words grouped into more than 70 classes, for example:
Affective processes: negative emotion (bad, weird, hate, problem, tough); positive emotion (love, nice, sweet).
Cognitive processes: tentative (maybe, perhaps, guess); inhibition (block, constraint).
Pronouns; negation (no, never); quantifiers (few, many).

87 LIWC Word Categories
Pronouns, 1st person singular, 1st person plural, 2nd person, articles, past tense verbs, present tense verbs, future tense verbs, prepositions, negations, numbers, swear words, social words, family, friends, humans, affect, positive emotions, negative emotions, anxiety, anger, sadness, cognitive mechanisms, insight, causal, discrepancy, tentative, certainty, inhibition, inclusive, exclusive, seeing, hearing, feeling, body, sexual, motion, space, time, occupation, achievement, leisure, home, money, religion, death, assent, nonfluencies.

88 Pros and Cons of Dictionary Approaches such as LIWC
Pros: an effective method for studying the various emotional, cognitive, structural, and process components present in individuals' verbal and written speech; easy to use.
Cons: sentiment lexicons are fixed in their number of categories and the words in those categories; word context is often ignored; not domain-specific.
