CS 175, Project in Artificial Intelligence Lecture 3: Document Classification
1 CS 175, Project in Artificial Intelligence Lecture 3: Document Classification Padhraic Smyth Department of Computer Science Bren School of Information and Computer Sciences University of California, Irvine
2 2 Announcements. Assignment 1: completed. Assignment 2: Text Classification, due by 5pm Wednesday next week. Project Proposals: due 2 weeks from Friday; we will discuss project ideas and proposals over the next 2 weeks. Today's lecture: possible project topics; document classification.
3 3 Assignment 2 Use scikit-learn (Python library) to investigate document classification create bag-of-words representations generate training and test data sets experiment with different classifiers (naïve Bayes, logistic regression, knn) Submit modified assignment2.py file. (Wed next week, 5pm) Please read the assignment and submit any questions via Piazza Office hours over the next week Eric: Thursday 1 to 3, and next Tuesday 1 to 3 (new) Instructor: Friday 9:30 to 11:30
5 5 Background Reading (useful for project ideas). Links on class Web site: tutorial articles; software and demos for text analysis; data sets. Reference books on text/natural language: Introduction to Information Retrieval; Speech and Language Processing; Mining the Social Web (2nd Edition), by Matthew Russell, O'Reilly Media (O'Reilly books are available for free online via the UCI Library's subscription to Safari Books Online). Reference books on machine learning: Hands-On Machine Learning with Scikit-Learn and TensorFlow; A Course in Machine Learning; Deep Learning.
7 7 Text Analysis Techniques. Classification: automatically assign a document to 1 or more categories, e.g., is an email spam or non-spam? is a review positive or negative? Clustering/Topic Discovery: group a set of documents into clusters, discover themes/topics in documents, e.g., automatically group documents in search results. Word prediction: e.g., predict the next word for typing on a mobile device. Chatbots and Text Synthesis: automatically generate new text, e.g., in response to human dialog. Information Extraction: extract mentions of entities from documents, e.g., tag news articles with mentions of companies and products.
8 8 Figure 5.1 from the NLTK book showing the results of matching strings to a geographic dictionary. Illustrates clearly why dictionary look-up is not sufficient for entity recognition!
9 9 Project Topic: Sentiment or Emotion Prediction from Text. Problem: given text from short documents (e.g., Tweets), investigate classification methods to predict sentiment (e.g., positive/negative) or emotions (e.g., anger). Possible Data Sources: labeled Tweet data sets (positive/negative or other emotion). Evaluation: accuracy, precision/recall, etc., on a test data set or using cross-validation. Comments: add additional aspects so that the project is not too simple; it needs to be interesting. Additional Reading: see links to Tutorial Articles, Data Sets, Software Demos on class Web page.
10 10 Project Topic: Predicting Review Scores from Text Problem: Given text from a review (e.g., product, movie, restaurant) investigate machine learning methods to predict the numerical score of a review (e.g., 1, 2, 3, 4, 5) given the text of the review Possible Data Sources Yelp Challenge data sets, Amazon product review data sets, etc Evaluation Accuracy, precision/recall, etc, on test data set or using cross-validation Comments Could start with binary classification: {1 or 2} versus {4 or 5} Additional Reading See links to Tutorial Articles, Data Sets, Software Demos on class Web page
12 12 Histogram of Review Lengths
13 13 Project Topic: Summarizing Aspects of Review Text Problem: Given text from a set of reviews (e.g., product, movie, restaurant) investigate information extraction methods to automatically extract and summarize sentiment for different aspects of the reviews, e.g., price, service, quality of food, etc Possible Data Sources Yelp Challenge data sets, Amazon product review data sets, etc Evaluation User studies: is output of method A better than method B Evaluation techniques for text summarization such as BLEU scores Comments Evaluation is difficult for a problem like this Additional Reading See links to Tutorial Articles, Data Sets, Software Demos on class Web page
15 15 Project Topic: Chatbot. Problem: given a sentence, generate an appropriate response sentence. Possible Data Sources: transcripts of spoken or written dialog (e.g., Switchboard or Ubuntu corpus). Evaluation: user studies, e.g., human judges compare responses of algorithm A v algorithm B. Comments: this is a difficult problem to do well on; partial success would be fine. Additional Reading: chapter on Chatbots in Jurafsky text book (class Web site); see also Proceedings of the Amazon Alexa Prize competition in 2017 (online).
17 17 Project Topic: Text Generation/Simulation. Problem: given text from a particular author or source, generate new simulated text with the same style as the author or source. Possible Data Sources: fiction from different authors, speeches, songs, poetry, movie scripts. Evaluation: user studies, e.g., human judges compare the quality of output of algorithm A v algorithm B. Comments: this is easier than a Chatbot (more "open loop"); evaluation is tricky: how do you avoid generating text very similar to the original? Additional Reading: see links to Tutorial Articles, Software Demos on class Web page.
18 18 Output from a Model Learned on Source Code. Examples from The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy, blog
19 19 Output from a Model Learned on Mathematics Papers. Examples from The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy, blog
20 20 Important Components of Projects. Clear definition of the problem: think of inputs, outputs, pipelines. Data: make sure you will be able to get the data you need, e.g., labeled data for classification. Self-written components: which parts of the code will you write and what will be existing code? Evaluation: how will you evaluate the quality of your system? Think of ways to compare version A versus B. Run-time: do you want a system/demo that can run in real-time, or one that operates offline? Different design decisions for each.
23 23 Project Tips: Plan in Stages. Plan your project in stages so that the overall project is not dependent on the riskier elements working. Example: PHASE 1: Original Documents -> Standard Bag of Words -> Standard Logistic Regression -> Cross-Validation Experiments. PHASE 2: Bag of Phrases (ngrams). PHASE 3: Deep Neural Network.
24 24 Overview of Text Classification
27 27 Text Classification. Text classification has many applications: spam detection; classifying news articles, e.g., Google News; classifying Web pages into categories. Data Representation: bag of words/terms most commonly used (either counts or binary); can also use other weighting and additional features (e.g., metadata). Classification Methods: Naïve Bayes: widely used baseline, fast and reasonably accurate. Logistic Regression: widely used in industry, accurate, excellent baseline. Neural networks and deep learning: can be very accurate; can require very large amounts of labeled training data; more complex than other methods.
29 29 Example of Document by Term Matrix. [Table: ten documents as rows; columns are the terms predict, finance, stocks, goal, score, team, plus a Class Label column.] Note: we use "term" to allow for multi-word terms, i.e., n-grams, e.g., "Santa Barbara" and "New York".
30 30 Real Example from Yelp Data (Yelp Dataset). Number of Reviews: 706,693. Number of Reviews w/o Neutral Rating: 595,468. Number of Tokens: 85,392,376. Vocabulary Size w/o Stopwords: 176,114. Array Dimensions: (595468, ). Number of cells in the Array: 104,870,251,352. Non-zero entries: 28,357,001. Density:
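The figures on this slide show how sparse a bag-of-words array is. As a small arithmetic sketch (not part of the original slides), the code below checks that the quoted number of cells equals the number of documents times the vocabulary size, and derives the fraction of non-zero cells from the quoted counts. The variable names are illustrative only.

```python
# Figures quoted on the slide.
n_docs = 595_468            # reviews without a neutral rating
vocab_size = 176_114        # vocabulary size without stopwords
n_cells = 104_870_251_352   # number of cells in the array
n_nonzero = 28_357_001      # non-zero entries

# The quoted cell count is documents x vocabulary terms.
cells_match = (n_docs * vocab_size == n_cells)

# Fraction of cells that are non-zero, and average number of
# distinct vocabulary terms per review.
density = n_nonzero / n_cells
avg_terms_per_review = n_nonzero / n_docs

print(f"cells match: {cells_match}")
print(f"density: {density:.6f}")
print(f"average distinct terms per review: {avg_terms_per_review:.1f}")
```

With well under 0.1% of the cells non-zero, storing such an array in a sparse format is the practical choice.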
31 31 Training and Prediction. Training data (used to learn the model): labeled documents, represented by their terms, for which the class labels are known. Future data (using the model to make predictions): unlabeled documents, for which the class labels are unknown.
33 33 Example of a Pipeline for Document Classification. Training: Training Documents (corpus) -> Tokenization -> Lists of Tokens -> Stopword and rare word removal -> Vocabulary -> Bag of Words Frequency Counts -> Machine Learning Algorithm -> Document Classifier. Prediction: New Document -> Tokenization -> Lists of Tokens -> Bag of Words -> Document Classifier -> Label Prediction.
34 34 Key Steps in Document Analysis Pipelines (for Bag of Words). Tokenization: various options (e.g., with punctuation, non-alphanumeric symbols, etc.). Vocabulary definition: n-grams, stopword removal, rare word removal, stemming. Feature definition: binary (term present or not?); counts; weighted counts, e.g., TF-IDF (see later in the slides). Classifier selection: Naïve Bayes, logistic regression, SVMs, neural networks, etc.
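The first three steps (tokenization, vocabulary definition, feature definition) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the course's or scikit-learn's implementation: the stopword list, the regular expression, and the function names `tokenize`, `build_vocabulary`, and `bag_of_words` are all hypothetical choices.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "and", "of", "to", "in", "as"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(docs, min_doc_count=1):
    """Map each kept term to a column index, dropping stopwords and rare terms."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    terms = sorted(t for t, n in doc_freq.items()
                   if n >= min_doc_count and t not in STOPWORDS)
    return {term: j for j, term in enumerate(terms)}

def bag_of_words(doc, vocab, binary=False):
    """Return one row of the document-by-term matrix (counts, or 0/1 if binary)."""
    row = [0] * len(vocab)
    for token in tokenize(doc):
        if token in vocab:
            j = vocab[token]
            row[j] = 1 if binary else row[j] + 1
    return row

docs = ["The team scored a goal and the team won.",
        "Stocks fell as finance analysts predict losses."]
vocab = build_vocabulary(docs)
count_rows = [bag_of_words(d, vocab) for d in docs]
binary_rows = [bag_of_words(d, vocab, binary=True) for d in docs]
```

The count and binary versions differ only where a term repeats within a document, as "team" does in the first example document.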
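35 35 Example of Document by Term Matrix (count version). [Table: ten documents as rows; columns are the terms predict, finance, stocks, goal, score, team, plus a Class Label column; each cell holds the number of times the term occurs in the document.]
36 36 Example of Document by Term Matrix (binary version). [Table: same layout as the count version; each cell indicates whether the term is present in the document.]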
37 37 TF-IDF Weighting of Features. In practice the inputs can be weighted: it can be helpful to use TF-IDF weights instead of counts. TF(t,d) = term frequency = count = number of times term t occurs in doc d. IDF(t) = inverse document frequency = log(N / number of docs with term t), where N = total number of docs in the corpus. TF-IDF(t,d) = TF(t,d) * IDF(t). The IDF term has the effect of upweighting terms that occur in few docs.
38 38 TF-IDF Example. N = 1000 in a corpus of news articles. Term 1: t = "city", appears in 500 documents: IDF(t) = log(1000/500) = log(2) = 1 (the log is base 2 here; the choice of base is not important). Term 2: t = "freeway", appears in 10 documents: IDF(t) = log(1000/10) = log(100) = 6.64. So occurrences of "freeway" will get upweighted by a factor of 6.64 compared to occurrences of "city".
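The worked example can be reproduced with a few lines of Python. This is a minimal sketch using the definitions from the slides with a base-2 logarithm; the function names `idf` and `tf_idf` are illustrative, and library implementations often use a different log base or add smoothing.

```python
import math

def idf(n_docs, doc_freq):
    """Inverse document frequency: log2(N / number of docs containing the term)."""
    return math.log2(n_docs / doc_freq)

def tf_idf(term_count, n_docs, doc_freq):
    """TF-IDF weight: term count in the document times the term's IDF."""
    return term_count * idf(n_docs, doc_freq)

N = 1000                      # documents in the corpus
idf_city = idf(N, 500)        # "city" appears in 500 documents
idf_freeway = idf(N, 10)      # "freeway" appears in 10 documents

print(f"IDF(city) = {idf_city:.2f}")
print(f"IDF(freeway) = {idf_freeway:.2f}")
print(f"ratio = {idf_freeway / idf_city:.2f}")
```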
39 39 Comparing True Labels and Predictions. Classification accuracy = percentage of correct predictions = 70% in the table. [Table: the document-by-term matrix for ten documents (terms predict, finance, stocks, goal, score, team) with two extra columns, Class Label and Algorithm's Predictions.]
40 40 Confusion Matrix and Accuracy. [Table: rows are True Class 1 and True Class 2; columns are Predicted Class 1 and Predicted Class 2.] Accuracy = fraction of examples classified correctly = 280/400 = 70%.
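A confusion matrix and its accuracy can be computed as sketched below. The individual cell counts from the slide are not available here, so the labels in this example are hypothetical, chosen only so that 280 of 400 examples are classified correctly as on the slide; the function names are illustrative.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return [[pairs[(t, p)] for p in classes] for t in classes]

def accuracy(matrix):
    """Fraction of examples on the diagonal (classified correctly)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical data: 200 examples per class, 150 + 130 = 280 predicted correctly.
true_labels = [1] * 200 + [2] * 200
predicted_labels = [1] * 150 + [2] * 50 + [1] * 70 + [2] * 130

cm = confusion_matrix(true_labels, predicted_labels, classes=[1, 2])
acc = accuracy(cm)
print(cm, acc)
```

The off-diagonal cells show which kind of error dominates, which a single accuracy number hides.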
41 41 Training and Test Data. Classification accuracy on our training data will tend to be optimistic: the classifier can memorize the training data. Test set performance: a more accurate estimate of accuracy can be obtained by reserving some of our data as an independent/unseen/holdout test data set; train the model on the training data, then evaluate the model's true accuracy on the data it did not see (the test set). Cross-validation: we can repeat the process of splitting our data into train-test sets multiple times to get an even more robust estimate of accuracy. V-fold cross-validation: V train-test splits of the data; train V models and evaluate on V test sets. Our final accuracy estimate is the average over the V test folds.
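The V-fold procedure can be sketched as follows. This is a simplified illustration, assuming the examples have already been shuffled; `v_fold_splits` and `cross_validate` are hypothetical names, and `train_and_score` stands for any user-supplied function that trains on the training indices and returns accuracy on the test indices.

```python
def v_fold_splits(n_examples, v):
    """Yield (train_indices, test_indices) for each of the V folds.

    Every example appears in exactly one test fold.
    """
    indices = list(range(n_examples))
    for fold in range(v):
        test = indices[fold::v]          # every v-th example, offset by the fold
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test

def cross_validate(n_examples, v, train_and_score):
    """Average the test-fold scores returned by train_and_score(train, test)."""
    scores = [train_and_score(train, test)
              for train, test in v_fold_splits(n_examples, v)]
    return sum(scores) / len(scores)

splits = list(v_fold_splits(10, 5))
for train, test in splits:
    print("train:", train, "test:", test)
```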
42 42 Linear Classifiers
43 43 Notation. Data: N documents, T features (e.g., term counts), giving an N x T array of features. Variables: c is a class label, taking one of M possible values (M > 1); x is a T-dimensional vector of features for a document (e.g., x could be a binary vector indicating which terms are in the document or not); P(c | x) is the probability of class label c given x.
44 44 Linear Classifiers for 2-Class Problems. A linear classifier computes a linear weighted sum of the inputs, e.g., in 2 dimensions, with 2 classes: f = classifier output = w_0 + w_1 x_1 + w_2 x_2. Why do we need this extra constant weight?
45 45 Geometric Interpretation of a Linear Classifier. [Figure: two-class data in a two-dimensional feature space (axes Feature 1 and Feature 2), showing Decision Region 1, Decision Region 2, and the decision boundary.] Note that the decision boundary corresponds to the points where f = 0, i.e., w_0 + w_1 x_1 + w_2 x_2 = 0, which is the equation of a line in 2 dimensions. The w_0 weight allows us to have lines that have non-zero intercept, i.e., that don't need to go through the origin.
46 46 Linear Classifier with Overlapping Class Distributions. [Figure: two-class data in a two-dimensional feature space (axes Feature 1 and Feature 2), showing Decision Region 1, Decision Region 2, and the decision boundary.]
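47 47 A Linear Classifier (with 2 Features). [Diagram: inputs x_1 and x_2 with weights w_1 and w_2, plus a constant input 1 with its own weight (labeled w_3 in the diagram; it plays the role of the intercept w_0 in the formula) -> weighted sum of the inputs, f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2 -> threshold function, f > 0? -> output = class prediction, 1 or 2.]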
49 49 Linear Classifiers for 2-Class Problems. A linear classifier computes a linear weighted sum of the inputs, e.g., in 2 dimensions: f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2, and more generally in T dimensions: f(x) = f(x_1, ..., x_T) = Σ_j w_j x_j = w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T. Sidenote: this can also be written as the inner product of a weight vector and the feature vector, i.e., f(x) = Σ_j w_j x_j = w^t x, where w = (w_0, w_1, ..., w_T), w^t is the transpose of w, and x is extended with a constant x_0 = 1 so that the sum includes the intercept w_0.
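The weighted sum and the thresholding step can be sketched in a few lines of Python. The weights below are hypothetical, and mapping f > 0 to class 1 (and everything else to class 2) is an assumption made for this illustration; the slides only state that the output is class 1 or 2.

```python
def linear_score(weights, x):
    """f(x) = w_0 + sum_j w_j x_j, where weights[0] is the intercept w_0."""
    return weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))

def predict_class(weights, x):
    """Threshold the score at 0 (assumed convention: f > 0 means class 1)."""
    return 1 if linear_score(weights, x) > 0 else 2

# Hypothetical weights for a 2-feature problem: w_0 = -1, w_1 = 2, w_2 = 0.5.
weights = [-1.0, 2.0, 0.5]

print(linear_score(weights, [1, 0]), predict_class(weights, [1, 0]))
print(linear_score(weights, [0, 1]), predict_class(weights, [0, 1]))
```

Without the intercept w_0, the all-zero feature vector would always score exactly 0, so the decision boundary would be forced through the origin.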
50 50 Linear Classifiers for Text Documents. Linear classifiers use a weighted sum of the inputs. With T features we have T + 1 weights (one per feature plus one "intercept"). Examples of Linear Classifiers: Linear Classifier (Perceptron); Logistic Regression <- widely used in practice, is what we will focus on; Naïve Bayes (it is effectively linear); Support Vector Machines. Example of a Non-Linear Classifier: Neural Networks <- will be reviewed in future lectures. For further discussion of linear and non-linear classifiers see:
51 51 Possible Weights for a Linear Classifier with Documents. [Table: the document-by-term matrix for ten documents (terms predict, finance, stocks, goal, score, team, plus a Class Label column), with a Weight entry for each term.]
52 52 Logistic Regression Classification
55 55 Getting Class Probabilities. Estimates of class probabilities P(c | x) are very useful in practice, e.g., for ranking documents to show to a human user. Assume for simplicity we have a 2-class binary classification problem. Say we tried to get a probability of a class with a linear model: P(c | x) = f(x) = w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T. There is a problem: f(x) could be negative, could be > 1, etc.
56 56 A Better Approach. P(c | x) = f(x) = g(w_0 + w_1 x_1 + w_2 x_2 + ... + w_T x_T), where g(z) = 1 / [1 + e^(-z)]. As z -> positive infinity, g(z) -> 1, P(class) -> 1. As z -> negative infinity, g(z) -> 0, P(class) -> 0. This is the logistic regression model. In effect: a linear (weighted sum) model where the sum is transformed to lie between 0 and 1, and we can interpret f(x) directly as a probability between 0 and 1.
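The logistic function and the resulting class probability can be sketched as below. The two-branch form is an implementation choice made here to avoid overflow for large negative z; it computes the same g(z) = 1 / (1 + e^(-z)). The function names and the weights in the example are hypothetical.

```python
import math

def logistic(z):
    """g(z) = 1 / (1 + e^(-z)), written to avoid overflow for very negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)               # safe: z < 0, so exp(z) is at most 1
    return ez / (1.0 + ez)

def class_probability(weights, x):
    """P(c | x) = g(w_0 + sum_j w_j x_j), where weights[0] is the intercept."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return logistic(z)

for z in (-10, -1, 0, 1, 10):
    print(z, round(logistic(z), 4))
```

Thresholding this probability at 0.5 gives the same decision as thresholding the weighted sum at 0, so the classifier is still linear.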
57 57 What does the Logistic Function look like? [Figure: shape of the logistic function g(z), plotted against z = weighted sum = Σ_j w_j x_j.] As z -> positive infinity, g(z) -> 1, P(class) -> 1. As z -> negative infinity, g(z) -> 0, P(class) -> 0.
58 58 Logistic Regression as a Neural Network. Logistic regression can be viewed as a simple artificial "neuron". Each edge in the network has an associated weight or parameter, w_j. [Diagram: inputs x_1, x_2, x_3 and a constant +1 feed a single unit with output f(x).]
59 59 A Neural Network with 1 Hidden Layer. Here the model learns 3 different logistic functions, each one a "hidden unit", and then combines the outputs of the 3 to make a prediction. [Diagram: inputs x_1, x_2, x_3 and a constant +1 feed the hidden units, which feed the output f(x).] This model is representationally more powerful than a single logistic function, but has many more parameters (can overfit unless we are careful). The model can be trained using gradient methods, but local minima are a problem.
60 60 Deep Learning: Models with 2 or More Hidden Layers. We can build on this idea to create "deep models" with many hidden layers. [Diagram: inputs x_1, x_2, x_3 and a constant +1 feed several hidden layers, which feed the output f(x).] The model f(x) is now a very flexible, highly non-linear function. Significant current interest in deep learning (e.g., 5, 10, 20 layers).
61 61 Explaining Decisions by an AI Algorithm
62 62 Explaining an Algorithm's Decisions. Generating human-interpretable explanations of decisions made by AI systems is very important to human users of these systems, e.g., in autonomous driving, medical diagnosis, product recommendations, and so on. For linear classifiers, where we have 1 weight per input, this is straightforward: for each class, look at the most positive weights and most negative weights. This tells us which features/terms (if present) have the most impact (Assignment 2). For documents, note that some terms might be rare, so we could measure how much impact they have on average, rather than when they are present. We can also tell the user which terms in a particular document contributed most to a decision. For non-linear classifiers (such as neural networks), explaining decisions is much more complicated.
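For a linear classifier, both kinds of explanation reduce to sorting. The sketch below assumes a dictionary of per-term weights from an already-trained 2-class linear model; the weight values, the term counts, and the function names `top_terms` and `document_contributions` are all hypothetical.

```python
def top_terms(weights_by_term, k=3):
    """Return the k most positive and k most negative (term, weight) pairs."""
    ranked = sorted(weights_by_term.items(), key=lambda kv: kv[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_positive, most_negative

def document_contributions(weights_by_term, term_counts):
    """Per-term contribution w_j * x_j for one document, largest magnitude first."""
    contrib = {t: weights_by_term[t] * c
               for t, c in term_counts.items() if t in weights_by_term}
    return sorted(contrib.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical weights: positive values favor "sports", negative favor "finance".
weights = {"goal": 1.8, "team": 1.2, "score": 0.9,
           "predict": -0.2, "stocks": -1.5, "finance": -2.0}

positive, negative = top_terms(weights, k=2)
contributions = document_contributions(weights, {"team": 2, "stocks": 1, "the": 5})
print(positive, negative, contributions)
```

Averaging the per-document contributions of a term over a corpus gives the "impact on average" view mentioned above, which discounts terms that carry a large weight but rarely occur.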
68 68 Course schedule (Week: Monday / Wednesday)
Jan 8: Lecture: Introduction and course outline / Lecture: Basic concepts in text analysis
Jan 15: No class (university holiday) / Lecture: Text classification, part 1; Assignment 1 due, 5pm
Jan 22 and Jan 29: Text classification, part 2; Lecture: Neural networks for text, part 2; Lecture: Neural networks for text, part 1; Assignment 2 due, 5pm; Lecture: Neural networks for text, part 2; Project proposal due, Friday 6pm
Feb 5: Office hours (no lecture) / Lecture: Algorithm evaluation methods
Feb 12: Office hours (no lecture) / Lecture: Unsupervised learning algorithms
Feb 19: No class (university holiday) / Lecture: Discussion of progress reports; Progress report due, Friday 6pm
Feb 26: Office hours (no lecture) / Office hours (no lecture)
Mar 5: Office hours (no lecture) / Lecture: Discussion of final reports
Mar 12 and Mar 19: Project Presentations (in class), upload slides by 4pm; Final project reports due (day/time TBD); Project Presentations (in class), upload slides by 4pm
69 69 Example (in Python) of Classifying Yelp Reviews (code from Dimitris Kotzias, PhD student, Computer Science Department, UCI)
71 71 Real Example from Yelp Data. Simple pipeline for classification of Yelp reviews: extract the restaurant reviews; convert them to a tf*idf array; split the data into training and testing sets; train on the training data, and test.
74 74 Real Example from Yelp Data Number of restaurants: 14,308 A total of 706,693 reviews
75 75 Real Example from Yelp Data data shape: (595468, )
76 76 Real Example from Yelp Data training size: testing size:
77 77 Real Example from Yelp Data Training: acc: Testing: acc: auc: Overall takes about mins to run (may produce some warnings)
78 78 Other Aspects of Document Classification
79 Examples of Labels/Categories/Classes. Labels for documents or web pages. Labels are often general categories, e.g., for news articles: "finance," "sports," "news>world>asia>business"; e.g., for biomedical articles: "gene expression," "microarray," "lung cancer". Labels may be genres: "editorials," "movie-reviews," "news". Labels may be opinion on a person/product: "like," "hate," "neutral". Labels may be domain-specific: "interesting-to-me" : "not-interesting-to-me"; "contains adult language" : "doesn't"; language identification: English, French, Chinese; "link spam" : "not link spam".
80 80 Where do Document Labels come from? Manually assigned (expensive): predefined dictionary of labels; human labelers read all or part of the article and assign the most likely label. Who are the labelers? Domain experts; librarians/editors (e.g., for the New York Times); low-paid labelers, e.g., via Amazon Mechanical Turk. This is a subjective process: even domain experts will disagree on some labels, and in many cases there is no absolute right or wrong labeling. Semi-automated process: e.g., domain experts define selected keywords for each label; keyword matching is used to return the documents with the most keyword matches for each label; experts then label these returned documents; a classifier is trained on these labeled documents.
81 81 Other Aspects of Document Labels. Large numbers of label values: many applications have a very large number of possible class labels (thousands). Distribution of labels is often highly skewed: some labels are very common, other labels very rare. Multi-Label versus Single-Label documents: Multi-Label: each document can have multiple labels; Single-Label: each document is assigned a single label. The multi-label problem is more complex to handle, e.g., the model needs to decide how many labels to assign to each document (we will assume single-label for now, return to multi-label later). Hierarchical labels: it is common in real-world applications that labels are related hierarchically in a tree, e.g., "news>world>asia>business". Classifiers that use this hierarchy will generally perform better than classifiers that ignore it.
82 82 Feature Selection Performance of text classification algorithms can often be improved by selecting only a subset of the terms Greedy search Start from empty set or full set and add/delete one at a time Heuristics for adding/deleting Information gain (mutual information of term with class) Chi-square Other ideas Methods tend not to be particularly sensitive to the specific heuristic used for feature selection, but some form of feature selection often improves performance
83 83 Feature Selection using Mutual Information. Average mutual information between (a) C, the class label, and (b) f_t, the presence or absence of a term in a document (from McCallum and Nigam, 1998), where c is the class and f_t indicates the presence or absence of term t. Typical approach: compute for all terms, include the top K terms in the classifier, and optimize the value of K via cross-validation (next lecture).
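The formula on this slide did not survive transcription. The sketch below uses the standard definition of mutual information between the class and the term indicator, I(C; f_t) = Σ_c Σ_{f_t in {0,1}} P(c, f_t) log [ P(c, f_t) / (P(c) P(f_t)) ], which is assumed here to be the quantity the slide refers to. Probabilities are estimated by simple counting over documents; the function name, the base-2 log, and the toy data are illustrative choices.

```python
import math

def mutual_information(docs_terms, labels, term):
    """Estimate I(C; f_t) in bits, where f_t = 1 if `term` occurs in the document.

    docs_terms: list of sets of terms, one set per document.
    labels: list of class labels, aligned with docs_terms.
    """
    n = len(labels)
    joint = {}                                   # counts of (class, f_t) pairs
    for terms, c in zip(docs_terms, labels):
        f = 1 if term in terms else 0
        joint[(c, f)] = joint.get((c, f), 0) + 1

    p_c, p_f = {}, {}                            # marginal probabilities
    for (c, f), count in joint.items():
        p_c[c] = p_c.get(c, 0.0) + count / n
        p_f[f] = p_f.get(f, 0.0) + count / n

    mi = 0.0
    for (c, f), count in joint.items():          # zero-count cells contribute 0
        p_cf = count / n
        mi += p_cf * math.log2(p_cf / (p_c[c] * p_f[f]))
    return mi

# Toy corpus: "goal" separates the classes perfectly, "team" not at all.
docs_terms = [{"goal", "team"}, {"goal", "score"},
              {"stocks", "finance"}, {"stocks", "team"}]
labels = ["sports", "sports", "finance", "finance"]

mi_goal = mutual_information(docs_terms, labels, "goal")
mi_team = mutual_information(docs_terms, labels, "team")
print(mi_goal, mi_team)
```

Ranking all vocabulary terms by this score and keeping the top K gives the selection procedure described on the slide.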
84 84 Generating Multi-Word Terms. Consider multi-word terms like "New York": we would rather treat this as one term, "New York", than as "New" and "York". We can extend our vocabulary to include multi-word terms (or ngrams): ngrams with n=1,2,3,4, e.g., "University of California Irvine" (n=4). Finding candidate n-grams: the space of possible multi-word combinations is huge; W word tokens give W^2 bigrams, W^3 trigrams, etc. (W is of order 10^5). General approach: select ngrams that occur frequently; keep track of all k-frequent ngrams in the corpus (e.g., k=10); use feature selection (e.g., mutual information) to select the best. Can also use other filters to find good terms, e.g., use a parser to automatically extract noun phrases: "The big dog jumped over the lazy brown cat".
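Counting only the n-grams that actually occur avoids enumerating the W^2 or W^3 possible combinations. The sketch below is a minimal illustration that interprets "k-frequent" as occurring at least k times in the corpus, which is an assumption; the function names and the toy documents are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_ngrams(tokenized_docs, max_n=2, k=2):
    """Count n-grams for n = 1..max_n and keep those occurring at least k times."""
    counts = Counter()
    for tokens in tokenized_docs:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= k}

tokenized_docs = [["new", "york", "city"],
                  ["new", "york", "times"],
                  ["york", "university"]]

frequent = frequent_ngrams(tokenized_docs, max_n=2, k=2)
print(frequent)
```

The surviving n-grams are only candidates; a feature-selection score such as mutual information would then pick which ones to add to the vocabulary.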
85 85 Sentiment Lexicons. Basic analysis of text: overall count and percentage of words in various categories. Example lexicons: General Inquirer: words categorized according to Positive/Negative, Strong vs Weak, Active vs Passive, etc. SentiWordNet: synsets in WordNet 3.0 annotated for degrees of positivity, negativity, and neutrality/objectiveness. Linguistic Inquiry and Word Count (LIWC): see next slide.
86 86 LIWC (Linguistic Inquiry and Word Count). Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC. Austin, TX. Words grouped into >70 classes. Affective Processes: negative emotion (bad, weird, hate, problem, tough); positive emotion (love, nice, sweet). Cognitive Processes: Tentative (maybe, perhaps, guess); Inhibition (block, constraint). Pronouns; Negation (no, never); Quantifiers (few, many).
87 87 LIWC Word Categories. Column 1: Pronouns, 1st person singular, 1st person plural, 2nd person, Articles, Past tense verbs, Present tense verbs, Future tense verbs, Prepositions, Negations, Numbers, Swear words, Social words, Family, Friends, Humans. Column 2: Affect, Positive emotions, Negative emotions, Anxiety, Anger, Sadness, Cognitive mechanisms, Insight, Causal, Discrepancy, Tentative, Certainty, Inhibition, Inclusive, Exclusive, Seeing. Column 3: Hearing, Feeling, Body, Sexual, Motion, Space, Time, Occupation, Achievement, Leisure, Home, Money, Religion, Death, Assent, Nonfluencies.
88 88 Pros and Cons of Dictionary Approaches such as LIWC. Pros: effective method for studying the various emotional, cognitive, structural and process components present in individuals' verbal and written speech; easy to use. Cons: sentiment lexicons are fixed in the number of categories and the words in each category; word context is often ignored; not domain specific.
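The basic dictionary analysis (count and percentage of words per category) can be sketched as below. The mini-lexicon reuses only the example words quoted on the LIWC slide and is not the real LIWC dictionary; the tokenizer and the function name `category_percentages` are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon built from the example words on the LIWC slide.
LEXICON = {
    "negative emotion": {"bad", "weird", "hate", "problem", "tough"},
    "positive emotion": {"love", "nice", "sweet"},
    "tentative": {"maybe", "perhaps", "guess"},
}

def category_percentages(text, lexicon=LEXICON):
    """Percentage of tokens in the text that fall into each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in lexicon}

review = "I love this place, maybe too sweet but the service was bad"
percentages = category_percentages(review)
print(percentages)
```

Because each word is looked up in isolation, a phrase such as "not bad" would still count toward negative emotion, which is the "word context is often ignored" limitation listed above.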
DM-Group Meeting Subhodip Biswas 10/16/2014 Papers to be discussed 1. Crowdsourcing Land Use Maps via Twitter Vanessa Frias-Martinez and Enrique Frias-Martinez in KDD 2014 2. Tracking Climate Change Opinions
More informationSpatial Role Labeling CS365 Course Project
Spatial Role Labeling CS365 Course Project Amit Kumar, akkumar@iitk.ac.in Chandra Sekhar, gchandra@iitk.ac.in Supervisor : Dr.Amitabha Mukerjee ABSTRACT In natural language processing one of the important
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationMIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,
MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October, 23 2013 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationBehavioral Data Mining. Lecture 2
Behavioral Data Mining Lecture 2 Autonomy Corp Bayes Theorem Bayes Theorem P(A B) = probability of A given that B is true. P(A B) = P(B A)P(A) P(B) In practice we are most interested in dealing with events
More informationCS 188: Artificial Intelligence Spring Announcements
CS 188: Artificial Intelligence Spring 2010 Lecture 22: Nearest Neighbors, Kernels 4/18/2011 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!)
More informationDepartment of Computer Science and Engineering Indian Institute of Technology, Kanpur. Spatial Role Labeling
Department of Computer Science and Engineering Indian Institute of Technology, Kanpur CS 365 Artificial Intelligence Project Report Spatial Role Labeling Submitted by Satvik Gupta (12633) and Garvit Pahal
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationHomework 4, Part B: Structured perceptron
Homework 4, Part B: Structured perceptron CS 585, UMass Amherst, Fall 2016 Overview Due Friday, Oct 28. Get starter code/data from the course website s schedule page. You should submit a zipped directory
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationMachine Learning. Boris
Machine Learning Boris Nadion boris@astrails.com @borisnadion @borisnadion boris@astrails.com astrails http://astrails.com awesome web and mobile apps since 2005 terms AI (artificial intelligence)
More informationIN FALL NATURAL LANGUAGE PROCESSING. Jan Tore Lønning
1 IN4080 2018 FALL NATURAL LANGUAGE PROCESSING Jan Tore Lønning 2 Logistic regression Lecture 8, 26 Sept Today 3 Recap: HMM-tagging Generative and discriminative classifiers Linear classifiers Logistic
More informationDeep Learning for NLP
Deep Learning for NLP Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Greg Durrett Outline Motivation for neural networks Feedforward neural networks Applying feedforward neural networks
More informationPattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lesson 1 5 October 2016 Learning and Evaluation of Pattern Recognition Processes Outline Notation...2 1. The
More informationDISTRIBUTIONAL SEMANTICS
COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationArticle from. Predictive Analytics and Futurism. July 2016 Issue 13
Article from Predictive Analytics and Futurism July 2016 Issue 13 Regression and Classification: A Deeper Look By Jeff Heaton Classification and regression are the two most common forms of models fitted
More informationErrors, and What to Do. CS 188: Artificial Intelligence Fall What to Do About Errors. Later On. Some (Simplified) Biology
CS 188: Artificial Intelligence Fall 2011 Lecture 22: Perceptrons and More! 11/15/2011 Dan Klein UC Berkeley Errors, and What to Do Examples of errors Dear GlobalSCAPE Customer, GlobalSCAPE has partnered
More informationINTRODUCTION TO DATA SCIENCE
INTRODUCTION TO DATA SCIENCE JOHN P DICKERSON Lecture #13 3/9/2017 CMSC320 Tuesdays & Thursdays 3:30pm 4:45pm ANNOUNCEMENTS Mini-Project #1 is due Saturday night (3/11): Seems like people are able to do
More informationCS 188: Artificial Intelligence Fall 2011
CS 188: Artificial Intelligence Fall 2011 Lecture 22: Perceptrons and More! 11/15/2011 Dan Klein UC Berkeley Errors, and What to Do Examples of errors Dear GlobalSCAPE Customer, GlobalSCAPE has partnered
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationCPSC 340: Machine Learning and Data Mining
CPSC 340: Machine Learning and Data Mining Linear Classifiers: multi-class Original version of these slides by Mark Schmidt, with modifications by Mike Gelbart. 1 Admin Assignment 4: Due in a week Midterm:
More informationWelcome to MAT 137! Course website:
Welcome to MAT 137! Course website: http://uoft.me/ Read the course outline Office hours to be posted here Online forum: Piazza Precalculus review: http://uoft.me/precalc If you haven t gotten an email
More informationFeature Engineering, Model Evaluations
Feature Engineering, Model Evaluations Giri Iyengar Cornell University gi43@cornell.edu Feb 5, 2018 Giri Iyengar (Cornell Tech) Feature Engineering Feb 5, 2018 1 / 35 Overview 1 ETL 2 Feature Engineering
More informationReal Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report
Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More informationStat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,
1 Stat 406: Algorithms for classification and prediction Lecture 1: Introduction Kevin Murphy Mon 7 January, 2008 1 1 Slides last updated on January 7, 2008 Outline 2 Administrivia Some basic definitions.
More informationCS6375: Machine Learning Gautam Kunapuli. Decision Trees
Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationBehavioral Data Mining. Lecture 7 Linear and Logistic Regression
Behavioral Data Mining Lecture 7 Linear and Logistic Regression Outline Linear Regression Regularization Logistic Regression Stochastic Gradient Fast Stochastic Methods Performance tips Linear Regression
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More informationLecture 3: Probabilistic Retrieval Models
Probabilistic Retrieval Models Information Retrieval and Web Search Engines Lecture 3: Probabilistic Retrieval Models November 5 th, 2013 Wolf-Tilo Balke and Kinda El Maarry Institut für Informationssysteme
More informationNatural Language Processing
Natural Language Processing Info 59/259 Lecture 4: Text classification 3 (Sept 5, 207) David Bamman, UC Berkeley . https://www.forbes.com/sites/kevinmurnane/206/04/0/what-is-deep-learning-and-how-is-it-useful
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification. Today. Classification overview. Case-Based Reasoning
CS 188: Artificial Intelligence Spring 21 Lecture 22: Nearest Neighbors, Kernels 4/18/211 Pieter Abbeel UC Berkeley Slides adapted from Dan Klein Announcements On-going: contest (optional and FUN!) Remaining
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Perceptrons Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188
More informationLogistic Regression. Some slides adapted from Dan Jurfasky and Brendan O Connor
Logistic Regression Some slides adapted from Dan Jurfasky and Brendan O Connor Naïve Bayes Recap Bag of words (order independent) Features are assumed independent given class P (x 1,...,x n c) =P (x 1
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element
More informationNEURAL LANGUAGE MODELS
COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS LANGUAGE MODELS Assign a probability to a sequence of words Framed as sliding a window over the sentence, predicting each word from finite context to left E.g.,
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationPart of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015
Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch COMP-599 Oct 1, 2015 Announcements Research skills workshop today 3pm-4:30pm Schulich Library room 313 Start thinking about
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More information(COM4513/6513) Week 1. Nikolaos Aletras ( Department of Computer Science University of Sheffield
Natural Language Processing (COM4513/6513) Week 1 Part II: Text classification with the perceptron Nikolaos Aletras (http://www.nikosaletras.com) n.aletras@sheffield.ac.uk Department of Computer Science
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationNatural Language Processing with Deep Learning CS224N/Ling284
Natural Language Processing with Deep Learning CS224N/Ling284 Lecture 4: Word Window Classification and Neural Networks Richard Socher Organization Main midterm: Feb 13 Alternative midterm: Friday Feb
More informationCOMS F18 Homework 3 (due October 29, 2018)
COMS 477-2 F8 Homework 3 (due October 29, 208) Instructions Submit your write-up on Gradescope as a neatly typeset (not scanned nor handwritten) PDF document by :59 PM of the due date. On Gradescope, be
More informationGenerative Learning. INFO-4604, Applied Machine Learning University of Colorado Boulder. November 29, 2018 Prof. Michael Paul
Generative Learning INFO-4604, Applied Machine Learning University of Colorado Boulder November 29, 2018 Prof. Michael Paul Generative vs Discriminative The classification algorithms we have seen so far
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationModern Information Retrieval
Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction
More informationSupport Vector Machine & Its Applications
Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia
More informationNatural Language Processing SoSe Words and Language Model
Natural Language Processing SoSe 2016 Words and Language Model Dr. Mariana Neves May 2nd, 2016 Outline 2 Words Language Model Outline 3 Words Language Model Tokenization Separation of words in a sentence
More informationThe Changing Landscape of Land Administration
The Changing Landscape of Land Administration B r e n t J o n e s P E, PLS E s r i World s Largest Media Company No Journalists No Content Producers No Photographers World s Largest Hospitality Company
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Guy Lebanon Binary Classification Binary classification is the most basic task in machine learning, and yet the most frequent. Binary classifiers often serve as the
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 4: Probabilistic Retrieval Models April 29, 2010 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig
More informationLinear Classification and SVM. Dr. Xin Zhang
Linear Classification and SVM Dr. Xin Zhang Email: eexinzhang@scut.edu.cn What is linear classification? Classification is intrinsically non-linear It puts non-identical things in the same class, so a
More informationBoolean and Vector Space Retrieval Models CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK).
Boolean and Vector Space Retrieval Models 2013 CS 290N Some of slides from R. Mooney (UTexas), J. Ghosh (UT ECE), D. Lee (USTHK). 1 Table of Content Boolean model Statistical vector space model Retrieval
More informationN-gram Language Modeling
N-gram Language Modeling Outline: Statistical Language Model (LM) Intro General N-gram models Basic (non-parametric) n-grams Class LMs Mixtures Part I: Statistical Language Model (LM) Intro What is a statistical
More informationMachine Learning Basics Lecture 3: Perceptron. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 3: Perceptron Princeton University COS 495 Instructor: Yingyu Liang Perceptron Overview Previous lectures: (Principle for loss function) MLE to derive loss Example: linear
More informationCourse Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch
Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure
More informationClassification & Information Theory Lecture #8
Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing
More informationN-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24
L545 Dept. of Linguistics, Indiana University Spring 2013 1 / 24 Morphosyntax We just finished talking about morphology (cf. words) And pretty soon we re going to discuss syntax (cf. sentences) In between,
More informationUnderstanding Comments Submitted to FCC on Net Neutrality. Kevin (Junhui) Mao, Jing Xia, Dennis (Woncheol) Jeong December 12, 2014
Understanding Comments Submitted to FCC on Net Neutrality Kevin (Junhui) Mao, Jing Xia, Dennis (Woncheol) Jeong December 12, 2014 Abstract We aim to understand and summarize themes in the 1.65 million
More informationIntroduction to Machine Learning
Introduction to Machine Learning Reading for today: R&N 18.1-18.4 Next lecture: R&N 18.6-18.12, 20.1-20.3.2 Outline The importance of a good representation Different types of learning problems Different
More information