Regularization. Introduction to Machine Learning. Matt Gormley, Lecture 10, Feb. 19, 2018
1 10-601 Introduction to Machine Learning. Machine Learning Department, School of Computer Science, Carnegie Mellon University. Regularization. Matt Gormley, Lecture 10, Feb. 19, 2018
2 Reminders. Homework 4 (Logistic Regression): out Wed, Feb 14; due Fri, Feb 23 at 11:59pm. New reading on Probabilistic Learning (reused later in the course for MLE/MAP).
3 FEATURE ENGINEERING
4 Handcrafted Features. (Figure: a hand-built feature function f for relation extraction, pairing the born-in relation between a LOC and PER entity with a syntactic parse (S, NP, VP, ADJP, NNP, VBN, VBD tags) of "Egypt-born Proyas directed", used in a model of the form p(y|x) ∝ exp(θᵀ f(x, y)).)
5 Where do features come from? One end of the spectrum is feature engineering with hand-crafted features (Sun et al., 2011; Zhou et al., 2005), e.g.: first word before M1, second word before M1, bag-of-words in M1, head word of M1, other word in between, first word after M2, second word after M2, bag-of-words in M2, head word of M2, bigrams in between, words on dependency path, country name list, personal relative triggers, personal title list, WordNet tags, heads of chunks in between, path of phrase labels, combination of entity types. The other end is feature learning.
6 Where do features come from? Feature learning: word embeddings (Mikolov et al., 2013). A look-up table maps each word to an embedding, and similar words get similar embeddings (e.g. cat and dog). The CBOW model in Mikolov et al. (2013) trains a classifier to predict a missing word from its context words: unsupervised learning.
7 Where do features come from? Feature learning: string embeddings, via Convolutional Neural Networks (Collobert & Weston, 2008), which pool over the word embeddings of a sentence like "The [movie] showed [wars]", and via the Recursive Auto Encoder (Socher, 2011).
8 Where do features come from? Feature learning: tree embeddings (Socher et al., 2013; Hermann & Blunsom, 2013), which compose embeddings up a parse of "The [movie] showed [wars]" using parameter matrices such as W_{NP,VP}, W_{DT,NN}, W_{V,NN}.
9 Where do features come from? Word embedding features sit between the two ends: hand-crafted feature templates over learned word embeddings (Turian et al., 2010; Koo et al., 2008; Hermann et al., 2014).
10 Where do features come from? Word embedding features (Turian et al., 2010; Koo et al., 2008; Hermann et al., 2014): the best of both worlds?
11 Feature Engineering for NLP. Suppose you build a logistic regression model to predict a part-of-speech (POS) tag for each word in a sentence. What features should you use? Example: The/deter. movie/noun I/noun watched/verb depicted/verb hope/noun
12 Feature Engineering for NLP. Per-word features: is-capital(w_i), endswith(w_i, "-e"), endswith(w_i, "-d"), endswith(w_i, "-ed"), w_i == "aardvark", w_i == "hope". Applied at each position x(1), ..., x(6) of "The movie I watched depicted hope".
13 Feature Engineering for NLP. Context features: w_i == "watched", w_{i+1} == "watched", w_{i-1} == "watched", w_{i+2} == "watched", w_{i-2} == "watched".
14 Feature Engineering for NLP. Context features: w_i == "I", w_{i+1} == "I", w_{i-1} == "I", w_{i+2} == "I", w_{i-2} == "I". (A code sketch of these templates follows.)
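A minimal sketch of how the per-word and context templates from slides 12-14 could be realized in code; the helper name and feature-naming scheme are illustrative, not from the lecture:

```python
def pos_features(words, i):
    """Binary feature map for predicting the POS tag of words[i]."""
    w = words[i]
    feats = {
        "is-capital": w[0].isupper(),
        "endswith(-e)": w.endswith("e"),
        "endswith(-d)": w.endswith("d"),
        "endswith(-ed)": w.endswith("ed"),
        "w_i==" + w: True,
    }
    # Context templates: identity of neighboring words in a +/-2 window.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        if 0 <= j < len(words):
            feats["w_i%+d==%s" % (offset, words[j])] = True
    # Keep only the indicator features that fired.
    return {name: 1.0 for name, fired in feats.items() if fired}

print(pos_features("The movie I watched depicted hope".split(), 3))
```

A linear model such as logistic regression then learns one weight per (feature, tag) pair over these sparse indicators.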
15 Feature Engineering for NLP. Table from Manning (2011), Table 3: tagging accuracies with different feature templates and other changes on the WSJ development set.

Model        | Feature Templates               | Token Acc. | Unk. Acc.
3gramMemm    | see text                        | 96.92%     | 88.99%
naacl 2003   | see text and [1]                | 97.15%     | 88.61%
Replication  | see text and [1]                | 97.18%     | 88.92%
Replication  | + rareFeatureThresh = 5         | 97.19%     | 88.96%
5w           | + ⟨t, w−2⟩, ⟨t, w2⟩             | 97.20%     | 89.30%
5wShapes     | + ⟨t, s−1⟩, ⟨t, s0⟩, ⟨t, s1⟩    | 97.25%     | 89.81%
5wShapesDS   | + distributional similarity     | 97.28%     | 90.46%

Example sentence: The/deter. movie/noun I/noun watched/verb depicted/verb hope/noun
16 Feature Engineering for CV. Edge detection (Canny); corner detection (Harris).
17 Feature Engineering for CV. Scale Invariant Feature Transform (SIFT). Figures from Lowe (1999) and Lowe (2004).
18 NON-LINEAR FEATURES
19 Nonlinear Features (aka. nonlinear basis functions). So far, the input was always the original x. Key idea: let the input be some function of x. Original input: x; new input: b(x), where we define b(x) = [b_1(x), ..., b_{M'}(x)]. Example (for M = 1): a polynomial basis, b_j(x) = x^j. For a linear model, the output is still a linear function of b(x), even though it is a nonlinear function of x. This applies to the Perceptron, linear regression, and logistic regression. (A code sketch follows.)
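A sketch of this idea in code, assuming NumPy: expand a scalar input through a polynomial basis, then fit an ordinary linear model on the expanded features.

```python
import numpy as np

def poly_basis(x, M):
    """Map scalar inputs x to polynomial features b(x) = [x, x^2, ..., x^M]."""
    return np.vander(x, N=M + 1, increasing=True)[:, 1:]  # drop the constant column

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=20)
y = np.tanh(x) + 0.1 * rng.standard_normal(20)  # noisy tanh target, as in the next slides

# The model y = w^T b(x) + b is still linear in the parameters,
# even though it is a nonlinear function of x.
B = np.column_stack([poly_basis(x, M=3), np.ones_like(x)])  # last column = intercept
w = np.linalg.lstsq(B, y, rcond=None)[0]
print("learned weights (w_1..w_3, b):", w)
```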
20-29 Example: Linear Regression. Goal: learn y = wᵀ f(x) + b, where f(·) is a polynomial basis function; the true unknown target function is y = tanh(x) + noise. (A sequence of slides showing successive fits to the same data.)
30-36 Example: Linear Regression. Goal: learn y = wᵀ f(x) + b, where f(·) is a polynomial basis function; the true unknown target function is linear with negative slope and Gaussian noise. (A sequence of slides showing successive fits to the same data.)
37 Over-fitting. Root-Mean-Square (RMS) Error. Slide courtesy of William Cohen.
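The RMS error here is presumably the standard form (these slides derive from Bishop via Cohen), where E(w*) is the sum-of-squares training error at the fitted weights w*:

```latex
E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^{*}) / N},
\qquad
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big( y(x_n, \mathbf{w}) - t_n \big)^2
```

Dividing by N makes fits comparable across datasets of different sizes, and the square root puts the error back on the scale of the targets.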
38 Polynomial Coefficients. Slide courtesy of William Cohen.
39 Example: Linear Regression. Goal: learn y = wᵀ f(x) + b, where f(·) is a polynomial basis function; the true unknown target function is linear with negative slope and Gaussian noise.
40 Example: Linear Regression. Same as before, but now with N = 100 points.
41 REGULARIZATION
42 Overfitting. Definition: overfitting occurs when the model captures the noise in the training data instead of the underlying structure. Overfitting can occur in all the models we've seen so far: KNN (e.g. when k is small), Naïve Bayes (e.g. without a prior), linear regression (e.g. with nonlinear basis functions), logistic regression (e.g. with many rare features).
43 Motivation: Regularization. Example: stock prices. Suppose we wish to predict Google's stock price at time t+1. What features should we use (putting all computational concerns aside)? Stock prices of all other stocks at times t, t−1, t−2, ..., t−k; mentions of Google with positive/negative sentiment words in all newspapers and social media outlets. Do we believe that all of these features are going to be useful?
44 Motivation: Regularization. Occam's Razor: prefer the simplest hypothesis. What does it mean for a hypothesis (or model) to be simple? 1. A small number of features (model selection). 2. A small number of important features (shrinkage).
45 Chalkboard: Regularization; L2, L1, L0 regularization; example: linear regression.
46 Regularization. Don't regularize the bias (intercept) parameter! In our models so far, the bias/intercept parameter is usually denoted by θ₀, that is, the parameter for which we fixed x₀ = 1. Regularizers always avoid penalizing this bias/intercept parameter. Why? Because otherwise the learning algorithm wouldn't be invariant to a shift in the y-values. Whitening data: it's common to whiten each feature by subtracting its mean and dividing by its standard deviation. For regularization, this helps all the features be penalized in the same units (e.g. convert both centimeters and kilometers to z-scores). (A sketch of both points in code follows.)
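A minimal sketch, assuming NumPy: the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy with the bias entry of the penalty matrix zeroed out, plus z-score whitening. Function names are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares.

    Assumes the first column of X is all ones (the intercept); the
    corresponding entry of the penalty matrix is zeroed so the bias
    parameter theta_0 is not regularized.
    """
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # do not penalize the bias
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def whiten(X):
    """Z-score each feature so the penalty acts in comparable units."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```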
47 Regularization. Slide courtesy of William Cohen.
48 Polynomial Coefficients, under regularization strength λ of none, exp(−18), and huge. Slide courtesy of William Cohen.
49 Over-Regularization. Slide courtesy of William Cohen.
50 Regularization Exercise (in-class exercise). 1. Plot train error vs. # features (cartoon). 2. Plot test error vs. # features (cartoon). (Axes: error vs. # features.)
51 Example: Logistic Regression (training data)
52 Example: Logistic Regression (test data)
53 Example: Logistic Regression (plot of error vs. 1/λ)
54-66 Example: Logistic Regression. (A sequence of plots of the learned model as the regularization strength varies.)
67 Example: Logistic Regression (plot of error vs. 1/λ; see the sweep sketch below).
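A sketch of how such a curve could be produced, assuming scikit-learn, whose C parameter is the inverse regularization strength and so matches the 1/λ axis; the dataset here is synthetic, not the one from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# C = 1/lambda: small C = strong regularization, large C = weak.
for C in [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_tr, y_tr)
    print("1/lambda=%7g  train err=%.3f  test err=%.3f"
          % (C, 1 - clf.score(X_tr, y_tr), 1 - clf.score(X_te, y_te)))
```

Train error typically falls as 1/λ grows, while test error is U-shaped; the minimum of the test curve picks out the best regularization strength.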
68 Regularization as MAP. L1 and L2 regularization can be interpreted as maximum a posteriori (MAP) estimation of the parameters, to be discussed later in the course.
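One standard form of the correspondence, made precise later in the course: with a Gaussian prior θ_m ~ N(0, σ²) on each parameter and λ = 1/(2σ²),

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\; \log p(\mathcal{D} \mid \theta) + \log p(\theta)
  = \arg\min_{\theta}\; \underbrace{-\log p(\mathcal{D} \mid \theta)}_{J(\theta)}
    + \lambda \lVert \theta \rVert_2^2
```

A Laplace prior yields the L1 penalty in the same way.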
69 Takeaways. 1. Nonlinear basis functions allow linear models (e.g. linear regression, logistic regression) to capture nonlinear aspects of the original input. 2. Nonlinear features require no changes to the model (i.e. just preprocessing). 3. Regularization helps to avoid overfitting. 4. Regularization and MAP estimation are equivalent for appropriately chosen priors.
70 Feature Engineering / Regularization Objectives. You should be able to: engineer appropriate features for a new task; use feature selection techniques to identify and remove irrelevant features; identify when a model is overfitting; add a regularizer to an existing objective in order to combat overfitting; explain why we should not regularize the bias term; convert a linearly inseparable dataset to a linearly separable dataset in higher dimensions; describe feature engineering in common application areas.
71 Neural Networks Outline. Logistic Regression (Recap): data, model, learning, prediction. Neural Networks: a recipe for machine learning; visual notation for neural networks; example: logistic regression output surface; 2-layer neural network; 3-layer neural network; neural net architectures; objective functions; activation functions. Backpropagation: basic chain rule (of calculus); chain rule for an arbitrary computation graph; backpropagation algorithm; module-based automatic differentiation (autodiff).
72 NEURAL NETWORKS
73 Background: A Recipe for Machine Learning. 1. Given training data (figure: images labeled Face / Not a face). 2. Choose each of these: a decision function (examples: linear regression, logistic regression, neural network) and a loss function (examples: mean squared error, cross entropy).
74 Background: A Recipe for Machine Learning. 1. Given training data. 2. Choose each of these: decision function, loss function. 3. Define the goal. 4. Train with SGD (take small steps opposite the gradient).
75 Background: A Recipe for Machine Learning, Gradients. Backpropagation can compute the gradient needed in step 4 (train with SGD: take small steps opposite the gradient). And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently! (A minimal SGD sketch follows.)
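A minimal sketch of step 4 of the recipe, the generic SGD loop; names and defaults are illustrative.

```python
def sgd(theta, grad_fn, data, lr=0.1, epochs=10):
    """Stochastic gradient descent: for each training example,
    take a small step opposite the per-example gradient.

    grad_fn(theta, x, y) should return the gradient of the loss on one
    example (e.g. as computed by backpropagation for a neural net).
    """
    for _ in range(epochs):
        for x, y in data:
            theta = theta - lr * grad_fn(theta, x, y)
    return theta
```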
76 A Recipe for Machine Learning: goals for today's lecture. 1. Explore a new class of decision functions (Neural Networks). 2. Consider variants of this recipe for training.
77 Decision Functions: Linear Regression. y = h_θ(x) = σ(θᵀx), where σ(a) = a. (Network diagram: inputs with weights θ₁, θ₂, θ₃, ..., θ_M feeding the output y.)
78 Decision Functions: Logistic Regression. y = h_θ(x) = σ(θᵀx), where σ(a) = 1 / (1 + exp(−a)).
79 Decision Functions: Logistic Regression. y = h_θ(x) = σ(θᵀx), where σ(a) = 1 / (1 + exp(−a)). (Figure: Face / Not-a-face examples as input.)
80 Decision Functions: Logistic Regression, in-class example with two inputs x₁, x₂ and output y. y = h_θ(x) = σ(θᵀx), where σ(a) = 1 / (1 + exp(−a)).
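A sketch of this decision function for the two-input example; the parameter values are illustrative, not from the in-class exercise.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict_proba(theta, x):
    """Logistic regression: p(y = 1 | x) = sigma(theta^T x)."""
    return sigmoid(theta @ x)

theta = np.array([1.0, -2.0])   # weights on x_1, x_2 (illustrative)
x = np.array([0.5, 0.25])
print(predict_proba(theta, x))  # prints 0.5, since theta^T x = 0 here
```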