text classification 3: neural networks
1 text classification 3: neural networks CS 585, Fall 2018 Introduction to Natural Language Processing Mohit Iyyer College of Information and Computer Sciences University of Massachusetts Amherst some slides adapted from Jordan Boyd-Graber and Richard Socher
2 questions from last time: PMI vs. covariance matrix? why do we have two embedding matrices (W and C) in word2vec? what distribution do we draw negative samples from? (the unigram distribution, raised to the 3/4 power; why? *shrug*) HW 1 encoding issues? (Goldberg & Levy, 2014)
3 Summary: How to learn word2vec (skip-gram) embeddings. Start with V random 300-dimensional vectors as initial embeddings. Use logistic regression, the second most basic classifier used in machine learning after naïve Bayes. Take a corpus and take pairs of words that co-occur as positive examples, and pairs of words that don't co-occur as negative examples. Train the classifier to distinguish these by slowly adjusting all the embeddings to improve classifier performance. Throw away the classifier code and keep the embeddings.
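The recipe above can be sketched in a few lines of PyTorch. This is a rough illustration, not a reference word2vec implementation: the names (W, C, skipgram_step), hyperparameters, and indices are made up, and a real run would also subsample frequent words, batch updates, and draw negatives from the unigram^(3/4) distribution.

import torch
import torch.nn.functional as F

V, d = 10000, 300                           # vocabulary size, embedding dimension
W = torch.randn(V, d, requires_grad=True)   # target-word embeddings
C = torch.randn(V, d, requires_grad=True)   # context-word embeddings
opt = torch.optim.SGD([W, C], lr=0.05)

def skipgram_step(target, context, negatives):
    # one update: pull (target, context) together, push negative samples apart
    pos = (W[target] * C[context]).sum()            # dot product of the true pair
    neg = (W[target] * C[negatives]).sum(dim=1)     # dot products with negatives
    loss = -F.logsigmoid(pos) - F.logsigmoid(-neg).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

skipgram_step(3, 17, torch.randint(0, V, (5,)))     # made-up word/sample indices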
4 qualitatively evaluating word embeddings: nearest neighbors demo
5 text classification. input: some text x (e.g., sentence, document); output: a label y (from a finite label set); goal: learn a mapping function f from x to y
6 the rise of deep learning in natural language processing
7 neural classification. goal: avoid feature engineering. why? general model architectures that work well for many different datasets (and tasks!), and for medium-to-large datasets, deep learning methods generally outperform naive Bayes / feature-based logistic regression
8 what is deep learning? f (input) = output
9 what is deep learning? input Neural Network output
10 Logistic Regression by Another Name: Map inputs to output
11 Logistic Regression by Another Name: Map inputs to output. Input vector x_1 ... x_d: inputs encoded as real numbers
12 Logistic Regression by Another Name: Map inputs to output. Output = f(Σ_i W_i x_i + b): multiply inputs by weights
13 Logistic Regression by Another Name: Map inputs to output. Output = f(Σ_i W_i x_i + b): add bias
14 Logistic Regression by Another Name: Map inputs to output. Activation: output = f(Σ_i W_i x_i + b), where f(z) = 1 / (1 + exp(-z)): pass through the nonlinear sigmoid
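A minimal numeric sketch of this single logistic-regression unit; the weights, bias, and inputs below are made-up values.

import torch

x = torch.tensor([0.5, -1.2, 3.0])     # input vector x_1 ... x_d
W = torch.tensor([0.1, 0.4, -0.3])     # one weight per input
b = 0.2                                # bias

z = (W * x).sum() + b                  # sum_i W_i x_i + b
output = 1 / (1 + torch.exp(-z))       # f(z) = 1 / (1 + exp(-z)), i.e. torch.sigmoid(z)
print(output)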
15 A neural network = running several logistic regressions at the same time. If we feed a vector of inputs through a bunch of logistic regression functions, then we get a vector of outputs. But we don't have to decide ahead of time what variables these logistic regressions are trying to predict!
16 A neural network = running several logistic regressions at the same time, which we can feed into another logistic regression function. It is the loss function that will direct what the intermediate hidden variables should be, so as to do a good job at predicting the targets for the next layer, etc.
17 A neural network = running several logistic regressions at the same time. Before we know it, we have a multilayer neural network.
18 logistic regression is a linear classifier: its decision boundary is linear in x. [figure: the sigmoid e^(βx) / (1 + e^(βx)) plotted as a function of βx]
19 what is deep learning? input Neural Network { nonlinear transformation nonlinear transformation nonlinear transformation output
20 what is deep learning? input Neural Network { nonlinear transformation, nonlinear transformation, ..., where each layer computes h_n = f(W h_(n-1) + b) } output
21 Better name: non-linearity.
Logistic / Sigmoid: f(x) = 1 / (1 + e^(-x))
tanh: f(x) = tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
ReLU: f(x) = 0 for x < 0, x for x >= 0
SoftPlus: f(x) = ln(1 + e^x)
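As a quick sketch, the same four nonlinearities written out with elementwise PyTorch tensor operations (the built-in equivalents are noted in comments):

import torch

x = torch.linspace(-3, 3, 7)

sigmoid  = 1 / (1 + torch.exp(-x))                            # torch.sigmoid(x)
tanh     = (torch.exp(2 * x) - 1) / (torch.exp(2 * x) + 1)    # torch.tanh(x)
relu     = torch.where(x < 0, torch.zeros_like(x), x)         # torch.relu(x)
softplus = torch.log(1 + torch.exp(x))                        # F.softplus(x)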
22 is a multi-layer neural network with no nonlinearities (i.e., f is the identity f(x) = x) more powerful than a one-layer network?
23 is a multi-layer neural network with no nonlinearities (i.e., f is the identity f(x) = x) more powerful than a one-layer network? No! You can just compile all of the layers into a single transformation! y = f(W_3 f(W_2 f(W_1 x))) = W_3 W_2 W_1 x = Wx
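A tiny sketch of this collapse (the layer sizes are arbitrary): with identity activations, multiplying the weight matrices together gives exactly the same one-layer network.

import torch

W1, W2, W3 = torch.randn(5, 4), torch.randn(6, 5), torch.randn(2, 6)
x = torch.randn(4)

deep = W3 @ (W2 @ (W1 @ x))          # "three-layer" network with identity activations
W = W3 @ W2 @ W1                     # compile the layers into one matrix
print(torch.allclose(deep, W @ x))   # True (up to floating-point error)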
24 why nonlinearities? (credit for figure: Christopher Olah)
25 why nonlinearities? complex functions!
26 neuron: a_1^(2) = f(W_11^(1) x_1 + W_12^(1) x_2 + W_13^(1) x_3 + b_1^(1))
27 Learn the features and the function: a_2^(2) = f(W_21^(1) x_1 + W_22^(1) x_2 + W_23^(1) x_3 + b_2^(1))
28 Learn the features and the function: a_3^(2) = f(W_31^(1) x_1 + W_32^(1) x_2 + W_33^(1) x_3 + b_3^(1))
29 Learn the features and the function: h_(W,b)(x) = a_1^(3) = f(W_11^(2) a_1^(2) + W_12^(2) a_2^(2) + W_13^(2) a_3^(2) + b_1^(2))
30 Learn the features and the function: we will be learning the x's and the W's! h_(W,b)(x) = a_1^(3) = f(W_11^(2) a_1^(2) + W_12^(2) a_2^(2) + W_13^(2) a_3^(2) + b_1^(2))
31 in matrix-vector notation: h_L2 = f(W_1 x + b), h_L3 = f(W_2 h_L2 + b)
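The same two-layer forward pass as a small PyTorch sketch; the dimensions and values are arbitrary.

import torch

x = torch.randn(3)                           # input vector
W1, b1 = torch.randn(4, 3), torch.randn(4)
W2, b2 = torch.randn(2, 4), torch.randn(2)
f = torch.sigmoid                            # elementwise nonlinearity

h_L2 = f(W1 @ x + b1)                        # first hidden layer
h_L3 = f(W2 @ h_L2 + b2)                     # second layer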
32 Dracula is a really good book! neural network Positive
33 words as basic building blocks from last time: represent words with low-dimensional vectors called embeddings (Mikolov et al., NIPS 2013) king = [0.23, 1.3, -0.3, 0.43]
34 composing embeddings: neural networks compose word embeddings into vectors for phrases, sentences, and documents. neural network("a really good book") = [a vector for the whole phrase]
35 deep averaging networks (Iyyer et al., ACL 2015): predict Positive with z_2 = f(W_2 z_1) (affine transformation + nonlinear function), z_1 = f(W_1 av), av = Σ_(i=1)^n c_i / n, where c_1, c_2, c_3, c_4 are the embeddings of "a really good book" :)
36 deep averaging networks: let's generalize to multiclass classification! (same architecture: z_2 = f(W_2 z_1), z_1 = f(W_1 av), av = Σ_(i=1)^n c_i / n over the word embeddings c_1 ... c_4)
37 softmax function: let's say I have 3 classes instead of 2 (e.g., positive, neutral, negative) and I want to compute probabilities for each class. For every class c, I have an associated weight vector β_c, and then I compute P(y = c | x) = e^(β_c · x) / Σ_(k=1)^3 e^(β_k · x). Sigmoid is a special case of softmax where the number of classes = 2.
38 in practice, this computation is done more efficiently: softmax(x)_j = e^(x_j) / Σ_k e^(x_k), where x is a vector and x_j is dimension j of x. Each dimension j of the softmaxed output represents the probability of class j.
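A small sketch of softmax over a vector of class scores. The max-subtraction line is a standard numerical-stability trick, not something stated on the slide; the scores are made up.

import torch

def softmax(x):
    z = x - x.max()          # subtracting a constant leaves the result unchanged
    e = torch.exp(z)
    return e / e.sum()

scores = torch.tensor([2.0, 0.5, -1.0])   # e.g., positive / neutral / negative
print(softmax(scores))                    # three probabilities summing to 1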
39 deep averaging networks: out = softmax(W_3 z_2), z_2 = f(W_2 z_1), z_1 = f(W_1 av), av = Σ_(i=1)^n c_i / n over the embeddings c_1 ... c_4 of "a really good book"
40 deep averaging networks: what are our model parameters (i.e., weights)? out = softmax(W_3 z_2), z_2 = f(W_2 z_1), z_1 = f(W_1 av), av = Σ_(i=1)^n c_i / n
41 Training with softmax and cross-entropy error: for each training example {x, y}, our objective is to maximize the probability of the correct class y. Hence, we minimize the negative log probability of that class: L = -log P(y | x)
42 Background: why cross-entropy error? Assuming a ground-truth (or gold, or target) probability distribution that is 1 at the right class and 0 everywhere else, p = [0, ..., 0, 1, 0, ..., 0], and our computed probability is q, then the cross entropy is H(p, q) = -Σ_c p(c) log q(c). Because of the one-hot p, the only term left is the negative log probability of the true class.
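A tiny numeric sketch of that last point (the distributions are made up): with a one-hot p, the full cross-entropy sum collapses to the negative log probability of the true class.

import torch

q = torch.tensor([0.7, 0.2, 0.1])        # model's predicted distribution
p = torch.tensor([0.0, 1.0, 0.0])        # one-hot ground truth (true class = 1)

full_sum = -(p * torch.log(q)).sum()     # -sum_c p(c) log q(c)
single_term = -torch.log(q[1])           # -log q(true class)
print(torch.allclose(full_sum, single_term))   # True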
43 deep averaging networks: L = cross-entropy(out, ground-truth). how do I update these parameters given the loss L? (out = softmax(W_3 z_2), z_2 = f(W_2 z_1), z_1 = f(W_1 av), av = Σ_(i=1)^n c_i / n)
44 deep averaging networks: how do I update the word embeddings given the loss L? ∂L/∂c_i = ???
45 deep averaging networks: chain rule!!! ∂L/∂c_i = (∂L/∂out)(∂out/∂z_2)(∂z_2/∂z_1)(∂z_1/∂av)(∂av/∂c_i)
46 deep averaging networks: and for the weights, ∂L/∂W_2 = ???
47 deep averaging networks: ∂L/∂W_2 = (∂L/∂out)(∂out/∂z_2)(∂z_2/∂W_2)
48 backpropagation: use the chain rule to compute partial derivatives w/ respect to each parameter. Trick: re-use derivatives computed for higher layers to compute derivatives for lower layers! ∂L/∂c_i = (∂L/∂out)(∂out/∂z_2)(∂z_2/∂z_1)(∂z_1/∂av)(∂av/∂c_i), ∂L/∂W_2 = (∂L/∂out)(∂out/∂z_2)(∂z_2/∂W_2). (Rumelhart et al., 1986)
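A rough sketch of the re-use trick on a tiny two-layer softmax classifier (not the DAN itself; the shapes, names, and true label are all made up). The output-layer gradient dz2 is computed once and then reused for both the top-layer weight gradient and the lower-layer gradients.

import torch

x = torch.randn(4)                          # input
W1, W2 = torch.randn(3, 4), torch.randn(2, 3)
z1 = torch.sigmoid(W1 @ x)                  # hidden layer
z2 = W2 @ z1                                # class scores
p = torch.softmax(z2, dim=0)
y = 0                                       # true class index

dz2 = p.clone(); dz2[y] -= 1.0              # dL/dz2 for softmax + cross-entropy
dW2 = torch.outer(dz2, z1)                  # reuse dz2: dL/dW2
dz1 = W2.T @ dz2                            # reuse dz2 again: dL/dz1
dW1 = torch.outer(dz1 * z1 * (1 - z1), x)   # dL/dW1 via the sigmoid derivative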
49 deep learning frameworks make building NNs super easy! Deep Averaging Network: set up the network (initialization). [figure: DAN computation graph: z_0 = CBOW(w_1, ..., w_n), hidden layers z_1 = g(·) and z_2 = g(·), y = softmax(·)]

def __init__(self, n_classes, vocab_size, emb_dim=300, n_hidden_units=300):
    super(DanModel, self).__init__()
    self.n_classes = n_classes
    self.vocab_size = vocab_size
    self.emb_dim = emb_dim
    self.n_hidden_units = n_hidden_units
    self.embeddings = nn.Embedding(self.vocab_size, self.emb_dim)
    self.classifier = nn.Sequential(
        nn.Linear(self.n_hidden_units, self.n_hidden_units),
        nn.ReLU(),
        nn.Linear(self.n_hidden_units, self.n_classes))
    self._softmax = nn.Softmax()
50 deep learning frameworks make building NNs super easy! Deep Averaging Network: do a forward pass to compute the prediction.

def forward(self, batch, probs=False):
    text = batch['text']['tokens']
    lengths = batch['length']
    text_embed = self.embeddings(text)
    # Take the mean embedding. Since padding results
    # in zeros it's safe to sum and divide by length
    encoded = text_embed.sum(1)
    encoded /= lengths.view(text_embed.size(0), -1)
    # Compute the network score predictions
    logits = self.classifier(encoded)
    if probs:
        return self._softmax(logits)
    else:
        return logits
51 deep learning frameworks make building NNs super easy! Deep Averaging Network: do a backward pass to update the weights (training).

def _run_epoch(self, batch_iter, train=True):
    self._model.train()
    for batch in batch_iter:
        self._model.zero_grad()
        out = self._model(batch)
        batch_loss = criterion(out, batch['label'])
        batch_loss.backward()
        self.optimizer.step()
52 deep learning frameworks make building NNs super easy! (same training loop as the previous slide) that's it! no need to compute gradients by hand! however, you will have to do this in HW2 :(
53 Regularization: regularization prevents overfitting when we have a lot of features (or, later, a very powerful/deep model). [figure: overfitting as a function of model power]
54 L2 regularization: θ represents all of the model's parameters!
55 L2 regularization: θ represents all of the model's parameters! penalizing their norm leads to smaller weights -> we are constraining the parameter space -> we are putting a prior on our model
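A minimal PyTorch sketch of L2 regularization (the model, data, and λ below are made up): add λ times the squared norm of all parameters θ to the data loss, or equivalently use the optimizer's weight_decay argument.

import torch
import torch.nn as nn

model = nn.Linear(4, 3)
x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))
criterion = nn.CrossEntropyLoss()
lam = 1e-4

loss = criterion(model(x), y)
loss = loss + lam * sum((p ** 2).sum() for p in model.parameters())
loss.backward()

# equivalently: torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=lam)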
56 dropout (for neural networks): randomly set p% of neurons to 0 in the forward pass [Srivastava et al., 2014]
57 why does this make sense? randomly set p% of neurons to 0 in the forward pass. [figure: features feeding into p(cat) (has an ear, has a tail, is furry, has claws, mischievous look), with a random subset crossed out]
58 why does this make sense? the network can't just rely on one neuron! [same figure, with a different random subset crossed out]
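A small PyTorch sketch of dropout in a classifier (layer sizes and dropout rate are arbitrary): nn.Dropout zeroes hidden units at random during training, and model.eval() turns it off at test time.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(300, 300),
    nn.ReLU(),
    nn.Dropout(p=0.5),      # randomly zero 50% of the hidden units
    nn.Linear(300, 2))

x = torch.randn(4, 300)
model.train()
out_train = model(x)        # dropout active: a different random mask each call
model.eval()
out_test = model(x)         # dropout disabled: deterministic output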
59 exercise!