Machine Learning for Smart Learners
|
|
- Brendan Baker
- 6 years ago
- Views:
Transcription
1 Department of Computer Science Instituto de Matemathics and Statistics University of Sao Paulo, Brazil Sao Paulo School of Advanced Science on Smart Cities 2017
2 What Is All This Fuss About Machine Learning?
3 What Is All This Fuss About Machine Learning? It s Just Hype!!! (Oba-Oba)
4 What Is All This Fuss About Machine Learning? It s Just Hype!!! (Oba-Oba) Other areas that went through similar hype Boolean Algebra and Digital Circuits ($$$) Automata and Formal Languages Relational Databases ($$$) Expert Systems Object-oriented Programming ($$$), etc
5 What Is All This Fuss About Machine Learning? It s Just Hype!!! (Oba-Oba) Other areas that went through similar hype Boolean Algebra and Digital Circuits ($$$) Automata and Formal Languages Relational Databases ($$$) Expert Systems Object-oriented Programming ($$$), etc When hype is gone, what remains is Science
6 Topics 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion
7 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion
8 Personal Note I have been working with machine learning (ML) since 1995
9 Personal Note I have been working with machine learning (ML) since 1995 mainly in Computational Linguistics
10 Personal Note I have been working with machine learning (ML) since 1995 mainly in Computational Linguistics Also worked in 2 environmental projects with ML
11 Personal Note I have been working with machine learning (ML) since 1995 mainly in Computational Linguistics Also worked in 2 environmental projects with ML Materials Discovery for Air and Water Cleaning ( ) Fine Chemistry, (very) expensive data High complexity (NPc), no so big data Cornell University
12 Personal Note I have been working with machine learning (ML) since 1995 mainly in Computational Linguistics Also worked in 2 environmental projects with ML Materials Discovery for Air and Water Cleaning ( ) Fine Chemistry, (very) expensive data High complexity (NPc), no so big data Cornell University Materials Refinement (just started) Brute Chemistry, not so expensive data Lots of data (Big Data) Civil engineering (USP), several companies interested
13 Personal Note I have been working with machine learning (ML) since 1995 mainly in Computational Linguistics Also worked in 2 environmental projects with ML Materials Discovery for Air and Water Cleaning ( ) Fine Chemistry, (very) expensive data High complexity (NPc), no so big data Cornell University Materials Refinement (just started) Brute Chemistry, not so expensive data Lots of data (Big Data) Civil engineering (USP), several companies interested
14 Description of a Real Project (Started) Q: What is the product that is most manufactured today?
15 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete
16 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete Artificial Stone, mainly used in cities Big potential to improve living environment Affects the efficiency of built environment and climate change
17 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete Artificial Stone, mainly used in cities Big potential to improve living environment Affects the efficiency of built environment and climate change Per-capta production (2003):
18 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete Artificial Stone, mainly used in cities Big potential to improve living environment Affects the efficiency of built environment and climate change Per-capta production (2003): 4.2 t/inhabitant (cementitious material)
19 A Cement World
20 Evolution of CO 2 Emissions from Cement Conclusion: Concrete has an enormous impact in CO 2 emissions
21 Concrete Depends on Too Many Inputs Cement quality Water quantity Aggregate origin (Sand, natural gravel, and crushed stone) Chemical and mineral admixtures Reinforcement (steel, protracted steel) Type of production, temperature, humidity, altitude Intended use, dust emission, Mixing, workability, curing, time to dry out, etc
22 Concrete Depends on Too Many Inputs Cement quality Water quantity Aggregate origin (Sand, natural gravel, and crushed stone) Chemical and mineral admixtures Reinforcement (steel, protracted steel) Type of production, temperature, humidity, altitude Intended use, dust emission, Mixing, workability, curing, time to dry out, etc And there are lots of data collected
23 The Problem 1 Given the inputs, predict CO 2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion)
24 The Problem 1 Given the inputs, predict CO 2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion) 2 Control deviations in the production process
25 The Problem 1 Given the inputs, predict CO 2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion) 2 Control deviations in the production process 3 Propose new kinds of cements and cementitious material to optimize: CO 2 emission profit Cement use in concrete Chemical additives in concrete, etc.
26 The Problem 1 Given the inputs, predict CO 2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion) 2 Control deviations in the production process 3 Propose new kinds of cements and cementitious material to optimize: CO 2 emission profit Cement use in concrete Chemical additives in concrete, etc. So now we can talk about machine learning
27 Similar Approaches in the Literature Taffese & Sistonen. Machine learning for durability and..., Automation in Construction, 2017 Machine learning methods can substitute time and resource consuming lab tests Machine learning techniques can play a substantial role in durability assessment Machine learning algorithms utilizing sensors data can discover hidden insights The future durability assessment and service-life prediction approach is proposed Thanks to Prof. Vanderley M. John for input on concrete information
28 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion
29 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How?
30 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How? Algorithms that learn from data
31 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How? Algorithms that learn from data and make predictions on data
32 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How? Algorithms that learn from data (Learning Phase) and make predictions on data (Execution Phase)
33 The Most Important Element?
34 The Most Important Element? Data
35 The Most Important Element? Data Annotated Data: input and desired output. Expensive. Raw Data: input only. Less expensive.
36 The Most Important Element? Data Annotated Data: input and desired output. Expensive. Basis for Supervised Learning Raw Data: input only. Less expensive. Basis for Unsupervised Learning
37 The Most Important Element? Data Annotated Data: input and desired output. Expensive. Basis for Supervised Learning Raw Data: input only. Less expensive. Basis for Unsupervised Learning Other important elements: learning models learning algorithms computing power
38 Common Uses of Machine Learning
39 Common Application Domains Unstructured models Image Processing Computational Linguistics
40 Machine Learning Model Types (a more relevant classification) Generative Models: model-based learning Regressive Models: function-based learning
41 Machine Learning Model Types (a more relevant classification) Generative Models: model-based learning Decision Trees Logic-based models: association rules, inductive logic Markov Models: hidden, explicit, variable length Bayesian Models: naive, graphic models Probabilistic relational models PAC learning, etc Regressive Models: function-based learning Perceptron and its variants SVM Neural Nets (NN), etc
42 Concrete Examples (from Language Processing) Example 1 Example 2 Generative / Model-based Supervised Classification Hidden Markov Model Regressive / function-based Unsupervised Feature Learning Neural Net
43 Concrete Examples (from Language Processing) Example 1 Part-of-speech tagging Generative / Model-based Supervised Classification Hidden Markov Model Example 2 Regressive / function-based Unsupervised Feature Learning Neural Net
44 Concrete Examples (from Language Processing) Example 1 Part-of-speech tagging Example 2 Word2Vec Generative / Model-based Supervised Classification Hidden Markov Model Regressive / function-based Unsupervised Feature Learning Neural Net
45 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion
46 Hidden Markov Models See presentation attached
47 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion
48 Neural Networks It all started with the linear separation problem
49 Perceptron Created by Rosenblatt [1957] Binary separation of data X = {x 1,... x n }, dim(x i ) = k Find w and b such that, for x X { 1, w x + b > 0 perceptron(x) = 0, w x + b < 0 Zero (unseparable) or infinitely many hyperplanes (w, b) Minsky & Papert [1969]: Xor-functions cannot be learned by single layer perceptrons
50 Dealing with non-separable cases SVM Employs Quadratic Programming to obtain single answer Expanding number of dimensions leads to separability, extra dimensions are a function of given ones Uses kernels to deal with large number of dimensions
51 Dealing with non-separable cases SVM Employs Quadratic Programming to obtain single answer Expanding number of dimensions leads to separability, extra dimensions are a function of given ones Uses kernels to deal with large number of dimensions Neural Networks Multilayer perceptrons 0-1 functions a problem for learning (not differentiable) Use some sigmoid (σ) function instead Neuron: σ(w x + b)
52 Neural Net (Old View) Does not scale. e.g. X 15, 000
53 Neural Net (New View) A whole layer may be represented as : layer(x) = σ(wx + b) x, b, layer(x): vectors; W : matrix W W x x σ y + W x+b b Nodes are operations arcs are tensors
54 Neural Net (New View) A whole layer may be represented as : layer(x) = σ(wx + b) x, b, layer(x): vectors; W : matrix W W x x σ y + W x+b b Nodes are operations arcs are tensors Training occurs over such graph using backpropagation
55 The Backpropagation Algorithm Proposed by Kelley [1960], Bryson [1961] and Dreyfus [1962] Popularized by Rumelhart, Hinton & Williams [1986] Learning by gradient descent. Supervised learning Applies to a tensor graph (not only NN!)
56 The Backpropagation Algorithm (Guts) Initializes weights to be learned randomly Minimizes a loss function. E.g. L(x) = (y i ŷ i (x)) 2 Phase 1: Propagation. Computes the output ŷ and loss L Phase 2: Weight update. α is the learning rate W t+1 = W t + α L W b t+1 = b t + α L b A cycle propagation-update is an epoch Local optimization, may get stuck at local minima
57 Word2vec Word2vec is a name that covers two models Skip-gram Continuous Bag-of-Words (CBOW) Aim: learn efficiently a vectorial representation of words banana v 1,..., v N, v i Q Unsupervised learning from a large unstructured corpus Based on word co-ocurrence statistics. The idea is not new: You shall know a word by the company it keeps (J.R Firth, 1957)
58 A Simplified CBOW model Given a corpus, choose: A vocabulary V. A vector size N to represent words Use matrices W Q V,N and W Q N, V to create two vetor representations of each word w: input vector: vw (line of W ). output vector: v w (column of W ).
59 A Simplified CBOW model The model s task is to predict a focus word given a context of words: O primeiro rei de Portugal nasceu em... Observation (rei, primeiro) Observation (input word, output word) Observation (w I, w O )
60 A Simplified CBOW model Deep learning, with depth 2!!!
61 A Simplified CBOW model ( ) * (1, V ) ( V,N) ( ) x h y x wo W = * softmax cross (1,N) W' (N, V ) entropy (1, V ) (1, V )
62 Some Definitions A one-hot is a vector of bits with a single 1-bit; all other bits 0. Given N elements we associate them to N 1-hot vectors of size N: The softmax function is a probability distribution over the elements of a vector: P(z j ) = e z j N i=1 ez i The cross-entropy of distributions p and q CE(p, q) = i p i log q i
63 A Simplified CBOW model Given (xw I, xw O ) one-hot of (w I, w O ) and x = xw I, the model is: V h i = w si x s com i = 1,..., N (1) u j = s=1 N w sjh s com j = 1,..., V (2) s=1 y j = P(w j w I ) = exp(u j ) V j =1 exp(u j ) com j = 1,..., V (3) V E = CE(xw O, y) = xw O s log(y s ) (4) s=1
64 A Simplified CBOW model Due to 1-hot format of xw I e xw O we simplify (1), (2), (3) e (4) y j = where j is the index of w O. h = vw I (5) u j = v w j. T vw I (6) exp(u j ) V j =1 exp(u j ) (7) V E = u j + log( j =1 exp(u j )) (8)
65 A Simplified CBOW model: update Applying backpropagration we have the weight update of the last layer is w ij (new) = w (old) ij α ej h i (9) in vector notation v w j (new) = v wj (old) α ej vw I (10) where e = y xw O
66 A Simplified CBOW model: update w j w O α e j < 0 subtract from v w j a fraction of vw I increase the cosine distance between vw I and v w j. w j = w O α e j > 0 add a fraction of vw I to v w j decrease the cosine distance between vw I and v w j.
67 A Simplified CBOW model: update Proceed with backpropagation: W (new) = W (old) α xeh T (11) vw I (new) = vw I (old) α xeh T (k I,.) (12) where EH = e(w ) T and k I is the index of w I.
68 A Simplified CBOW model Repeat this process with examples from the corpus, the effect accumulates and as a result words with similar contexts will get close to each other. The model captures the co-ocurrence statistics using cosine distance
69 CBOW Now, starting from an arbitrary window of size C, we construct observations such as ([w I1,...,w IC ],w O ). E.g., for C = 4: Nunca me acostumei com o cantor dessa banda, e nem... ([com, o, dessa, banda], cantor)
70 CBOW: modelo (i) x wi 1 (1, V ) x wi 2 (1, V ). x wi C + ( ) * (1, V ) ( V,N) ( ) x h y x wo W 1/C * softmax cross (1,N) W' (N, V ) entropy (1, V ) (1, V ) (1, V )
71 CBOW: model x = xw I1 + + xw IC (13) h = 1 C (v w I1 + + vw IC ) (14) u j = N w sjh s (15) s=1 y j = p(w j w I1,...,w IC ) = V E = u j + log( exp(v w j. T h) V j =1 exp(v w j.t h) j =1 (16) exp(u j )) (17)
72 CBOW: update v w j (new) = v wj (old) α ej h (18) vw Ic (new) = vw Ic (old) 1 C α xeht (k Ic,.) (19) for c = 1,..., C. Where k I1,..., k IC respectively. w I1,...,w IC are the indexes of
73 Optimization y j = exp(u j ) V j =1 exp(u j ) Too costly to compute for each input training instance Negative Sampling Hierarchical Softmax
74 Negative Sampling We keep x, W, W, h as before. To compute the error function, we employ a distribution P n (w) over all words in the corpus. E.g.: P n (w) = U(w) 3 4 Z Using P n (w) we sample w i1,...,w ik ; avoid w O amog them
75 Negative Sampling (w I,w O ) Positive example (w i1,w O ),..., (w ik,w O ) Negative examples
76 Negative Sampling: the model p(d = 1 w,w O ) = σ(v w. T h) probability of (w,w O ) to co-occur in the corpus (in a C-window) p(d = 0 w,w O ) probability of (w,w O ) not to co-occur in the corpus The goal of training now is to maximize the probabilities p(d = 1 w I,w O ), p(d = 0 w i1,w O ),..., p(d = 0 w ik,w O )
77 Negative Sampling: the model Minimize the following error function: E = log(p(d = 1 w I,w O ). K s=1 = (log p(d = 1 w I,w O ) + log( = (log p(d = 1 w I,w O ) + K s=1 p(d = 0 w is,w O )) K s=1 K = log σ(v w O. T h) log(σ( v w is. T h)) s=1 p(d = 0 w is,w O ))) log(p(d = 0 w is,w O )))
78 Negative Sampling: update v w O (new) = v wo (old) α (σ(v wo. T h) 1) h (20) v w is (new) = v wis (old) α #(is ) σ(v w is. T h) h (21) W (new) = W (old) α xeh T (22) where K EH = (σ(v w O. T h) 1) v w O + σ(v w is s=1. T h)v w is Deep learning with 1.5 layers!!!
79 Intrinsic Evaluation of the Method w 1 is to w 2 as w 3 is to x w 1 = France, w 2 = Paris, w 3 = Japan; x = Tokyo w 1 = man, w 2 = king, w 3 = woman; x = queen
80 Intrinsic Evaluation of the Method w 1 is to w 2 as w 3 is to x w 1 = France, w 2 = Paris, w 3 = Japan; x = Tokyo w 1 = man, w 2 = king, w 3 = woman; x = queen Problematic cases: w 1 = man, w 2 = manager, w 3 = woman; x = secretary w 1 = white, w 2 = beauty, w 3 = black; x = ugly
81 Intrinsic Evaluation of the Method w 1 is to w 2 as w 3 is to x w 1 = France, w 2 = Paris, w 3 = Japan; x = Tokyo w 1 = man, w 2 = king, w 3 = woman; x = queen Problematic cases: w 1 = man, w 2 = manager, w 3 = woman; x = secretary w 1 = white, w 2 = beauty, w 3 = black; x = ugly The method unveils sexism and racism buried in the data! Book: Weapons of Math Destruction
82 Applications (extrinsic evaluation) Many NLP applications employ word2vec We have implemented word2vec in portuguese, using Google s TensorFlow And used it for Named Entity Recognition (NER)
83 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion
84 Thank you! Visit our group s Machine Learning tutorials Tensorflow Theano+Lasagne Keras
85 Bibliography Halevy, A. Norvig, P. and Pereira, F. (2009). The unreasonable effectiveness of data in Intelligent Systems, IEEE. Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representation in vector space. arxiv preprint arxiv: Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages Morin, F., Bengio, Y. (2005). Hierarchical probabilistic neural network language model. In AISTATS, pages Rong, X. (2016). Word2vec Parameter Learning Explained. arxiv preprint arxiv:
word2vec Parameter Learning Explained
word2vec Parameter Learning Explained Xin Rong ronxin@umich.edu Abstract The word2vec model and application by Mikolov et al. have attracted a great amount of attention in recent two years. The vector
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationarxiv: v3 [cs.cl] 30 Jan 2016
word2vec Parameter Learning Explained Xin Rong ronxin@umich.edu arxiv:1411.2738v3 [cs.cl] 30 Jan 2016 Abstract The word2vec model and application by Mikolov et al. have attracted a great amount of attention
More informationAn overview of word2vec
An overview of word2vec Benjamin Wilson Berlin ML Meetup, July 8 2014 Benjamin Wilson word2vec Berlin ML Meetup 1 / 25 Outline 1 Introduction 2 Background & Significance 3 Architecture 4 CBOW word representations
More informationGloVe: Global Vectors for Word Representation 1
GloVe: Global Vectors for Word Representation 1 J. Pennington, R. Socher, C.D. Manning M. Korniyenko, S. Samson Deep Learning for NLP, 13 Jun 2017 1 https://nlp.stanford.edu/projects/glove/ Outline Background
More informationNatural Language Processing
Natural Language Processing Word vectors Many slides borrowed from Richard Socher and Chris Manning Lecture plan Word representations Word vectors (embeddings) skip-gram algorithm Relation to matrix factorization
More informationDeep Learning. Ali Ghodsi. University of Waterloo
University of Waterloo Language Models A language model computes a probability for a sequence of words: P(w 1,..., w T ) Useful for machine translation Word ordering: p (the cat is small) > p (small the
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)
More informationNatural Language Processing and Recurrent Neural Networks
Natural Language Processing and Recurrent Neural Networks Pranay Tarafdar October 19 th, 2018 Outline Introduction to NLP Word2vec RNN GRU LSTM Demo What is NLP? Natural Language? : Huge amount of information
More informationPart-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287
Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287 Review: Neural Networks One-layer multi-layer perceptron architecture, NN MLP1 (x) = g(xw 1 + b 1 )W 2 + b 2 xw + b; perceptron x is the
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationDeep Learning for NLP Part 2
Deep Learning for NLP Part 2 CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) 2 Part 1.3: The Basics Word Representations The
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing Dylan Drover, Borui Ye, Jie Peng University of Waterloo djdrover@uwaterloo.ca borui.ye@uwaterloo.ca July 8, 2015 Dylan Drover, Borui Ye, Jie Peng (University
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationDeep Learning for NLP
Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More information11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)
11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes
More informationDeep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
More informationCSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
CSE446: Neural Networks Spring 2017 Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer Human Neurons Switching time ~ 0.001 second Number of neurons 10 10 Connections per neuron 10 4-5 Scene
More informationSupervised Learning. George Konidaris
Supervised Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom Mitchell,
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationDeep Learning: a gentle introduction
Deep Learning: a gentle introduction Jamal Atif jamal.atif@dauphine.fr PSL, Université Paris-Dauphine, LAMSADE February 8, 206 Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 206 / Why
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationLecture 6: Neural Networks for Representing Word Meaning
Lecture 6: Neural Networks for Representing Word Meaning Mirella Lapata School of Informatics University of Edinburgh mlap@inf.ed.ac.uk February 7, 2017 1 / 28 Logistic Regression Input is a feature vector,
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationWord Embeddings 2 - Class Discussions
Word Embeddings 2 - Class Discussions Jalaj February 18, 2016 Opening Remarks - Word embeddings as a concept are intriguing. The approaches are mostly adhoc but show good empirical performance. Paper 1
More informationStatistical NLP for the Web
Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg Announcements Please ask HW2 related questions in courseworks
More informationNeural Networks for NLP. COMP-599 Nov 30, 2016
Neural Networks for NLP COMP-599 Nov 30, 2016 Outline Neural networks and deep learning: introduction Feedforward neural networks word2vec Complex neural network architectures Convolutional neural networks
More informationCMSC 421: Neural Computation. Applications of Neural Networks
CMSC 42: Neural Computation definition synonyms neural networks artificial neural networks neural modeling connectionist models parallel distributed processing AI perspective Applications of Neural Networks
More informationML4NLP Multiclass Classification
ML4NLP Multiclass Classification CS 590NLP Dan Goldwasser Purdue University dgoldwas@purdue.edu Social NLP Last week we discussed the speed-dates paper. Interesting perspective on NLP problems- Can we
More informationDISTRIBUTIONAL SEMANTICS
COMP90042 LECTURE 4 DISTRIBUTIONAL SEMANTICS LEXICAL DATABASES - PROBLEMS Manually constructed Expensive Human annotation can be biased and noisy Language is dynamic New words: slangs, terminology, etc.
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationUnit 8: Introduction to neural networks. Perceptrons
Unit 8: Introduction to neural networks. Perceptrons D. Balbontín Noval F. J. Martín Mateos J. L. Ruiz Reina A. Riscos Núñez Departamento de Ciencias de la Computación e Inteligencia Artificial Universidad
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationDeep Learning. Ali Ghodsi. University of Waterloo
University of Waterloo Deep learning attempts to learn representations of data with multiple levels of abstraction. Deep learning usually refers to a set of algorithms and computational models that are
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationBased on the original slides of Hung-yi Lee
Based on the original slides of Hung-yi Lee Google Trends Deep learning obtains many exciting results. Can contribute to new Smart Services in the Context of the Internet of Things (IoT). IoT Services
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Neural networks Daniel Hennes 21.01.2018 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Logistic regression Neural networks Perceptron
More informationOnline Videos FERPA. Sign waiver or sit on the sides or in the back. Off camera question time before and after lecture. Questions?
Online Videos FERPA Sign waiver or sit on the sides or in the back Off camera question time before and after lecture Questions? Lecture 1, Slide 1 CS224d Deep NLP Lecture 4: Word Window Classification
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationTTIC 31230, Fundamentals of Deep Learning, Winter David McAllester. The Fundamental Equations of Deep Learning
TTIC 31230, Fundamentals of Deep Learning, Winter 2019 David McAllester The Fundamental Equations of Deep Learning 1 Early History 1943: McCullock and Pitts introduced the linear threshold neuron. 1962:
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationDeep Neural Networks
Deep Neural Networks DT2118 Speech and Speaker Recognition Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 45 Outline State-to-Output Probability Model Artificial Neural Networks Perceptron Multi
More informationDeep Learning Recurrent Networks 2/28/2018
Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good
More informationDeep Learning Basics Lecture 10: Neural Language Models. Princeton University COS 495 Instructor: Yingyu Liang
Deep Learning Basics Lecture 10: Neural Language Models Princeton University COS 495 Instructor: Yingyu Liang Natural language Processing (NLP) The processing of the human languages by computers One of
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationMachine Learning. Neural Networks
Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I
Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10 Introduction In
More informationLast update: October 26, Neural networks. CMSC 421: Section Dana Nau
Last update: October 26, 207 Neural networks CMSC 42: Section 8.7 Dana Nau Outline Applications of neural networks Brains Neural network units Perceptrons Multilayer perceptrons 2 Example Applications
More informationCS224n: Natural Language Processing with Deep Learning 1
CS224n: Natural Language Processing with Deep Learning Lecture Notes: Part I 2 Winter 27 Course Instructors: Christopher Manning, Richard Socher 2 Authors: Francois Chaubard, Michael Fang, Guillaume Genthial,
More informationNeural Word Embeddings from Scratch
Neural Word Embeddings from Scratch Xin Li 12 1 NLP Center Tencent AI Lab 2 Dept. of System Engineering & Engineering Management The Chinese University of Hong Kong 2018-04-09 Xin Li Neural Word Embeddings
More informationECE521 Lecture 7/8. Logistic Regression
ECE521 Lecture 7/8 Logistic Regression Outline Logistic regression (Continue) A single neuron Learning neural networks Multi-class classification 2 Logistic regression The output of a logistic regression
More information22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1
Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable
More informationNatural Language Processing with Deep Learning CS224N/Ling284. Richard Socher Lecture 2: Word Vectors
Natural Language Processing with Deep Learning CS224N/Ling284 Richard Socher Lecture 2: Word Vectors Organization PSet 1 is released. Coding Session 1/22: (Monday, PA1 due Thursday) Some of the questions
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More informationArtificial Neural Networks. Historical description
Artificial Neural Networks Historical description Victor G. Lopez 1 / 23 Artificial Neural Networks (ANN) An artificial neural network is a computational model that attempts to emulate the functions of
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationThe representation of word and sentence
2vec Jul 4, 2017 Presentation Outline 2vec 1 2 2vec 3 4 5 6 discrete representation taxonomy:wordnet Example:good 2vec Problems 2vec synonyms: adept,expert,good It can t keep up to date It can t accurate
More informationNeural Networks (Part 1) Goals for the lecture
Neural Networks (Part ) Mark Craven and David Page Computer Sciences 760 Spring 208 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed
More informationNeural Network Language Modeling
Neural Network Language Modeling Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Marek Rei, Philipp Koehn and Noah Smith Course Project Sign up your course project In-class presentation
More informationPlan. Perceptron Linear discriminant. Associative memories Hopfield networks Chaotic networks. Multilayer perceptron Backpropagation
Neural Networks Plan Perceptron Linear discriminant Associative memories Hopfield networks Chaotic networks Multilayer perceptron Backpropagation Perceptron Historically, the first neural net Inspired
More informationLearning and Neural Networks
Artificial Intelligence Learning and Neural Networks Readings: Chapter 19 & 20.5 of Russell & Norvig Example: A Feed-forward Network w 13 I 1 H 3 w 35 w 14 O 5 I 2 w 23 w 24 H 4 w 45 a 5 = g 5 (W 3,5 a
More informationNEURAL LANGUAGE MODELS
COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS LANGUAGE MODELS Assign a probability to a sequence of words Framed as sliding a window over the sentence, predicting each word from finite context to left E.g.,
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationCS 4700: Foundations of Artificial Intelligence
CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons
More informationIntelligent Systems (AI-2)
Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationThe Perceptron. Volker Tresp Summer 2018
The Perceptron Volker Tresp Summer 2018 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationIntroduction to Artificial Neural Networks
Facultés Universitaires Notre-Dame de la Paix 27 March 2007 Outline 1 Introduction 2 Fundamentals Biological neuron Artificial neuron Artificial Neural Network Outline 3 Single-layer ANN Perceptron Adaline
More informationAN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009
AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is
More information2018 EE448, Big Data Mining, Lecture 5. (Part II) Weinan Zhang Shanghai Jiao Tong University
2018 EE448, Big Data Mining, Lecture 5 Supervised Learning (Part II) Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html Content of Supervised Learning
More informationCSC321 Lecture 7 Neural language models
CSC321 Lecture 7 Neural language models Roger Grosse and Nitish Srivastava February 1, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 7 Neural language models February 1, 2015 1 / 19 Overview We
More informationHidden Markov Models
Hidden Markov Models Slides mostly from Mitch Marcus and Eric Fosler (with lots of modifications). Have you seen HMMs? Have you seen Kalman filters? Have you seen dynamic programming? HMMs are dynamic
More informationCOMP-4360 Machine Learning Neural Networks
COMP-4360 Machine Learning Neural Networks Jacky Baltes Autonomous Agents Lab University of Manitoba Winnipeg, Canada R3T 2N2 Email: jacky@cs.umanitoba.ca WWW: http://www.cs.umanitoba.ca/~jacky http://aalab.cs.umanitoba.ca
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More information10. Artificial Neural Networks
Foundations of Machine Learning CentraleSupélec Fall 217 1. Artificial Neural Networks Chloé-Agathe Azencot Centre for Computational Biology, Mines ParisTech chloe-agathe.azencott@mines-paristech.fr Learning
More information