Machine Learning for Smart Learners

1 Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, Brazil. São Paulo School of Advanced Science on Smart Cities 2017

2 What Is All This Fuss About Machine Learning?

3 What Is All This Fuss About Machine Learning? It's Just Hype!!! (Oba-Oba)

4 What Is All This Fuss About Machine Learning? It's Just Hype!!! (Oba-Oba) Other areas that went through similar hype: Boolean Algebra and Digital Circuits ($$$), Automata and Formal Languages, Relational Databases ($$$), Expert Systems, Object-oriented Programming ($$$), etc.

5 What Is All This Fuss About Machine Learning? It's Just Hype!!! (Oba-Oba) Other areas that went through similar hype: Boolean Algebra and Digital Circuits ($$$), Automata and Formal Languages, Relational Databases ($$$), Expert Systems, Object-oriented Programming ($$$), etc. When the hype is gone, what remains is Science.

6 Topics 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion

7 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion

8 Personal Note I have been working with machine learning (ML) since 1995

9 Personal Note I have been working with machine learning (ML) since 1995 mainly in Computational Linguistics

10 Personal Note I have been working with machine learning (ML) since 1995, mainly in Computational Linguistics. Also worked on 2 environmental projects with ML

11 Personal Note I have been working with machine learning (ML) since 1995, mainly in Computational Linguistics. Also worked on 2 environmental projects with ML. Materials Discovery for Air and Water Cleaning ( ): Fine Chemistry, (very) expensive data; high complexity (NP-complete), not so big data; Cornell University

12 Personal Note I have been working with machine learning (ML) since 1995, mainly in Computational Linguistics. Also worked on 2 environmental projects with ML. Materials Discovery for Air and Water Cleaning ( ): Fine Chemistry, (very) expensive data; high complexity (NP-complete), not so big data; Cornell University. Materials Refinement (just started): Brute Chemistry, not so expensive data; lots of data (Big Data); Civil Engineering (USP), several companies interested

13 Personal Note I have been working with machine learning (ML) since 1995, mainly in Computational Linguistics. Also worked on 2 environmental projects with ML. Materials Discovery for Air and Water Cleaning ( ): Fine Chemistry, (very) expensive data; high complexity (NP-complete), not so big data; Cornell University. Materials Refinement (just started): Brute Chemistry, not so expensive data; lots of data (Big Data); Civil Engineering (USP), several companies interested

14 Description of a Real Project (Started) Q: What is the product that is most manufactured today?

15 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete

16 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete. Artificial stone, mainly used in cities; big potential to improve the living environment; affects the efficiency of the built environment and climate change

17 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete. Artificial stone, mainly used in cities; big potential to improve the living environment; affects the efficiency of the built environment and climate change. Per-capita production (2003):

18 Description of a Real Project (Started) Q: What is the product that is most manufactured today? A: Concrete. Artificial stone, mainly used in cities; big potential to improve the living environment; affects the efficiency of the built environment and climate change. Per-capita production (2003): 4.2 t/inhabitant (cementitious material)

19 A Cement World

20 Evolution of CO2 Emissions from Cement. Conclusion: concrete has an enormous impact on CO2 emissions

21 Concrete Depends on Too Many Inputs: cement quality; water quantity; aggregate origin (sand, natural gravel, and crushed stone); chemical and mineral admixtures; reinforcement (steel, prestressed steel); type of production, temperature, humidity, altitude; intended use, dust emission; mixing, workability, curing, time to dry out, etc.

22 Concrete Depends on Too Many Inputs: cement quality; water quantity; aggregate origin (sand, natural gravel, and crushed stone); chemical and mineral admixtures; reinforcement (steel, prestressed steel); type of production, temperature, humidity, altitude; intended use, dust emission; mixing, workability, curing, time to dry out, etc. And there are lots of data collected.

23 The Problem 1 Given the inputs, predict CO2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion)

24 The Problem 1 Given the inputs, predict CO2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion) 2 Control deviations in the production process

25 The Problem 1 Given the inputs, predict CO2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion) 2 Control deviations in the production process 3 Propose new kinds of cements and cementitious materials to optimize: CO2 emissions, profit, cement use in concrete, chemical additives in concrete, etc.

26 The Problem 1 Given the inputs, predict CO2 emission and important properties (compressive strength, tensile strength, elasticity, coefficient of thermal expansion) 2 Control deviations in the production process 3 Propose new kinds of cements and cementitious materials to optimize: CO2 emissions, profit, cement use in concrete, chemical additives in concrete, etc. So now we can talk about machine learning.

27 Similar Approaches in the Literature. Taffese & Sistonen. Machine learning for durability and..., Automation in Construction, 2017: machine learning methods can substitute time- and resource-consuming lab tests; machine learning techniques can play a substantial role in durability assessment; machine learning algorithms utilizing sensor data can discover hidden insights; a future durability assessment and service-life prediction approach is proposed. Thanks to Prof. Vanderley M. John for input on concrete information.

28 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion

29 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How?

30 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How? Algorithms that learn from data

31 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How? Algorithms that learn from data and make predictions on data

32 What is Machine Learning? Branch of Computer Science Aim: Give computers the ability to learn without being explicitly programmed How? Algorithms that learn from data (Learning Phase) and make predictions on data (Execution Phase)

33 The Most Important Element?

34 The Most Important Element? Data

35 The Most Important Element? Data Annotated Data: input and desired output. Expensive. Raw Data: input only. Less expensive.

36 The Most Important Element? Data Annotated Data: input and desired output. Expensive. Basis for Supervised Learning Raw Data: input only. Less expensive. Basis for Unsupervised Learning

37 The Most Important Element? Data Annotated Data: input and desired output. Expensive. Basis for Supervised Learning Raw Data: input only. Less expensive. Basis for Unsupervised Learning Other important elements: learning models learning algorithms computing power

38 Common Uses of Machine Learning

39 Common Application Domains (unstructured data): Image Processing, Computational Linguistics

40 Machine Learning Model Types (a more relevant classification): Generative Models (model-based learning); Regressive Models (function-based learning)

41 Machine Learning Model Types (a more relevant classification). Generative Models (model-based learning): Decision Trees; logic-based models (association rules, inductive logic); Markov Models (hidden, explicit, variable length); Bayesian Models (naive, graphical models); probabilistic relational models; PAC learning, etc. Regressive Models (function-based learning): Perceptron and its variants; SVM; Neural Nets (NN), etc.

42 Concrete Examples (from Language Processing). Example 1: Generative / Model-based, Supervised, Classification, Hidden Markov Model. Example 2: Regressive / Function-based, Unsupervised, Feature Learning, Neural Net.

43 Concrete Examples (from Language Processing). Example 1 (Part-of-speech tagging): Generative / Model-based, Supervised, Classification, Hidden Markov Model. Example 2: Regressive / Function-based, Unsupervised, Feature Learning, Neural Net.

44 Concrete Examples (from Language Processing). Example 1 (Part-of-speech tagging): Generative / Model-based, Supervised, Classification, Hidden Markov Model. Example 2 (Word2Vec): Regressive / Function-based, Unsupervised, Feature Learning, Neural Net.

45 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion

46 Hidden Markov Models See presentation attached

47 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion

48 Neural Networks It all started with the linear separation problem

49 Perceptron
Created by Rosenblatt [1957]. Binary separation of data X = {x_1, ..., x_n}, dim(x_i) = k.
Find w and b such that, for x in X:
perceptron(x) = 1 if w · x + b > 0, and 0 if w · x + b < 0
Zero (non-separable case) or infinitely many separating hyperplanes (w, b).
Minsky & Papert [1969]: XOR functions cannot be learned by single-layer perceptrons.
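To make the separating rule concrete, here is a minimal sketch (not from the slides) of Rosenblatt's perceptron update in Python with NumPy; the toy data, labels, and learning rate are illustrative assumptions.

    import numpy as np

    def train_perceptron(X, y, epochs=100, lr=1.0):
        # Learn w, b so that w . x + b > 0 for class 1 and <= 0 for class 0
        n, k = X.shape
        w, b = np.zeros(k), 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                pred = 1 if np.dot(w, xi) + b > 0 else 0
                # Rosenblatt update: move the hyperplane only on mistakes
                w += lr * (yi - pred) * xi
                b += lr * (yi - pred)
        return w, b

    # Linearly separable toy data (logical AND); XOR, as Minsky & Papert noted,
    # can never be separated by a single-layer perceptron.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    w, b = train_perceptron(X, y)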

50 Dealing with non-separable cases
SVM: employs Quadratic Programming to obtain a single answer; expanding the number of dimensions leads to separability, where the extra dimensions are a function of the given ones; uses kernels to deal with a large number of dimensions.

51 Dealing with non-separable cases
SVM: employs Quadratic Programming to obtain a single answer; expanding the number of dimensions leads to separability, where the extra dimensions are a function of the given ones; uses kernels to deal with a large number of dimensions.
Neural Networks: multilayer perceptrons; 0-1 step functions are a problem for learning (not differentiable), so use a sigmoid (σ) function instead. Neuron: σ(w · x + b)

52 Neural Net (Old View) Does not scale, e.g. to inputs with |X| ≈ 15,000

53 Neural Net (New View)
A whole layer may be represented as: layer(x) = σ(Wx + b), where x, b, and layer(x) are vectors and W is a matrix.
[Computation-graph figure: x and W enter a matrix-multiplication node, b is added, and σ is applied, producing y = σ(Wx + b).]
Nodes are operations, arcs are tensors.

54 Neural Net (New View)
A whole layer may be represented as: layer(x) = σ(Wx + b), where x, b, and layer(x) are vectors and W is a matrix.
[Computation-graph figure: x and W enter a matrix-multiplication node, b is added, and σ is applied, producing y = σ(Wx + b).]
Nodes are operations, arcs are tensors.
Training occurs over such a graph using backpropagation.
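A minimal sketch (assuming NumPy; shapes are illustrative) of the layer-as-a-graph view: a dense layer is just a matrix multiplication, a vector addition, and an element-wise σ.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def layer(x, W, b):
        # layer(x) = sigma(W x + b): x and b are vectors, W is a matrix
        return sigmoid(W @ x + b)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))   # 4 inputs -> 3 outputs
    b = np.zeros(3)
    x = rng.normal(size=4)
    y = layer(x, W, b)

Frameworks such as TensorFlow build exactly this kind of operation graph and differentiate it automatically.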

55 The Backpropagation Algorithm
Proposed by Kelley [1960], Bryson [1961], and Dreyfus [1962]; popularized by Rumelhart, Hinton & Williams [1986].
Learning by gradient descent; supervised learning.
Applies to a tensor graph (not only NN!)

56 The Backpropagation Algorithm (Guts)
Initializes the weights to be learned randomly.
Minimizes a loss function, e.g. L(x) = Σ_i (y_i - ŷ_i(x))².
Phase 1: Propagation. Computes the output ŷ and the loss L.
Phase 2: Weight update, where α is the learning rate:
W_{t+1} = W_t - α ∂L/∂W
b_{t+1} = b_t - α ∂L/∂b
A propagation-update cycle is an epoch.
Local optimization: may get stuck at local minima.
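A minimal sketch (assuming NumPy, a single sigmoid layer, and the squared loss above; all names are illustrative) of one propagation-update cycle over a batch.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def epoch(W, b, X, Y, lr=0.1):
        # Phase 1: propagation -- compute the outputs and the loss
        Y_hat = sigmoid(X @ W.T + b)          # shape (n_samples, n_outputs)
        E = Y_hat - Y
        loss = np.sum(E ** 2)
        # Phase 2: weight update -- gradients of the loss w.r.t. W and b
        dZ = 2 * E * Y_hat * (1 - Y_hat)      # chain rule through the sigmoid
        dW = dZ.T @ X
        db = dZ.sum(axis=0)
        W -= lr * dW                          # W_{t+1} = W_t - alpha dL/dW
        b -= lr * db                          # b_{t+1} = b_t - alpha dL/db
        return W, b, loss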

57 Word2vec
Word2vec is a name that covers two models: Skip-gram and Continuous Bag-of-Words (CBOW).
Aim: efficiently learn a vector representation of words, e.g. banana ↦ (v_1, ..., v_N), with v_i in ℚ.
Unsupervised learning from a large unstructured corpus.
Based on word co-occurrence statistics. The idea is not new: "You shall know a word by the company it keeps" (J. R. Firth, 1957)

58 A Simplified CBOW model
Given a corpus, choose: a vocabulary V and a vector size N to represent words.
Use matrices W in ℚ^{|V| × N} and W' in ℚ^{N × |V|} to create two vector representations of each word w:
input vector v_w (a row of W); output vector v'_w (a column of W').

59 A Simplified CBOW model
The model's task is to predict a focus word given a context of words: "O primeiro rei de Portugal nasceu em..." ("The first king of Portugal was born in...").
Observation: (rei, primeiro) = (input word, output word) = (w_I, w_O)

60 A Simplified CBOW model Deep learning, with depth 2!!!

61 A Simplified CBOW model
[Model diagram: the one-hot input x (1 × |V|) is multiplied by W (|V| × N) to give h (1 × N); h is multiplied by W' (N × |V|) and passed through a softmax to give y (1 × |V|), which is compared to the one-hot output x_{w_O} (1 × |V|) via cross-entropy.]

62 Some Definitions
A one-hot is a vector of bits with a single 1-bit; all other bits are 0. Given N elements, we associate them with N one-hot vectors of size N.
The softmax function turns a vector into a probability distribution over its elements:
P(z_j) = e^{z_j} / Σ_{i=1}^{N} e^{z_i}
The cross-entropy of distributions p and q is:
CE(p, q) = - Σ_i p_i log q_i
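A minimal sketch (NumPy; the function names are mine) of these three definitions, handy for checking the shapes used in the CBOW equations that follow.

    import numpy as np

    def one_hot(index, size):
        v = np.zeros(size)
        v[index] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())   # subtract the max for numerical stability
        return e / e.sum()        # the result sums to 1

    def cross_entropy(p, q):
        # CE(p, q) = -sum_i p_i log q_i
        return -np.sum(p * np.log(q))

    y = softmax(np.array([2.0, 1.0, 0.1]))
    loss = cross_entropy(one_hot(0, 3), y)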

63 A Simplified CBOW model
Given the one-hot vectors (x_{w_I}, x_{w_O}) of (w_I, w_O) and x = x_{w_I}, the model is:
h_i = Σ_{s=1}^{|V|} w_{si} x_s,  for i = 1, ..., N   (1)
u_j = Σ_{s=1}^{N} w'_{sj} h_s,  for j = 1, ..., |V|   (2)
y_j = P(w_j | w_I) = exp(u_j) / Σ_{j'=1}^{|V|} exp(u_{j'}),  for j = 1, ..., |V|   (3)
E = CE(x_{w_O}, y) = - Σ_{s=1}^{|V|} (x_{w_O})_s log(y_s)   (4)
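A minimal sketch (NumPy; vocabulary size, embedding size, and variable names are illustrative assumptions) of equations (1)-(4), i.e. the forward pass of this one-word CBOW model.

    import numpy as np

    V, N = 10, 4                                   # vocabulary size |V|, embedding size N
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(V, N))         # input vectors v_w are the rows of W
    W_prime = rng.normal(scale=0.1, size=(N, V))   # output vectors v'_w are the columns of W'

    def one_hot(i, size):
        v = np.zeros(size)
        v[i] = 1.0
        return v

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def forward(i_in, i_out):
        x = one_hot(i_in, V)
        h = x @ W               # (1): with a one-hot input this is just row i_in of W
        u = h @ W_prime         # (2): one score per vocabulary word
        y = softmax(u)          # (3): P(w_j | w_I)
        E = -np.log(y[i_out])   # (4): cross-entropy against a one-hot target
        return x, h, y, E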

64 A Simplified CBOW model
Due to the one-hot format of x_{w_I} and x_{w_O}, we can simplify (1), (2), (3) and (4):
h = v_{w_I}   (5)
u_j = v'_{w_j}^T v_{w_I}   (6)
y_j = exp(u_j) / Σ_{j'=1}^{|V|} exp(u_{j'})   (7)
E = - u_{j*} + log( Σ_{j'=1}^{|V|} exp(u_{j'}) )   (8)
where j* is the index of w_O.

65 A Simplified CBOW model: update
Applying backpropagation, the weight update of the last layer is
w'_{ij}(new) = w'_{ij}(old) - α e_j h_i   (9)
or, in vector notation,
v'_{w_j}(new) = v'_{w_j}(old) - α e_j v_{w_I}   (10)
where e = y - x_{w_O}.

66 A Simplified CBOW model: update
If w_j ≠ w_O, then -α e_j < 0: we subtract from v'_{w_j} a fraction of v_{w_I}, which increases the cosine distance between v_{w_I} and v'_{w_j}.
If w_j = w_O, then -α e_j > 0: we add a fraction of v_{w_I} to v'_{w_j}, which decreases the cosine distance between v_{w_I} and v'_{w_j}.

67 A Simplified CBOW model: update
Proceeding with backpropagation:
W(new) = W(old) - α x^T EH   (11)
v_{w_I}(new) = v_{w_I}(old) - α (x^T EH)_{(k_I, ·)}   (12)
where EH = e (W')^T and k_I is the index of w_I.
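Continuing the sketch above (same assumed W, W_prime, one_hot and forward; alpha is an illustrative learning rate), one full training step for an observation (w_I, w_O) applying updates (10) and (12); because x is one-hot, x^T EH changes only the row of W belonging to w_I.

    alpha = 0.05

    def train_step(i_in, i_out):
        global W, W_prime                   # updated in place, for the sketch
        x, h, y, E = forward(i_in, i_out)
        e = y - one_hot(i_out, V)           # e = y - x_{w_O}
        EH = e @ W_prime.T                  # EH = e (W')^T, computed before any update
        W_prime -= alpha * np.outer(h, e)   # (10): update the output vectors (columns of W')
        W[i_in] -= alpha * EH               # (12): only the input vector of w_I changes
        return E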

68 A Simplified CBOW model
Repeating this process with examples from the corpus, the effect accumulates, and as a result words with similar contexts get close to each other.
The model captures the co-occurrence statistics using cosine distance.

69 CBOW
Now, starting from an arbitrary window of size C, we construct observations such as ([w_{I1}, ..., w_{IC}], w_O). E.g., for C = 4:
"Nunca me acostumei com o cantor dessa banda, e nem..." ("I never got used to the singer of that band, and neither...")
gives the observation ([com, o, dessa, banda], cantor).

70 CBOW: model
[Model diagram: the C one-hot context vectors x_{w_{I1}}, ..., x_{w_{IC}} (each 1 × |V|) are summed, multiplied by W (|V| × N) and scaled by 1/C to give h (1 × N); h is multiplied by W' (N × |V|) and passed through a softmax to give y (1 × |V|), which is compared to x_{w_O} (1 × |V|) via cross-entropy.]

71 CBOW: model
x = x_{w_{I1}} + ... + x_{w_{IC}}   (13)
h = (1/C) (v_{w_{I1}} + ... + v_{w_{IC}})   (14)
u_j = Σ_{s=1}^{N} w'_{sj} h_s   (15)
y_j = p(w_j | w_{I1}, ..., w_{IC}) = exp(v'_{w_j}^T h) / Σ_{j'=1}^{|V|} exp(v'_{w_{j'}}^T h)   (16)
E = - u_{j*} + log( Σ_{j'=1}^{|V|} exp(u_{j'}) )   (17)
where j* is the index of w_O.

72 CBOW: update
v'_{w_j}(new) = v'_{w_j}(old) - α e_j h   (18)
v_{w_{Ic}}(new) = v_{w_{Ic}}(old) - (1/C) α (x^T EH)_{(k_{Ic}, ·)}   (19)
for c = 1, ..., C, where k_{I1}, ..., k_{IC} are the indexes of w_{I1}, ..., w_{IC}, respectively.

73 Optimization
y_j = exp(u_j) / Σ_{j'=1}^{|V|} exp(u_{j'})
is too costly to compute for each training instance. Alternatives: Negative Sampling; Hierarchical Softmax.

74 Negative Sampling
We keep x, W, W', h as before. To compute the error function, we employ a distribution P_n(w) over all words in the corpus, e.g.:
P_n(w) = U(w)^{3/4} / Z
Using P_n(w) we sample w_{i1}, ..., w_{iK}, avoiding w_O among them.
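A minimal sketch (NumPy; the word counts are illustrative) of the noise distribution P_n(w) proportional to U(w)^{3/4} and of drawing K negative samples that avoid w_O.

    import numpy as np

    counts = np.array([50.0, 30.0, 10.0, 5.0, 5.0])   # unigram counts U(w), illustrative
    P_n = counts ** 0.75
    P_n /= P_n.sum()                                   # Z is the normalizing constant

    def negative_samples(i_out, K, rng=np.random.default_rng(0)):
        samples = []
        while len(samples) < K:
            w = int(rng.choice(len(P_n), p=P_n))
            if w != i_out:                             # avoid the positive word w_O
                samples.append(w)
        return samples

    neg = negative_samples(i_out=1, K=3)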

75 Negative Sampling
(w_I, w_O) is the positive example; (w_{i1}, w_O), ..., (w_{iK}, w_O) are the negative examples.

76 Negative Sampling: the model
p(D = 1 | w, w_O) = σ(v'_w^T h): probability of (w, w_O) co-occurring in the corpus (in a C-window)
p(D = 0 | w, w_O): probability of (w, w_O) not co-occurring in the corpus
The goal of training now is to maximize the probabilities p(D = 1 | w_I, w_O), p(D = 0 | w_{i1}, w_O), ..., p(D = 0 | w_{iK}, w_O).

77 Negative Sampling: the model
Minimize the following error function:
E = - log( p(D = 1 | w_I, w_O) · Π_{s=1}^{K} p(D = 0 | w_{is}, w_O) )
  = - ( log p(D = 1 | w_I, w_O) + Σ_{s=1}^{K} log p(D = 0 | w_{is}, w_O) )
  = - log σ(v'_{w_O}^T h) - Σ_{s=1}^{K} log σ(- v'_{w_{is}}^T h)

78 Negative Sampling: update
v'_{w_O}(new) = v'_{w_O}(old) - α (σ(v'_{w_O}^T h) - 1) h   (20)
v'_{w_{is}}(new) = v'_{w_{is}}(old) - α #(i_s) σ(v'_{w_{is}}^T h) h   (21)
W(new) = W(old) - α x^T EH   (22)
where EH = (σ(v'_{w_O}^T h) - 1) v'_{w_O} + Σ_{s=1}^{K} σ(v'_{w_{is}}^T h) v'_{w_{is}}
Deep learning with 1.5 layers!!!
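Continuing the same assumed setup (W, W_prime, alpha from the CBOW sketches and negative_samples from the sketch above; names are mine), a minimal sketch of one negative-sampling update, assuming for simplicity that each negative word is sampled once, so the #(i_s) multiplicity factor is 1.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_step_neg(i_in, i_out, K=3):
        h = W[i_in]                                      # h = v_{w_I}
        neg = negative_samples(i_out, K)
        # error terms: (sigma - 1) for the positive word, sigma for each negative word
        g_pos = sigmoid(W_prime[:, i_out] @ h) - 1.0
        g_neg = [sigmoid(W_prime[:, j] @ h) for j in neg]
        EH = g_pos * W_prime[:, i_out] + sum(g * W_prime[:, j] for g, j in zip(g_neg, neg))
        W_prime[:, i_out] -= alpha * g_pos * h           # (20)
        for g, j in zip(g_neg, neg):                     # (21)
            W_prime[:, j] -= alpha * g * h
        W[i_in] -= alpha * EH                            # (22): only the input vector of w_I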

79 Intrinsic Evaluation of the Method
w_1 is to w_2 as w_3 is to x
w_1 = France, w_2 = Paris, w_3 = Japan; x = Tokyo
w_1 = man, w_2 = king, w_3 = woman; x = queen

80 Intrinsic Evaluation of the Method
w_1 is to w_2 as w_3 is to x
w_1 = France, w_2 = Paris, w_3 = Japan; x = Tokyo
w_1 = man, w_2 = king, w_3 = woman; x = queen
Problematic cases:
w_1 = man, w_2 = manager, w_3 = woman; x = secretary
w_1 = white, w_2 = beauty, w_3 = black; x = ugly

81 Intrinsic Evaluation of the Method
w_1 is to w_2 as w_3 is to x
w_1 = France, w_2 = Paris, w_3 = Japan; x = Tokyo
w_1 = man, w_2 = king, w_3 = woman; x = queen
Problematic cases:
w_1 = man, w_2 = manager, w_3 = woman; x = secretary
w_1 = white, w_2 = beauty, w_3 = black; x = ugly
The method unveils sexism and racism buried in the data! Book: Weapons of Math Destruction
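A minimal sketch (NumPy; the vocabulary list and embedding matrix are assumed to come from a trained model) of the analogy test above, solved as x = argmax over the vocabulary of the cosine similarity with v_{w2} - v_{w1} + v_{w3}.

    import numpy as np

    def analogy(w1, w2, w3, vocab, E):
        # vocab: list of words; E: matrix whose rows are the learned word vectors
        idx = {w: i for i, w in enumerate(vocab)}
        target = E[idx[w2]] - E[idx[w1]] + E[idx[w3]]
        # cosine similarity of the target with every word vector
        sims = (E @ target) / (np.linalg.norm(E, axis=1) * np.linalg.norm(target) + 1e-9)
        for w in (w1, w2, w3):        # exclude the query words themselves
            sims[idx[w]] = -np.inf
        return vocab[int(np.argmax(sims))]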

82 Applications (extrinsic evaluation) Many NLP applications employ word2vec. We have implemented word2vec for Portuguese, using Google's TensorFlow, and used it for Named Entity Recognition (NER).

83 Next Topic 1 Machine Learning and Smart Cities 2 Machine Learning Basics 3 Example 1: PoS Tagging 4 Example 2: Feature Learning via Word2Vec 5 Conclusion

84 Thank you! Visit our group's Machine Learning tutorials: TensorFlow, Theano+Lasagne, Keras

85 Bibliography
Halevy, A., Norvig, P., and Pereira, F. (2009). The unreasonable effectiveness of data. IEEE Intelligent Systems.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems.
Morin, F. and Bengio, Y. (2005). Hierarchical probabilistic neural network language model. In AISTATS.
Rong, X. (2016). word2vec Parameter Learning Explained. arXiv preprint arXiv:1411.2738.
