Tutorial on Machine Learning for Advanced Electronics
Slide 1: Tutorial on Machine Learning for Advanced Electronics
Maxim Raginsky, March 2017
Slide 2: Part I: (Some) Theory and Principles
Slide 3: Machine Learning
- estimation of dependencies from empirical data (V. Vapnik)
- enabling a computer to perform well on a given task without explicitly programming the task
- improving performance on a task based on experience
Slide 4: Statistical Learning Paradigm

$\underbrace{Y}_{\text{output}} = f(\underbrace{X}_{\text{input}})$

- Experience comes in the form of training data $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, sampled at random from the phenomenon of interest: an unknown distribution $p(x, y)$.
- Performance is measured by expected accuracy of inference (prediction on previously unseen instances):
  $R(f) = \mathbb{E}[\ell(f(X), Y)] = \int \underbrace{\ell(f(x), y)}_{\text{prediction error}} \, \underbrace{p(x, y)}_{\text{data distribution}} \, dx \, dy$
- Improvement comes from seeing more data before settling on a hypothesis.
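As a concrete illustration of the risk functional $R(f)$, the sketch below approximates the expectation by an empirical average over samples. The toy distribution $p(x, y)$ (uniform input, sinusoidal dependence, Gaussian noise) is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy distribution p(x, y): X uniform on [0, 1],
# Y = sin(2*pi*X) + Gaussian noise with variance 0.01.
n = 100_000
X = rng.uniform(0.0, 1.0, size=n)
Y = np.sin(2 * np.pi * X) + rng.normal(0.0, 0.1, size=n)

# Risk of a candidate predictor f under squared loss,
# R(f) = E[(f(X) - Y)^2], approximated by an empirical average.
def f(x):
    return np.sin(2 * np.pi * x)

risk_estimate = np.mean((f(X) - Y) ** 2)
# For this f (the noiseless regression function), the risk is
# close to the noise variance E[W^2] = 0.01.
```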
Slide 5: Goals of Learning

$\underbrace{(X_1, Y_1), \ldots, (X_n, Y_n)}_{\text{training data}} \;\longrightarrow\; \text{algorithm} \;\longrightarrow\; \underbrace{\hat{f}_n}_{\text{hypothesis}}$

- Predictive consistency: $R(\hat{f}_n) \xrightarrow{n \to \infty} \min_{f \in \mathcal{F}} R(f)$; eventually, learn to predict as well as the best predictor you could find with full knowledge of $p(x, y)$.
- Generalization: $\frac{1}{n} \sum_{i=1}^n \ell(\hat{f}_n(X_i), Y_i) \approx R(\hat{f}_n)$; performance of $\hat{f}_n$ on the training data is a good predictor of its performance on previously unseen instances.
- Efficiency: minimize resource usage (data, computation time, memory, ...).
Slide 6: Learning and Inference Pipeline

- Learning: training samples -> features, combined with training labels -> training -> learned model
- Inference: test sample -> features -> learned model -> prediction

Image credit: Svetlana Lazebnik
Slide 7: Data Requirements

- Predictive consistency: for given $\varepsilon, \delta > 0$, what is the smallest $n$ such that $R(\hat{f}_n) - \min_{f \in \mathcal{F}} R(f) \le \varepsilon$ with probability at least $1 - \delta$?
- Generalization: for given $\varepsilon, \delta > 0$, what is the smallest $n$ such that $\left| \frac{1}{n} \sum_{i=1}^n \ell(\hat{f}_n(X_i), Y_i) - R(\hat{f}_n) \right| \le \varepsilon$ with probability at least $1 - \delta$?
- Efficiency: the gold standard is $n(\varepsilon, \delta) \sim \frac{1}{\varepsilon^2} \log \frac{1}{\delta}$, which follows from bounds of the form $\mathbb{P}[\text{error difference} \ge \varepsilon] \le c_1 e^{-c_2 n \varepsilon^2}$.
Slide 8: Underfitting and Overfitting

- Underfitting: not enough data; $R(\hat{f}_n) - \min_{f \in \mathcal{F}} R(f)$ is large.
- Overfitting: low training error but large test error; $R(\hat{f}_n) - \frac{1}{n} \sum_{i=1}^n \ell(\hat{f}_n(X_i), Y_i)$ is large.

[Figure: prediction error on the training and test samples vs. model complexity; low complexity gives high bias and low variance, high complexity gives low bias and high variance. Image credit: Hastie et al., The Elements of Statistical Learning]
Slide 9: Generative and Discriminative Models

$\underbrace{(X_1, Y_1), \ldots, (X_n, Y_n)}_{\text{training data}} \;\longrightarrow\; \text{algorithm} \;\longrightarrow\; \underbrace{\hat{f}_n}_{\text{hypothesis}}$

Recall: the distribution $p(x, y)$ is unknown.

Generative modeling: use the data to estimate $p$, then find the best predictor for $\hat{p}_n$:
$(X_1, Y_1), \ldots, (X_n, Y_n) \to \hat{p}_n(x, y) \to \hat{f}_n$
- Terminology: a generative model explicitly models how the input $x$ and the output $y$ are generated.
- Advantages: $\hat{p}_n$ can be used for other tasks besides prediction (e.g., exploring the data space).
- Disadvantages: inflated sample complexity, especially in high dimension.
Slide 10: Generative and Discriminative Models (cont'd)

Recall: the distribution $p(x, y)$ is unknown.

Discriminative modeling: focus on prediction only:
$(X_1, Y_1), \ldots, (X_n, Y_n) \to \hat{f}_n$
- Terminology: a discriminative model (implicitly) models the distribution of $y$ for each fixed $x$, making it possible to discriminate between different values of $y$.
- Advantages: smaller data requirements.
- Disadvantages: narrowly focused; will need retraining if the task changes.
Slide 11: Part II: Canonical ML Tasks and Algorithms
Slide 12: Supervised and Unsupervised Learning

- Prediction is an example of supervised learning: training data consist of input instances matched (or labeled) with output instances to guide the learning process.
- Many learning tasks are unsupervised: no labels are provided, and the goal is to learn some sort of structure in the data.
- Performance criteria are more vague and subjective than in supervised learning.
- Unsupervised learning is often an initial step in a more complicated learning and inference pipeline.
Slide 13: Unsupervised Learning

- Clustering: discover groups of similar data points.
- Manifold learning: discover low-dimensional structures.
- Density estimation: approximate the probability density of the data.

Unsupervised learning is often used at early stages of the ML pipeline.

Image credit: Svetlana Lazebnik
Slide 14: Supervised Learning: Classification

$(X, Y) \sim p$ (input/output)

- Classification: assign the input $X$ to one of $M$ classes.
- Example: spam detection.
- Criterion: probability of misclassification $R(f) = \mathbb{P}[f(X) \neq Y]$, so that
  $\ell(f(x), y) = \begin{cases} 0, & \text{if } f(x) = y \\ 1, & \text{if } f(x) \neq y \end{cases}$
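Under the 0-1 loss, the empirical risk is simply the fraction of misclassified examples. A minimal sketch, with made-up labels and predictions chosen only for illustration:

```python
import numpy as np

# Made-up true labels and predictions for a binary classifier.
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 0])

# 0-1 loss: l(f(x), y) = 0 if f(x) == y, else 1.
zero_one_loss = (y_pred != y_true).astype(float)

# Empirical risk = fraction of mistakes (2 out of 5 here).
empirical_risk = zero_one_loss.mean()
```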
Slide 15: Supervised Learning: Regression

$(X, Y) \sim p$ (input/output)

- Regression: predict a continuous output $Y$ given $X$.
- Example: curve fitting from noisy data. Consider the setting $Y = f_*(X) + W$, where $X$ is a random variable (r.v.) on $\mathcal{X} = [0, 1]$ and $W$ is a r.v. on $\mathcal{Y} = \mathbb{R}$, independent of $X$, with $\mathbb{E}[W] = 0$ and $\mathbb{E}[W^2] = \sigma^2 < \infty$. Let $f_* : [0, 1] \to \mathbb{R}$ be a function satisfying $|f_*(t) - f_*(s)| \le L |t - s|$ for all $t, s \in [0, 1]$, where $L > 0$ is a constant. A function satisfying this condition is said to be Lipschitz on $[0, 1]$; such a function must be continuous, but it is not necessarily differentiable.
  [Figure 1: example of a Lipschitz function and the observation setting. (a) random sampling: the points correspond to $(X_i, Y_i)$, $i = 1, \ldots, n$; (b) deterministic sampling of $f_*$: the points are $(i/n, Y_i)$, $i = 1, \ldots, n$.]
- Criterion: mean squared error $R(f) = \mathbb{E}[(f(X) - Y)^2]$, so that $\ell(f(x), y) = (f(x) - y)^2$.
Slide 16: Example 1: Kernel Methods

- Training data: $(X_1, Y_1), \ldots, (X_n, Y_n)$
- Look for $\hat{f}_n$ of the form $f_c(x) = \sum_{i=1}^n c_i K(x - X_i)$
- Coefficients $c_1, \ldots, c_n$ chosen to minimize $\sum_{i=1}^n (f_c(X_i) - Y_i)^2$ (subject to additional constraints)
- $K(x - x')$: potential field due to a unit charge at $x'$
- Advantage: easy to optimize.
- Disadvantage: need to store $c$ and the $X_i$'s to do inference.

Image credit: Aizerman, Braverman, Rozonoer
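A minimal sketch of this kind of kernel fit in NumPy. The Gaussian kernel, its bandwidth, the synthetic data, and the small ridge term (standing in for the "additional constraints" that keep the solve well conditioned) are all illustrative assumptions, not choices made in the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data: noisy samples of a smooth curve.
n = 50
X = np.sort(rng.uniform(-1.0, 1.0, size=n))
Y = np.cos(3 * X) + rng.normal(0.0, 0.1, size=n)

# Assumed Gaussian kernel K(x - x') = exp(-(x - x')^2 / (2 h^2)).
h = 0.2
def K(u):
    return np.exp(-u**2 / (2 * h**2))

# Choose c to minimize sum_i (f_c(X_i) - Y_i)^2, regularized by a
# small ridge term lam so the linear system is well conditioned.
G = K(X[:, None] - X[None, :])   # Gram matrix G_ij = K(X_i - X_j)
lam = 1e-3
c = np.linalg.solve(G + lam * np.eye(n), Y)

def f_c(x):
    return K(x[:, None] - X[None, :]) @ c

train_mse = np.mean((f_c(X) - Y) ** 2)
```

Note that inference with `f_c` needs both the coefficients `c` and all training inputs `X`, which is exactly the storage disadvantage mentioned on the slide.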
Slide 17: Example 2: Neural Networks and Deep Learning

A single unit computes $y = f(w_1 x_1 + \cdots + w_d x_d + b)$, where:
- $w_1, \ldots, w_d$ are tunable gains
- $b$ is a tunable bias
- $f(\cdot)$ is the activation function

Build up complicated functions by composing simple units. Examples of activation functions:
- threshold: $f(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}$
- sigmoid: $f(z) = \frac{1}{1 + e^{-z}}$
- tanh: $f(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
- rectified linear unit (ReLU): $f(z) = \max(z, 0)$

Advantage: neural networks are universal approximators for smooth functions $y = f(x)$.
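The single unit and the listed activation functions are easy to write down directly. A sketch in NumPy; the weights, bias, and input below are arbitrary illustrative values:

```python
import numpy as np

# The four activation functions from the slide.
def threshold(z):
    return (z >= 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(z, 0.0)

# A single unit: y = f(w_1 x_1 + ... + w_d x_d + b)
w = np.array([0.5, -0.25, 1.0])   # tunable gains (arbitrary values)
b = 0.1                            # tunable bias (arbitrary value)
x = np.array([1.0, 2.0, 0.5])     # example input

y = sigmoid(w @ x + b)             # here w @ x + b = 0.6
```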
Slide 18: Example 2: Neural Networks and Deep Learning (cont'd)

A deep network is a composition of layers:
$z_1 = f(A_1 x + b_1), \quad z_2 = f(A_2 z_1 + b_2), \quad \ldots, \quad y = f(A_k z_{k-1} + b_k)$

- Advantages: deep networks can efficiently represent rich classes of functions; they work extremely well in practice.
- Disadvantages: training is computationally intensive; poorly understood theoretically.
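The layer-by-layer composition above can be sketched as a forward pass. The layer widths and random weights below are arbitrary choices for illustration, not a trained network:

```python
import numpy as np

rng = np.random.default_rng(2)

def relu(z):
    return np.maximum(z, 0.0)

# Composition z1 = f(A1 x + b1), z2 = f(A2 z1 + b2), y = A3 z2 + b3,
# with assumed layer widths 3 -> 8 -> 8 -> 1 and random weights.
sizes = [3, 8, 8, 1]
params = [(rng.normal(0.0, 0.5, size=(m, k)), np.zeros(m))
          for k, m in zip(sizes[:-1], sizes[1:])]

def forward(x):
    z = x
    for i, (A, b) in enumerate(params):
        z = A @ z + b
        if i < len(params) - 1:   # keep the output layer linear
            z = relu(z)
    return z

y = forward(np.array([1.0, -1.0, 0.5]))
```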
Slide 19: Part III: Interaction between Data and Algorithms
Slide 20: Recap: Learning and Inference Pipeline

- Learning: training samples -> features, combined with training labels -> training -> learned model
- Inference: test sample -> features -> learned model -> prediction

What happens before the training step?
Slide 21: Feature Extraction and Dimensionality Reduction

Which features to use?
- Traditionally, this step relied on extensive expert knowledge: hand-engineered features tailored to the task at hand.
- Deep learning aims to automate this task (think of each layer as performing feature extraction).

Dimensionality reduction:
- Use unsupervised learning to discover low-dimensional structure in the data, for example principal component analysis (PCA).
Slide 22: Principal Component Analysis (PCA)

- Compute the sample covariance matrix
  $\hat{\Sigma}_n = \frac{1}{n} \sum_{i=1}^n (X_i - \hat{\mu}_n)(X_i - \hat{\mu}_n)^T$, where $\hat{\mu}_n = \frac{1}{n} \sum_{i=1}^n X_i$ is the sample mean.
- Eigendecomposition: $\hat{\Sigma}_n = U D U^T$
- Retain the top $k$ eigenvectors.

Image source: MathWorks
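The three steps above translate directly into NumPy. The synthetic 2-D data (stretched along the first axis so the principal direction is known) are an assumption made for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 2-D data stretched along the first coordinate axis.
n = 500
X = rng.normal(size=(n, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])

mu = X.mean(axis=0)                  # sample mean
Xc = X - mu
Sigma = Xc.T @ Xc / n                # sample covariance matrix
eigvals, U = np.linalg.eigh(Sigma)   # eigendecomposition (ascending eigenvalues)

k = 1
top_k = U[:, ::-1][:, :k]            # retain the top-k eigenvectors
X_reduced = Xc @ top_k               # project onto the principal subspace
```

For this data, the leading eigenvector should be nearly aligned with the first coordinate axis.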
Slide 23: What To Do If You Don't Have Enough Data? Bootstrap

- Training data: $(X_1, Y_1), \ldots, (X_n, Y_n)$
- If $n$ is small, there is a risk of underfitting.
- Bootstrap is a resampling method:

      for j = 1 to m
          draw I(j) uniformly at random from {1, ..., n}
          let Z_j = (X_{I(j)}, Y_{I(j)})
      end for
      return Z_1, Z_2, ..., Z_m

- Bootstrap can be used to get a good estimate of error bars for a chosen class of models.
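The pseudocode above translates directly to NumPy (with 0-based indices). As a sketch of the error-bars use, it is applied here to a deliberately simple statistic, the mean of Y, on made-up data:

```python
import numpy as np

rng = np.random.default_rng(4)

def bootstrap_sample(X, Y, m, rng):
    """Draw m indices I(j) uniformly with replacement from {0, ..., n-1}."""
    idx = rng.integers(0, len(X), size=m)
    return X[idx], Y[idx]

# Made-up training data (illustrative assumption).
X = np.arange(20, dtype=float)
Y = 2.0 * X + rng.normal(0.0, 1.0, size=20)

# Error bars via the bootstrap: resample many times, recompute the
# statistic on each resample, and look at its spread.
boot_means = np.array([bootstrap_sample(X, Y, m=len(X), rng=rng)[1].mean()
                       for _ in range(1000)])
std_err = boot_means.std()   # bootstrap estimate of the standard error
```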
Slide 24: Gradient Descent and Its Relatives

Training is an optimization problem: minimize
$L(w) = \frac{1}{n} \sum_{i=1}^n (Y_i - f(w, X_i))^2$
e.g., $x \mapsto f(w, x)$ is a deep neural network, so $w$ is a (very) high-dimensional vector.

- Gradient descent: $w_{t+1} = w_t - \eta_t \nabla L(w_t)$, where $\eta_t$ is the learning rate.
- Backpropagation: use the chain rule to compute gradients efficiently.
- Computing $\nabla L(w)$ using backprop takes $O(nkm)$ operations, where $k$ is the number of layers and $m$ is the number of connections per layer.
- Still computationally expensive: requires a full pass through the training data at each iteration.
Slide 25: Stochastic Gradient Descent

Gradient descent: $w_{t+1} = w_t - \eta_t \nabla L(w_t)$, with
$\nabla L(w) = \frac{1}{n} \sum_{i=1}^n \nabla_w (Y_i - f(w, X_i))^2$

Stochastic gradient with mini-batches (essentially, bootstrap):

      for t = 1, 2, ...
          draw I(1, t), ..., I(b, t) with replacement from {1, ..., n}
          $w_{t+1} = w_t - \eta_t \frac{1}{b} \sum_{j=1}^b \nabla_w (Y_{I(j,t)} - f(w_t, X_{I(j,t)}))^2$
      end for

Computational-statistical trade-offs:
- Each iteration takes $O(bkm)$ operations instead of $O(nkm)$.
- Gradient error is now $\sim 1/b$ instead of $\sim 1/n$.
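The mini-batch loop above can be sketched for the simplest case, a linear model $f(w, x) = w^\top x$, where the gradient has a closed form. The data, batch size, learning rate, and iteration count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic regression data: linear model plus additive noise.
n, d, b = 1000, 3, 32
w_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
Y = X @ w_true + rng.normal(0.0, 0.1, size=n)

w = np.zeros(d)
eta = 0.05                                     # constant learning rate
for t in range(2000):
    idx = rng.integers(0, n, size=b)           # mini-batch with replacement
    Xb, Yb = X[idx], Y[idx]
    # Gradient of (1/b) sum_j (Y_j - w . X_j)^2 for the linear model:
    grad = (2.0 / b) * Xb.T @ (Xb @ w - Yb)
    w = w - eta * grad
```

After the loop, `w` should sit close to `w_true`, fluctuating within the noise floor set by the mini-batch gradient variance.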
Slide 26: Part IV: Machine Learning for EDA
Slide 27: Main Challenges for EDA

- Training data are waveforms $X = (X(t))_{0 \le t \le T}$, $Y = (Y(t))_{0 \le t \le T}$; e.g., $X$ is voltage, $Y$ is current.
- Many different types of models/analysis: steady-state, transient, small-signal, ...
- How do we represent waveform data? The sampling rate affects the quality of learned models; different features are useful for different analyses (rise/hold time; frequency/phase; duration; ...).
- How can we optimize the measurement collection process?
Slide 28: Recurrent Neural Networks

[Diagram: an RNN unrolled in time, with inputs $x_1, \ldots, x_T$, outputs $y_1, \ldots, y_T$, and states $\xi_0, \xi_1, \ldots, \xi_T$ passed between successive copies of $f$.]

Nonlinear state-space model:
$\xi_t = g(A \xi_{t-1} + B x_t)$
$y_t = h(C \xi_t + D x_t)$

- Physics-based models of nonlinear electronic devices are state-space models.
- RNNs are universal approximators for such systems.
- States (internal variables) can be discovered automatically, without hand tuning.
Slide 29: Learning RNN Models of Nonlinear Devices

Data: sampled waveforms ($\tau$ is the sampling period, $T$ is the number of samples):
$X_i = (X_i(0), X_i(\tau), \ldots, X_i((T-1)\tau))$
$Y_i = (Y_i(0), Y_i(\tau), \ldots, Y_i((T-1)\tau))$, for $i = 1, \ldots, n$

- The RNN is a $T$-layer NN, with parameters shared by all neurons.
- Training by SGD with Backpropagation Through Time (BPTT).
- Output: weight matrices $A, B, C, D$ and the discrete-time approximation
  $\xi_{k+1} = g(A \xi_k + B x_k)$
  $y_k = h(C \xi_k + D x_k)$
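A sketch of simulating the resulting discrete-time model. The matrices $A, B, C, D$ below are small random illustrative values (scaled so the recursion stays stable), not a trained device model, and the input is an arbitrary sinusoidal stimulus:

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative (untrained) weight matrices; a real model would come
# out of SGD + BPTT as described above.
state_dim, in_dim, out_dim = 4, 1, 1
A = 0.1 * rng.normal(size=(state_dim, state_dim))
B = rng.normal(size=(state_dim, in_dim))
C = rng.normal(size=(out_dim, state_dim))
D = rng.normal(size=(out_dim, in_dim))

g = np.tanh          # state nonlinearity
def h(z):            # readout nonlinearity (identity here)
    return z

def simulate(x_seq):
    """Run xi_{k+1} = g(A xi_k + B x_k), y_k = h(C xi_k + D x_k)."""
    xi = np.zeros(state_dim)
    ys = []
    for x in x_seq:
        ys.append(h(C @ xi + D @ x))   # y_k uses the current state xi_k
        xi = g(A @ xi + B @ x)         # then advance to xi_{k+1}
    return np.array(ys)

T = 100                                # number of samples
x_seq = np.sin(2 * np.pi * 0.05 * np.arange(T))[:, None]
y_seq = simulate(x_seq)
```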
Slide 30: Main Challenges

- Which stimuli to use? The learned model may not be equally suitable for different types of analysis (transient vs. steady-state).
- The learned model should be stable and plug-and-play with Verilog-A.
- Enforcing physical constraints: zero in/zero out, symmetry, polarity, etc.
- Usual issues with BPTT: vanishing/exploding gradients, etc.
Slide 31: Capturing Variability: Generative Modeling

- Device model: $y = g(x, \beta)$, where $x$ is the input, $y$ is the output, and $\beta$ are device parameters.
- Variability: $\beta$ is a random quantity that varies from device to device, yet cannot be sampled directly.
- Data: sample input-output behavior from many devices of the same type.
- Learn a generative model for $(x, y)$:
  - use a deep NN to learn complicated functions mapping explicit design parameters to latent variability factors
  - maximum likelihood via SGD with backprop
  - bootstrap can be used to compensate for lack of data
- Disadvantage: training takes time.
- Advantage: once learned, the model is easy to sample from and can be used to explore variability, 3σ and typical behavior, etc.
Slide 32: Compositional ML

Training and inference/simulation pipeline:
training data -> learned model { $\dot{\xi} = f(\xi, x)$, $y = g(\xi, x)$ } -> Verilog-A simulation

- Theoretical and practical challenge: optimize end-to-end performance.
- The learned model should be internally stable: enforce stability during training via appropriate penalty functions.
- The learned model should be ready for Verilog-A: the RNN outputs a discrete-time model, while Verilog-A uses adaptive time discretization to simulate behavior; a deep understanding of numerical methods for ODEs is needed to ensure end-to-end stability and robustness.
Slide 33: For more questions:
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationIntroduction to (Convolutional) Neural Networks
Introduction to (Convolutional) Neural Networks Philipp Grohs Summer School DL and Vis, Sept 2018 Syllabus 1 Motivation and Definition 2 Universal Approximation 3 Backpropagation 4 Stochastic Gradient
More informationCSC321 Lecture 8: Optimization
CSC321 Lecture 8: Optimization Roger Grosse Roger Grosse CSC321 Lecture 8: Optimization 1 / 26 Overview We ve talked a lot about how to compute gradients. What do we actually do with them? Today s lecture:
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationLinear Models for Regression
Linear Models for Regression CSE 4309 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 The Regression Problem Training data: A set of input-output
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationNeural Networks. Prof. Dr. Rudolf Kruse. Computational Intelligence Group Faculty for Computer Science
Neural Networks Prof. Dr. Rudolf Kruse Computational Intelligence Group Faculty for Computer Science kruse@iws.cs.uni-magdeburg.de Rudolf Kruse Neural Networks 1 Supervised Learning / Support Vector Machines
More informationCSC321 Lecture 2: Linear Regression
CSC32 Lecture 2: Linear Regression Roger Grosse Roger Grosse CSC32 Lecture 2: Linear Regression / 26 Overview First learning algorithm of the course: linear regression Task: predict scalar-valued targets,
More informationESS2222. Lecture 4 Linear model
ESS2222 Lecture 4 Linear model Hosein Shahnas University of Toronto, Department of Earth Sciences, 1 Outline Logistic Regression Predicting Continuous Target Variables Support Vector Machine (Some Details)
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationImportance Reweighting Using Adversarial-Collaborative Training
Importance Reweighting Using Adversarial-Collaborative Training Yifan Wu yw4@andrew.cmu.edu Tianshu Ren tren@andrew.cmu.edu Lidan Mu lmu@andrew.cmu.edu Abstract We consider the problem of reweighting a
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationMachine Learning. Boris
Machine Learning Boris Nadion boris@astrails.com @borisnadion @borisnadion boris@astrails.com astrails http://astrails.com awesome web and mobile apps since 2005 terms AI (artificial intelligence)
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationRegularization via Spectral Filtering
Regularization via Spectral Filtering Lorenzo Rosasco MIT, 9.520 Class 7 About this class Goal To discuss how a class of regularization methods originally designed for solving ill-posed inverse problems,
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 17: Stochastic Optimization Part II: Realizable vs Agnostic Rates Part III: Nearest Neighbor Classification Stochastic
More information