Beyond finite layer neural network
|
|
- Percival Townsend
- 6 years ago
- Views:
Transcription
1 Beyond finite layer neural network Bridging Numerical Dynamic System And Deep Neural Networks arxiv: Joint work with Bin Dong, Quanzheng Li, Aoxiao Zhong Yiping Lu Peiking University School Of Mathematical Science
2 Depth Revolution
3 Motivation Deep Residual x t = f(x) x n+1 = x n + f(x n ) Forward Euler Scheme Weinan E. A Proposal on Machine Learning via Dynamical Systems.
4 Previous Works learn a diffusion process for denoising Chen Y, Yu W, Pock T. On learning optimized reaction diffusion processes for effective image restoration CVPR2015
5 Depth Revolution Going into infinite layer Differential Equation As Infinite Layer Neural Network
6 Revisiting previous efforts in deep learning, we found that diversity, another aspect in network design that is relatively less explored, also plays a significant role PolyStrure: x n+1 = x n + F x n + F(F x n ) Backward Euler Scheme: x n+1 = x n + F x n+1 x n+1 = I F 1 x n (b) Polynet Approximate the operator I F 1 by I + F + F 2 + Zhang X, Li Z, Loy C C, et al. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
7 fc fc fc Runge-Kutta Scheme(2order) x n+1 = k 1 x n + k 2 (k 3 x n + f 1 x n ) + f 2 (k 3 x n + f 1 x n ) Larsson G, Maire M, Shakhnarovich G. FractalNet: Ultra-Deep Neural Networks without Residuals.
8 PDE: Infinite Layer Neural Network Dynamic System Continuous limit Nueral Network Numerical Approximation WRN, ResNeXt, Inception-ResNet, PolyNet, SENet etc : New scheme to Approximate the right hand side term Why not change the way to discrete u_t?
9 Multi-step Residual Network x t = f(x) x n+1 = x n + f(x n )
10 Multi-step Residual Network Linear Multi-step Scheme x t = f(x) x n+1 = (1 k n )x n + k n x n 1 + f(x n ) x n+1 = x n + f(x n ) Linear Multi-step Residual Network
11 Scale k Scale 1-k (a) ResNet (b)linear Multi-step ResNet
12 Scale k Scale 1-k Only One More Parameter (a) ResNet (b)linear Multi-step ResNet
13 Multi-step Residual Network (a)resnet (b)lm-resnet
14 Multi-step Residual Network
15 Explanation on the performance boost via modified Multi-step Residual Network ResNet x n+1 = x n + Δtf(x n ) u + Δt 2 u n = f(u) LM-ResNet x n+1 = (1 k n )x n +k n x n 1 + Δtf(x n ) 1 + k n u + 1 k n Δt 2 u n = f(u) [1] Dong B, Jiang Q, Shen Z. Image restoration: wavelet frame shrinkage, nonlinear evolution PDEs, and beyond. Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal [2] Su W, Boyd S, Candes E J. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights. Advances in Neural Information Processing Systems, [3] A. Wibisono, A. Wilson, and M. I. Jordan. A variational perspective on accelerated methods in optimizationproceedings of the National Academy of Sciences 2016.
16 Plot The Multi-step Residual Network x n+1 = (1 k n )x n +k n x n 1 + Δtf(x n ) Learn A Momentum 1 + k n u + 1 k n Δt 2 u n + o Δt 3 = f(u)
17 Plot The Multi-step Residual Network x n+1 = (1 k n )x n +k n x n 1 + Δtf(x n ) Learn A Momentum 1 + k n u + 1 k n Δt 2 u n + o Δt 3 = f(u)
18 Bridge the stochastic dynamic Noise can avoid overfit? Dynamic System
19 Previous Works Shake-Shake regularization x n+1 = x n + ηf 1 x + 1 η f 2 x, η U 0, 1 = x n + f 2 x n f 1 x n f 2 x n + (η 1 2 ) f 1 x n f 2 x n Apply data augmentation techniques to internal representations. Gastaldi X. Shake-Shake regularization. ICLR Workshop Track2017.
20 Previous Works Deep Networks with Stochastic Depth x n+1 = x n + η n f x = x n + Eη n f x n + η n Eη n f(x n ) To reduce the effective length of a neural network during training, we randomly skip layers entirely. Huang G, Sun Y, Liu Z, et al. Deep Networks with Stochastic Depth ECCV2016.
21 Bridge the stochastic control Noise can avoid overfit? X t = f X t, a t + g(x t, t)db t, X 0 = X 0 The numerical scheme is only need to be weak ergence!
22 Previous Works Deep Networks with Stochastic Depth x n+1 = x n + η n f x = x n + Eη n f x n + η n Eη n f(x n ) We need 1 2p n = O( Δt) To reduce the effective length of a neural network during training, we randomly skip layers entirely. Huang G, Sun Y, Liu Z, et al. Deep Networks with Stochastic Depth ECCV2016.
23 1 + k n u + 1 k n Δt 2 u n + o Δt 3 = f u + g u dw t Scale k Scale 1-k Stochastic Strategy As Previous (a) ResNet (b)linear Multi-step ResNet
24 Multi-step Residual Network
25 Finite Layer Neural Network Neural Network Dynamic System Stochastic Learning Stochastic Dynamic System New Discretization Original One: LM-Resnet56 Beats Resnet110 Modified Equation LM-ResNet Stochastic Depth One: LM-Resnet110 Beats Resnet1202
26 Thanks For Attention And Question? Lu Y, Zhong A, Li Q, et al. Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations arxiv:
FreezeOut: Accelerate Training by Progressively Freezing Layers
FreezeOut: Accelerate Training by Progressively Freezing Layers Andrew Brock, Theodore Lim, & J.M. Ritchie School of Engineering and Physical Sciences Heriot-Watt University Edinburgh, UK {ajb5, t.lim,
More informationTwo at Once: Enhancing Learning and Generalization Capacities via IBN-Net
Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net Supplementary Material Xingang Pan 1, Ping Luo 1, Jianping Shi 2, and Xiaoou Tang 1 1 CUHK-SenseTime Joint Lab, The Chinese University
More informationA Conservation Law Method in Optimization
A Conservation Law Method in Optimization Bin Shi Florida International University Tao Li Florida International University Sundaraja S. Iyengar Florida International University Abstract bshi1@cs.fiu.edu
More informationarxiv: v2 [stat.ml] 18 Jun 2017
FREEZEOUT: ACCELERATE TRAINING BY PROGRES- SIVELY FREEZING LAYERS Andrew Brock, Theodore Lim, & J.M. Ritchie School of Engineering and Physical Sciences Heriot-Watt University Edinburgh, UK {ajb5, t.lim,
More informationSHAKE-SHAKE REGULARIZATION OF 3-BRANCH
SHAKE-SHAKE REGULARIZATION OF 3-BRANCH RESIDUAL NETWORKS Xavier Gastaldi xgastaldi.mba2011@london.edu ABSTRACT The method introduced in this paper aims at helping computer vision practitioners faced with
More informationarxiv: v2 [cs.cv] 1 Nov 2017
BEYOND FINITE LAYER NEURAL NETWORKS: BRIDGING DEEP ARCHITECTURES AND NUMERICAL DIFFERENTIAL EQUATIONS arxiv:1710.10121v2 [cs.cv] 1 Nov 2017 Yiping Lu School of Mathematical Sciences Peking University,
More informationConvolutional Neural Network Architecture
Convolutional Neural Network Architecture Zhisheng Zhong Feburary 2nd, 2018 Zhisheng Zhong Convolutional Neural Network Architecture Feburary 2nd, 2018 1 / 55 Outline 1 Introduction of Convolution Motivation
More informationPDE-NET: LEARNING PDES FROM DATA
PDE-NET: LEARNING PDES FROM DATA Anonymous authors Paper under double-blind review ABSTRACT Partial differential equations (PDEs play a prominent role in many disciplines such as applied mathematics, physics,
More informationThe Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems
The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems Weinan E 1 and Bing Yu 2 arxiv:1710.00211v1 [cs.lg] 30 Sep 2017 1 The Beijing Institute of Big Data Research,
More informationarxiv: v2 [math.na] 1 Jan 2018
PDE-NET: LEARNING PDES FROM DATA Zichao Long, Yiping Lu School of Mathematical Sciences Peking University, Beijing, China {zlong,luyiping9712}@pku.edu.cn Xianzhong Ma School of Mathematical Sciences, Peking
More informationBEYOND FINITE LAYER NEURAL NETWORKS: BRIDGING DEEP ARCHITECTURES AND NUMERICAL DIFFERENTIAL EQUATIONS
BEYOND FINITE LAYER NEURAL NETWORKS: BRIDGING DEEP ARCHITECTURES AND NUMERICAL DIFFERENTIAL EQUATIONS Anonymous authors Paper under double-blind review ABSTRACT Deep neural networks have become the state-of-the-art
More informationBeyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations
: Bridging Deep Architectures and Numerical Differential Equations Yiping Lu 1 Aoxiao Zhong 2 Quanzheng Li 2 3 4 Bin Dong 5 6 4 Abstract Deep neural networks have become the stateof-the-art models in numerous
More informationarxiv: v2 [cs.lg] 23 May 2017
Shake-Shake regularization Xavier Gastaldi xgastaldi.mba2011@london.edu arxiv:1705.07485v2 [cs.lg] 23 May 2017 Abstract The method introduced in this paper aims at helping deep learning practitioners faced
More informationDATA DRIVEN METHOD A MATHEMATICS VIEW POINT. Lecture1 Introduction Yiping Lu Peking University
DATA DRIVEN METHOD A MATHEMATICS VIEW POINT Lecture1 Introduction Yiping Lu Peking University MACHINE LEARNING LEARNING Supervised learning: Descriptive, Diagnostic, Discovery, Predictive, Prescriptive
More informationPDE-Net: Learning PDEs from Data
Zichao Long 1 Yiping Lu 1 Xianzhong Ma 1 2 Bin Dong 3 4 5 Abstract Partial differential equations (PDEs) play a prominent role in many disciplines of science and engineering. PDEs are commonly derived
More informationSparse Approximation: from Image Restoration to High Dimensional Classification
Sparse Approximation: from Image Restoration to High Dimensional Classification Bin Dong Beijing International Center for Mathematical Research Beijing Institute of Big Data Research Peking University
More informationCSE 591: Introduction to Deep Learning in Visual Computing. - Parag S. Chandakkar - Instructors: Dr. Baoxin Li and Ragav Venkatesan
CSE 591: Introduction to Deep Learning in Visual Computing - Parag S. Chandakkar - Instructors: Dr. Baoxin Li and Ragav Venkatesan Overview Background Why another network structure? Vanishing and exploding
More informationNormalization Techniques in Training of Deep Neural Networks
Normalization Techniques in Training of Deep Neural Networks Lei Huang ( 黄雷 ) State Key Laboratory of Software Development Environment, Beihang University Mail:huanglei@nlsde.buaa.edu.cn August 17 th,
More informationA Second-Order Path-Following Algorithm for Unconstrained Convex Optimization
A Second-Order Path-Following Algorithm for Unconstrained Convex Optimization Yinyu Ye Department is Management Science & Engineering and Institute of Computational & Mathematical Engineering Stanford
More informationDeep Learning Year in Review 2016: Computer Vision Perspective
Deep Learning Year in Review 2016: Computer Vision Perspective Alex Kalinin, PhD Candidate Bioinformatics @ UMich alxndrkalinin@gmail.com @alxndrkalinin Architectures Summary of CNN architecture development
More informationarxiv: v4 [cs.cv] 6 Sep 2017
Deep Pyramidal Residual Networks Dongyoon Han EE, KAIST dyhan@kaist.ac.kr Jiwhan Kim EE, KAIST jhkim89@kaist.ac.kr Junmo Kim EE, KAIST junmo.kim@kaist.ac.kr arxiv:1610.02915v4 [cs.cv] 6 Sep 2017 Abstract
More informationTutorial on: Optimization I. (from a deep learning perspective) Jimmy Ba
Tutorial on: Optimization I (from a deep learning perspective) Jimmy Ba Outline Random search v.s. gradient descent Finding better search directions Design white-box optimization methods to improve computation
More informationNonlinear Models. Numerical Methods for Deep Learning. Lars Ruthotto. Departments of Mathematics and Computer Science, Emory University.
Nonlinear Models Numerical Methods for Deep Learning Lars Ruthotto Departments of Mathematics and Computer Science, Emory University Intro 1 Course Overview Intro 2 Course Overview Lecture 1: Linear Models
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationA Unified Approach to Proximal Algorithms using Bregman Distance
A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department
More informationA Wavelet Neural Network Forecasting Model Based On ARIMA
A Wavelet Neural Network Forecasting Model Based On ARIMA Wang Bin*, Hao Wen-ning, Chen Gang, He Deng-chao, Feng Bo PLA University of Science &Technology Nanjing 210007, China e-mail:lgdwangbin@163.com
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More informationDeep Residual. Variations
Deep Residual Network and Its Variations Diyu Yang (Originally prepared by Kaiming He from Microsoft Research) Advantages of Depth Degradation Problem Possible Causes? Vanishing/Exploding Gradients. Overfitting
More informationDeep Learning: Approximation of Functions by Composition
Deep Learning: Approximation of Functions by Composition Zuowei Shen Department of Mathematics National University of Singapore Outline 1 A brief introduction of approximation theory 2 Deep learning: approximation
More informationIntroduction to Neural Networks
CUONG TUAN NGUYEN SEIJI HOTTA MASAKI NAKAGAWA Tokyo University of Agriculture and Technology Copyright by Nguyen, Hotta and Nakagawa 1 Pattern classification Which category of an input? Example: Character
More informationData Mining & Machine Learning
Data Mining & Machine Learning CS57300 Purdue University March 1, 2018 1 Recap of Last Class (Model Search) Forward and Backward passes 2 Feedforward Neural Networks Neural Networks: Architectures 2-layer
More informationCSCI567 Machine Learning (Fall 2014)
CSCI567 Machine Learning (Fall 24) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu October 2, 24 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 24) October 2, 24 / 24 Outline Review
More informationGeneralized Zero-Shot Learning with Deep Calibration Network
Generalized Zero-Shot Learning with Deep Calibration Network Shichen Liu, Mingsheng Long, Jianmin Wang, and Michael I.Jordan School of Software, Tsinghua University, China KLiss, MOE; BNRist; Research
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationVery Deep Residual Networks with Maxout for Plant Identification in the Wild Milan Šulc, Dmytro Mishkin, Jiří Matas
Very Deep Residual Networks with Maxout for Plant Identification in the Wild Milan Šulc, Dmytro Mishkin, Jiří Matas Center for Machine Perception Department of Cybernetics Faculty of Electrical Engineering
More informationSupport Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017
Support Vector Machines: Training with Stochastic Gradient Descent Machine Learning Fall 2017 1 Support vector machines Training by maximizing margin The SVM objective Solving the SVM optimization problem
More informationDeep Learning for Partial Differential Equations (PDEs)
Deep Learning for Partial Differential Equations (PDEs) Kailai Xu kailaix@stanford.edu Bella Shi bshi@stanford.edu Shuyi Yin syin3@stanford.edu Abstract Partial differential equations (PDEs) have been
More informationIntroduction to Convolutional Neural Networks 2018 / 02 / 23
Introduction to Convolutional Neural Networks 2018 / 02 / 23 Buzzword: CNN Convolutional neural networks (CNN, ConvNet) is a class of deep, feed-forward (not recurrent) artificial neural networks that
More informationResearch Article A Two-Grid Method for Finite Element Solutions of Nonlinear Parabolic Equations
Abstract and Applied Analysis Volume 212, Article ID 391918, 11 pages doi:1.1155/212/391918 Research Article A Two-Grid Method for Finite Element Solutions of Nonlinear Parabolic Equations Chuanjun Chen
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationStochastic Optimization: First order method
Stochastic Optimization: First order method Taiji Suzuki Tokyo Institute of Technology Graduate School of Information Science and Engineering Department of Mathematical and Computing Sciences JST, PRESTO
More informationNonlinear Optimization Methods for Machine Learning
Nonlinear Optimization Methods for Machine Learning Jorge Nocedal Northwestern University University of California, Davis, Sept 2018 1 Introduction We don t really know, do we? a) Deep neural networks
More informationCOMPARING FIXED AND ADAPTIVE COMPUTATION TIME FOR RE-
Workshop track - ICLR COMPARING FIXED AND ADAPTIVE COMPUTATION TIME FOR RE- CURRENT NEURAL NETWORKS Daniel Fojo, Víctor Campos, Xavier Giró-i-Nieto Universitat Politècnica de Catalunya, Barcelona Supercomputing
More informationCS260: Machine Learning Algorithms
CS260: Machine Learning Algorithms Lecture 4: Stochastic Gradient Descent Cho-Jui Hsieh UCLA Jan 16, 2019 Large-scale Problems Machine learning: usually minimizing the training loss min w { 1 N min w {
More informationCredit Assignment: Beyond Backpropagation
Credit Assignment: Beyond Backpropagation Yoshua Bengio 11 December 2016 AutoDiff NIPS 2016 Workshop oo b s res P IT g, M e n i arn nlin Le ain o p ee em : D will r G PLU ters p cha k t, u o is Deep Learning
More informationDon t Decay the Learning Rate, Increase the Batch Size. Samuel L. Smith, Pieter-Jan Kindermans, Quoc V. Le December 9 th 2017
Don t Decay the Learning Rate, Increase the Batch Size Samuel L. Smith, Pieter-Jan Kindermans, Quoc V. Le December 9 th 2017 slsmith@ Google Brain Three related questions: What properties control generalization?
More informationarxiv: v2 [cs.cv] 12 Apr 2016
arxiv:1603.05027v2 [cs.cv] 12 Apr 2016 Identity Mappings in Deep Residual Networks Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun Microsoft Research Abstract Deep residual networks [1] have emerged
More informationMollifying Networks. ICLR,2017 Presenter: Arshdeep Sekhon & Be
Mollifying Networks Caglar Gulcehre 1 Marcin Moczulski 2 Francesco Visin 3 Yoshua Bengio 1 1 University of Montreal, 2 University of Oxford, 3 Politecnico di Milano ICLR,2017 Presenter: Arshdeep Sekhon
More informationQuantum Convolutional Neural Networks
Quantum Convolutional Neural Networks Iris Cong Soonwon Choi Mikhail D. Lukin arxiv:1810.03787 Berkeley Quantum Information Seminar October 16 th, 2018 Why quantum machine learning? Machine learning: interpret
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationDynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji
Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient
More informationwhat can deep learning learn from linear regression? Benjamin Recht University of California, Berkeley
what can deep learning learn from linear regression? Benjamin Recht University of California, Berkeley Collaborators Joint work with Samy Bengio, Moritz Hardt, Michael Jordan, Jason Lee, Max Simchowitz,
More informationCh.6 Deep Feedforward Networks (2/3)
Ch.6 Deep Feedforward Networks (2/3) 16. 10. 17. (Mon.) System Software Lab., Dept. of Mechanical & Information Eng. Woonggy Kim 1 Contents 6.3. Hidden Units 6.3.1. Rectified Linear Units and Their Generalizations
More informationNumerical Solutions of ODEs by Gaussian (Kalman) Filtering
Numerical Solutions of ODEs by Gaussian (Kalman) Filtering Hans Kersting joint work with Michael Schober, Philipp Hennig, Tim Sullivan and Han C. Lie SIAM CSE, Atlanta March 1, 2017 Emmy Noether Group
More informationBACKPROPAGATION. Neural network training optimization problem. Deriving backpropagation
BACKPROPAGATION Neural network training optimization problem min J(w) w The application of gradient descent to this problem is called backpropagation. Backpropagation is gradient descent applied to J(w)
More informationLearning MMSE Optimal Thresholds for FISTA
MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Learning MMSE Optimal Thresholds for FISTA Kamilov, U.; Mansour, H. TR2016-111 August 2016 Abstract Fast iterative shrinkage/thresholding algorithm
More informationLearning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I
Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I What We Did The Machine Learning Zoo Moving Forward M Magdon-Ismail CSCI 4100/6100 recap: Three Learning Principles Scientist 2
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationA random perturbation approach to some stochastic approximation algorithms in optimization.
A random perturbation approach to some stochastic approximation algorithms in optimization. Wenqing Hu. 1 (Presentation based on joint works with Chris Junchi Li 2, Weijie Su 3, Haoyi Xiong 4.) 1. Department
More informationAdvanced computational methods X Selected Topics: SGD
Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationGraph based Subspace Segmentation. Canyi Lu National University of Singapore Nov. 21, 2013
Graph based Subspace Segmentation Canyi Lu National University of Singapore Nov. 21, 2013 Content Subspace Segmentation Problem Related Work Sparse Subspace Clustering (SSC) Low-Rank Representation (LRR)
More informationarxiv: v1 [cs.cv] 18 May 2018
Norm-Preservation: Why Residual Networks Can Become Extremely Deep? arxiv:85.7477v [cs.cv] 8 May 8 Alireza Zaeemzadeh University of Central Florida zaeemzadeh@eecs.ucf.edu Nazanin Rahnavard University
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More informationMachine Learning (CSE 446): Backpropagation
Machine Learning (CSE 446): Backpropagation Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 8, 2017 1 / 32 Neuron-Inspired Classifiers correct output y n L n loss hidden units
More informationStochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations
Stochastic Optimization for Deep CCA via Nonlinear Orthogonal Iterations Weiran Wang Toyota Technological Institute at Chicago * Joint work with Raman Arora (JHU), Karen Livescu and Nati Srebro (TTIC)
More informationarxiv: v5 [cs.cv] 19 May 2018
Convolutional Neural Networks combined with Runge-Kutta Methods arxiv:1802.08831v5 [cs.cv] 19 May 2018 Mai Zhu Northeastern University Shenyang, Liaoning, China zhumai@stumail.neu.edu.cn Bo Chang University
More informationCS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning
CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning Lei Lei Ruoxuan Xiong December 16, 2017 1 Introduction Deep Neural Network
More informationA Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley
A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study
More informationOptimization for Training I. First-Order Methods Training algorithm
Optimization for Training I First-Order Methods Training algorithm 2 OPTIMIZATION METHODS Topics: Types of optimization methods. Practical optimization methods breakdown into two categories: 1. First-order
More informationPrimal-dual algorithms for the sum of two and three functions 1
Primal-dual algorithms for the sum of two and three functions 1 Ming Yan Michigan State University, CMSE/Mathematics 1 This works is partially supported by NSF. optimization problems for primal-dual algorithms
More informationExplaining Machine Learning Decisions
Explaining Machine Learning Decisions Grégoire Montavon, TU Berlin Joint work with: Wojciech Samek, Klaus-Robert Müller, Sebastian Lapuschkin, Alexander Binder 18/09/2018 Intl. Workshop ML & AI, Telecom
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationLecture 7 Convolutional Neural Networks
Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 17, 2017 We saw before: ŷ x 1 x 2 x 3 x 4 A series of matrix multiplications:
More informationEnforcing constraints for interpolation and extrapolation in Generative Adversarial Networks
Enforcing constraints for interpolation and extrapolation in Generative Adversarial Networks Panos Stinis (joint work with T. Hagge, A.M. Tartakovsky and E. Yeung) Pacific Northwest National Laboratory
More information4. Multilayer Perceptrons
4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output
More informationMachine Learning Techniques
Machine Learning Techniques ( 機器學習技法 ) Lecture 15: Matrix Factorization Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University
More informationADMM and Accelerated ADMM as Continuous Dynamical Systems
Guilherme França 1 Daniel P. Robinson 1 René Vidal 1 Abstract Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms
More informationDeep Convolutional Neural Networks for Pairwise Causality
Deep Convolutional Neural Networks for Pairwise Causality Karamjit Singh, Garima Gupta, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal TCS Research, Delhi Tata Consultancy Services Ltd. {karamjit.singh,
More informationarxiv: v2 [cs.cv] 18 Jul 2017
Interleaved Group Convolutions for Deep Neural Networks Ting Zhang 1 Guo-Jun Qi 2 Bin Xiao 1 Jingdong Wang 1 1 Microsoft Research 2 University of Central Florida {tinzhan, Bin.Xiao, jingdw}@microsoft.com
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationDay 3 Lecture 3. Optimizing deep networks
Day 3 Lecture 3 Optimizing deep networks Convex optimization A function is convex if for all α [0,1]: f(x) Tangent line Examples Quadratics 2-norms Properties Local minimum is global minimum x Gradient
More informationAdvanced Machine Learning
Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear
More informationInformation theoretic perspectives on learning algorithms
Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez
More informationDeep Learning Lab Course 2017 (Deep Learning Practical)
Deep Learning Lab Course 207 (Deep Learning Practical) Labs: (Computer Vision) Thomas Brox, (Robotics) Wolfram Burgard, (Machine Learning) Frank Hutter, (Neurorobotics) Joschka Boedecker University of
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationDissipativity Theory for Nesterov s Accelerated Method
Bin Hu Laurent Lessard Abstract In this paper, we adapt the control theoretic concept of dissipativity theory to provide a natural understanding of Nesterov s accelerated method. Our theory ties rigorous
More informationMemory-Augmented Attention Model for Scene Text Recognition
Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences
More informationCharacterization of Gradient Dominance and Regularity Conditions for Neural Networks
Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Yi Zhou Ohio State University Yingbin Liang Ohio State University Abstract zhou.1172@osu.edu liang.889@osu.edu The past
More informationIFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent
IFT 6085 - Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s):
More informationarxiv: v3 [cs.lg] 22 Mar 2018
arxiv:1710.06081v3 [cs.lg] 22 Mar 2018 Boosting Adversarial Attacks with Momentum Yinpeng Dong1, Fangzhou Liao1, Tianyu Pang1, Hang Su1, Jun Zhu1, Xiaolin Hu1, Jianguo Li2 1 Department of Computer Science
More informationNeural Networks. Intro to AI Bert Huang Virginia Tech
Neural Networks Intro to AI Bert Huang Virginia Tech Outline Biological inspiration for artificial neural networks Linear vs. nonlinear functions Learning with neural networks: back propagation https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg
More informationUnraveling the mysteries of stochastic gradient descent on deep neural networks
Unraveling the mysteries of stochastic gradient descent on deep neural networks Pratik Chaudhari UCLA VISION LAB 1 The question measures disagreement of predictions with ground truth Cat Dog... x = argmin
More informationNeural Networks. Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016
Neural Networks Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016 Outline Part 1 Introduction Feedforward Neural Networks Stochastic Gradient Descent Computational Graph
More informationNeural Networks Teaser
1/11 Neural Networks Teaser February 27, 2017 Deep Learning in the News 2/11 Go falls to computers. Learning 3/11 How to teach a robot to be able to recognize images as either a cat or a non-cat? This
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More informationAdversarial Image Perturbation for Privacy Protection A Game Theory Perspective Supplementary Materials
Adversarial Image Perturbation for Privacy Protection A Game Theory Perspective Supplementary Materials 1. Contents The supplementary materials contain auxiliary experiments for the empirical analyses
More informationOverparametrization for Landscape Design in Non-convex Optimization
Overparametrization for Landscape Design in Non-convex Optimization Jason D. Lee University of Southern California September 19, 2018 The State of Non-Convex Optimization Practical observation: Empirically,
More information