Beyond finite layer neural network

1 Beyond finite layer neural network: Bridging Numerical Dynamic System And Deep Neural Networks. arxiv: Joint work with Bin Dong, Quanzheng Li, Aoxiao Zhong. Yiping Lu, Peking University, School of Mathematical Sciences

2 Depth Revolution

3 Motivation: Deep Residual. Dynamic system: $x_t = f(x)$. Residual block: $x_{n+1} = x_n + f(x_n)$, which is the Forward Euler Scheme. Weinan E. A Proposal on Machine Learning via Dynamical Systems.
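The correspondence on this slide can be sketched in a few lines of NumPy. This is only an illustration: the residual branch `f` is a hypothetical stand-in (a `tanh`) for a learned block, and the step size `dt` is an assumption introduced to make the Euler interpretation explicit; with `dt = 1` the loop is exactly a stack of residual blocks.

```python
import numpy as np

def f(x):
    # Hypothetical residual branch; tanh stands in for a learned block.
    return np.tanh(x)

def forward_euler(x0, dt, num_steps):
    # Forward Euler for x_t = f(x): x_{n+1} = x_n + dt * f(x_n).
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x + dt * f(x)
    return x

def resnet_forward(x0, num_blocks):
    # A stack of residual blocks x_{n+1} = x_n + f(x_n).
    x = np.asarray(x0, dtype=float)
    for _ in range(num_blocks):
        x = x + f(x)
    return x
```

With unit step size the two functions return the same values, which is the sense in which a ResNet is a forward Euler discretization.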

4 Previous Works: learn a diffusion process for denoising. Chen Y, Yu W, Pock T. On learning optimized reaction diffusion processes for effective image restoration. CVPR 2015.

5 Depth Revolution: going to infinitely many layers. Differential Equation As Infinite Layer Neural Network

6 "Revisiting previous efforts in deep learning, we found that diversity, another aspect in network design that is relatively less explored, also plays a significant role." PolyStructure: $x_{n+1} = x_n + F(x_n) + F(F(x_n))$. Backward Euler Scheme: $x_{n+1} = x_n + F(x_{n+1})$, so $x_{n+1} = (I - F)^{-1} x_n$. PolyNet approximates the operator $(I - F)^{-1}$ by $I + F + F^2 + \cdots$. Zhang X, Li Z, Loy C C, et al. PolyNet: A Pursuit of Structural Diversity in Very Deep Networks
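The series approximation of the backward Euler operator can be checked numerically. The sketch below assumes a linear map `F` (a small random matrix rescaled to spectral norm 0.2, chosen purely for illustration); in PolyNet `F` is a nonlinear block, so this only illustrates the operator identity, not the network itself.

```python
import numpy as np

def backward_euler_step(F, x):
    # Solve (I - F) x_next = x, i.e. x_next = (I - F)^{-1} x.
    return np.linalg.solve(np.eye(F.shape[0]) - F, x)

def poly_step(F, x, order=2):
    # Truncated series x + F x + F(F x) + ... up to the term F^order x.
    out, term = x.copy(), x.copy()
    for _ in range(order):
        term = F @ term
        out = out + term
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
F = 0.2 * A / np.linalg.norm(A, 2)  # hypothetical linear F, spectral norm 0.2
x = rng.standard_normal(4)

exact = backward_euler_step(F, x)
err_order1 = np.linalg.norm(poly_step(F, x, 1) - exact)  # residual-style x + F x
err_order2 = np.linalg.norm(poly_step(F, x, 2) - exact)  # poly-style x + F x + F(F x)
```

Each extra term shrinks the error by at least the norm of `F`, so the second-order structure is closer to the implicit step than a plain residual step whenever that norm is below one.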

7 Runge-Kutta Scheme (2nd order): $x_{n+1} = k_1 x_n + k_2 (k_3 x_n + f_1(x_n)) + f_2(k_3 x_n + f_1(x_n))$. Larsson G, Maire M, Shakhnarovich G. FractalNet: Ultra-Deep Neural Networks without Residuals.

8 PDE: Infinite Layer Neural Network. The dynamic system is the continuous limit; the neural network is its numerical approximation. WRN, ResNeXt, Inception-ResNet, PolyNet, SENet etc.: new schemes to approximate the right-hand-side term. Why not change the way we discretize $u_t$?

9 Multi-step Residual Network. Dynamic system: $x_t = f(x)$. ResNet: $x_{n+1} = x_n + f(x_n)$

10 Multi-step Residual Network. Linear Multi-step Scheme for $x_t = f(x)$. ResNet: $x_{n+1} = x_n + f(x_n)$. Linear Multi-step Residual Network: $x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f(x_n)$
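A minimal sketch of the linear multi-step update on this slide, written as a plain Python loop. The names `blocks` and `ks` are hypothetical, and starting with $x_{-1} = x_0$ is an assumption made here so that the first block reduces to an ordinary residual step; the slides do not specify the initialization.

```python
import numpy as np

def lm_resnet_forward(x0, blocks, ks):
    # blocks: list of callables f_n; ks: one scalar k_n per block.
    # Assumption: x_{-1} = x_0, so the first step is x_1 = x_0 + f_0(x_0)
    # regardless of k_0.
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev
    for f, k in zip(blocks, ks):
        # x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + f_n(x_n)
        x_next = (1.0 - k) * x + k * x_prev + f(x)
        x_prev, x = x, x_next
    return x

def half(x):
    # Hypothetical residual branch used only for the worked example.
    return 0.5 * x

plain = lm_resnet_forward(1.0, [half, half], [0.0, 0.0])     # ResNet: 1 -> 1.5 -> 2.25
two_step = lm_resnet_forward(1.0, [half, half], [0.0, 0.5])  # 1 -> 1.5 -> 2.0
```

Setting every `k_n` to zero recovers the ordinary ResNet, so the scheme adds one scalar per block on top of the residual branch.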

11 Figure: (a) ResNet (b) Linear Multi-step ResNet. Branch labels: Scale k, Scale 1-k.

12 Figure: (a) ResNet (b) Linear Multi-step ResNet. Branch labels: Scale k, Scale 1-k. Only one more parameter.

13 Multi-step Residual Network. Figure: (a) ResNet (b) LM-ResNet

14 Multi-step Residual Network

15 Explanation of the performance boost via modified equations. ResNet: $x_{n+1} = x_n + \Delta t f(x_n)$, with modified equation $\dot{u}_n + \frac{\Delta t}{2} \ddot{u}_n = f(u_n)$. LM-ResNet: $x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + \Delta t f(x_n)$, with modified equation $(1 + k_n) \dot{u}_n + (1 - k_n) \frac{\Delta t}{2} \ddot{u}_n = f(u_n)$.
[1] Dong B, Jiang Q, Shen Z. Image restoration: wavelet frame shrinkage, nonlinear evolution PDEs, and beyond. Multiscale Modeling and Simulation: A SIAM Interdisciplinary Journal.
[2] Su W, Boyd S, Candes E J. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights. Advances in Neural Information Processing Systems,
[3] A. Wibisono, A. Wilson, and M. I. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 2016.
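The two modified equations follow from a Taylor expansion around $t_n$. The derivation below is a sketch under the assumption that $u$ is a smooth function with $u(t_n) = x_n$ and $t_{n \pm 1} = t_n \pm \Delta t$; the remainder is written at the order the expansion gives after dividing by $\Delta t$, which may differ from the remainder notation used on the slides.

```latex
% Assumption: u is smooth, u_n = u(t_n), and t_{n \pm 1} = t_n \pm \Delta t.
\begin{align*}
u_{n \pm 1} &= u_n \pm \Delta t\, \dot{u}_n + \tfrac{\Delta t^2}{2} \ddot{u}_n + O(\Delta t^3) \\[4pt]
\text{ResNet:} \quad
\frac{u_{n+1} - u_n}{\Delta t}
  &= \dot{u}_n + \tfrac{\Delta t}{2} \ddot{u}_n + O(\Delta t^2) = f(u_n) \\[4pt]
\text{LM-ResNet:} \quad
\frac{u_{n+1} - (1 - k_n) u_n - k_n u_{n-1}}{\Delta t}
  &= (1 + k_n) \dot{u}_n + (1 - k_n) \tfrac{\Delta t}{2} \ddot{u}_n + O(\Delta t^2) = f(u_n)
\end{align*}
```

In the LM-ResNet line the $k_n u_n$ terms cancel, the first-derivative terms add to $(1 + k_n)\Delta t\,\dot{u}_n$, and the second-derivative terms combine to $(1 - k_n)\tfrac{\Delta t^2}{2}\ddot{u}_n$, so $k_n$ controls the weight of the second-order term.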

16 Plot: The Multi-step Residual Network $x_{n+1} = (1 - k_n) x_n + k_n x_{n-1} + \Delta t f(x_n)$. Learn a momentum: $(1 + k_n) \dot{u}_n + (1 - k_n) \frac{\Delta t}{2} \ddot{u}_n + o(\Delta t^3) = f(u_n)$

18 Bridge to stochastic dynamics. Can noise help avoid overfitting? Dynamic System

19 Previous Works: Shake-Shake regularization. $x_{n+1} = x_n + \eta f_1(x_n) + (1 - \eta) f_2(x_n)$, $\eta \sim U(0, 1)$, which can be rewritten as $x_{n+1} = x_n + f_2(x_n) + \frac{1}{2}(f_1(x_n) - f_2(x_n)) + (\eta - \frac{1}{2})(f_1(x_n) - f_2(x_n))$. "Apply data augmentation techniques to internal representations." Gastaldi X. Shake-Shake regularization. ICLR Workshop Track 2017.
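The Shake-Shake update can be split into a deterministic mean part and a zero-mean noise part, which is what links it to a stochastic dynamic system. The sketch below checks that split numerically; the two branches `f1` and `f2` are hypothetical stand-ins for learned residual branches.

```python
import numpy as np

def shake_shake_step(x, f1, f2, eta):
    # x_{n+1} = x_n + eta * f1(x_n) + (1 - eta) * f2(x_n)
    return x + eta * f1(x) + (1.0 - eta) * f2(x)

def shake_shake_decomposed(x, f1, f2, eta):
    diff = f1(x) - f2(x)
    mean_part = f2(x) + 0.5 * diff   # expectation over eta ~ U(0, 1)
    noise_part = (eta - 0.5) * diff  # zero mean over eta ~ U(0, 1)
    return x + mean_part + noise_part

x = np.array([0.3, -1.2, 2.0])
f1, f2 = np.tanh, np.sin  # hypothetical branches for illustration only
rng = np.random.default_rng(0)
etas = rng.uniform(0.0, 1.0, size=5)
```

For every sampled `eta` the two functions agree, so the random mixing weight only perturbs the averaged update by a centered term proportional to the difference of the branches.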

20 Previous Works: Deep Networks with Stochastic Depth. $x_{n+1} = x_n + \eta_n f(x_n) = x_n + E[\eta_n] f(x_n) + (\eta_n - E[\eta_n]) f(x_n)$. "To reduce the effective length of a neural network during training, we randomly skip layers entirely." Huang G, Sun Y, Liu Z, et al. Deep Networks with Stochastic Depth. ECCV 2016.
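The same mean-plus-noise split applies to stochastic depth. In the sketch below the gate $\eta_n$ is assumed to be a Bernoulli variable that keeps the block with probability `p` (so $E[\eta_n] = p$); the branch `f` and the value of `p` are hypothetical.

```python
import numpy as np

def stochastic_depth_step(x, f, eta):
    # x_{n+1} = x_n + eta_n * f(x_n), with eta_n in {0, 1}.
    return x + eta * f(x)

def stochastic_depth_decomposed(x, f, eta, p):
    # Mean part p * f(x) plus zero-mean fluctuation (eta - p) * f(x).
    return x + p * f(x) + (eta - p) * f(x)

def sample_gates(p, size, rng):
    # Assumption: eta_n ~ Bernoulli(p), i.e. the block is kept with probability p.
    return (rng.random(size) < p).astype(float)

x = np.array([0.5, -0.25])
p = 0.8  # hypothetical survival probability
rng = np.random.default_rng(0)
gates = sample_gates(p, 10000, rng)
```

Averaged over many draws the gate is close to `p`, so on average the layer behaves like a residual block whose branch is scaled by the survival probability.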

21 Bridge to stochastic control. Can noise help avoid overfitting? $dX_t = f(X_t, a_t)\,dt + g(X_t, t)\,dB_t$, $X(0) = X_0$. The numerical scheme only needs to converge weakly!
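Weak convergence means that expectations of the numerical solution approach those of the true process, even if individual sample paths do not. The sketch below uses the Euler-Maruyama scheme, which is a standard choice and not necessarily the scheme used in the talk, on a hypothetical test problem $dX = -X\,dt + 0.5\,dB$ whose exact mean is $x_0 e^{-T}$. The control $a_t$ is omitted for simplicity.

```python
import numpy as np

def euler_maruyama(x0, f, g, dt, num_steps, num_paths, seed=0):
    # Simulates X_{n+1} = X_n + f(X_n) dt + g(X_n, t_n) dB_n for many paths,
    # with independent increments dB_n ~ N(0, dt).
    rng = np.random.default_rng(seed)
    x = np.full(num_paths, float(x0))
    t = 0.0
    for _ in range(num_steps):
        dB = np.sqrt(dt) * rng.standard_normal(num_paths)
        x = x + f(x) * dt + g(x, t) * dB
        t += dt
    return x

# Hypothetical test problem: drift -x, constant diffusion 0.5, T = 1.
samples = euler_maruyama(1.0, lambda x: -x, lambda x, t: 0.5,
                         dt=0.01, num_steps=100, num_paths=20000)
# Weak error: difference between the sample mean and the exact mean exp(-1).
weak_error = abs(samples.mean() - np.exp(-1.0))
```

The quantity compared is a mean over paths rather than any single trajectory, which is the only kind of accuracy the slide asks of the discretization.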

22 Previous Works: Deep Networks with Stochastic Depth. $x_{n+1} = x_n + \eta_n f(x_n) = x_n + E[\eta_n] f(x_n) + (\eta_n - E[\eta_n]) f(x_n)$. We need $1 - 2p_n = O(\sqrt{\Delta t})$. "To reduce the effective length of a neural network during training, we randomly skip layers entirely." Huang G, Sun Y, Liu Z, et al. Deep Networks with Stochastic Depth. ECCV 2016.

23 $(1 + k_n) \dot{u}_n + (1 - k_n) \frac{\Delta t}{2} \ddot{u}_n + o(\Delta t^3) = f(u_n) + g(u_n)\,dW_t$. Stochastic strategy as in the previous works. Figure: (a) ResNet (b) Linear Multi-step ResNet. Branch labels: Scale k, Scale 1-k.

24 Multi-step Residual Network

25 Finite Layer Neural Network. Neural network and dynamic system; stochastic learning and stochastic dynamic system. New discretization: LM-ResNet, explained via the modified equation. Original one: LM-ResNet56 beats ResNet110. Stochastic depth one: LM-ResNet110 beats ResNet1202.

26 Thanks for your attention. Questions? Lu Y, Zhong A, Li Q, et al. Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations. arxiv:


More information

Day 3 Lecture 3. Optimizing deep networks

Day 3 Lecture 3. Optimizing deep networks Day 3 Lecture 3 Optimizing deep networks Convex optimization A function is convex if for all α [0,1]: f(x) Tangent line Examples Quadratics 2-norms Properties Local minimum is global minimum x Gradient

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear

More information

Information theoretic perspectives on learning algorithms

Information theoretic perspectives on learning algorithms Information theoretic perspectives on learning algorithms Varun Jog University of Wisconsin - Madison Departments of ECE and Mathematics Shannon Channel Hangout! May 8, 2018 Jointly with Adrian Tovar-Lopez

More information

Deep Learning Lab Course 2017 (Deep Learning Practical)

Deep Learning Lab Course 2017 (Deep Learning Practical) Deep Learning Lab Course 207 (Deep Learning Practical) Labs: (Computer Vision) Thomas Brox, (Robotics) Wolfram Burgard, (Machine Learning) Frank Hutter, (Neurorobotics) Joschka Boedecker University of

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3

More information

Dissipativity Theory for Nesterov s Accelerated Method

Dissipativity Theory for Nesterov s Accelerated Method Bin Hu Laurent Lessard Abstract In this paper, we adapt the control theoretic concept of dissipativity theory to provide a natural understanding of Nesterov s accelerated method. Our theory ties rigorous

More information

Memory-Augmented Attention Model for Scene Text Recognition

Memory-Augmented Attention Model for Scene Text Recognition Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences

More information

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Characterization of Gradient Dominance and Regularity Conditions for Neural Networks Yi Zhou Ohio State University Yingbin Liang Ohio State University Abstract zhou.1172@osu.edu liang.889@osu.edu The past

More information

IFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent

IFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent IFT 6085 - Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s):

More information

arxiv: v3 [cs.lg] 22 Mar 2018

arxiv: v3 [cs.lg] 22 Mar 2018 arxiv:1710.06081v3 [cs.lg] 22 Mar 2018 Boosting Adversarial Attacks with Momentum Yinpeng Dong1, Fangzhou Liao1, Tianyu Pang1, Hang Su1, Jun Zhu1, Xiaolin Hu1, Jianguo Li2 1 Department of Computer Science

More information

Neural Networks. Intro to AI Bert Huang Virginia Tech

Neural Networks. Intro to AI Bert Huang Virginia Tech Neural Networks Intro to AI Bert Huang Virginia Tech Outline Biological inspiration for artificial neural networks Linear vs. nonlinear functions Learning with neural networks: back propagation https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg

More information

Unraveling the mysteries of stochastic gradient descent on deep neural networks

Unraveling the mysteries of stochastic gradient descent on deep neural networks Unraveling the mysteries of stochastic gradient descent on deep neural networks Pratik Chaudhari UCLA VISION LAB 1 The question measures disagreement of predictions with ground truth Cat Dog... x = argmin

More information

Neural Networks. Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016

Neural Networks. Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016 Neural Networks Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016 Outline Part 1 Introduction Feedforward Neural Networks Stochastic Gradient Descent Computational Graph

More information

Neural Networks Teaser

Neural Networks Teaser 1/11 Neural Networks Teaser February 27, 2017 Deep Learning in the News 2/11 Go falls to computers. Learning 3/11 How to teach a robot to be able to recognize images as either a cat or a non-cat? This

More information

CSC 578 Neural Networks and Deep Learning

CSC 578 Neural Networks and Deep Learning CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross

More information

Adversarial Image Perturbation for Privacy Protection A Game Theory Perspective Supplementary Materials

Adversarial Image Perturbation for Privacy Protection A Game Theory Perspective Supplementary Materials Adversarial Image Perturbation for Privacy Protection A Game Theory Perspective Supplementary Materials 1. Contents The supplementary materials contain auxiliary experiments for the empirical analyses

More information

Overparametrization for Landscape Design in Non-convex Optimization

Overparametrization for Landscape Design in Non-convex Optimization Overparametrization for Landscape Design in Non-convex Optimization Jason D. Lee University of Southern California September 19, 2018 The State of Non-Convex Optimization Practical observation: Empirically,

More information