Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017. Table of contents

Similar documents
Pattern Recognition and Machine Learning

Learning Deep Architectures for AI. Part II - Vijay Chakilam

PATTERN CLASSIFICATION

Deep Feedforward Networks

Index. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,

Deep Learning Autoencoder Models

Deep unsupervised learning

Deep Learning Basics Lecture 8: Autoencoder & DBM. Princeton University COS 495 Instructor: Yingyu Liang

Introduction to Deep Learning CMPT 733. Steven Bergner

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Deep Feedforward Networks

Lecture 16 Deep Neural Generative Models

Bagging and Other Ensemble Methods

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Deep Generative Models. (Unsupervised Learning)

Cheng Soon Ong & Christian Walder. Canberra, February - June 2018

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

UNSUPERVISED LEARNING

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Greedy Layer-Wise Training of Deep Networks

Lecture 14: Deep Generative Learning

Unsupervised Learning

Regularization in Neural Networks

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Learning Deep Architectures

Basic Principles of Unsupervised and Supervised Learning

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Deep Learning Architectures and Algorithms

Unsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih

STA 414/2104: Lecture 8

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Variational Autoencoders. Presented by Alex Beatson Materials from Yann LeCun, Jaan Altosaar, Shakir Mohamed

WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,

Recurrent Neural Networks. Jian Tang

Introduction to Deep Neural Networks

Recurrent Latent Variable Networks for Session-Based Recommendation

TUTORIAL PART 1 Unsupervised Learning

STA 414/2104: Lecture 8

The connection of dropout and Bayesian statistics

Statistical Learning Theory. Part I 5. Deep Learning

Mathematical Formulation of Our Example

ECE521 week 3: 23/26 January 2017

OPTIMIZATION METHODS IN DEEP LEARNING

STA 4273H: Statistical Machine Learning

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

Machine Learning Basics: Maximum Likelihood Estimation

The Origin of Deep Learning. Lili Mou Jan, 2015

Dynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji

Deep Belief Networks are compact universal approximators

Adversarial Examples

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

Lecture 5 Neural models for NLP

Machine Learning Techniques for Computer Vision

CSC321 Lecture 20: Autoencoders

Connections between score matching, contrastive divergence, and pseudolikelihood for continuous-valued variables. Revised submission to IEEE TNN

Recurrent Neural Network

Machine Learning Techniques

Lecture 11 Recurrent Neural Networks I

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2017

Learning Energy-Based Models of High-Dimensional Data

Part 2. Representation Learning Algorithms

Unsupervised Neural Nets

STA 4273H: Statistical Machine Learning

Lecture 11 Recurrent Neural Networks I

Introduction to Machine Learning Midterm Exam

Undirected Graphical Models

Jakub Hajic Artificial Intelligence Seminar I

Reservoir Computing and Echo State Networks

Deep Learning. Recurrent Neural Network (RNNs) Ali Ghodsi. October 23, Slides are partially based on Book in preparation, Deep Learning

Long-Short Term Memory and Other Gated RNNs

CSC321 Lecture 9: Generalization

Denoising Autoencoders

Convolution and Pooling as an Infinitely Strong Prior

CSC321 Lecture 9: Generalization

Nonparametric Bayesian Methods (Gaussian Processes)

Machine Learning: Basis and Wavelet. 김화평 (CSE), Medical Image Computing Lab (Prof. 서진근's research group). Haar DWT in 2 levels

How to do backpropagation in a brain

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Dynamic Approaches: The Hidden Markov Model

Learning From Data Lecture 15 Reflecting on Our Path - Epilogue to Part I

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

GENERATIVE ADVERSARIAL LEARNING

CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning

Deep Learning. Convolutional Neural Network (CNNs) Ali Ghodsi. October 30, Slides are partially based on Book in preparation, Deep Learning

Neural Networks and Deep Learning

Bayesian Deep Learning

Deep Learning: a gentle introduction

Feature Design & Deep Learning

PV021: Neural networks. Tomáš Brázdil

The XOR problem. Machine learning for vision. Fall. Roland Memisevic

Independent Component Analysis. Contents

Bayesian Learning in Undirected Graphical Models

A Variance Modeling Framework Based on Variational Autoencoders for Speech Enhancement

STA 414/2104: Machine Learning

Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Transcription:

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017

Table of contents

Website xiii
Acknowledgments xv
Notation xix

1 Introduction 1
  1.1 Who Should Read This Book? 8
  1.2 Historical Trends in Deep Learning 12

I Applied Math and Machine Learning Basics 27

2 Linear Algebra 29
  2.1 Scalars, Vectors, Matrices and Tensors 29
  2.2 Multiplying Matrices and Vectors 32
  2.3 Identity and Inverse Matrices 34
  2.4 Linear Dependence and Span 35
  2.5 Norms 36
  2.6 Special Kinds of Matrices and Vectors 38
  2.7 Eigendecomposition 39
  2.8 Singular Value Decomposition 42
  2.9 The Moore-Penrose Pseudoinverse 43
  2.10 The Trace Operator 44
  2.11 The Determinant 45
  2.12 Example: Principal Components Analysis 45

3 Probability and Information Theory 51
  3.1 Why Probability? 52
  3.2 Random Variables 54
  3.3 Probability Distributions 54
  3.4 Marginal Probability 56
  3.5 Conditional Probability 57
  3.6 The Chain Rule of Conditional Probabilities 57
  3.7 Independence and Conditional Independence 58
  3.8 Expectation, Variance and Covariance 58
  3.9 Common Probability Distributions 60
  3.10 Useful Properties of Common Functions 65
  3.11 Bayes' Rule 68
  3.12 Technical Details of Continuous Variables 68
  3.13 Information Theory 70
  3.14 Structured Probabilistic Models 74

4 Numerical Computation 77
  4.1 Overflow and Underflow 77
  4.2 Poor Conditioning 79
  4.3 Gradient-Based Optimization 79
  4.4 Constrained Optimization 89
  4.5 Example: Linear Least Squares 92

5 Machine Learning Basics 95
  5.1 Learning Algorithms 96
  5.2 Capacity, Overfitting and Underfitting 107
  5.3 Hyperparameters and Validation Sets 117
  5.4 Estimators, Bias and Variance 119
  5.5 Maximum Likelihood Estimation 128
  5.6 Bayesian Statistics 132
  5.7 Supervised Learning Algorithms 136
  5.8 Unsupervised Learning Algorithms 142
  5.9 Stochastic Gradient Descent 147
  5.10 Building a Machine Learning Algorithm 149
  5.11 Challenges Motivating Deep Learning 151

II Deep Networks: Modern Practices 161

6 Deep Feedforward Networks 163
  6.1 Example: Learning XOR 166
  6.2 Gradient-Based Learning 171
  6.3 Hidden Units 185
  6.4 Architecture Design 191
  6.5 Back-Propagation and Other Differentiation Algorithms 197
  6.6 Historical Notes 217

7 Regularization for Deep Learning 221
  7.1 Parameter Norm Penalties 223
  7.2 Norm Penalties as Constrained Optimization 230
  7.3 Regularization and Under-Constrained Problems 232
  7.4 Dataset Augmentation 233
  7.5 Noise Robustness 235
  7.6 Semi-Supervised Learning 236
  7.7 Multitask Learning 237
  7.8 Early Stopping 239
  7.9 Parameter Tying and Parameter Sharing 246
  7.10 Sparse Representations 247
  7.11 Bagging and Other Ensemble Methods 249
  7.12 Dropout 251
  7.13 Adversarial Training 261
  7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier 263

8 Optimization for Training Deep Models 267
  8.1 How Learning Differs from Pure Optimization 268
  8.2 Challenges in Neural Network Optimization 275
  8.3 Basic Algorithms 286
  8.4 Parameter Initialization Strategies 292
  8.5 Algorithms with Adaptive Learning Rates 298
  8.6 Approximate Second-Order Methods 302
  8.7 Optimization Strategies and Meta-Algorithms 309

9 Convolutional Networks 321
  9.1 The Convolution Operation 322
  9.2 Motivation 324
  9.3 Pooling 330
  9.4 Convolution and Pooling as an Infinitely Strong Prior 334
  9.5 Variants of the Basic Convolution Function 337
  9.6 Structured Outputs 347
  9.7 Data Types 348
  9.8 Efficient Convolution Algorithms 350
  9.9 Random or Unsupervised Features 351
  9.10 The Neuroscientific Basis for Convolutional Networks 353
  9.11 Convolutional Networks and the History of Deep Learning 359

10 Sequence Modeling: Recurrent and Recursive Nets 363
  10.1 Unfolding Computational Graphs 365
  10.2 Recurrent Neural Networks 368
  10.3 Bidirectional RNNs 383
  10.4 Encoder-Decoder Sequence-to-Sequence Architectures 385
  10.5 Deep Recurrent Networks 387
  10.6 Recursive Neural Networks 388
  10.7 The Challenge of Long-Term Dependencies 390
  10.8 Echo State Networks 392
  10.9 Leaky Units and Other Strategies for Multiple Time Scales 395
  10.10 The Long Short-Term Memory and Other Gated RNNs 397
  10.11 Optimization for Long-Term Dependencies 401
  10.12 Explicit Memory 405

11 Practical Methodology 409
  11.1 Performance Metrics 410
  11.2 Default Baseline Models 413
  11.3 Determining Whether to Gather More Data 414
  11.4 Selecting Hyperparameters 415
  11.5 Debugging Strategies 424
  11.6 Example: Multi-Digit Number Recognition 428

12 Applications 431
  12.1 Large-Scale Deep Learning 431
  12.2 Computer Vision 440
  12.3 Speech Recognition 446
  12.4 Natural Language Processing 448
  12.5 Other Applications 465

III Deep Learning Research 475

13 Linear Factor Models 479
  13.1 Probabilistic PCA and Factor Analysis 480
  13.2 Independent Component Analysis (ICA) 481
  13.3 Slow Feature Analysis 484
  13.4 Sparse Coding 486
  13.5 Manifold Interpretation of PCA 489

14 Autoencoders 493
  14.1 Undercomplete Autoencoders 494
  14.2 Regularized Autoencoders 495
  14.3 Representational Power, Layer Size and Depth 499
  14.4 Stochastic Encoders and Decoders 500
  14.5 Denoising Autoencoders 501
  14.6 Learning Manifolds with Autoencoders 506
  14.7 Contractive Autoencoders 510
  14.8 Predictive Sparse Decomposition 514
  14.9 Applications of Autoencoders 515

15 Representation Learning 517
  15.1 Greedy Layer-Wise Unsupervised Pretraining 519
  15.2 Transfer Learning and Domain Adaptation 526
  15.3 Semi-Supervised Disentangling of Causal Factors 532
  15.4 Distributed Representation 536
  15.5 Exponential Gains from Depth 543
  15.6 Providing Clues to Discover Underlying Causes 544

16 Structured Probabilistic Models for Deep Learning 549
  16.1 The Challenge of Unstructured Modeling 550
  16.2 Using Graphs to Describe Model Structure 554
  16.3 Sampling from Graphical Models 570
  16.4 Advantages of Structured Modeling 572
  16.5 Learning about Dependencies 572
  16.6 Inference and Approximate Inference 573
  16.7 The Deep Learning Approach to Structured Probabilistic Models 575

17 Monte Carlo Methods 581
  17.1 Sampling and Monte Carlo Methods 581
  17.2 Importance Sampling 583
  17.3 Markov Chain Monte Carlo Methods 586
  17.4 Gibbs Sampling 590
  17.5 The Challenge of Mixing between Separated Modes 591

18 Confronting the Partition Function 597
  18.1 The Log-Likelihood Gradient 598
  18.2 Stochastic Maximum Likelihood and Contrastive Divergence 599
  18.3 Pseudolikelihood 607
  18.4 Score Matching and Ratio Matching 609
  18.5 Denoising Score Matching 611
  18.6 Noise-Contrastive Estimation 612
  18.7 Estimating the Partition Function 614

19 Approximate Inference 623
  19.1 Inference as Optimization 624
  19.2 Expectation Maximization 626
  19.3 MAP Inference and Sparse Coding 627
  19.4 Variational Inference and Learning 629
  19.5 Learned Approximate Inference 642

20 Deep Generative Models 645
  20.1 Boltzmann Machines 645
  20.2 Restricted Boltzmann Machines 647
  20.3 Deep Belief Networks 651
  20.4 Deep Boltzmann Machines 654
  20.5 Boltzmann Machines for Real-Valued Data 667
  20.6 Convolutional Boltzmann Machines 673
  20.7 Boltzmann Machines for Structured or Sequential Outputs 675
  20.8 Other Boltzmann Machines 677
  20.9 Back-Propagation through Random Operations 678
  20.10 Directed Generative Nets 682
  20.11 Drawing Samples from Autoencoders 701
  20.12 Generative Stochastic Networks 704
  20.13 Other Generation Schemes 706
  20.14 Evaluating Generative Models 707
  20.15 Conclusion 710

Bibliography 711
Index 767

Compiled by BPK