Index. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,



Index

A
Activation functions, neuron/perceptron
  binary threshold activation function
  linear activation function, 102
  rectified linear unit, 106
  sigmoid activation function
  SoftMax activation function
  tanh activation function, 107
AdadeltaOptimizer
AdagradOptimizer
AdamOptimizer, 135
Auto encoders
  architecture, 323
  cases, 324
  combined classification network, class prediction, 326
  denoising auto-encoder implementation, 333
  element wise activation function, 324
  hidden layer, 323
  KL divergence
  learning rule of model, 324
  multiple hidden layers, 325
  network, class prediction, 326
  sparse, 328
  unsupervised ANN, 322

B
Backpropagation, 109
  convolution layer
  for gradient computation
    cost derivative, 116
    cost function, 112
    cross-entropy cost, SoftMax activation layer, 115
    forward pass and backward pass, 114
    hidden layer unit, 110
    independent sigmoid output units, 111
    multi-layer neural network, 113
    neural networks, 114
    partial derivative
    partial derivative, cost function
    propagating error, 109
    sigmoid activation functions, 114
    SoftMax function, 114
    Softmax output layer, 114
  pooling layer
Backpropagation through time (BPTT), 256
Batch normalization
Bayesian inference
  Bernoulli distribution, 282
  likelihood function, 286
  likelihood function plot, 284
  posterior distribution, 281
  posterior probability distribution, 281, 283
  prior, 283
  prior probability distribution, 283, 285
Bayesian networks, 38
Bayes rule, 38
Bernoulli distribution
Bidirectional RNN
Binary threshold activation function
Binomial distribution, 49
Block Gibbs sampling, 305
Boltzmann distribution

C
Calculus, 23
  convex function
  convex set
  differentiation
  gradient of function
  Hessian matrix of function, 25
  local and global minima
  maxima and minima of functions, 26
    for univariate function
  multivariate convex and non-convex functions

Calculus (cont.)
  non-convex function, 31
  positive semi-definite and definite, 29
  successive partial derivatives, 25
  Taylor series, 34
Central Limit theorem, 53
Collaborative filtering
  contrastive divergence, 315
  derived probabilities, 317
  description, 313
  energy configuration, 317
  joint configuration, 316
  matrix factorization method, 313
  probability of hidden unit, 316
  RBMs, 314
  restricted Boltzmann View, user
Continuous bag of words (CBOW)
  hidden-layer embedding, 230
  hidden layer vector, 229, 231
  SoftMax output probability, 231
  TensorFlow implementation, 234
  word embeddings
Contrastive divergence, 315
Convolutional neural networks (CNNs), 153
  architectures, 206
    AlexNet
    LeNet
    ResNet
    VGG16
  components, 179
    convolution layer
    input layer, 180
    pooling layer, 182
  convolution operation, 153
    2D convolution of image
    D convolution of signal
    LTI/LSI systems
    signals in one dimension
  digit recognition on MNIST dataset
  dropout layers and regularization
  elements, 153
  image-processing filters, 169
    Gaussian filter, 173
    gradient-based filters
    identity transform
    Mean filter
    Median filter
    Sobel edge-detection filter
  for solving real-world problems
  translational equivariance, pooling
  weight sharing, 187
Cross-correlation, 180

D
Deep belief networks (DBNs)
  backpropagation, 318
  implementation, 319
  learning algorithm, 318
  MNIST dataset, 318
  RBMs, 317
  ReLU activation functions, 319
  schematic diagram, 317, 318
  sigmoid units, 318
Deep learning
  evolution
    artificial neural networks
    artificial neuron structure, 90
    biological neuron structure, 89
  perceptron learning algorithms
    activation functions, hidden layers linear
    backpropagation (see Backpropagation, for gradient computation)
    geometrical interpretation
    hyperplane, classes, 93
    limitations
    machine-learning domain, 94
    non-linearity
    rule, multi-layer perceptrons network
    weight parameters vector, 95
  vs. traditional methods
Denoising auto-encoder, 333

E
Elliptical contours, 123, 125

F
Forget-gate value, 264
Fully convolutional network (FCN)
  architecture, 356
  down and up sampling
    max unpooling, 360
    transpose convolution, 361, 363
    unpooling, 359
  output feature maps, network, pixel categories, 356
  SoftMax probability, 357

G, H
Gated recurrent unit (GRU)
Gaussian blur, 173
Generative adversarial networks (GANs)

  agents zero-sum game, 378
  cost function and training
  generative models, 378
  illustration, 379
  maximin and minimax problem
  minimax and saddle points
  neural networks, 378
  TensorFlow implementation, 386
  vanishing gradient, generator, 386
  zero sum game, 381
Gibbs sampling
  bivariate normal distribution, 305
  block, 305
  burn-in period, 306
  conditional distributions, 305
  generating samples, 306
  Markov Chain Monte Carlo method, 304
  restricted Boltzmann machines
Global co-occurrence methods, 241
  building word vectors
  extraction, word embeddings, 242
  statistics and prediction methods, 240
  SVD method, 241
  word combination, 241
  Word-embeddings plot, 245
  word-vector embedding matrix, 242
Global minima, 28
GloVe, 245
Gradient clipping, 261
Gradient descent, backpropagation, 236
GradientDescentOptimizer, 130
Graphical processing unit (GPU), 152

I, J
Image classification
Image segmentation, 345
  binary thresholding method, histogram, 345, 349
  FCN (see Fully convolutional network (FCN))
  K-means clustering, 352
  Otsu's method
  semantic segmentation, 355
  sliding window approach, 355
  in TensorFlow implementation, semantic segmentation, 365
  U-Net convolutional neural network
  Watershed algorithm

K
Karush-Kuhn-Tucker method, 78
K-means algorithm, 352
Kullback-Leibler (KL) divergence
  plot for mean, 327
  sparse auto-encoders

L
Lagrangian multipliers, 79
Language modeling
Lasso Regularization, 16
Linear activation function, 102
Linear algebra, 2
  determinant of matrix, 12
    interpretation, 13
  Eigen vectors
    characteristic equation of matrix
    power iteration method
  identity matrix or operator
  inverse of matrix, 14
  linear independence of vectors, 9-10
  matrix, 4-5
  matrix operations and manipulations, 5
    addition of two matrices, 6
    matrix working on vector, 8
    product of two matrices, 6
    product of two vectors, 7
    subtraction of two matrices, 6
    transpose of matrix, 7
  norm of vector
  product of vector in direction of another vector
  pseudo inverse of matrix, 16
  rank of matrix
  scalar, 4
  tensor, 5
  unit vector in direction of specific vector, 17
  vector, 3-4
Linear shift invariant (LSI) systems
Linear time invariant (LTI) systems
Localization network
Local minima point, 28
Long short-term memory (LSTM)
  architecture, 262
  building blocks and function
  exploding- and vanishing-gradient problems
  forget gate, 263
  output gates, 263

M, N
Machine learning, 55
  constrained optimization problem
  and data science, 2
  dimensionality reduction methods, 79
    principal component analysis
    singular value decomposition
  optimization techniques
    contour plot and lines
    gradient descent, 66
    linear curve

Machine learning (cont.)
    for multivariate cost function, gradient descent
    negative curvature, 75
    Newton's method, 74
    positive curvature
    steepest descent, 70
    stochastic gradient descent
  regularization
    constraint optimization problem
  supervised learning, 56
    classification
    hyperplanes and linear classifiers
    linear regression
  unsupervised learning, 65
Markov Chain, 288
Markov Chain Monte Carlo (MCMC) methods, 280
  aperiodicity, 289
  area of Pi, 287
  computation of Pi, 287
  detailed balance condition, 289
  implementation, 289
  irreducibility, 289
  Metropolis algorithm
    acceptance probability, 291
    bivariate Gaussian distribution, sampling
    heuristics, 290
    implementation, 290
    transition probability function, 290, 291
  probability zones, 287
  sampling, 286
  states, gas molecules, 288
  stochastic/random, 288
  transition probability, 288
Matrix factorization method, 313
Maximum likelihood estimate (MLE) technique
Max unpooling, 360
Momentum-based optimizers
Monte Carlo method, 287
Multi-layer Perceptron (MLP), 99

O
Object detection
  Fast R-CNN network, 377
  R-CNN network
  sliding-window technique, 375
  task, 375
Otsu's method
Overfitting, 84

P, Q
PCA and ZCA whitening
  advantage
  illustration
  pixels, 340
  spatial structure, 341
  techniques, 340
  whitening transform, 341
Perceptron, 92
Points of inflection, 26
Principal component analysis, 279. See also PCA and ZCA whitening
Probability, 34
  Bayes rule, 38
  chain rule, 37
  conditional independence of events, 38
  correlation coefficient, 44
  covariance, 44
  distribution
    Bernoulli distribution
    binomial distribution, 49
    multivariate normal distribution, 48
    normal distribution
    Poisson distribution, 50
    uniform distribution
  expectation of random variable, 39
  hypothesis testing and p value
  independence of events, 37
  likelihood function, 51
  MLE
  mutually exclusive events, 37
  probability density function (pdf), 39
  probability mass function (pmf), 38
  skewness and kurtosis, 40, 42
  unions, intersection, and conditional
  variance of random variable

R
Rectified linear unit (ReLU) activation function, 106
Recurrent neural networks (RNNs)
  architectural principle, 252
  bidirectional RNN
  BPTT, 256
  component, embeddings layer, 252
  folded and unfolded structure, 252
  GRU
  language modeling
  LSTM
  MNIST digit identification, TensorFlow
    Alice in Wonderland, 273
    implementation, LSTM

    input tensor shape
    LSTM network, 265
  next-word prediction and sentence completion, 268
  traditional language models, 255
  vanishing and exploding gradient problem
    gradient clipping, 261
    LSTMs
    memory-to-memory weight connection matrix and ReLU units, 261
    sigmoid function, 259
    temporal components, 259
Restricted Boltzmann machines (RBMs)
  Block Gibbs sampling, 305
  collaborative filtering
    binary visible unit, 315
    contrastive divergence, 315
    hidden units, 317
    joint configuration, 316
    Netflix Challenge, 314
    probability of hidden unit, 316
    schematic diagram, matrix factorization method, 313
    SoftMax function, 315
    three-way energy configuration, 317
  conditional probability distribution, 296
  contrastive divergence
  DBNs (see Deep belief networks (DBNs))
  deep networks, 294
  discrete variables, 297
  Gibbs sampling
  graphical probabilistic model, 295
  implementation, MNIST dataset, 309
  joint configuration, 295
  joint probability distribution, 295, 298
  machine learning algorithms, 294
  partition function Z, 295
  sigmoid function, 299
  symmetrical undirected network, 299
  training, 299
  visible and hidden layers architecture, 294
Ridge regression, 86
Ridge regularization, 16
RMSprop

S
Saddle points, 127, 129
Semantic segmentation, 355
  in TensorFlow, FCN network, 365
Sigmoid activation function
Singular value decomposition (SVD), 313, 340
Skip-gram models, 236
  TensorFlow implementation, 240
  word embedding
Sliding window approach, 355
SoftMax activation function
Sparse auto-encoders
  hidden layer output, 329
  hidden layer sigmoid activations, 328
  hidden structures, input data, 328
  implementation, TensorFlow, 329
Stochastic gradient descent (SGD), 71, 127
Supremum norm, 15

T
Tanh activation function, 107
Taylor series expansion, 34
TensorFlow
  commands, define
    check Tensor shape, 120
    explicit evaluation, 120
    InteractiveSession() command
    invoke session and display, variable, 121
    NumPy Array to Tensor conversion, 122
    placeholders and feed dictionary, 122
    TensorFlow and NumPy Library, 119
    TensorFlow constants, 120
    TensorFlow variable, random initial values, 121
    tf.Session(), 121
    variables, 121
    variable state update, 122
  deep-learning packages, 118
  features, deep-learning frameworks
  gradient-descent optimization methods
    elliptical contours, 123, 125
    non-convexity of cost functions, 126
    saddle points, 127, 129
  installation, 119
  linear regression
    actual house price vs. predicted house price, 146
    cost plot over epochs, 145
    implementation, 143
  meta graph definition, 390
  mini-batch stochastic gradient descent, rate, 129
  models deployment, production
  multi-class classification, SoftMax function
    full-batch gradient descent, 146
    stochastic gradient descent, 149
  optimizers
    AdadeltaOptimizer
    AdagradOptimizer
    AdamOptimizer, 135
    batch size, 138
    epochs, 138
    GradientDescentOptimizer

TensorFlow (cont.)
    MomentumOptimizer and Nesterov Algorithm
    number of batches, 138
    RMSprop
  XOR implementation
    computation graph
    hidden layers, 138
    linear activation functions, hidden layer, 142
Traditional language models, 255
Transfer learning, 211
  with Google InceptionV3, 216
  guidelines, 212
  with pre-trained VGG16, 221
Transpose convolution, 361, 363

U
U-Net architecture, 364
Unpooling, 359

V
Vector representation of words, 227
Vector space model (VSM), 227

W, X, Y
Watershed algorithm
Word-embeddings plot, 245
Word-embedding vector
Word2Vec
  CBOW method (see Continuous bag of words (CBOW))
  global co-occurrence methods, 240
  GloVe, 245
  skip-gram models
  TensorFlow implementation, CBOW, 231
  word analogy
  word vectors, 249
Word-vector embeddings matrix, 242

Z
Zero sum game
