SEBASTIAN RASCHKA. Introduction to Artificial Neural Networks and Deep Learning. with Applications in Python


Introduction to Artificial Neural Networks with Applications in Python
Sebastian Raschka
Last updated: May 25, 2018

This book will be available at http://leanpub.com/ann-and-deeplearning. Please visit https://github.com/rasbt/deep-learning-book for more information, supporting material, and code examples.

© 2016-2018 Sebastian Raschka

Contents

A Mathematical Notation Reference
A.1 Sets and Intervals
A.2 Sequences
A.3 Functions
A.4 Linear Algebra
A.5 Calculus
A.6 Probability and Statistics
A.7 Numbers
A.8 Approximation
A.9 Logic

Website

Please visit the GitHub repository to download the code examples accompanying this book and other supplementary material. If you like the content, please consider supporting the work by buying a copy of the book on Leanpub. I would also appreciate hearing your opinion and feedback about the book. If you have any questions about the contents, please don't hesitate to get in touch with me via mail@sebastianraschka.com.

Happy learning!
Sebastian Raschka

About the Author

Sebastian Raschka received his doctorate from Michigan State University, where he developed novel computational methods in the field of computational biology. In summer 2018, he joined the University of Wisconsin-Madison as Assistant Professor of Statistics. His research activities include the development of new deep learning architectures to solve problems in the field of biometrics. Among his other works is his book "Python Machine Learning," a bestselling title at Packt and on Amazon.com, which received the ACM Best of Computing award in 2016 and was translated into many different languages, including German, Korean, Italian, traditional Chinese, simplified Chinese, Russian, Polish, and Japanese. Sebastian is also an avid open-source contributor and likes to contribute to the scientific Python ecosystem in his free time. If you would like to find out more about what Sebastian is currently up to, or would like to get in touch, you can find his personal website at https://sebastianraschka.com.

Acknowledgements

I would like to give my special thanks to the readers who provided feedback, caught various typos and errors, and offered suggestions for clarifying my writing.

Appendix A: Artem Sobolev, Ryan Sun
Appendix B: Brett Miller, Ryan Sun
Appendix D: Marcel Blattner, Ignacio Campabadal, Ryan Sun, Denis Parra Santander
Appendix F: Guillermo Monecchi, Ged Ridgway, Ryan Sun, Patric Hindenberger
Appendix H: Brett Miller, Ryan Sun, Nicolas Palopoli, Kevin Zakka

Appendix A Mathematical Notation Reference

This appendix provides a brief overview of the mathematical notation used throughout this book. The following appendices describe most of the corresponding concepts in more detail, and additional information is provided in the context of the applications in the main chapters.

A.1 Sets and Intervals

$\mathbb{Z}$: set of integers, $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$
$\mathbb{N}$: set of natural numbers, $\{0, 1, 2, 3, \ldots\}$
$\mathbb{N}^+$: set of natural numbers excluding zero, $\{1, 2, 3, \ldots\}$
$\mathbb{R}$: set of real numbers
$\in$: element-of symbol; for example, $x \in A$ translates to "x is an element of set A"
$\notin$: not-an-element-of symbol
$\emptyset$: null set, empty set
$A \cup B$: union of two sets, A and B
$A \cap B$: intersection of two sets, A and B
$A \subseteq B$: A is a subset of B or included in B
$A \triangle B$: symmetric difference between two sets A and B
$|A|$: cardinality of a set A (number of elements in a set A)
$(a, b)$: open interval from a to b, excluding a and b
$[a, b]$: closed interval from a to b, including a and b
$[a, b)$: half-open interval from a to b, including a but not b
$(a, b]$: half-open interval from a to b, including b but not a
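The set notation above maps closely onto Python's built-in `set` type. The following is a minimal sketch, not taken from the book: the sets `A` and `B` and the helper `in_half_open` are made-up examples chosen only to illustrate the symbols.

```python
# Illustrative values only; A and B are made-up example sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B                  # A ∪ B
intersection = A & B           # A ∩ B
symmetric_difference = A ^ B   # A △ B
is_subset = {3, 4} <= A        # {3, 4} ⊆ A
cardinality = len(A)           # |A|
is_member = 2 in A             # 2 ∈ A
not_member = 7 not in A        # 7 ∉ A


def in_half_open(x, a, b):
    """Return True if x lies in the half-open interval [a, b)."""
    return a <= x < b
```

The other interval types follow the same pattern: replace `<=` or `<` on either side to include or exclude the corresponding endpoint.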

A.2 Sequences

$\sum_{i=1}^{n} x_i$: summation of an indexed variable $x_i$, defined as $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
$\prod_{i=1}^{n} x_i$: product over an indexed variable $x_i$, defined as $\prod_{i=1}^{n} x_i = x_1 \cdot x_2 \cdot \ldots \cdot x_n$

A.3 Functions

$f: A \to B$: function f with domain A and codomain B
$(g \circ f)(x)$: composition of two functions g and f; alternative form: $g[f(x)]$
$f^{-1}(x)$: inverse of a function f, such that $f(y) = x$ if $f^{-1}(x)$ stands for y
$|x|$: absolute value of x; for example, $|-2| = 2$
$\log_b$: base-b logarithm
$\log$: natural logarithm (base-e logarithm)
$n!$: n-factorial, where $0! = 1$ and $n! = n(n-1)(n-2) \cdots 2 \cdot 1$ for $n > 0$
$\binom{n}{k}$: binomial coefficient ("n choose k"); $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ for $0 \le k \le n$
$\arg\max f(x)$: the x value that makes f(x) as large as possible
$\arg\min f(x)$: the x value that makes f(x) as small as possible
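As a rough illustration of the sequence and function notation, the sketch below uses only Python's standard library (`math.prod` and `math.comb` require Python 3.8 or newer). The sequence `x`, the function `f`, and the candidate grid are hypothetical; note that arg max and arg min are evaluated here over a finite set of candidates, not over all real numbers.

```python
import math

x = [2.0, 3.0, 4.0]               # hypothetical sequence x_1, ..., x_n

total = sum(x)                    # x_1 + x_2 + ... + x_n
product = math.prod(x)            # x_1 * x_2 * ... * x_n
n_factorial = math.factorial(5)   # 5! = 5 * 4 * 3 * 2 * 1
n_choose_k = math.comb(5, 2)      # 5! / (2! * (5 - 2)!)


def f(v):
    """Example function with its maximum at v = 1."""
    return -(v - 1) ** 2


candidates = [-2, -1, 0, 1, 2, 3]
arg_max = max(candidates, key=f)  # candidate that makes f as large as possible
arg_min = min(candidates, key=f)  # candidate that makes f as small as possible
```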

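A.4 Linear Algebra

$x$: scalar (lower-case italics notation)
$\mathbf{x}$: column vector (lower-case bold notation) or $n \times 1$-matrix
$\mathbf{a} \cdot \mathbf{b}$: dot product of two vectors, a and b; if a and b are $n \times 1$-matrices, also written as $\mathbf{a}^T \mathbf{b}$; $\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T \mathbf{b} = \sum_i a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$
$\mathbf{X}$: $m \times n$-matrix (upper-case bold notation)
$X$: 3D-tensor (upper-case italics notation)
$\mathbb{R}^n$: real coordinate space, written as a column vector with length n, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$
$\mathbf{x}^T$: transpose of an $n \times 1$-matrix, $\mathbf{x}^T = \begin{bmatrix} x_1 & x_2 & \ldots & x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}^T$
$\|\mathbf{x}\|_p$: $L^p$ norm, vector p-norm, $\|\mathbf{x}\|_p = \left( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \right)^{1/p}$
$\|\mathbf{x}\|_\infty$: $L^\infty$ norm, max norm; largest absolute value of a vector, $\|\mathbf{x}\|_\infty = \max_i |x_i|$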

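$\|\mathbf{x}\|$: vector norm, $L^2$-norm, $\|\mathbf{x}\| = \|\mathbf{x}\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$
$\mathbf{A}_{i,:}$: ith row of matrix A
$\mathbf{A}_{:,j}$: jth column of matrix A
$\mathbf{A}^T$: transpose of a matrix, matrix element $A_{i,j}$ becomes $A^T_{j,i}$; for example, $\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$
$\mathbf{I}_n$: $n \times n$ identity matrix; for example, $\mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$
$\mathbf{A}^{-1}$: inverse of a matrix A, such that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$
$\text{tr}\,\mathbf{A}$: trace of a matrix A (sum of the diagonal elements), $\text{tr}\,\mathbf{A} = \sum_{i=1}^{n} A_{i,i}$
$\det \mathbf{A}$: determinant of a matrix A
$\text{diag}(a_1, a_2, \ldots, a_n)$: diagonal matrix, a matrix whose diagonal holds the values $a_1, a_2, \ldots, a_n$ and whose other elements are all zero
$\mathbf{A} \odot \mathbf{B}$: Hadamard product, element-wise matrix multiplication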
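To make a few of the linear algebra definitions concrete, here is a minimal pure-Python sketch that represents vectors as lists and matrices as lists of rows. All function names are hypothetical helpers written for this illustration; in practice a numerical library such as NumPy would typically be used instead.

```python
def dot(a, b):
    """Dot product: a_1*b_1 + a_2*b_2 + ... + a_n*b_n."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def p_norm(x, p=2):
    """Vector p-norm: (|x_1|^p + ... + |x_n|^p)^(1/p)."""
    return sum(abs(x_i) ** p for x_i in x) ** (1 / p)


def max_norm(x):
    """Max norm: largest absolute value among the elements of x."""
    return max(abs(x_i) for x_i in x)


def transpose(A):
    """Transpose: element (i, j) of A becomes element (j, i)."""
    return [list(column) for column in zip(*A)]


def trace(A):
    """Trace: sum of the diagonal elements of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


def hadamard(A, B):
    """Hadamard product: element-wise product of two same-shape matrices."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]
```

For instance, `transpose([[1, 2], [3, 4], [5, 6]])` reproduces the transpose example from the table above.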

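A.5 Calculus

$\lim_{x \to a} f(x)$: limit of f(x) as x approaches a
$\lim_{x \to a^-} f(x)$: limit of f(x) as x approaches a from the left
$\lim_{x \to a^+} f(x)$: limit of f(x) as x approaches a from the right
$\frac{df}{dx}$: derivative of f
$\frac{d^n f}{dx^n}$: n-th derivative of f
$\frac{\partial f}{\partial x}$: partial derivative of $f(x, y, \ldots)$ with respect to variable x, where x is a scalar
$\nabla f$: gradient of a function $f: \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x_1, x_2, \ldots, x_n) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$
$\Delta f$: Laplacian of a function $f: \mathbb{R}^n \to \mathbb{R}$, $\Delta f = \sum_{i=1}^{n} \frac{\partial^2 f}{\partial x_i^2}$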

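$Hf$: Hessian of a function $f: \mathbb{R}^n \to \mathbb{R}$,
$Hf = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}$
$\frac{\partial f_j}{\partial x_i}$: partial derivative of component function $f_j$ with respect to the variable $x_i$, where $f: \mathbb{R}^n \to \mathbb{R}^m$, such that
$f(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix}, \quad \frac{\partial f}{\partial x_i} = \begin{bmatrix} \frac{\partial f_1}{\partial x_i} \\ \frac{\partial f_2}{\partial x_i} \\ \vdots \\ \frac{\partial f_m}{\partial x_i} \end{bmatrix}$
$Df$: Jacobian matrix of f,
$Df = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$
$\int f(x)\,dx$: indefinite integral of f (antiderivative F, so that the derivative of F is f) with $f: \mathbb{R} \to \mathbb{R}$
$\int_a^b f(x)\,dx$: definite integral of f from a to b (equal to $F(b) - F(a)$ for an antiderivative F) with $f: \mathbb{R} \to \mathbb{R}$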
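The gradient and Jacobian definitions can be checked numerically with central finite differences. The sketch below is an approximation under stated assumptions, not a method from the book: the step size `h`, the helper names `gradient` and `jacobian`, and the example functions `scalar_example` and `vector_example` are all hypothetical.

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of f: R^n -> R at x by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        # (f(x + h*e_i) - f(x - h*e_i)) / (2h) approximates df/dx_i
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f: R^n -> R^m at x.

    Row j holds the partial derivatives of component f_j with respect
    to x_1, ..., x_n, matching the layout of Df above.
    """
    m = len(f(x))
    return [gradient(lambda v, j=j: f(v)[j], x, h) for j in range(m)]


def scalar_example(v):
    """f(x_1, x_2) = x_1^2 + 3*x_2, with gradient [2*x_1, 3]."""
    return v[0] ** 2 + 3 * v[1]


def vector_example(v):
    """f(x_1, x_2) = [x_1*x_2, x_1 + x_2], with Jacobian [[x_2, x_1], [1, 1]]."""
    return [v[0] * v[1], v[0] + v[1]]
```

Calling `gradient(scalar_example, [1.0, 2.0])` should return values close to `[2, 3]`, and `jacobian(vector_example, [1.0, 2.0])` values close to `[[2, 1], [1, 1]]`, up to floating-point error.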

A.6 Probability and Statistics

$P(A \cap B)$: probability that events A and B both occur
$P(A \cup B)$: probability that event A or B occurs
$P(A \mid B)$: conditional probability of A given B
$E(X), \mu_X$: expected value (mean) of a random variable X; $E(X) = \sum_{i=1}^{\infty} p_i x_i$ for a discrete random variable X with values $x_1, x_2, \ldots$ and probabilities $p_1, p_2, \ldots$; $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$ for a continuous random variable with probability density function f(x)
$\bar{X}$: sample average of numerical data $X_1, \ldots, X_n$, $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
$\text{var}(X), \sigma_X^2$: variance of a random variable X, $\text{var}(X) = E\left[(X - \mu_X)^2\right] = E(X^2) - E(X)^2$
$s_X^2$: sample variance of numerical data $X_1, \ldots, X_n$, $s_X^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$

$\text{std}(X), \sigma_X$: standard deviation of a random variable, square root of the variance
$s_X$: sample standard deviation, the square root of the sample variance $s_X^2$
$\text{cov}(X, Y)$: covariance of two random variables X and Y, $\text{cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$
$s_{XY}$: sample covariance of numerical data $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$, $s_{XY} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$
$\text{corr}(X, Y)$: correlation coefficient of two random variables X and Y, $\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y}$
$H(X)$: entropy of a random variable X; discrete: $H(X) = -\sum_x P(X = x) \log_b P(X = x)$; continuous: $H(X) = -\int f(x) \log_b f(x)\,dx$
PMF: probability mass function of a discrete random variable, $f(x) = P(X = x)$
CDF: cumulative distribution function of a continuous random variable, $F(x) = P(X \le x)$
PDF: probability density function of a continuous random variable, $P(X \in [a, b]) = \int_a^b f(x)\,dx$
$X \sim D$: random variable X has a distribution D
$\hat{\theta}$: estimator of a parameter $\theta$
$N(x, \mu, \sigma^2)$: normal (Gaussian) distribution over x with mean $\mu$ and variance $\sigma^2$
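The sample statistics and the discrete entropy can be written out directly from their definitions. The following sketch uses the $\frac{1}{n}$ normalization given in this appendix; be aware that some library functions, such as `statistics.variance` in the Python standard library, divide by $n - 1$ instead. The helper names and the example data are hypothetical.

```python
import math


def sample_mean(xs):
    """Sample average: (1/n) * sum of X_i."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with 1/n normalization, as defined in this appendix."""
    mean = sample_mean(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def sample_covariance(xs, ys):
    """Sample covariance with 1/n normalization."""
    mean_x, mean_y = sample_mean(xs), sample_mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(sample_variance(xs))
    std_y = math.sqrt(sample_variance(ys))
    return sample_covariance(xs, ys) / (std_x * std_y)


def entropy(probabilities, base=2):
    """Discrete entropy: -sum of P(x) * log_b P(x), skipping zero-probability terms."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


xs = [1.0, 2.0, 3.0, 4.0]   # made-up data
ys = [2.0, 4.0, 6.0, 8.0]   # made-up data, exactly 2 * xs
```

Because `ys` is a positive multiple of `xs` here, `sample_correlation(xs, ys)` should be very close to 1, and `entropy([0.5, 0.5])` corresponds to one bit for a fair coin.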

A.7 Numbers

$e$: Euler's number, mathematical constant approximated by 2.71828
$\pi$: "pi", mathematical constant approximated by 3.14159
$\infty$: infinity symbol
$1.234 \times 10^5$: scientific notation for 123,400; also written as 1.234E05
$<$: less-than sign; for example, $x < 10$ means that x is smaller than 10
$\ll$: much-less-than sign
$>$: greater-than sign; for example, $x > 10$ means that x is larger than 10
$\gg$: much-greater-than sign

A.8 Approximation

$\approx$: approximate equality; for instance, $e \approx 2.71828$ is the approximation of Euler's number
$f(x) \sim g(x)$: symbol to assert that the ratio of two functions approaches 1; $\lim_{x \to 0} \frac{f(x)}{g(x)} = 1$ if x is small, $\lim_{x \to \infty} \frac{f(x)}{g(x)} = 1$ if x is large
$f(x) \propto g(x)$: the two functions f(x) and g(x) are proportional to each other
$T(n) \in O(n^2)$: big-O notation; an algorithm is asymptotically bounded by $n^2$, that is, it has an order of $n^2$ time complexity
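A short sketch of how these number and approximation symbols show up in Python. The tolerance `rel_tol=1e-5` and the functions `f` and `g` are assumptions chosen for illustration; they are one possible example of two functions whose ratio approaches 1 as the argument grows.

```python
import math

# Scientific notation: 1.234e5 is Python's spelling of 1.234 x 10^5.
scientific = 1.234e5

# Approximate equality: compare e with its 6-digit approximation.
approx_e = 2.71828
is_close = math.isclose(math.e, approx_e, rel_tol=1e-5)


def f(n):
    """Hypothetical function n^2 + n."""
    return n ** 2 + n


def g(n):
    """Hypothetical function n^2."""
    return n ** 2


# f(n) ~ g(n) for large n: the ratio gets closer to 1 as n grows.
ratio_small_n = f(10) / g(10)
ratio_large_n = f(10 ** 6) / g(10 ** 6)
```

Here `ratio_small_n` is 1.1, whereas `ratio_large_n` differs from 1 only by about one part in a million.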

A.9 Logic

$\Rightarrow$: implication operator; for example, $A \Rightarrow B$ translates to "A implies B" or "if A then B" (or "A only if B")
$\Leftrightarrow$: equivalence operator (if and only if (iff)); for example, $A \Leftrightarrow B$ translates to "A if and only if B" or "if A then B and if B then A"
$\land$: logical conjunction, and; for example, $A \land B$ means "A and B"
$\lor$: logical (inclusive) disjunction, or; for example, $A \lor B$ means "A or B"
$\neg$: negation, not; for example, $\neg A$ means "not A", that is, "if A is true then $\neg A$ is false" and vice versa
$\forall$: universal quantifier, means "for all"; for example, "$\forall x \in \mathbb{R}, x > 1$" translates to "for all real numbers x, x is greater than one"
$\exists$: existential quantifier, means "there exists"; for example, "$\exists x \in A, f(x)$" translates to "there is an element in set A for which the predicate f(x) holds true"
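The logical operators correspond to Python's `and`, `or`, and `not`, and the quantifiers to `all` and `any` over a finite collection. Python has no built-in implication operator, so the helpers `implies` and `iff` below are hypothetical names introduced for this sketch; the list `values` is a made-up finite stand-in for a set.

```python
def implies(a, b):
    """A => B is false only when A is true and B is false."""
    return (not a) or b


def iff(a, b):
    """A <=> B holds when A and B have the same truth value."""
    return a == b


values = [2, 3, 5]                          # made-up finite set

for_all = all(x > 1 for x in values)        # "for all x in values, x > 1"
exists = any(x % 2 == 0 for x in values)    # "there exists an even x in values"
```

Note that `all` and `any` can only check finite collections, so a statement such as "for all real numbers x" cannot be verified this way.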


Abbreviations and Terms

CNN: Convolutional Neural Network
