
Lecture on Practical Deep Learning
Statistical Physics Winter School, Pohang, January 2018
Prof. Kang-Hun Ahn (ahnkanghun@gmail.com, http://deephearing.org)

Basic of Python / Numerical Methods / TensorFlow / Convolutional Neural Networks / Generative Adversarial Networks

Thanks to Hyun Jae Kim, Maruchan Park

1. Basic of Python

Fortran and C have traditionally been used for computation in physics. Python, a popular language for artificial-intelligence-related programs, can also perform computer calculations for physics. Before I show you how to solve physics problems with Python, you can study the Python language and solve some examples.

Python is an easy language, developed by Guido van Rossum in the 90's, in which programs can be written quickly. Somewhat subjectively, I think Fortran is very easy to learn, but these days everyone who learns new languages is overwhelmed by the view that Python is the easy one. In this lesson I will explain Python in the context of Linux. Anyone who uses MS Windows can use Linux in a virtual machine.

There are various modules in Python, such as numpy, tensorflow, pytorch, scipy, and so on. You can import them like so:

import numpy as np

After importing, you can use the various functions in numpy; in this case, attach np. in front of the function name. In Korea, tensorflow is the most popular AI development tool, but pytorch has grown rapidly in recent years. After installing Anaconda, which helps with installing modules, create an environment that includes some new tools by typing the following command in Linux:

>>conda create --name torch2 python=3 pytorch numpy

This creates an environment named torch2, in which pytorch, python version 3, and numpy are installed on your computer and can be imported from a program.
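Coming back to the numpy import convention above, a minimal sketch; the function calls below are standard NumPy, chosen by me purely for illustration:

import numpy as np

x = np.linspace(0.0, np.pi, 5)   # five evenly spaced points between 0 and pi
print(np.sin(x))                 # element-wise sine, called with the np. prefix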

The extension of a Python code file is .py, and you can run it as

>>python filename.py

Now let's go into the grammar of Python. To me, every computer language has the following elements, and it is essential to learn them:

1) Loop
2) Conditional statement
3) Substructure
4) Array

Let's start with the loop.

1) Loop

counter=0
while (counter < 5):
    print(counter)
    counter=counter+1

In this case, while the condition counter < 5 is satisfied, the instruction is executed repeatedly, and the program outputs 0, 1, 2, 3, 4. Unlike in other languages, statements placed under a conditional statement in Python must be indented (with tabs or spaces). You should also write : at the end of the first line of the loop.

for count in range(3):
    print(count)

In this case the program prints 0, 1, 2.

2) Conditional statements

The following example demonstrates how they work.

if (name == "kanghun"):
    print("nice")
elif (name == "devil"):
    print("bad")
else:
    print("idontknow")

It looks like no explanation is needed. elif is used to add a condition, and else refers to all other cases. Do not forget to indent the body of the conditional statement.

3) Substructure

Python has a format including the Class, which contains functions and variables. Before we learn about classes, I will introduce functions first.

def ww():
    print("aaa")

The above code defines the function ww. Again there are : and indentation. You must include parentheses even if no parameters are given to the function. If you type ww(), then aaa is printed. No ":" is required for the call.

Let's talk about the Class. Again, a Class is a collection of functions and variables. If your code performs only one task, you may not need classes. But we usually build on code created by others, and even combine it with what we have created previously. A class is useful because it creates new code by combining different functions; classes are constructed for combining various pieces of code. Consider the following code:

class staff:
    def __init__(self, bonus):
        self.bonus = bonus
    def salary(self):
        salary = 10000 + self.bonus
        return salary

First we defined a class called staff, with a :. Parentheses are not written after the class name here. A class shows its contents using indentation. I have created two functions in the staff class: __init__ and salary. salary is a name I chose, but __init__ is a special function name built into Python; its feature is that it runs automatically when an instance of the class is made. Be careful with self: variables prefixed with "self." are shared within the class, that is, "self." indicates in which scope the variable is used. Writing a class is like writing a plan to execute commands in the future, and making an instance gets it ready to run. In the above case, let's make an instance ahn:

ahn=staff(10000)

At this time, the instance is created with the value 10000 stored in self.bonus. If you want to call the salary function in the staff class, you can type

aa=ahn.salary()
print(aa)

Then it prints 20000. Be sure to include the parentheses with salary().
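Putting the pieces of this example together, a minimal runnable sketch of the same class (the expected output is shown as a comment):

class staff:
    def __init__(self, bonus):
        self.bonus = bonus          # stored on the instance via self.
    def salary(self):
        salary = 10000 + self.bonus
        return salary

ahn = staff(10000)   # __init__ runs automatically here
aa = ahn.salary()
print(aa)            # prints 20000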

4) Array

If we declare np.ones((3,2)), we get a 3 by 2 matrix of all 1s:

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

Note the double square brackets [[ ]] in the array. This is because the matrix is an array of rank 2; as you'll see later, the number of square brackets increases when dealing with higher-rank arrays. np.zeros((3,2)) is likewise a 3 by 2 matrix containing 0s. When we want to specify the components of the array, we can use np.array. If we declare dat1 = np.array([[1,2,3],[4,5,6],[7,8,9]]), the array dat1 becomes

array([[1,2,3],
       [4,5,6],
       [7,8,9]]).

If you want to know the size of the matrix you created, type dat1.shape; it returns the size information, in this case (3,3). Note that the shape of an array can differ even with the same components:

a1 = np.array( [1, 2, 3] )          # shape (3,), a 1-dimensional array
a2 = np.array( [ [1, 2, 3] ] )      # shape (1,3), a 2-dimensional array
a3 = np.array( [ [1], [2], [3] ] )  # shape (3,1), a 2-dimensional array

In the case of a3, if we display it, it is

array([[1],
       [2],
       [3]]).

There is also the list, which can hold the same contents as the array a1 but is a different type from an array. Lists are very useful and often used: write a1=[1,2,3], without np.array. In this case a1[0]=1, a1[1]=2, a1[2]=3. In Python, indexing starts from 0, as in C, so a1[2] contains the last element, 3, of a1. You can also put lists inside any list. For example, if a=[1,2,3,["a","b"]], then a[3] contains ["a","b"], and so does a[-1]. So, what does a[-1][1] contain? The answer is "b". The list type has many features, such as slicing, concatenation, insertion, removal, finding positions, and so on. Here are a few basic examples. When a=[1,2,3],

a.append(4)

results in a=[1,2,3,4]. With a.append([5,6]), a new list can be attached as an element of the original list. Again, when a=[1,2,3] and a.reverse() is performed, you will get a=[3,2,1]. When a=[0]*10, a=[0,0,0,0,0,0,0,0,0,0]. In addition, if you write the following (in Python 3),

# File I/O
f=open("text.txt", "w")
print(type(f))
for i in range(1,10):
    f.write("%d th line.\n" % i)
f.close()

f=open("text.txt", "r")
for i in f:
    print(i)

you will see the following output:

<class '_io.TextIOWrapper'>

1 th line.
2 th line.
...
9 th line.

Here there are two f variables. The program first prints type(f), showing what class f is, and then uses f to read and print the file. The second f is different from the first, because the first was closed with f.close(). Note that the file goes up to the 9th line, not the 10th. In the second loop, i automatically refers to each line as a string. %d stands for an integer.

Ex) Integrate the sin(x) function from 0 to π (np.pi).

Ex) You can generate random numbers between 0 and 1 using np.random.random(). Assuming that x is a uniform random number between 0 and π, calculate the distribution of y = x*sin(x) by dividing the y values into bins of width 0.1, using 1000 x's.
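A hedged sketch of one way to attack these two exercises; the Monte Carlo approach below is my own suggestion, not spelled out in the lecture:

import numpy as np

# Ex 1: Monte Carlo estimate of the integral of sin(x) from 0 to pi (exact value: 2).
n = 100000
x = np.random.random(n) * np.pi        # uniform samples on [0, pi]
print(np.pi * np.mean(np.sin(x)))      # (b - a) times the mean value of the integrand

# Ex 2: distribution of y = x*sin(x) for 1000 uniform x in [0, pi], bin width 0.1.
x = np.random.random(1000) * np.pi
y = x * np.sin(x)
counts, edges = np.histogram(y, bins=np.arange(0.0, y.max() + 0.1, 0.1))
print(counts)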

2. Basic numerical analysis

1) Euler method

Many physics equations are ordinary differential equations (ODEs). Among them, second-order differential equations, especially when solving dynamics over time, are very common. A suitable method for solving such ODEs is the Euler method. It is said to make significant errors, but in practice this is not a problem: reducing the error by reducing the time step used to be a technical problem because of the high computational cost, which is no longer an issue nowadays since computational power has improved greatly. The Runge-Kutta method reduces errors better than the Euler method, but it is a bit more complicated, so it is usually sufficient to use the Euler method to write code and get results quickly.

Before I talk about the Euler method, we need to discuss a few basic things. Computers do not know physical dimensions; they only handle numbers. This is actually the mistake most beginners make. In physics, physical quantities are measurable quantities and are built from just three dimensions: length (L), time (T), and mass (M). You may have heard of many other physical quantities, but all of them can be described using L, T, and M. For example, even though the unit of energy is the Joule (J), it is defined as 1 kg · 1 m² / 1 s² and has physical dimension ML²T⁻². So you only need to set up three units on your computer: if you specify units for length, time, and mass, all other units are determined by these. If you set 1 nanometer = 1, 1 picosecond = 1, and 0.1 microgram = 1 on your computer, you cannot also set the energy unit to 1; the energy unit is automatically 0.1 microgram × (1 nanometer / 1 picosecond)² = 10⁻⁴ J. If a calculated quantity has the dimension of energy and the value 2, it should be interpreted as 2 × 10⁻⁴ J. When we use differential

equations, the coefficients of the equations should be made dimensionless. For example, in the case of a harmonic oscillator, the equation is

m ẍ + b ẋ + k x = F sin(ωt).

Here a dot over the variable means a derivative with respect to time t: two dots mean the second derivative, one dot the first derivative. Each term has the dimension of force, MLT⁻². First we guess what kind of motion to expect. If you roughly estimate the amplitude and the oscillation frequency of the harmonic oscillator and set the units accordingly, the numbers that appear on the computer are easy to handle. In the above equation, the oscillator is driven by an external force at angular frequency ω, so it can be assumed that the oscillator moves on a time scale of roughly 1/ω, and we can use t̃ = ωt as a new time variable. Alternatively, we can use the resonance angular frequency ω₀ = √(k/m) and set t̃ = ω₀t. Defining the dimensionless time t̃ = ω₀t and dividing each term by m,

ω₀² d²x/dt̃² + (b/m) ω₀ dx/dt̃ + ω₀² x = (F/m) sin((ω/ω₀) t̃).

Dividing by ω₀² again, this becomes

d²x/dt̃² + (b/(mω₀)) dx/dt̃ + x = (F/k) sin((ω/ω₀) t̃).

This equation has the dimension of length, and F/k represents the characteristic length of the problem. Therefore, if the unit of length is set to l₀ = F/k and the dimensionless length is defined as x̃ = x/l₀, the equation becomes

d²x̃/dt̃² + (b/(mω₀)) dx̃/dt̃ + x̃ = sin((ω/ω₀) t̃),

and now all coefficients are dimensionless. You can let the computer solve this equation and interpret the result using the defined units. As can be seen from the above equation, if the two dimensionless parameters b/(mω₀) and ω/ω₀ are the

same, then the solution x̃(t̃) is exactly the same. For a given set of dimensionless parameters you therefore only have to do the computation once. This is why, when analyzing dynamics, typical scientific papers show how the type of motion differs according to these dimensionless parameters.

Now let's assume we have made a dimensionless equation as above and omit the ~ sign. Suppose we have the following differential equation:

ẍ + a ẋ + f(x) = g(t).

Even if a second derivative appears, defining a new variable turns it into a system of first-order differential equations (and the same is true for higher derivatives):

dx/dt = v,
dv/dt = -a v - f(x) + g(t).

Note that there should be no derivatives on the right-hand side. The first line looks like the definition of the velocity, but note that v appears on the right. The derivatives are now treated as variables, named dxdt and dvdt in the code. If the time step is dt (usually 0.001), the Euler method is as follows:

x = initial value
v = initial value
for i = 1, ..., imax:
    t = i * dt
    dxdt = v
    dvdt = -a*v - f(x) + g(t)
    x = x + dxdt * dt
    v = v + dvdt * dt
    write (t, x)

This gives you the position x over time. The point of the Euler method is to update a single point according to the derivative value, as indicated in red in the figure.

[Figure: a single Euler update step along the local derivative]

As the figure shows, there is always a truncation error in the Euler method, because dt is not actually infinitely small. Several references give formulas that derive this error, but they are not actually needed: decrease dt by a factor of 10, increase imax by a factor of 10, plot the results, and check whether they agree. If they do, use it with no problem; if they differ a lot, you have to reduce dt further. How is this implemented in Python code?

import numpy as np
import matplotlib.pyplot as plt   # import pyplot from matplotlib under the name plt

def f(x):       # f and g are left unspecified in the original listing;
    return x    # simple choices added here for illustration
def g(t):
    return 0.0

x0=1.; v0=0.    # several assignments on one line are separated by semicolons
a=0.1
dt=0.01
x=x0; v=v0
lx=[]; lt=[]
for i in range(1000):
    time=dt*i

    lt.append(time)
    dxdt=v
    dvdt=-a*v - f(x) + g(time)
    x=x+dxdt*dt
    v=v+dvdt*dt
    lx.append(x)

plt.plot(lt,lx)
plt.show()

Ex) Using the Euler method above, calculate the motion of a harmonic oscillator without external force for several values of b, with m = 10 ng and k = 1 pN/nm. The initial position is 10 nm and the initial velocity is zero.
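A hedged sketch for this exercise. The unit choices below (length in nm, force in pN, mass in ng, so that the implied time unit is √(ng·nm/pN) ≈ 3.2×10⁻⁵ s) and the values of b are my own, for illustration only:

import numpy as np
import matplotlib.pyplot as plt

m, k = 10.0, 1.0          # 10 ng and 1 pN/nm in the chosen units
x0, v0 = 10.0, 0.0        # 10 nm initial position, zero initial velocity
dt, nsteps = 0.01, 5000

for b in [0.5, 2.0, 2.0*np.sqrt(m*k)]:   # two underdamped cases and the critically damped one
    x, v = x0, v0
    lt, lx = [], []
    for i in range(nsteps):
        dxdt = v
        dvdt = (-k*x - b*v) / m           # m x'' = -k x - b x', no external force
        x = x + dxdt*dt
        v = v + dvdt*dt
        lt.append(i*dt)
        lx.append(x)
    plt.plot(lt, lx, label="b = %.2f" % b)
plt.legend()
plt.show()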

2) Discrete Fourier transformation

The Fourier transform is an important concept with wide application. The discrete Fourier transform is

A_k = Σ_{m=0}^{n-1} a_m exp(-2πi mk/n),   k = 0, 1, ..., (n-1)/2.

This range of k is for odd n; when n is even, we use k up to n/2. Let a_m be time-series data and let the time interval be Δt (the time interval is the reciprocal of the sampling rate of the experimental equipment). Then, expressing the above Fourier transform in continuous variables, t = mΔt and ω_k = 2πk/(nΔt).

import numpy as np
import matplotlib.pyplot as plt

def discrete_fourier_transform(a):
    a = list(a)
    a_length = len(a)
    result = []
    for k in range(0, a_length):
        A_k = 0
        for m in range(0, a_length):
            i = (-2) * np.pi * 1j * m * k / a_length
            exp = np.exp(i)
            A_k = A_k + (a[m] * exp)
        result.append(A_k)
    return result

Ex) Fourier transform of exp(-0.01*m^0.1)*sin(0.6m), m = 1, 2, 3, ..., 1000.

import numpy as np
import matplotlib.pyplot as plt

lb=[]; lk=[]

def discrete_fourier_transform(a):
    a=list(a)
    a_length=len(a)
    result=[]
    for k in range(a_length):
        A_k=0
        for m in range(a_length):
            i=(-2)*np.pi*1j*m*k/a_length
            exp=np.exp(i)
            A_k=A_k+exp*a[m]
        result.append(A_k)
    return result

a=[]
a.append(0)
for m in range(1,1000):
    a.append(np.exp(-0.01*m**0.1)*np.sin(0.6*m))

b=discrete_fourier_transform(a)
for k in range(0,500):
    lb.append(np.real(b[k]))
    lk.append(k)
plt.plot(lk,lb)
plt.show()

Question) In the above example, computing k only up to 500 is sufficient. Why?

Fourier transforms often have to process large amounts of data repeatedly, so in practice the fast Fourier transform (FFT) is often used.
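NumPy has a built-in fast Fourier transform; a short sketch of checking the hand-written routine above against it (np.fft.fft uses the same sign convention as the formula above):

import numpy as np

m = np.arange(1000)
a = np.exp(-0.01*m**0.1) * np.sin(0.6*m)   # the same signal as in the example above
b_fft = np.fft.fft(a)                       # O(n log n) instead of O(n^2)
print(b_fft[:5])                            # compare with discrete_fourier_transform(a)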

Example) Find the motion of the damped harmonic oscillator with the Euler method, and then analyze the data using the Fourier transform.

3) Gradient descent method

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of an approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of the function; that procedure is then known as gradient ascent.

Ex) Gradient descent has problems with pathological functions such as the Rosenbrock function shown here:

f(x₁, x₂) = (1 - x₁)² + 100 (x₂ - x₁²)².

The Rosenbrock function has a narrow curved valley which contains the minimum. The bottom of the valley is very flat. Because of the curved, flat valley, the optimization zig-zags slowly with small step sizes towards the minimum.
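A minimal gradient-descent sketch on the Rosenbrock function above; the starting point, step size, and iteration count are my own choices:

def rosenbrock_grad(x1, x2):
    # gradient of f(x1,x2) = (1-x1)**2 + 100*(x2-x1**2)**2
    df_dx1 = -2.0*(1.0 - x1) - 400.0*x1*(x2 - x1**2)
    df_dx2 = 200.0*(x2 - x1**2)
    return df_dx1, df_dx2

x1, x2 = -1.2, 1.0      # a common starting point
eta = 5e-4              # step size
for i in range(50000):
    g1, g2 = rosenbrock_grad(x1, x2)
    x1, x2 = x1 - eta*g1, x2 - eta*g2   # step against the gradient
print(x1, x2)   # creeps toward the minimum at (1, 1); progress along the flat valley is very slow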

3. Neural Network

Artificial neural networks are computing systems inspired by the animal brain. Such systems learn by considering examples, generally without task-specific programming. An artificial neural network is based on a collection of connected units or nodes called artificial neurons (analogous to biological neurons in an animal brain). Each connection (analogous to a synapse) between artificial neurons can transmit a signal from one to another. The artificial neuron that receives the signal can process it and then signal the artificial neurons connected to it. The signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by a non-linear function of the sum of its inputs. Artificial neurons and connections typically have a weight that is adjusted as learning proceeds; the weight increases or decreases the strength of the signal at a connection. Artificial neurons may have a threshold such that the signal is only sent if the aggregate signal crosses that threshold. Typically, the neurons are organized in layers. Different layers may perform different kinds of transformations on their inputs. Signals travel from the first (input) layer to the last (output) layer, possibly after traversing the layers multiple times.
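As a small, hedged illustration of the weighted-sum-and-threshold rule described above (all numbers are arbitrary):

import numpy as np

inputs = np.array([0.5, -1.0, 2.0])    # signals arriving from three connected neurons
weights = np.array([0.8, 0.2, 0.5])    # connection weights, adjusted during learning
threshold = 1.0

total = np.sum(weights * inputs)                 # aggregate input signal
output = 1.0 if total > threshold else 0.0       # the neuron fires only above threshold
print(total, output)                             # 1.2 1.0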


Here, σ is the sigmoid function, which makes the step function smooth. The nonlinear function used for the output of a neuron is called the activation function. These days, people commonly use the Rectified Linear Unit (ReLU) as the activation function because it has several advantages over the sigmoid.
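For concreteness, the standard definitions of the two activation functions just mentioned, written in NumPy:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))   # smooth version of the step function

def relu(z):
    return np.maximum(0.0, z)       # Rectified Linear Unit

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # [0.1192...  0.5  0.8807...]
print(relu(z))      # [0.  0.  2.]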

Training is the process of adjusting the weight factors and thresholds so as to reduce the loss function (also called the cost function) shown above. There is a data set for training and a separate data set for testing whether the network actually works; typically the data are split roughly 8:2 between them. The loss function expresses the difference between the computed output and the expected output; one uses either the least-squares form, the sum of the squared differences (see the figure above), or the cross-entropy function

C = -(1/n) Σ_x [ y(x) ln a(x) + (1 - y(x)) ln(1 - a(x)) ],

where the sum runs over the training batch. When training a neural network, gradient descent can be used to make the loss function smaller; this requires the derivatives of the loss function with respect to the weighting factors and biases. These derivatives are easily obtained by the back-propagation method, but when using TensorFlow and similar tools you do not need to code this step yourself.

Adding hidden layers tends to help as the complexity of the problem increases (though not always). Looking at the XOR problem below, you can see that the problem becomes solvable precisely because a hidden layer is introduced. (See the lecture by Dr. Junghyo Jo [조정효].)

Example) Solve the XOR problem by using a neural network.
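A hedged sketch of the point: with one hidden layer the XOR function can be represented. The weights below are chosen by hand rather than trained, purely to show that the hidden layer makes the problem solvable:

import numpy as np

def step(z):
    return (z > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hidden layer: the first unit fires for (x1 OR x2), the second for (x1 AND x2).
W1 = np.array([[1.0, 1.0], [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])
H = step(X.dot(W1) + b1)

# Output unit: OR minus AND, which is exactly XOR.
W2 = np.array([1.0, -1.0])
b2 = -0.5
y = step(H.dot(W2) + b2)
print(y)    # [0. 1. 1. 0.]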

4. TensorFlow

TensorFlow is open-source software developed by Google Brain, a research organization of Google. It is designed for building AI programs, so it is well suited to constructing neural networks. A neural network can be represented as a graph, as shown in the figure: the circles are called neurons, and passing data from one to another is represented by arrows. In this graph the neurons form nodes connected by arrows, and each circle contains some sort of operation, including ones we will introduce later, such as Sigmoid or ReLU. In TensorFlow, the neural network to be calculated is first constructed as a graph. The graph is just a representation of a plan to perform some calculations, that is, a kind of code generation. When you run something called a Session, data are fed in and the actual calculation is performed. In this process the resources of the computer can be used in parallel. The arrows are represented by tensors, which is presumably where the name TensorFlow comes from (I am not 100% sure). Let's create a simple TensorFlow program that multiplies two numbers:

import tensorflow as tf

a=tf.placeholder("float")
b=tf.placeholder("float")
y=tf.multiply(a,b)

This code constructs a TensorFlow graph. At this point a has no value; instead, a is told to hold a place (it is a placeholder), and b is the same. The multiplication of the two placeholders is set up with the multiply command of tf. You do not need to specify a placeholder for y separately. The next steps prepare the calculation and run the session:

sess=tf.Session()
print(sess.run(y, feed_dict={a:3,b:3}))

If we regard the graph as an architectural blueprint, creating a session means hiring the construction workers and preparing the equipment. Above, sess is the name of the session, which is prepared for this purpose. You need sess.run to start it; the last line feeds in the input values and prints the output at the same time. Note that nothing is executed until sess.run appears, and at the moment the session is run, all variables in the graph are assigned proper values.

Ex) Single-layer neural network

import tensorflow as tf
import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder("float", [None,784])
W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x,W)+b)
y_ = tf.placeholder("float", [None,10])

cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

This way, 100 pieces of data are randomly sampled at each step and used for training. Here, _xs means images and _ys means their labels. Now run the following code to check the test results:

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

28 In the first line, tf.argmax(y, 1) finds the largest value of y along axis = 1. The y matrix is None by 10 (= (None by 784) * (784 by 10)), where axis = 0 refers to the None side, row, while axis = 1 refers to the 10 side, column. "None" commonly means that any number is possible, and here, it means the number of input image data. Depending whether the maximum value of y and y_ in the first line is the same or not, it returns TRUE or FALSE as a shape of array. tf.cast will do this for [0,1,1,1,1,0,1,1... 1], by TRUE is 1 and FALSE is 0. Then, the accuracy percentage is obtained by tf.reduce_mean, which calculates the mean of the input array. Ref) MNIST data - The MNIST data-set is composed by a set of black and white images containing hand-written digits, containing more than examples for training a model, and for testing it. The MNIST data-set can be found at the MNIST database. - This data-set is ideal for most of the people who begin with pattern recognition on real examples without having to spend time on data preprocessing or formatting, two very important steps when dealing with images but expensive in time. - The images are centered in pixel frames by computing the mass center and moving it into the center of the frame. The images are like the ones shown here: - Also, the kind of learning required for this example is supervised learning; the images are labeled with the digit they represent. This is the most common form of Machine Learning.

- To download the data easily, you can use the script input_data.py, obtained from Google's site (and uploaded to the book's GitHub for your convenience). Simply download input_data.py into the same working directory where you are programming the neural network with TensorFlow. From your application you only need to import and use it in the following way:

import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

- After executing these two instructions you will have the full training data set in mnist.train and the test data set in mnist.test. Each element is composed of an image, referenced as xs, and its corresponding label ys, to make it easier to express the processing code. Remember that both data sets, training and testing, contain xs and ys; the training images are referenced in mnist.train.images and the training labels in mnist.train.labels.

5. Convolutional Neural Network (CNN)

The convolutional neural network (CNN) is an innovative neural network introduced in 1998 by Yann LeCun et al. It has led to dramatic improvements in automatic image processing, and it is now widely used in advanced machine-learning models. Let's look at how to implement a CNN through a simple example code.

import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
import tensorflow as tf

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
x_image = tf.reshape(x, [-1,28,28,1])

In the first two lines we load the MNIST data through TensorFlow. The placeholders literally hold a place, making room for tensors. tf.reshape changes the shape of the x tensor into the shape given in square brackets, where the first -1 means that the number is not specified, like None. The second and third numbers are the size (28x28) of the image data, and the last number is the number of input data channels; here it has to be 1 because MNIST

data are gray-scale images.

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

The code above defines helper functions for the CNN. The first two functions, weight_variable and bias_variable, make variables of a given shape (random and constant, respectively). conv2d is the function for a convolution layer, and max_pool_2x2 is the function for a pooling layer. A convolution layer combines the input data with the weight variables through element-wise multiplication and summation as the filter slides over the input. Then, in the pooling layer, features are extracted by taking the largest value in each filter region. These processes amount to sharing the weight variables and the bias of each filter across all hidden units. When a filter, convolutional or pooling, passes over the image, the distance it moves each time is called the stride. Padding means attaching zeros at the boundary of the input data so that all input values are treated evenly. In the code above, the convolution-layer stride is (1x1), the pooling-layer stride is (2x2), and the pooling filter size is (2x2); both layers use zero padding. See the figures below to understand how the filtering processes work.

[Figure: zero padding]
[Figure: pooling layer (max pooling)]

The figure above shows the whole process of how the data shapes change as the data pass through each layer. After the whole process, the input data are finally classified through a fully connected layer.

- Full code

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
import tensorflow as tf
import matplotlib.pyplot as plt

x = tf.placeholder("float", shape=[None, 784])
y_ = tf.placeholder("float", shape=[None, 10])
x_image = tf.reshape(x, [-1,28,28,1])
print("x_image=", x_image)

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W):
    # The leading 1 means one sample at a time; the middle two give a 1x1 stride,
    # i.e. the filter moves one step horizontally or vertically, so the convolution
    # does not change the image size. The last 1 is the single (gray-scale) channel.
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    # Kernel size 2x2 and stride 2x2 (two steps in each direction), so the output
    # image shrinks by a factor of 1/2 x 1/2.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

W_conv1 = weight_variable([5,5, 1, 32])
b_conv1 = bias_variable([32])
# 32 filters of size 5x5 that turn 1 input image into 32 output images.
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)
# After the convolution the image size is still 28x28; after pooling it shrinks
# to 14x14. There are 32 such images.
print(x_image.get_shape())   # prints (?, 28, 28, 1): one 28x28 image
print(h_pool1.get_shape())   # prints (?, 14, 14, 32): 32 images of size 14x14

W_conv2 = weight_variable([5,5, 32, 64])
b_conv2 = bias_variable([64])
# 64 filters of size 5x5 scanning the 32 images; because 32 images are scanned,
# the input is really a 3-dimensional structure of size 14x14x32.
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
# After this there are 64 images of size 14x14.
h_pool2 = max_pool_2x2(h_conv2)
# The 2x2 pooling converts these to 7x7 images; 64 of them.
print(h_conv2.get_shape())   # (?, 14, 14, 64)
print(h_pool2.get_shape())   # (?, 7, 7, 64)

W_fc1 = weight_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
# Preparation for the fully connected network.
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
# Flatten, keeping an axis for None (the batch) so that the shapes match.
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y_conv = tf.nn.softmax(tf.matmul(h_fc1, W_fc2) + b_fc2)

cross_entropy = -tf.reduce_sum(y_*tf.log(y_conv))
train_step = tf.train.AdamOptimizer(0.0003).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

sess = tf.Session()
sess.run(tf.global_variables_initializer())

Acc_train = []
Acc_test = []
acc_te = 0
for i in range(3001):
    batch = mnist.train.next_batch(50)
    sess.run(train_step, feed_dict={x: batch[0], y_: batch[1]})
    # batch is [[data], [labels]]; the data is 50x784 and the labels are 50x10.
    if i % 10 == 0:
        acc_tr = sess.run(accuracy, feed_dict={x: batch[0], y_: batch[1]})
        acc_te = sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
        print("step %d, training accuracy %g" % (i, acc_tr), "test accuracy %g" % acc_te)
        Acc_train.append(acc_tr)
        Acc_test.append(acc_te)

Project)
1. Run the MNIST code yourself and, by varying the number and size of the layers, obtain a high accuracy.
2. Collect images or data on your own and implement an interesting and useful classification task.

6. Generative Adversarial Network

A Generative Adversarial Net (GAN) consists of two models, a discriminator and a generator, which compete with each other to improve their performance. In a training sequence, the generator makes fake data from latent variables and the discriminator classifies the fake and real data. We therefore train the two networks by optimizing the loss function

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))],

where D and G are the discriminator and the generator, respectively, x is the real data, z are the latent variables, and D(x) is the probability that the input x is real data.
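A hedged transcription of V(D,G) into TensorFlow code. D_real and D_fake below stand for the discriminator outputs D(x) and D(G(z)); here they are plain placeholders only so that the loss definitions are runnable, whereas in a real GAN they come from the discriminator network:

import tensorflow as tf

D_real = tf.placeholder("float", [None, 1])   # D(x), a probability in (0, 1)
D_fake = tf.placeholder("float", [None, 1])   # D(G(z))

eps = 1e-8    # keeps the logarithms away from log(0)
V = tf.reduce_mean(tf.log(D_real + eps) + tf.log(1.0 - D_fake + eps))
D_loss = -V                                          # the discriminator maximizes V
G_loss = tf.reduce_mean(tf.log(1.0 - D_fake + eps))  # the generator minimizes V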

G(z) has the same dimension as the real data. First we maximize V(D,G) by updating the parameters of the discriminator, and likewise minimize V(D,G) for the generator. It has been proven that solving the above equation produces fake data whose distribution equals that of the real data [1].

Proof)

V(G, D) = ∫_x p_data(x) log D(x) dx + ∫_z p_z(z) log(1 - D(g(z))) dz
        = ∫_x [ p_data(x) log D(x) + p_g(x) log(1 - D(x)) ] dx,

where p_g is chosen such that p_g(x) dx = p_z(z) dz. Maximizing over D,

max_D V(G, D) = ∫_x [ p_data(x) log D*_G(x) + p_g(x) log(1 - D*_G(x)) ] dx,
where D*_G(x) = p_data(x) / (p_data(x) + p_g(x))

(this is found by differentiating with respect to D). Therefore

max_D V(G, D) = E_{x~p_data(x)}[log D*_G(x)] + E_{x~p_g}[log(1 - D*_G(x))]
 = E_{x~p_data(x)}[ log( p_data(x) / (p_data(x) + p_g(x)) ) ] + E_{x~p_g}[ log( p_g(x) / (p_data(x) + p_g(x)) ) ]
 = -log(4) + KL( p_data || (p_data + p_g)/2 ) + KL( p_g || (p_data + p_g)/2 )
 = -log(4) + 2 JSD( p_data || p_g ),

where

JSD(p || q) = (1/2) KL(p || M) + (1/2) KL(q || M),   KL(p || q) = Σ_i p_i log(p_i / q_i),   M = (1/2)(p + q).

If the optimal state is reached, the distribution p_g of the generated data becomes equal to the distribution p_data of the real data: p_g = p_data.

However, practical GAN training differs from this theoretical process, so the fake data can differ widely from the real data. Therefore, before programming a GAN, let's take a look at some successful variations first.

1) Least-Squares GAN

A GAN uses cross-entropy for the loss function V(D,G), with a sigmoid function on the output of the discriminator. In this case, since the real data and the fake data are very easy to distinguish at the beginning of training, D(G(z)) is close to 0 and the gradients become very small. The Least-Squares GAN instead solves the following optimization problems [2]:

min_D V_LSGAN(D) = (1/2) E_{x~p_data(x)}[(D(x) - b)²] + (1/2) E_{z~p_z(z)}[(D(G(z)) - a)²],
min_G V_LSGAN(G) = (1/2) E_{x~p_data(x)}[(D(x) - c)²] + (1/2) E_{z~p_z(z)}[(D(G(z)) - c)²].

Like the GAN proof above, we can determine suitable coefficients a, b, and c (see Practice 1 below).
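The same kind of hedged sketch for the least-squares losses. The coefficient choice a = -1, b = 1, c = 0 is one option satisfying the conditions b - c = 1 and b - a = 2 derived in Practice 1 below; D_real and D_fake are again stand-in placeholders for the discriminator outputs (with no sigmoid on the output in this case):

import tensorflow as tf

D_real = tf.placeholder("float", [None, 1])   # D(x)
D_fake = tf.placeholder("float", [None, 1])   # D(G(z))

a, b, c = -1.0, 1.0, 0.0   # satisfies b - c = 1 and b - a = 2
D_loss = 0.5*tf.reduce_mean(tf.square(D_real - b)) + 0.5*tf.reduce_mean(tf.square(D_fake - a))
# The (D(x) - c)^2 term of V_LSGAN(G) does not depend on G, so only the second term is kept:
G_loss = 0.5*tf.reduce_mean(tf.square(D_fake - c))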

Practice 1) Find the optimal V_LSGAN for both the discriminator and the generator.

Solution sketch) For the discriminator, writing p_g(x) dx = p_z(z) dz,

V_LSGAN(D) = (1/2) ∫ [ p_d(x)(D(x) - b)² + p_g(x)(D(x) - a)² ] dx

is minimized by

D*(x) = ( b p_d(x) + a p_g(x) ) / ( p_d(x) + p_g(x) ).

Substituting D* into the generator objective

min_G V_LSGAN(G) = (1/2) E_{x~p_data(x)}[(D*(x) - c)²] + (1/2) E_{z~p_z(z)}[(D*(G(z)) - c)²],

for its minimization to force p_g = p_d, it suffices that the resulting integral takes the form

∫ ( (p_d(x) + p_g(x)) - 2 p_g(x) )² / ( p_d(x) + p_g(x) ) dx,

which is the Pearson χ² divergence between p_d + p_g and 2 p_g and vanishes only when p_g = p_d. Therefore b - c = 1 and b - a = 2.

2) Conditional GAN [3]

Consider a GAN successfully trained on the MNIST data set. The trained generator makes perfect hand-written digits, but we cannot choose which particular digit it generates. We can train the GAN with label information by conditioning the inputs of the discriminator and the generator. In the case of MNIST, the discriminator gets images of

digits together with their labels, and the generator gets latent variables together with labels for the images to be generated.

3) Deep Convolutional GAN

A Deep Convolutional GAN (DCGAN) [4], which has been trained successfully on many data sets (especially image data), has the following structure. Here, a strided convolution means a convolution with a stride of 2 or larger. A strided convolution shrinks the input into a smaller output, and a fractional-strided convolution expands the input into a larger output. For a generator built from a convolutional net, fractional-strided convolutions can be used to match the output size with that of the real data.

4) Fractional-strided convolution

Fractional-strided (transposed) convolution is available as a built-in function in TensorFlow:

tf.nn.conv2d_transpose()
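A hedged sketch of using it, upsampling a batch of 7x7 feature maps with 64 channels to 14x14 with 32 channels (all shapes here are illustrative choices of mine):

import tensorflow as tf

batch_size = 50
h = tf.placeholder("float", [batch_size, 7, 7, 64])

# For conv2d_transpose the filter shape is [height, width, output_channels, input_channels].
W = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))
h_up = tf.nn.conv2d_transpose(h, W, output_shape=[batch_size, 14, 14, 32],
                              strides=[1, 2, 2, 1], padding='SAME')
print(h_up.get_shape())   # (50, 14, 14, 32)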

5) Batch normalization

When we train a model, one training iteration corresponds to one update of the parameters (weights). It is efficient to use a GPU to process multiple data at once, so the update values are averaged over the multiple data of a mini-batch. In this case it is better to normalize the values of the layers over the batch for stable training.

Training)
Input: values of x over a mini-batch B = {x_1, ..., x_m}; parameters to be learned: γ, β. Output: {y_i}.
  μ_B ← (1/m) Σ_{i=1}^m x_i                 // mini-batch mean
  σ_B² ← (1/m) Σ_{i=1}^m (x_i - μ_B)²       // mini-batch variance
  x̂_i ← (x_i - μ_B) / √(σ_B² + ε)           // normalize
  y_i ← γ x̂_i + β                           // scale and shift

Test)
  x̂ = (x - E[x]) / √(Var[x] + ε),   with E[x] ← E_B[μ_B],   Var[x] ← (m/(m-1)) E_B[σ_B²],   y = γ x̂ + β.

- ReLU
ReLU(x) = x if x > 0, and 0 otherwise.

- Leaky ReLU
LeakyReLU(x) = x if x > 0, and cx otherwise.

[1] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems, 2014.
[2] Mao, Xudong, et al. "Least squares generative adversarial networks." 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017.
[3] Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
[4] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).

[Figure: structure of the DCGAN generator]

Project)
3. Using a Generative Adversarial Network, build a generator that produces the function x(t) = sin(ωt).


More information

Jakub Hajic Artificial Intelligence Seminar I

Jakub Hajic Artificial Intelligence Seminar I Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network

More information

arxiv: v1 [cs.lg] 20 Apr 2017

arxiv: v1 [cs.lg] 20 Apr 2017 Softmax GAN Min Lin Qihoo 360 Technology co. ltd Beijing, China, 0087 mavenlin@gmail.com arxiv:704.069v [cs.lg] 0 Apr 07 Abstract Softmax GAN is a novel variant of Generative Adversarial Network (GAN).

More information

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron

More information

TensorFlow: A Framework for Scalable Machine Learning

TensorFlow: A Framework for Scalable Machine Learning TensorFlow: A Framework for Scalable Machine Learning You probably Outline want to know... What is TensorFlow? Why did we create TensorFlow? How does Tensorflow Work? Example: Linear Regression Example:

More information

Math Assignment 5

Math Assignment 5 Math 2280 - Assignment 5 Dylan Zwick Fall 2013 Section 3.4-1, 5, 18, 21 Section 3.5-1, 11, 23, 28, 35, 47, 56 Section 3.6-1, 2, 9, 17, 24 1 Section 3.4 - Mechanical Vibrations 3.4.1 - Determine the period

More information

CSE 559A: Computer Vision

CSE 559A: Computer Vision CSE 559A: Computer Vision Fall 2017: T-R: 11:30-1pm @ Lopata 101 Instructor: Ayan Chakrabarti (ayan@wustl.edu). Staff: Abby Stylianou (abby@wustl.edu), Jarett Gross (jarett@wustl.edu) http://www.cse.wustl.edu/~ayan/courses/cse559a/

More information

GENERAL. CSE 559A: Computer Vision AUTOGRAD AUTOGRAD

GENERAL. CSE 559A: Computer Vision AUTOGRAD AUTOGRAD CSE 559A: Computer Vision Fall 2017: T-R: 11:30-1pm @ Lopata 101 GENERAL PSET 5 Posted. One day extension (now due on the Friday two weeks from now) You get to implement SLIC, and your own conv layer!

More information

Welcome to the Machine Learning Practical Deep Neural Networks. MLP Lecture 1 / 18 September 2018 Single Layer Networks (1) 1

Welcome to the Machine Learning Practical Deep Neural Networks. MLP Lecture 1 / 18 September 2018 Single Layer Networks (1) 1 Welcome to the Machine Learning Practical Deep Neural Networks MLP Lecture 1 / 18 September 2018 Single Layer Networks (1) 1 Introduction to MLP; Single Layer Networks (1) Steve Renals Machine Learning

More information

Introduction to (Convolutional) Neural Networks

Introduction to (Convolutional) Neural Networks Introduction to (Convolutional) Neural Networks Philipp Grohs Summer School DL and Vis, Sept 2018 Syllabus 1 Motivation and Definition 2 Universal Approximation 3 Backpropagation 4 Stochastic Gradient

More information

CSC321 Lecture 9: Generalization

CSC321 Lecture 9: Generalization CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 26 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions

More information

Generative Adversarial Networks

Generative Adversarial Networks Generative Adversarial Networks SIBGRAPI 2017 Tutorial Everything you wanted to know about Deep Learning for Computer Vision but were afraid to ask Presentation content inspired by Ian Goodfellow s tutorial

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Lecture 9 Numerical optimization and deep learning Niklas Wahlström Division of Systems and Control Department of Information Technology Uppsala University niklas.wahlstrom@it.uu.se

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009

AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009 AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009 SUPERVISED LEARNING We are given some training data: We must learn a function If y is discrete, we call it classification If it is

More information

Advanced computational methods X Selected Topics: SGD

Advanced computational methods X Selected Topics: SGD Advanced computational methods X071521-Selected Topics: SGD. In this lecture, we look at the stochastic gradient descent (SGD) method 1 An illustrating example The MNIST is a simple dataset of variety

More information

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information

Introduction to TensorFlow

Introduction to TensorFlow Introduction to TensorFlow Oliver Dürr Datalab-Lunch Seminar Series Winterthur, 17 Nov, 2016 1 Abstract Introduc)on to TensorFlow TensorFlow is a mul/purpose open source so2ware library for numerical computa/on

More information

Deep Feedforward Networks. Sargur N. Srihari

Deep Feedforward Networks. Sargur N. Srihari Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation

More information

Image Processing in Numpy

Image Processing in Numpy Version: January 17, 2017 Computer Vision Laboratory, Linköping University 1 Introduction Image Processing in Numpy Exercises During this exercise, you will become familiar with image processing in Python.

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

Do not tear exam apart!

Do not tear exam apart! 6.036: Final Exam: Fall 2017 Do not tear exam apart! This is a closed book exam. Calculators not permitted. Useful formulas on page 1. The problems are not necessarily in any order of di culty. Record

More information

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks Emily Denton 1, Soumith Chintala 2, Arthur Szlam 2, Rob Fergus 2 1 New York University 2 Facebook AI Research Denotes equal

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

Deep Learning Lab Course 2017 (Deep Learning Practical)

Deep Learning Lab Course 2017 (Deep Learning Practical) Deep Learning Lab Course 207 (Deep Learning Practical) Labs: (Computer Vision) Thomas Brox, (Robotics) Wolfram Burgard, (Machine Learning) Frank Hutter, (Neurorobotics) Joschka Boedecker University of

More information

Classification with Perceptrons. Reading:

Classification with Perceptrons. Reading: Classification with Perceptrons Reading: Chapters 1-3 of Michael Nielsen's online book on neural networks covers the basics of perceptrons and multilayer neural networks We will cover material in Chapters

More information