
CSE 190 Fall 2015 Midterm
November 18, 2015

DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!

THE EXAM IS CLOSED BOOK. Once the exam has started, SORRY, NO TALKING!!! No, you can't even say "see ya at Porter's!" (Especially now that UCSD, in their infinite wisdom, kicked them out of campus... what were they thinking???)

There are 5 problems: make sure you have all of them - AFTER YOU ARE TOLD TO START! Read each question carefully. Remain calm at all times!

Problem   Type                           Points   Score
1         True/False                     15
2         Short Answer                   20
3         Multiple Choice                10
4         The Delta Rule                 10
5         Forward/Backward Propagation   15
Total                                    70

Problem 1: True/False (15 pts: +1 for correct, -0.5 for incorrect, 0 for no answer)
If you would like to justify an answer, feel free.

Similar to learning in neural networks with the backpropagation procedure, the perceptron learning algorithm also ensures that the network output will near the target at each iteration.

Following the perceptron learning algorithm, a perceptron is guaranteed to perfectly learn a given linearly separable data set within a finite number of training steps.

The sigmoid function, $y = g(x) = \frac{1}{1+e^{-w^T x}}$, may be simply interpreted as the probability of the input, x, given the output, y.

It is best to have as many hidden units as there are patterns to be learned by a multilayer neural network.

Robbie Jacobs' adaptive learning rate method resulted in a different learning rate for every weight in the network.

The backpropagation procedure is a powerful optimization technique that can be applied to hidden activation functions like sigmoid, tanh and binary threshold.

Stochastic gradient descent will typically provide a more accurate estimate of the gradient of a loss function than the full gradient calculated over all examples - that is why this method is generally preferred.

Overfitting occurs when the model learns the regularities present only in the training data, or in other words, the model fits the sampling error of the training set.

In backpropagation learning, we should start with a small learning rate and slowly increase it during the learning process.

People use the Rectified Linear Unit (ReLU) as an activation function in deep networks because 1) it works; and 2) it makes computing the slope trivial.

While implementing backpropagation, it is a mistake to compute the deltas for a layer, change the weights, and then propagate the deltas back to the next layer.

Unfortunately, minibatch learning is difficult to parallelize.

In a deep neural network, while the error surface may be very complicated and nonconvex, locally, it may be well-approximated by a quadratic surface.

A convolutional neural network learns features with shared weights (filters) in order to reduce the number of free parameters.

One of the biggest puzzles in machine learning is who hid the hidden layers, and why. Wherever they are, they are probably buried deep, very deep. Some suspect Wally did it.
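As a side illustration of the perceptron convergence statement above (not part of the exam), here is a minimal sketch of the perceptron learning algorithm on a hypothetical linearly separable toy set; the data, learning rate, and epoch cap are made up for the example.

```python
# Minimal perceptron sketch: on linearly separable data the mistake-driven
# updates stop after a finite number of steps, even though a single update
# is not guaranteed to move every output closer to its target.
import numpy as np

# Hypothetical data, separable by x1 + x2 > 1.5.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [2., 2.], [2., 1.]])
t = np.array([-1, -1, -1, 1, 1])

w = np.zeros(2)
b = 0.0
eta = 1.0

for epoch in range(100):
    errors = 0
    for x_i, t_i in zip(X, t):
        y_i = 1 if (w @ x_i + b) > 0 else -1   # binary threshold output
        if y_i != t_i:                         # update only on mistakes
            w += eta * t_i * x_i
            b += eta * t_i
            errors += 1
    if errors == 0:                            # every pattern classified correctly
        print(f"converged after {epoch + 1} epochs, w = {w}, b = {b}")
        break
```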

Problem 2: Short answer (20 pts)
Only a very brief explanation is necessary!

a) (2 pts) Explain why dropout in a neural network acts as a regularizer.

b) (2 pts) Explain why backpropagation of the deltas is a linear operation.

c) (3 pts) Describe two distinct advantages of stochastic gradient descent over the batch method.

d) (2 pts) Fill in the value for w in this example of gradient descent in E(w). Calculate the weight for Iteration 2 of gradient descent where the step-size is $\eta = 1.0$ and the momentum coefficient is 0.5. Assume the momentum is initialized to 0.0.

   Iteration   w     ∂E/∂w
   0           1.0   1.0
   1           2.0   0.5
   2           ___   0.25

e) (2 pts) Explain why we should use weight decay when training a neural network.

f) (3 pts) A graduate student is competing in the ImageNet Challenge with 1000 classes; however, he is puzzled as to why his network doesn't work. He has two tanh hidden units in the final layer before the 1000-way output, but does not think this is a problem, since he has many units and layers leading up to this point. Explain the error in his thinking.

g) (4 pts) In the Efficient Backprop paper, preprocessing of the input data is recommended. Illustrate this process by starting with an elongated, oval-shaped cloud of points tilted at about 45 degrees, and showing the effect of the mean cancellation step, the PCA step, and the variance scaling step (so you should end up with 4 pictures from start to finish).

h) (2 pts) What is wrong with using the logistic sigmoid in the hidden layers of a deep network? Give at least two reasons why it should be avoided.
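The following is a minimal numpy sketch (not part of the exam) of the preprocessing pipeline named in part (g): mean cancellation, a PCA rotation, and variance scaling. The tilted, elongated point cloud and all numbers are made up for illustration.

```python
# Preprocessing sketch: center the data, rotate onto the principal components,
# then scale each component to unit variance, giving a roughly circular cloud.
import numpy as np

rng = np.random.default_rng(0)

# Elongated cloud tilted at ~45 degrees: stretch along x, rotate, then shift.
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = raw @ R.T + np.array([5.0, 2.0])

# 1) Mean cancellation: the cloud is now centered at the origin.
Xc = X - X.mean(axis=0)

# 2) PCA step: rotate so the axes align with the directions of largest variance.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
Xp = Xc @ eigvecs

# 3) Variance scaling: divide each component by its standard deviation.
Xs = Xp / Xp.std(axis=0)

print(Xs.mean(axis=0))  # ~[0, 0]
print(Xs.std(axis=0))   # ~[1, 1]
```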

Problem 3: Multiple Choice (10 pts, 2 each)

a. Which of the following is the delta rule for the hidden units?
   i.   $\delta_i = (t_i - y_i)$
   ii.  $\delta_j = \sum_k w_{jk} \delta_k$
   iii. $\delta_j = y'(a_j) \sum_k w_{jk} \delta_k$

b. In a convolutional neural network, the image is of dimension $x = 100 \times 100$ and one of the learned filters is of dimension $10 \times 10$ with a stride of 5. The resulting feature map of this filter over the image will have dimension:
   i.   $21 \times 21$
   ii.  $19 \times 19$
   iii. $5 \times 5$
   iv.  $20 \times 20$
   v.   $100 \times 5 \times 5$

c. Assume we have an error function E and modify our cost function C by adding an L2-weight penalty, or specifically $C = E + \frac{\lambda}{2} \sum_j w_j^2$. The cost function is minimized with respect to $w_i$ when:
   i.   $w_i = -\frac{1}{\lambda} \frac{\partial E}{\partial w_i}$
   ii.  $w_i = +\frac{\partial E}{\partial w_i}$
   iii. $w_i = -\frac{\lambda}{2} \frac{\partial E}{\partial w_i}$
   iv.  $w_i = -\frac{\partial E}{\partial w_i^2}$
   v.   $w_i = 0$
   which describes how our weight magnitude should vary. HINT: recall that C is minimized when its derivative is 0.

d. The best objective function for classification is
   i.   Sum Squared Error
   ii.  Cross-Entropy
   iii. Rectified linear unit
   iv.  Logistic
   v.   Funny tanh

e. Suppose we have a 3-dimensional input $x = (x_1, x_2, x_3)$ connected to 4 neurons with the exact same weights $w = (w_1, w_2, w_3)$, where $x_1 = 2$, $w_1 = 1$, $x_2 = 1$, $w_2 = 0.5$, $x_3 = 1$, $w_3 = 0$, and the bias $b = 0.5$. We calculate the output of each of the four neurons using the input x, weights w and bias b. If $y_1 = 0.95$, $y_2 = 3$, $y_3 = 1$, $y_4 = 3$, then a valid guess for the neuron types of $y_1$, $y_2$, $y_3$ and $y_4$ is:
   i.   Rectified Linear, Logistic Sigmoid, Binary Threshold, Linear
   ii.  Linear, Binary Threshold, Logistic Sigmoid, Rectified Linear
   iii. Logistic Sigmoid, Linear, Binary Threshold, Rectified Linear
   iv.  Rectified Linear, Linear, Binary Threshold, Logistic Sigmoid
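Below is a minimal sketch (not part of the exam) of the quantities behind parts (b) and (e) of this problem: the usual feature-map-size arithmetic for a filter slid over an image, and the four activation types of part (e) applied to the same net input $a = w \cdot x + b$. The numbers follow the values stated in the questions.

```python
# Feature-map size for part (b): a 10x10 filter over a 100x100 image with
# stride 5 fits (image - filter) // stride + 1 positions along each dimension.
import numpy as np

fmap = (100 - 10) // 5 + 1
print(fmap)                           # positions per dimension

# Activations for part (e): all four neurons share the same net input.
x = np.array([2.0, 1.0, 1.0])
w = np.array([1.0, 0.5, 0.0])
b = 0.5
a = w @ x + b                         # net input

linear    = a                         # identity output
relu      = max(0.0, a)               # rectified linear
threshold = 1.0 if a > 0 else 0.0     # binary threshold
sigmoid   = 1.0 / (1.0 + np.exp(-a))  # logistic sigmoid

print(linear, relu, threshold, sigmoid)
```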

Problem 4: The delta rule (10 pts)
Derive the delta rule for the case of a single layer network with a linear output and the sum squared error loss function. To make this as simple as possible, assume we are doing this for one input-output pattern p (then we can simply add these up over all of the patterns). So, starting with:

$SSE^p = \frac{1}{2}(t^p - y^p)^2$   (1)

and

$y^p = \sum_{j=1}^{d} w_j x_j$   (2)

derive that:

$\frac{\partial SSE^p}{\partial w_i} = -(t^p - y^p)\, x_i$   (3)
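As a sanity check on equation (3) (not part of the exam), the sketch below compares the analytic gradient $-(t^p - y^p)x_i$ with a finite-difference estimate for a hypothetical pattern; the input, target, and weights are made up for the example.

```python
# Numerical check of the delta rule: for y = sum_j w_j x_j and
# SSE = 0.5 * (t - y)^2, the analytic gradient -(t - y) * x_i should match
# a central finite-difference estimate for every weight.
import numpy as np

x = np.array([0.5, -1.0, 2.0])   # hypothetical input pattern
t = 1.0                          # hypothetical target
w = np.array([0.2, 0.4, -0.1])   # hypothetical weights

def sse(w_vec):
    y = w_vec @ x
    return 0.5 * (t - y) ** 2

y = w @ x
analytic = -(t - y) * x          # equation (3), one component per weight

eps = 1e-6
numeric = np.zeros_like(w)
for i in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric[i] = (sse(w_plus) - sse(w_minus)) / (2 * eps)

print(analytic)   # analytic gradient from the derivation
print(numeric)    # finite-difference estimate; the two should agree closely
```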

Problem 5: Forward/Backward Propagation (15 pts)
Consider the simple neural network in Figure 1 with the corresponding initial weights and biases in Figure 2. Weights are indicated as numbers along connections and biases are indicated as numbers within a node. All units use the sigmoid activation function $g(a) = f(a) = \frac{1}{1+e^{-a}}$ and the cost function is the Cross-Entropy Loss.

On the following page, fill in the three panels.

1. (4 pts) In the first panel, record the $a_i$'s into each of the nodes.
2. (3 pts) In the second panel, record $z_i = g(a_i)$ for each of the nodes. You may use the table of approximate sigmoid activation values on the next page.
3. (5 pts) In the third panel, compute the $\delta$ for each node. Do this for training example $X = (1.0, 1.0)$ with target $t = 0.85$.

Update the weights. (3 pts) Given the $\delta$'s you computed, use gradient descent to calculate the new weight from hidden unit 1 (H1) to the Output (OUT) (currently 1.0). Use gradient descent with no momentum and learning rate $\eta = 1.0$.

[Figure 1 (network diagram) and Figure 2 (initial weights and biases) not reproduced in this transcription.]
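Since Figures 1 and 2 are not reproduced, the sketch below (not part of the exam, and not the actual exam network) uses a hypothetical 2-2-1 network with sigmoid units and cross-entropy loss to show the quantities the problem asks for: the net inputs $a_i$, the activations $z_i = g(a_i)$, the deltas, and the gradient-descent update of the H1-to-OUT weight. All weights and biases are made up; only the training example, target, and learning rate come from the problem statement.

```python
# Forward/backward pass sketch for a small sigmoid network with cross-entropy loss.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

X = np.array([1.0, 1.0])          # training example from the problem
t = 0.85                          # target from the problem
eta = 1.0                         # learning rate, no momentum

# Hypothetical parameters (the real ones are in Figure 2).
W_hid = np.array([[0.5, -0.5],    # weights into H1
                  [0.3,  0.8]])   # weights into H2
b_hid = np.array([0.1, -0.2])
w_out = np.array([1.0, 0.4])      # H1 -> OUT taken as 1.0, as in the problem
b_out = 0.2

# Forward pass: net inputs a_i and activations z_i = g(a_i).
a_hid = W_hid @ X + b_hid
z_hid = sigmoid(a_hid)
a_out = w_out @ z_hid + b_out
y = sigmoid(a_out)

# Backward pass. With a sigmoid output and cross-entropy loss, the output delta
# simplifies to (y - t); hidden deltas scale the backpropagated signal by g'(a).
delta_out = y - t
delta_hid = z_hid * (1.0 - z_hid) * (w_out * delta_out)

# Gradient-descent update for the weight from H1 to OUT.
w_H1_out_new = w_out[0] - eta * delta_out * z_hid[0]

print(a_hid, z_hid, a_out, y)
print(delta_out, delta_hid, w_H1_out_new)
```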
