Lecture 15: Neural Networks Theory
|
|
- Lenard Cobb
- 6 years ago
- Views:
Transcription
1 EECS 598-5: Theoretical Foundations of Machine Learning Fall 25 Lecture 5: Neural Networks Theory Lecturer: Matus Telgarsky Scribes: Xintong Wang, Editors: Yuan Zhuang 5. Neural Networks Definition and Overview 5.. Neural Networks Definition Definition 5. (Neural Network). A neural network is a function class defined by a DAG as follows: Input Hidd Input # Input #2 Output Input #3 Note: A neural network is not necessarily fully connected. Input nodes: g i (x) = x i. Internal nodes: g i (x) = σ( g parts of g i wgg(x) i + w). i Typical choices of activation function σ: ReLu: σ R (r) = max{, r}. ReLu 8 6 σ r Sigmoid: σ S (r) = +exp( r). 5-
2 Lecture 5: Neural Networks Theory 5-2 Sigmoid.8.6 σ r Output: Neural networks outputs value at output node and it can be multivariate. N := {fixed DAG with varying weights} = {x f(x; w) : w R p }, where p is the total number of weights and thresholds. Remark: Currt practical neural networks have some variations Theory Overview We have three aspects of theory (for classification specific setting): Approximation/Represtation How well the function class fits the problem? Known: Neural networks can fit arbitrary continuous functions. Unknown: Are good represtations learnable? What should be the size and the number of s of a good represtation? Optimization How well you optimize risk over the function class (on a finite sample)? Little theory in this aspect: training O() size neural network is NP-hard; in practice, we use gradit desct (on a nonconvex function). Estimation Differce in risk betwe sample and distribution. There is a nice VC theory, which is very difficult. It is still unclear what this VC theory means in practice. Remark: In terms of non-classification problems, many successes of neural networks are for unsupervised learning.
3 Lecture 5: Neural Networks Theory Represtation/Approximation 5.2. Neural Network with One Internal Node Suppose we have only one internal(non-input) node in a neural network. With activation function σ(r) = [r ], we can exactly represt the indicator function for a halfspace. Specifically, the following diagram shows the hyperplane {x R d : w, x = w } and a classification achieved by the corresponding halfspace indicator x σ( w, x w ). y w x = w x Now suppose σ is bounded and continuous, with lim r σ(r) = and lim r σ(r) =. Consequtly, there exists M such that r > M, σ(r) [ ɛ, + ɛ] and r < M, σ(r) [ ɛ, ɛ] Thus, giv any τ >, the function f(x) := σ( M τ ( w, x w )) approximates the earlier halfspace indicator in the following sse: { [ ɛ, +ɛ] wh w, x w τ, f(x) [ ɛ, + ɛ] wh w, x w + τ. On the other hand, f is not controlled in any way wh w, x w < τ. This approximate halfspace indicator f is depicted as follows. y 2τ w x w = τ w x
4 Lecture 5: Neural Networks Theory Neural Network with k Halfspaces Next note how a polyhedron P can be approximated by adding another to the preceding construction. Suppose the polyhedron is specified via k halfspaces, and each of these is approximated as before with some τ (, 4k ), giving a function g i. Input Halfspace...k Output Input Output Now consider adding a final node defined as g(x) := σ( M τ ( i g i(x) (k /2))). To see that this approximates an indicator on P, consider any x which does not fall within the error region of width 2τ of any of the preceding approximate halfspace indicators. If this x also fails to land within at least one of the halfspaces, th i g i(x) (k )( + τ) < k /2 τ, thus g(x) [ ɛ, +ɛ].on the other hand, if x is in the intersection of the halfspaces, th i g i(x) k( τ) > k /2 + τ, thus g(x) [ ɛ, + ɛ]. Homework Problem: based on what we have said so far, for any continuous f : [, ] d [, ] and any ɛ >, there exists a 3 (+input ) neural network g with f(x) g(x) dx ɛ. [,] d 5.3 Estimation/Statistics The main results in this section will be VC dimsion bounds which are polynomial in the number of s L. Before giving these bounds, note that it is not clear how to specify a meaningful Rademacher complexity bound which is polynomial in the number of s. For instance, consider a fixed DAG represting the layout of a neural network, and suppose the weights w R p obey a constraint w B (which is natural for instance in the case of linear separators). Th by placing weight B L along each edge of a chain in the network, the constraint w B is obeyed, and it can be se as the Lipschitz constant of the function computed by the network and thus its Rademacher complexity are upper bounded by ( B L )L VC dimsion of Linear Threshold Network(LTN) Notations S: a fixed set of n examples.
5 Lecture 5: Neural Networks Theory 5-5 k: number of non-input nodes in the neural network. d is the total number of input nodes. p: total number of parameters in the neural network. p i is the number of parameters involving node i. p = i p i. Moreover, suppose p n. L: total number of s in the neural network. D i := All output values possible for non-input nodes up to i, meaning D i = {(g d+ (S; w), g d+2 (S; w),..., g d+i (S; w)) : w R p }. This definition is the key idea of the proof: rather than keeping track of the possible outputs of the output node of the network, the outputs of all nodes are considered. It will now be shown by induction that D i i ( ) pj. Consider the base case i = ; by Sauer s Lemma, since n p p, ( ) p D. j= Now consider some i >. Ev though S is fixed, it is no longer the case that (non-input) node i is receiving a fixed set of inputs, since it is pottial taking input from non-input nodes. However, for any fixed outputs of the parts to node i, Sauer-Shelah can once again be used (once again using n p p i ). Combining this with the inductive hypothesis, ( ) pi D i p p j possible inputs to i ) pi D i ( p i i ( ) pj. j= p j To complete the VC dimsion calculation, first note that every distinct output for the network also incremts the value of D k, Π N (n) sup S =n D k, where N dotes this class of networks. Secondly, note that D k does not actually depd on S, the upper bound holds for all S with S n. As such we have, Π N (n) sup D k S =n k ( ) pi i= p i p i k () pi = () p. To finish the calculation, it is possible to use a guess-and-check. In order to show that the VC dimsion is at most d, it suffices to show that Π N (d) < 2 d. As such, suppose n cp ln(p) for some c. By the above calculation, ln(π N (n)) p ln(ecp ln(p)) = p ln(p) + p( + ln(c) + ln(ln(p))). Since this is strictly less than cp ln(p) ln(2) for sufficitly large c, it follows that the VC dimsion is O(p ln(p)). i=
6 Lecture 5: Neural Networks Theory Further VC dimsion results (without proof) The following table summarizes worst case VC dimsion bounds giv various conditions. There are two key points here: first, the linear threshold network really did not gain much power (whereas other networks do), and secondly the choice of σ is esstial. σ r [r ] piecewise polynomial convex for x <, concave for x >, and satisfying limit properties lim r σ(r) = and lim r σ(r) = Worst case VC dimsion With one non-input node (k = ), this is perceptron, thus VC is Θ(p). In geral, the VC dimsion is Õ(p ln(p)); s did not matter much in this example! Ω(pL) and Õ(pL2 ); this also holds in the piecewise affine case (in particular for the popular choice σ(r) = max{, r}). It s possible for VC dimsion to be infinite! (Note however that other members of this class, for instance the sigmoid σ(r) = /(+ exp( r)), have VC dimsion polynomial in p and L.)
Lecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;
CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More information2 Upper-bound of Generalization Error of AdaBoost
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #10 Scribe: Haipeng Zheng March 5, 2008 1 Review of AdaBoost Algorithm Here is the AdaBoost Algorithm: input: (x 1,y 1 ),...,(x m,y
More informationStatistical and Computational Learning Theory
Statistical and Computational Learning Theory Fundamental Question: Predict Error Rates Given: Find: The space H of hypotheses The number and distribution of the training examples S The complexity of the
More informationVC dimension and Model Selection
VC dimension and Model Selection Overview PAC model: review VC dimension: Definition Examples Sample: Lower bound Upper bound!!! Model Selection Introduction to Machine Learning 2 PAC model: Setting A
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 6: Training versus Testing (LFD 2.1) Cho-Jui Hsieh UC Davis Jan 29, 2018 Preamble to the theory Training versus testing Out-of-sample error (generalization error): What
More informationStatistical learning theory, Support vector machines, and Bioinformatics
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More informationLecture 25: Subgradient Method and Bundle Methods April 24
IE 51: Convex Optimization Spring 017, UIUC Lecture 5: Subgradient Method and Bundle Methods April 4 Instructor: Niao He Scribe: Shuanglong Wang Courtesy warning: hese notes do not necessarily cover everything
More informationOnline Learning, Mistake Bounds, Perceptron Algorithm
Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which
More informationIntroduction to Machine Learning
Introduction to Machine Learning PAC Learning and VC Dimension Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE
More informationComputational Learning Theory. CS534 - Machine Learning
Computational Learning Theory CS534 Machine Learning Introduction Computational learning theory Provides a theoretical analysis of learning Shows when a learning algorithm can be expected to succeed Shows
More informationUnderstanding Generalization Error: Bounds and Decompositions
CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the
More informationThe Decision List Machine
The Decision List Machine Marina Sokolova SITE, University of Ottawa Ottawa, Ont. Canada,K1N-6N5 sokolova@site.uottawa.ca Nathalie Japkowicz SITE, University of Ottawa Ottawa, Ont. Canada,K1N-6N5 nat@site.uottawa.ca
More informationCh.6 Deep Feedforward Networks (2/3)
Ch.6 Deep Feedforward Networks (2/3) 16. 10. 17. (Mon.) System Software Lab., Dept. of Mechanical & Information Eng. Woonggy Kim 1 Contents 6.3. Hidden Units 6.3.1. Rectified Linear Units and Their Generalizations
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU10701 11. Learning Theory Barnabás Póczos Learning Theory We have explored many ways of learning from data But How good is our classifier, really? How much data do we
More informationVC Dimension and Sauer s Lemma
CMSC 35900 (Spring 2008) Learning Theory Lecture: VC Diension and Sauer s Lea Instructors: Sha Kakade and Abuj Tewari Radeacher Averages and Growth Function Theore Let F be a class of ±-valued functions
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationLecture 16: Perceptron and Exponential Weights Algorithm
EECS 598-005: Theoretical Foundations of Machine Learning Fall 2015 Lecture 16: Perceptron and Exponential Weights Algorithm Lecturer: Jacob Abernethy Scribes: Yue Wang, Editors: Weiqing Yu and Andrew
More informationIntroduction to Machine Learning
Introduction to Machine Learning Vapnik Chervonenkis Theory Barnabás Póczos Empirical Risk and True Risk 2 Empirical Risk Shorthand: True risk of f (deterministic): Bayes risk: Let us use the empirical
More informationHW1 solutions. 1. α Ef(x) β, where Ef(x) is the expected value of f(x), i.e., Ef(x) = n. i=1 p if(a i ). (The function f : R R is given.
HW1 solutions Exercise 1 (Some sets of probability distributions.) Let x be a real-valued random variable with Prob(x = a i ) = p i, i = 1,..., n, where a 1 < a 2 < < a n. Of course p R n lies in the standard
More informationTHE VAPNIK- CHERVONENKIS DIMENSION and LEARNABILITY
THE VAPNIK- CHERVONENKIS DIMENSION and LEARNABILITY Dan A. Simovici UMB, Doctoral Summer School Iasi, Romania What is Machine Learning? The Vapnik-Chervonenkis Dimension Probabilistic Learning Potential
More informationSome Statistical Properties of Deep Networks
Some Statistical Properties of Deep Networks Peter Bartlett UC Berkeley August 2, 2018 1 / 22 Deep Networks Deep compositions of nonlinear functions h = h m h m 1 h 1 2 / 22 Deep Networks Deep compositions
More informationGeneralization Bounds in Machine Learning. Presented by: Afshin Rostamizadeh
Generalization Bounds in Machine Learning Presented by: Afshin Rostamizadeh Outline Introduction to generalization bounds. Examples: VC-bounds Covering Number bounds Rademacher bounds Stability bounds
More informationNN V: The generalized delta learning rule
NN V: The generalized delta learning rule We now focus on generalizing the delta learning rule for feedforward layered neural networks. The architecture of the two-layer network considered below is shown
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More information1 Learning Linear Separators
10-601 Machine Learning Maria-Florina Balcan Spring 2015 Plan: Perceptron algorithm for learning linear separators. 1 Learning Linear Separators Here we can think of examples as being from {0, 1} n or
More informationLearning Theory. Sridhar Mahadevan. University of Massachusetts. p. 1/38
Learning Theory Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts p. 1/38 Topics Probability theory meet machine learning Concentration inequalities: Chebyshev, Chernoff, Hoeffding, and
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationIntroduction to Machine Learning (67577) Lecture 3
Introduction to Machine Learning (67577) Lecture 3 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem General Learning Model and Bias-Complexity tradeoff Shai Shalev-Shwartz
More informationMachine Learning Support Vector Machines. Prof. Matteo Matteucci
Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way
More informationStatistical Learning Learning From Examples
Statistical Learning Learning From Examples We want to estimate the working temperature range of an iphone. We could study the physics and chemistry that affect the performance of the phone too hard We
More informationGeneralization bounds
Advanced Course in Machine Learning pring 200 Generalization bounds Handouts are jointly prepared by hie Mannor and hai halev-hwartz he problem of characterizing learnability is the most basic question
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More information1 Learning Linear Separators
8803 Machine Learning Theory Maria-Florina Balcan Lecture 3: August 30, 2011 Plan: Perceptron algorithm for learning linear separators. 1 Learning Linear Separators Here we can think of examples as being
More informationAustin Mohr Math 704 Homework 6
Austin Mohr Math 704 Homework 6 Problem 1 Integrability of f on R does not necessarily imply the convergence of f(x) to 0 as x. a. There exists a positive continuous function f on R so that f is integrable
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationSupport Vector Machines and Kernel Methods
Support Vector Machines and Kernel Methods Geoff Gordon ggordon@cs.cmu.edu July 10, 2003 Overview Why do people care about SVMs? Classification problems SVMs often produce good results over a wide range
More informationIFT Lecture 7 Elements of statistical learning theory
IFT 6085 - Lecture 7 Elements of statistical learning theory This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s): Brady Neal and
More informationLecture 8. Instructor: Haipeng Luo
Lecture 8 Instructor: Haipeng Luo Boosting and AdaBoost In this lecture we discuss the connection between boosting and online learning. Boosting is not only one of the most fundamental theories in machine
More informationCSC321 Lecture 4: Learning a Classifier
CSC321 Lecture 4: Learning a Classifier Roger Grosse Roger Grosse CSC321 Lecture 4: Learning a Classifier 1 / 28 Overview Last time: binary classification, perceptron algorithm Limitations of the perceptron
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationComputational and Statistical Learning Theory
Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher
More informationVapnik-Chervonenkis Dimension of Neural Nets
P. L. Bartlett and W. Maass: Vapnik-Chervonenkis Dimension of Neural Nets 1 Vapnik-Chervonenkis Dimension of Neural Nets Peter L. Bartlett BIOwulf Technologies and University of California at Berkeley
More informationComputational Learning Theory: Shattering and VC Dimensions. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Shattering and VC Dimensions Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory of Generalization
More informationFast Rates for Estimation Error and Oracle Inequalities for Model Selection
Fast Rates for Estimation Error and Oracle Inequalities for Model Selection Peter L. Bartlett Computer Science Division and Department of Statistics University of California, Berkeley bartlett@cs.berkeley.edu
More informationComputational Learning Theory
1 Computational Learning Theory 2 Computational learning theory Introduction Is it possible to identify classes of learning problems that are inherently easy or difficult? Can we characterize the number
More informationComputational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationGradient Descent Training Rule: The Details
Gradient Descent Training Rule: The Details 1 For Perceptrons The whole idea behind gradient descent is to gradually, but consistently, decrease the output error by adjusting the weights. The trick is
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More information5.1 Learning using Polynomial Threshold Functions
CS 395T Computational Learning Theory Lecture 5: September 17, 2007 Lecturer: Adam Klivans Scribe: Aparajit Raghavan 5.1 Learning using Polynomial Threshold Functions 5.1.1 Recap Definition 1 A function
More informationClass 2 & 3 Overfitting & Regularization
Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating
More informationSupport Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM
1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University
More informationConvex optimization COMS 4771
Convex optimization COMS 4771 1. Recap: learning via optimization Soft-margin SVMs Soft-margin SVM optimization problem defined by training data: w R d λ 2 w 2 2 + 1 n n [ ] 1 y ix T i w. + 1 / 15 Soft-margin
More informationComputational Learning Theory - Hilary Term : Learning Real-valued Functions
Computational Learning Theory - Hilary Term 08 8 : Learning Real-valued Functions Lecturer: Varun Kanade So far our focus has been on learning boolean functions. Boolean functions are suitable for modelling
More informationGeneralization theory
Generalization theory Daniel Hsu Columbia TRIPODS Bootcamp 1 Motivation 2 Support vector machines X = R d, Y = { 1, +1}. Return solution ŵ R d to following optimization problem: λ min w R d 2 w 2 2 + 1
More informationComputational Learning Theory
CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful
More informationCOMS 4771 Introduction to Machine Learning. Nakul Verma
COMS 4771 Introduction to Machine Learning Nakul Verma Announcements HW2 due now! Project proposal due on tomorrow Midterm next lecture! HW3 posted Last time Linear Regression Parametric vs Nonparametric
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationComputational Learning Theory
Computational Learning Theory Slides by and Nathalie Japkowicz (Reading: R&N AIMA 3 rd ed., Chapter 18.5) Computational Learning Theory Inductive learning: given the training set, a learning algorithm
More informationThe Vapnik-Chervonenkis Dimension
The Vapnik-Chervonenkis Dimension Prof. Dan A. Simovici UMB 1 / 91 Outline 1 Growth Functions 2 Basic Definitions for Vapnik-Chervonenkis Dimension 3 The Sauer-Shelah Theorem 4 The Link between VCD and
More informationCSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!
CSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!! November 18, 2015 THE EXAM IS CLOSED BOOK. Once the exam has started, SORRY, NO TALKING!!! No, you can t even say see ya
More informationMinimax risk bounds for linear threshold functions
CS281B/Stat241B (Spring 2008) Statistical Learning Theory Lecture: 3 Minimax risk bounds for linear threshold functions Lecturer: Peter Bartlett Scribe: Hao Zhang 1 Review We assume that there is a probability
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationCSC321 Lecture 4: Learning a Classifier
CSC321 Lecture 4: Learning a Classifier Roger Grosse Roger Grosse CSC321 Lecture 4: Learning a Classifier 1 / 31 Overview Last time: binary classification, perceptron algorithm Limitations of the perceptron
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f
More informationLecture 8. Strong Duality Results. September 22, 2008
Strong Duality Results September 22, 2008 Outline Lecture 8 Slater Condition and its Variations Convex Objective with Linear Inequality Constraints Quadratic Objective over Quadratic Constraints Representation
More informationMATH4406 Assignment 5
MATH4406 Assignment 5 Patrick Laub (ID: 42051392) October 7, 2014 1 The machine replacement model 1.1 Real-world motivation Consider the machine to be the entire world. Over time the creator has running
More information1 Strict local optimality in unconstrained optimization
ORF 53 Lecture 14 Spring 016, Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Thursday, April 14, 016 When in doubt on the accuracy of these notes, please cross check with the instructor s
More informationLecture 1: Convex Sets January 23
IE 521: Convex Optimization Instructor: Niao He Lecture 1: Convex Sets January 23 Spring 2017, UIUC Scribe: Niao He Courtesy warning: These notes do not necessarily cover everything discussed in the class.
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1
More informationLecture 2: Convex Sets and Functions
Lecture 2: Convex Sets and Functions Hyang-Won Lee Dept. of Internet & Multimedia Eng. Konkuk University Lecture 2 Network Optimization, Fall 2015 1 / 22 Optimization Problems Optimization problems are
More informationNearly-tight VC-dimension bounds for piecewise linear neural networks
Proceedings of Machine Learning Research vol 65:1 5, 2017 Nearly-tight VC-dimension bounds for piecewise linear neural networks Nick Harvey Christopher Liaw Abbas Mehrabian Department of Computer Science,
More information18.6 Regression and Classification with Linear Models
18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight
More informationChapter 2 Convex Analysis
Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,
More informationHomework 4, 5, 6 Solutions. > 0, and so a n 0 = n + 1 n = ( n+1 n)( n+1+ n) 1 if n is odd 1/n if n is even diverges.
2..2(a) lim a n = 0. Homework 4, 5, 6 Solutions Proof. Let ɛ > 0. Then for n n = 2+ 2ɛ we have 2n 3 4+ ɛ 3 > ɛ > 0, so 0 < 2n 3 < ɛ, and thus a n 0 = 2n 3 < ɛ. 2..2(g) lim ( n + n) = 0. Proof. Let ɛ >
More informationBaum s Algorithm Learns Intersections of Halfspaces with respect to Log-Concave Distributions
Baum s Algorithm Learns Intersections of Halfspaces with respect to Log-Concave Distributions Adam R Klivans UT-Austin klivans@csutexasedu Philip M Long Google plong@googlecom April 10, 2009 Alex K Tang
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationFORMULATION OF THE LEARNING PROBLEM
FORMULTION OF THE LERNING PROBLEM MIM RGINSKY Now that we have seen an informal statement of the learning problem, as well as acquired some technical tools in the form of concentration inequalities, we
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationVapnik-Chervonenkis Dimension of Neural Nets
Vapnik-Chervonenkis Dimension of Neural Nets Peter L. Bartlett BIOwulf Technologies and University of California at Berkeley Department of Statistics 367 Evans Hall, CA 94720-3860, USA bartlett@stat.berkeley.edu
More informationLecture 13: Introduction to Neural Networks
Lecture 13: Introduction to Neural Networks Instructor: Aditya Bhaskara Scribe: Dietrich Geisler CS 5966/6966: Theory of Machine Learning March 8 th, 2017 Abstract This is a short, two-line summary of
More informationClassification: The PAC Learning Framework
Classification: The PAC Learning Framework Machine Learning: Jordan Boyd-Graber University of Colorado Boulder LECTURE 5 Slides adapted from Eli Upfal Machine Learning: Jordan Boyd-Graber Boulder Classification:
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationAnalysis III. Exam 1
Analysis III Math 414 Spring 27 Professor Ben Richert Exam 1 Solutions Problem 1 Let X be the set of all continuous real valued functions on [, 1], and let ρ : X X R be the function ρ(f, g) = sup f g (1)
More information1 Boosting. COS 511: Foundations of Machine Learning. Rob Schapire Lecture # 10 Scribe: John H. White, IV 3/6/2003
COS 511: Foundations of Machine Learning Rob Schapire Lecture # 10 Scribe: John H. White, IV 3/6/003 1 Boosting Theorem 1 With probablity 1, f co (H), and θ >0 then Pr D yf (x) 0] Pr S yf (x) θ]+o 1 (m)ln(
More informationConvex Analysis and Optimization Chapter 2 Solutions
Convex Analysis and Optimization Chapter 2 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationarxiv:math/ v1 [math.co] 3 Sep 2000
arxiv:math/0009026v1 [math.co] 3 Sep 2000 Max Min Representation of Piecewise Linear Functions Sergei Ovchinnikov Mathematics Department San Francisco State University San Francisco, CA 94132 sergei@sfsu.edu
More informationOn the tradeoff between computational complexity and sample complexity in learning
On the tradeoff between computational complexity and sample complexity in learning Shai Shalev-Shwartz School of Computer Science and Engineering The Hebrew University of Jerusalem Joint work with Sham
More informationA Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation
1 Introduction A Logarithmic Neural Network Architecture for Unbounded Non-Linear Function Approximation J Wesley Hines Nuclear Engineering Department The University of Tennessee Knoxville, Tennessee,
More informationComputational Learning Theory for Artificial Neural Networks
Computational Learning Theory for Artificial Neural Networks Martin Anthony and Norman Biggs Department of Statistical and Mathematical Sciences, London School of Economics and Political Science, Houghton
More informationLecture 2: Uniform Entropy
STAT 583: Advanced Theory of Statistical Inference Spring 218 Lecture 2: Uniform Entropy Lecturer: Fang Han April 16 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 12: Weak Learnability and the l 1 margin Converse to Scale-Sensitive Learning Stability Convex-Lipschitz-Bounded Problems
More informationCS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell
CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More information12. Structural Risk Minimization. ECE 830 & CS 761, Spring 2016
12. Structural Risk Minimization ECE 830 & CS 761, Spring 2016 1 / 23 General setup for statistical learning theory We observe training examples {x i, y i } n i=1 x i = features X y i = labels / responses
More informationQF101: Quantitative Finance August 22, Week 1: Functions. Facilitator: Christopher Ting AY 2017/2018
QF101: Quantitative Finance August 22, 2017 Week 1: Functions Facilitator: Christopher Ting AY 2017/2018 The chief function of the body is to carry the brain around. Thomas A. Edison 1.1 What is a function?
More information