Learning From Data Lecture 7 Approximation Versus Generalization
1 Learning From Data, Lecture 7: Approximation Versus Generalization
  The VC Dimension
  Approximation Versus Generalization
  Bias and Variance
  The Learning Curve
  M. Magdon-Ismail, CSCI 4100/6100
2 Recap: The Vapnik-Chervonenkis Bound (VC Bound)
  For any \epsilon > 0:
    P[ |E_in(g) - E_out(g)| > \epsilon ] \le 4 m_H(2N) e^{-\epsilon^2 N / 8}
    P[ |E_in(g) - E_out(g)| > \epsilon ] \le 2|H| e^{-2\epsilon^2 N}            (finite H)
  Equivalently:
    P[ |E_in(g) - E_out(g)| \le \epsilon ] \ge 1 - 4 m_H(2N) e^{-\epsilon^2 N / 8}
    P[ |E_in(g) - E_out(g)| \le \epsilon ] \ge 1 - 2|H| e^{-2\epsilon^2 N}      (finite H)
  With probability at least 1 - \delta:
    E_out(g) \le E_in(g) + \sqrt{ (8/N) \log( 4 m_H(2N) / \delta ) }
    E_out(g) \le E_in(g) + \sqrt{ (1/(2N)) \log( 2|H| / \delta ) }              (finite H)
  If k is a break point:
    m_H(N) \le \sum_{i=0}^{k-1} \binom{N}{i} \le N^{k-1} + 1.
  © AML Creator: Malik Magdon-Ismail. Approximation Versus Generalization: 2/22
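The two-sided bound on m_H(N) above can be checked numerically. A minimal sketch (the function name `growth_bound` is ours, not from the slides): compute the Sauer sum for a break point k and confirm it never exceeds the polynomial over-bound N^{k-1} + 1.

```python
from math import comb

def growth_bound(N, k):
    """Sauer bound on m_H(N) when k is a break point: sum_{i=0}^{k-1} C(N, i)."""
    return sum(comb(N, i) for i in range(k))

# Verify the polynomial over-bound N^(k-1) + 1 for a range of N and k.
for k in range(2, 6):
    for N in range(1, 50):
        assert growth_bound(N, k) <= N ** (k - 1) + 1
print("Sauer bound <= N^(k-1) + 1 verified for k in 2..5, N in 1..49")
```

Because the Sauer sum is polynomial in N while 2^N is exponential, any break point at all forces m_H(N) far below 2^N for large N, which is what makes the VC bound nontrivial.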
3 The VC Dimension
  m_H(N) \le N^{k-1}: the tightest bound is obtained with the smallest break point k.
  Definition [VC dimension]: d_vc = k* - 1, where k* is the smallest break point.
  Equivalently, the VC dimension is the largest N which can be shattered
  (m_H(N) = 2^N).
  N \le d_vc: H could shatter our data (H can shatter some N points).
  N > d_vc: N is a break point for H; H cannot possibly shatter our data.
  So m_H(N) \le N^{d_vc} + 1, and for finite d_vc:
    E_out(g) \le E_in(g) + O( \sqrt{ d_vc \log N / N } ).
4 The VC Dimension is an Effective Number of Parameters
    Model                   # Parameters   d_vc
    2-D perceptron          3              3
    1-D positive ray        1              1
    2-D positive rectangles 4              4
    2-D positive convex sets  --           infinite
  There are models with few parameters but infinite d_vc.
  There are models with redundant parameters but small d_vc.
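The positive-ray row of the table can be checked by brute force. A sketch (the helper `ray_dichotomies` is ours): a positive ray classifies h(x) = +1 iff x > a, so on N distinct points only the thresholds between consecutive points (and beyond the ends) matter, giving exactly N + 1 dichotomies and hence d_vc = 1.

```python
def ray_dichotomies(xs):
    """All dichotomies that positive rays h(x) = sign(x - a) generate on points xs."""
    xs = sorted(xs)
    # One threshold below all points, one between each consecutive pair, one above all.
    cuts = [xs[0] - 1] + [(p + q) / 2 for p, q in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(+1 if x > a else -1 for x in xs) for a in cuts}

pts = [0.3, 1.1, 2.5, 4.0]
print(len(ray_dichotomies(pts)))   # m_H(4) = 5 = N + 1, far short of 2^4 = 16
```

One point is shattered (both labelings achievable, so d_vc >= 1), but already for N = 2 the pattern (+1, -1) on sorted points is impossible, so 2 is a break point and d_vc = 1, matching the single parameter a.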
5 VC Dimension of the Perceptron in R^d is d+1
  This can be shown in two steps:
  1. d_vc \ge d+1. What needs to be shown?
     (a) There is a set of d+1 points that can be shattered.
     (b) There is a set of d+1 points that cannot be shattered.
     (c) Every set of d+1 points can be shattered.
     (d) Every set of d+1 points cannot be shattered.
  2. d_vc \le d+1. What needs to be shown?
     (a) There is a set of d+1 points that can be shattered.
     (b) There is a set of d+2 points that cannot be shattered.
     (c) Every set of d+2 points can be shattered.
     (d) Every set of d+1 points cannot be shattered.
     (e) Every set of d+2 points cannot be shattered.
6 Step 1 answer: to show d_vc \ge d+1 it suffices to show (a), that there is
  some set of d+1 points that can be shattered.
7 Step 2 answer: to show d_vc \le d+1 we must show (e), that every set of d+2
  points cannot be shattered.
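For d = 2 this can be illustrated numerically. A sketch, not a proof: we use a heuristic separability test (run the perceptron with an update cap; if a pass completes with no mistakes the labeling is separable, otherwise we assume it is not). Three points in general position are shattered, while the four corners of a square fail on the XOR labeling; the slide's step 2 claims the stronger statement that every set of d+2 points fails.

```python
import itertools

def separable(points, labels, max_updates=20000):
    """Heuristic linear-separability test via the perceptron with an update cap."""
    w = [0.0, 0.0, 0.0]                      # [bias, w1, w2]
    updates = 0
    while updates < max_updates:
        mistake = False
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] + w[1] * x1 + w[2] * x2) <= 0:
                w[0] += y; w[1] += y * x1; w[2] += y * x2
                updates += 1
                mistake = True
        if not mistake:
            return True                      # a clean pass: separating w found
    return False                             # assumed non-separable after the cap

def shattered(points):
    """True if every one of the 2^N labelings of points is linearly separable."""
    return all(separable(points, labels)
               for labels in itertools.product([+1, -1], repeat=len(points)))

tri = [(0, 0), (1, 0), (0, 1)]               # 3 = d+1 points in general position
sq  = [(0, 0), (1, 0), (0, 1), (1, 1)]       # 4 = d+2 points: XOR labeling fails
print(shattered(tri), shattered(sq))         # True False
```

The cap makes the test conservative rather than exact, but for these small integer point sets the perceptron converges quickly on every separable labeling.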
8 A Single Parameter Characterizes Complexity
  Finite H:
    E_out(g) \le E_in(g) + \sqrt{ (1/(2N)) \log( 2|H| / \delta ) }
  Infinite H with finite d_vc:
    E_out(g) \le E_in(g) + \sqrt{ (8/N) \log( 4((2N)^{d_vc} + 1) / \delta ) }
  where the square-root term is the penalty for model complexity, \Omega(d_vc).
  (Plot: error versus VC dimension d_vc. The in-sample error decreases as d_vc
  grows, the model complexity penalty increases, and the out-of-sample error,
  their sum, has a minimum at an intermediate d_vc.)
9 Sample Complexity: How Many Data Points Do You Need?
  Set the error bar at \epsilon and solve for N:
    \epsilon = \sqrt{ (8/N) \ln( 4((2N)^{d_vc} + 1) / \delta ) }
    N = (8/\epsilon^2) \ln( 4((2N)^{d_vc} + 1) / \delta ) = O(d_vc \ln N).
  Example: d_vc = 3; error bar \epsilon = 0.1; confidence 90% (\delta = 0.1).
  A simple iterative method works well. Trying N = 1000 we get
    N <- (8/0.1^2) \ln( 4 (2000)^3 / 0.1 ) \approx 21,193.
  We continue iteratively, and converge to N \approx 30,000. If d_vc = 4,
  N \approx 40,000; for d_vc = 5, N \approx 50,000. (N grows in proportion to
  d_vc, but these are gross overestimates.)
  Practical rule of thumb: N = 10 d_vc.
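The iterative method above is a fixed-point iteration and takes a few lines to implement. A sketch (function names are ours): iterate N <- (8/\epsilon^2) ln(4((2N)^{d_vc}+1)/\delta) until N stops changing.

```python
from math import log, sqrt

def vc_error_bar(N, dvc, delta):
    """epsilon = sqrt( (8/N) * ln( 4((2N)^dvc + 1) / delta ) )."""
    return sqrt(8 / N * log(4 * ((2 * N) ** dvc + 1) / delta))

def sample_complexity(dvc, eps, delta, N=1000.0):
    """Fixed-point iteration for N such that the VC error bar equals eps."""
    for _ in range(100):
        N_new = 8 / eps ** 2 * log(4 * ((2 * N) ** dvc + 1) / delta)
        if abs(N_new - N) < 1:
            break
        N = N_new
    return N_new

N = sample_complexity(dvc=3, eps=0.1, delta=0.1)
print(round(N))        # converges near 30,000, as on the slide
```

The iteration converges quickly because the right-hand side grows only logarithmically in N; at the fixed point, plugging N back into `vc_error_bar` recovers \epsilon = 0.1.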
10 Theory Versus Practice
   The VC analysis allows us to reach outside the data for general H:
   - a single parameter (d_vc) characterizes the complexity of H;
   - d_vc depends only on H;
   - E_in can reach outside D to E_out when d_vc is finite.
   In practice, the VC bound is loose:
   - Hoeffding is worst case, and m_H(N) is a worst-case count of dichotomies,
     not the average or likely case;
   - the polynomial bound on m_H(N) is loose.
   It is nevertheless a good guide: models with small d_vc are good, and roughly
   10 d_vc examples are needed to get good generalization.
11 The Test Set
   Another way to estimate E_out(g) is to use a test set, obtaining E_test(g).
   E_test is better behaved than E_in: you don't pay the price for fitting, and
   you can use |H| = 1 in the Hoeffding bound with E_test.
   Both the test and training estimates have variance, but the training estimate
   has an optimistic bias due to selection (fitting the data); a test set has no
   bias.
   The price for a test set is fewer training examples. (Why is this bad?)
   E_test \approx E_out, but now E_out itself may be bad, because g was trained
   on less data.
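The |H| = 1 advantage is easy to quantify. A sketch (the function `hoeffding_bar` is ours) comparing the finite-H Hoeffding error bar sqrt(ln(2M/\delta)/(2N)) for a test set (M = 1) against the same N points used to select among, say, M = 1000 hypotheses:

```python
from math import log, sqrt

def hoeffding_bar(N, M, delta):
    """Hoeffding error bar for M hypotheses and N points: sqrt( ln(2M/delta) / (2N) )."""
    return sqrt(log(2 * M / delta) / (2 * N))

# A 1000-point test set with |H| = 1 gives a tight bar:
print(round(hoeffding_bar(1000, 1, 0.05), 3))      # ~ 0.043
# Selecting among 1000 hypotheses on the same 1000 points is looser:
print(round(hoeffding_bar(1000, 1000, 0.05), 3))   # ~ 0.073
```

The bar degrades only logarithmically in M, but for a test set there is no M at all, which is why E_test tracks E_out so reliably, provided you can spare the examples.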
12 VC Bound Quantifies Approximation Versus Generalization
   The best H is H = {f}. (You are better off buying a lottery ticket.)
   Larger d_vc: better chance of approximating f (E_in \approx 0).
   Smaller d_vc: better chance of generalizing to out of sample
   (E_in \approx E_out).
   E_out \le E_in + \Omega(d_vc).
   VC analysis depends only on H: it is independent of f, P(x), and the
   learning algorithm.
13 Bias-Variance Analysis
   Another way to quantify the tradeoff:
   1. How well can the learning approximate f?
      ...as opposed to how well the learning did approximate f in sample (E_in).
   2. How close can you get to that approximation with a finite data set?
      ...as opposed to how close E_in is to E_out.
   Bias-variance analysis applies to squared error (classification and
   regression), and it can take the learning algorithm into account: different
   learning algorithms can have different E_out when applied to the same H!
14 A Simple Learning Problem
   Target: f(x) = sin(\pi x) on [-1, 1]; data sets of 2 points.
   Two hypothesis sets:
     H_0: h(x) = b
     H_1: h(x) = ax + b
15 Let's Repeat the Experiment Many Times
   For each data set D, you get a different final hypothesis g^D. So, for a
   fixed x, g^D(x) is a random value, depending on D.
16 What's Happening on Average
   (Plots: the fits g^D(x) for many data sets, with the average \bar g(x) and
   sin(\pi x) overlaid, for each hypothesis set.)
   We can define:
     \bar g(x) = E_D[ g^D(x) ] \approx (1/K)( g^{D_1}(x) + ... + g^{D_K}(x) )
       -- our average prediction on x (g^D(x) is random, depending on D);
     var(x) = E_D[ (g^D(x) - \bar g(x))^2 ] = E_D[ g^D(x)^2 ] - \bar g(x)^2
       -- how variable is our prediction?
17 E_out on a Test Point x for Data D
   (Plots: f(x), g^D(x), and the squared deviation E_out^D(x) at the test point.)
   E_out^D(x) = ( g^D(x) - f(x) )^2
     -- squared error, a random value depending on D;
   E_out(x) = E_D[ E_out^D(x) ]
     -- the expected E_out(x) before seeing D.
18 The Bias-Variance Decomposition
   E_out(x) = E_D[ (g^D(x) - f(x))^2 ]
            = E_D[ g^D(x)^2 - 2 g^D(x) f(x) + f(x)^2 ]
            = E_D[ g^D(x)^2 ] - 2 \bar g(x) f(x) + f(x)^2
              (understand this step; the rest is just algebra)
            = E_D[ g^D(x)^2 ] - \bar g(x)^2 + \bar g(x)^2 - 2 \bar g(x) f(x) + f(x)^2
            = ( E_D[ g^D(x)^2 ] - \bar g(x)^2 ) + ( \bar g(x) - f(x) )^2
            =              var(x)              +        bias(x)
   So E_out(x) = bias(x) + var(x).
   A very small model has large bias (\bar g is far from f) but small var; a
   very large model can bring \bar g close to f (small bias) but has large var.
   If you take the average over x: E_out = bias + var.
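The decomposition can be estimated by simulation. A sketch for the sine example above, assuming (as in the textbook version of this example) f(x) = sin(\pi x) with x uniform on [-1, 1] and N = 2; the numbers it produces should land close to those reported on the next slide.

```python
import math
import random

random.seed(0)

def f(x):                                   # target: f(x) = sin(pi x)
    return math.sin(math.pi * x)

GRID = [i / 50 for i in range(-50, 51)]     # test points on [-1, 1]
K = 5000                                    # number of data sets

# For each hypothesis set, accumulate sum and sum-of-squares of g_D(x) per x.
acc = {"H0": ([0.0] * len(GRID), [0.0] * len(GRID)),
       "H1": ([0.0] * len(GRID), [0.0] * len(GRID))}

for _ in range(K):
    x1 = random.uniform(-1, 1)
    x2 = random.uniform(-1, 1)
    while abs(x2 - x1) < 1e-9:              # avoid a degenerate line fit
        x2 = random.uniform(-1, 1)
    y1, y2 = f(x1), f(x2)
    b0 = (y1 + y2) / 2                      # H0: best constant for the 2 points
    a1 = (y2 - y1) / (x2 - x1)              # H1: the line through both points
    b1 = y1 - a1 * x1
    for i, x in enumerate(GRID):
        for name, g in (("H0", b0), ("H1", a1 * x + b1)):
            s, ss = acc[name]
            s[i] += g
            ss[i] += g * g

results = {}                                # name -> (bias, var), averaged over x
for name, (s, ss) in acc.items():
    bias = var = 0.0
    for i, x in enumerate(GRID):
        gbar = s[i] / K                     # Monte Carlo estimate of gbar(x)
        bias += (gbar - f(x)) ** 2
        var += ss[i] / K - gbar ** 2        # E[g^2] - gbar^2
    results[name] = (bias / len(GRID), var / len(GRID))

for name in ("H0", "H1"):
    b, v = results[name]
    print(f"{name}: bias = {b:.2f}, var = {v:.2f}, E_out = {b + v:.2f}")
```

For H_0 the values bias = 0.5 and var = 0.25 can also be derived in closed form, which makes a good sanity check on the simulation.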
19 Back to H_0 and H_1; and, Our Winner Is...
   H_0: bias = 0.50, var = 0.25, E_out = 0.75.
   H_1: bias = 0.21, var = 1.69, E_out = 1.90.
   The winner is the simpler model, H_0.
20 Match Learning Power to Data, ...Not to f
   2 data points:
     H_0: bias = 0.50, var = 0.25, E_out = 0.75.
     H_1: bias = 0.21, var = 1.69, E_out = 1.90.
   5 data points:
     H_0: bias = 0.50, var = 0.10, E_out = 0.60.
     H_1: bias = 0.21, var = 0.21, E_out = 0.42.
   With more data the variance drops, and the more complex H_1 wins.
21 Learning Curves: When Does the Balance Tip?
   (Plots of expected error versus number of data points N. Simple model: E_in
   and E_out converge quickly, but to a relatively high error. Complex model: a
   larger gap between E_in and E_out that closes more slowly, but to a lower
   error.)
   Here E_out = E_x[ E_out(x) ].
22 Decomposing the Learning Curve
   VC analysis: expected error = in-sample error + generalization error.
   Bias-variance analysis: expected error = bias + variance.
   VC: pick H that can generalize and has a good chance to fit the data.
   Bias-variance: pick (H, A) to approximate f and not behave wildly after
   seeing the data.
More informationVectors and the Geometry of Space
Chapter 12 Vectors and the Geometr of Space Comments. What does multivariable mean in the name Multivariable Calculus? It means we stud functions that involve more than one variable in either the input
More informationCS340 Machine learning Lecture 4 Learning theory. Some slides are borrowed from Sebastian Thrun and Stuart Russell
CS340 Machine learning Lecture 4 Learning theory Some slides are borrowed from Sebastian Thrun and Stuart Russell Announcement What: Workshop on applying for NSERC scholarships and for entry to graduate
More informationTime-Frequency Analysis: Fourier Transforms and Wavelets
Chapter 4 Time-Frequenc Analsis: Fourier Transforms and Wavelets 4. Basics of Fourier Series 4.. Introduction Joseph Fourier (768-83) who gave his name to Fourier series, was not the first to use Fourier
More informationActive Learning and Optimized Information Gathering
Active Learning and Optimized Information Gathering Lecture 7 Learning Theory CS 101.2 Andreas Krause Announcements Project proposal: Due tomorrow 1/27 Homework 1: Due Thursday 1/29 Any time is ok. Office
More informationMachine Learning Lecture 7
Course Outline Machine Learning Lecture 7 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Statistical Learning Theory 23.05.2016 Discriminative Approaches (5 weeks) Linear Discriminant
More informationReading. 4. Affine transformations. Required: Watt, Section 1.1. Further reading:
Reading Required: Watt, Section.. Further reading: 4. Affine transformations Fole, et al, Chapter 5.-5.5. David F. Rogers and J. Alan Adams, Mathematical Elements for Computer Graphics, 2 nd Ed., McGraw-Hill,
More informationLinear regression Class 25, Jeremy Orloff and Jonathan Bloom
1 Learning Goals Linear regression Class 25, 18.05 Jerem Orloff and Jonathan Bloom 1. Be able to use the method of least squares to fit a line to bivariate data. 2. Be able to give a formula for the total
More informationMath 20 Spring 2005 Final Exam Practice Problems (Set 2)
Math 2 Spring 2 Final Eam Practice Problems (Set 2) 1. Find the etreme values of f(, ) = 2 2 + 3 2 4 on the region {(, ) 2 + 2 16}. 2. Allocation of Funds: A new editor has been allotted $6, to spend on
More informationRepresentation of Functions by Power Series. Geometric Power Series
60_0909.qd //0 :09 PM Page 669 SECTION 9.9 Representation of Functions b Power Series 669 The Granger Collection Section 9.9 JOSEPH FOURIER (768 80) Some of the earl work in representing functions b power
More informationProblems and results for the ninth week Mathematics A3 for Civil Engineering students
Problems and results for the ninth week Mathematics A3 for Civil Engineering students. Production line I of a factor works 0% of time, while production line II works 70% of time, independentl of each other.
More informationLearning Theory. Piyush Rai. CS5350/6350: Machine Learning. September 27, (CS5350/6350) Learning Theory September 27, / 14
Learning Theory Piyush Rai CS5350/6350: Machine Learning September 27, 2011 (CS5350/6350) Learning Theory September 27, 2011 1 / 14 Why Learning Theory? We want to have theoretical guarantees about our
More informationMachine Learning 4771
Machine Learning 477 Instructor: Tony Jebara Topic 5 Generalization Guarantees VC-Dimension Nearest Neighbor Classification (infinite VC dimension) Structural Risk Minimization Support Vector Machines
More informationEstimators as Random Variables
Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maimum likelihood Consistency Confidence intervals Properties of the mean estimator Introduction Up until
More information