Learning From Data Lecture 7 Approximation Versus Generalization


1. Learning From Data, Lecture 7: Approximation Versus Generalization

The VC Dimension
Approximation Versus Generalization
Bias and Variance
The Learning Curve

M. Magdon-Ismail, CSCI 4100/6100

2. Recap: The Vapnik-Chervonenkis Bound (VC Bound)

For any $\epsilon > 0$,
$$\mathbb{P}\left[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,\right] \le 4\, m_{\mathcal H}(2N)\, e^{-\epsilon^2 N/8}$$
(finite $\mathcal H$: $\mathbb{P}\left[\,|E_{in}(g) - E_{out}(g)| > \epsilon\,\right] \le 2\,|\mathcal H|\, e^{-2\epsilon^2 N}$).

Equivalently, for any $\epsilon > 0$,
$$\mathbb{P}\left[\,|E_{in}(g) - E_{out}(g)| \le \epsilon\,\right] \ge 1 - 4\, m_{\mathcal H}(2N)\, e^{-\epsilon^2 N/8}$$
(finite $\mathcal H$: $\mathbb{P}\left[\,|E_{in}(g) - E_{out}(g)| \le \epsilon\,\right] \ge 1 - 2\,|\mathcal H|\, e^{-2\epsilon^2 N}$).

So, with probability at least $1 - \delta$,
$$E_{out}(g) \le E_{in}(g) + \sqrt{\tfrac{8}{N}\log\tfrac{4\, m_{\mathcal H}(2N)}{\delta}}
\qquad \text{(finite } \mathcal H\text{: } E_{out}(g) \le E_{in}(g) + \sqrt{\tfrac{1}{2N}\log\tfrac{2\,|\mathcal H|}{\delta}}\text{)}.$$

If $k$ is a break point,
$$m_{\mathcal H}(N) \le \sum_{i=0}^{k-1} \binom{N}{i} \le N^{k-1} + 1.$$

3. The VC Dimension

Since $m_{\mathcal H}(N) \le N^{k-1} + 1$, the tightest bound is obtained with the smallest break point $k$.

Definition [VC Dimension]: $d_{vc} = k - 1$ for the smallest break point $k$. Equivalently, the VC dimension is the largest $N$ that can be shattered ($m_{\mathcal H}(N) = 2^N$).

$N \le d_{vc}$: $\mathcal H$ could shatter our data ($\mathcal H$ can shatter some $N$ points).
$N > d_{vc}$: $N$ is a break point for $\mathcal H$; $\mathcal H$ cannot possibly shatter our data.

Hence $m_{\mathcal H}(N) \le N^{d_{vc}} + 1$, and
$$E_{out}(g) \le E_{in}(g) + O\!\left(\sqrt{\tfrac{d_{vc}\log N}{N}}\right).$$
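
Not part of the slides: a quick numerical check of the two growth-function bounds above, with $d_{vc} = k - 1$. The function names are mine.

```python
from math import comb

def sauer_bound(N, d_vc):
    """Sauer's bound on the growth function: m_H(N) <= sum_{i=0}^{d_vc} C(N, i)."""
    return sum(comb(N, i) for i in range(d_vc + 1))

def poly_bound(N, d_vc):
    """The looser polynomial bound from the slide: N^d_vc + 1."""
    return N ** d_vc + 1

d_vc = 3
for N in (1, 2, 3, 4, 5, 10, 20):
    s, p = sauer_bound(N, d_vc), poly_bound(N, d_vc)
    assert s <= p  # the polynomial bound dominates Sauer's bound
    print(f"N={N:2d}  sauer={s:7d}  poly={p:7d}  2^N={2 ** N}")
```

For $N \le d_{vc}$ the Sauer sum equals $2^N$ (shattering is still possible); beyond $d_{vc}$ it falls polynomially behind the exponential.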

4. The VC Dimension is an Effective Number of Parameters

    H                          # parameters    d_vc
    2-D perceptron             3               3
    1-D positive ray           1               1
    2-D positive rectangles    4               4
    2-D positive convex sets   -               infinite

There are models with few parameters but infinite $d_{vc}$.
There are models with redundant parameters but small $d_{vc}$.

5. VC Dimension of the Perceptron in $\mathbb{R}^d$ is $d+1$

This can be shown in two steps:
1. $d_{vc} \ge d+1$. What needs to be shown?
2. $d_{vc} \le d+1$. What needs to be shown?

For step 1, which of the following suffices?
(a) There is a set of $d+1$ points that can be shattered.
(b) There is a set of $d+1$ points that cannot be shattered.
(c) Every set of $d+1$ points can be shattered.
(d) Every set of $d+1$ points cannot be shattered.

For step 2, which of the following suffices?
(a) There is a set of $d+1$ points that can be shattered.
(b) There is a set of $d+2$ points that cannot be shattered.
(c) Every set of $d+2$ points can be shattered.
(d) Every set of $d+1$ points cannot be shattered.
(e) Every set of $d+2$ points cannot be shattered.

6. VC Dimension of the Perceptron in $\mathbb{R}^d$ is $d+1$ (step 1 answer)

Step 1 ($d_{vc} \ge d+1$) requires (a): there is a set of $d+1$ points that the perceptron can shatter. (One standard construction: the origin together with the $d$ standard basis vectors; see the sketch after slide 7.)

7. VC Dimension of the Perceptron in $\mathbb{R}^d$ is $d+1$ (step 2 answer)

Step 2 ($d_{vc} \le d+1$) requires (e): every set of $d+2$ points cannot be shattered, i.e., $d+2$ is a break point for the perceptron.
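
A concrete check of step 1, not in the slides: take the origin plus the $d$ standard basis vectors, prepend the constant coordinate $x_0 = 1$, and note that the resulting $(d+1)\times(d+1)$ matrix $X$ is invertible, so any dichotomy $y$ is realized by solving $Xw = y$. A minimal sketch, assuming NumPy:

```python
import numpy as np

d = 4  # dimension; any d >= 1 works
# Rows: the origin and the d standard basis vectors, each with x_0 = 1 prepended.
X = np.hstack([np.ones((d + 1, 1)), np.vstack([np.zeros(d), np.eye(d)])])

rng = np.random.default_rng(0)
for _ in range(5):  # spot-check a few random dichotomies
    y = rng.choice([-1.0, 1.0], size=d + 1)
    w = np.linalg.solve(X, y)  # X is invertible, so X w = y exactly
    assert np.all(np.sign(X @ w) == y)
print(f"perceptron shatters these {d + 1} points in R^{d}")
```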

8. A Single Parameter Characterizes Complexity

finite $\mathcal H$:
$$E_{out}(g) \le E_{in}(g) + \sqrt{\tfrac{1}{2N}\log\tfrac{2\,|\mathcal H|}{\delta}}$$

general $\mathcal H$:
$$E_{out}(g) \le E_{in}(g) + \underbrace{\sqrt{\tfrac{8}{N}\log\tfrac{4\left((2N)^{d_{vc}} + 1\right)}{\delta}}}_{\text{penalty for model complexity } \Omega(d_{vc})}$$

[Figure: error versus complexity. As $|\mathcal H|$ (or $d_{vc}$) grows, the in-sample error falls while the model-complexity penalty rises; the out-of-sample error, their sum, is minimized at an intermediate complexity.]
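
To see how the penalty term behaves, here is a small sketch (mine, not from the slides) that evaluates $\Omega$ using the polynomial bound $m_{\mathcal H}(2N) \le (2N)^{d_{vc}} + 1$:

```python
import math

def vc_penalty(N, d_vc, delta=0.1):
    """Model-complexity penalty Omega(d_vc) from the VC bound,
    with m_H(2N) bounded by (2N)^d_vc + 1."""
    return math.sqrt(8.0 / N * math.log(4.0 * ((2.0 * N) ** d_vc + 1.0) / delta))

for d_vc in (1, 2, 5, 10, 20):
    print(f"d_vc={d_vc:2d}  Omega={vc_penalty(N=1000, d_vc=d_vc):.3f}")
```

At fixed $N$ the penalty grows with $d_{vc}$ while $E_{in}$ typically falls, which is exactly the tradeoff in the figure.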

9. Sample Complexity: How Many Data Points Do You Need?

Set the error bar at $\epsilon$ and solve for $N$:
$$\epsilon = \sqrt{\tfrac{8}{N}\ln\tfrac{4\left((2N)^{d_{vc}} + 1\right)}{\delta}}
\quad\Longrightarrow\quad
N = \tfrac{8}{\epsilon^2}\ln\tfrac{4\left((2N)^{d_{vc}} + 1\right)}{\delta} = O(d_{vc}\ln N).$$

Example: $d_{vc} = 3$, error bar $\epsilon = 0.1$, confidence 90% ($\delta = 0.1$). A simple iterative method works well. Trying $N = 1000$ we get
$$N \ge \tfrac{8}{0.1^2}\ln\tfrac{4\left((2\cdot 1000)^3 + 1\right)}{0.1} \approx 21{,}193.$$
We continue iteratively and converge to $N \approx 30{,}000$. If $d_{vc} = 4$, $N \approx 40{,}000$; for $d_{vc} = 5$, $N \approx 50{,}000$. ($N \propto d_{vc}$, but these are gross overestimates.)

Practical rule of thumb: $N = 10\, d_{vc}$.
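
The slide's iterative method as a sketch (function name and stopping rule are mine); it reproduces the roughly 30,000 figure for $d_{vc} = 3$ up to rounding:

```python
import math

def sample_complexity(d_vc, eps, delta, N0=1000.0, tol=1.0):
    """Solve N = (8/eps^2) ln(4((2N)^d_vc + 1)/delta) by fixed-point iteration.
    The right-hand side grows only logarithmically in N, so iterating converges."""
    N = N0
    while True:
        N_next = (8.0 / eps ** 2) * math.log(4.0 * ((2.0 * N) ** d_vc + 1.0) / delta)
        if abs(N_next - N) < tol:
            return N_next
        N = N_next

for d_vc in (3, 4, 5):
    print(f"d_vc={d_vc}: N ~ {sample_complexity(d_vc, eps=0.1, delta=0.1):,.0f}")
```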

10. Theory Versus Practice

The VC analysis allows us to reach outside the data for a general $\mathcal H$:
- a single parameter characterizes the complexity of $\mathcal H$; $d_{vc}$ depends only on $\mathcal H$.
- $E_{in}$ can reach outside $\mathcal D$ to $E_{out}$ when $d_{vc}$ is finite.

In practice, the VC bound is loose:
- Hoeffding is a worst-case bound;
- $m_{\mathcal H}(N)$ counts the worst-case number of dichotomies, not the average or likely case;
- the polynomial bound on $m_{\mathcal H}(N)$ is loose.

Still, it is a good guide: models with small $d_{vc}$ are good, and roughly $10\, d_{vc}$ examples are needed to get good generalization.

11. The Test Set

Another way to estimate $E_{out}(g)$ is to use a test set to obtain $E_{test}(g)$.

$E_{test}$ is a better estimate than $E_{in}$: you don't pay the price for fitting, so you can use $|\mathcal H| = 1$ in the Hoeffding bound with $E_{test}$.

Both test and training estimates have variance, but the training estimate has an optimistic bias due to selection (fitting the data); a test set has no bias.

The price for a test set is fewer training examples. (Why is this bad?) $E_{test} \approx E_{out}$, but now $E_{out}$ itself may be bad.
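
Not on the slide: with a single fixed hypothesis, the finite-$\mathcal H$ Hoeffding bound applies with $|\mathcal H| = 1$, which gives a simple test-set error bar. A sketch:

```python
import math

def test_error_bar(N_test, delta=0.05):
    """With probability >= 1 - delta, |E_test - E_out| <= sqrt(ln(2/delta) / (2 N_test)).
    Valid because the test set evaluates one fixed hypothesis (|H| = 1)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * N_test))

for n in (100, 1_000, 10_000):
    print(f"N_test={n:6d}  error bar ~ {test_error_bar(n):.3f}")
```

A 1,000-point test set already pins $E_{out}$ to within about $\pm 0.04$ at 95% confidence, but those are 1,000 points you no longer train on.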

12. The VC Bound Quantifies Approximation Versus Generalization

The best $\mathcal H$ is $\mathcal H = \{f\}$, but you are better off buying a lottery ticket than hoping to pick it.

Larger $d_{vc}$: better chance of approximating $f$ ($E_{in} \approx 0$).
Smaller $d_{vc}$: better chance of generalizing out of sample ($E_{in} \approx E_{out}$).

$$E_{out} \le E_{in} + \Omega(d_{vc}).$$

The VC analysis depends only on $\mathcal H$; it is independent of $f$, $P(x)$, and the learning algorithm.

13. Bias-Variance Analysis

Another way to quantify the tradeoff:
1. How well can the learning approximate $f$?
   ...as opposed to how well the learning did approximate $f$ in sample ($E_{in}$).
2. How close can you get to that approximation with a finite data set?
   ...as opposed to how close $E_{in}$ is to $E_{out}$.

Bias-variance analysis applies to squared error (classification and regression).
Bias-variance analysis can take the learning algorithm into account: different learning algorithms can have different $E_{out}$ when applied to the same $\mathcal H$!

14. A Simple Learning Problem

Two data points. Two hypothesis sets:
$\mathcal H_0$: $h(x) = b$
$\mathcal H_1$: $h(x) = ax + b$

15. Let's Repeat the Experiment Many Times

For each data set $\mathcal D$, you get a different $g^{\mathcal D}$. So, for a fixed $x$, $g^{\mathcal D}(x)$ is a random value, depending on $\mathcal D$.

16. What's Happening on Average

[Figure: for each of $\mathcal H_0$ and $\mathcal H_1$, the average hypothesis $\bar g(x)$ plotted against the sine target, with a band showing the variability of $g^{\mathcal D}(x)$.]

We can define the average prediction at $x$,
$$\bar g(x) = \mathbb{E}_{\mathcal D}\left[g^{\mathcal D}(x)\right] \approx \tfrac{1}{K}\left(g^{\mathcal D_1}(x) + \cdots + g^{\mathcal D_K}(x)\right),$$
and how variable that prediction is,
$$\mathrm{var}(x) = \mathbb{E}_{\mathcal D}\left[\left(g^{\mathcal D}(x) - \bar g(x)\right)^2\right] = \mathbb{E}_{\mathcal D}\left[g^{\mathcal D}(x)^2\right] - \bar g(x)^2.$$
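
The whole experiment fits in a short Monte Carlo sketch. Assumptions not spelled out on the slide: the target is the book's $f(x) = \sin(\pi x)$ with $x$ uniform on $[-1, 1]$ and no noise; the number of data sets $K$ and the test grid are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def bias_variance(fit, K=20_000, N=2, n_x=1_000):
    """Estimate gbar(x), var(x), bias(x) for a fitting rule, averaged over x.
    fit(x, y, x_test) returns the fitted hypothesis evaluated on x_test."""
    x_test = np.linspace(-1, 1, n_x)
    f_test = np.sin(np.pi * x_test)
    s = np.zeros(n_x)   # running sum of g^D(x)
    s2 = np.zeros(n_x)  # running sum of g^D(x)^2
    for _ in range(K):
        x = rng.uniform(-1, 1, N)
        y = np.sin(np.pi * x)
        g = fit(x, y, x_test)
        s += g
        s2 += g * g
    g_bar = s / K
    var_x = s2 / K - g_bar ** 2        # E[g^2] - gbar^2
    bias_x = (g_bar - f_test) ** 2
    return bias_x.mean(), var_x.mean()  # average over x

# H0: constant h(x) = b (least squares gives b = mean(y)).
fit_h0 = lambda x, y, xt: np.full_like(xt, y.mean())
# H1: the line h(x) = ax + b through the two points (N = 2 interpolation).
fit_h1 = lambda x, y, xt: y[0] + (y[1] - y[0]) / (x[1] - x[0]) * (xt - x[0])

for name, fit in [("H0", fit_h0), ("H1", fit_h1)]:
    bias, var = bias_variance(fit)
    print(f"{name}: bias={bias:.2f}  var={var:.2f}  E_out={bias + var:.2f}")
```

Under these assumptions the output approximately reproduces the numbers on slide 19.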

17. $E_{out}$ on a Test Point for Data $\mathcal D$

[Figure: the fitted $g^{\mathcal D}(x)$ versus the target $f(x)$, with the squared gap at the test point $x$.]

$$E_{out}^{\mathcal D}(x) = \left(g^{\mathcal D}(x) - f(x)\right)^2 \qquad \text{squared error, a random value depending on } \mathcal D$$
$$E_{out}(x) = \mathbb{E}_{\mathcal D}\left[E_{out}^{\mathcal D}(x)\right] \qquad \text{expected } E_{out}(x) \text{ before seeing } \mathcal D$$

18. The Bias-Variance Decomposition

$$
\begin{aligned}
E_{out}(x) &= \mathbb{E}_{\mathcal D}\left[\left(g^{\mathcal D}(x) - f(x)\right)^2\right] \\
&= \mathbb{E}_{\mathcal D}\left[g^{\mathcal D}(x)^2 - 2\, g^{\mathcal D}(x) f(x) + f(x)^2\right] \\
&= \mathbb{E}_{\mathcal D}\left[g^{\mathcal D}(x)^2\right] - 2\, \bar g(x) f(x) + f(x)^2
\qquad \text{(understand this step; the rest is just algebra)} \\
&= \mathbb{E}_{\mathcal D}\left[g^{\mathcal D}(x)^2\right] - \bar g(x)^2 + \bar g(x)^2 - 2\, \bar g(x) f(x) + f(x)^2 \\
&= \underbrace{\mathbb{E}_{\mathcal D}\left[g^{\mathcal D}(x)^2\right] - \bar g(x)^2}_{\mathrm{var}(x)} + \underbrace{\left(\bar g(x) - f(x)\right)^2}_{\mathrm{bias}(x)}.
\end{aligned}
$$

$$E_{out}(x) = \mathrm{bias}(x) + \mathrm{var}(x)$$

[Figure: a very small model has large bias and small var ($\mathcal H$ sits far from $f$ but barely moves with the data); a very large model has small bias and large var.]

If you take the average over $x$: $E_{out} = \mathrm{bias} + \mathrm{var}$.

19. Back to $\mathcal H_0$ and $\mathcal H_1$; and, Our Winner Is...

[Figure: $\bar g(x)$ against the sine target with variability bands, for each model.]

    H_0:  bias = 0.50,  var = 0.25,  E_out = 0.75
    H_1:  bias = 0.21,  var = 1.69,  E_out = 1.90

With two data points, the winner is $\mathcal H_0$.

20. Match Learning Power to the Data, ...Not to $f$

                 2 data points                       5 data points
    H_0:  bias = 0.50, var = 0.25, E_out = 0.75   bias = 0.50, var = 0.10, E_out = 0.60
    H_1:  bias = 0.21, var = 1.69, E_out = 1.90   bias = 0.21, var = 0.21, E_out = 0.42

With 2 points $\mathcal H_0$ wins; with 5 points the variance of $\mathcal H_1$ drops enough that $\mathcal H_1$ wins.

21. Learning Curves: When Does the Balance Tip?

[Figure: expected error versus number of data points $N$, for a simple model and a complex model. In each plot, $E_{out}$ decreases and $E_{in}$ increases toward a common asymptote; the complex model reaches a lower asymptote but needs more data before its $E_{out}$ drops below the simple model's.]

$$E_{out} = \mathbb{E}_{x}\left[E_{out}(x)\right]$$
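
One way to generate such curves numerically (my sketch, reusing the $\sin(\pi x)$ setup assumed earlier; degree 0 plays the simple model, degree 1 the complex one):

```python
import numpy as np

rng = np.random.default_rng(1)

def learning_curve(degree, Ns, trials=5_000):
    """Expected E_in and E_out versus N for least-squares polynomial fits
    to the noiseless target sin(pi x), with x ~ Uniform[-1, 1]."""
    x_test = np.linspace(-1, 1, 1_000)
    f_test = np.sin(np.pi * x_test)
    out = []
    for N in Ns:
        e_in = e_out = 0.0
        for _ in range(trials):
            x = rng.uniform(-1, 1, N)
            y = np.sin(np.pi * x)
            w = np.polyfit(x, y, degree)                       # least-squares fit
            e_in += np.mean((np.polyval(w, x) - y) ** 2)       # error on D
            e_out += np.mean((np.polyval(w, x_test) - f_test) ** 2)
        out.append((N, e_in / trials, e_out / trials))
    return out

for degree in (0, 1):
    print(f"degree {degree}:")
    for N, e_in, e_out in learning_curve(degree, Ns=(2, 5, 10, 20, 50)):
        print(f"  N={N:3d}  E_in={e_in:.3f}  E_out={e_out:.3f}")
```

$E_{in}$ climbs and $E_{out}$ falls toward the same asymptote (the bias), and the crossover where $\mathcal H_1$ overtakes $\mathcal H_0$ appears between $N = 2$ and $N = 5$, consistent with slide 20.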

22. Decomposing the Learning Curve

[Figure: the same learning curve decomposed two ways. VC analysis: $E_{out} = E_{in} + $ generalization error. Bias-variance analysis: $E_{out} = $ bias $+$ variance, where the bias is the horizontal asymptote and the variance shrinks as $N$ grows.]

VC analysis: pick $\mathcal H$ that can generalize and has a good chance to fit the data.
Bias-variance analysis: pick $(\mathcal H, \mathcal A)$ to approximate $f$ and not behave wildly after seeing the data.
