Machine Learning Foundations

Size: px
Start display at page:

Download "Machine Learning Foundations"

Transcription

1 Machine Learning Foundations ( 機器學習基石 ) Lecture 13: Hazard of Overfitting Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan Universit ( 國立台灣大學資訊工程系 ) Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 0/22

2 1 When Can Machines Learn? 2 Wh Can Machines Learn? 3 How Can Machines Learn? Roadmap Lecture 12: Nonlinear Transform nonlinear via nonlinear feature transform Φ plus linear with price of model compleit 4 How Can Machines Learn Better? Lecture 13: Hazard of Overfitting What is Overfitting? The Role of Noise and Data Size Deterministic Noise Dealing with Overfitting Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 1/22

3 What is Overfitting? Bad Generalization regression for R with N = 5 eamples target f () = 2nd order polnomial label n = f ( n ) + ver small noise linear regression in Z-space + Φ = 4th order polnomial unique solution passing all eamples = E in (g) = 0 E out (g) huge Data Target Fit bad generalization: low E in, high E out Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 2/22

4 What is Overfitting? Bad Generalization and Overfitting take d VC = 1126 for learning: bad generalization (E out - E in ) large switch from d VC = d VC to d VC = 1126: overfitting E in, E out switch from d VC = d VC to d VC = 1: underfitting E in, E out Error d vc out-of-sample error model compleit in-sample error VC dimension, d vc bad generalization: low E in, high E out ; overfitting: lower E in, higher E out Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 3/22

5 What is Overfitting? Cause of Overfitting: A Driving Analog Data Target Fit good fit = overfit learning overfit use ecessive d VC noise limited data size N driving commit a car accident drive too fast bump road limited observations about road condition net: how does noise & data size affect overfitting? Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 4/22

6 What is Overfitting? Fun Time Based on our discussion, for data of fied size, which of the following situation is relativel of the lowest risk of overfitting? 1 small noise, fitting from small d VC to median d VC 2 small noise, fitting from small d VC to large d VC 3 large noise, fitting from small d VC to median d VC 4 large noise, fitting from small d VC to large d VC Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 5/22

7 What is Overfitting? Fun Time Based on our discussion, for data of fied size, which of the following situation is relativel of the lowest risk of overfitting? 1 small noise, fitting from small d VC to median d VC 2 small noise, fitting from small d VC to large d VC 3 large noise, fitting from small d VC to median d VC 4 large noise, fitting from small d VC to large d VC Reference Answer: 1 Two causes of overfitting are noise and ecessive d VC. So if both are relativel under control, the risk of overfitting is smaller. Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 5/22

8 The Role of Noise and Data Size Case Stud (1/2) 10-th order target function + noise 50-th order target function noiselessl Data Target Data Target overfitting from best g 2 H 2 to best g 10 H 10? Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 6/22

9 The Role of Noise and Data Size Case Stud (2/2) 10-th order target function + noise 50-th order target function noiselessl Data 2nd Order Fit 10th Order Fit Data 2nd Order Fit 10th Order Fit g 2 H 2 g 10 H 10 E in E out g 2 H 2 g 10 H 10 E in E out overfitting from g 2 to g 10? both es! Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 7/22

10 The Role of Noise and Data Size Iron of Two Learners Data 2nd Order Fit 10th Order Fit learner Overfit: pick g 10 H 10 learner Restrict: pick g 2 H 2 when both know that target = 10th R gives up abilit to fit Data Target but R wins in E out a lot! philosoph: concession for advantage? :-) Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 8/22

11 The Role of Noise and Data Size Learning Curves Revisited H 2 H 10 Epected Error E out E in Epected Error E out Number of Data Points, N E in Number of Data Points, N H 10 : lower E out when N, but much larger generalization error for small N gra area : O overfits! (E in, E out ) R alwas wins in E out if N small! Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 9/22

12 The Role of Noise and Data Size The No Noise Case Data 2nd Order Fit 10th Order Fit learner Overfit: pick g 10 H 10 Data Target learner Restrict: pick g 2 H 2 when both know that there is no noise R still wins is there reall no noise? target compleit acts like noise Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 10/22

13 The Role of Noise and Data Size Fun Time When having limited data, in which of the following case would learner R perform better than learner O? 1 limited data from a 10-th order target function with some noise 2 limited data from a 1126-th order target function with no noise 3 limited data from a 1126-th order target function with some noise 4 all of the above Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 11/22

14 The Role of Noise and Data Size Fun Time When having limited data, in which of the following case would learner R perform better than learner O? 1 limited data from a 10-th order target function with some noise 2 limited data from a 1126-th order target function with no noise 3 limited data from a 1126-th order target function with some noise 4 all of the above Reference Answer: 4 We discussed about 1 and 2, but ou shall be able to generalize :-) that R also wins in the more difficult case of 3. Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 11/22

15 Deterministic Noise A Detailed Eperiment = f () + ɛ Gaussian ( Qf ) α q q, σ 2 q=0 } {{ } f () Gaussian iid noise ɛ with level σ 2 some uniform distribution on f () with compleit level Q f data size N Data Target goal: overfit level for different (N, σ 2 ) and (N, Q f )? Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 12/22

16 Deterministic Noise The Overfit Measure Data 2nd Order Fit 10th Order Fit Data Target g 2 H 2 g 10 H 10 E in (g 10 ) E in (g 2 ) for sure overfit measure E out (g 10 ) E out (g 2 ) Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 13/22

17 Deterministic Noise The Results impact of σ 2 versus N impact of Q f versus N Noise Level, σ Number of Data Points, N fied Q f = Target Compleit, Qf Number of Data Points, N fied σ 2 = ring a bell? :-) Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 14/22

18 impact of σ 2 versus N: stochastic noise Noise Level, σ Deterministic Noise Impact of Noise and Data Size Number of Data Points, N impact of Q f versus N: deterministic noise Target Compleit, Qf Number of Data Points, N four reasons of serious overfitting: data size N stochastic noise deterministic noise ecessive power overfit overfit overfit overfit overfitting easil happens Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 15/22

19 Deterministic Noise Deterministic Noise if f / H: something of f cannot be captured b H deterministic noise : difference between best h H and f acts like stochastic noise not new to CS: pseudo-random generator difference to stochastic noise: depends on H fied for a given f h philosoph: when teaching a kid, perhaps better not to use eamples from a complicated target function? :-) Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 16/22

20 Deterministic Noise Fun Time Consider the target function being sin(1126) for [0, 2π]. When is uniforml sampled from the range, and we use all possible linear hpotheses h() = w to approimate the target function with respect to the squared error, what is the level of deterministic noise for each? 1 sin(1126) 2 sin(1126) 3 sin(1126) + 4 sin(1126) 1126 Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 17/22

21 Deterministic Noise Fun Time Consider the target function being sin(1126) for [0, 2π]. When is uniforml sampled from the range, and we use all possible linear hpotheses h() = w to approimate the target function with respect to the squared error, what is the level of deterministic noise for each? 1 sin(1126) 2 sin(1126) 3 sin(1126) + 4 sin(1126) 1126 Reference Answer: 1 You can tr a few different w and convince ourself that the best hpothesis h is h () = 0. The deterministic noise is the difference between f and h. Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 17/22

22 Dealing with Overfitting Driving Analog Revisited learning overfit use ecessive d VC noise limited data size N start from simple model data cleaning/pruning data hinting regularization validation driving commit a car accident drive too fast bump road limited observations about road condition drive slowl use more accurate road information eploit more road information put the brakes monitor the dashboard all ver practical techniques to combat overfitting Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 18/22

23 Dealing with Overfitting Data Cleaning/Pruning if detect the outlier 5 at the top b too close to other, or too far from other wrong b current classifier... possible action 1: correct the label (data cleaning) possible action 2: remove the eample (data pruning) possibl helps, but effect varies Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 19/22

24 Dealing with Overfitting Data Hinting slightl shifted/rotated digits carr the same meaning possible action: add virtual eamples b shifting/rotating the given digits (data hinting) possibl helps, but watch out virtual eample not iid P(, )! Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 20/22

25 Dealing with Overfitting Fun Time Assume we know that f () is smmetric for some 1D regression application. That is, f () = f ( ). One possibilit of using the knowledge is to consider smmetric hpotheses onl. On the other hand, ou can also generate virtual eamples from the original data {( n, n )} as hints. What virtual eamples suit our needs best? 1 {( n, n )} 2 {( n, n )} 3 {( n, n )} 4 {(2 n, 2 n )} Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 21/22

26 Dealing with Overfitting Fun Time Assume we know that f () is smmetric for some 1D regression application. That is, f () = f ( ). One possibilit of using the knowledge is to consider smmetric hpotheses onl. On the other hand, ou can also generate virtual eamples from the original data {( n, n )} as hints. What virtual eamples suit our needs best? 1 {( n, n )} 2 {( n, n )} 3 {( n, n )} 4 {(2 n, 2 n )} Reference Answer: 3 We want the virtual eamples to encode the invariance when. Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 21/22

27 Dealing with Overfitting Summar 1 When Can Machines Learn? 2 Wh Can Machines Learn? 3 How Can Machines Learn? Lecture 12: Nonlinear Transform 4 How Can Machines Learn Better? Lecture 13: Hazard of Overfitting What is Overfitting? lower E in but higher E out The Role of Noise and Data Size overfitting easil happens! Deterministic Noise what H cannot capture acts like noise Dealing with Overfitting data cleaning/pruning/hinting, and more net: putting the brakes with regularization Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 22/22

Machine Learning Foundations

Machine Learning Foundations Machine Learning Foundations ( 機器學習基石 ) Lecture 4: Feasibility of Learning Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University

More information

Machine Learning Foundations

Machine Learning Foundations Machine Learning Foundations ( 機器學習基石 ) Lecture 8: Noise and Error Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系

More information

Machine Learning Foundations

Machine Learning Foundations Machine Learning Foundations ( 機器學習基石 ) Lecture 11: Linear Models for Classification Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan

More information

Array 10/26/2015. Department of Computer Science & Information Engineering. National Taiwan University

Array 10/26/2015. Department of Computer Science & Information Engineering. National Taiwan University Array 10/26/2015 Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系 ) Hsuan-Tien Lin (NTU CSIE) Array 0/15 Primitive

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技法 ) Lecture 13: Deep Learning Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技法 ) Lecture 15: Matrix Factorization Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技法 ) Lecture 7: Blending and Bagging Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University (

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技巧 ) Lecture 6: Kernel Models for Regression Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技巧 ) Lecture 5: SVM and Logistic Regression Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技法 ) Lecture 5: Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系 ) Hsuan-Tien

More information

Machine Learning Techniques

Machine Learning Techniques Machine Learning Techniques ( 機器學習技法 ) Lecture 2: Dual Support Vector Machine Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University

More information

Polymorphism 11/09/2015. Department of Computer Science & Information Engineering. National Taiwan University

Polymorphism 11/09/2015. Department of Computer Science & Information Engineering. National Taiwan University Polymorphism 11/09/2015 Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系 ) Hsuan-Tien Lin (NTU CSIE) Polymorphism

More information

Encapsulation 10/26/2015. Department of Computer Science & Information Engineering. National Taiwan University

Encapsulation 10/26/2015. Department of Computer Science & Information Engineering. National Taiwan University 10/26/2015 Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系 ) Hsuan-Tien Lin (NTU CSIE) 0/20 Recall: Basic

More information

Inheritance 11/02/2015. Department of Computer Science & Information Engineering. National Taiwan University

Inheritance 11/02/2015. Department of Computer Science & Information Engineering. National Taiwan University Inheritance 11/02/2015 Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系 ) Hsuan-Tien Lin (NTU CSIE) Inheritance

More information

Basic Java OOP 10/12/2015. Department of Computer Science & Information Engineering. National Taiwan University

Basic Java OOP 10/12/2015. Department of Computer Science & Information Engineering. National Taiwan University Basic Java OOP 10/12/2015 Hsuan-Tien Lin ( 林軒田 ) htlin@csie.ntu.edu.tw Department of Computer Science & Information Engineering National Taiwan University ( 國立台灣大學資訊工程系 ) Hsuan-Tien Lin (NTU CSIE) Basic

More information

Learning From Data Lecture 7 Approximation Versus Generalization

Learning From Data Lecture 7 Approximation Versus Generalization Learning From Data Lecture 7 Approimation Versus Generalization The VC Dimension Approimation Versus Generalization Bias and Variance The Learning Curve M. Magdon-Ismail CSCI 4100/6100 recap: The Vapnik-Chervonenkis

More information

Cross-validation for detecting and preventing overfitting

Cross-validation for detecting and preventing overfitting Cross-validation for detecting and preventing overfitting A Regression Problem = f() + noise Can we learn f from this data? Note to other teachers and users of these slides. Andrew would be delighted if

More information

1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure

1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure Overview Breiman L (2001). Statistical modeling: The two cultures Statistical learning reading group Aleander L 1 Histor of statistical/machine learning 2 Supervised learning 3 Two approaches to supervised

More information

Feedforward Neural Networks

Feedforward Neural Networks Chapter 4 Feedforward Neural Networks 4. Motivation Let s start with our logistic regression model from before: P(k d) = softma k =k ( λ(k ) + w d λ(k, w) ). (4.) Recall that this model gives us a lot

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

CS340 Machine learning Lecture 5 Learning theory cont'd. Some slides are borrowed from Stuart Russell and Thorsten Joachims

CS340 Machine learning Lecture 5 Learning theory cont'd. Some slides are borrowed from Stuart Russell and Thorsten Joachims CS340 Machine learning Lecture 5 Learning theory cont'd Some slides are borrowed from Stuart Russell and Thorsten Joachims Inductive learning Simplest form: learn a function from examples f is the target

More information

Overview. IAML: Linear Regression. Examples of regression problems. The Regression Problem

Overview. IAML: Linear Regression. Examples of regression problems. The Regression Problem 3 / 38 Overview 4 / 38 IAML: Linear Regression Nigel Goddard School of Informatics Semester The linear model Fitting the linear model to data Probabilistic interpretation of the error function Eamples

More information

CSC 411: Lecture 02: Linear Regression

CSC 411: Lecture 02: Linear Regression CSC 411: Lecture 02: Linear Regression Richard Zemel, Raquel Urtasun and Sanja Fidler University of Toronto (Most plots in this lecture are from Bishop s book) Zemel, Urtasun, Fidler (UofT) CSC 411: 02-Regression

More information

Learning From Data Lecture 12 Regularization

Learning From Data Lecture 12 Regularization Learning From Data Lecture 12 Regularization Constraining the Model Weight Decay Augmented Error M. Magdon-Ismail CSCI 4100/6100 recap: Overfitting Fitting the data more than is warranted Data Target Fit

More information

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as:

Ridge Regression 1. to which some random noise is added. So that the training labels can be represented as: CS 1: Machine Learning Spring 15 College of Computer and Information Science Northeastern University Lecture 3 February, 3 Instructor: Bilal Ahmed Scribe: Bilal Ahmed & Virgil Pavlu 1 Introduction Ridge

More information

CLASSIFICATION (DISCRIMINATION)

CLASSIFICATION (DISCRIMINATION) CLASSIFICATION (DISCRIMINATION) Caren Marzban http://www.nhn.ou.edu/ marzban Generalities This lecture relies on the previous lecture to some etent. So, read that one first. Also, as in the previous lecture,

More information

15. Eigenvalues, Eigenvectors

15. Eigenvalues, Eigenvectors 5 Eigenvalues, Eigenvectors Matri of a Linear Transformation Consider a linear ( transformation ) L : a b R 2 R 2 Suppose we know that L and L Then c d because of linearit, we can determine what L does

More information

INTRODUCTION TO DIOPHANTINE EQUATIONS

INTRODUCTION TO DIOPHANTINE EQUATIONS INTRODUCTION TO DIOPHANTINE EQUATIONS In the earl 20th centur, Thue made an important breakthrough in the stud of diophantine equations. His proof is one of the first eamples of the polnomial method. His

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io

Machine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem

More information

The Force Table Introduction: Theory:

The Force Table Introduction: Theory: 1 The Force Table Introduction: "The Force Table" is a simple tool for demonstrating Newton s First Law and the vector nature of forces. This tool is based on the principle of equilibrium. An object is

More information

4.7. Newton s Method. Procedure for Newton s Method HISTORICAL BIOGRAPHY

4.7. Newton s Method. Procedure for Newton s Method HISTORICAL BIOGRAPHY 4. Newton s Method 99 4. Newton s Method HISTORICAL BIOGRAPHY Niels Henrik Abel (18 189) One of the basic problems of mathematics is solving equations. Using the quadratic root formula, we know how to

More information

Polynomial approximation and Splines

Polynomial approximation and Splines Polnomial approimation and Splines 1. Weierstrass approimation theorem The basic question we ll look at toda is how to approimate a complicated function f() with a simpler function P () f() P () for eample,

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Risk Minimization Barnabás Póczos What have we seen so far? Several classification & regression algorithms seem to work fine on training datasets: Linear

More information

Linear correlation. Contents. 1 Linear correlation. 1.1 Introduction. Anthony Tanbakuchi Department of Mathematics Pima Community College

Linear correlation. Contents. 1 Linear correlation. 1.1 Introduction. Anthony Tanbakuchi Department of Mathematics Pima Community College Introductor Statistics Lectures Linear correlation Testing two variables for a linear relationship Anthon Tanbakuchi Department of Mathematics Pima Communit College Redistribution of this material is prohibited

More information

Unit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents

Unit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents Unit NOTES Honors Common Core Math Da : Properties of Eponents Warm-Up: Before we begin toda s lesson, how much do ou remember about eponents? Use epanded form to write the rules for the eponents. OBJECTIVE

More information

MATH Line integrals III Fall The fundamental theorem of line integrals. In general C

MATH Line integrals III Fall The fundamental theorem of line integrals. In general C MATH 255 Line integrals III Fall 216 In general 1. The fundamental theorem of line integrals v T ds depends on the curve between the starting point and the ending point. onsider two was to get from (1,

More information

2.1 Rates of Change and Limits AP Calculus

2.1 Rates of Change and Limits AP Calculus . Rates of Change and Limits AP Calculus. RATES OF CHANGE AND LIMITS Limits Limits are what separate Calculus from pre calculus. Using a it is also the foundational principle behind the two most important

More information

5. Zeros. We deduce that the graph crosses the x-axis at the points x = 0, 1, 2 and 4, and nowhere else. And that s exactly what we see in the graph.

5. Zeros. We deduce that the graph crosses the x-axis at the points x = 0, 1, 2 and 4, and nowhere else. And that s exactly what we see in the graph. . Zeros Eample 1. At the right we have drawn the graph of the polnomial = ( 1) ( 2) ( 4). Argue that the form of the algebraic formula allows ou to see right awa where the graph is above the -ais, where

More information

Learning from Data: Regression

Learning from Data: Regression November 3, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Classification or Regression? Classification: want to learn a discrete target variable. Regression: want to learn a continuous target variable. Linear

More information

CSE 546 Midterm Exam, Fall 2014

CSE 546 Midterm Exam, Fall 2014 CSE 546 Midterm Eam, Fall 2014 1. Personal info: Name: UW NetID: Student ID: 2. There should be 14 numbered pages in this eam (including this cover sheet). 3. You can use an material ou brought: an book,

More information

PAC-learning, VC Dimension and Margin-based Bounds

PAC-learning, VC Dimension and Margin-based Bounds More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based

More information

Biostatistics in Research Practice - Regression I

Biostatistics in Research Practice - Regression I Biostatistics in Research Practice - Regression I Simon Crouch 30th Januar 2007 In scientific studies, we often wish to model the relationships between observed variables over a sample of different subjects.

More information

MA123, Chapter 8: Idea of the Integral (pp , Gootman)

MA123, Chapter 8: Idea of the Integral (pp , Gootman) MA13, Chapter 8: Idea of the Integral (pp. 155-187, Gootman) Chapter Goals: Understand the relationship between the area under a curve and the definite integral. Understand the relationship between velocit

More information

0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work.

0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work. 1 A socioeconomic stud analzes two discrete random variables in a certain population of households = number of adult residents and = number of child residents It is found that their joint probabilit mass

More information

Physics Gravitational force. 2. Strong or color force. 3. Electroweak force

Physics Gravitational force. 2. Strong or color force. 3. Electroweak force Phsics 360 Notes on Griffths - pluses and minuses No tetbook is perfect, and Griffithsisnoeception. Themajorplusisthat it is prett readable. For minuses, see below. Much of what G sas about the del operator

More information

Learning From Data Lecture 10 Nonlinear Transforms

Learning From Data Lecture 10 Nonlinear Transforms Learning From Data Lecture 0 Nonlinear Transforms The Z-space Polynomial transforms Be careful M. Magdon-Ismail CSCI 400/600 recap: The Linear Model linear in w: makes the algorithms work linear in x:

More information

Lecture 3: Introduction to Complexity Regularization

Lecture 3: Introduction to Complexity Regularization ECE90 Spring 2007 Statistical Learning Theory Instructor: R. Nowak Lecture 3: Introduction to Complexity Regularization We ended the previous lecture with a brief discussion of overfitting. Recall that,

More information

Analytic Trigonometry

Analytic Trigonometry CHAPTER 5 Analtic Trigonometr 5. Fundamental Identities 5. Proving Trigonometric Identities 5.3 Sum and Difference Identities 5.4 Multiple-Angle Identities 5.5 The Law of Sines 5.6 The Law of Cosines It

More information

Optimization Methods for Machine Learning (OMML)

Optimization Methods for Machine Learning (OMML) Optimization Methods for Machine Learning (OMML) 2nd lecture (2 slots) Prof. L. Palagi 16/10/2014 1 What is (not) Data Mining? By Namwar Rizvi - Ad Hoc Query: ad Hoc queries just examines the current data

More information

8. BOOLEAN ALGEBRAS x x

8. BOOLEAN ALGEBRAS x x 8. BOOLEAN ALGEBRAS 8.1. Definition of a Boolean Algebra There are man sstems of interest to computing scientists that have a common underling structure. It makes sense to describe such a mathematical

More information

Figure 1: Visualising the input features. Figure 2: Visualising the input-output data pairs.

Figure 1: Visualising the input features. Figure 2: Visualising the input-output data pairs. Regression. Data visualisation (a) To plot the data in x trn and x tst, we could use MATLAB function scatter. For example, the code for plotting x trn is scatter(x_trn(:,), x_trn(:,));........... x (a)

More information

ECE521 Lecture7. Logistic Regression

ECE521 Lecture7. Logistic Regression ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard

More information

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization

CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization CS168: The Modern Algorithmic Toolbox Lecture #6: Regularization Tim Roughgarden & Gregory Valiant April 18, 2018 1 The Context and Intuition behind Regularization Given a dataset, and some class of models

More information

LAB 05 Projectile Motion

LAB 05 Projectile Motion LAB 5 Projectile Motion CONTENT: 1. Introduction. Projectile motion A. Setup B. Various characteristics 3. Pre-lab: A. Activities B. Preliminar info C. Quiz 1. Introduction After introducing one-dimensional

More information

http://imgs.xkcd.com/comics/electoral_precedent.png Statistical Learning Theory CS4780/5780 Machine Learning Fall 2012 Thorsten Joachims Cornell University Reading: Mitchell Chapter 7 (not 7.4.4 and 7.5)

More information

Problems set # 2 Physics 169 February 11, 2015

Problems set # 2 Physics 169 February 11, 2015 Prof. Anchordoqui Problems set # 2 Phsics 169 Februar 11, 2015 1. Figure 1 shows the electric field lines for two point charges separated b a small distance. (i) Determine the ratio q 1 /q 2. (ii) What

More information

FIRST- AND SECOND-ORDER IVPS The problem given in (1) is also called an nth-order initial-value problem. For example, Solve: Solve:

FIRST- AND SECOND-ORDER IVPS The problem given in (1) is also called an nth-order initial-value problem. For example, Solve: Solve: .2 INITIAL-VALUE PROBLEMS 3.2 INITIAL-VALUE PROBLEMS REVIEW MATERIAL Normal form of a DE Solution of a DE Famil of solutions INTRODUCTION We are often interested in problems in which we seek a solution

More information

INF Introduction to classifiction Anne Solberg

INF Introduction to classifiction Anne Solberg INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300

More information

A. Real numbers greater than 2 B. Real numbers less than or equal to 2. C. Real numbers between 1 and 3 D. Real numbers greater than or equal to 2

A. Real numbers greater than 2 B. Real numbers less than or equal to 2. C. Real numbers between 1 and 3 D. Real numbers greater than or equal to 2 39 CHAPTER 9 DAY 0 DAY 0 Opportunities To Learn You are what ou are when nobod Is looking. - Ann Landers 6. Match the graph with its description. A. Real numbers greater than B. Real numbers less than

More information

Statistical Learning Theory: Generalization Error Bounds

Statistical Learning Theory: Generalization Error Bounds Statistical Learning Theory: Generalization Error Bounds CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 6.5.4 Schoelkopf/Smola Chapter 5 (beginning, rest

More information

February 11, JEL Classification: C72, D43, D44 Keywords: Discontinuous games, Bertrand game, Toehold, First- and Second- Price Auctions

February 11, JEL Classification: C72, D43, D44 Keywords: Discontinuous games, Bertrand game, Toehold, First- and Second- Price Auctions Eistence of Mied Strateg Equilibria in a Class of Discontinuous Games with Unbounded Strateg Sets. Gian Luigi Albano and Aleander Matros Department of Economics and ELSE Universit College London Februar

More information

Linear regression Class 25, Jeremy Orloff and Jonathan Bloom

Linear regression Class 25, Jeremy Orloff and Jonathan Bloom 1 Learning Goals Linear regression Class 25, 18.05 Jerem Orloff and Jonathan Bloom 1. Be able to use the method of least squares to fit a line to bivariate data. 2. Be able to give a formula for the total

More information

1.2 Relations. 20 Relations and Functions

1.2 Relations. 20 Relations and Functions 0 Relations and Functions. Relations From one point of view, all of Precalculus can be thought of as studing sets of points in the plane. With the Cartesian Plane now fresh in our memor we can discuss

More information

14.2 Choosing Among Linear, Quadratic, and Exponential Models

14.2 Choosing Among Linear, Quadratic, and Exponential Models Name Class Date 14.2 Choosing Among Linear, Quadratic, and Eponential Models Essential Question: How do ou choose among, linear, quadratic, and eponential models for a given set of data? Resource Locker

More information

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007 Decision Trees, cont. Boosting Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 1 st, 2007 1 A Decision Stump 2 1 The final tree 3 Basic Decision Tree Building Summarized

More information

Symmetry Arguments and the Role They Play in Using Gauss Law

Symmetry Arguments and the Role They Play in Using Gauss Law Smmetr Arguments and the Role The la in Using Gauss Law K. M. Westerberg (9/2005) Smmetr plas a ver important role in science in general, and phsics in particular. Arguments based on smmetr can often simplif

More information

Representation of Functions by Power Series. Geometric Power Series

Representation of Functions by Power Series. Geometric Power Series 60_0909.qd //0 :09 PM Page 669 SECTION 9.9 Representation of Functions b Power Series 669 The Granger Collection Section 9.9 JOSEPH FOURIER (768 80) Some of the earl work in representing functions b power

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

4.2 Mean Value Theorem Calculus

4.2 Mean Value Theorem Calculus 4. MEAN VALUE THEOREM The Mean Value Theorem is considered b some to be the most important theorem in all of calculus. It is used to prove man of the theorems in calculus that we use in this course as

More information

The Maze Generation Problem is NP-complete

The Maze Generation Problem is NP-complete The Mae Generation Problem is NP-complete Mario Alviano Department of Mathematics, Universit of Calabria, 87030 Rende (CS), Ital alviano@mat.unical.it Abstract. The Mae Generation problem has been presented

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

LESSON #11 - FORMS OF A LINE COMMON CORE ALGEBRA II

LESSON #11 - FORMS OF A LINE COMMON CORE ALGEBRA II LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS

More information

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Unit 12 Study Notes 1 Systems of Equations

Unit 12 Study Notes 1 Systems of Equations You should learn to: Unit Stud Notes Sstems of Equations. Solve sstems of equations b substitution.. Solve sstems of equations b graphing (calculator). 3. Solve sstems of equations b elimination. 4. Solve

More information

2: Distributions of Several Variables, Error Propagation

2: Distributions of Several Variables, Error Propagation : Distributions of Several Variables, Error Propagation Distribution of several variables. variables The joint probabilit distribution function of two variables and can be genericall written f(, with the

More information

14.1 Systems of Linear Equations in Two Variables

14.1 Systems of Linear Equations in Two Variables 86 Chapter 1 Sstems of Equations and Matrices 1.1 Sstems of Linear Equations in Two Variables Use the method of substitution to solve sstems of equations in two variables. Use the method of elimination

More information

Machine Learning. Lecture 2: Linear regression. Feng Li. https://funglee.github.io

Machine Learning. Lecture 2: Linear regression. Feng Li. https://funglee.github.io Machine Learning Lecture 2: Linear regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2017 Supervised Learning Regression: Predict

More information

4 Bias-Variance for Ridge Regression (24 points)

4 Bias-Variance for Ridge Regression (24 points) Implement Ridge Regression with λ = 0.00001. Plot the Squared Euclidean test error for the following values of k (the dimensions you reduce to): k = {0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500,

More information

2.1 Rates of Change and Limits AP Calculus

2.1 Rates of Change and Limits AP Calculus .1 Rates of Change and Limits AP Calculus.1 RATES OF CHANGE AND LIMITS Limits Limits are what separate Calculus from pre calculus. Using a it is also the foundational principle behind the two most important

More information

What is Fuzzy Logic? Fuzzy logic is a tool for embedding human knowledge (experience, expertise, heuristics) Fuzzy Logic

What is Fuzzy Logic? Fuzzy logic is a tool for embedding human knowledge (experience, expertise, heuristics) Fuzzy Logic Fuzz Logic Andrew Kusiak 239 Seamans Center Iowa Cit, IA 52242 527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak (Based on the material provided b Professor V. Kecman) What is Fuzz Logic?

More information

Computer Problems for Taylor Series and Series Convergence

Computer Problems for Taylor Series and Series Convergence Computer Problems for Taylor Series and Series Convergence The two problems below are a set; the first should be done without a computer and the second is a computer-based follow up. 1. The drawing below

More information

LESSON 4.3 GRAPHING INEQUALITIES

LESSON 4.3 GRAPHING INEQUALITIES LESSON.3 GRAPHING INEQUALITIES LESSON.3 GRAPHING INEQUALITIES 9 OVERVIEW Here s what ou ll learn in this lesson: Linear Inequalities a. Ordered pairs as solutions of linear inequalities b. Graphing linear

More information

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging

More information

Functions of One Variable Basics

Functions of One Variable Basics Functions of One Variable Basics Function Notation: = f() Variables and are placeholders for unknowns» We will cover functions of man variables during one of the in-residence QSW lectures. = independent

More information

(MTH5109) GEOMETRY II: KNOTS AND SURFACES LECTURE NOTES. 1. Introduction to Curves and Surfaces

(MTH5109) GEOMETRY II: KNOTS AND SURFACES LECTURE NOTES. 1. Introduction to Curves and Surfaces (MTH509) GEOMETRY II: KNOTS AND SURFACES LECTURE NOTES DR. ARICK SHAO. Introduction to Curves and Surfaces In this module, we are interested in studing the geometr of objects. According to our favourite

More information

MA Game Theory 2005, LSE

MA Game Theory 2005, LSE MA. Game Theor, LSE Problem Set The problems in our third and final homework will serve two purposes. First, the will give ou practice in studing games with incomplete information. Second (unless indicated

More information

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression

CSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html

More information

a. plotting points in Cartesian coordinates (Grade 9 and 10), b. using a graphing calculator such as the TI-83 Graphing Calculator,

a. plotting points in Cartesian coordinates (Grade 9 and 10), b. using a graphing calculator such as the TI-83 Graphing Calculator, GRADE PRE-CALCULUS UNIT C: QUADRATIC FUNCTIONS CLASS NOTES FRAME. After linear functions, = m + b, and their graph the Quadratic Functions are the net most important equation or function. The Quadratic

More information

Laurie s Notes. Overview of Section 3.5

Laurie s Notes. Overview of Section 3.5 Overview of Section.5 Introduction Sstems of linear equations were solved in Algebra using substitution, elimination, and graphing. These same techniques are applied to nonlinear sstems in this lesson.

More information

LESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II

LESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS

More information

EPIPOLAR GEOMETRY WITH MANY DETAILS

EPIPOLAR GEOMETRY WITH MANY DETAILS EPIPOLAR GEOMERY WIH MANY DEAILS hank ou for the slides. he come mostl from the following source. Marc Pollefes U. of North Carolina hree questions: (i) Correspondence geometr: Given an image point in

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the

More information

Ch 5 Alg 2 L2 Note Sheet Key Do Activity 1 on your Ch 5 Activity Sheet.

Ch 5 Alg 2 L2 Note Sheet Key Do Activity 1 on your Ch 5 Activity Sheet. Ch Alg L Note Sheet Ke Do Activit 1 on our Ch Activit Sheet. Chapter : Quadratic Equations and Functions.1 Modeling Data With Quadratic Functions You had three forms for linear equations, ou will have

More information

16.5. Maclaurin and Taylor Series. Introduction. Prerequisites. Learning Outcomes

16.5. Maclaurin and Taylor Series. Introduction. Prerequisites. Learning Outcomes Maclaurin and Talor Series 6.5 Introduction In this Section we eamine how functions ma be epressed in terms of power series. This is an etremel useful wa of epressing a function since (as we shall see)

More information

Rolle s Theorem. THEOREM 3 Rolle s Theorem. x x. then there is at least one number c in (a, b) at which ƒ scd = 0.

Rolle s Theorem. THEOREM 3 Rolle s Theorem. x x. then there is at least one number c in (a, b) at which ƒ scd = 0. 4.2 The Mean Value Theorem 255 4.2 The Mean Value Theorem f '(c) 0 f() We know that constant functions have zero derivatives, ut could there e a complicated function, with man terms, the derivatives of

More information

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18

CSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18 CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H

More information

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1

Linear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1 Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.

More information

8.1 Exponents and Roots

8.1 Exponents and Roots Section 8. Eponents and Roots 75 8. Eponents and Roots Before defining the net famil of functions, the eponential functions, we will need to discuss eponent notation in detail. As we shall see, eponents

More information