Machine Learning Foundations
Machine Learning Foundations (機器學習基石)
Lecture 13: Hazard of Overfitting
Hsuan-Tien Lin (林軒田) htlin@csie.ntu.edu.tw
Department of Computer Science & Information Engineering, National Taiwan University (國立台灣大學資訊工程系)
Hsuan-Tien Lin (NTU CSIE), Machine Learning Foundations, 0/22
Roadmap
1 When Can Machines Learn?
2 Why Can Machines Learn?
3 How Can Machines Learn?
Lecture 12: Nonlinear Transform: nonlinear via nonlinear feature transform Φ plus linear, with the price of model complexity
4 How Can Machines Learn Better?
Lecture 13: Hazard of Overfitting
What is Overfitting?
The Role of Noise and Data Size
Deterministic Noise
Dealing with Overfitting
What is Overfitting?: Bad Generalization
regression for x ∈ R with N = 5 examples
target f(x) = 2nd order polynomial
labels y_n = f(x_n) + very small noise
linear regression in Z-space with Φ = 4th order polynomial
unique solution passing all examples ⇒ E_in(g) = 0, E_out(g) huge
(figure: data, target, and fit)
bad generalization: low E_in, high E_out
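A minimal numeric sketch of this slide's setup (the target, noise level, and grids are illustrative assumptions, not the lecture's exact figures): fit a 4th-order polynomial to N = 5 noisy samples of a 2nd-order target; since 5 coefficients interpolate 5 points, E_in collapses to zero while E_out stays far above it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2nd-order target with very small label noise.
def f(x):
    return 1.0 - 2.0 * x + 3.0 * x ** 2

N = 5
x_train = np.linspace(-1.0, 1.0, N)
y_train = f(x_train) + 0.05 * rng.standard_normal(N)

# 4th-order polynomial: 5 coefficients, so it passes through all 5 examples.
g = np.polynomial.Polynomial.fit(x_train, y_train, deg=4)

E_in = np.mean((g(x_train) - y_train) ** 2)

# Out-of-sample error measured against the noiseless target on a dense grid.
x_test = np.linspace(-1.0, 1.0, 1001)
E_out = np.mean((g(x_test) - f(x_test)) ** 2)
```

`E_in` here is zero up to round-off, exactly the "passing all examples" situation the slide describes, while `E_out` is dominated by the noise that got memorized.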
What is Overfitting?: Bad Generalization and Overfitting
take d_VC = 1126 for learning: bad generalization, (E_out − E_in) large
switch from d_VC = d*_VC to d_VC = 1126: overfitting, E_in goes down but E_out goes up
switch from d_VC = d*_VC to d_VC = 1: underfitting, E_in goes up and E_out goes up
(figure: error versus VC dimension d_VC, with curves for in-sample error, out-of-sample error, and model complexity)
bad generalization: low E_in, high E_out; overfitting: lower E_in, higher E_out
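The curves in that figure can be reproduced in a few lines (target, noise, and sample size are illustrative assumptions): over nested polynomial models, E_in can only go down as the model grows, while E_out need not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of an assumed smooth target.
def f(x):
    return np.sin(np.pi * x)

N = 15
x_train = np.linspace(-1.0, 1.0, N)
y_train = f(x_train) + 0.2 * rng.standard_normal(N)
x_test = np.linspace(-1.0, 1.0, 1001)

degrees = range(1, 11)
E_in, E_out = [], []
for d in degrees:
    g = np.polynomial.Polynomial.fit(x_train, y_train, deg=d)
    E_in.append(np.mean((g(x_train) - y_train) ** 2))
    E_out.append(np.mean((g(x_test) - f(x_test)) ** 2))
```

Because the hypothesis sets are nested, `E_in` is non-increasing in the degree; the best `E_out` sits at some intermediate complexity, not at the largest model.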
What is Overfitting?: Cause of Overfitting: A Driving Analogy
learning ↔ driving
overfit ↔ commit a car accident
use excessive d_VC ↔ drive too fast
noise ↔ bumpy road
limited data size N ↔ limited observations about road condition
next: how do noise and data size affect overfitting?
What is Overfitting?: Fun Time
Based on our discussion, for data of fixed size, which of the following situations is relatively of the lowest risk of overfitting?
1 small noise, fitting from small d_VC to medium d_VC
2 small noise, fitting from small d_VC to large d_VC
3 large noise, fitting from small d_VC to medium d_VC
4 large noise, fitting from small d_VC to large d_VC
Reference Answer: 1
Two causes of overfitting are noise and excessive d_VC. So if both are relatively under control, the risk of overfitting is smaller.
The Role of Noise and Data Size: Case Study (1/2)
10th order target function + noise, versus 50th order target function, noiseless
overfitting from best g_2 ∈ H_2 to best g_10 ∈ H_10?
The Role of Noise and Data Size: Case Study (2/2)
10th order target function + noise, versus 50th order target function, noiseless
(figure: data with 2nd order fit and 10th order fit for each case; in both cases g_10 ∈ H_10 achieves lower E_in than g_2 ∈ H_2 but much higher E_out)
overfitting from g_2 to g_10? both yes!
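A sketch of the noisy half of this comparison (target, noise level, and trial count are illustrative assumptions): with only 11 examples, the 10th-order fit interpolates the noise, so on average it beats g_2 in-sample and loses badly out-of-sample.

```python
import numpy as np

rng = np.random.default_rng(2)

# A hypothetical 10th-order target: a Chebyshev combination with decaying weights.
f = np.polynomial.chebyshev.Chebyshev([1.0 / (k + 1) for k in range(11)])

N, sigma, trials = 11, 0.5, 100      # 11 points: the 10th-order fit interpolates
x_train = np.linspace(-1.0, 1.0, N)
x_test = np.linspace(-1.0, 1.0, 2001)

E_in_2 = E_in_10 = E_out_2 = E_out_10 = 0.0
for _ in range(trials):
    y = f(x_train) + sigma * rng.standard_normal(N)
    g2 = np.polynomial.Polynomial.fit(x_train, y, deg=2)
    g10 = np.polynomial.Polynomial.fit(x_train, y, deg=10)
    E_in_2 += np.mean((g2(x_train) - y) ** 2) / trials
    E_in_10 += np.mean((g10(x_train) - y) ** 2) / trials
    E_out_2 += np.mean((g2(x_test) - f(x_test)) ** 2) / trials
    E_out_10 += np.mean((g10(x_test) - f(x_test)) ** 2) / trials
```

Averaged over trials, `E_in_10 < E_in_2` and `E_out_10 > E_out_2`: overfitting from g_2 to g_10, just as the slide concludes.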
The Role of Noise and Data Size: Irony of Two Learners
learner O (Overfit): picks g_10 ∈ H_10; learner R (Restrict): picks g_2 ∈ H_2, even when both know that the target is 10th order
R gives up the ability to fit, but R wins in E_out by a lot!
philosophy: concession for advantage? :-)
The Role of Noise and Data Size: Learning Curves Revisited
(figure: expected E_in and E_out versus number of data points N, for H_2 and for H_10)
H_10: lower E_out when N is large, but much larger generalization error for small N
gray area: O overfits! (E_in goes down, E_out goes up)
R always wins in E_out if N is small!
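The two learning curves can be estimated by simulation (the linear target, noise level, and the list of sample sizes are all illustrative assumptions): average E_in and E_out of a fit over many noisy datasets of each size N, with E_out measured on freshly noised test labels as in the lecture's error definition.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):
    return 0.5 + 2.0 * x             # an assumed linear target

sigma, trials = 1.0, 500
x_test = np.linspace(-1.0, 1.0, 501)
Ns = [5, 10, 20, 40]
mean_E_in, mean_E_out = [], []
for N in Ns:
    x_train = np.linspace(-1.0, 1.0, N)
    e_in = e_out = 0.0
    for _ in range(trials):
        y = f(x_train) + sigma * rng.standard_normal(N)
        g = np.polynomial.Polynomial.fit(x_train, y, deg=1)
        y_test = f(x_test) + sigma * rng.standard_normal(x_test.size)
        e_in += np.mean((g(x_train) - y) ** 2) / trials
        e_out += np.mean((g(x_test) - y_test) ** 2) / trials
    mean_E_in.append(e_in)
    mean_E_out.append(e_out)
```

The averages reproduce the familiar picture: E_in climbs toward the noise level as N grows, E_out descends toward it, and E_in stays below E_out throughout.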
The Role of Noise and Data Size: The No Noise Case
learner O: picks g_10 ∈ H_10; learner R: picks g_2 ∈ H_2, even when both know that there is no noise
R still wins: is there really no noise?
target complexity acts like noise
The Role of Noise and Data Size: Fun Time
When having limited data, in which of the following cases would learner R perform better than learner O?
1 limited data from a 10th order target function with some noise
2 limited data from a 1126th order target function with no noise
3 limited data from a 1126th order target function with some noise
4 all of the above
Reference Answer: 4
We discussed 1 and 2, but you should be able to generalize :-) that R also wins in the more difficult case of 3.
Deterministic Noise: A Detailed Experiment
y = f(x) + ε, with Gaussian iid noise ε of level σ² and target
f(x) = Σ_{q=0}^{Q_f} α_q x^q,
drawn from some uniform distribution over targets of complexity level Q_f
data size N
goal: overfit level for different (N, σ²) and (N, Q_f)?
Deterministic Noise: The Overfit Measure
g_2 ∈ H_2 versus g_10 ∈ H_10
E_in(g_10) ≤ E_in(g_2) for sure
overfit measure: E_out(g_10) − E_out(g_2)
Deterministic Noise: The Results
(figure: overfit level as color maps; left panel: noise level σ² versus number of data points N at fixed Q_f; right panel: target complexity Q_f versus N at fixed σ²)
ring a bell? :-)
Deterministic Noise: Impact of Noise and Data Size
impact of σ² versus N: stochastic noise; impact of Q_f versus N: deterministic noise
four reasons for serious overfitting:
data size N ↓ ⇒ overfit ↑
stochastic noise ↑ ⇒ overfit ↑
deterministic noise ↑ ⇒ overfit ↑
excessive power ↑ ⇒ overfit ↑
overfitting easily happens
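The stochastic-noise row of this list can be checked directly (the target, N, and noise levels are illustrative assumptions): at fixed N, the average overfit measure E_out(g_10) − E_out(g_2) grows with the noise level σ.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(x):
    return x ** 2        # simple target inside H_2, so any gap is pure overfit

N, trials = 15, 300
x_train = np.linspace(-1.0, 1.0, N)
x_test = np.linspace(-1.0, 1.0, 1001)

def mean_overfit_measure(sigma):
    """Average E_out(g10) - E_out(g2) over many noisy datasets."""
    total = 0.0
    for _ in range(trials):
        y = f(x_train) + sigma * rng.standard_normal(N)
        g2 = np.polynomial.Polynomial.fit(x_train, y, deg=2)
        g10 = np.polynomial.Polynomial.fit(x_train, y, deg=10)
        total += (np.mean((g10(x_test) - f(x_test)) ** 2)
                  - np.mean((g2(x_test) - f(x_test)) ** 2))
    return total / trials

measure_lo = mean_overfit_measure(0.05)
measure_hi = mean_overfit_measure(0.5)
```

Both averages are positive, and the high-noise one is much larger: more stochastic noise, more overfit.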
Deterministic Noise: Deterministic Noise
if f ∉ H: something about f cannot be captured by H
deterministic noise: the difference between the best h* ∈ H and f
acts like stochastic noise (not new to CS: pseudo-random generators)
difference from stochastic noise: depends on H, and fixed for a given x
philosophy: when teaching a kid, perhaps better not to use examples from a complicated target function? :-)
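Deterministic noise can be computed exactly for a toy pair (this example is an illustrative assumption, not from the lecture): take f(x) = sin(πx) under the uniform distribution on [−1, 1] and H = {h(x) = wx}. The best weight is w* = E[x f(x)] / E[x²] = 3/π, and the deterministic-noise power is E[(f(x) − w*x)²] = 1/2 − 3/π².

```python
import numpy as np

# Dense grid as a stand-in for the uniform distribution on [-1, 1].
x = np.linspace(-1.0, 1.0, 100001)
fx = np.sin(np.pi * x)

# Best h*(x) = w* x under squared error: w* = E[x f(x)] / E[x^2] = 3/pi.
w_star = np.mean(x * fx) / np.mean(x ** 2)

# Deterministic-noise level: E[(f(x) - h*(x))^2] = 1/2 - 3/pi^2.
det_noise = np.mean((fx - w_star * x) ** 2)
```

The residual power (about 0.196 here) is "noise" from H's point of view even though nothing random was added: it is exactly what the hypothesis set cannot capture.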
Deterministic Noise: Fun Time
Consider the target function being sin(1126x) for x ∈ [0, 2π]. When x is uniformly sampled from the range, and we use all possible linear hypotheses h(x) = w·x to approximate the target function with respect to the squared error, what is the level of deterministic noise for each x?
1 sin(1126x)
2 −sin(1126x)
3 sin(1126x) + x
4 sin(1126x) − 1126x
Reference Answer: 1
You can try a few different w and convince yourself that the best hypothesis h* is h*(x) = 0. The deterministic noise is the difference between f and h*.
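The reference answer can be verified numerically (a sketch; the grid resolution is an arbitrary choice): over x ∈ [0, 2π], the best linear hypothesis for sin(1126x) has w* ≈ 0, so the deterministic noise at each x is essentially sin(1126x) itself and its average power is about 1/2.

```python
import numpy as np

# Fine grid over [0, 2*pi]; sin(1126 x) completes ~1126 cycles, so sample densely.
x = np.linspace(0.0, 2.0 * np.pi, 1_000_001)
fx = np.sin(1126.0 * x)

# Best w for h(x) = w * x under squared error: w* = E[x f(x)] / E[x^2].
w_star = np.mean(x * fx) / np.mean(x ** 2)

# With h*(x) ~ 0, the pointwise deterministic noise is f(x) - h*(x) ~ sin(1126 x).
residual_power = np.mean((fx - w_star * x) ** 2)
```

The rapid oscillation averages the correlation E[x f(x)] almost to zero, which is why no straight line through the origin can capture any of the target.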
Dealing with Overfitting: Driving Analogy Revisited
learning ↔ driving
overfit ↔ commit a car accident
use excessive d_VC ↔ drive too fast
noise ↔ bumpy road
limited data size N ↔ limited observations about road condition
start from simple model ↔ drive slowly
data cleaning/pruning ↔ use more accurate road information
data hinting ↔ exploit more road information
regularization ↔ put on the brakes
validation ↔ monitor the dashboard
all very practical techniques to combat overfitting
Dealing with Overfitting: Data Cleaning/Pruning
if we detect the outlier '5' at the top, by being too close to examples of another class, too far from examples of its own class, or wrongly classified by the current classifier...
possible action 1: correct the label (data cleaning)
possible action 2: remove the example (data pruning)
possibly helps, but the effect varies
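One illustrative way to implement both actions (the nearest-neighbour detection rule, data, and constants here are assumptions, not the lecture's method): flag an example whose nearest neighbours all disagree with its label, then either relabel it (cleaning) or drop it (pruning).

```python
import numpy as np

x = (np.arange(21) - 10) / 10.0      # 1D inputs on [-1, 1]
y = np.where(x >= 0.0, 1, -1)        # clean labels: sign of x
y_noisy = y.copy()
y_noisy[15] = -y_noisy[15]           # inject one hypothetical mislabeled example

def find_suspects(x, y, k=3):
    """Indices whose k nearest neighbours (excluding self) all disagree."""
    suspects = []
    for i in range(len(x)):
        d = np.abs(x - x[i])
        d[i] = np.inf
        nn = np.argsort(d)[:k]
        if np.all(y[nn] != y[i]):
            suspects.append(i)
    return suspects

suspects = find_suspects(x, y_noisy)

# Data cleaning: correct the suspect labels.
y_cleaned = y_noisy.copy()
y_cleaned[suspects] = -y_cleaned[suspects]

# Data pruning: remove the suspect examples instead.
keep = np.setdiff1d(np.arange(len(x)), suspects)
x_pruned, y_pruned = x[keep], y_noisy[keep]
```

On this toy set the rule recovers exactly the injected outlier; as the slide warns, on real data the effect of either action varies.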
Dealing with Overfitting: Data Hinting
slightly shifted/rotated digits carry the same meaning
possible action: add virtual examples by shifting/rotating the given digits (data hinting)
possibly helps, but watch out: virtual examples are not iid from P(x, y)!
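A minimal sketch of data hinting on image-like arrays (the 5×5 "digit" bitmap is made up for illustration): add shifted copies as virtual examples with the same label, keeping in mind that these copies are no longer iid samples from P(x, y).

```python
import numpy as np

# A made-up 5x5 "digit" bitmap and its label.
digit = np.zeros((5, 5), dtype=int)
digit[1:4, 2] = 1                    # a vertical stroke
label = 1

X = [digit]
y = [label]

# Data hinting: shifted copies carry the same meaning, so add them as
# virtual examples (np.roll shifts the bitmap by one pixel along an axis).
for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
    X.append(np.roll(digit, shift, axis=axis))
    y.append(label)

X = np.stack(X)
y = np.array(y)
```

One example becomes five, all with the same label and the same ink; the hint encodes shift invariance rather than adding genuinely new information.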
Dealing with Overfitting: Fun Time
Assume we know that f(x) is symmetric for some 1D regression application; that is, f(x) = f(−x). One possibility for using this knowledge is to consider symmetric hypotheses only. On the other hand, you can also generate virtual examples from the original data {(x_n, y_n)} as hints. Which virtual examples suit our needs best?
1 {(x_n, y_n)}
2 {(x_n, −y_n)}
3 {(−x_n, y_n)}
4 {(2x_n, 2y_n)}
Reference Answer: 3
We want the virtual examples to encode the invariance when x → −x.
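The answer can be sketched concretely (toy setup; the target and noise are illustrative assumptions): augmenting {(x_n, y_n)} with {(−x_n, y_n)} makes the design exactly mirror-symmetric, so a least-squares polynomial fit on the augmented data has vanishing odd coefficients, i.e. it is itself a symmetric hypothesis.

```python
import numpy as np

rng = np.random.default_rng(5)

# Noisy samples of an assumed symmetric target f(x) = f(-x).
def f(x):
    return x ** 2

x_n = rng.uniform(-1.0, 1.0, size=20)
y_n = f(x_n) + 0.1 * rng.standard_normal(20)

# Virtual examples encoding the x -> -x invariance: {(-x_n, y_n)}.
x_aug = np.concatenate([x_n, -x_n])
y_aug = np.concatenate([y_n, y_n])

# A cubic fit on the augmented data: the odd coefficients vanish
# (up to round-off), because the paired data cancel them exactly.
g = np.polynomial.Polynomial.fit(x_aug, y_aug, deg=3)
```

This is the hint doing its job: the invariance is enforced through the data rather than by restricting the hypothesis set by hand.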
Dealing with Overfitting: Summary
1 When Can Machines Learn?
2 Why Can Machines Learn?
3 How Can Machines Learn?
Lecture 12: Nonlinear Transform
4 How Can Machines Learn Better?
Lecture 13: Hazard of Overfitting
What is Overfitting? lower E_in but higher E_out
The Role of Noise and Data Size: overfitting easily happens!
Deterministic Noise: what H cannot capture acts like noise
Dealing with Overfitting: data cleaning/pruning/hinting, and more
next: putting on the brakes with regularization
More informationWhat is Fuzzy Logic? Fuzzy logic is a tool for embedding human knowledge (experience, expertise, heuristics) Fuzzy Logic
Fuzz Logic Andrew Kusiak 239 Seamans Center Iowa Cit, IA 52242 527 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak (Based on the material provided b Professor V. Kecman) What is Fuzz Logic?
More informationComputer Problems for Taylor Series and Series Convergence
Computer Problems for Taylor Series and Series Convergence The two problems below are a set; the first should be done without a computer and the second is a computer-based follow up. 1. The drawing below
More informationLESSON 4.3 GRAPHING INEQUALITIES
LESSON.3 GRAPHING INEQUALITIES LESSON.3 GRAPHING INEQUALITIES 9 OVERVIEW Here s what ou ll learn in this lesson: Linear Inequalities a. Ordered pairs as solutions of linear inequalities b. Graphing linear
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging
More informationFunctions of One Variable Basics
Functions of One Variable Basics Function Notation: = f() Variables and are placeholders for unknowns» We will cover functions of man variables during one of the in-residence QSW lectures. = independent
More information(MTH5109) GEOMETRY II: KNOTS AND SURFACES LECTURE NOTES. 1. Introduction to Curves and Surfaces
(MTH509) GEOMETRY II: KNOTS AND SURFACES LECTURE NOTES DR. ARICK SHAO. Introduction to Curves and Surfaces In this module, we are interested in studing the geometr of objects. According to our favourite
More informationMA Game Theory 2005, LSE
MA. Game Theor, LSE Problem Set The problems in our third and final homework will serve two purposes. First, the will give ou practice in studing games with incomplete information. Second (unless indicated
More informationCSC2515 Winter 2015 Introduction to Machine Learning. Lecture 2: Linear regression
CSC2515 Winter 2015 Introduction to Machine Learning Lecture 2: Linear regression All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/csc2515_winter15.html
More informationa. plotting points in Cartesian coordinates (Grade 9 and 10), b. using a graphing calculator such as the TI-83 Graphing Calculator,
GRADE PRE-CALCULUS UNIT C: QUADRATIC FUNCTIONS CLASS NOTES FRAME. After linear functions, = m + b, and their graph the Quadratic Functions are the net most important equation or function. The Quadratic
More informationLaurie s Notes. Overview of Section 3.5
Overview of Section.5 Introduction Sstems of linear equations were solved in Algebra using substitution, elimination, and graphing. These same techniques are applied to nonlinear sstems in this lesson.
More informationLESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II
LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS
More informationEPIPOLAR GEOMETRY WITH MANY DETAILS
EPIPOLAR GEOMERY WIH MANY DEAILS hank ou for the slides. he come mostl from the following source. Marc Pollefes U. of North Carolina hree questions: (i) Correspondence geometr: Given an image point in
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Prediction Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict the
More informationCh 5 Alg 2 L2 Note Sheet Key Do Activity 1 on your Ch 5 Activity Sheet.
Ch Alg L Note Sheet Ke Do Activit 1 on our Ch Activit Sheet. Chapter : Quadratic Equations and Functions.1 Modeling Data With Quadratic Functions You had three forms for linear equations, ou will have
More information16.5. Maclaurin and Taylor Series. Introduction. Prerequisites. Learning Outcomes
Maclaurin and Talor Series 6.5 Introduction In this Section we eamine how functions ma be epressed in terms of power series. This is an etremel useful wa of epressing a function since (as we shall see)
More informationRolle s Theorem. THEOREM 3 Rolle s Theorem. x x. then there is at least one number c in (a, b) at which ƒ scd = 0.
4.2 The Mean Value Theorem 255 4.2 The Mean Value Theorem f '(c) 0 f() We know that constant functions have zero derivatives, ut could there e a complicated function, with man terms, the derivatives of
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationLinear Regression. Machine Learning CSE546 Kevin Jamieson University of Washington. Oct 5, Kevin Jamieson 1
Linear Regression Machine Learning CSE546 Kevin Jamieson University of Washington Oct 5, 2017 1 The regression problem Given past sales data on zillow.com, predict: y = House sale price from x = {# sq.
More information8.1 Exponents and Roots
Section 8. Eponents and Roots 75 8. Eponents and Roots Before defining the net famil of functions, the eponential functions, we will need to discuss eponent notation in detail. As we shall see, eponents
More information