Regularized Regression


David M. Blei, Columbia University, December 5, 2015

Modern regression problems are high dimensional, which means that the number of covariates $p$ is large. In practice statisticians regularize their models, veering away from the MLE solution to one where the coefficients have smaller magnitude. This lecture is about regularization. It draws on the ideas and treatment in Hastie et al. (2009) (referred to below as ESL).

1 The bias-variance trade-off

We first discuss an important concept, the bias-variance trade-off. In this discussion we will take a frequentist perspective.

Consider a set of random responses drawn from a linear regression with true parameter $\beta$,

\[ Y_n \mid x_n, \beta \sim \mathcal{N}(\beta x_n, \sigma^2). \tag{1} \]

The data are $D = \{(x_n, Y_n)\}$. Note that we are holding the covariates $x_n$ fixed; only the responses are random. (We are also assuming $x_n$ is a single covariate; in general, it is $p$-dimensional and we replace $\beta x_n$ with $\beta^\top x_n$.)

With this data set, the maximum likelihood estimate $\hat\beta(D)$ is a random variable whose distribution is governed by the distribution of the data. Recall that $\beta$ is the true parameter that generated the responses. How close do we expect $\hat\beta(D)$ to be to $\beta$?

We can answer this question in a couple of ways. First, suppose we observe a new data input $x$. We consider the mean squared error of our estimate of the expected response, $E_{\hat\beta}[y \mid x] = \hat\beta x$. This is the expected squared difference between our predicted expectation of the response and the true expectation of the response,

\[ \mathrm{MSE} = E_D\left[ \left( \hat\beta(D)^\top x - \beta^\top x \right)^2 \right]. \tag{2} \]

It is important to keep track of which variables are random. The coefficient $\beta$ is not random; it is the true parameter that generated the data. The coefficient $\hat\beta(D)$ is random; it depends on the randomly generated data set $D$. The expectation in this equation is with respect to the randomly generated data set. (For simplicity, we will sometimes suppress this notation below.)

The MSE decomposes in an interesting way,

\[
\begin{aligned}
\mathrm{MSE} &= E\left[(\hat\beta x)^2\right] - 2 E\left[\hat\beta x\right] \beta x + (\beta x)^2 \\
&= E\left[(\hat\beta x)^2\right] - 2 E\left[\hat\beta x\right] \beta x + (\beta x)^2 + \left(E\left[\hat\beta x\right]\right)^2 - \left(E\left[\hat\beta x\right]\right)^2 \\
&= \left( E\left[(\hat\beta x)^2\right] - \left(E\left[\hat\beta x\right]\right)^2 \right) + \left( E\left[\hat\beta x\right] - \beta x \right)^2.
\end{aligned}
\tag{3}
\]

The second term is the squared bias,

\[ \text{bias} = E\left[\hat\beta x\right] - \beta x. \tag{4} \]

An estimate for which this term is zero is an unbiased estimate. The first term is the variance,

\[ \text{variance} = E\left[(\hat\beta x)^2\right] - \left(E\left[\hat\beta x\right]\right)^2. \tag{5} \]

This reflects the spread of the estimates we might find on account of the randomness inherent in the data. Note that the decomposition holds for any linear function of the coefficients.

A famous result in statistics is the Gauss-Markov theorem. Recall that the MLE $\hat\beta$ is an unbiased estimate. The theorem states that, among linear unbiased estimates, the MLE has the smallest variance. If you insist on unbiasedness, and you care about the MSE, then you can do no better than the MLE.

Often we care about expected prediction error. Suppose we observe a new input $x$. How wrong will we be on average when we predict the true $y \mid x$ with $E_{\hat\beta}[y \mid x]$ from a fitted regression? The expected squared prediction error is

\[ E_D\left[ E_Y\left[ (\hat\beta x - Y)^2 \right] \right]. \]

The first expectation is taken for the randomness of $\hat\beta$, which is a function of the data. The second is taken for the randomness of $Y$ given $x$, which comes from the true model. This decomposes as follows,

\[
\begin{aligned}
E_D\left[ E_Y\left[ (\hat\beta x - Y)^2 \right] \right] &= \mathrm{Var}(Y) + \mathrm{MSE}(\hat\beta x) && (6) \\
&= \sigma^2 + \mathrm{Bias}^2(\hat\beta x) + \mathrm{Var}(\hat\beta x). && (7)
\end{aligned}
\]

The first term is the inherent uncertainty around the true mean; the second two terms are the bias-variance decomposition of the estimator. We cannot do anything about the inherent uncertainty; thus reducing the MSE also reduces expected prediction error.

Classical statistics cared only about unbiased estimators. Modern statistics has explored the trade-off, where it may be worth accepting some bias for a reduction in variance. This can reduce the MSE and, consequently, the expected prediction error on future data. Here is a simple picture to illustrate why:

[Figure not reproduced; its axis is labeled $\hat\beta$.]

It may be that the MSE is smaller for the biased estimator, because it never veers as far away from the truth as the unbiased estimator does.

2 Ridge regression

Regularization. In regression, we can make this trade-off with regularization, which means placing constraints on the coefficients $\beta$. Here is a picture from ESL for our first example.
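Before the picture, a quick numerical check of the trade-off from Section 1 may help. The sketch below simulates many data sets with the covariates held fixed, computes the MLE and a shrunken estimate on each, and estimates bias, variance, and MSE of the prediction at a new input. All concrete values (true coefficient 0.5, noise standard deviation 2, ten covariate values, shrinkage strength 1) and the function name `simulate_bias_variance` are illustrative assumptions, not taken from the lecture. The shrunken estimate uses the single-covariate minimizer of one half the RSS plus `lam` times the squared coefficient, which is the ridge form introduced later in this section.

```python
import numpy as np


def simulate_bias_variance(beta=0.5, sigma=2.0, lam=1.0,
                           n_datasets=5000, x_new=1.0, seed=0):
    """Monte Carlo estimate of bias^2, variance and MSE at a new input."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 10)      # covariates are held fixed
    sxx = np.sum(x ** 2)

    # Each row of y is one randomly generated data set of responses.
    y = beta * x + sigma * rng.standard_normal((n_datasets, x.size))
    sxy = y @ x

    estimates = {
        "mle": sxy / sxx,                   # unbiased
        "shrunk": sxy / (sxx + 2.0 * lam),  # biased toward zero
    }

    truth = beta * x_new
    summary = {}
    for name, beta_hat in estimates.items():
        pred = beta_hat * x_new
        summary[name] = {
            "bias_sq": (pred.mean() - truth) ** 2,
            "variance": pred.var(),
            "mse": np.mean((pred - truth) ** 2),
        }
    return summary


results = simulate_bias_variance()
for name, parts in results.items():
    print(name, {k: round(float(v), 4) for k, v in parts.items()})
```

Under these assumed settings the shrunken estimator has nonzero bias but a smaller variance, and for each estimator the simulated MSE equals the simulated squared bias plus variance, mirroring equation (3).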

[Figure 3.12 from ESL (Hastie, Tibshirani & Friedman 2001, Chapter 3): Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ and $\beta_1^2 + \beta_2^2 \le t^2$, respectively, while the red ellipses are the contours of the least squares error function.]

In this picture, contours represent values of $\beta$ with equal RSS (or, equivalently, likelihood). Our procedure finds the best value that is within the blue circle.

This reduces the variance because it limits the space that the parameter vector $\beta$ can live in. If the true MLE of $\beta$ lives outside that space, then the resulting estimate must be biased because of the Gauss-Markov theorem.

The picture also shows how regularization encourages smaller and perhaps simpler models. Simpler models are more robust to overfitting, that is, generalizing poorly because of too close a match to the training data. Simpler models can also be more interpretable, which is another goal of regression. (This is particularly true for the lasso, which we will talk about later.)

Ridge regression. Let's discuss the details of ridge regression. We optimize the RSS subject to a constraint on the sum of squares of the coefficients,

\[
\begin{aligned}
\text{minimize} \quad & \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta x_n)^2 \\
\text{subject to} \quad & \sum_{i=1}^{p} \beta_i^2 \le s.
\end{aligned}
\tag{8}
\]

This constrains the coefficients to live within a sphere of radius $\sqrt{s}$. (See the picture.)

Question: What happens as the radius increases? Answer: Variance goes up; bias goes down.

With some calculus, the ridge regression estimate can also be expressed as

\[ \hat\beta^{\text{ridge}} = \arg\min_\beta \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta x_n)^2 + \lambda \sum_{i=1}^{p} \beta_i^2. \tag{9} \]

This is nice because the problem is convex. Further, it has an analytic solution. (See the reading.)

Question: Is it sensitive to scaling? Answer: Yes, in practice we center and scale the covariates.

There is a 1-1 mapping between $s$ and the complexity parameter $\lambda$. Either of these parameters trades off an increase in bias for a decrease in variance. From ESL:

[Figure: coefficient profiles; axes labeled "Coefficients" and "L1 Norm".]

How do we choose $\lambda$? As we see, the value of the complexity parameter affects our estimate.

Question: What would happen if we used training error as the criterion? (Look at the picture to see the answer.)

In practice, we choose $\lambda$ by cross validation. This is an attempt to minimize expected test error. (But later on we will discuss hierarchical models. This can be another way to choose the regularization parameter.)

Here is how it works:

- Divide the data into $K$ folds (e.g., $K = 10$).
- Decide on candidate values of $\lambda$ (e.g., a grid between 0 and 1).
- For each fold $k$ and value of $\lambda$,
  - Estimate $\hat\beta^{\text{ridge}}_{\lambda,k}$ on the out-of-fold samples.
  - For each $x_n$ assigned to fold $k$, compute its squared error

\[ \epsilon_n = (\hat y_n - y_n)^2, \tag{10} \]

    where $\hat y_n = E_{\hat\beta^{\text{ridge}}_{\lambda,k}}[Y \mid x_n]$. Note that this estimate of the coefficients did not use $(x_n, y_n)$ as part of its training data.
- We now aggregate the individual errors. The score for $\lambda$ is

\[ \mathrm{MSE}(\lambda) = \frac{1}{N} \sum_{n=1}^{N} \epsilon_n. \tag{11} \]

  This is an estimate of the test error.
- Choose the $\lambda$ that minimizes this score.

Aside: Connection to Bayesian statistics. We have motivated regularized regression via frequentist thinking, i.e., the bias-variance trade-off and an appeal to the true model. Regularized regression, in general, has connections to Bayesian modeling.

We have discussed two common ways of using the posterior to obtain an estimate. The first is maximum a posteriori (MAP) estimation,

\[ \hat\beta^{\text{MAP}} = \arg\max_\beta p(\beta \mid y_1, \ldots, y_N, \lambda). \tag{12} \]

The second is the posterior mean,

\[ \hat\beta^{\text{mean}} = E[\beta \mid y_1, \ldots, y_N, \lambda]. \tag{13} \]

Question: How are these different from the MLE?

Ridge regression and Bayesian methods. Ridge regression corresponds to MAP estimation in the following model:

\[
\begin{aligned}
\beta_i &\sim \mathcal{N}(0, 1/\lambda) && (14) \\
y_n \mid x_n, \beta &\sim \mathcal{N}(\beta^\top x_n, \sigma^2) && (15)
\end{aligned}
\]

Here is the corresponding graphical model:

[Graphical model with nodes $X_n$, $Y_n$, $\beta$, and $\lambda$, and a plate over $N$. This isn't quite right; $\lambda$ should be a small dot.]
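The ridge estimate and the cross-validation recipe above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lecture's own code: the data are synthetic and already centered, there is no intercept, and the names `ridge_fit` and `cv_mse` are hypothetical. Because objective (9) carries a factor of one half on the RSS, setting its gradient to zero gives the solution $(X^\top X + 2\lambda I)^{-1} X^\top y$; other references fold the factor of 2 into $\lambda$.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||^2 in closed form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + 2.0 * lam * np.eye(p), X.T @ y)


def cv_mse(X, y, lambdas, n_folds=10, seed=0):
    """K-fold cross-validation score MSE(lambda) for each candidate."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    fold_of = rng.permutation(n) % n_folds   # balanced random fold labels
    scores = []
    for lam in lambdas:
        sq_err = np.empty(n)
        for k in range(n_folds):
            held_out = fold_of == k
            # Fit on the out-of-fold samples only.
            beta_k = ridge_fit(X[~held_out], y[~held_out], lam)
            # Squared error for each point assigned to fold k.
            sq_err[held_out] = (X[held_out] @ beta_k - y[held_out]) ** 2
        scores.append(sq_err.mean())
    return np.array(scores)


# Synthetic example (assumed values): 60 points, 20 covariates, 3 relevant.
rng = np.random.default_rng(1)
N, p = 60, 20
X = rng.standard_normal((N, p))
beta_true = np.concatenate([np.array([2.0, -1.0, 0.5]), np.zeros(p - 3)])
y = X @ beta_true + rng.standard_normal(N)

lambdas = np.linspace(0.0, 1.0, 11)
scores = cv_mse(X, y, lambdas)
best_lam = lambdas[np.argmin(scores)]
print("CV score per lambda:", np.round(scores, 3))
print("chosen lambda:", best_lam)
```

Each data point contributes exactly one squared error per candidate value, computed from coefficients that never saw that point, which is what makes the averaged score an estimate of test error.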

We will derive the relationship. First, note that

\[ p(\beta_i \mid \lambda) = \frac{1}{\sqrt{2\pi (1/\lambda)}} \exp\left\{ -\tfrac{\lambda}{2} \beta_i^2 \right\}. \tag{16} \]

We now compute the MAP estimate of $\beta$,

\begin{align}
\arg\max_\beta \, p(\beta \mid D, \lambda) &= \arg\max_\beta \, \log p(\beta \mid y_{1:N}, x_{1:N}, \lambda) \tag{17} \\
&= \arg\max_\beta \, \log p(\beta, y_{1:N} \mid x_{1:N}, \lambda) \tag{18} \\
&= \arg\max_\beta \, \log \left( p(y_{1:N} \mid x_{1:N}, \beta) \prod_{i=1}^{p} p(\beta_i \mid \lambda) \right) \tag{19} \\
&= \arg\max_\beta \, -\frac{1}{\sigma^2} \mathrm{RSS}(\beta; D) - \frac{\lambda}{2} \sum_{i=1}^{p} \beta_i^2. \tag{20}
\end{align}

Here $\mathrm{RSS}(\beta; D) = \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta^\top x_n)^2$, the objective minimized in ridge regression. Multiplying (20) by $\sigma^2$ shows that it is the negative of the penalized ridge objective with complexity parameter $\lambda \sigma^2 / 2$. Ridge regression is equivalent to MAP estimation in the model.

Observe that the hyperparameter $\lambda$ controls how far away the estimate will be from the MLE. A small hyperparameter (large prior variance) will essentially choose the MLE; the data almost totally determine the estimate. As the hyperparameter gets larger, the estimate moves further from the MLE; the prior ($E[\beta] = 0$) becomes more influential. This matches our recurring theme in Bayesian estimation: both the data and the prior influence the answer.

Finally, note that a true Bayesian would not set the hyperparameter by cross-validation. This uses the data to set the prior. However, I think it is a good idea. It is an instance of a more general principle called Empirical Bayes.

Summary of ridge regression.

1. We constrain $\beta$ to be in a hypersphere around 0.
2. This is equivalent to minimizing the RSS plus a regularization term.
3. We no longer find the $\hat\beta$ that minimizes the RSS. (Contours illustrate constant RSS.)
4. Ridge regression is a kind of shrinkage, so called because it reduces the components to be close to 0 and close to each other.
5. Ridge estimates trade off bias for variance.

3 The lasso

A closely related regularization method is called the lasso. The lasso optimizes the RSS subject to a different constraint,

\[
\begin{aligned}
\text{minimize} \quad & \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta x_n)^2 \\
\text{subject to} \quad & \sum_{i=1}^{p} |\beta_i| \le s.
\end{aligned}
\tag{21}
\]

This small change yields very different estimates. Here is the picture of the constraint, from ESL:

[Figure 3.12 from ESL (Hastie, Tibshirani & Friedman 2001, Chapter 3): Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ and $\beta_1^2 + \beta_2^2 \le t^2$, respectively, while the red ellipses are the contours of the least squares error function.]

Question: What happens as $s$ increases?

Question: Where is the solution going to lie with $s$ fixed?

It's a fact: unless it chooses $\hat\beta$ (the MLE), the lasso (with $p$ large) will set some of the coefficients to exactly zero.

The intuitions come from ESL: "Unlike the disk, the diamond has corners; if the solution occurs at a corner, then it has one parameter $\beta_j$ equal to zero. When $p > 2$, the diamond becomes a rhomboid, and has many corners, flat edges and faces; there are many more opportunities for the estimated parameters to be zero." (p. 90).

In a sense, the lasso is a form of feature selection, identifying a relevant subset of the covariates with which to predict. Like ridge regression, it trades off an increase in bias for a decrease in variance. Further, by zeroing out some of the covariates, it provides interpretable (as in, sparse) models.
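The corner intuition can be checked numerically in one special case. Assume, purely for illustration, an orthonormal design ($X^\top X = I$), so that the RSS equals one half the squared distance between $\beta$ and the MLE, plus a constant. The constrained lasso and ridge solutions are then the Euclidean projections of the MLE onto the diamond and the disk. The sketch below uses a standard sort-based projection onto the L1 ball; the function names and the example vector are assumptions, not from the lecture.

```python
import numpy as np


def project_l1_ball(b, s):
    """Euclidean projection of b onto {beta : sum |beta_i| <= s}."""
    if np.abs(b).sum() <= s:
        return b.copy()
    u = np.sort(np.abs(b))[::-1]            # magnitudes, largest first
    css = np.cumsum(u)
    k = np.arange(1, u.size + 1)
    # Largest index at which the shifted magnitude stays positive.
    rho = np.nonzero(u * k > css - s)[0][-1]
    theta = (css[rho] - s) / (rho + 1.0)
    # Soft-threshold every coordinate by theta.
    return np.sign(b) * np.maximum(np.abs(b) - theta, 0.0)


def project_l2_ball(b, s):
    """Euclidean projection of b onto {beta : sum beta_i^2 <= s}."""
    sq_norm = np.sum(b ** 2)
    if sq_norm <= s:
        return b.copy()
    return b * np.sqrt(s / sq_norm)


# Hypothetical MLE under an orthonormal design.
beta_mle = np.array([3.0, 1.0, -0.5, 0.2])

lasso_sol = project_l1_ball(beta_mle, 2.0)
ridge_sol = project_l2_ball(beta_mle, 4.0)

print("lasso:", lasso_sol, "zeros:", int(np.sum(lasso_sol == 0)))
print("ridge:", ridge_sol, "zeros:", int(np.sum(ridge_sol == 0)))
```

In this example the projection onto the diamond lands on a corner and zeroes three of the four coefficients, while the projection onto the disk only rescales them. With a general design the contours are ellipses rather than circles, so this is an illustration of the geometry rather than a general solver.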

Sparse models can also be important in real systems that might depend on many inputs. Once the sparse solution is found, we need only measure a few of the inputs in order to make predictions. This speeds up the performance of the system.

The lasso is equivalent to

\[ \hat\beta^{\text{lasso}} = \arg\min_\beta \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta x_n)^2 + \lambda \sum_{i=1}^{p} |\beta_i|. \tag{22} \]

Again, there is a 1-1 mapping between $\lambda$ and $s$. This objective, though it does not have an analytic solution, is still convex.

Why is the lasso exciting? Prior to the lasso, the only sparse method was subset selection, finding the best subset of features with which to model the data. But subset selection has problems: searching over all subsets (of a fixed size) is computationally expensive. In contrast, the lasso efficiently finds a sparse solution by using convex optimization. In a sense, it is akin to a smooth version of subset selection. Note the lasso won't consider all possible subsets. From ESL:

[Figure: lasso coefficient profiles; axes labeled "Coefficients" and "L1 Norm".]
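Although objective (22) has no closed-form solution, its convexity makes simple iterative schemes work. One common choice, shown here as a sketch rather than as the method the lecture has in mind, is cyclic coordinate descent: each coefficient is updated in turn by soft-thresholding its correlation with the partial residual. The data are synthetic, and the names `soft_threshold`, `lasso_coordinate_descent`, and `lasso_objective`, the penalty value 20, and the fixed number of sweeps are all illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, clipping at exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_objective(X, y, beta, lam):
    """0.5 * ||y - X beta||^2 + lam * ||beta||_1, as in (22)."""
    return 0.5 * np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for the lasso objective."""
    p = X.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    residual = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes j.
            rho = X[:, j] @ residual + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


# Synthetic example (assumed values): 60 points, 20 covariates, 3 relevant.
rng = np.random.default_rng(1)
N, p = 60, 20
X = rng.standard_normal((N, p))
beta_true = np.concatenate([np.array([2.0, -1.0, 0.5]), np.zeros(p - 3)])
y = X @ beta_true + rng.standard_normal(N)

lam = 20.0
beta_hat = lasso_coordinate_descent(X, y, lam)
print("nonzero coefficients:", np.flatnonzero(beta_hat))
print("objective:", round(float(lasso_objective(X, y, beta_hat, lam)), 3))
```

Because the soft-threshold returns exactly zero whenever the correlation is smaller than the penalty in magnitude, the fitted vector contains exact zeros rather than merely small values, which is the sparsity discussed above.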

The Bayesian interpretation of the lasso. Like ridge regression, lasso regression corresponds to MAP estimation in a Bayesian model. For the lasso, the model is:

\[
\begin{aligned}
\beta_i &\sim \text{Laplace}(\lambda) && (23) \\
Y_n \mid x_n, \beta &\sim \mathcal{N}(\beta^\top x_n, \sigma^2). && (24)
\end{aligned}
\]

Here the coefficients come from a Laplace distribution,

\[ p(\beta_i \mid \lambda) = \frac{\lambda}{2} \exp\{ -\lambda |\beta_i| \}. \tag{25} \]

The lasso, and the general idea of L1 penalized models, has become a cottage industry in modern statistics and machine learning. The reason is that we often want sparse solutions to high-dimensional problems, and we want convex objective functions when analyzing data. L1 penalized methods give us both. Recent research indicates that they have good theoretical properties to boot.

4 (Optional) Generalized regularization

In general, regularization can be seen as minimizing the RSS with a constraint on a q-norm,

\[
\begin{aligned}
\text{minimize} \quad & \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta x_n)^2 \\
\text{subject to} \quad & \|\beta\|_q \le s,
\end{aligned}
\]

where the penalty is

\[ \|\beta\|_q = \left( \sum_{i=1}^{p} |\beta_i|^q \right)^{1/q}. \]

The methods we discussed so far are

- $q = 2$: ridge regression
- $q = 1$: lasso
- $q = 0$: subset selection

Here is the picture from ESL:

[Figure 3.13 from ESL: Contours of constant value of $\sum_j |\beta_j|^q$ for given values of $q$. Panels: $q = 4$, $q = 2$, $q = 1$, $q = 0.5$, $q = 0.1$.]

This brings us away from the minimum RSS solution, but might provide better test prediction via the bias-variance trade-off. Complex models have less bias; simpler models have less variance. Regularization encourages simpler models.

Note that each of these methods corresponds to a Bayesian solution with a different choice of prior. In penalized form, the estimate is

\[ \hat\beta = \arg\min_\beta \sum_{n=1}^{N} \tfrac{1}{2} (y_n - \beta x_n)^2 + \lambda \|\beta\|_q. \]

The complexity parameter $\lambda$ can be chosen with cross validation.

Lasso ($q = 1$) is the only norm that provides both sparsity and convexity.

And there are other variants, useful in the literature. Of note:

- The elastic net is a convex combination of L1 and L2.
- The grouped lasso finds sparse groups of covariates to include.

Finally, the glmnet package in R is amazing. It efficiently computes models for a regularization path using L2 or L1 penalization. It uses the same model syntax as lm or glm.

References

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, 2nd edition.


Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximatinga function fx, wose values at a set of distinct points x, x, x,, x n are known, by a polynomial P x suc

More information

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist Mat 1120 Calculus Test 2. October 18, 2001 Your name Te multiple coice problems count 4 points eac. In te multiple coice section, circle te correct coice (or coices). You must sow your work on te oter

More information

How to Find the Derivative of a Function: Calculus 1

How to Find the Derivative of a Function: Calculus 1 Introduction How to Find te Derivative of a Function: Calculus 1 Calculus is not an easy matematics course Te fact tat you ave enrolled in suc a difficult subject indicates tat you are interested in te

More information

Solve exponential equations in one variable using a variety of strategies. LEARN ABOUT the Math. What is the half-life of radon?

Solve exponential equations in one variable using a variety of strategies. LEARN ABOUT the Math. What is the half-life of radon? 8.5 Solving Exponential Equations GOAL Solve exponential equations in one variable using a variety of strategies. LEARN ABOUT te Mat All radioactive substances decrease in mass over time. Jamie works in

More information

Spike train entropy-rate estimation using hierarchical Dirichlet process priors

Spike train entropy-rate estimation using hierarchical Dirichlet process priors publised in: Advances in Neural Information Processing Systems 26 (23), 276 284. Spike train entropy-rate estimation using ierarcical Diriclet process priors Karin Knudson Department of Matematics kknudson@mat.utexas.edu

More information

Recall from our discussion of continuity in lecture a function is continuous at a point x = a if and only if

Recall from our discussion of continuity in lecture a function is continuous at a point x = a if and only if Computational Aspects of its. Keeping te simple simple. Recall by elementary functions we mean :Polynomials (including linear and quadratic equations) Eponentials Logaritms Trig Functions Rational Functions

More information

Chapter 5 FINITE DIFFERENCE METHOD (FDM)

Chapter 5 FINITE DIFFERENCE METHOD (FDM) MEE7 Computer Modeling Tecniques in Engineering Capter 5 FINITE DIFFERENCE METHOD (FDM) 5. Introduction to FDM Te finite difference tecniques are based upon approximations wic permit replacing differential

More information

1watt=1W=1kg m 2 /s 3

1watt=1W=1kg m 2 /s 3 Appendix A Matematics Appendix A.1 Units To measure a pysical quantity, you need a standard. Eac pysical quantity as certain units. A unit is just a standard we use to compare, e.g. a ruler. In tis laboratory

More information

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x)

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x) Calculus. Gradients and te Derivative Q f(x+) δy P T δx R f(x) 0 x x+ Let P (x, f(x)) and Q(x+, f(x+)) denote two points on te curve of te function y = f(x) and let R denote te point of intersection of

More information

Introduction to Machine Learning. Recitation 8. w 2, b 2. w 1, b 1. z 0 z 1. The function we want to minimize is the loss over all examples: f =

Introduction to Machine Learning. Recitation 8. w 2, b 2. w 1, b 1. z 0 z 1. The function we want to minimize is the loss over all examples: f = Introduction to Macine Learning Lecturer: Regev Scweiger Recitation 8 Fall Semester Scribe: Regev Scweiger 8.1 Backpropagation We will develop and review te backpropagation algoritm for neural networks.

More information

. If lim. x 2 x 1. f(x+h) f(x)

. If lim. x 2 x 1. f(x+h) f(x) Review of Differential Calculus Wen te value of one variable y is uniquely determined by te value of anoter variable x, ten te relationsip between x and y is described by a function f tat assigns a value

More information

The Priestley-Chao Estimator

The Priestley-Chao Estimator Te Priestley-Cao Estimator In tis section we will consider te Pristley-Cao estimator of te unknown regression function. It is assumed tat we ave a sample of observations (Y i, x i ), i = 1,..., n wic are

More information

Math 312 Lecture Notes Modeling

Math 312 Lecture Notes Modeling Mat 3 Lecture Notes Modeling Warren Weckesser Department of Matematics Colgate University 5 7 January 006 Classifying Matematical Models An Example We consider te following scenario. During a storm, a

More information

CS522 - Partial Di erential Equations

CS522 - Partial Di erential Equations CS5 - Partial Di erential Equations Tibor Jánosi April 5, 5 Numerical Di erentiation In principle, di erentiation is a simple operation. Indeed, given a function speci ed as a closed-form formula, its

More information

University Mathematics 2

University Mathematics 2 University Matematics 2 1 Differentiability In tis section, we discuss te differentiability of functions. Definition 1.1 Differentiable function). Let f) be a function. We say tat f is differentiable at

More information

Physically Based Modeling: Principles and Practice Implicit Methods for Differential Equations

Physically Based Modeling: Principles and Practice Implicit Methods for Differential Equations Pysically Based Modeling: Principles and Practice Implicit Metods for Differential Equations David Baraff Robotics Institute Carnegie Mellon University Please note: Tis document is 997 by David Baraff

More information

Time (hours) Morphine sulfate (mg)

Time (hours) Morphine sulfate (mg) Mat Xa Fall 2002 Review Notes Limits and Definition of Derivative Important Information: 1 According to te most recent information from te Registrar, te Xa final exam will be eld from 9:15 am to 12:15

More information

MAT 145. Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points

MAT 145. Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points MAT 15 Test #2 Name Solution Guide Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points Use te grap of a function sown ere as you respond to questions 1 to 8. 1. lim f (x) 0 2. lim

More information

Pre-Calculus Review Preemptive Strike

Pre-Calculus Review Preemptive Strike Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly

More information

Lecture 10: Carnot theorem

Lecture 10: Carnot theorem ecture 0: Carnot teorem Feb 7, 005 Equivalence of Kelvin and Clausius formulations ast time we learned tat te Second aw can be formulated in two ways. e Kelvin formulation: No process is possible wose

More information

MAT244 - Ordinary Di erential Equations - Summer 2016 Assignment 2 Due: July 20, 2016

MAT244 - Ordinary Di erential Equations - Summer 2016 Assignment 2 Due: July 20, 2016 MAT244 - Ordinary Di erential Equations - Summer 206 Assignment 2 Due: July 20, 206 Full Name: Student #: Last First Indicate wic Tutorial Section you attend by filling in te appropriate circle: Tut 0

More information

Average Rate of Change

Average Rate of Change Te Derivative Tis can be tougt of as an attempt to draw a parallel (pysically and metaporically) between a line and a curve, applying te concept of slope to someting tat isn't actually straigt. Te slope

More information

Exponentials and Logarithms Review Part 2: Exponentials

Exponentials and Logarithms Review Part 2: Exponentials Eponentials and Logaritms Review Part : Eponentials Notice te difference etween te functions: g( ) and f ( ) In te function g( ), te variale is te ase and te eponent is a constant. Tis is called a power

More information

New families of estimators and test statistics in log-linear models

New families of estimators and test statistics in log-linear models Journal of Multivariate Analysis 99 008 1590 1609 www.elsevier.com/locate/jmva ew families of estimators and test statistics in log-linear models irian Martín a,, Leandro Pardo b a Department of Statistics

More information

1 Limits and Continuity

1 Limits and Continuity 1 Limits and Continuity 1.0 Tangent Lines, Velocities, Growt In tion 0.2, we estimated te slope of a line tangent to te grap of a function at a point. At te end of tion 0.3, we constructed a new function

More information

lecture 26: Richardson extrapolation

lecture 26: Richardson extrapolation 43 lecture 26: Ricardson extrapolation 35 Ricardson extrapolation, Romberg integration Trougout numerical analysis, one encounters procedures tat apply some simple approximation (eg, linear interpolation)

More information

Math 212-Lecture 9. For a single-variable function z = f(x), the derivative is f (x) = lim h 0

Math 212-Lecture 9. For a single-variable function z = f(x), the derivative is f (x) = lim h 0 3.4: Partial Derivatives Definition Mat 22-Lecture 9 For a single-variable function z = f(x), te derivative is f (x) = lim 0 f(x+) f(x). For a function z = f(x, y) of two variables, to define te derivatives,

More information

AMS 147 Computational Methods and Applications Lecture 09 Copyright by Hongyun Wang, UCSC. Exact value. Effect of round-off error.

AMS 147 Computational Methods and Applications Lecture 09 Copyright by Hongyun Wang, UCSC. Exact value. Effect of round-off error. Lecture 09 Copyrigt by Hongyun Wang, UCSC Recap: Te total error in numerical differentiation fl( f ( x + fl( f ( x E T ( = f ( x Numerical result from a computer Exact value = e + f x+ Discretization error

More information

SECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES

SECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES (Section.0: Difference Quotients).0. SECTION.0: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES Define average rate of cange (and average velocity) algebraically and grapically. Be able to identify, construct,

More information

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties

More information

Lecture 21. Numerical differentiation. f ( x+h) f ( x) h h

Lecture 21. Numerical differentiation. f ( x+h) f ( x) h h Lecture Numerical differentiation Introduction We can analytically calculate te derivative of any elementary function, so tere migt seem to be no motivation for calculating derivatives numerically. However

More information

3.4 Algebraic Limits. Ex 1) lim. Ex 2)

3.4 Algebraic Limits. Ex 1) lim. Ex 2) Calculus Maimus.4 Algebraic Limits At tis point, you sould be very comfortable finding its bot grapically and numerically wit te elp of your graping calculator. Now it s time to practice finding its witout

More information

Mathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative

Mathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative Matematics 5 Workseet 11 Geometry, Tangency, and te Derivative Problem 1. Find te equation of a line wit slope m tat intersects te point (3, 9). Solution. Te equation for a line passing troug a point (x

More information

Exercises for numerical differentiation. Øyvind Ryan

Exercises for numerical differentiation. Øyvind Ryan Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can

More information

Numerical Differentiation

Numerical Differentiation Numerical Differentiation Finite Difference Formulas for te first derivative (Using Taylor Expansion tecnique) (section 8.3.) Suppose tat f() = g() is a function of te variable, and tat as 0 te function

More information

Polynomials 3: Powers of x 0 + h

Polynomials 3: Powers of x 0 + h near small binomial Capter 17 Polynomials 3: Powers of + Wile it is easy to compute wit powers of a counting-numerator, it is a lot more difficult to compute wit powers of a decimal-numerator. EXAMPLE

More information

The derivative function

The derivative function Roberto s Notes on Differential Calculus Capter : Definition of derivative Section Te derivative function Wat you need to know already: f is at a point on its grap and ow to compute it. Wat te derivative

More information

Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm

Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm Probabilistic Grapical Models 10-708 Homework 1: Due January 29, 2014 at 4 pm Directions. Tis omework assignment covers te material presented in Lectures 1-3. You must complete all four problems to obtain

More information

The Krewe of Caesar Problem. David Gurney. Southeastern Louisiana University. SLU 10541, 500 Western Avenue. Hammond, LA

The Krewe of Caesar Problem. David Gurney. Southeastern Louisiana University. SLU 10541, 500 Western Avenue. Hammond, LA Te Krewe of Caesar Problem David Gurney Souteastern Louisiana University SLU 10541, 500 Western Avenue Hammond, LA 7040 June 19, 00 Krewe of Caesar 1 ABSTRACT Tis paper provides an alternative to te usual

More information

Chapters 19 & 20 Heat and the First Law of Thermodynamics

Chapters 19 & 20 Heat and the First Law of Thermodynamics Capters 19 & 20 Heat and te First Law of Termodynamics Te Zerot Law of Termodynamics Te First Law of Termodynamics Termal Processes Te Second Law of Termodynamics Heat Engines and te Carnot Cycle Refrigerators,

More information