Regularized Regression
David M. Blei
Columbia University
December 5, 2015

Modern regression problems are high dimensional, which means that the number of covariates p is large. In practice statisticians regularize their models, veering away from the MLE solution to one where the coefficients have smaller magnitude. This lecture is about regularization. It draws on the ideas and treatment in Hastie et al. (2009) (referred to below as ESL).

1 The bias-variance trade-off

We first discuss an important concept, the bias-variance trade-off. In this discussion we will take a frequentist perspective. Consider a set of random responses drawn from a linear regression with true parameter $\beta$,

$$Y_n \mid x_n, \beta \sim \mathcal{N}(\beta x_n, \sigma^2). \tag{1}$$

The data are $D = \{(x_n, Y_n)\}$. Note that we are holding the covariates $x_n$ fixed; only the responses are random. (We are also assuming $x_n$ is a single covariate; in general, it is p-dimensional and we replace $\beta x_n$ with $\beta^\top x_n$.)

With this data set, the maximum likelihood estimate $\hat{\beta}(D)$ is a random variable whose distribution is governed by the distribution of the data. Recall that $\beta$ is the true parameter that generated the responses. How close do we expect $\hat{\beta}(D)$ to be to $\beta$? We can answer this question in a couple of ways.

First, suppose we observe a new data input $x$. We consider the mean squared error of our estimate $\mathbb{E}_{\hat{\beta}}[y \mid x] = \hat{\beta} x$. This is the difference between our predicted expectation of the response and the true expectation of the response,

$$\mathrm{MSE} = \mathbb{E}\big[(\hat{\beta}(D)\, x - \beta x)^2\big]. \tag{2}$$

It is important to keep track of which variables are random. The coefficient $\beta$ is not random;
it is the true parameter that generated the data. The coefficient $\hat{\beta}(D)$ is random; it depends on the randomly generated data set $D$. The expectation in this equation is with respect to the randomly generated data set. (For simplicity, we will sometimes suppress this notation below.)

The MSE decomposes in an interesting way,

$$
\begin{aligned}
\mathrm{MSE} &= \mathbb{E}\big[(\hat{\beta} x - \beta x)^2\big] \\
&= \mathbb{E}\big[(\hat{\beta} x)^2\big] - 2\,\mathbb{E}\big[\hat{\beta} x\big]\,\beta x + (\beta x)^2 \\
&= \Big(\mathbb{E}\big[(\hat{\beta} x)^2\big] - \mathbb{E}\big[\hat{\beta} x\big]^2\Big) + \Big(\mathbb{E}\big[\hat{\beta} x\big]^2 - 2\,\mathbb{E}\big[\hat{\beta} x\big]\,\beta x + (\beta x)^2\Big) \\
&= \Big(\mathbb{E}\big[(\hat{\beta} x)^2\big] - \mathbb{E}\big[\hat{\beta} x\big]^2\Big) + \Big(\mathbb{E}\big[\hat{\beta} x\big] - \beta x\Big)^2. \tag{3}
\end{aligned}
$$

The second term is the squared bias,

$$\mathrm{bias} = \mathbb{E}\big[\hat{\beta} x\big] - \beta x. \tag{4}$$

An estimate for which this term is zero is an unbiased estimate. The first term is the variance,

$$\mathrm{variance} = \mathbb{E}\big[(\hat{\beta} x)^2\big] - \mathbb{E}\big[\hat{\beta} x\big]^2. \tag{5}$$

This reflects the spread of the estimates we might find on account of the randomness inherent in the data. Note that the decomposition holds for any linear function of the coefficients.

A famous result in statistics is the Gauss-Markov theorem. Recall that the MLE $\hat{\beta}$ is an unbiased estimate. The theorem states that the MLE is the unbiased estimate with the smallest variance. If you insist on unbiasedness, and you care about the MSE, then you can do no better than the MLE.

Often we care about expected prediction error. Suppose we observe a new input $x$. How wrong will we be on average when we predict the true $y \mid x$ with $\mathbb{E}[y \mid x]$ from a fitted regression? The expected squared prediction error is

$$\mathbb{E}\Big[\mathbb{E}_Y\big[(\hat{\beta} x - Y)^2\big]\Big].$$

The first expectation is taken for the randomness of $\hat{\beta}$, which is a function of the data. The
second is taken for the randomness of $Y$ given $x$, which comes from the true model. This decomposes as follows,

$$
\begin{aligned}
\mathbb{E}\Big[\mathbb{E}_Y\big[(\hat{\beta} x - Y)^2\big]\Big] &= \mathrm{Var}(Y) + \mathrm{MSE}(\hat{\beta} x) \tag{6} \\
&= \sigma^2 + \mathrm{Bias}^2(\hat{\beta} x) + \mathrm{Var}(\hat{\beta} x). \tag{7}
\end{aligned}
$$

The first term is the inherent uncertainty around the true mean; the second two terms are the bias-variance decomposition of the estimator. We cannot do anything about the inherent uncertainty; thus reducing the MSE also reduces expected prediction error.

Classical statistics cared only about unbiased estimators. Modern statistics has explored the trade-off, where it may be worth accepting some bias for a reduction in variance. This can reduce the MSE and, consequently, the expected prediction error on future data. Here is a simple picture to illustrate why:

[Figure: sampling distributions of an unbiased and a biased estimator around the true $\beta$; the axis is labeled $\hat{\beta}$.]

It may be that the MSE is smaller for the biased estimator, because it never veers as far away from the truth as the unbiased estimator does.

2 Ridge regression

Regularization. In regression, we can make this trade-off with regularization, which means placing constraints on the coefficients. Here is a picture from ESL for our first example.
[Figure from ESL (Hastie, Tibshirani & Friedman, 2009, Chapter 3): estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ and $\beta_1^2 + \beta_2^2 \le t^2$, respectively, while the red ellipses are the contours of the least squares error function.]

In this picture, contours represent values of $\beta$ with equal RSS (or, equivalently, likelihood). Our procedure finds the best value that is within the blue circle.

This reduces the variance because it limits the space that the parameter vector can live in. If the MLE of $\beta$ lives outside that space, then the resulting estimate must be biased because of the Gauss-Markov theorem.

The picture also shows how regularization encourages smaller and perhaps simpler models. Simpler models are more robust to overfitting, the phenomenon of generalizing poorly because of too close a match to the training data. Simpler models can also be more interpretable, which is another goal of regression. (This is particularly true for the lasso, which we will talk about later.)

Ridge regression. Let's discuss the details of ridge regression. We optimize the RSS subject to a constraint on the sum of squares of the coefficients,

$$
\begin{aligned}
\text{minimize} \quad & \textstyle\sum_{n=1}^{N} (y_n - \beta^\top x_n)^2 \\
\text{subject to} \quad & \textstyle\sum_{i=1}^{p} \beta_i^2 \le s. \tag{8}
\end{aligned}
$$

This constrains the coefficients to live within a sphere of radius $\sqrt{s}$. (See the picture.)

Question: What happens as the radius increases? Answer: Variance goes up; bias goes down.

With some calculus, the ridge regression estimate can also be expressed as

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \sum_{n=1}^{N} (y_n - \beta^\top x_n)^2 + \lambda \sum_{i=1}^{p} \beta_i^2. \tag{9}$$

This is nice because the problem is convex. Further, it has an analytic solution. (See the reading.)

Question: Is it sensitive to scaling? Answer: Yes, in practice we center and scale
the covariates.

There is a 1-1 mapping between the radius $s$ and the complexity parameter $\lambda$. Either of these parameters trades off an increase in bias for a decrease in variance. From ESL:

[Figure from ESL: profiles of ridge coefficients as the regularization varies; the axes are the coefficient values and the norm of $\beta$.]

How do we choose $\lambda$? As we see, the value of the complexity parameter affects our estimate. Question: What would happen if we used training error as the criterion? (Look at the picture to see the answer.)

In practice, we choose $\lambda$ by cross validation. This is an attempt to minimize expected test error. (But later on we will discuss hierarchical models. This can be another way to choose the regularization parameter.) Here is how it works:

- Divide the data into K folds (e.g., K = 10).
- Decide on candidate values of $\lambda$ (e.g., a grid between 0 and some maximum value).
- For each fold k and value of $\lambda$:
  - Estimate $\hat{\beta}^{\lambda}_{\text{ridge}, -k}$ on the out-of-fold samples.
  - For each $x_n$ assigned to fold k, compute its squared error

$$e_n^{\lambda} = (\hat{y}_n - y_n)^2, \tag{10}$$
where $\hat{y}_n = \mathbb{E}_{\hat{\beta}_{\text{ridge}}}[Y \mid x_n]$. Note that this estimate of the coefficients did not use $(x_n, y_n)$ as part of its training data.

We now aggregate the individual errors. The score for $\lambda$ is

$$\mathrm{MSE}(\lambda) = \frac{1}{N} \sum_{n=1}^{N} e_n^{\lambda}. \tag{11}$$

This is an estimate of the test error. Choose the $\lambda$ that minimizes this score.

Aside: Connection to Bayesian statistics. We have motivated regularized regression via frequentist thinking, i.e., the bias-variance trade-off and an appeal to the true model. Regularized regression, in general, has connections to Bayesian modeling.

We have discussed two common ways of using the posterior to obtain an estimate. The first is maximum a posteriori (MAP) estimation,

$$\beta_{\text{MAP}} = \arg\max_{\beta}\; p(\beta \mid y_1, \ldots, y_N, \lambda). \tag{12}$$

The second is the posterior mean,

$$\beta_{\text{mean}} = \mathbb{E}[\beta \mid y_1, \ldots, y_N, \lambda]. \tag{13}$$

Question: How are these different from the MLE?

Ridge regression and Bayesian methods. Ridge regression corresponds to MAP estimation in the following model:

$$\beta_i \sim \mathcal{N}(0, 1/\lambda) \tag{14}$$
$$y_n \mid x_n, \beta \sim \mathcal{N}(\beta^\top x_n, \sigma^2). \tag{15}$$

Here is the corresponding graphical model:

[Graphical model: $\lambda \to \beta \to Y_n \leftarrow X_n$, with $X_n$ and $Y_n$ inside a plate of size N. The author notes the figure isn't quite right; $\lambda$ should be a small dot, i.e., a fixed hyperparameter.]
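The analytic ridge solution mentioned in the reading, and the cross-validation recipe above, can be sketched in code. The notes only point to R's glmnet; the NumPy sketch below is my own, with made-up synthetic data (the sizes, seed, and coefficient values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (sizes and true coefficients are illustrative assumptions).
N, p = 100, 10
X = rng.normal(size=(N, p))
beta_true = np.concatenate([np.ones(3), np.zeros(p - 3)])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

def ridge_fit(X, y, lam):
    """Analytic ridge solution: argmin_b sum_n (y_n - b'x_n)^2 + lam * sum_i b_i^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, K=10):
    """K-fold cross-validation estimate of test error for one candidate lam."""
    folds = np.arange(X.shape[0]) % K                # assign each point to a fold
    errs = np.empty(X.shape[0])
    for k in range(K):
        held_out = folds == k
        b = ridge_fit(X[~held_out], y[~held_out], lam)   # fit on out-of-fold samples
        errs[held_out] = (y[held_out] - X[held_out] @ b) ** 2
    return errs.mean()                               # the aggregated score

lams = np.logspace(-3, 3, 20)                        # grid of candidate lambdas
scores = np.array([cv_mse(X, y, lam) for lam in lams])
best_lam = lams[np.argmin(scores)]
```

As the penalty grows the estimate shrinks toward 0 (the prior mean in the Bayesian view above), and `best_lam` is simply the grid point with the smallest estimated test error.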
We will derive the relationship. First, note that

$$p(\beta_i \mid \lambda) = \sqrt{\frac{\lambda}{2\pi}} \exp\Big\{-\frac{\lambda}{2}\, \beta_i^2\Big\}. \tag{16}$$

We now compute the MAP estimate of $\beta$,

$$
\begin{aligned}
\max_{\beta}\; p(\beta \mid D, \lambda) &= \max_{\beta}\; \log p(\beta \mid y_{1:N}, x_{1:N}, \lambda) \tag{17} \\
&= \max_{\beta}\; \log p(\beta, y_{1:N} \mid x_{1:N}, \lambda) \tag{18} \\
&= \max_{\beta}\; \log \Big( p(y_{1:N} \mid x_{1:N}, \beta) \prod_{i=1}^{p} p(\beta_i \mid \lambda) \Big) \tag{19} \\
&= \max_{\beta}\; -\mathrm{RSS}(\beta; D) - \lambda \sum_{i=1}^{p} \beta_i^2, \tag{20}
\end{aligned}
$$

where the last line holds up to additive constants and an overall positive scaling, neither of which changes the maximizer. Ridge regression is equivalent to MAP estimation in this model.

Observe that the hyperparameter $\lambda$ controls how far away the estimate will be from the MLE. A small hyperparameter (large prior variance) will choose the MLE; the data totally determine the estimate. As the hyperparameter gets larger, the estimate moves further from the MLE; the prior ($\mathbb{E}[\beta] = 0$) becomes more influential. This matches our recurring theme in Bayesian estimation; both the data and the prior influence the answer.

Finally, note that a true Bayesian would not set the hyperparameter by cross-validation. This uses the data to set the prior. However, I think it is a good idea. It is an instance of a more general principle called Empirical Bayes.

Summary of ridge regression.

1. We constrain $\beta$ to be in a hypersphere around 0.
2. This is equivalent to minimizing the RSS plus a regularization term.
3. We no longer find the $\hat{\beta}$ that minimizes the RSS. (Contours illustrate constant RSS.)
4. Ridge regression is a kind of shrinkage, so called because it reduces the components to be close to 0 and close to each other.
5. Ridge estimates trade off bias for variance.

3 The lasso

A closely related regularization method is called the lasso. The lasso optimizes the RSS subject to a different constraint,

$$
\begin{aligned}
\text{minimize} \quad & \textstyle\sum_{n=1}^{N} (y_n - \beta^\top x_n)^2 \\
\text{subject to} \quad & \textstyle\sum_{i=1}^{p} |\beta_i| \le s. \tag{21}
\end{aligned}
$$

This small change yields very different estimates. Here is the picture of the constraint, from ESL:

[Figure from ESL (Hastie, Tibshirani & Friedman, 2009, Chapter 3): estimation picture for the lasso (left) and ridge regression (right). The solid blue areas are the constraint regions $|\beta_1| + |\beta_2| \le t$ and $\beta_1^2 + \beta_2^2 \le t^2$, while the red ellipses are the contours of the least squares error function.]

Question: What happens as s increases? Question: Where is the solution going to lie with s fixed?

It's a fact: unless it chooses $\hat{\beta}$, the lasso (with p large) will set some of the coefficients to exactly zero. The intuitions come from ESL: "Unlike the disk, the diamond has corners; if the solution occurs at a corner, then it has one parameter $\beta_j$ equal to zero. When p > 2, the diamond becomes a rhomboid, and has many corners, flat edges and faces; there are many more opportunities for the estimated parameters to be zero." (p. 90)

In a sense, the lasso is a form of feature selection, identifying a relevant subset of the covariates with which to predict. Like ridge regression, it trades off an increase in bias with a decrease in variance. Further, by zeroing out some of the covariates, it provides interpretable (as in, sparse) models.
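The corner intuition can be checked numerically in a special case the notes do not work out: with a single covariate scaled so that $\sum_n x_n^2 = 1$, minimizing $\sum_n (y_n - \beta x_n)^2 + \lambda |\beta|$ gives the soft-thresholded least squares estimate $\mathrm{sign}(\hat{\beta})(|\hat{\beta}| - \lambda/2)_+$, which is exactly zero once $\lambda$ is large enough. The Python check below is my own, with invented data:

```python
import numpy as np

def soft_threshold(b, t):
    """sign(b) * max(|b| - t, 0): shrink b toward zero, snapping to exactly zero."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

rng = np.random.default_rng(1)
x = rng.normal(size=200)
x = x / np.linalg.norm(x)              # scale so the sum of x_n^2 equals 1
y = 0.3 * x + rng.normal(scale=0.05, size=200)

b_ols = x @ y                          # least squares estimate in this scaling
lam = 0.4

# Minimize the one-dimensional lasso objective by brute-force grid search.
grid = np.linspace(-1.0, 1.0, 20001)
obj = ((y[:, None] - grid[None, :] * x[:, None]) ** 2).sum(axis=0) + lam * np.abs(grid)
b_grid = grid[np.argmin(obj)]

# The grid minimizer matches the closed-form soft threshold (up to grid spacing),
# and a large enough penalty yields an estimate of exactly zero.
b_closed = soft_threshold(b_ols, lam / 2)
b_zero = soft_threshold(b_ols, 1.0)
```

The $\lambda/2$ in the threshold comes from the squared-error term carrying no factor of 1/2; the qualitative point is that an $L_1$ penalty produces exact zeros where a ridge penalty only shrinks.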
Sparse models can also be important in real systems that might depend on many inputs. Once the sparse solution is found, we need only measure a few of the inputs in order to make predictions. This speeds up the performance of the system.

The lasso is equivalent to

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \sum_{n=1}^{N} (y_n - \beta^\top x_n)^2 + \lambda \sum_{i=1}^{p} |\beta_i|. \tag{22}$$

Again, there is a 1-1 mapping between $\lambda$ and $s$. This objective, though it does not have an analytic solution, is still convex.

Why is the lasso exciting? Prior to the lasso, the only sparse method was subset selection, finding the best subset of features with which to model the data. But subset selection has problems: searching over all subsets (of a fixed size) is computationally expensive. In contrast, the lasso efficiently finds a sparse solution by using convex optimization. In a sense, it is akin to a smooth version of subset selection. Note the lasso won't consider all possible subsets. From ESL:

[Figure from ESL: profiles of lasso coefficients as the regularization varies; the axes are the coefficient values and the $L_1$ norm of $\beta$.]
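Though (22) has no analytic solution, it is solved efficiently by coordinate descent: cycle through the coordinates, and update each one by soft-thresholding its one-dimensional least squares fit against the current residual. This is the strategy used by the glmnet package mentioned at the end of these notes; the minimal NumPy sketch below, and its synthetic data, are my own:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for: sum_n (y_n - b'x_n)^2 + lam * sum_i |b_i|."""
    N, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)       # per-coordinate curvature, sum_n x_ni^2
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]  # put coordinate j back into the residual
            rho = X[:, j] @ resid       # one-dimensional least squares numerator
            # Soft-threshold the coordinate-wise fit; lam/2 appears because the
            # squared error above carries no factor of 1/2.
            beta[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_ss[j]
            resid -= X[:, j] * beta[j]
    return beta

rng = np.random.default_rng(2)
N, p = 100, 10
X = rng.normal(size=(N, p))
X -= X.mean(axis=0)                     # center the covariates, as in practice
beta_true = np.array([2.0, -3.0] + [0.0] * (p - 2))
y = X @ beta_true + rng.normal(scale=0.1, size=N)

beta_hat = lasso_cd(X, y, lam=50.0)     # many entries come out exactly zero
```

Both effects discussed above are visible in `beta_hat`: the nonzero entries are shrunk toward zero relative to the truth, while the irrelevant coordinates are set to exactly zero.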
The Bayesian interpretation of the lasso. Like ridge regression, lasso regression corresponds to MAP estimation in a Bayesian model. For the lasso, the model is:

$$\beta_i \sim \mathrm{Laplace}(\lambda) \tag{23}$$
$$Y_n \mid x_n, \beta \sim \mathcal{N}(\beta^\top x_n, \sigma^2). \tag{24}$$

Here the coefficients come from a Laplace distribution,

$$p(\beta_i \mid \lambda) = \frac{\lambda}{2} \exp\{-\lambda |\beta_i|\}. \tag{25}$$

The lasso, and the general idea of $L_1$ penalized models, has become a cottage industry in modern statistics and machine learning. The reason is that we often want sparse solutions to high-dimensional problems, and we want convex objective functions when analyzing data. $L_1$ penalized methods give us both. Recent research indicates that they have good theoretical properties to boot.

4 (Optional) Generalized regularization

In general, regularization can be seen as minimizing the RSS with a constraint on a q-norm,

$$
\begin{aligned}
\text{minimize} \quad & \textstyle\sum_{n=1}^{N} (y_n - \beta^\top x_n)^2 \\
\text{subject to} \quad & \|\beta\|_q \le s,
\end{aligned}
$$

where the penalty is

$$\|\beta\|_q = \Big( \sum_{i=1}^{p} |\beta_i|^q \Big)^{1/q}.$$

The methods we discussed so far are

- q = 2: ridge regression
- q = 1: lasso
- q = 0: subset selection

Here is the picture from ESL:
[Figure from ESL: contours of constant value of $\sum_j |\beta_j|^q$ for q = 4, 2, 1, 0.5, and 0.1.]

This brings us away from the minimum RSS solution, but might provide better test prediction via the bias/variance trade-off. Complex models have less bias; simpler models have less variance. Regularization encourages simpler models.

Note that each of these methods corresponds to a Bayesian solution with a different choice of prior,

$$\hat{\beta} = \arg\min_{\beta} \sum_{n=1}^{N} (y_n - \beta^\top x_n)^2 + \lambda \|\beta\|_q^q.$$

The complexity parameter $\lambda$ can be chosen with cross validation.

The lasso (q = 1) is the only norm that provides both sparsity and convexity. And there are other variants, useful in the literature. Of note:

- The elastic net is a convex combination of the $L_1$ and $L_2$ penalties.
- The grouped lasso finds sparse groups of covariates to include.

Finally, the glmnet package in R is amazing. It efficiently computes models for a regularization path using $L_2$ or $L_1$ penalization. It uses the same model syntax as lm or glm.

References

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, 2nd edition.
More informationCS522 - Partial Di erential Equations
CS5 - Partial Di erential Equations Tibor Jánosi April 5, 5 Numerical Di erentiation In principle, di erentiation is a simple operation. Indeed, given a function speci ed as a closed-form formula, its
More informationUniversity Mathematics 2
University Matematics 2 1 Differentiability In tis section, we discuss te differentiability of functions. Definition 1.1 Differentiable function). Let f) be a function. We say tat f is differentiable at
More informationPhysically Based Modeling: Principles and Practice Implicit Methods for Differential Equations
Pysically Based Modeling: Principles and Practice Implicit Metods for Differential Equations David Baraff Robotics Institute Carnegie Mellon University Please note: Tis document is 997 by David Baraff
More informationTime (hours) Morphine sulfate (mg)
Mat Xa Fall 2002 Review Notes Limits and Definition of Derivative Important Information: 1 According to te most recent information from te Registrar, te Xa final exam will be eld from 9:15 am to 12:15
More informationMAT 145. Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points
MAT 15 Test #2 Name Solution Guide Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points Use te grap of a function sown ere as you respond to questions 1 to 8. 1. lim f (x) 0 2. lim
More informationPre-Calculus Review Preemptive Strike
Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly
More informationLecture 10: Carnot theorem
ecture 0: Carnot teorem Feb 7, 005 Equivalence of Kelvin and Clausius formulations ast time we learned tat te Second aw can be formulated in two ways. e Kelvin formulation: No process is possible wose
More informationMAT244 - Ordinary Di erential Equations - Summer 2016 Assignment 2 Due: July 20, 2016
MAT244 - Ordinary Di erential Equations - Summer 206 Assignment 2 Due: July 20, 206 Full Name: Student #: Last First Indicate wic Tutorial Section you attend by filling in te appropriate circle: Tut 0
More informationAverage Rate of Change
Te Derivative Tis can be tougt of as an attempt to draw a parallel (pysically and metaporically) between a line and a curve, applying te concept of slope to someting tat isn't actually straigt. Te slope
More informationExponentials and Logarithms Review Part 2: Exponentials
Eponentials and Logaritms Review Part : Eponentials Notice te difference etween te functions: g( ) and f ( ) In te function g( ), te variale is te ase and te eponent is a constant. Tis is called a power
More informationNew families of estimators and test statistics in log-linear models
Journal of Multivariate Analysis 99 008 1590 1609 www.elsevier.com/locate/jmva ew families of estimators and test statistics in log-linear models irian Martín a,, Leandro Pardo b a Department of Statistics
More information1 Limits and Continuity
1 Limits and Continuity 1.0 Tangent Lines, Velocities, Growt In tion 0.2, we estimated te slope of a line tangent to te grap of a function at a point. At te end of tion 0.3, we constructed a new function
More informationlecture 26: Richardson extrapolation
43 lecture 26: Ricardson extrapolation 35 Ricardson extrapolation, Romberg integration Trougout numerical analysis, one encounters procedures tat apply some simple approximation (eg, linear interpolation)
More informationMath 212-Lecture 9. For a single-variable function z = f(x), the derivative is f (x) = lim h 0
3.4: Partial Derivatives Definition Mat 22-Lecture 9 For a single-variable function z = f(x), te derivative is f (x) = lim 0 f(x+) f(x). For a function z = f(x, y) of two variables, to define te derivatives,
More informationAMS 147 Computational Methods and Applications Lecture 09 Copyright by Hongyun Wang, UCSC. Exact value. Effect of round-off error.
Lecture 09 Copyrigt by Hongyun Wang, UCSC Recap: Te total error in numerical differentiation fl( f ( x + fl( f ( x E T ( = f ( x Numerical result from a computer Exact value = e + f x+ Discretization error
More informationSECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES
(Section.0: Difference Quotients).0. SECTION.0: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES Define average rate of cange (and average velocity) algebraically and grapically. Be able to identify, construct,
More informationA MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES
A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties
More informationLecture 21. Numerical differentiation. f ( x+h) f ( x) h h
Lecture Numerical differentiation Introduction We can analytically calculate te derivative of any elementary function, so tere migt seem to be no motivation for calculating derivatives numerically. However
More information3.4 Algebraic Limits. Ex 1) lim. Ex 2)
Calculus Maimus.4 Algebraic Limits At tis point, you sould be very comfortable finding its bot grapically and numerically wit te elp of your graping calculator. Now it s time to practice finding its witout
More informationMathematics 5 Worksheet 11 Geometry, Tangency, and the Derivative
Matematics 5 Workseet 11 Geometry, Tangency, and te Derivative Problem 1. Find te equation of a line wit slope m tat intersects te point (3, 9). Solution. Te equation for a line passing troug a point (x
More informationExercises for numerical differentiation. Øyvind Ryan
Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can
More informationNumerical Differentiation
Numerical Differentiation Finite Difference Formulas for te first derivative (Using Taylor Expansion tecnique) (section 8.3.) Suppose tat f() = g() is a function of te variable, and tat as 0 te function
More informationPolynomials 3: Powers of x 0 + h
near small binomial Capter 17 Polynomials 3: Powers of + Wile it is easy to compute wit powers of a counting-numerator, it is a lot more difficult to compute wit powers of a decimal-numerator. EXAMPLE
More informationThe derivative function
Roberto s Notes on Differential Calculus Capter : Definition of derivative Section Te derivative function Wat you need to know already: f is at a point on its grap and ow to compute it. Wat te derivative
More informationProbabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm
Probabilistic Grapical Models 10-708 Homework 1: Due January 29, 2014 at 4 pm Directions. Tis omework assignment covers te material presented in Lectures 1-3. You must complete all four problems to obtain
More informationThe Krewe of Caesar Problem. David Gurney. Southeastern Louisiana University. SLU 10541, 500 Western Avenue. Hammond, LA
Te Krewe of Caesar Problem David Gurney Souteastern Louisiana University SLU 10541, 500 Western Avenue Hammond, LA 7040 June 19, 00 Krewe of Caesar 1 ABSTRACT Tis paper provides an alternative to te usual
More informationChapters 19 & 20 Heat and the First Law of Thermodynamics
Capters 19 & 20 Heat and te First Law of Termodynamics Te Zerot Law of Termodynamics Te First Law of Termodynamics Termal Processes Te Second Law of Termodynamics Heat Engines and te Carnot Cycle Refrigerators,
More information