Mathematics for Intelligent Systems Lecture 5 Homework Solutions

(Advanced Calculus I: Derivatives and local geometry)
Nathan Ratliff
November 25, 2014

1 Problem 1: Gradient and Hessian Calculations

We've seen that the gradient and Hessian of $h(x) = \|x\|$ are

  $\nabla h(x) = \frac{x}{\|x\|}$  and  $\nabla^2 h(x) = \frac{1}{\|x\|}\left(I - \frac{x}{\|x\|}\frac{x^T}{\|x\|}\right)$.   (1)

We can plug these in when they become necessary below.

1.1 Calculating the derivatives

For generic differentiable $h$, we have

  $\frac{\partial}{\partial x_k} \log\left(e^{h(x)} + e^{-h(x)}\right) = \frac{1}{e^h + e^{-h}}\left(e^h \frac{\partial h}{\partial x_k} - e^{-h} \frac{\partial h}{\partial x_k}\right)$   (2)

  $\qquad = \underbrace{\frac{e^h - e^{-h}}{e^h + e^{-h}}}_{\alpha(x)} \frac{\partial h}{\partial x_k}$.   (3)

This expression shows that the kth partial of $\psi(x)$ is just the kth partial of $h$ weighted by a weight $\alpha(x)$. This weighting function has the nice property that as $h(x)$ gets large, it approaches the value 1, and as $h(x)$ approaches zero, it vanishes. We'll use those properties later on when analyzing limits. Stacking these partial derivatives, we get the following expression for the full gradient:

  $\nabla \psi(x) = \alpha(x)\, \nabla h(x)$.   (4)
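
To make Equations 3 and 4 concrete, here is a minimal numeric sketch in Python/NumPy (the function names are mine, not from the course code; note that $\alpha(x) = \tanh(\|x\|)$):

    import numpy as np

    def softmax_norm(x):
        # psi(x) = log(exp(||x||) + exp(-||x||)), a smoothed version of ||x||
        h = np.linalg.norm(x)
        return np.logaddexp(h, -h)

    def softmax_norm_gradient(x):
        # Equation 4: grad psi(x) = alpha(x) * grad h(x), with alpha(x) = tanh(||x||)
        h = np.linalg.norm(x)
        if h == 0.0:
            return np.zeros_like(x)   # the limiting value derived in Section 1.2
        return np.tanh(h) * (x / h)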

Now for the Hessian. We simply take the second partial of the expression in Equation 3:

  $\frac{\partial^2 \psi}{\partial x_l \partial x_k} = \frac{\partial}{\partial x_l}\left(\alpha(x) \frac{\partial h}{\partial x_k}\right) = \alpha(x) \frac{\partial^2 h}{\partial x_l \partial x_k} + \frac{\partial \alpha}{\partial x_l} \frac{\partial h}{\partial x_k}$.   (5)

Already we know that the first of those terms is going to be the Hessian of $h$ weighted by $\alpha(x)$. To calculate the second term, we need to compute the first partial of $\alpha(x)$:

  $\frac{\partial \alpha}{\partial x_l} = \frac{\left(e^h + e^{-h}\right)\frac{\partial}{\partial x_l}\left(e^h - e^{-h}\right) - \left(e^h - e^{-h}\right)\frac{\partial}{\partial x_l}\left(e^h + e^{-h}\right)}{\left(e^h + e^{-h}\right)^2}$   (6)

  $\qquad = \frac{\left(e^h + e^{-h}\right)^2 - \left(e^h - e^{-h}\right)^2}{\left(e^h + e^{-h}\right)^2}\, \frac{\partial h}{\partial x_l}$   (7)

  $\qquad = \left(1 - \alpha(x)^2\right) \frac{\partial h}{\partial x_l}$.   (8)

Plugging this expression back into Equation 5, we get

  $\frac{\partial^2 \psi}{\partial x_l \partial x_k} = \alpha(x) \frac{\partial^2 h}{\partial x_l \partial x_k} + \left(1 - \alpha(x)^2\right) \frac{\partial h}{\partial x_l} \frac{\partial h}{\partial x_k}$.   (9)

That first term is just the scaled Hessian of $h$ as we mentioned above, and the second term is a scaled version of the outer product of $h$'s gradient vector. So, the full Hessian matrix is

  $\nabla^2 \psi(x) = \alpha(x)\, \nabla^2 h(x) + \left(1 - \alpha(x)^2\right) \nabla h(x)\, \nabla h(x)^T$.   (10)

Note that correctly tracking the indices k and l throughout the computation is just about as easy as using a primed notation, since these partial derivatives, themselves, represent just one-dimensional derivatives of the restricted function created by fixing the values of all variables except the one in question. Often people make the notation even simpler by writing $\partial_{x_k} = \frac{\partial}{\partial x_k}$. Importantly, correctly tracking these indices gives us confidence that the final solution is properly arranged in matrix form. In this case, the Hessian is always symmetric, so dimensionality analysis can sometimes help us guess the right answer; but when we make things a little more difficult by finding the Jacobian of mappings of the form $\phi : \mathbb{R}^n \to \mathbb{R}^n$, for instance, or move even beyond that to tensors of higher-order derivatives, things get more hairy. It's very, very important, as the problems become harder, that you meticulously track those indices. (If you look up the Einstein summation convention for higher-order tensors, you'll notice that for hard problems this technique of manipulating the partials is really all you can do.) So I encourage you to be careful with your indices. It'll save you a lot of anguish as derivatives become more complex.
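
Before moving on, Equation 10 can be sanity-checked by translating it directly into code. A minimal sketch (again Python/NumPy, my own helper name, valid away from $x = 0$; the limit at the origin is worked out in the next section):

    import numpy as np

    def softmax_norm_hessian(x):
        # Equation 10: alpha * Hess h + (1 - alpha^2) * (grad h)(grad h)^T, for h(x) = ||x||
        h = np.linalg.norm(x)
        alpha = np.tanh(h)
        grad_h = x / h
        hess_h = (np.eye(x.size) - np.outer(grad_h, grad_h)) / h   # Equation 1
        return alpha * hess_h + (1.0 - alpha**2) * np.outer(grad_h, grad_h)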

1.2 Taking the limits

Now for the limits. From the intuition we gained from graphing the function, we might expect the gradient and Hessian of $\psi$ away from zero to increasingly just look like the gradient and Hessian of $h(x) = \|x\|$. And then as we approach zero, the point of the cone formed by the graph of $h(x)$ becomes increasingly rounded; we'd expect the gradient at least to be 0 at that point, and we might guess, if all goes right, that the Hessian is just the identity $I$, since it's uniform in all directions. We can even look at the second derivative in one dimension to better convince ourselves that the identity makes sense, but to be sure about all of these, beyond a modicum of doubt, we should perform the actual calculations.

First we note that $\alpha(x)$ approaches 1 as $\|x\| \to \infty$, and $\alpha(0) = 0$. The gradient of $\psi$ is the same as the gradient of $h$, just scaled by $\alpha$, so as $\|x\|$ gets large, since $\alpha$ approaches 1, the gradient simply approaches the gradient of $h$, as we thought. We can make a similar argument for the first term of the Hessian in Equation 10: $\alpha$ approaches 1 as $\|x\|$ gets large, so that term approaches the Hessian of $h$. For the second term, we note that the gradients $\nabla h = \frac{x}{\|x\|}$ all have bounded norm (specifically, the norm is always 1). That means each element of the vector is bounded in absolute value, too. Since $\alpha \to 1$ as $\|x\|$ gets large, the weight on this term, $1 - \alpha^2$, approaches 0 as $\|x\|$ gets large, so we can bound each element of the outer product $\nabla h\, \nabla h^T$ above and below by a value that approaches 0 as $\|x\|$ gets large. That term, then, must go to zero in this limit. In all, that tells us that $\nabla^2 \psi$ looks more and more like $\nabla^2 h$ as $x$ gets bigger and bigger in norm, matching our intuition.

In the other direction, as $x \to 0$, the weight $\alpha(x)$ approaches zero. So for the gradient $\nabla \psi = \alpha\, \nabla h$, since $\nabla h$ always has norm 1, we can make the same bounding argument for its elements as we made for the second term of the Hessian to say that all components of this gradient must approach zero. The gradient, therefore, along all paths converging to $x = 0$, limits to 0. Note that from the viewpoint of the constituent functions that went into creating $\psi$, since $\nabla h$ is undefined at $x = 0$, it seems like the gradient of $\psi$ is also undefined there as well. However, this function isn't really something constructed from its constituent components; it's a mapping from all points $x$ to the corresponding values $\psi(x) \in \mathbb{R}$. And from that perspective, we know nothing about what went into its construction. If we return to the definition of partial derivatives, we see that the derivative is actually the limit of the difference ratios as the perturbation of a variable $x_k$ gets smaller and smaller. That limit is perfectly well defined for this function at $x = 0$, so we can conclude that the function is differentiable there. And the full gradient there, as we found in the above analysis, is 0. Thus, it's absolutely correct to return the value 0 in an implementation of the gradient evaluation at $x = 0$. This isn't just a trick; it's the right calculation for the function $\psi$.
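
That limiting behavior is easy to spot-check numerically. A small sketch, reusing the softmax_norm_gradient helper assumed above, evaluates the gradient along a fixed direction shrinking toward the origin:

    import numpy as np

    direction = np.array([0.6, 0.8])              # an arbitrary unit vector
    for t in [1.0, 1e-1, 1e-3, 1e-6]:
        g = softmax_norm_gradient(t * direction)
        print(t, np.linalg.norm(g))               # the gradient norm shrinks toward 0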

The limit of the Hessian is a bit trickier. Naïvely, it's tempting to say that, looking back at Equation 10, since $\alpha \to 0$ as $x \to 0$, the first term vanishes and the weight on the second term approaches 1, leaving us with just the unweighted second term $\nabla h\, \nabla h^T$. In our case, that would suggest that the Hessian limits to $\frac{x}{\|x\|}\frac{x^T}{\|x\|}$, meaning that it entirely depends on the path we take while approaching $x = 0$. And we'd conclude that there is no definitive limit, and this function behaves poorly around the origin. But we would be wrong! The reason is that while $\alpha \to 0$, we need to remember that $\nabla^2 h = \frac{1}{\|x\|}\left(I - \frac{x}{\|x\|}\frac{x^T}{\|x\|}\right)$ explodes to infinity. Thus, we have a situation where two separate factors are approaching zero and infinity simultaneously, so this is an indeterminate form. We need to use L'Hôpital!

Denoting $h = \|x\|$ and $\hat{x} = \frac{x}{\|x\|}$, we can collect the Hessian terms as follows:

  $\nabla^2 \psi = \frac{\alpha}{h}\left(I - \hat{x}\hat{x}^T\right) + \left(1 - \alpha^2\right)\hat{x}\hat{x}^T = \frac{\alpha}{h}\, I + \left(1 - \alpha^2 - \frac{\alpha}{h}\right)\hat{x}\hat{x}^T$.   (11)

It's clear what all elements of this expression that pertain to $\alpha$ do, except $\frac{\alpha}{h}$. That term, as $x \to 0$, approaches $\frac{0}{0}$, so we need to use L'Hôpital to understand its limit. Expanding that term, we get

  $\frac{\alpha}{h} = \frac{e^h - e^{-h}}{h\left(e^h + e^{-h}\right)}$.   (12)

Again this fraction is an indeterminate form, so we can take the derivative of the numerator and denominator separately and examine that ratio:

  $\frac{e^h + e^{-h}}{\left(e^h + e^{-h}\right) + h\left(e^h - e^{-h}\right)} = \frac{1}{1 + h\alpha} \to 1$.   (13)

Thus, this weird term limits to 1. (Note that we could also apply L'Hôpital directly to the fraction $\frac{\alpha}{h}$. The calculation is slightly more complicated, but the new ratio becomes $1 - \alpha^2$, which also limits to 1.) Returning to the Hessian in Equation 11, we see now that the Hessian limits to

  $\nabla^2 \psi(0) = \lim_{h \to 0}\left[\frac{\alpha}{h}\, I + \left(1 - \alpha^2 - \frac{\alpha}{h}\right)\hat{x}\hat{x}^T\right] = 1 \cdot I + (1 - 0 - 1)\,\hat{x}\hat{x}^T = I$.   (14)

So we conclude that the Hessian (as we intuitively guessed above) is actually $I$ at $x = 0$.
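
Numerically, both the scalar limit in Equation 13 and the full Hessian limit are easy to confirm with the helpers sketched earlier (again, those names are my own):

    import numpy as np

    for h in [1.0, 1e-2, 1e-4, 1e-8]:
        print(h, np.tanh(h) / h)                   # alpha/h -> 1 as h -> 0

    x = 1e-4 * np.array([0.6, 0.8])
    print(softmax_norm_hessian(x))                 # very close to the 2x2 identity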

1.3 A note on checking your work with a computer

There are a number of tools these days for checking that you've calculated and implemented these formulas right. I'll mention a few of them here since they came up during our discussion. By far, the easiest and most reliable is numeric differentiation (i.e. finite-differencing). Section 5.3 of this week's lecture notes (Multivariate Calculus I: Derivatives and local geometry) talks about that method at length. The great thing about numeric differentiation is that it's largely agnostic to simplification of the expression. Independent of how convoluted you make your formula, as long as it's correct in the implementation, it'll spit out the right numbers, and numeric differentiation will report the same (approximate) values. You don't have to stare at a different expression and try to verify that it's the same.

On the other hand, Mathematica (and, by extension, Matlab) has very powerful symbolic differentiation tools that we can use. Unfortunately, those tools may do a poor job of simplification (which is a hard and largely ambiguous problem). They take symbolic representations of the function and report back their first and second derivatives, but in the end you may get grotesquely complicated strings of terms that you'll have to somehow relate back to your own expression. You can always evaluate those expressions at various points to gain confidence, but it's often easier to implement a numeric differentiation module and use it everywhere. One place where you really want to use it is in unit tests. You should unit test all of your derivative implementations using numeric differentiation. Unit tests will verify their correctness and ensure that changes to the code never break anything. Once your function value implementation diverges from the gradient and Hessian computations, all bets are off. Those are hard bugs to reason through.
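
As an illustration of that kind of unit test, here is a minimal central-difference checker sketch (a generic recipe, not the course's testing code), applied to the Problem 1 gradient assumed above:

    import numpy as np

    def numeric_gradient(f, x, eps=1e-6):
        # Central differences: [f(x + eps e_k) - f(x - eps e_k)] / (2 eps) for each k
        g = np.zeros_like(x, dtype=float)
        for k in range(x.size):
            e = np.zeros_like(x, dtype=float)
            e[k] = eps
            g[k] = (f(x + e) - f(x - e)) / (2.0 * eps)
        return g

    x = np.array([0.3, -1.2, 2.0])
    assert np.allclose(numeric_gradient(softmax_norm, x),
                       softmax_norm_gradient(x), atol=1e-5)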

The last tool we'll discuss is automatic differentiation. Automatic differentiation is different from both symbolic and numeric differentiation; you can think of it as lying somewhat between the two. The idea's been around for a very long time (at least dating back to the 80's), but, outside the backpropagation algorithm for Neural Networks in machine learning (which is an independently developed instance of automatic differentiation used to calculate the gradient of the Neural Network), we're only now seeing a strong resurgence of its use within the machine learning and robotics communities. The basic idea is that the chain rule, product rule, and all other rules for differentiation are very regular and algorithmic. If you implement a function evaluation using a collection of well-known primitives (such as exp(), sin(), cos(), log()), then while we're traversing the execution tree to evaluate the function, we can keep track of just a little bit more information along the way to additionally, simultaneously, compute the gradient, and even the Hessian. These tools are very powerful, and in some cases you can use them to replace derivative calculations. Implement a function from primitives that support automatic differentiation, and you can call a pre-built method to calculate the value and gradient automatically. It'll calculate both the function value and the corresponding gradient at a given point with no effort from you at all. Unfortunately, the problem of finding the optimal (i.e. most efficient) computation tree for derivative calculations is intractable, and you can usually find a computationally more efficient way to directly implement the gradient and (especially) the Hessian that automatic differentiators probably wouldn't find. But automatic differentiation is significantly more efficient than symbolic differentiation, since it's just computing the value of the derivatives at a particular point, and not the entire symbolic expression giving the gradient everywhere. And often in modern applications the networks of functions are complicated enough that it may make sense to trade off computational efficiency for speed of implementation and correctness. So use it if it makes sense; you can really only get a feel for the trade-offs by playing around with it. Otherwise, I'd recommend numeric differentiation over symbolic differentiation, since it's a more general and reliable tool for unit testing your code.

2 Problem 2: Gradient and Hessian Calculations for Machine Learning

2.1 Sigmoid derivatives

We present the derivatives as a lemma, and include the derivation below.

Lemma 1 (Sigmoid analysis). Let $\sigma(x) = \frac{1}{1 + e^{-x}}$. Its first derivative is

  $\frac{d\sigma}{dx} = \sigma(1 - \sigma)$,   (15)

and its second derivative is

  $\frac{d^2\sigma}{dx^2} = (1 - 2\sigma)\,\sigma(1 - \sigma)$.   (16)

Proof. We derive the result by straightforward calculation.

  $\frac{d\sigma}{dx} = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \left(1 + e^{-x}\right)^{-2} e^{-x} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(1 - \sigma)$.

For the second derivative we have

  $\frac{d^2\sigma}{dx^2} = \frac{d}{dx}\left[\frac{e^{-x}}{\left(1 + e^{-x}\right)^2}\right] = \frac{-e^{-x}\left(1 + e^{-x}\right)^2 + 2\left(1 + e^{-x}\right)e^{-x}\,e^{-x}}{\left(1 + e^{-x}\right)^4} = -\frac{e^{-x}}{\left(1 + e^{-x}\right)^2} + \frac{2e^{-2x}}{\left(1 + e^{-x}\right)^3}$

  $\qquad = -\sigma(1 - \sigma) + 2\sigma(1 - \sigma)^2 = \sigma(1 - \sigma)\left[2(1 - \sigma) - 1\right] = (1 - 2\sigma)\,\sigma(1 - \sigma)$.
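
These closed forms are easy to verify against finite differences; a quick sketch (the helper names are mine):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_d1(x):
        s = sigmoid(x)
        return s * (1.0 - s)                       # Equation 15

    def sigmoid_d2(x):
        s = sigmoid(x)
        return (1.0 - 2.0 * s) * s * (1.0 - s)     # Equation 16

    eps = 1e-4
    for x in [-2.0, 0.0, 1.5]:
        fd1 = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
        fd2 = (sigmoid(x + eps) - 2 * sigmoid(x) + sigmoid(x - eps)) / eps**2
        assert abs(fd1 - sigmoid_d1(x)) < 1e-6
        assert abs(fd2 - sigmoid_d2(x)) < 1e-4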

2.2 Logistic regression

Generally, for any $y \in \{-1, 1\}$, we can say

  $1 - p(y \mid x, w) = 1 - \frac{1}{1 + e^{-y w^T x}} = \frac{1 + e^{-y w^T x} - 1}{1 + e^{-y w^T x}} = \frac{e^{-y w^T x}}{1 + e^{-y w^T x}} = \frac{1}{e^{y w^T x} + 1} = \frac{1}{1 + e^{-\tilde{y} w^T x}} = p(\tilde{y} \mid x, w)$,  where $\tilde{y} = -y$.

So, as the problem explains, we can conclude that replacing $y$ with $-y$ is equivalent to finding the probability of not $y$.

As for the derivatives, these are actually slightly easier than the raw sigmoids above, because of the log. (Indeed, this also shows that perhaps there's an easier way to find the above derivatives, since $\frac{d}{dx}\log\sigma = \frac{\sigma'}{\sigma}$, which implies $\sigma' = \sigma\, \frac{d}{dx}\log\sigma$. But we'll leave that for you guys to play with.) The first partial of the logistic regression terms is

  $\frac{\partial}{\partial w_k} \log\left(1 + e^{-y_i w^T x_i}\right) = \frac{1}{1 + e^{-y_i w^T x_i}}\, e^{-y_i w^T x_i} \left(-y_i x_i^{(k)}\right) = -\left(1 - p(y_i \mid x_i, w)\right) y_i x_i^{(k)}$.

And the second partials are

  $\frac{\partial^2}{\partial w_l \partial w_k} \log\left(1 + e^{-y_i w^T x_i}\right) = \frac{\partial}{\partial w_l}\left[-\frac{e^{-y_i w^T x_i}}{1 + e^{-y_i w^T x_i}}\, y_i x_i^{(k)}\right] = p_i (1 - p_i)\, x_i^{(l)} x_i^{(k)}$,

where $p_i = p(y_i \mid x_i, w)$. Here we've also used the property that $y_i^2 = 1$ independent of the value of $y_i$, since $y_i \in \{-1, 1\}$. In vector and matrix form, the full gradient and Hessian become

  $\nabla_w \sum_{i=1}^N \log\left(1 + e^{-y_i w^T x_i}\right) = -\sum_{i=1}^N (1 - p_i)\, y_i x_i$   (17)

and

  $\nabla_w^2 \sum_{i=1}^N \log\left(1 + e^{-y_i w^T x_i}\right) = \sum_{i=1}^N p_i (1 - p_i)\, x_i x_i^T$.   (18)
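
Equations 17 and 18 vectorize directly; the following is a minimal sketch (my own variable names; X is the N-by-d matrix of inputs stacked row-wise, and y holds labels in {-1, +1}):

    import numpy as np

    def logistic_loss_grad_hess(w, X, y):
        # Loss sum_i log(1 + exp(-y_i w^T x_i)), with gradient (Eq. 17) and Hessian (Eq. 18)
        margins = y * (X @ w)                        # y_i w^T x_i
        p = 1.0 / (1.0 + np.exp(-margins))           # p_i = p(y_i | x_i, w)
        loss = np.sum(np.log1p(np.exp(-margins)))
        grad = -X.T @ ((1.0 - p) * y)                # -sum_i (1 - p_i) y_i x_i
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # sum_i p_i (1 - p_i) x_i x_i^T
        return loss, grad, hess

A natural unit test here is to compare grad against the finite-difference checker sketched in Section 1.3.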

3 Problem 3(A): The geometry of revolute manipulators

Taking the time derivative of the differentiable map $\phi : \mathbb{R}^n \to \mathbb{R}^3$ gives $\dot{x} = J_\phi \dot{q}$. Thus, just from calculus, we see that this Jacobian matrix $J_\phi$ is a linear map that transforms velocities in the space of the joints to velocities at the end-effector. Given a velocity $\dot{q}$, we simply multiply by $J_\phi$ and we get back a velocity vector $\dot{x}$ sitting at the end-effector that tells us the corresponding direction of motion of that end-effector point.

In the other direction, now assume $J_\phi$ is full rank and we're given a desired velocity $\dot{x}_d$. What types of movements of the joints make the end-effector move in that direction? Since $J_\phi$ is full rank, we can write its SVD as

  $J_\phi = U \begin{bmatrix} S & 0 \end{bmatrix} \begin{bmatrix} V_{/\!/}^T \\ V_\perp^T \end{bmatrix}$,   (19)

as we've seen before. In this case, though, $U \in \mathbb{R}^{3 \times 3}$ is a full square matrix since the matrix is full rank. The general solution to this linear system $\dot{x}_d = J_\phi \dot{q}$ is then

  $\dot{q} = V_{/\!/} S^{-1} U^T \dot{x}_d + V_\perp \beta$,   (20)

for any vector of coefficients $\beta$. Any of these motions will do: a small motion in the direction of $\dot{q}$ will move the end-effector in the desired direction, simply because we're exploiting the linear relationship between velocities in the joint space and velocities at the end-effector.

Note that this solution matrix $V_{/\!/} S^{-1} U^T$ is often called the Moore-Penrose pseudoinverse, and is written

  $J_\phi^\dagger = J_\phi^T \left(J_\phi J_\phi^T\right)^{-1}$.   (21)

One may multiply the expression out in terms of the SVD above in Equation 19 to get insight into why it works. It calculates the matrix $V_{/\!/} S^{-1} U^T$ without actually having to calculate the SVD. Additionally, the matrix $I - J_\phi^\dagger J_\phi$ projects any arbitrary vector $v$ onto the null space of $J_\phi$, so that $\left(I - J_\phi^\dagger J_\phi\right) v = V_\perp \beta$ for some $\beta$. It's often useful in practice to choose some direction in joint space $\dot{q}_d$ that we'd like to move in (for instance, it might be a direction pointing back toward some default configuration), and calculate

  $\dot{q} = J_\phi^\dagger \dot{x}_d + \left(I - J_\phi^\dagger J_\phi\right) \dot{q}_d$   (22)

to resolve the null-space redundancy of the problem.
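
In code, Equation 22 is a couple of lines on top of standard linear-algebra routines. A sketch (assuming NumPy; np.linalg.pinv computes the Moore-Penrose pseudoinverse and also covers the reduced-rank, least-squares case mentioned at the end of this section):

    import numpy as np

    def redundancy_resolved_qdot(J, xdot_d, qdot_d):
        # Equation 22: J^+ xdot_d + (I - J^+ J) qdot_d
        J_pinv = np.linalg.pinv(J)
        null_space_projector = np.eye(J.shape[1]) - J_pinv @ J
        return J_pinv @ xdot_d + null_space_projector @ qdot_d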

These linear algebra results can easily be turned into an algorithm for controlling the robot's end-effector to a point: at each iteration, simply choose a desired velocity for the end-effector that takes the robot in the right direction, and move the joints a small step in the resulting joint-space direction that implements that desired end-effector movement. How you specifically implement that procedure is flexible. But note that a potential function of the form $\psi(x_d - x)$, where $\psi$ is the softmax function as defined in Problem 1, creates a field of vectors (negative gradient vectors) that nicely converge on $x_d$. If you're familiar with differential equations, you'll recognize this field of negative potential gradient vectors as a differential equation whose integral curves (i.e. the curves you get from always instantaneously following the directions specified by the vector field) converge on $x_d$. So the problem of controlling the robot to $x_d$ simply becomes a problem of trying to numerically follow one of these integral curves by choosing appropriate joint velocity vectors $\dot{q}$. There are a number of numerical integration tools readily available to choose from, such as Euler integration (a particularly simple choice) or Runge-Kutta (a more numerically robust algorithm).

And finally, when the Jacobian is reduced rank, we can't actually fully solve the problem. The best we can do is solve the least squares problem to find the best-fit motion within the achievable set of end-effector velocities.

4 Problem 4: Inverse function theorem

Suppose $\phi : \mathbb{R}^n \to \mathbb{R}^n$. At the domain point $x_0$ we have a corresponding point in the co-domain $y_0 = \phi(x_0)$. The first-order Taylor approximation of this nonlinear map around the point $x_0$ gives

  $\phi(x) \approx y = y_0 + J_\phi (x - x_0)$,   (23)

where $J_\phi$ is the Jacobian at $x_0$. If $J_\phi$ is invertible, we can think of this linear map as a bijective connection between the two spaces. Any notion of directionality (mapping from $x$ to $y$ vs. the reverse mapping from $y$ to $x$) is simply an artifact of notation. This bijective linear map creates an association between points in the domain and points in the co-domain, and we can easily calculate an expression representing how points move from $y$ to $x$ by simply solving this linear system for $x$ in terms of $y$. Doing so gives

  $x = x_0 + J_\phi^{-1} (y - y_0)$.   (24)

This new expression looks like a Taylor expansion of a mapping from $y$ to $x$, so it suggests that perhaps it is the first-order Taylor expansion of the inverse map. If so, then it must be that $J_\phi^{-1}$ is the Jacobian of that inverse map. These are just heuristic arguments, but they demonstrate that the first-order Taylor approximation can reveal structure of the nonlinear map that isn't evident from the original expression. That first-order Taylor approximation connects the nonlinear function back to linear algebra and allows us to analyze the map locally using our linear algebra tools. In the above case, we see that it tells us something about the inverse of the map, but more generally, even when we're moving between $\mathbb{R}^n$ and a space of different dimensionality $\mathbb{R}^m$, we can still use the first-order Taylor approximation to understand the local structure of the map. When $m \neq n$ the best we can do is use pseudoinverses, but that's still quite handy when we have computational tools readily available while implementing complex systems.
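
The heuristic can be checked numerically on a concrete invertible map: the (finite-difference) Jacobian of the inverse map at $y_0$ should match the matrix inverse of the Jacobian at $x_0$. A small sketch, with an example map of my own choosing:

    import numpy as np

    def phi(x):
        # An invertible map R^2 -> R^2 (illustrative choice, not from the problem)
        return np.array([np.exp(x[0]), x[0] + x[1]**3])

    def phi_inverse(y):
        a = np.log(y[0])
        return np.array([a, np.cbrt(y[1] - a)])

    def numeric_jacobian(f, x, eps=1e-6):
        # Central-difference Jacobian, one column per input coordinate
        cols = []
        for k in range(x.size):
            e = np.zeros_like(x)
            e[k] = eps
            cols.append((f(x + e) - f(x - e)) / (2 * eps))
        return np.stack(cols, axis=1)

    x0 = np.array([0.3, 1.2])
    y0 = phi(x0)
    J = numeric_jacobian(phi, x0)
    J_of_inverse = numeric_jacobian(phi_inverse, y0)
    assert np.allclose(J_of_inverse, np.linalg.inv(J), atol=1e-4)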
