Mathematics for Intelligent Systems Lecture 5 Homework Solutions
(Advanced Calculus I: Derivatives and local geometry)

Nathan Ratliff

Nov 25, 2014

1 Problem 1: Gradient and Hessian Calculations

We've seen that the gradient and Hessian of h(x) = ||x|| are

    \nabla h(x) = \frac{x}{\|x\|}
    \qquad\text{and}\qquad
    \nabla^2 h(x) = \frac{1}{\|x\|}\left(I - \frac{x x^T}{\|x\|^2}\right).    (1)

We can plug these in when they become necessary below.

1.1 Calculating the derivatives

Write ψ(x) = log(e^{h(x)} + e^{-h(x)}). For generic differentiable h, we have

    \frac{\partial}{\partial x_k}\log\left(e^{h(x)} + e^{-h(x)}\right)
    = \frac{1}{e^h + e^{-h}}\left(e^h \frac{\partial h}{\partial x_k} - e^{-h}\frac{\partial h}{\partial x_k}\right)    (2)

    = \underbrace{\frac{e^h - e^{-h}}{e^h + e^{-h}}}_{\alpha(x)}\,\frac{\partial h}{\partial x_k}.    (3)

This expression shows that the kth partial of ψ(x) is just the kth partial of h weighted by a weight α(x). This weighting function has the nice property that as h(x) gets large, it approaches the value 1, and as h(x) approaches zero, it vanishes. We'll use those properties later on when analyzing limits. Stacking these partial derivatives, we get the following expression for the full gradient:

    \nabla\psi(x) = \alpha(x)\,\nabla h(x).    (4)
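As a quick sanity check on Equation 4 (a sketch I'm adding for illustration; NumPy and the helper names are mine, not part of the original solutions), we can compare the closed-form gradient α(x)∇h(x) against central finite differences:

```python
import numpy as np

def psi(x):
    # psi(x) = log(e^h + e^-h) with h(x) = ||x||
    h = np.linalg.norm(x)
    return np.log(np.exp(h) + np.exp(-h))

def grad_psi(x):
    h = np.linalg.norm(x)
    alpha = np.tanh(h)        # alpha(x) = (e^h - e^-h)/(e^h + e^-h)
    return alpha * x / h      # alpha(x) * grad h, with grad h = x/||x||

def finite_diff_grad(f, x, eps=1e-6):
    # Central differences in each coordinate direction.
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([0.3, -1.2, 0.7])
assert np.allclose(grad_psi(x), finite_diff_grad(psi, x), atol=1e-6)
```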
Now for the Hessian. We simply take the second partial of the expression in Equation 3:

    \frac{\partial^2\psi}{\partial x_l\,\partial x_k}
    = \frac{\partial}{\partial x_l}\left(\alpha(x)\frac{\partial h}{\partial x_k}\right)
    = \alpha(x)\frac{\partial^2 h}{\partial x_l\,\partial x_k} + \frac{\partial\alpha}{\partial x_l}\frac{\partial h}{\partial x_k}.    (5)

Already we know that the first of those terms is going to be the Hessian of h weighted by α(x). To calculate the second term, we need to compute the first partial of α(x):

    \frac{\partial\alpha}{\partial x_l}
    = \frac{\left(e^h + e^{-h}\right)\left(e^h + e^{-h}\right)\frac{\partial h}{\partial x_l} - \left(e^h - e^{-h}\right)\left(e^h - e^{-h}\right)\frac{\partial h}{\partial x_l}}{\left(e^h + e^{-h}\right)^2}    (6)

    = \left(1 - \frac{\left(e^h - e^{-h}\right)^2}{\left(e^h + e^{-h}\right)^2}\right)\frac{\partial h}{\partial x_l}    (7)

    = \left(1 - \alpha(x)^2\right)\frac{\partial h}{\partial x_l}.    (8)

Plugging this expression back into Equation 5, we get

    \frac{\partial^2\psi}{\partial x_l\,\partial x_k}
    = \alpha(x)\frac{\partial^2 h}{\partial x_l\,\partial x_k}
    + \left(1 - \alpha(x)^2\right)\frac{\partial h}{\partial x_l}\frac{\partial h}{\partial x_k}.    (9)

That first term is just the scaled Hessian of h as we mentioned above, and the second term is a scaled version of the outer product of h's gradient vector with itself. So the full Hessian matrix is

    \nabla^2\psi(x) = \alpha(x)\,\nabla^2 h(x) + \left(1 - \alpha(x)^2\right)\nabla h(x)\,\nabla h(x)^T.    (10)

Note that correctly tracking the indices k and l throughout the computation is just about as easy as using a primed notation, since these partial derivatives themselves represent just one-dimensional derivatives of the restricted function created by fixing the values of all variables except the one in question. Often people make the notation even simpler by writing ∂_{x_k} = ∂/∂x_k. Importantly, correctly tracking these indices gives us confidence that the final solution is properly arranged in matrix form. In this case, the Hessian is always symmetric, so dimensionality analysis can sometimes help us guess the right answer, but when we make things a little more difficult by finding the Jacobian of mappings of the form φ : R^n → R^n, for instance, or move even beyond that to tensors of higher-order derivatives, things get more hairy. It's very important, as the problems become harder, that you meticulously track those indices. (If you look up the Einstein summation convention for higher-order tensors, you'll notice that for hard problems this technique of manipulating the partials is really all you can do.) So I encourage you to be careful with your indices. It'll save you a lot of anguish as derivatives become more complex.
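Equation 10 can be checked the same way (again a sketch of my own, not part of the original solutions): build the closed-form Hessian from α, ∇h, and ∇²h, and compare it to a finite-difference Hessian of ψ.

```python
import numpy as np

def psi(x):
    h = np.linalg.norm(x)
    return np.log(np.exp(h) + np.exp(-h))

def hess_psi(x):
    # alpha * hess(h) + (1 - alpha^2) * grad(h) grad(h)^T, per Equation 10.
    h = np.linalg.norm(x)
    xh = x / h                                        # unit vector = grad h
    alpha = np.tanh(h)
    hess_h = (np.eye(len(x)) - np.outer(xh, xh)) / h  # hess of ||x||
    return alpha * hess_h + (1 - alpha**2) * np.outer(xh, xh)

def finite_diff_hess(f, x, eps=1e-4):
    # Central second differences for every (k, l) pair.
    n = len(x)
    H = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            ek = np.zeros(n); ek[k] = eps
            el = np.zeros(n); el[l] = eps
            H[k, l] = (f(x + ek + el) - f(x + ek - el)
                       - f(x - ek + el) + f(x - ek - el)) / (4 * eps**2)
    return H

x = np.array([0.5, -0.8, 1.1])
assert np.allclose(hess_psi(x), finite_diff_hess(psi, x), atol=1e-4)
```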
1.2 Taking the limits

Now for the limits. From the intuition we gained from graphing the function, we might expect the gradient and Hessian of ψ away from zero to increasingly just look like the gradient and Hessian of h(x) = ||x||. And then as we approach zero, the point of the cone formed by the graph of h(x) becomes increasingly rounded; we'd expect the gradient at least to be 0 at that point, and we might guess, if all goes right, that the Hessian is just the identity I since the rounding is uniform in all directions. We can even look at the second derivative in one dimension to better convince ourselves that the identity makes sense, but to be sure about all of these, beyond a modicum of doubt, we should perform the actual calculations.

First we note that α(x) approaches 1 as ||x|| → ∞, and α(0) = 0. The gradient of ψ is the same as the gradient of h, just scaled by α, so as ||x|| gets large, since α approaches 1, the gradient simply approaches the gradient of h as we thought. We can make a similar argument for the first term of the Hessian in Equation 10: α approaches 1 as ||x|| gets large, so that term approaches the Hessian of h. For the second term, we note that the gradients ∇h = x/||x|| all have bounded norm (specifically, the norm is always 1). That means each element of the vector is bounded in absolute value, too. Since α → 1 as ||x|| gets large, the weight on this term, 1 − α², approaches 0, so we can bound each element of the outer product ∇h ∇h^T above and below by a value that approaches 0 as ||x|| gets large. That term, then, must go to zero in this limit. In all, that tells us that ∇²ψ looks more and more like ∇²h as x gets bigger and bigger in norm, matching our intuition.

In the other direction, as x → 0, the weight α(x) approaches zero. So for the gradient ∇ψ = α∇h, since ∇h always has norm 1, we can make the same bounding argument for its elements as we made for the second term of the Hessian to say that all components of this gradient must approach zero.
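Before (or alongside) the calculations, we can numerically probe these guessed limits. This is an illustrative sketch of my own, not part of the original solutions: far from the origin the gradient and Hessian of ψ should approach those of h, and near the origin the gradient should vanish and the Hessian should approach the identity.

```python
import numpy as np

def grads(x):
    # Closed-form gradient and Hessian of psi from Equations 4 and 10.
    h = np.linalg.norm(x)
    xh = x / h
    alpha = np.tanh(h)
    g = alpha * xh
    H = (alpha * (np.eye(len(x)) - np.outer(xh, xh)) / h
         + (1 - alpha**2) * np.outer(xh, xh))
    return g, H

xhat = np.array([0.6, 0.8, 0.0])   # unit direction of approach

# Far from the origin: grad psi -> grad h = xhat, hess psi -> hess h.
g_far, H_far = grads(30.0 * xhat)
assert np.allclose(g_far, xhat, atol=1e-10)
assert np.allclose(H_far, (np.eye(3) - np.outer(xhat, xhat)) / 30.0, atol=1e-10)

# Near the origin: gradient vanishes, Hessian approaches the identity.
g_near, H_near = grads(1e-5 * xhat)
assert np.allclose(g_near, 0.0, atol=1e-4)
assert np.allclose(H_near, np.eye(3), atol=1e-3)
```

Note the Hessian-at-zero check only probes one direction of approach; the path-independence of the limit is what the L'Hôpital argument below establishes.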
The gradient, therefore, along all paths converging to x = 0, limits to 0. Note that from the viewpoint of the constituent functions that went into creating ψ, since ∇h is undefined at x = 0, it seems like the gradient of ψ should be undefined there as well. However, this function isn't really something constructed from its constituent components; it's a mapping from all points x to the corresponding values ψ(x) ∈ R. And from that perspective, we know nothing about what went into its construction. If we return to the definition of partial derivatives, we see that the derivative is actually the limit of the difference ratios as the perturbation of a variable x_k gets smaller and smaller. That limit is perfectly well defined for this function at x = 0, so we can conclude that the function is differentiable there. And the full gradient there, as we found in the above analysis, is 0. Thus, it's absolutely correct to return the value 0 in an implementation of the gradient evaluation at x = 0. This isn't just a trick; it's the right calculation for the function ψ.

The limit of the Hessian is a bit trickier. Naïvely, it's tempting to say that, looking back at Equation 10, since α → 0 as x → 0, the first term vanishes and the weight on the second term approaches 1, leaving us with just the unweighted
second term ∇h ∇h^T. In our case, that would suggest that the Hessian limits to (x/||x||)(x/||x||)^T, meaning that it entirely depends on the path we take while approaching x = 0. And we'd conclude that there is no definitive limit, and this function behaves poorly around the origin. But we would be wrong! The reason is that while α → 0, we need to remember that ∇²h = (1/||x||)(I − xx^T/||x||²) explodes to infinity. Thus, we have a situation where two separate factors are approaching zero and infinity simultaneously, so this is an indeterminate form. We need to use L'Hôpital! Denoting h = ||x|| and x̂ = x/||x||, we can collect the Hessian terms as follows:

    \nabla^2\psi = \frac{\alpha}{h}\left(I - \hat{x}\hat{x}^T\right) + \left(1 - \alpha^2\right)\hat{x}\hat{x}^T
    = \frac{\alpha}{h}\,I + \left(1 - \alpha^2 - \frac{\alpha}{h}\right)\hat{x}\hat{x}^T.    (11)

It's clear what all the α-dependent pieces of this expression do in the limit, except α/h. That term, as x → 0, approaches 0/0, so we need to use L'Hôpital to understand its limit. Expanding that term, we get

    \frac{\alpha}{h} = \frac{e^h - e^{-h}}{h\left(e^h + e^{-h}\right)}.    (12)

Again this fraction is an indeterminate form, so we can take the derivative of the numerator and denominator separately (with respect to h) and examine that ratio:

    \frac{e^h + e^{-h}}{\left(e^h + e^{-h}\right) + h\left(e^h - e^{-h}\right)}
    = \frac{1}{1 + h\alpha}.    (13)

Thus, this weird term limits to 1. (Note that we could also apply L'Hôpital directly to the fraction α/h without expanding it; the new ratio becomes 1 − α², which also limits to 1.) Returning to the Hessian in Equation 11, we see now that the Hessian limits to

    \nabla^2\psi(0) = \lim_{h\to 0}\left[\frac{\alpha}{h}\,I + \left(1 - \alpha^2 - \frac{\alpha}{h}\right)\hat{x}\hat{x}^T\right]
    = 1\cdot I + (1 - 0 - 1)\,\hat{x}\hat{x}^T = I.    (14)

So we conclude that the Hessian (as we intuitively guessed above) is actually I at x = 0.

1.3 A note on checking your work with a computer

There are a number of tools these days for checking that you've calculated and implemented these formulas right. I'll mention a few of them here since they came up during our discussion. By far, the easiest and most reliable is numeric differentiation (i.e., finite differencing). Section 5.3 of this week's lecture
notes (Multivariate Calculus I: Derivatives and local geometry) talks about that method at length. The great thing about numeric differentiation is that it's largely agnostic to simplification of the expression. No matter how convoluted you make your formula, as long as the implementation is correct, it'll spit out the right numbers, and numeric differentiation will report the same (approximate) values. You don't have to stare at a different expression and try to verify that it's the same.

On the other hand, Mathematica (and, by extension, Matlab) has very powerful symbolic differentiation tools that we can use. Unfortunately, those tools may do a poor job of simplification (which is a hard and largely ambiguous problem). They take symbolic representations of the function and report back their first and second derivatives, but in the end you may get grotesquely complicated strings of terms that you'll have to somehow relate back to your own expression. You can always evaluate those expressions at various points to gain confidence, but it's often easier to implement a numeric differentiation module and use it everywhere. One place where you really want to use it is in unit tests. You should unit test all of your derivative implementations using numeric differentiation. Unit tests will verify their correctness, and ensure that changes to the code never break anything. Once your function value implementation diverges from the gradient and Hessian computations, all bets are off. Those are hard bugs to reason through.

The last tool we'll discuss is automatic differentiation. Automatic differentiation is different from both symbolic and numeric differentiation; you can think of it as lying somewhat between the two.
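To make the "track a little extra information while evaluating" idea concrete, here's a toy forward-mode automatic differentiation sketch using dual numbers. This is my own illustration, not any particular library's API: each value carries its derivative along, and the product rule and chain rule are applied mechanically at each primitive.

```python
import math

class Dual:
    """A value together with its derivative with respect to the input."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def dexp(d):
    # Chain rule for the exp primitive: (e^u)' = e^u * u'
    return Dual(math.exp(d.val), math.exp(d.val) * d.dot)

# f(x) = x * exp(x), so f'(x) = exp(x) * (1 + x).
x = Dual(1.5, 1.0)            # seed the derivative dx/dx = 1
fx = x * dexp(x)
assert abs(fx.val - 1.5 * math.exp(1.5)) < 1e-9
assert abs(fx.dot - math.exp(1.5) * 2.5) < 1e-9
```

The derivative falls out of the evaluation itself; no symbolic expression for f' is ever built, which is exactly the point of contrast with symbolic differentiation.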
The idea's been around for a very long time (at least dating back to the '80s), but, outside the backpropagation algorithm for Neural Networks in machine learning (which is an independently developed instance of automatic differentiation used to calculate the gradient of the Neural Network), we're only now seeing a strong resurgence of its use within the machine learning and robotics communities. The basic idea is that the chain rule, product rule, and all other rules for differentiation are very regular and algorithmic. If you implement a function evaluation using a collection of well-known primitives (such as exp(), sin(), cos(), log()), then while we're traversing the execution tree to evaluate the function, we can keep track of just a little bit more information along the way to simultaneously compute the gradient, and even the Hessian. These tools are very powerful, and in some cases you can use them to replace derivative calculations entirely. Implement a function from primitives that support automatic differentiation, and you can call a pre-built method to calculate the value and gradient automatically. It'll compute both the function value and the corresponding gradient at a given point with no effort on your part at all. Unfortunately, the problem of finding the optimal (i.e., most efficient) computation tree for derivative calculations is intractable, and you can usually find a computationally more efficient way to directly implement the gradient and (especially) the Hessian that automatic differentiators probably wouldn't find. But automatic differentiation is significantly more efficient than symbolic differentiation since it's just computing the value of the derivatives at a particular point,
and not the entire symbolic expression giving the gradient everywhere. And often in modern applications the networks of functions are complicated enough that it may make sense to trade off computational efficiency for speed of implementation and correctness. So use it if it makes sense; you can really only get a feel for the trade-offs by playing around with it. Otherwise, I'd recommend numeric differentiation over symbolic differentiation since it's a more general and reliable tool for unit testing your code.

2 Problem 2: Gradient and Hessian Calculations for Machine Learning

2.1 Sigmoid derivatives

We present the derivatives as a lemma, and include the derivation below.

Lemma 1 (Sigmoid analysis). Let σ(x) = 1/(1 + e^{-x}). Its first derivative is

    \frac{d\sigma}{dx} = \sigma(1 - \sigma),    (15)

and its second derivative is

    \frac{d^2\sigma}{dx^2} = (1 - 2\sigma)\,\sigma(1 - \sigma).    (16)

Proof. We derive the result by straightforward calculation. For the first derivative,

    \frac{d}{dx}\sigma
    = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1}
    = \frac{e^{-x}}{\left(1 + e^{-x}\right)^2}
    = \frac{1}{1 + e^{-x}}\cdot\frac{e^{-x}}{1 + e^{-x}}
    = \sigma(1 - \sigma),

where we've used 1 − σ = e^{-x}/(1 + e^{-x}). For the second derivative, differentiating σ(1 − σ) with the product rule gives

    \frac{d^2}{dx^2}\sigma
    = \frac{d\sigma}{dx}(1 - \sigma) - \sigma\frac{d\sigma}{dx}
    = (1 - 2\sigma)\frac{d\sigma}{dx}
    = (1 - 2\sigma)\,\sigma(1 - \sigma).
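Per the note above about unit testing derivative implementations, the lemma checks out numerically against finite differences (a sketch of my own, not part of the original solutions):

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigma(x):
    return sigma(x) * (1 - sigma(x))              # Equation 15

def d2sigma(x):
    return (1 - 2 * sigma(x)) * sigma(x) * (1 - sigma(x))  # Equation 16

xs = np.linspace(-4, 4, 9)
eps = 1e-5
# Central first and second differences of sigma.
num1 = (sigma(xs + eps) - sigma(xs - eps)) / (2 * eps)
num2 = (sigma(xs + eps) - 2 * sigma(xs) + sigma(xs - eps)) / eps**2

assert np.allclose(dsigma(xs), num1, atol=1e-8)
assert np.allclose(d2sigma(xs), num2, atol=1e-5)
```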
2.2 Logistic regression

Generally, for any y ∈ {−1, 1}, we can say

    1 - p(y \mid x, w)
    = 1 - \frac{1}{1 + e^{-y w^T x}}
    = \frac{e^{-y w^T x}}{1 + e^{-y w^T x}}
    = \frac{1}{e^{y w^T x} + 1}
    = \frac{1}{1 + e^{-\tilde{y} w^T x}}
    = p(\tilde{y} \mid x, w),

where ỹ = −y. So, as the problem explains, we can conclude that replacing y with −y is equivalent to finding the probability of "not y".

As for the derivatives, these are actually slightly easier than the raw sigmoids above, because of the log. (Indeed, this also shows that perhaps there's an easier way to find the above derivatives, since (d/dx) log σ = σ'/σ, which implies σ' = σ (d/dx) log σ. But we'll leave that for you guys to play with.) The first partial of the logistic regression terms is

    \frac{\partial}{\partial w_k}\sum_i \log\left(1 + e^{-y_i w^T x_i}\right)
    = \sum_i \frac{e^{-y_i w^T x_i}}{1 + e^{-y_i w^T x_i}}\left(-y_i x_i^{(k)}\right)
    = -\sum_i \left(1 - p(y_i \mid x_i, w)\right) y_i x_i^{(k)}.

And the second partials are

    \frac{\partial^2}{\partial w_l\,\partial w_k}\sum_i \log\left(1 + e^{-y_i w^T x_i}\right)
    = \sum_i p_i (1 - p_i)\, y_i^2\, x_i^{(l)} x_i^{(k)}
    = \sum_i p_i (1 - p_i)\, x_i^{(l)} x_i^{(k)},

where p_i = p(y_i | x_i, w). Here we've also used the property that (y_i)² = 1 independent of the value of y_i, since y_i ∈ {−1, 1}. In vector and matrix form, the full gradient and Hessian become

    \nabla_w \sum_{i=1}^N \log\left(1 + e^{-y_i w^T x_i}\right) = -\sum_{i=1}^N (1 - p_i)\, y_i\, x_i    (17)

and

    \nabla_w^2 \sum_{i=1}^N \log\left(1 + e^{-y_i w^T x_i}\right) = \sum_{i=1}^N p_i (1 - p_i)\, x_i x_i^T.    (18)
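Equations 17 and 18 can be unit-tested against finite differences exactly as advocated in Problem 1's notes. This is a sketch with made-up random data (the variable names and the random setup are mine, not from the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                    # rows are the data points x_i
y = rng.choice(np.array([-1.0, 1.0]), size=20)  # labels in {-1, +1}

def loss(w):
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

def grad(w):
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))   # p_i = p(y_i | x_i, w)
    return -X.T @ ((1 - p) * y)              # Equation 17

def hess(w):
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))
    return (X * (p * (1 - p))[:, None]).T @ X  # Equation 18

w = rng.normal(size=3)

eps = 1e-6
g_num = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                  for e in np.eye(3)])
assert np.allclose(grad(w), g_num, atol=1e-5)

eps = 1e-5
H_num = np.column_stack([(grad(w + eps * e) - grad(w - eps * e)) / (2 * eps)
                         for e in np.eye(3)])
assert np.allclose(hess(w), H_num, atol=1e-5)
```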
3 Problem 3(A): The geometry of revolute manipulators

Taking the time derivative of the differentiable forward-kinematics map φ (from joint configurations in R^n to end-effector positions) gives ẋ = J_φ q̇. Thus, just from calculus, we see that this Jacobian matrix J_φ is a linear map that transforms velocities in the space of the joints to velocities at the end-effector. Given a velocity q̇, we simply multiply by J_φ and we get back a velocity vector ẋ sitting at the end-effector that tells us the corresponding direction of motion of that end-effector point.

In the other direction, now assume J_φ is full rank and we're given a desired velocity ẋ_d. What types of movements of the joints make the end-effector move in that direction? Since J_φ is full rank, we can write its SVD as

    J_\varphi = U \begin{bmatrix} S & 0 \end{bmatrix} \begin{bmatrix} V_{\parallel}^T \\ V_{\perp}^T \end{bmatrix},    (19)

as we've seen before. In this case, though, U ∈ R^{3×3} is a full square matrix since the matrix is full rank. The general solution to the linear system ẋ_d = J_φ q̇ is then

    \dot{q} = V_{\parallel} S^{-1} U^T \dot{x}_d + V_{\perp}\beta,    (20)

for any vector of coefficients β. Any of these motions will do: a small motion in the direction of q̇ will move the end-effector in the desired direction, simply because we're exploiting the linear relationship between velocities in the joint space and velocities at the end-effector. Note that this solution matrix V_∥ S^{-1} U^T is often called the Moore-Penrose pseudoinverse, and is written

    J_\varphi^{\dagger} = J_\varphi^T \left(J_\varphi J_\varphi^T\right)^{-1}.    (21)

One may multiply the expression out in terms of the SVD above in Equation 19 to get insight into why it works. It calculates the matrix V_∥ S^{-1} U^T without actually having to calculate the SVD. Additionally, the matrix I − J_φ† J_φ projects any arbitrary vector v onto the null space of J_φ, so that (I − J_φ† J_φ)v = V_⊥ β for some β. It's often useful in practice to choose some direction in joint space q̇_d that we'd like to move in (for instance, it might be a direction pointing back toward some default configuration), and calculate

    \dot{q} = J_\varphi^{\dagger}\dot{x}_d + \left(I - J_\varphi^{\dagger} J_\varphi\right)\dot{q}_d    (22)

to resolve the null space redundancy of the problem.
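The pseudoinverse and null-space projection behave exactly as claimed, which we can verify numerically on a random wide Jacobian (a sketch of my own, not part of the original solutions; the 3×7 shape is an arbitrary stand-in for a redundant arm):

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.normal(size=(3, 7))              # full-rank wide Jacobian (n = 7 joints)
J_pinv = J.T @ np.linalg.inv(J @ J.T)    # Moore-Penrose pseudoinverse J^T (J J^T)^{-1}

xdot_d = rng.normal(size=3)              # desired end-effector velocity
qdot_d = rng.normal(size=7)              # preferred joint-space direction

# Resolve the null-space redundancy:
qdot = J_pinv @ xdot_d + (np.eye(7) - J_pinv @ J) @ qdot_d

# The joint velocity still realizes the desired end-effector velocity...
assert np.allclose(J @ qdot, xdot_d)
# ...because the second term lies in the null space of J.
assert np.allclose(J @ (np.eye(7) - J_pinv @ J) @ qdot_d, 0.0, atol=1e-10)
```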
These linear algebra results can easily be turned into an algorithm for controlling the robot's end-effector to a point: simply choose, at each iteration, a desired velocity for the end-effector that takes the robot in the right direction, and move the joints a small step in the resulting joint-space direction that implements that desired end-effector movement. How you specifically implement that procedure is flexible. But note
that a potential function of the form ψ(x_d − x), where ψ is the softmax function as defined in Problem 1, creates a field of vectors (negative gradient vectors) that nicely converge on x_d. If you're familiar with differential equations, you'll recognize this field of negative potential gradient vectors as a differential equation whose integral curves (i.e., the curves you get from always instantaneously following the directions specified by the vector field) converge on x_d. So the problem of controlling the robot to x_d simply becomes a problem of trying to numerically follow one of these integral curves by choosing appropriate joint velocity vectors q̇. There are a number of numerical integration tools readily available to choose from, such as Euler integration (a particularly simple choice) or Runge-Kutta (a more numerically robust algorithm).

And finally, when the Jacobian is rank deficient, we can't actually fully solve the problem. The best we can do is solve the least squares problem to find the best-fit motion within the achievable set of end-effector velocities.

4 Problem 4: Inverse function theorem

Suppose φ : R^n → R^n. At the domain point x_0 we have a corresponding point in the co-domain, y_0 = φ(x_0). The first-order Taylor approximation of this nonlinear map around the point x_0 gives

    \varphi(x) \approx y = y_0 + J_\varphi(x - x_0),    (23)

where J_φ is the Jacobian at x_0. If J_φ is invertible, we can think of this linear map as a bijective connection between the two spaces. Any notion of directionality (mapping from x to y vs. the reverse mapping from y to x) is simply an artifact of notation. This bijective linear map creates an association between points in the domain and points in the co-domain, and we can easily calculate an expression representing how points move from y to x by simply solving this linear system for x in terms of y. Doing so gives

    x = x_0 + J_\varphi^{-1}(y - y_0).    (24)
This new expression looks like a Taylor expansion of a mapping from y to x, so it suggests that perhaps it is the first-order Taylor expansion of the inverse map. If so, then it must be that J_φ^{-1} is the Jacobian of that inverse map. These are just heuristic arguments, but they demonstrate that the first-order Taylor approximation can reveal structure of the nonlinear map that isn't evident from the original expression. That first-order Taylor approximation connects the nonlinear function back to linear algebra and allows us to analyze the map locally using our linear algebra tools. In the above case, it tells us something about the inverse of the map, but more generally, even when we're moving between R^n and a space of different dimensionality R^m, we can still use the first-order Taylor approximation to understand the local structure of the map. When m ≠ n the best we can do is use pseudoinverses, but that's still quite handy when we have computational tools readily available while implementing complex systems.
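The heuristic conclusion, that the Jacobian of the inverse at y_0 = φ(x_0) is the inverse of the Jacobian of φ at x_0, can be illustrated numerically. The polar-to-Cartesian map below is an example of my own choosing, not from the problem; both it and its inverse are known in closed form, so we can differentiate each numerically and compare:

```python
import numpy as np

def phi(p):
    # (r, theta) -> (x, y)
    r, th = p
    return np.array([r * np.cos(th), r * np.sin(th)])

def phi_inv(q):
    # (x, y) -> (r, theta)
    x, y = q
    return np.array([np.hypot(x, y), np.arctan2(y, x)])

def jac(f, p, eps=1e-6):
    # Numeric Jacobian via central differences, one column per input direction.
    cols = [(f(p + eps * e) - f(p - eps * e)) / (2 * eps) for e in np.eye(len(p))]
    return np.stack(cols, axis=1)

x0 = np.array([2.0, 0.7])
y0 = phi(x0)

# Jacobian of the inverse map at y0 equals the inverse Jacobian of phi at x0.
assert np.allclose(jac(phi_inv, y0), np.linalg.inv(jac(phi, x0)), atol=1e-6)
```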
Regression, part II I. What does it all mean? A) Notice that so far all we ve done is math. 1) One can calculate the Least Squares Regression Line for anything, regardless of any assumptions. 2) But, if
More informationDIFFERENTIAL EQUATIONS
DIFFERENTIAL EQUATIONS Basic Concepts Paul Dawkins Table of Contents Preface... Basic Concepts... 1 Introduction... 1 Definitions... Direction Fields... 8 Final Thoughts...19 007 Paul Dawkins i http://tutorial.math.lamar.edu/terms.aspx
More informationQuadratic Equations Part I
Quadratic Equations Part I Before proceeding with this section we should note that the topic of solving quadratic equations will be covered in two sections. This is done for the benefit of those viewing
More informationAnswers for Calculus Review (Extrema and Concavity)
Answers for Calculus Review 4.1-4.4 (Extrema and Concavity) 1. A critical number is a value of the independent variable (a/k/a x) in the domain of the function at which the derivative is zero or undefined.
More informationAlgebra. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.
This document was written and copyrighted by Paul Dawkins. Use of this document and its online version is governed by the Terms and Conditions of Use located at. The online version of this document is
More informationBindel, Fall 2011 Intro to Scientific Computing (CS 3220) Week 3: Wednesday, Jan 9
Problem du jour Week 3: Wednesday, Jan 9 1. As a function of matrix dimension, what is the asymptotic complexity of computing a determinant using the Laplace expansion (cofactor expansion) that you probably
More information2. If the values for f(x) can be made as close as we like to L by choosing arbitrarily large. lim
Limits at Infinity and Horizontal Asymptotes As we prepare to practice graphing functions, we should consider one last piece of information about a function that will be helpful in drawing its graph the
More informationAn analogy from Calculus: limits
COMP 250 Fall 2018 35 - big O Nov. 30, 2018 We have seen several algorithms in the course, and we have loosely characterized their runtimes in terms of the size n of the input. We say that the algorithm
More informationThe Derivative of a Function
The Derivative of a Function James K Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University March 1, 2017 Outline A Basic Evolutionary Model The Next Generation
More informationNotes for CS542G (Iterative Solvers for Linear Systems)
Notes for CS542G (Iterative Solvers for Linear Systems) Robert Bridson November 20, 2007 1 The Basics We re now looking at efficient ways to solve the linear system of equations Ax = b where in this course,
More informationContinuity and One-Sided Limits
Continuity and One-Sided Limits 1. Welcome to continuity and one-sided limits. My name is Tuesday Johnson and I m a lecturer at the University of Texas El Paso. 2. With each lecture I present, I will start
More informationChapter 2. Mathematical Reasoning. 2.1 Mathematical Models
Contents Mathematical Reasoning 3.1 Mathematical Models........................... 3. Mathematical Proof............................ 4..1 Structure of Proofs........................ 4.. Direct Method..........................
More informationPartial Fractions. June 27, In this section, we will learn to integrate another class of functions: the rational functions.
Partial Fractions June 7, 04 In this section, we will learn to integrate another class of functions: the rational functions. Definition. A rational function is a fraction of two polynomials. For example,
More informationPattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore
Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal
More informationAlgebra & Trig Review
Algebra & Trig Review 1 Algebra & Trig Review This review was originally written for my Calculus I class, but it should be accessible to anyone needing a review in some basic algebra and trig topics. The
More informationMITOCW ocw nov2005-pt1-220k_512kb.mp4
MITOCW ocw-3.60-03nov2005-pt1-220k_512kb.mp4 PROFESSOR: All right, I would like to then get back to a discussion of some of the basic relations that we have been discussing. We didn't get terribly far,
More informationBasics of Proofs. 1 The Basics. 2 Proof Strategies. 2.1 Understand What s Going On
Basics of Proofs The Putnam is a proof based exam and will expect you to write proofs in your solutions Similarly, Math 96 will also require you to write proofs in your homework solutions If you ve seen
More informationAP Calculus AB Summer Assignment
AP Calculus AB Summer Assignment Name: When you come back to school, it is my epectation that you will have this packet completed. You will be way behind at the beginning of the year if you haven t attempted
More informationSolving with Absolute Value
Solving with Absolute Value Who knew two little lines could cause so much trouble? Ask someone to solve the equation 3x 2 = 7 and they ll say No problem! Add just two little lines, and ask them to solve
More informationMATH 310, REVIEW SHEET 2
MATH 310, REVIEW SHEET 2 These notes are a very short summary of the key topics in the book (and follow the book pretty closely). You should be familiar with everything on here, but it s not comprehensive,
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationOne-to-one functions and onto functions
MA 3362 Lecture 7 - One-to-one and Onto Wednesday, October 22, 2008. Objectives: Formalize definitions of one-to-one and onto One-to-one functions and onto functions At the level of set theory, there are
More informationBridging the gap between GCSE and A level mathematics
Bridging the gap between GCSE and A level mathematics This booklet is designed to help you revise important algebra topics from GCSE and make the transition from GCSE to A level a smooth one. You are advised
More informationThe Jacobian. Jesse van den Kieboom
The Jacobian Jesse van den Kieboom jesse.vandenkieboom@epfl.ch 1 Introduction 1 1 Introduction The Jacobian is an important concept in robotics. Although the general concept of the Jacobian in robotics
More informationExperiment 2 Electric Field Mapping
Experiment 2 Electric Field Mapping I hear and I forget. I see and I remember. I do and I understand Anonymous OBJECTIVE To visualize some electrostatic potentials and fields. THEORY Our goal is to explore
More informationMath 115 Spring 11 Written Homework 10 Solutions
Math 5 Spring Written Homework 0 Solutions. For following its, state what indeterminate form the its are in and evaluate the its. (a) 3x 4x 4 x x 8 Solution: This is in indeterminate form 0. Algebraically,
More informationMATH 308 COURSE SUMMARY
MATH 308 COURSE SUMMARY Approximately a third of the exam cover the material from the first two midterms, that is, chapter 6 and the first six sections of chapter 7. The rest of the exam will cover the
More informationDesigning Information Devices and Systems I Fall 2018 Lecture Notes Note Introduction to Linear Algebra the EECS Way
EECS 16A Designing Information Devices and Systems I Fall 018 Lecture Notes Note 1 1.1 Introduction to Linear Algebra the EECS Way In this note, we will teach the basics of linear algebra and relate it
More informationComputing Neural Network Gradients
Computing Neural Network Gradients Kevin Clark 1 Introduction The purpose of these notes is to demonstrate how to quickly compute neural network gradients in a completely vectorized way. It is complementary
More informationCONSTRUCTION OF sequence of rational approximations to sets of rational approximating sequences, all with the same tail behaviour Definition 1.
CONSTRUCTION OF R 1. MOTIVATION We are used to thinking of real numbers as successive approximations. For example, we write π = 3.14159... to mean that π is a real number which, accurate to 5 decimal places,
More informationScalar Fields and Gauge
Physics 411 Lecture 23 Scalar Fields and Gauge Lecture 23 Physics 411 Classical Mechanics II October 26th, 2007 We will discuss the use of multiple fields to expand our notion of symmetries and conservation.
More informationter. on Can we get a still better result? Yes, by making the rectangles still smaller. As we make the rectangles smaller and smaller, the
Area and Tangent Problem Calculus is motivated by two main problems. The first is the area problem. It is a well known result that the area of a rectangle with length l and width w is given by A = wl.
More informationMITOCW ocw f99-lec23_300k
MITOCW ocw-18.06-f99-lec23_300k -- and lift-off on differential equations. So, this section is about how to solve a system of first order, first derivative, constant coefficient linear equations. And if
More informationHow to Use Calculus Like a Physicist
How to Use Calculus Like a Physicist Physics A300 Fall 2004 The purpose of these notes is to make contact between the abstract descriptions you may have seen in your calculus classes and the applications
More informationPageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper)
PageRank: The Math-y Version (Or, What To Do When You Can t Tear Up Little Pieces of Paper) In class, we saw this graph, with each node representing people who are following each other on Twitter: Our
More informationCS229 Supplemental Lecture notes
CS229 Supplemental Lecture notes John Duchi 1 Boosting We have seen so far how to solve classification (and other) problems when we have a data representation already chosen. We now talk about a procedure,
More informationLECTURE 10: REVIEW OF POWER SERIES. 1. Motivation
LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the
More informationMITOCW ocw-18_02-f07-lec02_220k
MITOCW ocw-18_02-f07-lec02_220k The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free.
More informationSlope Fields: Graphing Solutions Without the Solutions
8 Slope Fields: Graphing Solutions Without the Solutions Up to now, our efforts have been directed mainly towards finding formulas or equations describing solutions to given differential equations. Then,
More informationL Hopital s Rule. We will use our knowledge of derivatives in order to evaluate limits that produce indeterminate forms.
L Hopital s Rule We will use our knowledge of derivatives in order to evaluate its that produce indeterminate forms. Indeterminate Limits Main Idea x c f x g x If, when taking the it as x c, you get an
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationDesigning Information Devices and Systems I Fall 2018 Lecture Notes Note 21
EECS 16A Designing Information Devices and Systems I Fall 2018 Lecture Notes Note 21 21.1 Module Goals In this module, we introduce a family of ideas that are connected to optimization and machine learning,
More informationVector, Matrix, and Tensor Derivatives
Vector, Matrix, and Tensor Derivatives Erik Learned-Miller The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher order tensors (arrays with three dimensions
More information15. LECTURE 15. I can calculate the dot product of two vectors and interpret its meaning. I can find the projection of one vector onto another one.
5. LECTURE 5 Objectives I can calculate the dot product of two vectors and interpret its meaning. I can find the projection of one vector onto another one. In the last few lectures, we ve learned that
More informationDiscrete Structures Proofwriting Checklist
CS103 Winter 2019 Discrete Structures Proofwriting Checklist Cynthia Lee Keith Schwarz Now that we re transitioning to writing proofs about discrete structures like binary relations, functions, and graphs,
More informationCALCULUS I. Review. Paul Dawkins
CALCULUS I Review Paul Dawkins Table of Contents Preface... ii Review... 1 Introduction... 1 Review : Functions... Review : Inverse Functions...1 Review : Trig Functions...0 Review : Solving Trig Equations...7
More information