Notes on Neural Networks

Paulo Eduardo Rauber, 2015

1 Artificial neurons

Consider the data set $D = \{(x_i, y_i) \mid i \in \{1, \ldots, n\}, x_i \in \mathbb{R}^m, y_i \in \mathbb{R}^d\}$. The task of supervised learning consists of finding a function $f: \mathbb{R}^m \to \mathbb{R}^d$ that is able to generalize from the examples in $D$. Supervised learning has been successful at image classification, natural language processing, and many other tasks.

Some types of artificial neural networks can be seen as a function $f: \mathbb{R}^m \to \mathbb{R}^d$ computed by a network of artificial neurons. The particular way in which the neurons are connected defines the architecture of the network.

Artificial neurons are simplified models of real neurons. The objective of these models is not to simulate the behavior of real neurons, but to replicate some of their high-level functionality.

Consider an input vector $x = (x_1, \ldots, x_m) \in \mathbb{R}^m$ to an artificial neuron. We define a weight vector $w = (w_1, \ldots, w_m) \in \mathbb{R}^m$, a bias $b \in \mathbb{R}$, and a threshold $\theta \in \mathbb{R}$. Using these definitions, the following expressions give the outputs of different models of artificial neurons:

Linear neurons: $b + \sum_i x_i w_i = b + x \cdot w$;

Rectified linear neurons: $\max(0, b + x \cdot w)$;

Binary thresholded neurons: $1$ if $b + x \cdot w \geq \theta$, $0$ otherwise;

Sigmoid neurons: $\frac{1}{1 + e^{-(b + x \cdot w)}}$;

Stochastic binary neurons (sampling): $P(Y = 1 \mid x; w, b) = \frac{1}{1 + e^{-(b + x \cdot w)}}$.

The bias term $b$ is the negative intercept of the affine hyperplane $H = \{(x_1, \ldots, x_m) \mid x_1 w_1 + \ldots + x_m w_m + b = 0\}$. As a consequence, binary thresholded neurons separate $\mathbb{R}^m$ into two half-spaces by an affine hyperplane. Sigmoid neurons perform an analogous, but smooth, assignment of the input $x$ to one of the half-spaces.

Consider binary thresholded, sigmoid, or stochastic binary neurons. Intuitively, a high bias indicates that a neuron is easy to excite (output a positive value); a low bias indicates the opposite. If each input element $x_i$ is interpreted as a feature, a positive weight $w_i$ indicates that the feature excites the neuron, and a negative weight $w_i$ indicates that the feature inhibits the neuron.

Supervised learning in artificial neural networks is performed by updating the weight vectors of the artificial neurons to minimize a metric of prediction error on the training data set. When applied to discriminate between $d$ classes, the artificial neural network may output $d$ real values, which correspond to the confidence that the input belongs to each class.
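As a concrete illustration, the following NumPy sketch computes the output of each neuron model above; the function names and the example values are arbitrary choices, not part of the original formulation.

```python
import numpy as np

def linear(x, w, b):
    """Linear neuron: b + x.w"""
    return b + np.dot(x, w)

def relu(x, w, b):
    """Rectified linear neuron: max(0, b + x.w)"""
    return max(0.0, b + np.dot(x, w))

def binary_threshold(x, w, b, theta=0.0):
    """Binary thresholded neuron: 1 if b + x.w >= theta, else 0"""
    return 1.0 if b + np.dot(x, w) >= theta else 0.0

def sigmoid_neuron(x, w, b):
    """Sigmoid neuron: 1 / (1 + exp(-(b + x.w)))"""
    return 1.0 / (1.0 + np.exp(-(b + np.dot(x, w))))

def stochastic_binary(x, w, b, rng=None):
    """Stochastic binary neuron: outputs 1 with probability sigmoid(b + x.w)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return float(rng.random() < sigmoid_neuron(x, w, b))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
b = 0.2
print(linear(x, w, b), relu(x, w, b), binary_threshold(x, w, b), sigmoid_neuron(x, w, b))
```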

2 Feedforward neural networks

This section describes feedforward neural networks, an important class of artificial neural networks. Consider the data set $D = \{(x_i, y_i) \mid i \in \{1, \ldots, n\}, x_i \in \mathbb{R}^m, y_i \in \mathbb{R}^d\}$. Let $L$ represent the number of layers in the network and $n^{(l)}$ represent the number of neurons in layer $l$, with $n^{(L)} = d$. We will refer to a neuron in layer $l$ by a corresponding number between $1$ and $n^{(l)}$. The neurons in the first layer are also called input units, the neurons in the output (last) layer are called output units, and the other neurons are called hidden units. Networks with more than 3 layers are called deep networks.

Let $w^{(l)}_{jk} \in \mathbb{R}$ represent the weight reaching neuron $j$ in layer $l$ from neuron $k$ in layer $l-1$. The order of the indices is counterintuitive, but makes the presentation simpler. Furthermore, let $b^{(l)}_j \in \mathbb{R}$ represent the bias for neuron $j$ in layer $l$. Consider a layer $l$, for $1 < l \leq L$, and a neuron $j$, for $1 \leq j \leq n^{(l)}$. The weighted input to neuron $j$ in layer $l$ is defined as

$$z^{(l)}_j = b^{(l)}_j + \sum_{k=1}^{n^{(l-1)}} w^{(l)}_{jk} a^{(l-1)}_k,$$

where the activation of neuron $j$ in layer $l > 1$ is defined as $a^{(l)}_j = \sigma(z^{(l)}_j)$, and $\sigma$ is a differentiable activation function. For example, consider the sigmoid activation function, defined by $\sigma(z) = \frac{1}{1 + e^{-z}}$. In this case, it is easy to show that $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ for every $z$.

It will be useful to define vectors (and a matrix) that represent quantities associated to each neuron in a given layer. The weighted input vector for layer $l > 1$ is defined as $z^{(l)} = (z^{(l)}_1, \ldots, z^{(l)}_{n^{(l)}})$, and the activation vector for layer $l$ is defined as $a^{(l)} = (a^{(l)}_1, \ldots, a^{(l)}_{n^{(l)}})$. Furthermore, we define the bias vectors $b^{(l)} = (b^{(l)}_1, \ldots, b^{(l)}_{n^{(l)}})$ and the weight matrices $W^{(l)}$, of dimension $n^{(l)} \times n^{(l-1)}$, as $(W^{(l)})_{jk} = w^{(l)}_{jk}$. Using these definitions, the output of each layer $l > 1$ can be written as

$$a^{(l)} = \sigma(W^{(l)} a^{(l-1)} + b^{(l)}),$$

where the activation function is applied element-wise. Given these definitions, the output of the feedforward neural network $f$ when $a^{(1)} = x$ is defined as the activation vector of the output layer: $f(x) = a^{(L)}$. This completes the definition of the feedforward neural network model. It is possible to show that a feedforward neural network with a single hidden layer of sigmoid neurons can approximate any real-valued continuous function defined on a compact subset of $\mathbb{R}^m$.

Let the cost $C$ be a differentiable function of the weights and biases. For example, consider the (halved) mean squared cost $C$ given by

$$C = \frac{1}{n} \sum_{(x,y) \in D} c, \qquad c = \frac{1}{2} \lVert a^{(L)} - y \rVert^2,$$

where $a^{(1)} = x$. It is easy to show that $\frac{\partial c}{\partial a^{(L)}_j} = a^{(L)}_j - y_j$. We define the task of learning the parameters (weights and biases) of a feedforward neural network as finding parameters that minimize $C$ for a given training data set $D$. The fact that the cost can be written as an average of costs for each element of the data set will be crucial to the proposed optimization procedure. The procedure requires the computation of partial derivatives of the cost with respect to weights and biases, which are usually computed by a technique called backpropagation.
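A minimal NumPy sketch of the forward pass $a^{(l)} = \sigma(W^{(l)} a^{(l-1)} + b^{(l)})$ and of the halved mean squared cost follows, assuming sigmoid activations in every layer; the layer sizes and random parameters are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Compute a^(L) for a feedforward network.
    weights[l] has shape (n_(l+1), n_l); biases[l] has shape (n_(l+1),)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)          # a^(l) = sigma(W^(l) a^(l-1) + b^(l))
    return a

def halved_mse(a, y):
    """c = 0.5 * ||a^(L) - y||^2 for a single observation."""
    return 0.5 * np.sum((a - y) ** 2)

# Tiny example: m = 3 inputs, one hidden layer with 4 neurons, d = 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(0, 1, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
biases = [rng.normal(0, 1, sizes[l + 1]) for l in range(len(sizes) - 1)]
x, y = rng.normal(size=3), np.array([1.0, 0.0])
a_L = forward(x, weights, biases)
print(a_L, halved_mse(a_L, y))
```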

Let the error of neuron $j$ in layer $l$ for a fixed $(x, y) \in D$ be defined as $\delta^{(l)}_j = \frac{\partial c}{\partial z^{(l)}_j}$. The error vector for the neurons in layer $l$ is denoted by $\delta^{(l)} = (\delta^{(l)}_1, \ldots, \delta^{(l)}_{n^{(l)}})$. We also let $\nabla_a c = (\frac{\partial c}{\partial a^{(L)}_1}, \ldots, \frac{\partial c}{\partial a^{(L)}_{n^{(L)}}})$ denote the gradient of $c$ with respect to $a^{(L)}$.

Backpropagation is a method for computing the partial derivatives of the cost function of a feedforward artificial neural network with respect to its parameters. The method is based solely on the following six statements, which we will demonstrate shortly:

$$\delta^{(L)} = \nabla_a c \odot \sigma'(z^{(L)}), \tag{1}$$
$$\delta^{(l)} = \left( (W^{(l+1)})^T \delta^{(l+1)} \right) \odot \sigma'(z^{(l)}), \tag{2}$$
$$\frac{\partial c}{\partial b^{(l)}_j} = \delta^{(l)}_j, \tag{3}$$
$$\frac{\partial c}{\partial w^{(l)}_{jk}} = a^{(l-1)}_k \delta^{(l)}_j, \tag{4}$$
$$\frac{\partial C}{\partial b^{(l)}_j} = \frac{1}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial b^{(l)}_j}, \tag{5}$$
$$\frac{\partial C}{\partial w^{(l)}_{jk}} = \frac{1}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial w^{(l)}_{jk}}, \tag{6}$$

where $\odot$ denotes element-wise multiplication. Notice how every quantity on the right side can be computed easily from our definitions by starting with the errors in the output layer for every observation. That is why this method is called backpropagation.

The statement $\delta^{(L)} = \nabla_a c \odot \sigma'(z^{(L)})$ is equivalent to $\delta^{(L)}_j = \frac{\partial c}{\partial a^{(L)}_j} \sigma'(z^{(L)}_j)$ for $1 \leq j \leq n^{(L)}$. By definition, $\delta^{(L)}_j = \frac{\partial c}{\partial z^{(L)}_j}$. First notice that $a^{(L)}_j$ is a differentiable function of $z^{(L)}_j$ and $c$ is a differentiable function of $a^{(L)}_1, \ldots, a^{(L)}_{n^{(L)}}$. Because $a^{(L)}_j$ and $a^{(L)}_k$ do not directly affect each other, $z^{(L)}_j$ only affects $c$ through $a^{(L)}_j$. Therefore,

$$\delta^{(L)}_j = \frac{\partial c}{\partial z^{(L)}_j} = \frac{\partial c}{\partial a^{(L)}_j} \frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} = \frac{\partial c}{\partial a^{(L)}_j} \sigma'(z^{(L)}_j),$$

where the second equality comes from the chain rule and the third equality from the fact that $\frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} = \sigma'(z^{(L)}_j)$. This completes the proof.

The statement $\delta^{(l)} = ((W^{(l+1)})^T \delta^{(l+1)}) \odot \sigma'(z^{(l)})$ is equivalent to

$$\delta^{(l)}_j = \sigma'(z^{(l)}_j) \sum_{k} w^{(l+1)}_{kj} \delta^{(l+1)}_k$$

for $1 < l < L$ and $1 \leq j \leq n^{(l)}$. Because $z^{(l+1)}_k$ is a differentiable function of $z^{(l)}_1, \ldots, z^{(l)}_{n^{(l)}}$ and $c$ is a differentiable function of $z^{(l+1)}_1, \ldots, z^{(l+1)}_{n^{(l+1)}}$, by definition,

$$\delta^{(l)}_j = \frac{\partial c}{\partial z^{(l)}_j} = \sum_k \frac{\partial c}{\partial z^{(l+1)}_k} \frac{\partial z^{(l+1)}_k}{\partial z^{(l)}_j} = \sum_k \delta^{(l+1)}_k \frac{\partial z^{(l+1)}_k}{\partial z^{(l)}_j}.$$

Furthermore, since

$$z^{(l+1)}_k = b^{(l+1)}_k + \sum_i w^{(l+1)}_{ki} a^{(l)}_i = b^{(l+1)}_k + \sum_i w^{(l+1)}_{ki} \sigma(z^{(l)}_i),$$

we have

$$\frac{\partial z^{(l+1)}_k}{\partial z^{(l)}_j} = w^{(l+1)}_{kj} \sigma'(z^{(l)}_j).$$

This gives

$$\delta^{(l)}_j = \sum_k \delta^{(l+1)}_k w^{(l+1)}_{kj} \sigma'(z^{(l)}_j) = \sigma'(z^{(l)}_j) \sum_k w^{(l+1)}_{kj} \delta^{(l+1)}_k,$$

which completes the proof.

Consider the statement $\frac{\partial c}{\partial b^{(l)}_j} = \delta^{(l)}_j$. Because $z^{(l)}_k$ is a differentiable function of $b^{(l)}_j$ and $c$ is a differentiable function of $z^{(l)}_1, \ldots, z^{(l)}_{n^{(l)}}$,

$$\frac{\partial c}{\partial b^{(l)}_j} = \sum_k \frac{\partial c}{\partial z^{(l)}_k} \frac{\partial z^{(l)}_k}{\partial b^{(l)}_j} = \delta^{(l)}_j,$$

since $\frac{\partial z^{(l)}_k}{\partial b^{(l)}_j} = 0$ for $k \neq j$ and $1$ otherwise. This completes the proof.

Similarly, consider the statement $\frac{\partial c}{\partial w^{(l)}_{jk}} = a^{(l-1)}_k \delta^{(l)}_j$. Because $z^{(l)}_j$ is a differentiable function of $w^{(l)}_{j1}, \ldots, w^{(l)}_{j n^{(l-1)}}$, $c$ is a differentiable function of $z^{(l)}_1, \ldots, z^{(l)}_{n^{(l)}}$, and $\frac{\partial z^{(l)}_i}{\partial w^{(l)}_{jk}} = 0$ if $i \neq j$, by definition

$$\frac{\partial c}{\partial w^{(l)}_{jk}} = \frac{\partial c}{\partial z^{(l)}_j} \frac{\partial z^{(l)}_j}{\partial w^{(l)}_{jk}} = \delta^{(l)}_j a^{(l-1)}_k,$$

as we wanted to show, since $z^{(l)}_j = b^{(l)}_j + \sum_i w^{(l)}_{ji} a^{(l-1)}_i$ gives $\frac{\partial z^{(l)}_j}{\partial w^{(l)}_{jk}} = a^{(l-1)}_k$.

The statements

$$\frac{\partial C}{\partial b^{(l)}_j} = \frac{1}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial b^{(l)}_j} \qquad \text{and} \qquad \frac{\partial C}{\partial w^{(l)}_{jk}} = \frac{1}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial w^{(l)}_{jk}}$$

follow easily from the definition of $C$ and $c$ and their differentiability with respect to the parameters. This completes the proof of the six statements that allow the computation of the gradient of $C$ with respect to the parameters of the network through backpropagation. The statements are valid for other activation and cost functions, as we will show in the next section. In each case, it is important to check whether the assumptions made in the proofs apply. Intuitively, for each observation, backpropagation considers the effect of a small increase $\Delta w$ of a parameter $w$ in the network. This change affects every subsequent neuron on a path to the output, and ultimately changes the cost $c$ by a small $\Delta c$.

Consider a differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ over variables $x = (x_1, \ldots, x_n)$. Suppose we are interested in finding a global minimum of $f$. Because the gradient $\nabla f(a)$ gives the direction of maximum local increase in $f$ at $a$, it would be natural to start at a point $a_0$, which could be chosen at random, and visit the sequence of points given by $a_t = a_{t-1} - \eta \nabla f(a_{t-1})$, where the learning rate $\eta \in \mathbb{R}$ is a small positive constant. This technique is called gradient descent, and can be used as a heuristic to find the parameters that minimize the cost $C$ of the network. Gradient descent is not guaranteed to converge. Even if it converges, the point at convergence may be a saddle point or a poor local minimum. The choice of $\eta$ considerably affects the success of gradient descent.

In a given iteration $t$ of gradient descent, instead of computing $\frac{\partial C}{\partial w^{(l)}_{jk}}$ and $\frac{\partial C}{\partial b^{(l)}_j}$ as averages derived from a computation involving all $(x, y) \in D$, it is also possible to consider only a subset $D' \subseteq D$ of randomly chosen observations. The data set $D$ may also be partitioned randomly into subsets called batches, which are considered in sequence. In this case, another random partition is created once every subset has been used. This procedure, called mini-batch stochastic gradient descent, is widely used due to its efficiency. Intuitively, the procedure makes faster decisions based on sampling. Regardless of these choices, a sequence of iterations that considers all observations in the data set is called an epoch.
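The following sketch implements statements (1)-(4) for a network with sigmoid activations and the halved mean squared cost, and uses statements (5)-(6) to perform one mini-batch stochastic gradient descent update. The function names and the tiny example network are arbitrary choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Gradients of c = 0.5 * ||a^(L) - y||^2 for a single observation,
    following statements (1)-(4)."""
    # Forward pass, storing weighted inputs z^(l) and activations a^(l).
    a, activations, zs = x, [x], []
    for W, b in zip(weights, biases):
        z = W @ a + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # Statement (1): delta^(L) = grad_a c (*) sigma'(z^(L)).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_W = [np.zeros_like(W) for W in weights]
    grad_b = [np.zeros_like(b) for b in biases]
    grad_W[-1] = np.outer(delta, activations[-2])    # statement (4)
    grad_b[-1] = delta                               # statement (3)
    # Statement (2): propagate the error backwards through the layers.
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        grad_W[-l] = np.outer(delta, activations[-l - 1])
        grad_b[-l] = delta
    return grad_W, grad_b

def sgd_step(batch, weights, biases, eta):
    """One mini-batch update; statements (5)-(6) average over the batch."""
    sum_W = [np.zeros_like(W) for W in weights]
    sum_b = [np.zeros_like(b) for b in biases]
    for x, y in batch:
        gW, gb = backprop(x, y, weights, biases)
        sum_W = [s + g for s, g in zip(sum_W, gW)]
        sum_b = [s + g for s, g in zip(sum_b, gb)]
    n = len(batch)
    weights[:] = [W - (eta / n) * g for W, g in zip(weights, sum_W)]
    biases[:] = [b - (eta / n) * g for b, g in zip(biases, sum_b)]

def mean_cost(batch, weights, biases):
    total = 0.0
    for x, y in batch:
        a = x
        for W, b in zip(weights, biases):
            a = sigmoid(W @ a + b)
        total += 0.5 * np.sum((a - y) ** 2)
    return total / len(batch)

rng = np.random.default_rng(1)
sizes = [3, 4, 2]
weights = [rng.normal(0, 1, (sizes[i + 1], sizes[i])) for i in range(2)]
biases = [rng.normal(0, 1, sizes[i + 1]) for i in range(2)]
batch = [(rng.normal(size=3), np.array([1.0, 0.0])) for _ in range(8)]
print(mean_cost(batch, weights, biases))
sgd_step(batch, weights, biases, eta=0.5)
print(mean_cost(batch, weights, biases))   # the cost typically decreases after the update
```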

The basic choices involved in learning the parameters of a feedforward neural network using stochastic gradient descent include at least the number of hidden layers, the number of neurons in each hidden layer, the size of the mini-batches, the number of epochs, and the learning rate $\eta$. However, more choices will be necessary for the enhancements proposed in the next sections.

3 Cost functions for feedforward neural networks

The previous section introduced feedforward neural networks. For simplicity, the presentation mentioned only sigmoid activation functions and the (halved) mean squared cost function. This section describes better alternatives for these functions.

Consider the weighted input $z^{(L)}_j$ to neuron $j$ in the output layer $L$. If $|z^{(L)}_j|$ is large, the corresponding neuron is said to be saturated. In that case, if $\sigma$ is the sigmoid function, then $\sigma'(z^{(L)}_j)$ is close to zero. If the cost function is the mean squared cost, the first backpropagation statement gives $\delta^{(L)}_j = (a^{(L)}_j - y_j) \sigma'(z^{(L)}_j)$. Therefore, in this case, $\delta^{(L)}_j$ is close to zero even if the target output $y_j$ is very different from $a^{(L)}_j$. Another consequence is that the partial derivatives of the cost with respect to $w^{(L)}_{jk}$ and $b^{(L)}_j$, for all $k$, will also be close to zero. This is expected from the mean squared cost, because a small change to the output, which would require a significant change to the parameters of a saturated neuron, would not improve the cost significantly. This has the undesired effect of slow learning for saturated neurons in the output layer. Since changing the learning rate would impact learning on the whole network, that is not an elegant solution.

Instead of considering the (halved) mean squared cost, suppose we considered a cost function $c$ for a single observation such that $\frac{\partial c}{\partial a^{(L)}_j} \sigma'(z^{(L)}_j) = a^{(L)}_j - y_j$. The first backpropagation statement would give $\delta^{(L)}_j = a^{(L)}_j - y_j$, which would avoid slow learning for saturated neurons in the output layer. That is precisely why sigmoid output neurons are used together with the cross-entropy cost function.

Consider the data set $D = \{(x_i, y_i) \mid i \in \{1, \ldots, n\}, x_i \in \mathbb{R}^m, y_i \in \mathbb{R}^d\}$. The cross-entropy cost function $c$ for a single pair $(x, y) \in D$ is defined as

$$c = -\sum_{i=1}^d \left[ y_i \ln a^{(L)}_i + (1 - y_i) \ln (1 - a^{(L)}_i) \right]$$

when $a^{(1)} = x$. Notice that this cost function requires $a^{(L)}_i \in (0, 1)$ for all $i$. Intuitively, the cost is close to zero when $y_i \approx a^{(L)}_i$ and tends to infinity when $a^{(L)}_i$ is very different from $y_i$.

Consider the partial derivative of the cost $c$ with respect to $a^{(L)}_j$, which is given by

$$\frac{\partial c}{\partial a^{(L)}_j} = -\frac{y_j}{a^{(L)}_j} + \frac{1 - y_j}{1 - a^{(L)}_j} = \frac{a^{(L)}_j - y_j}{a^{(L)}_j (1 - a^{(L)}_j)}.$$

If $a^{(L)}_j = \sigma(z^{(L)}_j)$ and $\sigma$ is the sigmoid function, then $a^{(L)}_j (1 - a^{(L)}_j) = \sigma'(z^{(L)}_j)$ and $\frac{\partial c}{\partial a^{(L)}_j} \sigma'(z^{(L)}_j) = a^{(L)}_j - y_j$. By the first backpropagation statement, $\delta^{(L)}_j = a^{(L)}_j - y_j$, as we wanted to show.

As previously mentioned, the cross-entropy cost function avoids the slowdown in learning caused by saturated sigmoid output neurons. However, that does not necessarily solve the analogous problem for the previous layers. In comparison with the previous section, the cross-entropy cost with sigmoid output neurons only affects the computation of $\nabla_a c$.
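A small numerical illustration of this point: with the cross-entropy cost, the output error is simply $a^{(L)}_j - y_j$, whereas the mean squared cost multiplies it by $\sigma'(z^{(L)}_j)$, which is tiny for a saturated neuron. The helper names and values below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    """c = -sum_i [ y_i ln a_i + (1 - y_i) ln(1 - a_i) ] for one observation."""
    return -np.sum(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))

def output_delta_cross_entropy(z, y):
    """delta^(L) for sigmoid outputs with the cross-entropy cost: a^(L) - y.
    The sigma'(z^(L)) factor cancels, so saturated output neurons still learn."""
    return sigmoid(z) - y

# Compare with the mean squared cost delta, (a - y) * sigma'(z), at a saturated neuron.
z = np.array([8.0])          # strongly saturated: sigmoid(8) is about 0.9997
y = np.array([0.0])          # but the target is 0
a = sigmoid(z)
mse_delta = (a - y) * a * (1.0 - a)
print(output_delta_cross_entropy(z, y), mse_delta)   # roughly 1.0 versus roughly 3e-4
print(cross_entropy(a, y))                           # large cost, as expected
```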

It is also common to use a different activation function in the output layer. Consider a network where the activation function for neuron $j$ in the output layer $L$ is the identity, that is, $a^{(L)}_j = z^{(L)}_j$. Consider again the cost per pair $(x, y) \in D$ given by the (halved) mean squared cost $c = \frac{1}{2} \lVert a^{(L)} - y \rVert^2$ when $a^{(1)} = x$. This change in the activation function for the last layer only affects the first backpropagation statement. From the definition of $\delta^{(L)}_j$, it is easy to show that

$$\delta^{(L)}_j = \frac{\partial c}{\partial z^{(L)}_j} = a^{(L)}_j - y_j.$$

The last equality follows from $\frac{\partial a^{(L)}_j}{\partial z^{(L)}_j} = 1$. Therefore, the (halved) mean squared cost does not slow learning for saturated linear output neurons. This may be an appropriate choice for regression problems, which consist of predicting real-valued outputs. In comparison with the previous section, the (halved) mean squared cost with linear output neurons only affects the first statement of backpropagation, which should be substituted by $\delta^{(L)} = \nabla_a c = a^{(L)} - y$.

Consider a data set $D = \{(x_i, y_i) \mid i \in \{1, \ldots, n\}, x_i \in \mathbb{R}^m, y_i \in \mathbb{R}^d\}$ for a $d$-class classification problem. Concretely, for every pair $(x, y) \in D$, $y_j = 1$ if $x$ belongs to class $j$ and $y_j = 0$ otherwise. The learning task can be restated as finding the parameters $\theta$ that maximize the likelihood $L(\theta : D)$ of observing $y_i$ given $x_i$ for every $i$ in a conditional probability distribution $P$ parameterized by $\theta \in \Theta$, assuming the sample elements in $D$ are identically distributed and independent given any $\theta$. The likelihood $L(\theta : D)$ is defined as

$$L(\theta : D) = \prod_{i=1}^n P(Y = y_i \mid X = x_i : \theta).$$

This is equivalent to finding the parameters $\theta$ that minimize the negative log-likelihood $l(\theta : D)$, given by

$$l(\theta : D) = -\sum_{i=1}^n \ln P(Y = y_i \mid X = x_i : \theta).$$

Suppose we could interpret the output $a^{(L)}_j$ of neuron $j$ in the output layer $L$ of a feedforward neural network as the probability of the input $a^{(1)} = x$ belonging to class $j$. The negative log-likelihood cost function $c$ for a pair $(x, y) \in D$ could be written as

$$c = -\sum_{i=1}^d y_i \ln a^{(L)}_i,$$

where $a^{(1)} = x$. Notice that this cost function requires $a^{(L)}_i > 0$.

The softmax output layer is used precisely so that the output of the network can be interpreted as a parameterized conditional probability distribution $P(Y \mid X : \theta)$. A feedforward neural network has a softmax output layer when

$$a^{(L)}_j = \frac{e^{z^{(L)}_j}}{\sum_{k=1}^d e^{z^{(L)}_k}}$$

for every neuron $1 \leq j \leq d$ in the output layer. It is easy to see that $\sum_{j=1}^d a^{(L)}_j = 1$ and $a^{(L)}_j > 0$ for all $j$. Given input $a^{(1)} = x$, the activation $a^{(L)}_j$ can be interpreted as $P(Y = e_j \mid X = x : \theta)$, where $e_j$ is the standard basis vector in direction $j$ and $\theta$ represents the parameters of the network.

The softmax activation function is significantly different from those presented so far. Notice that $a^{(L)}_j$ depends on $z^{(L)}_i$ for all $i$. In fact, this completely invalidates the first backpropagation statement. However, we will now show that, for the negative log-likelihood cost, $\delta^{(L)}_j = a^{(L)}_j - y_j$.
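Before the derivation, the claim can be checked numerically by comparing a finite-difference approximation of $\frac{\partial c}{\partial z^{(L)}_j}$ with $a^{(L)}_j - y_j$; the values below are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))      # subtract the maximum for numerical stability
    return e / np.sum(e)

def nll(z, y):
    """Negative log-likelihood cost c = -sum_i y_i ln a_i with a = softmax(z)."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([1.5, -0.3, 0.8, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0])     # class 3 in one-hot form

# Central finite-difference estimate of dc/dz_j versus the claimed a_j - y_j.
eps = 1e-6
numeric = np.array([(nll(z + eps * e, y) - nll(z - eps * e, y)) / (2 * eps)
                    for e in np.eye(len(z))])
print(numeric)
print(softmax(z) - y)    # the two vectors should agree to high precision
```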

By definition, $\delta^{(L)}_j = \frac{\partial c}{\partial z^{(L)}_j}$. Because $c$ is a differentiable function of $a^{(L)}_1, \ldots, a^{(L)}_d$ and each $a^{(L)}_k$ is a differentiable function of $z^{(L)}_j$, the chain rule gives

$$\delta^{(L)}_j = \sum_{k=1}^d \frac{\partial c}{\partial a^{(L)}_k} \frac{\partial a^{(L)}_k}{\partial z^{(L)}_j} = -\sum_{k=1}^d \frac{y_k}{a^{(L)}_k} \frac{\partial a^{(L)}_k}{\partial z^{(L)}_j}.$$

Because the softmax output layer will be applied solely to a classification task, $y_k$ is $1$ for a single $k$ and zero otherwise. Consider the derivative of the activation of neuron $k$ with respect to the weighted input to neuron $j$, which is given by

$$\frac{\partial a^{(L)}_k}{\partial z^{(L)}_j} = \frac{\partial}{\partial z^{(L)}_j} \frac{e^{z^{(L)}_k}}{\sum_{i=1}^d e^{z^{(L)}_i}} = \frac{\left( \frac{\partial}{\partial z^{(L)}_j} e^{z^{(L)}_k} \right) \sum_{i=1}^d e^{z^{(L)}_i} - e^{z^{(L)}_k} e^{z^{(L)}_j}}{\left( \sum_{i=1}^d e^{z^{(L)}_i} \right)^2},$$

where $\frac{\partial}{\partial z^{(L)}_j} e^{z^{(L)}_k}$ equals $e^{z^{(L)}_k}$ if $j = k$ and $0$ otherwise. By separating the fraction above into two and using the definition of the activations,

$$\frac{\partial a^{(L)}_k}{\partial z^{(L)}_j} = \begin{cases} a^{(L)}_k (1 - a^{(L)}_j) & \text{if } j = k, \\ -a^{(L)}_k a^{(L)}_j & \text{otherwise.} \end{cases}$$

Using this expression and the fact that $\sum_k y_k = 1$,

$$\delta^{(L)}_j = -y_j (1 - a^{(L)}_j) + \sum_{k \neq j} y_k a^{(L)}_j = a^{(L)}_j \sum_{k=1}^d y_k - y_j = a^{(L)}_j - y_j,$$

as we wanted to show. In comparison with the previous section, the softmax output layer with the negative log-likelihood cost only affects the first statement of backpropagation, which should be substituted by $\delta^{(L)} = a^{(L)} - y$. This combination of cost function and output activation is a principled and elegant formulation of the classification task, and is often used in practice.

4 Improving feedforward neural networks

This section begins by describing techniques to deal with overfitting. In a feedforward neural network, learning consists of finding parameters (weights and biases) that minimize a cost function defined on the available data. In this context, the network is also called a model for the data. A model is said to overfit when it has a low cost over the available data but generalizes poorly to unseen data.

The most common way to diagnose overfitting is to perform cross-validation. In cross-validation, the available data is partitioned into two sets. One of these sets is used for learning (training set), and the other set is used for evaluation (test set). This is also how model efficacy is evaluated.
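A minimal sketch of a random training/validation/test split and of an early-stopping rule based on a validation accuracy curve; the split fractions, the patience value, and the synthetic validation curve are arbitrary assumptions used only for illustration.

```python
import numpy as np

def split_indices(n, val_frac=0.2, test_frac=0.2, seed=0):
    """Random training/validation/test split of n observation indices."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]

def early_stopping_epoch(val_accuracy_per_epoch, patience=10):
    """Return the epoch at which training would be halted: the first epoch at
    which validation accuracy has not improved for `patience` epochs."""
    best, best_epoch = -np.inf, 0
    for epoch, acc in enumerate(val_accuracy_per_epoch):
        if acc > best:
            best, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_accuracy_per_epoch) - 1

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))    # 600 200 200

# A made-up validation curve that improves, peaks, and then degrades (overfitting).
curve = [0.5 + 0.4 * (1 - np.exp(-e / 10)) - 0.002 * max(0, e - 30) for e in range(100)]
print(early_stopping_epoch(curve, patience=10))
```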

However, feedforward neural networks also have hyperparameters. These are the parameters that must be defined even before learning occurs, which include the learning rate, the number of layers, the number of neurons in each layer, the number of epochs, the mini-batch size, etc. Evaluating generalization efficacy with hyperparameters chosen according to their efficacy on the test set is a methodological mistake. The hyperparameters may also overfit the test data, which would result in overly optimistic evaluations.

The available data is therefore usually split into three sets: training, validation and testing. The validation set is used to compare the results of different hyperparameters, which are used for learning on the training set. The best hyperparameters on the validation set are used for learning on both the training and validation sets. Finally, the model is evaluated on the test set, which gives a generalization efficacy estimate. This process may be repeated several times, using schemes such as k-fold cross-validation or bootstrapping, which we do not detail here.

In the case of feedforward neural networks, it is common to create graphs comparing cost and accuracy on training and validation data as epochs pass. This may help in diagnosing overfitting. For example, the accuracy on the training data may become perfect after a number of epochs, while the accuracy on the validation data becomes worse. Early stopping consists of halting the learning procedure when validation accuracy does not improve significantly after a given number of epochs.

Choosing hyperparameters for a feedforward neural network that achieve good generalization on the validation set can be a very challenging task. Because the number of hyperparameters is very large, manual fine-tuning is often preferred to schemes such as grid search.

Regularization is an important technique to prevent overfitting. Intuitively, regularization penalizes large weights in the network, which could make the output very sensitive to small changes in the input. We define the L2-regularized cost function $C$ for a data set $D = \{(x_i, y_i) \mid i \in \{1, \ldots, n\}, x_i \in \mathbb{R}^m, y_i \in \mathbb{R}^d\}$ as

$$C = \frac{1}{n} \sum_{(x,y) \in D} c + \frac{\lambda}{2n} \sum_{w \in W} w^2,$$

where $c$ is the cost for the pair $(x, y)$ when $a^{(1)} = x$, $W$ is the set of all weights in the network, and $\lambda \geq 0$ is the regularization factor. Notice how the second term increases the overall cost of having large weights when the first term is fixed. Intuitively, in the case of an extremely large $\lambda$, the network would be rewarded for effectively removing most weights. It is important to note that regularization is not incorporated into the cost $c$ for a single observation, because that would invalidate the fourth property of backpropagation.

It is easy to show that the partial derivative of the cost $C$ with respect to a bias $b^{(l)}_j$ is

$$\frac{\partial C}{\partial b^{(l)}_j} = \frac{1}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial b^{(l)}_j},$$

which is the same as before. The reason for not penalizing the biases is mostly empirical. The partial derivative of the cost $C$ with respect to a weight $w^{(l)}_{jk}$ is

$$\frac{\partial C}{\partial w^{(l)}_{jk}} = \frac{1}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial w^{(l)}_{jk}} + \frac{\lambda}{n} w^{(l)}_{jk}.$$

When using gradient descent to minimize $C$ with respect to the parameters of the network, the update rule for $w^{(l)}_{jk}$ can be written as

$$w^{(l)}_{jk} \leftarrow w^{(l)}_{jk} - \eta \frac{\partial C}{\partial w^{(l)}_{jk}} = \left( 1 - \frac{\eta \lambda}{n} \right) w^{(l)}_{jk} - \frac{\eta}{n} \sum_{(x,y) \in D} \frac{\partial c}{\partial w^{(l)}_{jk}}.$$

Notice that the last rule is very similar to gradient descent on a cost function without regularization, except for the factor $0 < 1 - \frac{\eta \lambda}{n} < 1$ (under sensible choices of $\eta$ and $\lambda$) multiplying $w^{(l)}_{jk}$. This regularization method is also called weight decay. When performing stochastic gradient descent, the decay factor remains $1 - \frac{\eta \lambda}{n}$ for every update, irrespective of the batch size.
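A sketch of the weight decay update for a single weight matrix; the argument grad_c_W stands for the average of $\frac{\partial c}{\partial w}$ over the data (or over a mini-batch), and the numerical values are arbitrary.

```python
import numpy as np

def weight_decay_step(W, grad_c_W, eta, lam, n):
    """Gradient descent step on the L2-regularized cost:
    W <- (1 - eta*lam/n) * W - eta * (average gradient of c over the data)."""
    return (1.0 - eta * lam / n) * W - eta * grad_c_W

rng = np.random.default_rng(0)
W = rng.normal(0, 1, (4, 3))
grad_c_W = rng.normal(0, 0.1, (4, 3))
W_new = weight_decay_step(W, grad_c_W, eta=0.5, lam=5.0, n=1000)
print(np.linalg.norm(W), np.linalg.norm(W_new))   # the decay factor pulls the weights toward zero
```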

Another way to reduce overfitting is to use more training data. Intuitively, this is useful to constrain the freedom in the choice of parameters, which is one of the major causes of overfitting. Because it is not always easy to obtain more data, one particularly useful idea is to expand the available data by transforming input vectors in ways that mimic expected variations. For example, if the input vectors correspond to images, transformations such as translation, scaling, and rotation often should not affect classification.

Previously, we mentioned that stochastic gradient descent may begin at a random assignment of the weights and biases. Consider neuron $j$ in layer $l$, and let $p = n^{(l-1)}$ denote the number of neurons in layer $l-1$, which is also the number of inputs to neuron $j$. Suppose the weights reaching neuron $j$ were sampled according to a Gaussian distribution with mean $0$ and standard deviation $\sigma$, such that $w^{(l)}_{jk} \sim \mathcal{N}(0, \sigma^2)$ for all $k$. By the properties of Gaussian-distributed random variables, $\sum_k w^{(l)}_{jk} \sim \mathcal{N}(0, p\sigma^2)$. If the weights were sampled from a standard Gaussian distribution $\mathcal{N}(0, 1)$, then $\sum_k w^{(l)}_{jk} \sim \mathcal{N}(0, p)$. If the number of inputs $p$ to neuron $j$ were large, the neuron could easily become saturated. As already mentioned, saturated neurons slow learning, because small changes to their parameters do not significantly affect their outputs. In contrast, consider $w^{(l)}_{jk} \sim \mathcal{N}(0, \frac{1}{p})$. In this case, $\sum_k w^{(l)}_{jk} \sim \mathcal{N}(0, 1)$, which helps to avoid early saturation and improves overall learning success. This is a common recommendation for weight initialization in feedforward neural networks. The bias $b^{(l)}_j$ for neuron $j$ can be sampled from $\mathcal{N}(0, 1)$.

Consider a feedforward neural network with a single sigmoid neuron in each layer and a cross-entropy cost function. In this case, it is easy to show that the error $\delta^{(l)}$ of the neuron at layer $l$ for a single pair $(x, y)$ is

$$\delta^{(l)} = (a^{(L)} - y) \prod_{i=l}^{L-1} \sigma'(z^{(i)}) \, w^{(i+1)}$$

for every $l > 1$. If the weights in the network are not very large, a single saturated neuron could make $\delta^{(l)}$ very small. Thus, the partial derivatives of the cost with respect to the parameters would also be small in layer $l$, which would compromise learning by gradient descent in all layers previous to and including $l$. An analogous problem occurs in more general networks. Because $c$ can be seen as a function of the weighted inputs $z^{(l)}_j$ for all neurons $j$ in a given layer $l$, this issue is characterized by $\nabla_{z^{(l)}} c \approx 0$, which is why this learning problem is called vanishing gradients. When the weights in the network are large, exploding (large) gradients could also become a problem. However, large weights are often followed by saturated neurons, which makes that problem less common. It may also be important to scale the input vector elements to have zero mean and unit variance across the data set to avoid early saturation.

It can be shown that deep feedforward neural networks ($L > 3$ layers) can approximate more complicated functions with fewer parameters than shallower networks. However, in practice, vanishing gradients make it difficult to obtain better efficacy with deep feedforward neural networks trained by gradient descent. We now describe two techniques considered important in the context of deep feedforward neural networks: momentum and dropout.

The momentum technique is a common heuristic for training deep artificial neural networks. In momentum-based stochastic gradient descent, each parameter $w$ in the network (weight or bias) has a corresponding velocity $v$. The velocity is defined by $v_0 = 0$ and

$$v_t = \mu v_{t-1} - \eta \frac{\partial C}{\partial w},$$

where $v_t$ and $w_t$ correspond, respectively, to $v$ and $w$ at iteration $t$ of stochastic gradient descent, and the partial derivative is evaluated at $w_{t-1}$. At each iteration, the parameter $w$ is updated by the rule $w_t = w_{t-1} + v_t$. Intuitively, the momentum technique remembers the velocity of each parameter, allowing larger updates when the direction of decrease in cost is consistent over many iterations. The parameter $0 \leq \mu \leq 1$ controls the effect of the previous velocity on the next velocity, and $1 - \mu$ is commonly interpreted as a coefficient of friction. If $\mu = 0$, the technique is equivalent to stochastic gradient descent.
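A sketch of the momentum update applied to a simple quadratic function (instead of a network cost), which makes its effect easy to observe; the matrix, learning rate and $\mu$ are arbitrary.

```python
import numpy as np

def momentum_step(w, v, grad, eta=0.1, mu=0.9):
    """Momentum-based update: v <- mu*v - eta*grad, then w <- w + v."""
    v = mu * v - eta * grad
    return w + v, v

# Minimize f(w) = 0.5 * w^T A w, whose gradient is A w.
A = np.diag([1.0, 50.0])
w, v = np.array([10.0, 1.0]), np.zeros(2)
for _ in range(300):
    w, v = momentum_step(w, v, A @ w, eta=0.01, mu=0.9)
print(w)   # both coordinates approach the minimum at the origin
```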

Dropout is another heuristic for training deep artificial neural networks. At every iteration of stochastic gradient descent, half of the hidden neurons are removed at random. In most implementations, this can be accomplished by forcing the outputs of the corresponding neurons to be zero. The modified network is applied as usual to the observations in a mini-batch, and backpropagation follows as if the network had not been changed. The resulting partial derivatives are used to update the parameters of the neurons that were not removed. After training is finished, the weights incoming from hidden neurons are halved. This heuristic is believed to make the network robust to the absence of particular features, which might be particular to the training data. Dropout is considered related to regularization, since it also tries to reduce overfitting.

There are many heuristics for implementing feedforward neural networks that will not be described in detail in this text. They include Hessian optimization, input whitening, rmsprop, and adaptive learning rates. Although we focused most of the discussion on sigmoid neurons, rectified linear neurons have achieved superior results in important benchmarks.

5 Convolutional neural networks

Convolutional neural networks were first developed for image classification, although they have also been applied to other tasks. This section focuses on image classification using convolutional neural networks.

A two-dimensional image is a function $f: \mathbb{Z}^2 \to \mathbb{R}^c$. If $f(a) = (f_1(a), \ldots, f_c(a))$, then $f_i$ is also called channel $i$. An element $a \in \mathbb{Z}^2$ is called a pixel, and $f(a)$ is the value of pixel $a$. A window $W \subset \mathbb{Z}^2$ is a finite set $W = \{s_1, \ldots, S_1\} \times \{s_2, \ldots, S_2\}$ that corresponds to a rectangle in the image domain. The size of this window $W$ is denoted $h \times w$, where $h = S_1 - s_1 + 1$ and $w = S_2 - s_2 + 1$. Because the domain $D$ of the images of interest is usually a window, it is possible to represent an image $f$ by a vector $x \in \mathbb{R}^{c|D|}$. In this vector, there is a scalar value $f_i(a)$ corresponding to each channel $i$ of each pixel $a \in D$.

Consider a data set $D = \{(x_i, y_i) \mid i \in \{1, \ldots, n\}, x_i \in \mathbb{R}^m, y_i \in \mathbb{R}^d\}$, where each vector $x_i$ corresponds to a distinct image $f: \mathbb{Z}^2 \to \mathbb{R}^c$, all images are defined on the same window $D'$, and $m = c|D'|$. The task of image classification consists of assigning a class label to a given vector $x$ based on generalization from $D$. A convolutional neural network is particularly well suited for image classification because it exploits the spatial relationships between pixels (their organization in $\mathbb{Z}^2$).

Similarly to feedforward neural networks, a convolutional neural network is also a parameterized function, and the parameters are usually learned by stochastic gradient descent on a cost function defined on the training set. In contrast to feedforward neural networks, there are three main types of layers in a convolutional neural network: convolutional layers, pooling layers, and fully connected layers.

A convolutional layer receives an input image $f$ and outputs an image $o$. A convolutional layer is composed solely of artificial neurons. Each artificial neuron $j$ in a convolutional layer $l$ receives as input the values in a window $W = \{s_1, \ldots, S_1\} \times \{s_2, \ldots, S_2\} \subseteq D$ of size $h \times w$, where $D$ is the domain of $f$. The weighted input $z^{(l)}_j$ of that neuron is given by

$$z^{(l)}_j = b^{(l)}_j + \sum_{i=1}^{c} \sum_{r=s_1}^{S_1} \sum_{s=s_2}^{S_2} w^{(l)}_{j,i,r,s} \, a^{(l-1)}_{i,r,s}.$$

In the equation above, $a^{(l-1)}_{i,r,s} = f_i(r, s)$ is the value of pixel $(r, s)$ in channel $i$ of the input image. Also, $b^{(l)}_j$ is the bias of neuron $j$, and $w^{(l)}_{j,i,r,s}$ is the weight that neuron $j$ in layer $l$ associates to $f_i(r, s)$. The activation function for a convolutional layer is usually rectified linear, so $a^{(l)}_j = \max(0, z^{(l)}_j)$. The definition of $z^{(l)}_j$ is similar to the definition of the weighted input for a neuron in a feedforward neural network. The only difference is that a neuron in a convolutional layer is not necessarily connected to the activations of all neurons in the previous layer, but only to the activations in a particular $h \times w$ window $W$. Each neuron in a convolutional layer has $chw$ weights and a single bias.

A neuron in a convolutional layer is replicated (through parameter sharing) for all windows of size $h \times w$ in the domain $D$ whose centers are offset by pre-defined steps. These steps are the horizontal and vertical strides. The activations corresponding to a neuron replicated in this way correspond to the values in a single channel of the output image $o$ (appropriately arranged in $\mathbb{Z}^2$). Thus, an output image $o: \mathbb{Z}^2 \to \mathbb{R}$ is obtained by replicating a neuron over the whole domain of the input image. The total number of free parameters for each such replicated neuron in a convolutional layer is only $chw + 1$. If the parameters in a convolutional layer were not shared by the replicated neurons, the number of parameters would be $M(chw + 1)$, where $M$ is the number of windows of size $h \times w$ that fit into $f$ (for the given strides).

The weighted outputs (minus the bias) of the replicated neurons correspond to an output channel that is analogous to the discrete (multichannel) convolution of the input $f$ with a particular image $g$. The values of $g$ correspond to the (shared) weights of the replicated neurons (appropriately arranged in $\mathbb{Z}^2$). This assumes that the horizontal and vertical strides are $1$, and that the domain of the resulting image is always restricted to the window domain of $f$.
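A naive NumPy sketch of the convolutional layer just described, with strides of $1$, no padding, and rectified linear activations; the image and filter sizes are arbitrary.

```python
import numpy as np

def conv_layer(f, filters, biases):
    """Naive convolutional layer with stride 1 and no padding.
    f: input image, shape (c, H, W).
    filters: shared weights, shape (u, c, h, w), giving u output channels.
    biases: shape (u,).
    Returns o with shape (u, H - h + 1, W - w + 1), with ReLU activations."""
    c, H, W = f.shape
    u, _, h, w = filters.shape
    o = np.zeros((u, H - h + 1, W - w + 1))
    for j in range(u):                       # each replicated neuron / filter
        for r in range(H - h + 1):
            for s in range(W - w + 1):
                window = f[:, r:r + h, s:s + w]
                z = biases[j] + np.sum(filters[j] * window)   # weighted input
                o[j, r, s] = max(0.0, z)                      # rectified linear
    return o

rng = np.random.default_rng(0)
image = rng.normal(size=(3, 8, 8))              # c = 3 channels, 8x8 pixels
filters = rng.normal(size=(4, 3, 3, 3)) * 0.1   # 4 filters of size 3x3
print(conv_layer(image, filters, np.zeros(4)).shape)   # (4, 6, 6)
```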

In other words, each channel $o_u$ in the output $o$ of a convolutional layer corresponds to a convolution with an image $g_u$, which is also called a filter. This is the origin of the name convolutional network. Therefore, to define a convolutional layer, it is enough to specify the size of the filters (window size), the number of filters (number of channels in the output image), and the horizontal and vertical strides (which are usually $1$). Each channel in the output of a convolutional layer can also be seen as the response of the input image to a particular (learned) filter. Based on this interpretation, each channel in the output image is also called an activation map.

Backpropagation can be adapted to compute the partial derivative of the cost with respect to every parameter in a convolutional layer. The fact that a single weight affects the output of several neurons must be taken into account. We omit the details of backpropagation for convolutional neural networks in this text.

A pooling layer receives an input image $f: \mathbb{Z}^2 \to \mathbb{R}^c$ and outputs an image $o: \mathbb{Z}^2 \to \mathbb{R}^c$. A pooling layer reduces the size of the window domain $D$ of $f$ by an operation that acts independently on each channel. A common pooling technique is max-pooling. In max-pooling, the maximum value of channel $f_i$ in a particular window of size $h \times w$ corresponds to an output value in channel $o_i$. To define a max-pooling layer, it is enough to specify the size of these windows and the strides (which usually match the window dimensions). The objective of reducing the spatial domain of the image is to achieve similar results to using comparatively larger convolutional filters in the next layers. This supposedly allows the detection of higher-level features in the input image with a reduced number of parameters. It is also believed that max-pooling improves the invariance of the classification to translations of the original image. In practice, a sequence of alternating convolutional and max-pooling layers has obtained excellent results in many image classification tasks. Backpropagation can also be performed through max-pooling layers.

In summary, a max-pooling layer receives an input image $f: \mathbb{Z}^2 \to \mathbb{R}^c$ and outputs an image $o: \mathbb{Z}^2 \to \mathbb{R}^c$ defined by

$$o_i(r, s) = \max_{a \in W_{r,s}} f_i(a),$$

where $i \in \{1, \ldots, c\}$, $(r, s) \in \mathbb{Z}^2$, $D$ is the window domain of $f$, and $W_{r,s} \subseteq D$ is the input window corresponding to output pixel $(r, s)$.
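A naive NumPy sketch of max-pooling with windows of size $h \times w$ and matching strides; the window size and the example are arbitrary.

```python
import numpy as np

def max_pool(f, h=2, w=2):
    """Max-pooling with window h x w and matching strides, per channel.
    f has shape (c, H, W); H and W are assumed to be multiples of h and w."""
    c, H, W = f.shape
    o = np.zeros((c, H // h, W // w))
    for i in range(c):
        for r in range(0, H, h):
            for s in range(0, W, w):
                o[i, r // h, s // w] = np.max(f[i, r:r + h, s:s + w])
    return o

x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(max_pool(x))   # the 2x2 block maxima: 5, 7, 13, 15
```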
A fully connected layer receives an input image $f: \mathbb{Z}^2 \to \mathbb{R}^c$ (or an input vector $x$) and outputs a vector $o$. A fully connected layer is precisely analogous to a layer in a feedforward neural network, and can only be succeeded by other fully connected layers. The final layer in a convolutional neural network is always a fully connected layer with $d$ neurons, which is responsible for representing the classification. Backpropagation in fully connected layers is analogous to backpropagation in feedforward networks.

The task of detection corresponds to delimiting the spatial extent of particular objects of interest within an image. Consider a convolutional neural network trained on images with a window domain $D$. The input to the first fully connected layer is an image with a window domain $W$ of size $h \times w$, which depends on the number of pooling layers present in the network. This image with domain $W$ is ultimately mapped (through a forward pass through the fully connected layers) to a vector $o \in \mathbb{R}^d$, which represents the confidence that the input image belongs to each class. Notice that it is possible to apply the same network, up to the first fully connected layer, to an input image with a much larger window domain $D_L$. This results in a larger input image with domain $W_L$ for the first fully connected layer. It is possible to partition the window $W_L$ into smaller windows $W_l$ of size $h \times w$, which matches the size required by the first fully connected layer. Therefore, each smaller window $W_l$ of this larger input image can be associated to an output vector $o_l \in \mathbb{R}^d$, which represents the confidence that the input image contains a particular object in window $W_l$. As a consequence, the original larger image is partitioned into rectangular windows for which the confidence that each window contains a particular object is known. This partition can be represented by an image $g: \mathbb{Z}^2 \to \mathbb{R}^d$, called a class map. Precisely the same class map would be obtained by exhaustively applying the convolutional network to each window of size $h \times w$ that fits into the larger input image, although the computational cost would be significantly higher.

In summary, for the purpose of detection, the convolutional neural network can be trained with fixed-size images of the objects of interest. Detection is performed by resizing the input image so that the expected size of the objects of interest matches the size observed during training. The resized image is received as input by the adapted network, which partitions the input to the fully connected layers appropriately. The corresponding class map can be post-processed to obtain rectangular boundaries containing objects, or confidences for the presence of a given object.

The choice of hyperparameters for convolutional neural networks (types of layers, number of layers, filters per layer, filter sizes, strides) is crucial to achieve good performance. Deep convolutional neural networks are usually trained on immense data sets, requiring (non-trivial) efficient implementations, which include all the improvements mentioned in the previous section. After training a convolutional neural network for a particular data set, it is possible to re-use the parameters of the network (up to its last fully connected layer) as a starting point for another classification task. This technique decouples representation learning from a specific image classification problem, and has been very successful in practice.

6 Recurrent neural networks

Let $A^+$ denote the set of non-empty sequences of elements of the set $A$, and let $|X|$ denote the length of a sequence $X \in A^+$. We denote the $t$-th element of sequence $X$ by $X_t$. Consider the data set $D = \{(X_i, Y_i) \mid i \in \{1, \ldots, n\}, X_i \in (\mathbb{R}^m)^+, Y_i \in (\mathbb{R}^d)^+\}$. Furthermore, let $|X| = |Y|$ for every $(X, Y) \in D$. In other words, the data set $D$ is composed of pairs $(X, Y)$ of sequences of the same length. Each element of the two sequences is a real vector, but $X_t$ and $Y_t$ do not necessarily have the same dimension.

The task of sequence element classification consists of finding a function $f: (\mathbb{R}^m)^+ \to (\mathbb{R}^d)^+$ that is able to generalize from the examples in $D$. This task is more general than sequence classification, which consists of finding a function $f: (\mathbb{R}^m)^+ \to \mathbb{R}^d$. It is not straightforward to apply the models presented so far to sequence element classification.

Recurrent neural networks compose a particular class of artificial neural networks that can be applied to sequence element classification. We start by introducing a simple recurrent neural network with a single hidden layer. Many definitions are analogous to those presented for feedforward neural networks, and will be omitted when there is no ambiguity.

Let $n^{(l)}$ be the number of neurons in layer $l$, with $n^{(1)} = m$ and $n^{(3)} = d$. Let $b^{(l)}_j \in \mathbb{R}$ represent the bias for neuron $j$ in layer $l > 1$, and $w^{(l)}_{jk} \in \mathbb{R}$ represent the weight reaching neuron $j$ in layer $l > 1$ from neuron $k$ in layer $l-1$. Furthermore, let $\omega_{jk} \in \mathbb{R}$ represent the weight reaching neuron $j$ in layer $2$ from neuron $k$ in layer $2$ at the previous time step.

The input activation at time $t$ for $(X, Y) \in D$ is defined as $a^{(1)}_t = X_t$. The weighted input to neuron $j$ in layer $2$ at time step $t$ is defined as

$$z^{(2)}_{t,j} = b^{(2)}_j + \sum_k w^{(2)}_{jk} a^{(1)}_{t,k} + \sum_k \omega_{jk} a^{(2)}_{t-1,k},$$

where $a^{(2)}_0$ may be zero, although it could also be learned. The activation of neuron $j$ in a layer $l > 1$ at time $t$ is defined as $a^{(l)}_{t,j} = \sigma(z^{(l)}_{t,j})$, where $\sigma$ is a differentiable activation function. For concreteness, we will consider sigmoid activation functions. The weighted input to neuron $j$ in layer $3$ is defined as

$$z^{(3)}_{t,j} = b^{(3)}_j + \sum_k w^{(3)}_{jk} a^{(2)}_{t,k},$$

as in a feedforward neural network. The output of the recurrent neural network $f$ on input $X = a^{(1)}_1, \ldots, a^{(1)}_T$ is the sequence $a^{(3)}_1, \ldots, a^{(3)}_T$. This completes the definition of a recurrent neural network with a single recurrent hidden layer.

Intuitively, the sequence $X$ is presented to the network element by element. The network behaves similarly to a single hidden layer feedforward neural network, except for the fact that the output activation $a^{(2)}_{t-1}$ of the hidden layer at time $t-1$ possibly affects the weighted input $z^{(2)}_t$ of the hidden layer at time $t$. Intuitively, an ideal recurrent neural network would be capable of representing a sequence $X_{1:t}$ by its hidden layer activation $a^{(2)}_t$, to allow correct classification of $X_t$.

Once again, we define the task of learning the parameters (weights and biases) of such a network as finding parameters that minimize a cost function $C = \frac{1}{n} \sum_{(X,Y) \in D} c$. Several choices of $c$ are possible, as discussed in a previous section. Concretely, consider the case of a cross-entropy cost function $c$ for sequence element classification, defined as

$$c = -\frac{1}{T} \sum_{t=1}^T \sum_{i=1}^d \left[ Y_{t,i} \ln a^{(3)}_{t,i} + (1 - Y_{t,i}) \ln (1 - a^{(3)}_{t,i}) \right],$$

where $X = a^{(1)}_1, \ldots, a^{(1)}_T$ is the input sequence and $a^{(3)}_1, \ldots, a^{(3)}_T$ is the output sequence. As mentioned in a previous section, this cost function requires that every $Y_t$ be a standard basis vector, which represents the fact that every $X_t$ belongs to a single class.
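A minimal NumPy sketch of the forward pass of this recurrent network and of the sequence element cross-entropy cost, assuming sigmoid activations; the dimensions and random parameters are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(X, W2, b2, Omega, W3, b3):
    """Forward pass of the single-hidden-layer recurrent network.
    X: sequence, shape (T, m).  W2: (n2, m), Omega: (n2, n2), W3: (d, n2).
    Returns the output sequence a^(3), shape (T, d)."""
    T = X.shape[0]
    a2 = np.zeros(W2.shape[0])              # a_0^(2) taken as zero
    outputs = []
    for t in range(T):
        z2 = b2 + W2 @ X[t] + Omega @ a2    # recurrent weighted input
        a2 = sigmoid(z2)
        z3 = b3 + W3 @ a2
        outputs.append(sigmoid(z3))
    return np.array(outputs)

def sequence_cross_entropy(A3, Y):
    """c = -(1/T) sum_t sum_i [Y ln a + (1 - Y) ln(1 - a)] for one sequence pair."""
    T = Y.shape[0]
    return -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / T

rng = np.random.default_rng(0)
m, n2, d, T = 3, 5, 2, 4
W2 = rng.normal(0, 0.5, (n2, m))
Omega = rng.normal(0, 0.5, (n2, n2))
W3 = rng.normal(0, 0.5, (d, n2))
b2, b3 = np.zeros(n2), np.zeros(d)
X = rng.normal(size=(T, m))
Y = np.eye(d)[rng.integers(0, d, T)]        # one class per sequence element
print(sequence_cross_entropy(rnn_forward(X, W2, b2, Omega, W3, b3), Y))
```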

The error of neuron $j$ in layer $l$ at time step $t$ is defined as $\delta^{(l)}_{t,j} = \frac{\partial c}{\partial z^{(l)}_{t,j}}$. Backpropagation through time is a method for computing the partial derivatives of the cost function of a recurrent neural network with respect to its parameters. In the case of a recurrent neural network with a single hidden layer, the method depends solely on the following statements, which will be demonstrated shortly:

$$\delta^{(3)}_{t,j} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j}), \tag{1}$$
$$\delta^{(2)}_{t,j} = \sigma'(z^{(2)}_{t,j}) \left[ \sum_k w^{(3)}_{kj} \delta^{(3)}_{t,k} + \sum_k \omega_{kj} \delta^{(2)}_{t+1,k} \right], \tag{2}$$
$$\frac{\partial c}{\partial b^{(l)}_j} = \sum_{t=1}^T \delta^{(l)}_{t,j}, \tag{3}$$
$$\frac{\partial c}{\partial w^{(l)}_{jk}} = \sum_{t=1}^T \delta^{(l)}_{t,j} a^{(l-1)}_{t,k}, \tag{4}$$
$$\frac{\partial c}{\partial \omega_{jk}} = \sum_{t=1}^T \delta^{(2)}_{t,j} a^{(2)}_{t-1,k}, \tag{5}$$
$$\frac{\partial C}{\partial b^{(l)}_j} = \frac{1}{n} \sum_{(X,Y) \in D} \frac{\partial c}{\partial b^{(l)}_j}, \tag{6}$$
$$\frac{\partial C}{\partial w^{(l)}_{jk}} = \frac{1}{n} \sum_{(X,Y) \in D} \frac{\partial c}{\partial w^{(l)}_{jk}}, \tag{7}$$
$$\frac{\partial C}{\partial \omega_{jk}} = \frac{1}{n} \sum_{(X,Y) \in D} \frac{\partial c}{\partial \omega_{jk}}, \tag{8}$$

where $\delta^{(2)}_{T+1,j}$ is zero. Notice how the error $\delta^{(2)}_{t,j}$ depends on the error $\delta^{(2)}_{t+1,k}$ for all $k$. Every quantity on the right side can be computed easily from our definitions by starting from $\delta^{(3)}_{T,j}$, which can only be computed after the network observes the entire sequence. That is why the method is called backpropagation through time.

A single hidden layer recurrent neural network can be unfolded into a feedforward network, which may be useful to interpret backpropagation through time. For each pair $(X, Y) \in D$, let $X = a^{(1)}_1, \ldots, a^{(1)}_T$, and consider a feedforward network with $mT$ input neurons, $dT$ output neurons, and $T$ hidden layers with $n^{(2)}$ neurons each. Input $a^{(1)}_t$ is connected solely to the hidden neurons in layer $t$, and the hidden neurons in layer $t$ are connected to the output neurons that correspond to $a^{(3)}_t$. Furthermore, the hidden neurons in layer $t < T$ are connected to the hidden neurons in layer $t+1$. The parameters in the network are shared both between corresponding hidden neurons in distinct layers and between corresponding output neurons. An analogous unfolding can be applied to any recurrent neural network architecture. Notice that the unfolded recurrent network is deep for any sequence with more than one element, which indicates that the model may be difficult to train for long sequences.

The proofs of the backpropagation through time statements presented below were derived independently by the author of this text. Although the results agree with the literature, the details should be inspected carefully.

Consider the statement $\delta^{(3)}_{t,j} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j})$. Because $c$ is differentiable with respect to $a^{(3)}_{t,j}$, and $z^{(3)}_{t,j}$ only affects $c$ through $a^{(3)}_{t,j}$,

$$\delta^{(3)}_{t,j} = \frac{\partial c}{\partial z^{(3)}_{t,j}} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \frac{\partial a^{(3)}_{t,j}}{\partial z^{(3)}_{t,j}} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j}),$$

as we wanted to show.

Furthermore, if we consider a cross-entropy cost function for sequence element classification and a sigmoid activation function,

$$\frac{\partial c}{\partial a^{(3)}_{t,j}} = \frac{\partial}{\partial a^{(3)}_{t,j}} \left( -\frac{1}{T} \sum_{t'=1}^T \sum_{i=1}^d \left[ Y_{t',i} \ln a^{(3)}_{t',i} + (1 - Y_{t',i}) \ln (1 - a^{(3)}_{t',i}) \right] \right) = \frac{1}{T} \cdot \frac{a^{(3)}_{t,j} - Y_{t,j}}{a^{(3)}_{t,j} (1 - a^{(3)}_{t,j})},$$

which gives

$$\delta^{(3)}_{t,j} = \frac{a^{(3)}_{t,j} - Y_{t,j}}{T}.$$

In the case of a softmax output layer, it is no longer true that $z^{(3)}_{t,j}$ only affects $c$ through $a^{(3)}_{t,j}$. However, for a negative log-likelihood cost function $c$ for sequence element classification, it is still true that $\delta^{(3)}_{t,j} = \frac{a^{(3)}_{t,j} - Y_{t,j}}{T}$. The proof is analogous to that presented in a previous section.

Consider the statement

$$\delta^{(2)}_{t,j} = \sigma'(z^{(2)}_{t,j}) \left[ \sum_k w^{(3)}_{kj} \delta^{(3)}_{t,k} + \sum_k \omega_{kj} \delta^{(2)}_{t+1,k} \right],$$

which we will prove for every $t \leq T$. As already mentioned, we define $\delta^{(2)}_{T+1,j} = 0$. Because $z^{(2)}_{t,j}$ only affects $c$ through $z^{(3)}_{t,1}, \ldots, z^{(3)}_{t,n^{(3)}}$ and through $z^{(2)}_{t+1,1}, \ldots, z^{(2)}_{t+1,n^{(2)}}$, the chain rule gives

$$\delta^{(2)}_{t,j} = \frac{\partial c}{\partial z^{(2)}_{t,j}} = \sum_k \frac{\partial c}{\partial z^{(3)}_{t,k}} \frac{\partial z^{(3)}_{t,k}}{\partial z^{(2)}_{t,j}} + \sum_k \frac{\partial c}{\partial z^{(2)}_{t+1,k}} \frac{\partial z^{(2)}_{t+1,k}}{\partial z^{(2)}_{t,j}} = \sum_k \delta^{(3)}_{t,k} \frac{\partial z^{(3)}_{t,k}}{\partial z^{(2)}_{t,j}} + \sum_k \delta^{(2)}_{t+1,k} \frac{\partial z^{(2)}_{t+1,k}}{\partial z^{(2)}_{t,j}}.$$

From the definition of $z^{(3)}_{t,k} = b^{(3)}_k + \sum_i w^{(3)}_{ki} a^{(2)}_{t,i}$,

$$\frac{\partial z^{(3)}_{t,k}}{\partial z^{(2)}_{t,j}} = w^{(3)}_{kj} \sigma'(z^{(2)}_{t,j}).$$

From the definition of $z^{(2)}_{t+1,k} = b^{(2)}_k + \sum_i w^{(2)}_{ki} a^{(1)}_{t+1,i} + \sum_i \omega_{ki} a^{(2)}_{t,i}$,

$$\frac{\partial z^{(2)}_{t+1,k}}{\partial z^{(2)}_{t,j}} = \omega_{kj} \sigma'(z^{(2)}_{t,j}).$$

Going back,

$$\delta^{(2)}_{t,j} = \sum_k \delta^{(3)}_{t,k} w^{(3)}_{kj} \sigma'(z^{(2)}_{t,j}) + \sum_k \delta^{(2)}_{t+1,k} \omega_{kj} \sigma'(z^{(2)}_{t,j}) = \sigma'(z^{(2)}_{t,j}) \left[ \sum_k w^{(3)}_{kj} \delta^{(3)}_{t,k} + \sum_k \omega_{kj} \delta^{(2)}_{t+1,k} \right],$$

as we wanted to show.

Consider the statement $\frac{\partial c}{\partial b^{(l)}_j} = \sum_t \delta^{(l)}_{t,j}$, which we prove independently for $l = 3$ and $l = 2$. Because $b^{(3)}_j$ only affects $c$ through $z^{(3)}_{1,j}, \ldots, z^{(3)}_{T,j}$, the chain rule gives

$$\frac{\partial c}{\partial b^{(3)}_j} = \sum_{t=1}^T \frac{\partial c}{\partial z^{(3)}_{t,j}} \frac{\partial z^{(3)}_{t,j}}{\partial b^{(3)}_j} = \sum_{t=1}^T \delta^{(3)}_{t,j},$$

which completes the proof for $l = 3$. Although $b^{(2)}_j$ only affects $c$ through $z^{(2)}_{1,j}, \ldots, z^{(2)}_{T,j}$, it is also true that $z^{(2)}_{t,j}$ may affect $z^{(2)}_{t+1,k}$ for any $k$ and $t < T$. As a consequence, the chain rule does not apply analogously to the previous case. Consider, as an artifice, the introduction of independent biases $b^{(2)}_{t,j} = b^{(2)}_j$ for each time step $t$. Because $b^{(2)}_j$ may only affect $c$ through the new biases,

$$\frac{\partial c}{\partial b^{(2)}_j} = \sum_{t=1}^T \frac{\partial c}{\partial b^{(2)}_{t,j}} \frac{\partial b^{(2)}_{t,j}}{\partial b^{(2)}_j} = \sum_{t=1}^T \frac{\partial c}{\partial b^{(2)}_{t,j}}.$$

Because each bias $b^{(2)}_{t,j}$ only affects $c$ through $z^{(2)}_{t,j}$, we have

$$\frac{\partial c}{\partial b^{(2)}_{t,j}} = \frac{\partial c}{\partial z^{(2)}_{t,j}} \frac{\partial z^{(2)}_{t,j}}{\partial b^{(2)}_{t,j}} = \delta^{(2)}_{t,j},$$

as we wanted to show.

Consider the statement $\frac{\partial c}{\partial w^{(l)}_{jk}} = \sum_t \delta^{(l)}_{t,j} a^{(l-1)}_{t,k}$, which we prove independently for $l = 3$ and $l = 2$. Because $w^{(3)}_{jk}$ only affects $c$ through $z^{(3)}_{1,j}, \ldots, z^{(3)}_{T,j}$, the chain rule gives

$$\frac{\partial c}{\partial w^{(3)}_{jk}} = \sum_{t=1}^T \frac{\partial c}{\partial z^{(3)}_{t,j}} \frac{\partial z^{(3)}_{t,j}}{\partial w^{(3)}_{jk}} = \sum_{t=1}^T \delta^{(3)}_{t,j} a^{(2)}_{t,k},$$

which completes the proof for $l = 3$. Once again, $w^{(2)}_{jk}$ only affects $c$ through $z^{(2)}_{1,j}, \ldots, z^{(2)}_{T,j}$, but $z^{(2)}_{t,j}$ may affect $z^{(2)}_{t+1,i}$ for any $i$ and $t < T$, and so the chain rule does not apply analogously to the previous case. As an artifice, we introduce independent weights $w^{(2)}_{t,jk} = w^{(2)}_{jk}$ for each time step $t$, which gives

$$\frac{\partial c}{\partial w^{(2)}_{jk}} = \sum_{t=1}^T \frac{\partial c}{\partial w^{(2)}_{t,jk}}.$$

Because $w^{(2)}_{t,jk}$ only affects $c$ through $z^{(2)}_{t,j}$,

$$\frac{\partial c}{\partial w^{(2)}_{t,jk}} = \frac{\partial c}{\partial z^{(2)}_{t,j}} \frac{\partial z^{(2)}_{t,j}}{\partial w^{(2)}_{t,jk}} = \delta^{(2)}_{t,j} a^{(1)}_{t,k},$$

which concludes the proof for $l = 2$.

Consider the statement $\frac{\partial c}{\partial \omega_{jk}} = \sum_t \delta^{(2)}_{t,j} a^{(2)}_{t-1,k}$.

This statement also requires the artifice of introducing independent recurrent weights $\omega_{t,jk} = \omega_{jk}$ for each time step $t$. The chain rule gives

$$\frac{\partial c}{\partial \omega_{jk}} = \sum_{t=1}^T \frac{\partial c}{\partial \omega_{t,jk}} \frac{\partial \omega_{t,jk}}{\partial \omega_{jk}} = \sum_{t=1}^T \frac{\partial c}{\partial \omega_{t,jk}}.$$

Because $\omega_{t,jk}$ refers to the weight for time step $t$, it only affects $c$ through $z^{(2)}_{t,j}$. The chain rule then gives

$$\frac{\partial c}{\partial \omega_{t,jk}} = \frac{\partial c}{\partial z^{(2)}_{t,j}} \frac{\partial z^{(2)}_{t,j}}{\partial \omega_{t,jk}} = \delta^{(2)}_{t,j} a^{(2)}_{t-1,k},$$

as we wanted to show. The three last backpropagation through time statements follow easily from the definition of the cost $C$ as the average of the cost $c$ over every pair $(X, Y) \in D$.

This completes the definition of the partial derivatives of the cost function $C$ with respect to every parameter in a single hidden layer recurrent neural network. Stochastic gradient descent can be employed to minimize $C$, with the usual lack of guarantees. In practice, online gradient descent (mini-batches of size one) is commonly used. The improvements suggested in previous sections, such as momentum-based gradient descent and regularization, can also be employed. Finding the partial derivatives of the cost with respect to the parameters becomes very involved for more complicated network architectures. In those cases, it is highly advisable to implement recurrent neural networks using automatic differentiation algorithms.

7 Long short-term memory networks

Long short-term memory networks compose an important class of recurrent neural networks. These networks were developed to deal with exploding and vanishing gradients, which may compromise learning in the recurrent networks presented in the previous section. Long short-term memory networks have been shown to be very effective in practice.

Consider the sequence element classification data set $D = \{(X_i, Y_i) \mid i \in \{1, \ldots, n\}, X_i \in (\mathbb{R}^m)^+, Y_i \in (\mathbb{R}^d)^+\}$. Furthermore, let $|X| = |Y|$ for every $(X, Y) \in D$. We will present long short-term memory networks with a single hidden layer composed of single-celled memory blocks. Many definitions will be analogous to those presented for single hidden layer recurrent neural networks, and will be omitted.

Let $n^{(1)} = m$ and $n^{(3)} = d$, and let $n^{(2)}$ denote the number of single-celled memory blocks in the hidden layer. The input activation of the network at time $t$ for $(X, Y) \in D$ is defined as $a^{(1)}_t = X_t$. Similarly to a neuron in the hidden layer of a typical recurrent neural network, memory block $j$ also receives the vectors $a^{(1)}_t$ and $a^{(2)}_{t-1}$ at time step $t$, and outputs a scalar $a^{(2)}_{t,j}$. Long short-term memory networks can be unfolded in time in an analogous manner to typical recurrent networks. However, the computations performed in a memory block are considerably more involved than those in a recurrent artificial neuron.

A single-celled memory block $j$ is composed of four modules: a cell, an input gate $I$, a forget gate $F$, and an output gate $O$. The weighted input $z_{t,j}$ to the cell in memory block $j$ is defined as

$$z_{t,j} = b_j + \sum_k w_{jk} a^{(1)}_{t,k} + \sum_k \omega_{jk} a^{(2)}_{t-1,k},$$

where $a^{(2)}_0$ may be zero. This is the analogue of the weighted input for neuron $j$ in the hidden layer of a typical recurrent network. The activation $s_{t,j}$ of the cell in memory block $j$ is defined as

$$s_{t,j} = a^F_{t,j} s_{t-1,j} + a^I_{t,j} g(z_{t,j}),$$

where $s_{0,j}$ may be zero and $g$ is a differentiable activation function. The terms $a^F_{t,j}$ and $a^I_{t,j}$ correspond to the activations of the forget and input gates, respectively, and will be defined shortly.

Because each of these two scalars is usually between $0$ and $1$, they control how much the previous activation of the cell and the current weighted input to the cell affect its current activation.

The weighted input $z^G_{t,j}$ of a gate $G \in \{I, F, O\}$ in memory block $j$ is defined as

$$z^G_{t,j} = b^G_j + \psi^G_j s_{t-1,j} + \sum_k w^G_{jk} a^{(1)}_{t,k} + \sum_k \omega^G_{jk} a^{(2)}_{t-1,k},$$

where $\psi^G_j \in \mathbb{R}$ is the so-called peephole weight to the cell activation at the previous time step. The activation $a^G_{t,j}$ of a gate $G$ in memory block $j$ is defined as $a^G_{t,j} = f(z^G_{t,j})$, where $f$ is a differentiable activation function, typically the sigmoid function. Notice that each gate $G$ in memory block $j$ has its own parameters and behaves similarly to a typical recurrent neuron.

Finally, the output activation $a^{(2)}_{t,j}$ of memory block $j$ is defined as

$$a^{(2)}_{t,j} = a^O_{t,j} \, h(s_{t,j}),$$

where $h$ is a differentiable activation function. Intuitively, the activation of the output gate controls how much the current activation of the cell affects the output of the memory block.

Intuitively, a memory block can be interpreted as a parameterized circuit. By training the network, a memory block may learn when to store, output and erase its memory (cell activation), given the current input activation of the network and the previous activations of the memory blocks. Each memory block has $4(n^{(1)} + n^{(2)}) + 7$ parameters (weights and biases).

The weighted input to neuron $j$ in layer $3$ is defined as $z^{(3)}_{t,j} = b^{(3)}_j + \sum_k w^{(3)}_{jk} a^{(2)}_{t,k}$, as in a feedforward neural network. The activation of neuron $j$ in layer $3$ is $a^{(3)}_{t,j} = \sigma(z^{(3)}_{t,j})$. The output of the network $f$ on input $X = a^{(1)}_1, \ldots, a^{(1)}_T$ is the sequence $a^{(3)}_1, \ldots, a^{(3)}_T$. This completes the definition of a long short-term memory network with a single hidden layer of single-celled memory blocks.

The intuition behind long short-term memory networks is very similar to that for typical recurrent neural networks. The sequence $X$ is presented to the network element by element. An ideal long short-term memory network would be capable of representing a sequence $X_{1:t}$ by the activations of its memory blocks $a^{(2)}_t$ and cells $s_t$, to allow correct classification of $X_t$.

As usual, learning consists of finding parameters that minimize a cost function $C = \frac{1}{n} \sum_{(X,Y) \in D} c$. For concreteness, consider the case of a cross-entropy cost function $c$ for sequence element classification, defined as

$$c = -\frac{1}{T} \sum_{t=1}^T \sum_{i=1}^d \left[ Y_{t,i} \ln a^{(3)}_{t,i} + (1 - Y_{t,i}) \ln (1 - a^{(3)}_{t,i}) \right],$$

where $X = a^{(1)}_1, \ldots, a^{(1)}_T$ is the input sequence and $a^{(3)}_1, \ldots, a^{(3)}_T$ is the output sequence. This cost function requires that every $Y_t$ be a standard basis vector.
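A minimal NumPy sketch of the forward pass of a hidden layer of single-celled memory blocks as defined above; here $f$ is the sigmoid, and $g$ and $h$ are taken to be $\tanh$ (an assumption, since the text leaves them generic). The parameter names and sizes are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(X, params, g=np.tanh, h=np.tanh):
    """Forward pass of a hidden layer of single-celled memory blocks.
    X: sequence of shape (T, m).  params holds, for the cell input, 'W',
    'Omega', 'b', and for each gate G in {'I', 'F', 'O'}, 'W_G', 'Omega_G',
    'b_G' and the peephole weights 'psi_G'.  Returns the block outputs (T, n2)."""
    T = X.shape[0]
    n2 = params["b"].shape[0]
    a2, s = np.zeros(n2), np.zeros(n2)       # a_0^(2) and s_0 taken as zero
    outputs = []
    for t in range(T):
        gates = {}
        for G in ("I", "F", "O"):            # gate activations, with peepholes to s_{t-1}
            zG = (params["b_" + G] + params["psi_" + G] * s
                  + params["W_" + G] @ X[t] + params["Omega_" + G] @ a2)
            gates[G] = sigmoid(zG)
        z = params["b"] + params["W"] @ X[t] + params["Omega"] @ a2
        s = gates["F"] * s + gates["I"] * g(z)     # cell activation s_t
        a2 = gates["O"] * h(s)                     # block output a_t^(2)
        outputs.append(a2)
    return np.array(outputs)

rng = np.random.default_rng(0)
m, n2, T = 3, 4, 5
params = {"W": rng.normal(0, 0.5, (n2, m)), "Omega": rng.normal(0, 0.5, (n2, n2)),
          "b": np.zeros(n2)}
for G in ("I", "F", "O"):
    params["W_" + G] = rng.normal(0, 0.5, (n2, m))
    params["Omega_" + G] = rng.normal(0, 0.5, (n2, n2))
    params["b_" + G] = np.zeros(n2)
    params["psi_" + G] = rng.normal(0, 0.5, n2)
print(lstm_forward(rng.normal(size=(T, m)), params).shape)   # (5, 4)
```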

We will now present the backpropagation through time statements that allow the computation of the partial derivatives of the cost function with respect to every parameter in a long short-term memory network. As usual, these partial derivatives can be used for gradient descent. We start by defining auxiliary variables. The error $\delta^{(l)}_{t,j}$ of neuron or block $j$ in layer $l$ at time step $t$ is defined as $\delta^{(l)}_{t,j} = \frac{\partial c}{\partial z^{(l)}_{t,j}}$; for a memory block, we write $\delta_{t,j} = \frac{\partial c}{\partial z_{t,j}}$ for the error of its cell input. The error $\delta^G_{t,j}$ of gate $G$ in layer $2$ at time step $t$ is defined as $\delta^G_{t,j} = \frac{\partial c}{\partial z^G_{t,j}}$. The activation error $\varepsilon^a_{t,j}$ of memory block $j$ in layer $2$ at time step $t$ is defined as $\varepsilon^a_{t,j} = \frac{\partial c}{\partial a^{(2)}_{t,j}}$. The cell activation error $\varepsilon^s_{t,j}$ of memory block $j$ in layer $2$ at time step $t$ is defined as $\varepsilon^s_{t,j} = \frac{\partial c}{\partial s_{t,j}}$.

The following statements summarize error computation by backpropagation through time in long short-term memory networks:

$$\delta^{(3)}_{t,j} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j}), \tag{1}$$
$$\delta^O_{t,j} = f'(z^O_{t,j}) \, h(s_{t,j}) \, \varepsilon^a_{t,j}, \tag{2}$$
$$\delta_{t,j} = a^I_{t,j} \, g'(z_{t,j}) \, \varepsilon^s_{t,j}, \tag{3}$$
$$\delta^F_{t,j} = f'(z^F_{t,j}) \, s_{t-1,j} \, \varepsilon^s_{t,j}, \tag{4}$$
$$\delta^I_{t,j} = f'(z^I_{t,j}) \, g(z_{t,j}) \, \varepsilon^s_{t,j}, \tag{5}$$
$$\varepsilon^s_{t,j} = a^O_{t,j} \, h'(s_{t,j}) \, \varepsilon^a_{t,j} + a^F_{t+1,j} \, \varepsilon^s_{t+1,j} + \sum_{G \in \{I,F,O\}} \psi^G_j \, \delta^G_{t+1,j}, \tag{6}$$
$$\varepsilon^a_{t,j} = \sum_k w^{(3)}_{kj} \delta^{(3)}_{t,k} + \sum_k \omega_{kj} \delta_{t+1,k} + \sum_{G \in \{I,F,O\}} \sum_k \omega^G_{kj} \delta^G_{t+1,k}, \tag{7}$$

where $\delta_{T+1,j} = \delta^G_{T+1,j} = \varepsilon^s_{T+1,j} = 0$ and $G$ is either $I$, $F$, or $O$. Once again, every quantity can be computed easily by starting from $\delta^{(3)}_{T,j}$, which can only be computed after the network has observed the entire sequence.

These errors can be used to compute the partial derivatives of the cost $C$ with respect to every parameter in the network using the following statements:

$$\frac{\partial c}{\partial b^G_j} = \sum_{t=1}^T \delta^G_{t,j}, \tag{8}$$
$$\frac{\partial c}{\partial w^G_{jk}} = \sum_{t=1}^T \delta^G_{t,j} a^{(1)}_{t,k}, \tag{9}$$
$$\frac{\partial c}{\partial \omega^G_{jk}} = \sum_{t=1}^T \delta^G_{t,j} a^{(2)}_{t-1,k}, \tag{10}$$
$$\frac{\partial c}{\partial \psi^G_j} = \sum_{t=1}^T \delta^G_{t,j} s_{t-1,j}, \tag{11}$$
$$\frac{\partial c}{\partial b^{(l)}_j} = \sum_{t=1}^T \delta^{(l)}_{t,j}, \tag{12}$$
$$\frac{\partial c}{\partial w^{(l)}_{jk}} = \sum_{t=1}^T \delta^{(l)}_{t,j} a^{(l-1)}_{t,k}, \tag{13}$$
$$\frac{\partial c}{\partial \omega_{jk}} = \sum_{t=1}^T \delta_{t,j} a^{(2)}_{t-1,k}, \tag{14}$$
$$\frac{\partial C}{\partial p} = \frac{1}{n} \sum_{(X,Y) \in D} \frac{\partial c}{\partial p}, \tag{15}$$

where $p$ is any parameter (weight or bias) in the network, $G$ is either $I$, $F$, or $O$, and in statements (12) and (13) the layer index $l$ refers either to the output layer ($l = 3$) or to the cell parameters of a memory block ($l = 2$, with $\delta^{(2)}_{t,j} = \delta_{t,j}$).

We now present proofs of the aforementioned statements for backpropagation through time in long short-term memory networks.

Consider the statement $\delta^{(3)}_{t,j} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j})$. For concreteness, let $c$ be the cross-entropy cost function for sequence element classification, and consider a sigmoid output layer. Because $c$ is differentiable with respect to $a^{(3)}_{t,j}$, and $z^{(3)}_{t,j}$ only affects $c$ through $a^{(3)}_{t,j}$,

$$\delta^{(3)}_{t,j} = \frac{\partial c}{\partial z^{(3)}_{t,j}} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \frac{\partial a^{(3)}_{t,j}}{\partial z^{(3)}_{t,j}} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j}),$$

as we wanted to show. This proof is identical to the one presented in the previous section. A sigmoid output layer also gives $\delta^{(3)}_{t,j} = \frac{a^{(3)}_{t,j} - Y_{t,j}}{T}$, which was also proved in the previous section. A negative log-likelihood cost function $c$ for sequence element classification with a softmax output layer also results in $\delta^{(3)}_{t,j} = \frac{a^{(3)}_{t,j} - Y_{t,j}}{T}$, although the statement $\delta^{(3)}_{t,j} = \frac{\partial c}{\partial a^{(3)}_{t,j}} \sigma'(z^{(3)}_{t,j})$ is no longer true. The proof is straightforward given the analysis of the negative log-likelihood cost function presented in a previous section.

A dependency graph may be useful to understand the relationships between the variables on which the next proofs rely.

Consider the statement $\delta^O_{t,j} = f'(z^O_{t,j}) h(s_{t,j}) \varepsilon^a_{t,j}$. Because $z^O_{t,j}$ only affects $c$ through $a^{(2)}_{t,j}$,

$$\delta^O_{t,j} = \frac{\partial c}{\partial a^{(2)}_{t,j}} \frac{\partial a^{(2)}_{t,j}}{\partial z^O_{t,j}} = \varepsilon^a_{t,j} \frac{\partial}{\partial z^O_{t,j}} \left[ a^O_{t,j} h(s_{t,j}) \right].$$

Because $s_{t,j}$ is not affected by $z^O_{t,j}$,

$$\delta^O_{t,j} = \varepsilon^a_{t,j} \, h(s_{t,j}) \, f'(z^O_{t,j}),$$

as we wanted to show.

Consider the statement $\delta_{t,j} = a^I_{t,j} g'(z_{t,j}) \varepsilon^s_{t,j}$. Because $z_{t,j}$ only affects $c$ through $s_{t,j}$,

$$\delta_{t,j} = \frac{\partial c}{\partial s_{t,j}} \frac{\partial s_{t,j}}{\partial z_{t,j}} = \varepsilon^s_{t,j} \frac{\partial}{\partial z_{t,j}} \left[ a^F_{t,j} s_{t-1,j} + a^I_{t,j} g(z_{t,j}) \right].$$

Because only $g(z_{t,j})$ is affected by $z_{t,j}$ in the expression between brackets,

$$\delta_{t,j} = a^I_{t,j} \, g'(z_{t,j}) \, \varepsilon^s_{t,j},$$

as we wanted to show. Notice how this error is regulated by the activation of the input gate. Intuitively, if the input gate is closed (when its activation is close to zero), the weighted input parameters do not significantly affect the cost.

Consider the statement $\delta^F_{t,j} = f'(z^F_{t,j}) s_{t-1,j} \varepsilon^s_{t,j}$. Because $z^F_{t,j}$ only affects $c$ through $s_{t,j}$,

$$\delta^F_{t,j} = \varepsilon^s_{t,j} \frac{\partial}{\partial z^F_{t,j}} \left[ a^F_{t,j} s_{t-1,j} + a^I_{t,j} g(z_{t,j}) \right].$$

Because only $a^F_{t,j}$ is affected by $z^F_{t,j}$ in the expression between brackets,

$$\delta^F_{t,j} = \varepsilon^s_{t,j} \, f'(z^F_{t,j}) \, s_{t-1,j},$$

as we wanted to show.

Consider the statement $\delta^I_{t,j} = f'(z^I_{t,j}) g(z_{t,j}) \varepsilon^s_{t,j}$. Because $z^I_{t,j}$ only affects $c$ through $s_{t,j}$, and only $a^I_{t,j}$ is affected by $z^I_{t,j}$ in the expression that defines $s_{t,j}$,

$$\delta^I_{t,j} = \varepsilon^s_{t,j} \, f'(z^I_{t,j}) \, g(z_{t,j}),$$

as we wanted to show.


More information

232 Calculus and Structures

232 Calculus and Structures 3 Calculus and Structures CHAPTER 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS FOR EVALUATING BEAMS Calculus and Structures 33 Copyrigt Capter 17 JUSTIFICATION OF THE AREA AND SLOPE METHODS 17.1 THE

More information

Differential Calculus (The basics) Prepared by Mr. C. Hull

Differential Calculus (The basics) Prepared by Mr. C. Hull Differential Calculus Te basics) A : Limits In tis work on limits, we will deal only wit functions i.e. tose relationsips in wic an input variable ) defines a unique output variable y). Wen we work wit

More information

lecture 26: Richardson extrapolation

lecture 26: Richardson extrapolation 43 lecture 26: Ricardson extrapolation 35 Ricardson extrapolation, Romberg integration Trougout numerical analysis, one encounters procedures tat apply some simple approximation (eg, linear interpolation)

More information

Chapter 5 FINITE DIFFERENCE METHOD (FDM)

Chapter 5 FINITE DIFFERENCE METHOD (FDM) MEE7 Computer Modeling Tecniques in Engineering Capter 5 FINITE DIFFERENCE METHOD (FDM) 5. Introduction to FDM Te finite difference tecniques are based upon approximations wic permit replacing differential

More information

Sin, Cos and All That

Sin, Cos and All That Sin, Cos and All Tat James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 9, 2017 Outline Sin, Cos and all tat! A New Power Rule Derivatives

More information

How to Find the Derivative of a Function: Calculus 1

How to Find the Derivative of a Function: Calculus 1 Introduction How to Find te Derivative of a Function: Calculus 1 Calculus is not an easy matematics course Te fact tat you ave enrolled in suc a difficult subject indicates tat you are interested in te

More information

3.1 Extreme Values of a Function

3.1 Extreme Values of a Function .1 Etreme Values of a Function Section.1 Notes Page 1 One application of te derivative is finding minimum and maimum values off a grap. In precalculus we were only able to do tis wit quadratics by find

More information

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II Excursions in Computing Science: Week v Milli-micro-nano-..mat Part II T. H. Merrett McGill University, Montreal, Canada June, 5 I. Prefatory Notes. Cube root of 8. Almost every calculator as a square-root

More information

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann

More information

THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225

THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225 THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Mat 225 As we ave seen, te definition of derivative for a Mat 111 function g : R R and for acurveγ : R E n are te same, except for interpretation:

More information

Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy

Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy Deep Belief Network Training Improvement Using Elite Samples Minimizing Free Energy Moammad Ali Keyvanrad a, Moammad Medi Homayounpour a a Laboratory for Intelligent Multimedia Processing (LIMP), Computer

More information

Click here to see an animation of the derivative

Click here to see an animation of the derivative Differentiation Massoud Malek Derivative Te concept of derivative is at te core of Calculus; It is a very powerful tool for understanding te beavior of matematical functions. It allows us to optimize functions,

More information

Introduction to Derivatives

Introduction to Derivatives Introduction to Derivatives 5-Minute Review: Instantaneous Rates and Tangent Slope Recall te analogy tat we developed earlier First we saw tat te secant slope of te line troug te two points (a, f (a))

More information

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LAURA EVANS.. Introduction Not all differential equations can be explicitly solved for y. Tis can be problematic if we need to know te value of y

More information

Quantum Mechanics Chapter 1.5: An illustration using measurements of particle spin.

Quantum Mechanics Chapter 1.5: An illustration using measurements of particle spin. I Introduction. Quantum Mecanics Capter.5: An illustration using measurements of particle spin. Quantum mecanics is a teory of pysics tat as been very successful in explaining and predicting many pysical

More information

New Streamfunction Approach for Magnetohydrodynamics

New Streamfunction Approach for Magnetohydrodynamics New Streamfunction Approac for Magnetoydrodynamics Kab Seo Kang Brooaven National Laboratory, Computational Science Center, Building 63, Room, Upton NY 973, USA. sang@bnl.gov Summary. We apply te finite

More information

Flavius Guiaş. X(t + h) = X(t) + F (X(s)) ds.

Flavius Guiaş. X(t + h) = X(t) + F (X(s)) ds. Numerical solvers for large systems of ordinary differential equations based on te stocastic direct simulation metod improved by te and Runge Kutta principles Flavius Guiaş Abstract We present a numerical

More information

2.11 That s So Derivative

2.11 That s So Derivative 2.11 Tat s So Derivative Introduction to Differential Calculus Just as one defines instantaneous velocity in terms of average velocity, we now define te instantaneous rate of cange of a function at a point

More information

HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS

HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS HOW TO DEAL WITH FFT SAMPLING INFLUENCES ON ADEV CALCULATIONS Po-Ceng Cang National Standard Time & Frequency Lab., TL, Taiwan 1, Lane 551, Min-Tsu Road, Sec. 5, Yang-Mei, Taoyuan, Taiwan 36 Tel: 886 3

More information

. If lim. x 2 x 1. f(x+h) f(x)

. If lim. x 2 x 1. f(x+h) f(x) Review of Differential Calculus Wen te value of one variable y is uniquely determined by te value of anoter variable x, ten te relationsip between x and y is described by a function f tat assigns a value

More information

Order of Accuracy. ũ h u Ch p, (1)

Order of Accuracy. ũ h u Ch p, (1) Order of Accuracy 1 Terminology We consider a numerical approximation of an exact value u. Te approximation depends on a small parameter, wic can be for instance te grid size or time step in a numerical

More information

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x)

1 Calculus. 1.1 Gradients and the Derivative. Q f(x+h) f(x) Calculus. Gradients and te Derivative Q f(x+) δy P T δx R f(x) 0 x x+ Let P (x, f(x)) and Q(x+, f(x+)) denote two points on te curve of te function y = f(x) and let R denote te point of intersection of

More information

Quaternion Dynamics, Part 1 Functions, Derivatives, and Integrals. Gary D. Simpson. rev 01 Aug 08, 2016.

Quaternion Dynamics, Part 1 Functions, Derivatives, and Integrals. Gary D. Simpson. rev 01 Aug 08, 2016. Quaternion Dynamics, Part 1 Functions, Derivatives, and Integrals Gary D. Simpson gsim1887@aol.com rev 1 Aug 8, 216 Summary Definitions are presented for "quaternion functions" of a quaternion. Polynomial

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION MODULE 5 THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA EXAMINATION NEW MODULAR SCHEME introduced from te examinations in 009 MODULE 5 SOLUTIONS FOR SPECIMEN PAPER B THE QUESTIONS ARE CONTAINED IN A SEPARATE FILE

More information

1. State whether the function is an exponential growth or exponential decay, and describe its end behaviour using limits.

1. State whether the function is an exponential growth or exponential decay, and describe its end behaviour using limits. Questions 1. State weter te function is an exponential growt or exponential decay, and describe its end beaviour using its. (a) f(x) = 3 2x (b) f(x) = 0.5 x (c) f(x) = e (d) f(x) = ( ) x 1 4 2. Matc te

More information

Convergence and Descent Properties for a Class of Multilevel Optimization Algorithms

Convergence and Descent Properties for a Class of Multilevel Optimization Algorithms Convergence and Descent Properties for a Class of Multilevel Optimization Algoritms Stepen G. Nas April 28, 2010 Abstract I present a multilevel optimization approac (termed MG/Opt) for te solution of

More information

Overdispersed Variational Autoencoders

Overdispersed Variational Autoencoders Overdispersed Variational Autoencoders Harsil Sa, David Barber and Aleksandar Botev Department of Computer Science, University College London Alan Turing Institute arsil.sa.15@ucl.ac.uk, david.barber@ucl.ac.uk,

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximatinga function fx, wose values at a set of distinct points x, x, x,, x n are known, by a polynomial P x suc

More information

Adaptive Neural Filters with Fixed Weights

Adaptive Neural Filters with Fixed Weights Adaptive Neural Filters wit Fixed Weigts James T. Lo and Justin Nave Department of Matematics and Statistics University of Maryland Baltimore County Baltimore, MD 150, U.S.A. e-mail: jameslo@umbc.edu Abstract

More information

Math 1241 Calculus Test 1

Math 1241 Calculus Test 1 February 4, 2004 Name Te first nine problems count 6 points eac and te final seven count as marked. Tere are 120 points available on tis test. Multiple coice section. Circle te correct coice(s). You do

More information

Journal of Computational and Applied Mathematics

Journal of Computational and Applied Mathematics Journal of Computational and Applied Matematics 94 (6) 75 96 Contents lists available at ScienceDirect Journal of Computational and Applied Matematics journal omepage: www.elsevier.com/locate/cam Smootness-Increasing

More information

1watt=1W=1kg m 2 /s 3

1watt=1W=1kg m 2 /s 3 Appendix A Matematics Appendix A.1 Units To measure a pysical quantity, you need a standard. Eac pysical quantity as certain units. A unit is just a standard we use to compare, e.g. a ruler. In tis laboratory

More information

Cubic Functions: Local Analysis

Cubic Functions: Local Analysis Cubic function cubing coefficient Capter 13 Cubic Functions: Local Analysis Input-Output Pairs, 378 Normalized Input-Output Rule, 380 Local I-O Rule Near, 382 Local Grap Near, 384 Types of Local Graps

More information

Financial Econometrics Prof. Massimo Guidolin

Financial Econometrics Prof. Massimo Guidolin CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis

More information

Solution for the Homework 4

Solution for the Homework 4 Solution for te Homework 4 Problem 6.5: In tis section we computed te single-particle translational partition function, tr, by summing over all definite-energy wavefunctions. An alternative approac, owever,

More information

Spike train entropy-rate estimation using hierarchical Dirichlet process priors

Spike train entropy-rate estimation using hierarchical Dirichlet process priors publised in: Advances in Neural Information Processing Systems 26 (23), 276 284. Spike train entropy-rate estimation using ierarcical Diriclet process priors Karin Knudson Department of Matematics kknudson@mat.utexas.edu

More information

GRID CONVERGENCE ERROR ANALYSIS FOR MIXED-ORDER NUMERICAL SCHEMES

GRID CONVERGENCE ERROR ANALYSIS FOR MIXED-ORDER NUMERICAL SCHEMES GRID CONVERGENCE ERROR ANALYSIS FOR MIXED-ORDER NUMERICAL SCHEMES Cristoper J. Roy Sandia National Laboratories* P. O. Box 5800, MS 085 Albuquerque, NM 8785-085 AIAA Paper 00-606 Abstract New developments

More information

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these.

4. The slope of the line 2x 7y = 8 is (a) 2/7 (b) 7/2 (c) 2 (d) 2/7 (e) None of these. Mat 11. Test Form N Fall 016 Name. Instructions. Te first eleven problems are wort points eac. Te last six problems are wort 5 points eac. For te last six problems, you must use relevant metods of algebra

More information

Continuous Stochastic Processes

Continuous Stochastic Processes Continuous Stocastic Processes Te term stocastic is often applied to penomena tat vary in time, wile te word random is reserved for penomena tat vary in space. Apart from tis distinction, te modelling

More information

Higher Derivatives. Differentiable Functions

Higher Derivatives. Differentiable Functions Calculus 1 Lia Vas Higer Derivatives. Differentiable Functions Te second derivative. Te derivative itself can be considered as a function. Te instantaneous rate of cange of tis function is te second derivative.

More information

A h u h = f h. 4.1 The CoarseGrid SystemandtheResidual Equation

A h u h = f h. 4.1 The CoarseGrid SystemandtheResidual Equation Capter Grid Transfer Remark. Contents of tis capter. Consider a grid wit grid size and te corresponding linear system of equations A u = f. Te summary given in Section 3. leads to te idea tat tere migt

More information

New Distribution Theory for the Estimation of Structural Break Point in Mean

New Distribution Theory for the Estimation of Structural Break Point in Mean New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University

More information

INTRODUCTION AND MATHEMATICAL CONCEPTS

INTRODUCTION AND MATHEMATICAL CONCEPTS INTODUCTION ND MTHEMTICL CONCEPTS PEVIEW Tis capter introduces you to te basic matematical tools for doing pysics. You will study units and converting between units, te trigonometric relationsips of sine,

More information

Chapter 4: Numerical Methods for Common Mathematical Problems

Chapter 4: Numerical Methods for Common Mathematical Problems 1 Capter 4: Numerical Metods for Common Matematical Problems Interpolation Problem: Suppose we ave data defined at a discrete set of points (x i, y i ), i = 0, 1,..., N. Often it is useful to ave a smoot

More information

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist

1. Questions (a) through (e) refer to the graph of the function f given below. (A) 0 (B) 1 (C) 2 (D) 4 (E) does not exist Mat 1120 Calculus Test 2. October 18, 2001 Your name Te multiple coice problems count 4 points eac. In te multiple coice section, circle te correct coice (or coices). You must sow your work on te oter

More information

Chapter 2 Limits and Continuity

Chapter 2 Limits and Continuity 4 Section. Capter Limits and Continuity Section. Rates of Cange and Limits (pp. 6) Quick Review.. f () ( ) () 4 0. f () 4( ) 4. f () sin sin 0 4. f (). 4 4 4 6. c c c 7. 8. c d d c d d c d c 9. 8 ( )(

More information

CHAPTER (A) When x = 2, y = 6, so f( 2) = 6. (B) When y = 4, x can equal 6, 2, or 4.

CHAPTER (A) When x = 2, y = 6, so f( 2) = 6. (B) When y = 4, x can equal 6, 2, or 4. SECTION 3-1 101 CHAPTER 3 Section 3-1 1. No. A correspondence between two sets is a function only if eactly one element of te second set corresponds to eac element of te first set. 3. Te domain of a function

More information

1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible.

1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible. 004 Algebra Pretest answers and scoring Part A. Multiple coice questions. Directions: Circle te letter ( A, B, C, D, or E ) net to te correct answer. points eac, no partial credit. Wic one of te following

More information

Gradient Descent etc.

Gradient Descent etc. 1 Gradient Descent etc EE 13: Networked estimation and control Prof Kan) I DERIVATIVE Consider f : R R x fx) Te derivative is defined as d fx) = lim dx fx + ) fx) Te cain rule states tat if d d f gx) )

More information

= 0 and states ''hence there is a stationary point'' All aspects of the proof dx must be correct (c)

= 0 and states ''hence there is a stationary point'' All aspects of the proof dx must be correct (c) Paper 1: Pure Matematics 1 Mark Sceme 1(a) (i) (ii) d d y 3 1x 4x x M1 A1 d y dx 1.1b 1.1b 36x 48x A1ft 1.1b Substitutes x = into teir dx (3) 3 1 4 Sows d y 0 and states ''ence tere is a stationary point''

More information

A First-Order System Approach for Diffusion Equation. I. Second-Order Residual-Distribution Schemes

A First-Order System Approach for Diffusion Equation. I. Second-Order Residual-Distribution Schemes A First-Order System Approac for Diffusion Equation. I. Second-Order Residual-Distribution Scemes Hiroaki Nisikawa W. M. Keck Foundation Laboratory for Computational Fluid Dynamics, Department of Aerospace

More information

Name: Answer Key No calculators. Show your work! 1. (21 points) All answers should either be,, a (finite) real number, or DNE ( does not exist ).

Name: Answer Key No calculators. Show your work! 1. (21 points) All answers should either be,, a (finite) real number, or DNE ( does not exist ). Mat - Final Exam August 3 rd, Name: Answer Key No calculators. Sow your work!. points) All answers sould eiter be,, a finite) real number, or DNE does not exist ). a) Use te grap of te function to evaluate

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximating a function f(x, wose values at a set of distinct points x, x, x 2,,x n are known, by a polynomial P (x

More information

RECOGNITION of online handwriting aims at finding the

RECOGNITION of online handwriting aims at finding the SUBMITTED ON SEPTEMBER 2017 1 A General Framework for te Recognition of Online Handwritten Grapics Frank Julca-Aguilar, Harold Moucère, Cristian Viard-Gaudin, and Nina S. T. Hirata arxiv:1709.06389v1 [cs.cv]

More information

Material for Difference Quotient

Material for Difference Quotient Material for Difference Quotient Prepared by Stepanie Quintal, graduate student and Marvin Stick, professor Dept. of Matematical Sciences, UMass Lowell Summer 05 Preface Te following difference quotient

More information

Solve exponential equations in one variable using a variety of strategies. LEARN ABOUT the Math. What is the half-life of radon?

Solve exponential equations in one variable using a variety of strategies. LEARN ABOUT the Math. What is the half-life of radon? 8.5 Solving Exponential Equations GOAL Solve exponential equations in one variable using a variety of strategies. LEARN ABOUT te Mat All radioactive substances decrease in mass over time. Jamie works in

More information

WYSE Academic Challenge 2004 Sectional Mathematics Solution Set

WYSE Academic Challenge 2004 Sectional Mathematics Solution Set WYSE Academic Callenge 00 Sectional Matematics Solution Set. Answer: B. Since te equation can be written in te form x + y, we ave a major 5 semi-axis of lengt 5 and minor semi-axis of lengt. Tis means

More information

DIGRAPHS FROM POWERS MODULO p

DIGRAPHS FROM POWERS MODULO p DIGRAPHS FROM POWERS MODULO p Caroline Luceta Box 111 GCC, 100 Campus Drive, Grove City PA 1617 USA Eli Miller PO Box 410, Sumneytown, PA 18084 USA Clifford Reiter Department of Matematics, Lafayette College,

More information

THE hidden Markov model (HMM)-based parametric

THE hidden Markov model (HMM)-based parametric JOURNAL OF L A TEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 1 Modeling Spectral Envelopes Using Restricted Boltzmann Macines and Deep Belief Networks for Statistical Parametric Speec Syntesis Zen-Hua Ling,

More information

INTRODUCTION AND MATHEMATICAL CONCEPTS

INTRODUCTION AND MATHEMATICAL CONCEPTS Capter 1 INTRODUCTION ND MTHEMTICL CONCEPTS PREVIEW Tis capter introduces you to te basic matematical tools for doing pysics. You will study units and converting between units, te trigonometric relationsips

More information

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note

More information

Exercises for numerical differentiation. Øyvind Ryan

Exercises for numerical differentiation. Øyvind Ryan Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can

More information

Average Rate of Change

Average Rate of Change Te Derivative Tis can be tougt of as an attempt to draw a parallel (pysically and metaporically) between a line and a curve, applying te concept of slope to someting tat isn't actually straigt. Te slope

More information

HOMEWORK HELP 2 FOR MATH 151

HOMEWORK HELP 2 FOR MATH 151 HOMEWORK HELP 2 FOR MATH 151 Here we go; te second round of omework elp. If tere are oters you would like to see, let me know! 2.4, 43 and 44 At wat points are te functions f(x) and g(x) = xf(x)continuous,

More information

Impact of Lightning Strikes on National Airspace System (NAS) Outages

Impact of Lightning Strikes on National Airspace System (NAS) Outages Impact of Ligtning Strikes on National Airspace System (NAS) Outages A Statistical Approac Aurélien Vidal University of California at Berkeley NEXTOR Berkeley, CA, USA aurelien.vidal@berkeley.edu Jasenka

More information

The total error in numerical differentiation

The total error in numerical differentiation AMS 147 Computational Metods and Applications Lecture 08 Copyrigt by Hongyun Wang, UCSC Recap: Loss of accuracy due to numerical cancellation A B 3, 3 ~10 16 In calculating te difference between A and

More information

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab To appear in: Advances in Neural Information Processing Systems 9, eds. M. C. Mozer, M. I. Jordan and T. Petsce. MIT Press, 997 Bayesian Model Comparison by Monte Carlo Caining David Barber D.Barber@aston.ac.uk

More information

11.6 DIRECTIONAL DERIVATIVES AND THE GRADIENT VECTOR

11.6 DIRECTIONAL DERIVATIVES AND THE GRADIENT VECTOR SECTION 11.6 DIRECTIONAL DERIVATIVES AND THE GRADIENT VECTOR 633 wit speed v o along te same line from te opposite direction toward te source, ten te frequenc of te sound eard b te observer is were c is

More information

158 Calculus and Structures

158 Calculus and Structures 58 Calculus and Structures CHAPTER PROPERTIES OF DERIVATIVES AND DIFFERENTIATION BY THE EASY WAY. Calculus and Structures 59 Copyrigt Capter PROPERTIES OF DERIVATIVES. INTRODUCTION In te last capter you

More information

Digital Filter Structures

Digital Filter Structures Digital Filter Structures Te convolution sum description of an LTI discrete-time system can, in principle, be used to implement te system For an IIR finite-dimensional system tis approac is not practical

More information

Derivatives. By: OpenStaxCollege

Derivatives. By: OpenStaxCollege By: OpenStaxCollege Te average teen in te United States opens a refrigerator door an estimated 25 times per day. Supposedly, tis average is up from 10 years ago wen te average teenager opened a refrigerator

More information

Lab 6 Derivatives and Mutant Bacteria

Lab 6 Derivatives and Mutant Bacteria Lab 6 Derivatives and Mutant Bacteria Date: September 27, 20 Assignment Due Date: October 4, 20 Goal: In tis lab you will furter explore te concept of a derivative using R. You will use your knowledge

More information

Teaching Differentiation: A Rare Case for the Problem of the Slope of the Tangent Line

Teaching Differentiation: A Rare Case for the Problem of the Slope of the Tangent Line Teacing Differentiation: A Rare Case for te Problem of te Slope of te Tangent Line arxiv:1805.00343v1 [mat.ho] 29 Apr 2018 Roman Kvasov Department of Matematics University of Puerto Rico at Aguadilla Aguadilla,

More information

Notes on wavefunctions II: momentum wavefunctions

Notes on wavefunctions II: momentum wavefunctions Notes on wavefunctions II: momentum wavefunctions and uncertainty Te state of a particle at any time is described by a wavefunction ψ(x). Tese wavefunction must cange wit time, since we know tat particles

More information