CS294A Lecture notes
Andrew Ng


Sparse autoencoder

1 Introduction

Supervised learning is one of the most powerful tools of AI, and has led to automatic zip code recognition, speech recognition, self-driving cars, and a continually improving understanding of the human genome. Despite its significant successes, supervised learning today is still severely limited. Specifically, most applications of it still require that we manually specify the input features x given to the algorithm. Once a good feature representation is given, a supervised learning algorithm can do well. But in such domains as computer vision, audio processing, and natural language processing, there are now hundreds or perhaps thousands of researchers who have spent years of their lives slowly and laboriously hand-engineering vision, audio or text features. While much of this feature-engineering work is extremely clever, one has to wonder if we can do better. Certainly this labor-intensive hand-engineering approach does not scale well to new problems; further, ideally we would like to have algorithms that can automatically learn even better feature representations than the hand-engineered ones.

These notes describe the sparse autoencoder learning algorithm, which is one approach to automatically learn features from unlabeled data. In some domains, such as computer vision, this approach is not by itself competitive with the best hand-engineered features, but the features it can learn do turn out to be useful for a range of problems (including ones in audio, text, etc). Further, there are more sophisticated versions of the sparse autoencoder (not described in these notes, but that you will hear more about later in the class) that do surprisingly well, and in some cases are competitive with or sometimes even better than some of the hand-engineered representations.

This set of notes is organized as follows. We will first describe feedforward neural networks and the backpropagation algorithm for supervised learning. Then, we show how this is used to construct an autoencoder, which is an unsupervised learning algorithm, and finally how we can build on this to derive a sparse autoencoder. Because these notes are fairly notation-heavy, the last page also contains a summary of the symbols used.

2 Neural networks

Consider a supervised learning problem where we have access to labeled training examples (x^{(i)}, y^{(i)}). Neural networks give a way of defining a complex, non-linear form of hypotheses h_{W,b}(x), with parameters W, b that we can fit to our data.

To describe neural networks, we use the following diagram to denote a single "neuron":

[Figure: a single neuron with inputs x_1, x_2, x_3, a +1 intercept term, and output h_{W,b}(x)]

This neuron is a computational unit that takes as input x_1, x_2, x_3 (and a +1 intercept term), and outputs h_{W,b}(x) = f(W^T x) = f(Σ_{i=1}^3 W_i x_i + b), where f : R → R is called the activation function. One possible choice for f(·) is the sigmoid function f(z) = 1/(1 + exp(-z)); in that case, our single neuron corresponds exactly to the input-output mapping defined by logistic regression. In these notes, however, we will use a different activation function, the hyperbolic tangent, or tanh, function:

    f(z) = tanh(z) = (e^z - e^{-z}) / (e^z + e^{-z}).      (1)

Here is a plot of the tanh(z) function:

[Figure: plot of tanh(z)]
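To make the single-neuron computation concrete, here is a minimal sketch in Python/NumPy. It is not part of the original notes, and the function name is purely illustrative; it computes the neuron's output for the tanh activation and checks Equation (1) against NumPy's built-in tanh.

    import numpy as np

    def neuron_output(W, b, x):
        # Single neuron: h_{W,b}(x) = f(W^T x + b), with f = tanh.
        return np.tanh(W @ x + b)

    # Check Equation (1): tanh(z) = (e^z - e^{-z}) / (e^z + e^{-z}).
    z = np.linspace(-2.0, 2.0, 9)
    eq1 = (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))
    print(np.max(np.abs(eq1 - np.tanh(z))))   # on the order of 1e-16

    print(neuron_output(np.array([0.1, -0.2, 0.3]), 0.05, np.array([1.0, 2.0, 3.0])))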

The tanh(z) function is a rescaled version of the sigmoid, and its output range is [-1, 1] instead of [0, 1]. Our description of neural networks will use this activation function.

Note that unlike CS221 and (parts of) CS229, we are not using the convention here of x_0 = 1. Instead, the intercept term is handled separately by the parameter b.

Finally, one identity that will be useful later: If f(z) = tanh(z), then its derivative is given by f'(z) = 1 - (f(z))^2. (Derive this yourself using the definition of tanh(z) given in Equation 1.)

2.1 Neural network formulation

A neural network is put together by hooking together many of our simple neurons, so that the output of a neuron can be the input of another. For example, here is a small neural network:

[Figure: a three-layer network with 3 input units, 3 hidden units, 1 output unit, and +1 bias units in the first two layers]

In this figure, we have used circles to also denote the inputs to the network. The circles labeled "+1" are called bias units, and correspond to the intercept term. The leftmost layer of the network is called the input layer, and the rightmost layer the output layer (which, in this example, has only one node). The middle layer of nodes is called the hidden layer, because its values are not observed in the training set. We also say that our example neural network has 3 input units (not counting the bias unit), 3 hidden units, and 1 output unit.

We will let n_l denote the number of layers in our network; thus n_l = 3 in our example. We label layer l as L_l, so layer L_1 is the input layer, and layer L_{n_l} the output layer. Our neural network has parameters (W, b) = (W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}), where we write W^{(l)}_{ij} to denote the parameter (or weight) associated with the connection between unit j in layer l, and unit i in layer l+1. (Note the order of the indices.) Also, b^{(l)}_i is the bias associated with unit i in layer l+1. Thus, in our example, we have W^{(1)} ∈ R^{3×3}, and W^{(2)} ∈ R^{1×3}. Note that bias units don't have inputs or connections going into them, since they always output the value +1. We also let s_l denote the number of nodes in layer l (not counting the bias unit).

We will write a^{(l)}_i to denote the activation (meaning output value) of unit i in layer l. For l = 1, we also use a^{(1)}_i = x_i to denote the i-th input. Given a fixed setting of the parameters W, b, our neural network defines a hypothesis h_{W,b}(x) that outputs a real number. Specifically, the computation that this neural network represents is given by:

    a^{(2)}_1 = f(W^{(1)}_{11} x_1 + W^{(1)}_{12} x_2 + W^{(1)}_{13} x_3 + b^{(1)}_1)      (2)
    a^{(2)}_2 = f(W^{(1)}_{21} x_1 + W^{(1)}_{22} x_2 + W^{(1)}_{23} x_3 + b^{(1)}_2)      (3)
    a^{(2)}_3 = f(W^{(1)}_{31} x_1 + W^{(1)}_{32} x_2 + W^{(1)}_{33} x_3 + b^{(1)}_3)      (4)
    h_{W,b}(x) = a^{(3)}_1 = f(W^{(2)}_{11} a^{(2)}_1 + W^{(2)}_{12} a^{(2)}_2 + W^{(2)}_{13} a^{(2)}_3 + b^{(2)}_1)      (5)

In the sequel, we also let z^{(l)}_i denote the total weighted sum of inputs to unit i in layer l, including the bias term (e.g., z^{(2)}_i = Σ_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i), so that a^{(l)}_i = f(z^{(l)}_i).

Note that this easily lends itself to a more compact notation. Specifically, if we extend the activation function f(·) to apply to vectors in an elementwise fashion (i.e., f([z_1, z_2, z_3]) = [tanh(z_1), tanh(z_2), tanh(z_3)]), then we can

write Equations (2-5) more compactly as:

    z^{(2)} = W^{(1)} x + b^{(1)}
    a^{(2)} = f(z^{(2)})
    z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}
    h_{W,b}(x) = a^{(3)} = f(z^{(3)})

More generally, recalling that we also use a^{(1)} = x to denote the values from the input layer, then given layer l's activations a^{(l)}, we can compute layer l+1's activations a^{(l+1)} as:

    z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}      (6)
    a^{(l+1)} = f(z^{(l+1)})      (7)

By organizing our parameters in matrices and using matrix-vector operations, we can take advantage of fast linear algebra routines to quickly perform calculations in our network. (See the short code sketch below.)

We have so far focused on one example neural network, but one can also build neural networks with other architectures (meaning patterns of connectivity between neurons), including ones with multiple hidden layers. The most common choice is an n_l-layered network where layer 1 is the input layer, layer n_l is the output layer, and each layer l is densely connected to layer l+1. In this setting, to compute the output of the network, we can successively compute all the activations in layer L_2, then layer L_3, and so on, up to layer L_{n_l}, using Equations (6-7). This is one example of a feedforward neural network, since the connectivity graph does not have any directed loops or cycles.

Neural networks can also have multiple output units. For example, here is a network with two hidden layers (layers L_2 and L_3) and two output units in layer L_4:

[Figure: a four-layer network with two hidden layers and two output units]
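As an illustration, Equations (6-7) translate almost directly into code. The following is a minimal sketch in Python/NumPy, not code from the notes; the variable and function names are purely illustrative.

    import numpy as np

    def forward_pass(weights, biases, x):
        # weights[l], biases[l] hold W^{(l+1)}, b^{(l+1)} in the notes' notation
        # (Python lists are 0-indexed while layers are numbered from 1).
        activations = [x]                      # a^{(1)} = x
        zs = []                                # z^{(2)}, ..., z^{(n_l)}
        for W, b in zip(weights, biases):
            z = W @ activations[-1] + b        # Equation (6)
            zs.append(z)
            activations.append(np.tanh(z))     # Equation (7), with f = tanh
        return activations, zs

    # Example: the 3-3-1 network from the earlier figure, with small random parameters.
    rng = np.random.default_rng(0)
    weights = [rng.normal(0.0, 0.01, (3, 3)), rng.normal(0.0, 0.01, (1, 3))]
    biases = [np.zeros(3), np.zeros(1)]
    a, z = forward_pass(weights, biases, np.array([1.0, 2.0, 3.0]))
    print(a[-1])   # h_{W,b}(x) = a^{(3)}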

To train this network, we would need training examples (x^{(i)}, y^{(i)}) where y^{(i)} ∈ R^2. This sort of network is useful if there are multiple outputs that you are interested in predicting. (For example, in a medical diagnosis application, the vector x might give the input features of a patient, and the different outputs y_i's might indicate presence or absence of different diseases.)

2.2 Backpropagation algorithm

We will train our neural network using stochastic gradient descent. For much of CS221 and CS229, we considered a setting in which we have a fixed training set {(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})}, and we ran either batch or stochastic gradient descent on that fixed training set. In these notes, we will take an online learning view, in which we imagine that our algorithm has access to an unending sequence of training examples {(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), (x^{(3)}, y^{(3)}), ...}. In practice, if we have only a finite training set, then we can form such a sequence by repeatedly visiting our fixed training set, so that the examples in the sequence will repeat. But even in this case, the online learning view will make some of our algorithms easier to describe.

In this setting, stochastic gradient descent will proceed as follows:

    For i = 1, 2, 3, ...
        Get next training example (x^{(i)}, y^{(i)}).
        Update
            W^{(l)}_{jk} := W^{(l)}_{jk} - α ∂/∂W^{(l)}_{jk} J(W, b; x^{(i)}, y^{(i)})
            b^{(l)}_j := b^{(l)}_j - α ∂/∂b^{(l)}_j J(W, b; x^{(i)}, y^{(i)})

Here, α is the learning rate parameter, and J(W, b) = J(W, b; x, y) is a cost function defined with respect to a single training example. (When there is no risk of ambiguity, we drop the dependence of J on the training example x, y, and simply write J(W, b).) If the training examples are drawn IID from some training distribution D, we can think of this algorithm as trying to minimize E_{(x,y)∼D}[J(W, b; x, y)]. Alternatively, if our sequence of examples is obtained by repeating some fixed, finite training set {(x^{(1)}, y^{(1)}), ..., (x^{(m)}, y^{(m)})}, then this algorithm is standard stochastic gradient descent for minimizing

    (1/m) Σ_{i=1}^m J(W, b; x^{(i)}, y^{(i)}).
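As a rough illustration (a sketch, not code from the notes), the online stochastic gradient descent loop above might look as follows in Python; compute_gradients is an assumed helper standing in for backpropagation, which is described below.

    def sgd_train(weights, biases, example_stream, alpha, compute_gradients):
        # example_stream yields an unending sequence of (x, y) pairs; for a
        # finite training set this can simply cycle through it repeatedly.
        for x, y in example_stream:
            grad_W, grad_b = compute_gradients(weights, biases, x, y)
            for l in range(len(weights)):
                weights[l] -= alpha * grad_W[l]   # W^{(l)} := W^{(l)} - alpha * dJ/dW^{(l)}
                biases[l] -= alpha * grad_b[l]    # b^{(l)} := b^{(l)} - alpha * dJ/db^{(l)}
        return weights, biases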

To train our neural network, we will use the cost function:

    J(W, b; x, y) = (1/2) ||h_{W,b}(x) - y||^2 + (λ/2) Σ_{l=1}^{n_l-1} Σ_{i=1}^{s_l} Σ_{j=1}^{s_{l+1}} (W^{(l)}_{ji})^2

The first term is a sum-of-squares error term; the second is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights, and helps prevent overfitting.[1] The weight decay parameter λ controls the relative importance of the two terms.

This cost function is often used both for classification and for regression problems. For classification, we let y = +1 or -1 represent the two class labels (recall that the tanh(z) activation function outputs values in [-1, 1], so we use +1/-1 valued outputs instead of 0/1). For regression problems, we first scale our outputs to ensure that they lie in the [-1, 1] range.

Our goal is to minimize E_{(x,y)}[J(W, b; x, y)] as a function of W and b. To train our neural network, we will initialize each parameter W^{(l)}_{ij} and each b^{(l)}_i to a small random value near zero (say according to a N(0, ε^2) distribution for some small ε, say 0.01), and then apply stochastic gradient descent. Since J(W, b; x, y) is a non-convex function, gradient descent is susceptible to local optima; however, in practice gradient descent usually works fairly well. Also, in neural network training, stochastic gradient descent is almost always used rather than batch gradient descent.

Finally, note that it is important to initialize the parameters randomly, rather than to all 0's. If all the parameters start off at identical values, then all the hidden layer units will end up learning the same function of the input (more formally, W^{(1)}_{ij} will be the same for all values of i, so that a^{(2)}_1 = a^{(2)}_2 = ... for any input x). The random initialization serves the purpose of symmetry breaking.

We now describe the backpropagation algorithm, which gives an efficient way to compute the partial derivatives we need in order to perform stochastic gradient descent. The intuition behind the algorithm is as follows. Given a training example (x, y), we will first run a "forward pass" to compute all the activations throughout the network, including the output value of the hypothesis h_{W,b}(x).

[1] Usually weight decay is not applied to the bias terms b^{(l)}_i, as reflected in our definition for J(W, b; x, y). Applying weight decay to the bias units usually makes only a small difference to the final network, however. If you took CS229, you may also recognize weight decay as essentially a variant of the Bayesian regularization method you saw there, where we placed a Gaussian prior on the parameters and did MAP (instead of maximum likelihood) estimation.

Then, for each node i in layer l, we would like to compute an "error term" δ^{(l)}_i that measures how much that node was "responsible" for any errors in our output. For an output node, we can directly measure the difference between the network's activation and the true target value, and use that to define δ^{(n_l)}_i (where layer n_l is the output layer). How about hidden units? For those, we will compute δ^{(l)}_i based on a weighted average of the error terms of the nodes that use a^{(l)}_i as an input.

In detail, here is the backpropagation algorithm:

1. Perform a feedforward pass, computing the activations for layers L_2, L_3, and so on up to the output layer L_{n_l}.

2. For each output unit i in layer n_l (the output layer), set

       δ^{(n_l)}_i = ∂/∂z^{(n_l)}_i (1/2) ||y - h_{W,b}(x)||^2 = -(y_i - a^{(n_l)}_i) · f'(z^{(n_l)}_i)

3. For l = n_l - 1, n_l - 2, n_l - 3, ..., 2
   For each node i in layer l, set

       δ^{(l)}_i = (Σ_{j=1}^{s_{l+1}} W^{(l)}_{ji} δ^{(l+1)}_j) f'(z^{(l)}_i)

4. Update each weight W^{(l)}_{ij} and b^{(l)}_i according to:

       W^{(l)}_{ij} := W^{(l)}_{ij} - α (a^{(l)}_j δ^{(l+1)}_i + λ W^{(l)}_{ij})
       b^{(l)}_i := b^{(l)}_i - α δ^{(l+1)}_i.

Although we have not proved it here, it turns out that

       ∂/∂W^{(l)}_{ij} J(W, b; x, y) = a^{(l)}_j δ^{(l+1)}_i + λ W^{(l)}_{ij},
       ∂/∂b^{(l)}_i J(W, b; x, y) = δ^{(l+1)}_i.

Thus, this algorithm is exactly implementing stochastic gradient descent.

Finally, we can also re-write the algorithm using matrix-vectorial notation. We will use "•" to denote the element-wise product operator (denoted ".*" in Matlab or Octave, and also called the Hadamard product), so that if

a = b • c, then a_i = b_i c_i. Similar to how we extended the definition of f(·) to apply element-wise to vectors, we also do the same for f'(·) (so that f'([z_1, z_2, z_3]) = [∂/∂z_1 tanh(z_1), ∂/∂z_2 tanh(z_2), ∂/∂z_3 tanh(z_3)]). The algorithm can then be written:

1. Perform a feedforward pass, computing the activations for layers L_2, L_3, up to the output layer L_{n_l}, using Equations (6-7).

2. For the output layer (layer n_l), set

       δ^{(n_l)} = -(y - a^{(n_l)}) • f'(z^{(n_l)})

3. For l = n_l - 1, n_l - 2, n_l - 3, ..., 2
   Set

       δ^{(l)} = ((W^{(l)})^T δ^{(l+1)}) • f'(z^{(l)})

4. Update the parameters according to:

       W^{(l)} := W^{(l)} - α (δ^{(l+1)} (a^{(l)})^T + λ W^{(l)})
       b^{(l)} := b^{(l)} - α δ^{(l+1)}.

Implementation note 1: In steps 2 and 3 above, we need to compute f'(z^{(l)}_i) for each value of i. Assuming f(z) is the tanh activation function, we would already have a^{(l)}_i stored away from the forward pass through the network. Thus, using the expression that we worked out earlier for f'(z), we can compute this as f'(z^{(l)}_i) = 1 - (a^{(l)}_i)^2.

Implementation note 2: Backpropagation is a notoriously difficult algorithm to debug and get right, especially since many subtly buggy implementations of it (for example, one that has an off-by-one error in the indices and that thus only trains some of the layers of weights, or an implementation that omits the bias term, etc.) will manage to learn something that can look surprisingly reasonable (while performing less well than a correct implementation). Thus, even with a buggy implementation, it may not at all be apparent that anything is amiss. So, when implementing backpropagation, do read and re-read your code to check it carefully. Some people also numerically check their computation of the derivatives; if you know how to do this, it is worth considering too. (Feel free to ask us if you want to learn more about this.)
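For concreteness, here is a sketch in Python/NumPy of the single-example cost J(W, b; x, y) with weight decay, and of the random initialization described earlier in this section. This is illustrative code, not code from the notes; it reuses the forward_pass helper sketched in Section 2.1, and all names are assumptions.

    import numpy as np

    def cost(weights, biases, x, y, lam):
        # J(W,b;x,y) = 1/2 ||h_{W,b}(x) - y||^2 + (lambda/2) * sum of squared weights.
        # Weight decay is not applied to the bias terms.
        activations, _ = forward_pass(weights, biases, x)
        squared_error = 0.5 * np.sum((activations[-1] - y) ** 2)
        weight_decay = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)
        return squared_error + weight_decay

    def init_params(layer_sizes, eps=0.01, seed=0):
        # Initialize each W^{(l)}_{ij} and b^{(l)}_i to small random values drawn from
        # N(0, eps^2); the random initialization breaks symmetry between hidden units.
        rng = np.random.default_rng(seed)
        weights = [rng.normal(0.0, eps, (layer_sizes[l + 1], layer_sizes[l]))
                   for l in range(len(layer_sizes) - 1)]
        biases = [rng.normal(0.0, eps, layer_sizes[l + 1])
                  for l in range(len(layer_sizes) - 1)]
        return weights, biases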
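Similarly, the matrix-vector form of backpropagation above, together with the kind of numerical derivative check mentioned in Implementation note 2, might be sketched as follows (again illustrative only, reusing the forward_pass and cost helpers from the sketches above).

    import numpy as np

    def backprop(weights, biases, x, y, lam):
        # Forward pass (Equations 6-7), storing a^{(1)}, ..., a^{(n_l)}.
        activations, _ = forward_pass(weights, biases, x)
        L = len(weights)                       # number of weight matrices = n_l - 1
        grad_W = [None] * L
        grad_b = [None] * L
        # Output layer: delta^{(n_l)} = -(y - a^{(n_l)}) .* f'(z^{(n_l)}),
        # using f'(z) = 1 - a^2 for the tanh activation (Implementation note 1).
        delta = -(y - activations[-1]) * (1.0 - activations[-1] ** 2)
        for l in range(L - 1, -1, -1):
            # dJ/dW^{(l)} = delta^{(l+1)} (a^{(l)})^T + lambda W^{(l)};  dJ/db^{(l)} = delta^{(l+1)}
            grad_W[l] = np.outer(delta, activations[l]) + lam * weights[l]
            grad_b[l] = delta
            if l > 0:
                # delta^{(l)} = ((W^{(l)})^T delta^{(l+1)}) .* f'(z^{(l)})
                delta = (weights[l].T @ delta) * (1.0 - activations[l] ** 2)
        return grad_W, grad_b

    def check_one_derivative(weights, biases, x, y, lam, eps=1e-4):
        # Compare dJ/dW^{(1)}_{11} from backprop against a centered finite difference.
        grad_W, _ = backprop(weights, biases, x, y, lam)
        W_plus = [W.copy() for W in weights]
        W_plus[0][0, 0] += eps
        W_minus = [W.copy() for W in weights]
        W_minus[0][0, 0] -= eps
        numeric = (cost(W_plus, biases, x, y, lam)
                   - cost(W_minus, biases, x, y, lam)) / (2 * eps)
        return abs(numeric - grad_W[0][0, 0])   # should be very small (e.g. < 1e-8)

In practice one would check every entry of every gradient in this way before trusting the implementation.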

3 Autoencoders and sparsity

So far, we have described the application of neural networks to supervised learning, in which we have labeled training examples. Now suppose we have only a set of unlabeled training examples {x^{(1)}, x^{(2)}, x^{(3)}, ...}, where x^{(i)} ∈ R^n. An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs. I.e., it uses y^{(i)} = x^{(i)}. Here is an autoencoder:

[Figure: an autoencoder network whose output layer has the same number of units as its input layer]

The autoencoder tries to learn a function h_{W,b}(x) ≈ x. In other words, it is trying to learn an approximation to the identity function, so as to output x̂ that is similar to x. The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure about the data. As a concrete example, suppose the inputs x are the pixel intensity values from a 10×10 image (100 pixels), so n = 100, and there are s_2 = 50 hidden units in layer L_2. Note that we also have y ∈ R^{100}. Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input. I.e., given only the vector of hidden unit activations a^{(2)} ∈ R^{50}, it must try to reconstruct the 100-pixel

input x. If the input were completely random, say, each x_i comes from an IID Gaussian independent of the other features, then this compression task would be very difficult. But if there is structure in the data, for example, if some of the input features are correlated, then this algorithm will be able to discover some of those correlations.[2]

Our argument above relied on the number of hidden units s_2 being small. But even when the number of hidden units is large (perhaps even greater than the number of input pixels), we can still discover interesting structure, by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data, even if the number of hidden units is large.

Informally, we will think of a neuron as being "active" (or as "firing") if its output value is close to 1, or as being "inactive" if its output value is close to -1. We would like to constrain the neurons to be inactive most of the time.[3] We will do this in an online learning fashion.

More formally, we again imagine that our algorithm has access to an unending sequence of training examples {x^{(1)}, x^{(2)}, x^{(3)}, ...} drawn IID from some distribution D. Also, let a^{(2)}_i as usual denote the activation of hidden unit i in the autoencoder. We would like to (approximately) enforce the constraint that

    E_{x∼D}[a^{(2)}_i] = ρ,

where ρ is our sparsity parameter, typically a value slightly above -1.0 (say, ρ ≈ -0.9). In other words, we would like the expected activation of each hidden neuron i to be close to -0.9 (say). To satisfy this expectation constraint, the hidden unit i's activations must mostly be near -1.

Our algorithm for (approximately) enforcing the expectation constraint will have two major components: First, for each hidden unit i, we will keep a running estimate of E_{x∼D}[a^{(2)}_i]. Second, after each iteration of stochastic gradient descent, we will slowly adjust that unit's parameters to make this expected value closer to ρ.

[2] In fact, this simple autoencoder often ends up learning a low-dimensional representation very similar to PCA's.

[3] The term "sparsity" comes from an alternative formulation of these ideas using networks with a sigmoid activation function f, so that the activations are between 0 and 1 (rather than -1 and 1). In this case, "sparsity" refers to most of the activations being near 0.

In each iteration of gradient descent, when we see each training input x we will compute the hidden units' activations a^{(2)}_i for each i. We will keep a running estimate ρ̂_i of E_{x∼D}[a^{(2)}_i] by updating:

    ρ̂_i := 0.999 ρ̂_i + 0.001 a^{(2)}_i.

(Or, in vector notation, ρ̂ := 0.999 ρ̂ + 0.001 a^{(2)}.) Here, the "0.999" (and "0.001") is a parameter of the algorithm, and there is a wide range of values that will work fine. This particular choice causes ρ̂_i to be an exponentially-decayed weighted average of about the last 1000 observed values of a^{(2)}_i. Our running estimates ρ̂_i's can be initialized to 0 at the start of the algorithm.

The second part of the algorithm modifies the parameters so as to try to satisfy the expectation constraint. If ρ̂_i > ρ, then we would like hidden unit i to become less active, or equivalently, for its activations to become closer to -1. Recall that unit i's activation is

    a^{(2)}_i = f(Σ_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i),      (8)

where b^{(1)}_i is the bias term. Thus, we can make unit i less active by decreasing b^{(1)}_i. Similarly, if ρ̂_i < ρ, then we would like unit i's activations to become larger, which we can do by increasing b^{(1)}_i. Finally, the further ρ̂_i is from ρ, the more aggressively we might want to decrease or increase b^{(1)}_i so as to drive the expectation towards ρ. Concretely, we can use the following learning rule:

    b^{(1)}_i := b^{(1)}_i - αβ(ρ̂_i - ρ)      (9)

where β is an additional learning rate parameter.

To summarize, in order to learn a sparse autoencoder using online learning, upon getting an example x, we will (i) Run a forward pass on our network on input x, to compute all units' activations; (ii) Perform one step of stochastic gradient descent using backpropagation; (iii) Perform the updates given in Equations (8-9).
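Here is a minimal sketch (not from the notes) of one online sparse-autoencoder update, combining steps (i)-(iii) above; it assumes the forward_pass and backprop helpers sketched in Section 2, and the constants follow the text.

    def sparse_autoencoder_step(weights, biases, x, rho_hat,
                                alpha, beta, lam, rho=-0.9):
        # (i) Forward pass on input x; for an autoencoder the target is y = x.
        activations, _ = forward_pass(weights, biases, x)
        # (ii) One step of stochastic gradient descent using backpropagation.
        grad_W, grad_b = backprop(weights, biases, x, x, lam)
        for l in range(len(weights)):
            weights[l] -= alpha * grad_W[l]
            biases[l] -= alpha * grad_b[l]
        # (iii) Update the running estimate rho_hat of E[a^{(2)}] and nudge the
        # hidden-layer biases b^{(1)} toward the sparsity target rho (Equations 8-9).
        rho_hat = 0.999 * rho_hat + 0.001 * activations[1]
        biases[0] -= alpha * beta * (rho_hat - rho)
        return rho_hat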

4 Visualization

Having trained a (sparse) autoencoder, we would now like to visualize the function learned by the algorithm, to try to understand what it has learned. Consider the case of training an autoencoder on 10×10 images, so that n = 100. Each hidden unit i computes a function of the input:

    a^{(2)}_i = f(Σ_{j=1}^{100} W^{(1)}_{ij} x_j + b^{(1)}_i).

We will visualize the function computed by hidden unit i, which depends on the parameters W^{(1)}_{ij} (ignoring the bias term for now), using a 2D image. In particular, we think of a^{(2)}_i as some non-linear feature of the input x. We ask: What input image x would cause a^{(2)}_i to be maximally activated? For this question to have a non-trivial answer, we must impose some constraints on x. If we suppose that the input is norm constrained by ||x||^2 = Σ_{i=1}^{100} x_i^2 ≤ 1, then one can show (try doing this yourself) that the input which maximally activates hidden unit i is given by setting pixel x_j (for all 100 pixels, j = 1, ..., 100) to

    x_j = W^{(1)}_{ij} / sqrt(Σ_{j=1}^{100} (W^{(1)}_{ij})^2).

By displaying the image formed by these pixel intensity values, we can begin to understand what feature hidden unit i is looking for.

If we have an autoencoder with 100 hidden units (say), then our visualization will have 100 such images, one per hidden unit. By examining these 100 images, we can try to understand what the ensemble of hidden units is learning.

When we do this for a sparse autoencoder (trained with 100 hidden units on 10x10 pixel inputs[4]) we get the following result:

[4] The results below were obtained by training on whitened natural images. Whitening is a preprocessing step which removes redundancy in the input, by causing adjacent pixels to become less correlated.

[Figure: a grid of 100 images, one per hidden unit, each showing the norm-bounded input that maximally activates that unit]

Each square in the figure above shows the (norm bounded) input image x that maximally activates one of 100 hidden units. We see that the different hidden units have learned to detect edges at different positions and orientations in the image. These features are, not surprisingly, useful for such tasks as object recognition and other vision tasks. When applied to other input domains (such as audio), this algorithm also learns useful representations/features for those domains too.
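The norm-bounded maximizing input defined above is simple to compute. Here is an illustrative sketch (not code from the notes) that builds one visualization image per hidden unit from the first-layer weight matrix W^{(1)}.

    import numpy as np

    def visualize_hidden_units(W1):
        # W1 has shape (hidden_units, n_pixels); row i holds W^{(1)}_{ij} for j = 1..n.
        # The norm-bounded input that maximally activates hidden unit i is
        #   x_j = W^{(1)}_{ij} / sqrt(sum_j (W^{(1)}_{ij})^2).
        norms = np.sqrt(np.sum(W1 ** 2, axis=1, keepdims=True))
        images = W1 / norms
        side = int(np.sqrt(W1.shape[1]))       # e.g. 10 for 100-pixel inputs
        return images.reshape(-1, side, side)  # one (side x side) image per hidden unit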

5 Summary of notation

x: Input features for a training example, x ∈ R^n.
y: Output/target values. Here, y can be vector valued. In the case of an autoencoder, y = x.
(x^{(i)}, y^{(i)}): The i-th training example.
h_{W,b}(x): Output of our hypothesis on input x, using parameters W, b. This should be a vector of the same dimension as the target value y.
W^{(l)}_{ij}: The parameter associated with the connection between unit j in layer l, and unit i in layer l+1.
b^{(l)}_i: The bias term associated with unit i in layer l+1. Can also be thought of as the parameter associated with the connection between the bias unit in layer l and unit i in layer l+1.
a^{(l)}_i: Activation (output) of unit i in layer l of the network. In addition, since layer L_1 is the input layer, we also have a^{(1)}_i = x_i.
f(·): The activation function. Throughout these notes, we used f(z) = tanh(z).
z^{(l)}_i: Total weighted sum of inputs to unit i in layer l. Thus, a^{(l)}_i = f(z^{(l)}_i).
α: Learning rate parameter.
s_l: Number of units in layer l (not counting the bias unit).
n_l: Number of layers in the network. Layer L_1 is usually the input layer, and layer L_{n_l} the output layer.
λ: Weight decay parameter.
x̂: For an autoencoder, its output; i.e., its reconstruction of the input x. Same meaning as h_{W,b}(x).
ρ: Sparsity parameter, which specifies our desired level of sparsity.
ρ̂_i: Our running estimate of the expected activation of unit i (in the sparse autoencoder).
β: Learning rate parameter for the algorithm trying to (approximately) satisfy the sparsity constraint.
