Probabilistic Learning


Statistical Machine Learning Notes 11
Instructor: Justin Domke

Probabilistic Learning

Contents

1 Introduction
2 Maximum Likelihood
3 Examples of Maximum Likelihood
  3.1 Binomial
  3.2 Uniform Distribution
  3.3 Univariate Gaussian
  3.4 Multivariate Gaussian
  3.5 Spherical Multivariate Gaussian
4 Properties of Maximum Likelihood
  4.1 Maximum Likelihood is Consistent
  4.2 Maximum Likelihood is Equivariant
  4.3 Maximum Likelihood is Efficient
  4.4 Maximum Likelihood Assumes a Whole Lot
  4.5 Maximum Likelihood is Empirical Risk Minimization of the KL-divergence
5 Bayesian Methods

1 Introduction

Almost all of our methods for learning have been based on the notions of risk and loss. We have worked by picking some class of functions f(x) mapping from inputs to outputs. We quantified how we wanted that function to behave in terms of the true risk

    R_true(f) = E_{p_0}[ L(f(x), y) ] = ∫∫ p_0(x, y) L(f(x), y) dx dy,    (1.1)

where p_0 is the true (unknown) distribution. Then we approximated this by an empirical risk, fit the function f, and we were done.

Gazing at Eq. 1.1, however, another possible strategy comes to mind. Namely, why don't we approximate p_0 with some function p? Then, for a specific input x, we can pick the best guess y' by solving

    min_{y'} ∫ p(x, y) L(y', y) dy.

Then we could do everything exactly! The fundamental question here is: when should we apply the loss function? In the traditional strategy, we apply it at training time: the predictor f(x) is fit to give the best possible performance, with the loss baked in. Now, we apply the loss function only at test time. Notice that we could even change loss functions on the fly. We could also flip things around: if we suddenly decide we would rather predict x from y, we can do that too.

This may seem very attractive. As we will see, however, there is a price to be paid for this generality. We postpone discussion of the tradeoffs until later. The immediate question is more basic: how should we fit p?

(Note: when fitting p(x, y), x and y are on an even footing. Thus, for simplicity, we will usually write the variables together as a single vector x.)
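To make the "loss only at test time" idea concrete, here is a minimal sketch (my own illustration, not part of the notes; the joint distribution and the losses are made-up numbers). Given a fitted p(x, y) over a small discrete space, the best guess for a given x is whichever y' minimizes the expected loss, and the same fitted p supports different losses at prediction time:

    import numpy as np

    # Hypothetical fitted joint distribution p(x, y) over x in {0, 1, 2} and y in {0, 1}.
    p = np.array([[0.10, 0.05],
                  [0.20, 0.25],
                  [0.15, 0.25]])

    def best_guess(x, loss):
        # Pick y' minimizing sum_y p(x, y) * loss(y', y).
        expected = [sum(p[x, y] * loss(yp, y) for y in range(p.shape[1]))
                    for yp in range(p.shape[1])]
        return int(np.argmin(expected))

    zero_one = lambda yp, y: float(yp != y)
    # A made-up asymmetric loss: predicting 1 when the truth is 0 is five times worse.
    asymmetric = lambda yp, y: 5.0 * (yp == 1 and y == 0) + 1.0 * (yp == 0 and y == 1)

    print(best_guess(x=2, loss=zero_one))     # 1: the more probable y given x = 2
    print(best_guess(x=2, loss=asymmetric))   # 0: the cautious choice under this loss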

2 Maximum Likelihood

There have been many methods proposed for fitting distributions. In this class, we will focus on the maximum likelihood method. Suppose that we are fitting a distribution p(x; θ), parametrized by some vector θ. Where does this distribution come from? You pick it. How do you pick it? We will come back to that!

Let the data be a set of vectors {x̂}. The log-likelihood is

    l(θ) = Σ_x̂ log p(x̂; θ).

The maximum likelihood method, surprisingly enough, consists of picking θ to maximize the likelihood:

    θ^* = arg max_θ l(θ).

This method has some nice properties, but before worrying about them, let's try some examples. You may have seen these before in a statistics class.

3 Examples of Maximum Likelihood

3.1 Binomial

A binomial distribution is a distribution over a binary variable x ∈ {0, 1}, given by

    p(x; θ) = θ^x (1 − θ)^(1 − x).

Given some training data, we can calculate

    l(θ) = Σ_x̂ log p(x̂; θ) = Σ_x̂ ( x̂ log θ + (1 − x̂) log(1 − θ) ).

Now, we can maximize this by setting the derivative with respect to θ to zero. We have

    dl/dθ = 0 = Σ_x̂ ( x̂ (1/θ) − (1 − x̂) (1/(1 − θ)) ) = #[x̂ = 1] (1/θ) − #[x̂ = 0] (1/(1 − θ)),

where #[x̂ = 1] is the number of points in the training data with x̂ = 1. This equation is solved by

    θ = #[x̂ = 1] / ( #[x̂ = 1] + #[x̂ = 0] ).

Thus, the maximum likelihood estimate is that the binomial distribution has the same probability of being 1 as in the training data. This is quite intuitive.
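This closed-form result is easy to sanity-check numerically. A small sketch (mine, with toy data) comparing the closed form against a grid search over the log-likelihood:

    import numpy as np

    x = np.array([1, 0, 1, 1, 0, 1, 0, 1])           # toy binary data: five 1s, three 0s

    theta_closed = x.mean()                           # #[x=1] / (#[x=1] + #[x=0]) = 0.625

    thetas = np.linspace(0.001, 0.999, 999)
    loglik = [np.sum(x * np.log(t) + (1 - x) * np.log(1 - t)) for t in thetas]
    theta_grid = thetas[np.argmax(loglik)]

    print(theta_closed, theta_grid)                   # both 0.625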

3.2 Uniform Distribution

Consider the distribution uniform on 0 to θ,

    p(x; θ) = (1/θ) I[0 ≤ x ≤ θ].

The log-likelihood is

    l(θ) = Σ_x̂ log p(x̂; θ).

If θ is less than some value x̂, then the probability of that point is zero, and we can think of the log-likelihood as being −∞. On the other hand, suppose that θ ≥ x̂ for all x̂. Then we have that

    l(θ) = Σ_x̂ log(1/θ) = −Σ_x̂ log θ.

This is decreasing in θ, so the likelihood is maximized by making θ as small as possible while still covering all of the data, i.e. by setting θ = max_x̂ x̂.
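A quick numerical sketch (again mine, with toy data assumed to come from U(0, θ)) confirming that the largest observation maximizes the likelihood, while any smaller θ is impossible:

    import numpy as np

    x = np.array([0.3, 1.7, 0.9, 2.4, 1.1])           # toy data, assumed drawn from U(0, theta)

    theta_mle = x.max()                                # the smallest theta that covers all points

    def loglik(theta):
        # -inf whenever some point falls outside [0, theta]
        return -len(x) * np.log(theta) if np.all(x <= theta) else -np.inf

    print(theta_mle)                                   # 2.4
    print(loglik(2.4), loglik(3.0), loglik(2.3))       # 2.4 beats 3.0; 2.3 is impossible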

3.3 Univariate Gaussian

A univariate Gaussian distribution is defined by

    p(x; µ, σ^2) = (1/√(2πσ^2)) exp( −(1/2) (x − µ)^2 / σ^2 ).

Through a bunch of manipulation, we can take the logarithm of this, and then the derivatives of the logarithm:

    log p(x; µ, σ^2) = −(1/2) (x − µ)^2 / σ^2 − (1/2) log σ^2 − (1/2) log 2π

    ∂/∂µ log p(x; µ, σ^2) = (x − µ) / σ^2

    ∂/∂σ^2 log p(x; µ, σ^2) = (x − µ)^2 / (2 (σ^2)^2) − 1 / (2σ^2).

Now, we want to do maximum likelihood estimation. That is, we need to solve

    max_{µ, σ^2} l(µ, σ^2).

We can do this by solving the two equations

    ∂/∂µ l(µ, σ^2) = Σ_x̂ ∂/∂µ log p(x̂; µ, σ^2) = 0

    ∂/∂σ^2 l(µ, σ^2) = Σ_x̂ ∂/∂σ^2 log p(x̂; µ, σ^2) = 0.

From the first condition, it is easy to see that

    µ = mean_x̂ x̂.

From the second condition, we can then find

    0 = Σ_x̂ ( (x̂ − µ)^2 / (2 (σ^2)^2) − 1 / (2σ^2) ) = (1 / (2 (σ^2)^2)) Σ_x̂ ( (x̂ − µ)^2 − σ^2 )

    ⟹ σ^2 = mean_x̂ (x̂ − µ)^2.
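As a check (illustrative only, with synthetic data), the estimates are just the sample mean and the 1/n sample variance:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=3.0, size=1000)     # toy data

    mu_mle = x.mean()
    var_mle = np.mean((x - mu_mle) ** 2)              # note the 1/n, not 1/(n - 1)

    print(mu_mle, var_mle)                            # close to 2.0 and 9.0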

3.4 Multivariate Gaussian

A multivariate Gaussian is defined by

    p(x; µ, Σ) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp( −(1/2) (x − µ)^T Σ^{-1} (x − µ) )

    log p(x; µ, Σ) = −(1/2) (x − µ)^T Σ^{-1} (x − µ) − (d/2) log(2π) + (1/2) log |Σ^{-1}|.

It is unfortunate, but firmly established, to use the symbol Σ to denote the covariance matrix. It is important not to get this confused with a sum. (In these notes, the difference is indicated by the size of the symbol, as well as context.)

First off, let's calculate some properties of this distribution. It is not hard to see that, by symmetry, E[x] = µ. It can also be shown that E[(x − µ)(x − µ)^T] = Σ. Thus, it makes sense to call µ the mean and Σ the covariance matrix.

In order to calculate the maximum likelihood estimate, we will need some derivatives. Using the fact that Σ^{-1} is symmetric,

    ∂/∂µ log p(x; µ, Σ) = Σ^{-1} (x − µ).

Using the fact that ∂/∂X (a^T X a) = a a^T, and the strange but true fact that ∂/∂X log |X| = X^{-T}, and again assuming that Σ is symmetric, we have

    ∂/∂Σ^{-1} log p(x; µ, Σ) = −(1/2) ( (x − µ)(x − µ)^T − Σ ).

Now, as ever, when doing maximum likelihood estimation, our goal is to accomplish the maximization

    max_{Σ, µ} l(Σ, µ) = max_{Σ, µ} Σ_x̂ log p(x̂; µ, Σ).

Setting ∂l/∂µ = 0, we have

    Σ_x̂ ∂/∂µ log p(x̂; µ, Σ) = Σ_x̂ Σ^{-1} (x̂ − µ) = 0

    ⟹ µ = mean_x̂ x̂.

Setting ∂l/∂Σ^{-1} = 0, we have

    Σ_x̂ ∂/∂Σ^{-1} log p(x̂; µ, Σ) = Σ_x̂ ( −(1/2) ( (x̂ − µ)(x̂ − µ)^T − Σ ) ) = 0

    ⟹ Σ = mean_x̂ (x̂ − µ)(x̂ − µ)^T.

Again, this is all very intuitive. The mean is the empirical mean, and the covariance matrix is the empirical covariance matrix. However, this is not unbiased. (Recall that to estimate the variance of a scalar variable, we should use the formula (1/(n−1)) Σ_{i=1}^n (x_i − µ)^2, rather than the empirical variance (1/n) Σ_{i=1}^n (x_i − µ)^2.) So maximum likelihood will tend to slightly underestimate the variance when the number of data points is small.

3.5 Spherical Multivariate Gaussian

A spherical Gaussian is just a Gaussian distribution where we constrain the covariance matrix to take the form Σ = aI for some constant a. Using the fact that |aI| = a^d, this is

    p(x; µ, a) = (1 / ((2π)^{d/2} (a^d)^{1/2})) exp( −(1/(2a)) (x − µ)^T (x − µ) ).

This turns out to have a maximum likelihood solution of

    µ = mean_x̂ x̂

    a = (1/d) mean_x̂ (x̂ − µ)^T (x̂ − µ).
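The full-covariance and spherical estimates are equally mechanical to compute. A minimal sketch (mine, with a made-up 2-dimensional example):

    import numpy as np

    rng = np.random.default_rng(1)
    d, n = 2, 5000
    true_mu = np.array([1.0, -2.0])
    true_cov = np.array([[2.0, 0.6],
                         [0.6, 1.0]])
    X = rng.multivariate_normal(true_mu, true_cov, size=n)

    mu = X.mean(axis=0)                               # empirical mean
    diffs = X - mu
    Sigma = diffs.T @ diffs / n                       # empirical covariance (1/n)
    a = np.mean(np.sum(diffs ** 2, axis=1)) / d       # spherical: (1/d) mean ||x - mu||^2

    print(mu)                                         # near [1, -2]
    print(Sigma)                                      # near the true covariance
    print(a)                                          # near (2.0 + 1.0) / 2 = 1.5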

4 Properties of Maximum Likelihood

Here we will informally discuss some of the properties of maximum likelihood.

4.1 Maximum Likelihood is Consistent

If the data is actually being generated by a distribution p(x; θ_0), for some vector θ_0, then (absent pathological conditions) as the amount of data goes to infinity, the parameters θ recovered by maximum likelihood will converge to θ_0. This is definitely a good property, as we would probably consider any method lacking it to be, more or less, broken.

4.2 Maximum Likelihood is Equivariant

Another nice property of maximum likelihood is that it is equivariant. This just means that we can reparameterize without affecting the solution. Specifically, suppose we are considering estimating some distribution p(x; θ), and suppose the maximum likelihood estimate of θ on some dataset is θ^*. Now, we choose to instead parametrize our function by φ, which is related to θ by some (possibly nonlinear) transformation

    θ = g(φ).

Now, if we define q(x; φ) = p(x; g(φ)) and do maximum likelihood estimation of φ, we will recover φ^* such that θ^* = g(φ^*). (Proving this is quite easy.) Again, this is a reassuring property: the exact details of how we have parametrized our function don't matter. Failing to be equivariant wouldn't seem to be quite so disqualifying as failing to be consistent, but it is certainly comforting.
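Equivariance is easy to verify numerically. The sketch below (not from the notes) fits the binomial model twice, once in θ directly and once in the log-odds φ with θ = g(φ) = 1/(1 + e^{−φ}), and checks that the two maximizers describe the same distribution:

    import numpy as np

    x = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])      # toy binary data: seven 1s out of ten

    def loglik(theta):
        return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

    thetas = np.linspace(0.001, 0.999, 999)
    theta_star = thetas[np.argmax([loglik(t) for t in thetas])]

    g = lambda phi: 1.0 / (1.0 + np.exp(-phi))         # reparameterization theta = g(phi)
    phis = np.linspace(-6.0, 6.0, 20001)
    phi_star = phis[np.argmax([loglik(g(p)) for p in phis])]

    print(theta_star, g(phi_star))                     # both approximately 0.7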

4.3 Maximum Likelihood is Efficient

Perhaps the strongest argument in favor of maximum likelihood is that it is asymptotically efficient. Suppose the data is actually being generated by a distribution p(x; θ_0), for some vector θ_0. As discussed above, maximum likelihood is consistent, in the sense that it converges to θ_0. The next question is: how fast does it do that? Is there some other estimator that converges faster? Asymptotically, the answer is no. This result hinges on defining "faster" in terms of the expected squared distance between our estimate θ and the true parameters θ_0.

This follows from two results that are described here informally. (For simplicity, these are stated for a scalar parameter θ.) They make use of a quantity called the Fisher information,

    I(θ_0) = E[ ( ∂ log p(X; θ_0) / ∂θ )^2 ].

Intuitively, we can understand this as follows. Consider a landscape of different values θ, in which we seek to locate the true value θ_0. If the log-likelihood changes a lot in the region around θ_0, then we should expect the true parameters to be relatively easy to locate.

So, the two results are:

1. The Cramer-Rao bound. This states that no unbiased estimator can have a variance less than 1/(n I(θ_0)). (Technically, maximum likelihood is not unbiased, but this is good enough for our purposes, since we are looking for an asymptotic result anyway.)

2. The asymptotic normality of maximum likelihood. This shows that, as the amount of data becomes large, the estimated parameters will be distributed with variance 1/(n I(θ_0)). Specifically, they will be distributed as a Gaussian distribution with this variance, centered at θ_0.
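The 1/(n I(θ_0)) rate can be seen in a small simulation. This is my own toy setup, not from the notes: for the binomial model I(θ_0) = 1/(θ_0 (1 − θ_0)), so the MLE computed from n samples should have variance near θ_0 (1 − θ_0)/n:

    import numpy as np

    rng = np.random.default_rng(2)
    theta0, n, trials = 0.3, 200, 20000

    # The binomial MLE is just the sample mean of each simulated dataset.
    estimates = rng.binomial(1, theta0, size=(trials, n)).mean(axis=1)

    empirical_var = estimates.var()
    predicted_var = theta0 * (1 - theta0) / n          # 1 / (n I(theta0))

    print(empirical_var, predicted_var)                # both around 0.00105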

4.4 Maximum Likelihood Assumes a Whole Lot

Now, suppose the true distribution is p_0(x). Most of the above properties have hinged on the assumption that there exists a vector θ_0 such that

    p_0(x) = p(x; θ_0).

Another way of stating this is that we have a well-specified model. You might ask: how could we ever know this? The brief answer is that we probably don't. Now, we can create somewhat contrived situations where it is true. It is hard to see how a binary variable can fail to be Binomial! In general, however, making a model tends to be an educated guess of sorts.

4.5 Maximum Likelihood is Empirical Risk Minimization of the KL-divergence

All of the above discussion has depended on the assumption that the true data-generating distribution lies in our model class. As this is almost never true in practice, it might seem like maximum likelihood could almost never be used! Unfortunately, without the assumption of a well-specified model, almost all of the above properties disappear. After all, how can θ converge to θ_0 if θ_0 doesn't exist?

On the other hand, intuitively, it seems like maximum likelihood should still do something reasonable if the model is almost well-specified. That is, if there exists some vector of parameters θ such that p_0(x) ≈ p(x; θ), shouldn't maximum likelihood converge to something close to p_0? After all, in the previous case, the data could have come from parameters θ; how could maximum likelihood even know it did not?

In fact, maximum likelihood does behave reasonably in the face of minor misspecification. To understand this, we must first introduce the Kullback-Leibler divergence,

    KL(p_0 ‖ p) = ∫ p_0(x) log ( p_0(x) / p(x) ) dx.

This is a sort of divergence measure between probability distributions. Its origins come from information theory.¹ The important thing to note about the KL-divergence is that it is non-negative, and zero only when p_0 = p. Note also that it is not actually a distance measure, as it is not symmetric. Suppose that there is some region of points where p_0 is significant, but p is near zero. As the KL-divergence measures the logarithm of p_0/p, this region leads to a large divergence.

The following figures show several base distributions p_0 (shown as dotted lines). For each distribution, the Gaussian that minimizes the KL-divergence to it is computed (shown as solid lines).

¹ Where it can be thought of as measuring the expected number of bits wasted if you build a code for x assuming that the distribution is p when the actual distribution is p_0.

[Figure: several base distributions p_0(x) (dotted) together with arg min_q KL(p_0 ‖ q) over Gaussians q (solid).]

Now, consider the KL-divergence between the true distribution p_0 and the one that we fit, p:

    arg min_p KL(p_0 ‖ p) = arg min_p [ ∫ p_0(x) log p_0(x) dx − ∫ p_0(x) log p(x) dx ]
                          = arg max_p ∫ p_0(x) log p(x) dx
                          ≈ arg max_p Σ_x̂ log p(x̂).

In the third line we have made essentially an empirical approximation of the true risk, as above. The way to understand this is that if the true distribution is any of the dotted curves above and we fit a Gaussian, then, as the amount of data increases, we will recover the solid curve.
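As an illustration (my own construction, not the figure from the notes): if p_0 is a mixture of two Gaussians and we fit a single Gaussian by maximum likelihood, the fit approaches the mean and variance of p_0 itself, which is exactly the Gaussian minimizing KL(p_0 ‖ p):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100000

    # p_0: a 50/50 mixture of N(-2, 1) and N(+2, 1); the model class is a single Gaussian.
    comp = rng.integers(0, 2, size=n)
    x = rng.normal(loc=np.where(comp == 0, -2.0, 2.0), scale=1.0)

    mu_mle, var_mle = x.mean(), np.mean((x - x.mean()) ** 2)

    # The KL-minimizing Gaussian matches the mean and variance of p_0: mean 0, variance 1 + 4 = 5.
    print(mu_mle, var_mle)                             # close to 0.0 and 5.0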

5 Bayesian Methods

Suppose we have a big jar full of bent coins. We happen to know that, inside of this jar, there are 75 coins of type A that come up heads with probability 60%, and 25 coins of type B that come up heads with probability 40%. Now, we pick a coin at random out of the jar. We flip it 8 times, and observe 3 heads, followed by 5 tails. What is the probability that we have in our hands a coin of type A?

One approach to this is to apply Bayes' theorem,

    Pr(X | Y) = Pr(Y | X) Pr(X) / Pr(Y).

In our case, we want to calculate the probability that we have a coin of type A, given that we have observed 3 heads in 8 coin flips:

    Pr(A | Data) = Pr(Data | A) Pr(A) / Pr(Data)

    Pr(B | Data) = Pr(Data | B) Pr(B) / Pr(Data).

Now, in our case, we know that we have a 75% chance of grabbing a coin of type A:

    Pr(A) = .75,  Pr(B) = .25.

If we had a coin of type A, the observed sequence would have probability

    Pr(Data | A) = .6^3 × .4^5,

and similarly

    Pr(Data | B) = .4^3 × .6^5.

Thus, we have

    Pr(A | Data) = .6^3 × .4^5 × .75 / Pr(Data)

    Pr(B | Data) = .4^3 × .6^5 × .25 / Pr(Data).

Now, notice that we don't need to go through too much calculation to recover Pr(Data).

Since we know that

    Pr(A | Data) + Pr(B | Data) = 1,

we can just normalize and calculate

    Pr(A | Data) = .57,  Pr(B | Data) = .43.

Thus, there is a 57% chance we have a coin of type A.

Now, all of the above may seem quite uncontroversial. However, when we say there is a 57% chance our coin is of type A, what exactly does that mean? After all, we picked one particular coin. It is either of type A or it isn't. What probabilities exactly are we talking about here? The traditional view holds that talking about such probabilities is meaningless. On the other hand, forced to bet, wouldn't everyone choose A?² There are philosophical issues here about the meaning of probability. We won't get too deeply into these; just note that they exist, and are a part of the debate in statistics between Bayesian and traditional frequentist methods.

² Is it a contradiction to choose A and yet reject the idea of probabilities like this?
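The arithmetic above is easy to reproduce; a minimal sketch:

    # Posterior over the two coin types after seeing 3 heads and 5 tails.
    prior = {"A": 0.75, "B": 0.25}
    heads_prob = {"A": 0.6, "B": 0.4}

    def likelihood(coin, heads=3, tails=5):
        p = heads_prob[coin]
        return p ** heads * (1 - p) ** tails

    unnormalized = {c: likelihood(c) * prior[c] for c in prior}
    Z = sum(unnormalized.values())                     # this plays the role of Pr(Data)
    posterior = {c: unnormalized[c] / Z for c in unnormalized}

    print(posterior)                                   # {'A': ~0.57, 'B': ~0.43}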

Now, let's try to formalize the process that we used above and scale it up to larger problems. Instead of just two types of coins (two different binomial distributions), imagine we have a set of potential probability distributions p. Imagine also that we have, from some prior knowledge, a distribution Pr(p) over these distributions. What happens is the following:

1. Some distribution p is picked, with probability proportional to Pr(p).
2. A bunch of samples {x̂} is drawn from p.
3. We get to see {x̂}, and need to make predictions about the future.

The simplest way to approach this situation is to again apply Bayes' equation,

    Pr(p | {x̂}) = Pr({x̂} | p) Pr(p) / Pr({x̂}).

Now, it makes sense to try to recover the most probable p. This means searching for

    arg max_p Pr(p | {x̂}) = arg max_p Pr({x̂} | p) Pr(p)
                          = arg max_p [ log Pr({x̂} | p) + log Pr(p) ]
                          = arg max_p [ log Π_x̂ p(x̂) + log Pr(p) ]
                          = arg max_p [ Σ_x̂ log p(x̂) + log Pr(p) ].

(In the first line, we exploit the fact that Pr({x̂}) is constant with respect to p and so does not affect the maximizer. In the second line, we take the logarithm. In the third line, we use the fact that Pr({x̂} | p) = Π_x̂ Pr(x̂ | p) = Π_x̂ p(x̂). The fourth line is just algebra.)

Thus, in the last line, we just have the log-likelihood plus the log prior Pr(p). Searching for p to maximize Pr(p | {x̂}) is known as maximum a posteriori (MAP) estimation.

Notice the similarity to regularized maximum likelihood estimation. For example, it is common to parameterize p by some vector θ, and to set Pr(p) = Pr(θ) to be a Gaussian centered at the origin. It is easy to show that doing this results in log Pr(θ) = −a ‖θ‖^2 (plus a constant), where a depends on the covariance of the Gaussian. Similarly, it can be shown that the lasso penalty corresponds to a prior of the form Pr(θ) ∝ exp(−a ‖θ‖_1). Thus, many Bayesians view regularized maximum likelihood estimation as implicit MAP estimation.

We should note, though, that real Bayesians do not do MAP estimation. To understand why not, suppose that we have a probability distribution parameterized by a scalar θ, and the posterior Pr(θ | {x̂}) looks something like the following:

[Figure: a posterior P(θ) over θ with a narrow spike at θ^(1) and a broad bump, carrying much more total probability, around θ^(2).]

MAP estimation will choose θ^(1) as the most probable set of parameters. However, this doesn't look so good, since most of the probability is in the area of θ^(2). What real Bayesians do is not estimate one particular distribution, but rather make predictions directly from the posterior Pr(p | Data). How is this done? Let's look at an example. Suppose we need to guess one single value for x. Consider the loss of some guess x':

    min_{x'} ∫ L(x', x) Pr(x | Data) dx.

Now, we can calculate the probability of some particular output x by integrating over the possible p:

    Pr(x | Data) = ∫_θ Pr(x, θ | Data) dθ
                 = ∫_θ Pr(x | θ) Pr(θ | Data) dθ
                 ∝ ∫_θ Pr(x | θ) Pr(Data | θ) Pr(θ) dθ.

Thus, finally, the true Bayesian chooses their best guess x' by solving the problem

    min_{x'} ∫_θ ∫ L(x', x) Pr(x | θ) Pr(Data | θ) Pr(θ) dx dθ.    (5.1)

The question is: how to do this integral? In some situations, this can be done in closed form. In general, however, one must resort to Markov chain Monte Carlo techniques for approximately doing the integral.³ This can be quite computationally challenging, which can be a major drawback of Bayesian methods.

Let's consider the advantages and disadvantages of the Bayesian approach. The major advantage is that it is, in a certain sense, the optimal method. If the true distribution is drawn from Pr(θ) then, on average, no method for making predictions can have lower loss than Eq. 5.1. To make this precise, suppose that we repeatedly draw parameters θ from the distribution Pr(θ), sample some data from Pr(Data | θ), make a prediction x', and then measure the loss L(x', x) on some new x drawn from p(x; θ). The above recipe will have the lowest average loss of any method. For this reason, many people feel that Bayesian methods are the one, true way to do machine learning.

A disadvantage of Bayesian methods is that they can often be quite computationally expensive. As mentioned above, in complex problems, it is common to use MCMC techniques to do inference. These techniques do have guarantees of eventually converging to the right answer, but these guarantees are usually asymptotic in nature. Thus, given a finite amount of running time, one can be unsure how close the current answer is to the best one. Research is ongoing on faster MCMC methods, with an eye on Bayesian inference.

³ See "Introduction to Monte Carlo methods" by David MacKay for a good tutorial on these techniques.
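For the two-coin example, θ ranges over only two values, so the integrals above collapse to sums and the fully Bayesian prediction can be computed exactly. A sketch (mine, continuing that example) of the posterior-predictive probability of heads on the next flip; under squared-error loss this posterior mean is the Bayes-optimal guess, whereas MAP would commit to coin A and predict 0.6:

    # Posterior predictive for the next flip in the two-coin example (3 heads, 5 tails observed).
    prior = {"A": 0.75, "B": 0.25}
    heads_prob = {"A": 0.6, "B": 0.4}

    unnormalized = {c: heads_prob[c] ** 3 * (1 - heads_prob[c]) ** 5 * prior[c] for c in prior}
    Z = sum(unnormalized.values())
    posterior = {c: unnormalized[c] / Z for c in unnormalized}

    # Pr(heads on next flip | Data) = sum over coin types of Pr(heads | coin) Pr(coin | Data).
    p_heads = sum(heads_prob[c] * posterior[c] for c in posterior)

    print(p_heads)                                     # about 0.51, between 0.6 (A) and 0.4 (B)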

The most obvious issue with Bayesian methods is the need to specify the prior Pr(θ). In real applications, where does this prior come from? This is similar to the issue we faced when doing (non-Bayesian) probabilistic modeling: we needed to specify a correct parametric model p(x; θ). While specifying the prior may appear to be a drawback of Bayesian methods, it is also something of an advantage. If you have a lot of knowledge about a particular domain, and you are able to specify this knowledge as a prior, Bayesian methods provide a nice framework for combining your knowledge with knowledge gained from data. Note also that, in the view of some, techniques like regularization are essentially MAP estimation in all but name. There is a great deal of material out there on the debate between frequentist and Bayesian statistics.
