Statistical Estimation


1 Statistical Estimation

Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF.

Other methods of estimation:
- minimize residuals from a fitted model, e.g. least squares
- maximize the likelihood (what is likelihood?)

These involve optimization.

2 Statistical Methods as Optimization Problems

Many common statistical methods are developed as solutions to optimization problems.

In a common method of statistical estimation, we maximize a likelihood, which is a function proportional to a probability density at the point of the observed data:
$$\max_{\theta\in\Theta} L(\theta;\, x).$$

In another method of estimation and in standard modeling techniques, we minimize a norm of the residuals. The best fit of a model is often defined in terms of a minimum of a norm, such as least squares:
$$\min_{\theta} \sum_i \left(y_i - \mathrm{E}(Y_i;\, \theta, x_i)\right)^2.$$

3 Fitting Statistical Models

A common type of model relates one variable to others in a form such as
$$y = f(x;\, \theta) + \epsilon, \qquad (1)$$
in which $\theta$ is a vector of fixed parameters with unknown and unobservable values. Fitting the model is mechanically equivalent to estimating $\theta$.

The most familiar form of this model is the linear model
$$y = x^T\beta + \epsilon,$$
where $x$ and $\beta$ are vectors. We also often assume that $\epsilon$ is a random variable with a normal distribution.

There are two approaches to fitting the model; both are optimization problems. In both, the first step is to replace the fixed unknown parameter with a variable.

4 Fitted Residuals

For a given value of the variable used in place of the parameter, and for each pair of observed values $(y_i, x_i)$, we form a fitted residual. In the case of the linear model, for example, we form the fitted residual
$$r_i(b) = y_i - x_i^T b,$$
in terms of a variable $b$ used in place of the unknown estimand $\beta$.

We should note the distinction between the fitted residuals $r_i(b)$ and the unobservable residuals, or errors,
$$\epsilon_i = y_i - x_i^T\beta.$$
When $\epsilon$ is a random variable, $\epsilon_i$ is a realization of that random variable, but the fitted residual $r_i(b)$ is not a realization of $\epsilon$ (as beginning statistics students sometimes think of it).

In methods that fit the residuals, there may not be a probability model, only a model of the expectation.

5 Minimizing Residuals

The idea of minimizing the residuals from the observed data in the model is intuitively appealing. Because there is a residual at each observation, however, minimizing the residuals is not well-defined without additional statements. When there are several things to be minimized, we must decide on some way of combining them into a single measure. It is then the single measure that we seek to minimize.

A useful type of overall measure is a norm of the vector of residuals. There are various norms. The $L_2$ norm of the vector of residuals is
$$\left(\sum_{i=1}^{n}\left(y_i - x_i^T b\right)^2\right)^{1/2};$$
minimizing it is equivalent to minimizing its square, the sum of squared residuals.

6 Least Squares of Residuals

Fitting the model so as to minimize the $L_2$ norm of the vector of residuals is least squares. With $n$ observations, the ordinary least squares estimator of $\beta$ in the linear model is the solution to the optimization problem
$$\min_b \sum_{i=1}^{n}\left(y_i - x_i^T b\right)^2.$$
This optimization problem is relatively simple, and its solution can be expressed in closed form as the solution of a system of linear equations (the normal equations $X^TXb = X^Ty$).

There are lots of variations: weighted least squares, nonlinear least squares, regularized least squares, and so on.
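As a check on the closed form, here is a minimal sketch in R with simulated (hypothetical) data; the normal-equations solution and lm() give the same coefficients.

## Ordinary least squares two ways, on hypothetical simulated data.
set.seed(1)
n <- 100
x2 <- runif(n)
X <- cbind(1, x2)                      # design matrix with an intercept column
beta <- c(2, -1)                       # assumed "true" parameters for the simulation
y <- drop(X %*% beta) + rnorm(n, sd = 0.5)

b_hat <- solve(crossprod(X), crossprod(X, y))   # solve the normal equations X'X b = X'y
fit <- lm(y ~ x2)                               # the same model via lm()

round(cbind(normal_equations = drop(b_hat), lm = coef(fit)), 4)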

7 Least Squares of Residuals in More Complicated Models

If the expectation model is nonlinear in the parameters, for example,
$$y_i = \theta_0 + e^{\theta_1 t_i + \theta_2 t_i^2} + \epsilon_i,$$
as in Exercise 1.9 in the revised Chapter 1, then we usually cannot get the least squares estimator in closed form. We form the residuals
$$r_i(\theta) = y_i - \left(\theta_0 + e^{\theta_1 t_i + \theta_2 t_i^2}\right),$$
and then we minimize
$$\sum_i r_i^2(\theta) = (r(\theta))^T r(\theta).$$
In any method for least squares, we use the fact that the gradient at the solution is 0.

8 Least Squares of Residuals in More Complicated Models

Because we cannot get the least squares estimator in closed form, we use an iterative method to find a point at which the gradient is 0. We might use Newton's method, which involves the gradient of $(r(\theta))^T r(\theta)$ and the Hessian of the same quantity.

The gradient involves the Jacobian of $r(\theta)$; it is
$$s(\theta) = 2\,(J_r(\theta))^T r(\theta).$$

The R function nls computes the least squares estimates in a nonlinear model.
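A minimal sketch of using nls for the model above; the simulated data, the "true" parameter values, and the starting values are all assumptions for illustration.

## Nonlinear least squares for y = theta0 + exp(theta1*t + theta2*t^2) + eps.
set.seed(2)
t <- seq(0, 2, length.out = 50)
th0 <- 1; th1 <- 0.5; th2 <- -0.4      # assumed "true" values for the simulation
y <- th0 + exp(th1 * t + th2 * t^2) + rnorm(length(t), sd = 0.05)

fit <- nls(y ~ theta0 + exp(theta1 * t + theta2 * t^2),
           start = list(theta0 = 0.5, theta1 = 0.1, theta2 = -0.1))
summary(fit)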

9 Properties of Least Squares Estimators in More Complicated Models

Any time we do statistical estimation, we want to know the properties of the estimators: the probability distribution, or at least the expectation (for assessing bias), and, if possible, the variance. This may require a probability model for the distribution of the observations.

10 Variance of Least Squares Estimators

The first thing we want is an estimate of the variance of the error term in the model. Under the simple expectation model, that variance is
$$\sigma^2 = \mathrm{E}\left((Y_i - \mathrm{E}(Y_i))^2\right),$$
which we assume to be the same for all $i$. A good estimator of this is the sample variance of the residuals (with a divisor that makes it unbiased).

Based on the properties of the objective function at the point of the solution, a good estimate of the variance-covariance matrix of the parameter estimators is
$$\left((J_r(\hat\theta))^T J_r(\hat\theta)\right)^{-1}\hat\sigma^2 \quad\text{or}\quad (H_s(\hat\theta))^{-1}\hat\sigma^2.$$
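A hedged numerical check of the Jacobian-based approximation, continuing the simulated example above (the data are again an assumption); nls's own vcov() is shown for comparison.

## Compare (J'J)^{-1} * sigma2_hat with vcov() from nls, using simulated data.
set.seed(3)
t <- seq(0, 2, length.out = 100)
y <- 1 + exp(0.5 * t - 0.4 * t^2) + rnorm(length(t), sd = 0.05)

fit <- nls(y ~ theta0 + exp(theta1 * t + theta2 * t^2),
           start = list(theta0 = 0.5, theta1 = 0.1, theta2 = -0.1))
est <- coef(fit)

## Jacobian of the model function at the estimate, written analytically; the residual
## Jacobian differs only in sign, and the sign cancels in J'J.
g <- exp(est["theta1"] * t + est["theta2"] * t^2)
J <- cbind(1, t * g, t^2 * g)

sigma2_hat <- sum(residuals(fit)^2) / (length(y) - length(est))
round(solve(crossprod(J)) * sigma2_hat, 6)   # (J'J)^{-1} sigma2_hat
round(vcov(fit), 6)                          # nls's own estimate, for comparison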

11 Maximum Likelihood Estimation

Given a PDF, say $p(x;\,\theta)$, and some (independent) data $x_1,\ldots,x_n$ from a distribution with that PDF, the likelihood function is
$$L(\theta;\, x) = \prod_{i=1}^{n} p(x_i;\,\theta).$$
Note that the likelihood function is nonnegative, and note that this is the likelihood of the model (not the "likelihood of the observations"). Within a class of PDFs parameterized by $\theta$, we could say it is the likelihood of $\theta$.

Considering $\theta$ to be a variable, and choosing the value of the variable that maximizes the likelihood with the given data, yields the maximum likelihood estimate of $\theta$.
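A small numerical illustration (not from the notes): the likelihood for i.i.d. exponential data, maximized with optimize() and compared with the closed-form MLE of the rate.

## Maximum likelihood for the rate of an exponential distribution, two ways.
set.seed(4)
x <- rexp(200, rate = 2)               # hypothetical data; the true rate 2 is an assumption

loglik <- function(theta) sum(dexp(x, rate = theta, log = TRUE))

opt <- optimize(loglik, interval = c(0.01, 20), maximum = TRUE)
c(numerical = opt$maximum, closed_form = 1 / mean(x))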

12 Maximum Likelihood Estimation (MLE)

This requires a probability distribution model, not just a model of the expectation.

The MLE often has good statistical properties, especially asymptotically.

It is often more convenient to work with the log-likelihood,
$$l(\theta;\, x) = \log(L(\theta;\, x)).$$

13 Maximum Likelihood Estimation

Another way of fitting the model $y = f(x;\,\theta) + \epsilon$ is by maximum likelihood. What is the PDF? It comes from the probability distribution of $\epsilon$. Given the data, this is the optimization problem
$$\max_t \prod_{i=1}^{n} p\left(y_i - f(x_i;\, t)\right),$$
where $p(\cdot)$ is the probability function or the probability density function of the random error. This yields a maximum likelihood estimator (MLE).

For a given probability density $p(\cdot)$, the maximization problem for MLE is usually different from a minimization problem of fitting residuals. (For normal errors, however, the two coincide.)
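A minimal sketch (with assumed simulated data) of fitting the nonlinear model by maximum likelihood under normal errors, using optim(); the estimates of theta then agree with nonlinear least squares, since the negative log-likelihood is a monotone function of the residual sum of squares.

## Maximum likelihood fit of y = theta0 + exp(theta1*t + theta2*t^2) + eps, eps ~ N(0, sigma^2).
set.seed(5)
t <- seq(0, 2, length.out = 80)
y <- 1 + exp(0.5 * t - 0.4 * t^2) + rnorm(length(t), sd = 0.05)

negloglik <- function(par) {
  theta <- par[1:3]
  sigma <- exp(par[4])                        # parameterize on the log scale so sigma > 0
  r <- y - (theta[1] + exp(theta[2] * t + theta[3] * t^2))
  -sum(dnorm(r, mean = 0, sd = sigma, log = TRUE))
}

mle <- optim(c(0.5, 0.1, -0.1, log(0.1)), negloglik, method = "BFGS")
mle$par[1:3]                                  # compare with coef() from an nls() fit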

14 EM Methods

EM methods are used in maximum likelihood estimation when the given sample consists of two components, one observed and one unobserved or missing.

A simple example of missing data occurs in life testing, when, for example, a number of electrical units are switched on and the time at which each fails is recorded. In such an experiment, it is usually necessary to curtail the recordings prior to the failure of all units. The failure times of the units still working are unobserved.

Another common example is the analysis of a finite mixture model. Each observation comes from an unknown one of an assumed set of distributions. The missing data are the distribution indicators.

Finally, EM methods are useful when there are known relationships among the parameters of a model; that is, in a model that is overparametrized.

15 Missing Data

The missing data can be missing observations on the same random variable that yields the observed sample, as in the censoring example; or the missing data can be from a different random variable that is related in some way to the random variable observed, such as the class membership in a finite mixture model. Sometimes an EM method can be constructed based on an artificial missing random variable to supplement the observable data.

We call the actual observed data $X$ and the unobserved data $U$. Thus, we have the complete data $C = (X, U)$.

16 Likelihoods

Let $L_C(\theta;\, c)$ be the likelihood of the complete data, and let $L_X(\theta;\, x)$ be the likelihood of the observed data. Let $l_C(\theta;\, c)$ and $l_X(\theta;\, x)$ be the corresponding log-likelihoods. Maximizing log-likelihoods is equivalent to maximizing likelihoods, so we will just work with the log-likelihoods. Our objective is to maximize the log-likelihood of the observed data, $l_X(\theta;\, x)$.

We will also use the conditional likelihood, or the conditional log-likelihood,
$$l_{C|X}(\theta;\, c \mid x) = l_C(\theta;\, x, u) - l_X(\theta;\, x).$$
The log-likelihood of interest is therefore
$$l_X(\theta;\, x) = l_C(\theta;\, x, u) - l_{C|X}(\theta;\, c \mid x). \qquad (2)$$

17 Notation

We will drop the constants (the fixed data values) and just write $l_X(\theta)$, $l_C(\theta)$, and $l_{C|X}(\theta)$.

18 Conditional Expectation of the Log-Likelihood as a Function of U

For a given $x$, and any fixed value of $\theta$, say $\theta^{(k)}$, we consider conditional expectations of the log-likelihoods as functions of the random variable $U$:
$$\mathrm{E}_{U|x,\theta^{(k)}}\left(l_X(\theta)\right) = \mathrm{E}_{U|x,\theta^{(k)}}\left(l_C(\theta)\right) - \mathrm{E}_{U|x,\theta^{(k)}}\left(l_{C|X}(\theta)\right).$$
(Note the notation $\mathrm{E}_{U|x,\theta^{(k)}}$, etc.)

The term on the left is constant with respect to $U$; that is, the left-hand side is just $l_X(\theta)$. The last term on the right, the expected conditional log-likelihood, is nonpositive (it is the expectation of the log of a conditional probability). Hence, for any $\theta$ and $x$, we have
$$l_X(\theta) \geq \mathrm{E}_{U|x,\theta^{(k)}}\left(l_C(\theta)\right).$$

19 Maximization of the Log-Likelihood

The quantity $\mathrm{E}_{U|x,\theta^{(k)}}\left(l_C(\theta)\right)$, which as we have seen is a lower bound for $l_X(\theta)$, plays an important role in the EM method. Let
$$q_k(x, \theta) = \mathrm{E}_{U|x,\theta^{(k)}}\left(l_C(\theta)\right).$$
We find $\theta^{(1)}, \theta^{(2)}, \ldots$ by maximizing $q_k(x, \theta)$ for $k = 0, 1, \ldots$. That is, given $\theta^{(k)}$, we determine $\theta^{(k+1)}$ as the value of $\theta$ that maximizes the conditional expectation $\mathrm{E}_{U|x,\theta^{(k)}}\left(l_C(\theta)\right)$. We then update $q_k(x, \theta)$ using $\mathrm{E}_{U|x,\theta^{(k+1)}}$.

20 The EM Method

The EM approach to maximizing $l_X(\theta;\, x)$ has two alternating steps. The steps are iterated until convergence.

E step: compute $q_k(x, \theta) = \mathrm{E}_{U|x,\theta^{(k)}}\left(l_C(\theta;\, x, U)\right)$. (Notice that $q_k(x, \theta)$ is a function of $\theta^{(k)}$.)

M step: determine $\theta^{(k+1)}$ to maximize $q_k(x, \theta)$, or at least to increase it (subject to any constraints on acceptable values of $\theta$).

Notice that $\max_\theta l_X(\theta) \geq \max_\theta q_k(x, \theta)$.

The sequence $\theta^{(1)}, \theta^{(2)}, \ldots$ converges to a local maximum of the observed-data log-likelihood $l_X(\theta)$ under fairly general conditions. (It can be very slow to converge, however.)

21 Example 1

One of the simplest examples of the EM method was given by Dempster, Laird, and Rubin (1977) in the pioneering article that first systematically discussed EM methods. See Exercise 1.9 (Exercise 1.13 in the revised Chapter 1).

Consider the multinomial distribution with four outcomes, that is, the multinomial with probability function
$$p(x_1, x_2, x_3, x_4) = \frac{n!}{x_1!\,x_2!\,x_3!\,x_4!}\,\pi_1^{x_1}\pi_2^{x_2}\pi_3^{x_3}\pi_4^{x_4},$$
with $x_1 + x_2 + x_3 + x_4 = n$ and $\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1$. Notice that a binomial distribution can be formed from a multinomial distribution in various ways.

22 Overparametrized Multinomial

Now suppose the probabilities are related by a single parameter, $\theta$:
$$\pi_1 = \tfrac{1}{2} + \tfrac{1}{4}\theta, \quad
  \pi_2 = \tfrac{1}{4}(1 - \theta), \quad
  \pi_3 = \tfrac{1}{4}(1 - \theta), \quad
  \pi_4 = \tfrac{1}{4}\theta,$$
where $0 \leq \theta \leq 1$. This relationship may arise from known facts about the phenomenon giving rise to the individual groups in the multinomial distribution. The multinomial is overparametrized.

23 Overparametrized Multinomial

Given an observation $x = (x_1, x_2, x_3, x_4)$, the log-likelihood function is
$$l(\theta) = x_1\log(2 + \theta) + (x_2 + x_3)\log(1 - \theta) + x_4\log(\theta) + c,$$
for some $c$ that is constant with respect to $\theta$. The objective is to estimate $\theta$ by maximizing $l(\theta)$.

Notice that for this simple problem, the MLE of $\theta$ can be determined by solving a simple polynomial equation, because
$$\frac{dl(\theta)}{d\theta} = \frac{x_1}{2 + \theta} - \frac{x_2 + x_3}{1 - \theta} + \frac{x_4}{\theta}.$$
Let's proceed, however, with an EM formulation.
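As a hedged check on the direct approach (not part of the EM formulation that follows), the derivative above can be set to zero numerically with uniroot(), using the Dempster, Laird, and Rubin data given later in these notes.

## Direct MLE for the overparametrized multinomial by solving dl/dtheta = 0.
x <- c(125, 18, 20, 34)                  # Dempster, Laird, and Rubin's data (see below)

dl <- function(theta) x[1]/(2 + theta) - (x[2] + x[3])/(1 - theta) + x[4]/theta

uniroot(dl, interval = c(1e-6, 1 - 1e-6))$root   # should agree with the EM iterations below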

24 Overparametrized Multinomial

We notice that $\pi_2 = \pi_3$ and $\pi_1 = \pi_4 + 1/2$. So to use the EM algorithm on this problem, we can think of a multinomial with five classes, which is formed from the original multinomial by splitting the first class into two with associated probabilities $1/2$ and $\theta/4$. The original variable $x_1$ is now the sum of $u_1$ and $u_2$. The vector $c = (u_1, u_2, x_2, x_3, x_4)$ is the complete data.

25 Overparametrized Multinomial

Under this reformulation, we now have a maximum likelihood estimate of $\theta$ by considering $u_2 + x_4$ to be a realization of a binomial with $n = u_2 + x_4 + x_2 + x_3$ and $\pi = \theta$. However, we do not know $u_2$.

Proceeding as if we had a five-outcome multinomial observation with two missing elements, we have the log-likelihood for the complete data,
$$l_C(\theta) = (u_2 + x_4)\log(\theta) + (x_2 + x_3)\log(1 - \theta),$$
which has a maximum at
$$\theta = \frac{u_2 + x_4}{u_2 + x_2 + x_3 + x_4}.$$

26 Overparametrized Multinomial

The E-step of the iterative EM algorithm fills in the missing or unobservable value with its expected value, given a current value of the parameter, $\theta^{(k)}$, and the observed data. Because $l_C(\theta)$ is linear in the data, we have
$$\mathrm{E}_{U_2|x,\theta^{(k)}}\left(l_C(\theta)\right)
  = \mathrm{E}_{U_2|x,\theta^{(k)}}(U_2 + x_4)\log(\theta)
  + \mathrm{E}_{U_2|x,\theta^{(k)}}(x_2 + x_3)\log(1 - \theta).$$
Under this setup, with $\theta = \theta^{(k)}$, we get
$$\mathrm{E}_{U_2|x,\theta^{(k)}}(U_2)
  = \frac{\tfrac{1}{4}x_1\theta^{(k)}}{\tfrac{1}{2} + \tfrac{1}{4}\theta^{(k)}}
  = \frac{x_1\theta^{(k)}}{2 + \theta^{(k)}},$$
and now, for the M-step, we take $u_2^{(k)} = \mathrm{E}_{U_2|x,\theta^{(k)}}(U_2)$.

27 Overparametrized Multinomial

We now maximize $\mathrm{E}_{U_2|x,\theta^{(k)}}\left(l_C(\theta)\right)$. We have a closed-form solution; the maximum occurs at
$$\theta^{(k+1)} = \frac{u_2^{(k)} + x_4}{u_2^{(k)} + x_2 + x_3 + x_4},$$
and this becomes the new value of $\theta$.

The following R statements execute a single iteration, after we initialize t.k:

u2.k <- x[1]*t.k/(2+t.k)
t.kp1 <- (u2.k + x[4])/(sum(x)-x[1]+u2.k)

Dempster, Laird, and Rubin used the data $n = 197$ and $x = (125, 18, 20, 34)$. Beginning with t.k in the range of 0.1 to 0.9, this converges to within $10^{-7}$ in less than 10 iterations.

28 Testing for Convergence

We can put the R statements in a loop, so long as we test for convergence and put a limit on the number of iterations.

# initialize t.kp1 as the starting value, and initialize t.k to any different value
# initialize iter = 0
# initialize control values eps and maxit
while (abs(t.k-t.kp1)>eps & iter <= maxit) {
  iter <- iter + 1
  t.k <- t.kp1
  u2.k <- x[1]*t.k/(2+t.k)
  t.kp1 <- (u2.k + x[4])/(sum(x)-x[1]+u2.k)
}
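A complete, runnable version of the loop, with the initializations filled in using the Dempster, Laird, and Rubin data; the starting value and control settings are arbitrary choices.

## EM for the overparametrized multinomial, run to convergence.
x <- c(125, 18, 20, 34)
eps <- 1e-7
maxit <- 100
t.kp1 <- 0.5                 # starting value theta^(0); any value in (0, 1) works here
t.k <- -1                    # any value different from t.kp1, so the loop is entered
iter <- 0

while (abs(t.k-t.kp1)>eps & iter <= maxit) {
  iter <- iter + 1
  t.k <- t.kp1
  u2.k <- x[1]*t.k/(2+t.k)                         # E step: E(U2 | x, theta^(k))
  t.kp1 <- (u2.k + x[4])/(sum(x)-x[1]+u2.k)        # M step: closed-form maximizer
}

c(theta.hat = t.kp1, iterations = iter)            # about 0.6268, in a handful of iterations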

29 Example 2: A Life-Testing Experiment Using an Exponential Model

The exponential model for failure times is
$$p(t) = \frac{1}{\theta}e^{-t/\theta}.$$
In the classic life-testing experiment, $n$ items are put on test, and the failure times of $n_1$ items (usually the first ones to fail) are recorded. All that is known of the remaining $n - n_1$ items is the time at which they were taken from the test (usually the time at which the test is stopped). The complete data would be the failure times for all $n$ items: $t_1, \ldots, t_n$.

30 A Life-Testing Experiment Using an Exponential Model

We will allow the $n - n_1$ items whose failure times were not measured to be removed from the test (or "censored") at varying times; that is, for each one, all we know is a lower bound on its lifetime. The data that we actually have are the set
$$t_1, \ldots, t_{n_1}, c_{n_1+1}, \ldots, c_n,$$
where we have indexed the observations so that the first $n_1$ correspond to those with observed failure times, and $c_{n_1+1}, \ldots, c_n$ are the censoring times for the remaining items, that is, the times at which they were removed from the test. (In many cases, of course, all of the $c$'s would be the same.) If $n_1 < j \leq n$, then all that is known about $t_j$ is that $t_j \geq c_j$.

31 A Life-Testing Experiment Using an Exponential Model

Given the data vector
$$x = (x_1, \ldots, x_{n_1}, x_{n_1+1}, \ldots, x_n) = (t_1, \ldots, t_{n_1}, c_{n_1+1}, \ldots, c_n),$$
the objective is to obtain the MLE of $\theta$.

The probability of units $n_1 + 1$ through $n$ exceeding the values $x_{n_1+1}, \ldots, x_n$ is
$$e^{-\sum_{i=n_1+1}^{n} x_i/\theta},$$
and so the likelihood of the observed data is
$$e^{-\sum_{i=n_1+1}^{n} x_i/\theta}\;\prod_{i=1}^{n_1}\frac{1}{\theta}e^{-x_i/\theta},$$
and the log-likelihood for the observed data is
$$l(\theta) = -n_1\log(\theta) - \sum_{i=1}^{n} x_i/\theta. \qquad (3)$$

32 A Life-Testing Experiment Using an Exponential Model

The maximum of the log-likelihood is easily obtained, of course. It occurs at
$$\hat\theta = \sum_{i=1}^{n} x_i \big/ n_1,$$
but let's use the EM method to approach the problem from the standpoint of how to deal with the missing data.

The log-likelihood for the complete data is
$$l_C(\theta) = -n\log(\theta) - \sum_{i=1}^{n}\frac{x_i}{\theta},$$
where, for $j > n_1$, $x_j$ is the unobserved realization of the random variable corresponding to the time of failure.

33 A Life-Testing Experiment Using an Exponential Model

In the EM formulation for this problem, we first get the E step. Given the observed values of $x$, and for a given value of $\theta$, say $\theta^{(k)}$, taking the expectation of the complete-data log-likelihood with respect to the random variables $X_{n_1+1}, \ldots, X_n$, we get for the E step
$$q_k(x, \theta)
  = -n\log(\theta) - \left(\sum_{i=1}^{n_1} x_i + \sum_{i=n_1+1}^{n}\left(x_i + \theta^{(k)}\right)\right)\Big/\theta
  = -n\log(\theta) - \left(\sum_{i=1}^{n} x_i + (n - n_1)\theta^{(k)}\right)\Big/\theta. \qquad (4)$$
The value that maximizes this is taken to be $\theta^{(k+1)}$; that is,
$$\theta^{(k+1)} = \left(\sum_{i=1}^{n} x_i + (n - n_1)\theta^{(k)}\right)\Big/ n.$$
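A minimal sketch of this iteration in R, with assumed simulated data (true mean 10, all units censored at a common time 8); the closed-form MLE sum(x)/n1 is shown for comparison.

## EM for exponential lifetimes with right-censoring at a common time.
set.seed(6)
n <- 50
failure <- rexp(n, rate = 1/10)          # hypothetical lifetimes with mean theta = 10
cens <- 8                                 # common censoring time (an assumption)
x <- pmin(failure, cens)                  # observed data: failure time or censoring time
n1 <- sum(failure <= cens)                # number of observed failures

theta <- mean(x)                          # arbitrary starting value
for (k in 1:200) {
  theta.new <- (sum(x) + (n - n1)*theta)/n       # E and M steps combined, in closed form
  if (abs(theta.new - theta) < 1e-8) break
  theta <- theta.new
}

c(em = theta.new, closed.form = sum(x)/n1)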

34 Example 3

A two-component normal mixture model can be defined by two normal distributions, $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, and the probability, $w$, that the random variable (the observable) arises from the first distribution. The parameter in this model is the vector $\theta = (w, \mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$. (Note that $w$ and the $\sigma$'s have the obvious constraints.)

The PDF of the mixture is
$$p(x;\,\theta) = w\,p_1(x;\,\mu_1, \sigma_1^2) + (1 - w)\,p_2(x;\,\mu_2, \sigma_2^2),$$
where $p_j(x;\,\mu_j, \sigma_j^2)$ is the normal PDF with parameters $\mu_j$ and $\sigma_j^2$. (I am just writing them this way for convenience; $p_1$ and $p_2$ are actually the same parametrized function, of course.)

35 The Two-Component Mixture Model

In the standard formulation with $C = (X, U)$, $X$ represents the observed data, and the unobserved $U$ represents class membership. For the two-class mixture model, let $U = 1$ if the observation is from the first distribution and $U = 0$ if the observation is from the second distribution.

The unconditional $\mathrm{E}(U)$ is the probability that an observation comes from the first distribution, which of course is $w$.

Suppose we have $n$ observations on $X$, $x = (x_1, \ldots, x_n)$. In the notation we have been using, $q_i^{(k)} = q^{(k)}(x_i, \theta^{(k)})$ is the weight (or estimated probability) that the observation is in the first group.

36 Estimation in the Two-Component Normal Mixture Model

Given a provisional value of $\theta$, we can compute the conditional expected value $\mathrm{E}(U \mid x)$ for any realization of $X$, $x_i$. It is merely
$$q_i^{(k)} = \mathrm{E}\left(U \mid x_i, \theta^{(k)}\right)
  = \frac{w^{(k)}\,p_1\!\left(x_i;\,\mu_1^{(k)}, \sigma_1^{2(k)}\right)}
         {p\!\left(x_i;\,w^{(k)}, \mu_1^{(k)}, \sigma_1^{2(k)}, \mu_2^{(k)}, \sigma_2^{2(k)}\right)}. \qquad (5)$$

37 The M step is just the familiar MLE of the parameters:
$$\begin{aligned}
w^{(k+1)} &= \frac{1}{n}\sum_i \mathrm{E}\left(U \mid x_i, \theta^{(k)}\right)\\
\mu_1^{(k+1)} &= \frac{1}{n\,w^{(k+1)}}\sum_i q_i^{(k)} x_i\\
\sigma_1^{2(k+1)} &= \frac{1}{n\,w^{(k+1)}}\sum_i q_i^{(k)}\left(x_i - \mu_1^{(k+1)}\right)^2\\
\mu_2^{(k+1)} &= \frac{1}{n\left(1 - w^{(k+1)}\right)}\sum_i \left(1 - q_i^{(k)}\right) x_i\\
\sigma_2^{2(k+1)} &= \frac{1}{n\left(1 - w^{(k+1)}\right)}\sum_i \left(1 - q_i^{(k)}\right)\left(x_i - \mu_2^{(k+1)}\right)^2
\end{aligned} \qquad (6)$$

38 Easy to Program in R

These updates are easy to program in R.
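Here is a minimal sketch, following equations (5) and (6); the simulated data, starting values, and stopping rule are all assumptions for illustration.

## EM for a two-component normal mixture.
set.seed(7)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1.5))
n <- length(x)

## crude starting values
w <- 0.5
mu1 <- mean(x) - sd(x);  mu2 <- mean(x) + sd(x)
s1 <- sd(x);             s2 <- sd(x)

for (k in 1:500) {
  ## E step: q_i = posterior probability that observation i is from component 1 (equation (5))
  d1 <- w*dnorm(x, mu1, s1)
  d2 <- (1 - w)*dnorm(x, mu2, s2)
  q <- d1/(d1 + d2)

  ## M step: weighted MLEs (equation (6))
  w.new <- mean(q)
  mu1 <- sum(q*x)/(n*w.new)
  mu2 <- sum((1 - q)*x)/(n*(1 - w.new))
  s1 <- sqrt(sum(q*(x - mu1)^2)/(n*w.new))
  s2 <- sqrt(sum((1 - q)*(x - mu2)^2)/(n*(1 - w.new)))

  delta <- abs(w.new - w)
  w <- w.new
  if (delta < 1e-8) break                 # crude convergence check on w only
}

round(c(w = w, mu1 = mu1, sigma1 = s1, mu2 = mu2, sigma2 = s2), 3)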

39 The General Mixture Model

A multi-component mixture model can be defined by specifying the individual distributions and the probability that the random variable (the observable) arises from each distribution.

For a mixture of $g$ distributions, if the PDF of the $j$th distribution is $p_j(x;\,\theta_j)$, the PDF of the mixture is
$$p(x;\,\theta) = \sum_{j=1}^{g} \pi_j\, p_j(x;\,\theta_j),$$
where $\pi_j \geq 0$ and $\sum_{j=1}^{g}\pi_j = 1$.

The $g$-vector $\pi$ contains the unconditional probabilities for a random variable from the mixture.

40 The General Mixture Model: Dummy Variables

For observations from different groups, we can use a single indicator variable to designate the group, or we can use dummy variables that take values of 0 or 1, depending on the group of a given observation.

In linear models, when dummy variables are used to distinguish classes, the number of dummy variables is one less than the number of classes; this yields linearly independent column vectors in the data matrix. In mixture models, it is more convenient to use a dummy variable for each class. For an observation in a given class, only the dummy variable for that class takes the value 1, and all others take the value 0; thus, the sum of the dummy variables for a given observation is 1.

In a general probability mixture model, the expected value of a dummy variable is the probability that an observation comes from the group corresponding to that dummy variable.

41 The General Mixture Model

In the formulation $C = (X, U)$, where $X$ represents the observed data, the unobserved $U$ represents the $g$-vector of dummy variables. The unconditional $\mathrm{E}(U)$ is the vector of probabilities that an observation comes from the various components of the mixture distribution.

Suppose we have $n$ observations on $X$, $x = (x_1, \ldots, x_n)$. In the notation we have been using, $q_i^{(k)} = q^{(k)}(x_i, \theta^{(k)})$, the conditional expectation of $U$, is the estimated conditional probability vector for that observation.

42 Given provisional values of $\pi$ and $\theta$, say $\pi^{(k)}$ and $\theta^{(k)}$, we can compute the conditional expected value of the $j$th dummy variable for any realization of $X$, $x_i$:
$$q_{ij}^{(k)} = \mathrm{E}\left(U_j \mid x_i, \theta^{(k)}\right)
  = \frac{\pi_j^{(k)}\, p_j\!\left(x_i;\,\theta_j^{(k)}\right)}{p\!\left(x_i;\,\theta^{(k)}\right)}.$$

43 Estimation in a General Normal Mixture Model

Given $\mathrm{E}(U \mid x, \theta^{(k)})$ as before, we can perform the M step. First, we have the conditional MLE for the $g$-vector of probabilities:
$$\pi^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n}\mathrm{E}\left(U \mid x_i, \theta^{(k)}\right). \qquad (7)$$
For a mixture of normal distributions, $\theta_j = (\mu_j, \sigma_j^2)$, that is, the mean and variance of the $j$th distribution, and for the conditional MLEs we have the weighted estimates
$$\mu_j^{(k+1)} = \frac{1}{n\,\pi_j^{(k+1)}}\sum_{i=1}^{n} q_{ij}^{(k)} x_i, \qquad
  \sigma_j^{2(k+1)} = \frac{1}{n\,\pi_j^{(k+1)}}\sum_{i=1}^{n} q_{ij}^{(k)}\left(x_i - \mu_j^{(k+1)}\right)^2. \qquad (8)$$
These equations are similar to equations (6).

44 EM in R

There are various packages and functions for applications of EM in R. Instead of a general function for EM, the functions depend on the application; for example, mixtools and mclust use EM to estimate the parameters in mixtures of populations. A very general package that allows regression models, instead of just probability models, to distinguish the different populations is flexmix.
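A hedged example using mixtools (assuming the package is installed); normalmixEM fits a normal mixture by EM, here with the same kind of simulated data used above.

## Fitting a two-component normal mixture with mixtools.
library(mixtools)

set.seed(8)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 1.5))

fit <- normalmixEM(x, k = 2)
fit$lambda    # estimated mixing proportions (the pi's)
fit$mu        # estimated component means
fit$sigma     # estimated component standard deviations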
