Foundations of Statistical Inference
Foundations of Statistical Inference (BS2a)
Jonathan Marchini
Department of Statistics, University of Oxford
MT 2013
Course arrangements
Lectures: M.2 and Th.10, SPR1, weeks 1-8.
Classes: weeks 3-8, Friday 12-1, 4-5, 5-6, SPR1/2. Hand in solutions by 12 noon on Wednesday, SPR1.
Class tutors: Jonathan Marchini, Charlotte Greenan.
Notes and problem sheets will be made available.
Books:
- Garthwaite, P. H., Jolliffe, I. T. and Jones, B. (2002) Statistical Inference, Oxford Science Publications.
- Leonard, T. and Hsu, J. S. (2005) Bayesian Methods, Cambridge University Press.
- Cox, D. R. (2006) Principles of Statistical Inference, Cambridge University Press.
This course builds on notes from Bob Griffiths and Geoff Nicholls.
Part A Statistics
The majority of the statistics that you have learned up to now falls under the philosophy of classical (or Frequentist) statistics. This theory makes the assumption that we can randomly take repeated samples of data from the same population. You learned about three types of statistical inference:
- Point estimation (maximum likelihood, bias, consistency, efficiency, information)
- Interval estimation (exact and approximate intervals using the CLT)
- Hypothesis testing (Neyman-Pearson lemma, uniformly most powerful tests, generalised likelihood ratio tests)
You also had an introduction to Bayesian statistics:
- Posterior inference (Posterior ∝ Likelihood × Prior)
- Interval estimation (credible intervals, HPD intervals)
- Priors (conjugate priors, improper priors, Jeffreys prior)
- Hypothesis testing (marginal likelihoods, Bayes factors)
Frequentist inference
In BS2a we develop the theory of point estimation further.
- Are there families of distributions about which we can make general statements? Exponential families.
- How can we summarise all the information in a dataset about a parameter θ? Sufficiency and the Factorization Theorem.
- What are the limits of how well we can estimate a parameter θ? The Cramér-Rao inequality (and bound).
- How can we find good estimators of a parameter θ? The Rao-Blackwell Theorem and the Lehmann-Scheffé Theorem.
Bayesian inference
Parameters are treated as random variables. Inference starts by specifying a prior distribution on θ based on prior beliefs. Having collected some data, we use Bayes' Theorem to update our beliefs and obtain a posterior distribution.

Quick example: Suppose I give you a coin and tell you that it is a bit biased. We might use a Beta(4,4) distribution to represent our beliefs about θ, the probability of heads. If we then observe 30 heads and 10 tails, we can use probability theory to infer a posterior distribution for θ of Beta(34, 14).

[Figure: prior Beta(4,4) and posterior Beta(34,14) densities]
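As a quick numerical check of this conjugate update, here is a short Python sketch (my own illustration, not part of the course materials; the function names are arbitrary):

```python
import math

def beta_binomial_update(a, b, heads, tails):
    """Conjugate update: a Beta(a, b) prior on θ combined with binomial
    coin-flip data gives a Beta(a + heads, b + tails) posterior."""
    return a + heads, b + tails

def beta_pdf(x, a, b):
    """Beta(a, b) density at x, computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

# Prior Beta(4, 4); observe 30 heads and 10 tails.
a_post, b_post = beta_binomial_update(4, 4, 30, 10)
print((a_post, b_post))             # (34, 14), as on the slide
print(a_post / (a_post + b_post))   # posterior mean of θ, 34/48
```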
Computational techniques for Bayesian inference
It is not always possible to obtain an analytic solution when doing Bayesian inference, so we study approximate computational techniques in this course. These include:
- Markov chain Monte Carlo (MCMC) methods
- Approximations to marginal likelihoods [NEW]: variational approximations, Laplace approximations, the Bayesian Information Criterion (BIC)
- The EM algorithm [NEW], useful in Frequentist and Bayesian inference for missing-data problems
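To give a flavour of MCMC, the sketch below (my own illustration, not from the slides) runs a random-walk Metropolis sampler targeting the Beta(34, 14) posterior from the coin example; the step size and iteration counts are arbitrary choices:

```python
import math
import random

def log_post(theta):
    """Unnormalised log posterior of Beta(34, 14):
    33*log(theta) + 13*log(1 - theta) on (0, 1), -inf outside."""
    if not 0.0 < theta < 1.0:
        return float("-inf")
    return 33 * math.log(theta) + 13 * math.log(1 - theta)

def metropolis(n_iter, step=0.05, start=0.5, seed=1):
    """Random-walk Metropolis: propose theta' ~ N(theta, step^2) and
    accept with probability min(1, post(theta') / post(theta))."""
    rng = random.Random(seed)
    theta = start
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        if rng.random() < math.exp(min(0.0, log_post(prop) - log_post(theta))):
            theta = prop
        samples.append(theta)
    return samples

samples = metropolis(20000)[5000:]   # discard burn-in
print(sum(samples) / len(samples))   # close to the exact posterior mean 34/48
```

Only the unnormalised posterior is needed, which is exactly why MCMC is useful when the normalising constant is intractable.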
Decision theory
Quick example: You have been exposed to a deadly virus. About 1/3 of people who are exposed to the virus are infected by it, and all those infected by it die unless they receive a vaccine. By the time any symptoms of the virus show up, it is too late for the vaccine to work. You are offered a vaccine for 500. Do you take it or not?
The most likely scenario is that you don't have the virus, but basing a decision on this ignores the costs (or losses) associated with the decisions. We would put a very high loss on dying!
Decision theory allows us to formalise this problem. It is closely connected to Bayesian ideas, and applicable to point estimation and hypothesis testing. Inference is based on specifying a number of possible actions we might take and a loss associated with each action. The theory involves determining a rule for making decisions using an overall measure of loss.
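The expected-loss comparison behind this example can be sketched in a few lines of Python (my own illustration; the figure of 10**7 for the loss attached to dying is an arbitrary assumption made only to have something concrete in the same units as the vaccine cost):

```python
def expected_loss(action, p_infected=1/3, cost_vaccine=500, loss_death=10**7):
    """Expected loss of each action under the virus example.
    loss_death is an assumed (large) number quantifying how bad dying is."""
    if action == "vaccinate":
        return cost_vaccine              # pay 500 and survive either way
    return p_infected * loss_death       # 1/3 chance of dying untreated

for action in ("vaccinate", "decline"):
    print(action, expected_loss(action))
# Vaccinating minimises expected loss whenever cost_vaccine < p_infected * loss_death.
```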
Notation
X, Y, Z : capital letters for random variables.
x, y, z : lower case letters for realisations of random variables.
E_X(·) : expectation with respect to the random variable X.
φ = {φ_1, ..., φ_k} : sometimes we will use bold symbols to denote a vector of parameters.
Lecture 1 - Exponential families
Parametric families
f(x; θ), θ ∈ Θ, is the probability density of a random variable (rv), which could be discrete or continuous. θ can be 1-dimensional or of higher dimension. Equivalent notation: f_θ(x), f(x | θ), f(x, θ).
Likelihood: L(θ; x) = f(x; θ), and log-likelihood: l(θ; x) = log L.
Examples
1. Normal N(θ, 1): f(x; θ) = (1/√(2π)) exp{-½(x - θ)²}, x ∈ ℝ, θ ∈ ℝ.
2. Poisson: f(x; θ) = (θ^x / x!) e^{-θ}, x = 0, 1, 2, ..., θ > 0.
3. Regression: f(y; θ) = ∏_{i=1}^n (1/(√(2π)σ)) exp{-(1/(2σ²))(y_i - Σ_{j=1}^p x_{ij}β_j)²}, y ∈ ℝⁿ, σ > 0, β ∈ ℝ^p, θ = {β, σ}.
Exponential families of distributions
Definition 1 (GJJ 2.6, DRC 2.3). A rv X belongs to a k-parameter exponential family if its probability density function (pdf) can be written as
    f(x; θ) = exp{ Σ_{j=1}^k A_j(θ)B_j(x) + C(x) + D(θ) },
where x ∈ χ, θ ∈ Θ, A_1(θ), ..., A_k(θ), D(θ) are functions of θ alone, and B_1(x), B_2(x), ..., B_k(x), C(x) are well-behaved functions of x alone.
Exponential families are widely used in practice, for example in generalised linear models (see BS1a).
Example 1: Poisson
We want to put the Poisson distribution in the form (with k = 1)
    f(x; θ) = exp{ A(θ)B(x) + C(x) + D(θ) }.
We have
    e^{-θ} θ^x / x! = exp{ -θ + x log θ - log x! } = exp{ (log θ)x - log x! - θ },
so A(θ) = log θ, B(x) = x, C(x) = -log x!, D(θ) = -θ.
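This identity is easy to verify numerically. A quick Python check (my own, not from the slides), using log x! = lgamma(x + 1):

```python
import math

def poisson_pmf(x, theta):
    """Poisson pmf evaluated directly."""
    return math.exp(-theta) * theta ** x / math.factorial(x)

def expfam_pmf(x, theta):
    """exp{A(θ)B(x) + C(x) + D(θ)} with A(θ) = log θ, B(x) = x,
    C(x) = -log x!, D(θ) = -θ."""
    return math.exp(math.log(theta) * x - math.lgamma(x + 1) - theta)

for theta in (0.5, 2.5):
    for x in range(8):
        assert math.isclose(poisson_pmf(x, theta), expfam_pmf(x, theta))
print("exponential-family form matches the Poisson pmf")
```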
Examples of 1-parameter exponential families
Binomial, Poisson, Normal, Exponential.

Distn      f(x; θ)                        A(θ)           B(x)   C(x)             D(θ)
Bin(n, p)  (n choose x) p^x (1-p)^{n-x}   log(p/(1-p))   x      log(n choose x)  n log(1-p)
Pois(θ)    e^{-θ} θ^x / x!                log θ          x      -log(x!)         -θ
N(µ, 1)    (1/√(2π)) exp{-(x-µ)²/2}       µ              x      -x²/2            -½(µ² + log(2π))
Exp(θ)     θ e^{-θx}                      -θ             x      0                log θ

Others: negative binomial, Pareto (with known minimum), Weibull (with known shape), Laplace (with known mean), log-normal, inverse Gaussian, beta, Dirichlet, Wishart. Exercise: check these distributions.
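The remaining rows of the table can be checked the same way as the Poisson. A small Python sketch (my own; the helper `check` is not from the course) verifies that f(x; θ) = exp{A(θ)B(x) + C(x) + D(θ)} for the Binomial, Normal and Exponential rows:

```python
import math

def check(pdf, A, B, C, D, xs):
    """Verify pdf(x) == exp{A * B(x) + C(x) + D} on a grid of x values."""
    for x in xs:
        assert math.isclose(pdf(x), math.exp(A * B(x) + C(x) + D), rel_tol=1e-12)

n, p = 10, 0.3  # Bin(n, p) row
check(lambda x: math.comb(n, x) * p ** x * (1 - p) ** (n - x),
      math.log(p / (1 - p)), lambda x: x,
      lambda x: math.log(math.comb(n, x)), n * math.log(1 - p),
      range(n + 1))

mu = 1.7  # N(mu, 1) row
check(lambda x: math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi),
      mu, lambda x: x, lambda x: -x ** 2 / 2,
      -0.5 * (mu ** 2 + math.log(2 * math.pi)),
      [-2.0, 0.0, 0.5, 3.0])

lam = 2.0  # Exp(lam) row
check(lambda x: lam * math.exp(-lam * x),
      -lam, lambda x: x, lambda x: 0.0, math.log(lam),
      [0.0, 0.5, 1.0, 4.0])
print("all table rows verified")
```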
Example 2: a 2-parameter family (Gamma)
If X ~ Gamma(α, β), then let θ = (α, β), so
    f(x; θ) = β^α x^{α-1} e^{-βx} / Γ(α)
            = exp{ α log β + (α - 1) log x - βx - log Γ(α) }
            = exp{ (α - 1) log x - βx - log[Γ(α)β^{-α}] },
and we have A_1(θ) = α - 1, B_1(x) = log x, A_2(θ) = -β, B_2(x) = x.
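Again this can be verified numerically. A quick Python check (my own, not from the slides):

```python
import math

def gamma_pdf(x, a, b):
    """Gamma(a, b) density (rate parameterisation) evaluated directly."""
    return b ** a * x ** (a - 1) * math.exp(-b * x) / math.gamma(a)

def gamma_expfam(x, a, b):
    """exp{A1·B1(x) + A2·B2(x) + C(x) + D(θ)} with A1 = α - 1, B1 = log x,
    A2 = -β, B2 = x, C = 0 and D = α log β - log Γ(α)."""
    return math.exp((a - 1) * math.log(x) - b * x
                    + a * math.log(b) - math.lgamma(a))

for x in (0.3, 1.0, 2.7):
    assert math.isclose(gamma_pdf(x, 2.5, 1.5), gamma_expfam(x, 2.5, 1.5))
print("exponential-family form matches the Gamma pdf")
```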
Some other 2-parameter exponential families

Distribution  f(x; θ)                          A_j(θ)               B_j(x)          C(x)  D(θ)
N(µ, σ²)      (1/√(2πσ²)) exp{-(x-µ)²/(2σ²)}   A_1(θ) = -1/(2σ²)    B_1(x) = x²     0     -½(log(2πσ²) + µ²/σ²)
                                               A_2(θ) = µ/σ²        B_2(x) = x
Gamma(α, β)   β^α x^{α-1} e^{-βx} / Γ(α)       A_1(θ) = α - 1       B_1(x) = log x  0     -log[Γ(α)β^{-α}]
                                               A_2(θ) = -β          B_2(x) = x
Exponential family canonical form
Let φ_j = A_j(θ), j = 1, ..., k, so that
    f(x; φ) = exp{ Σ_{j=1}^k φ_j B_j(x) + C(x) + D(φ) }.
The φ_j, j = 1, ..., k, are the canonical parameters, and the B_j, j = 1, ..., k, are the canonical observations. These are sometimes called the natural parameters and observations.
Since ∫ f(x; φ) dx = 1, we have
    ∫ exp{ Σ_{j=1}^k φ_j B_j(x) + C(x) + D(φ) } dx = 1
    exp{D(φ)} ∫ exp{ Σ_{j=1}^k φ_j B_j(x) + C(x) } dx = 1
    ∫ exp{ Σ_{j=1}^k φ_j B_j(x) + C(x) } dx = exp{-D(φ)}.
Differentiating with respect to φ_i:
    ∫ B_i(x) exp{ Σ_{j=1}^k φ_j B_j(x) + C(x) } dx = -(∂D(φ)/∂φ_i) exp{-D(φ)}
    ∫ B_i(x) exp{ Σ_{j=1}^k φ_j B_j(x) + C(x) + D(φ) } dx = -∂D(φ)/∂φ_i
    E[B_i(X)] = -∂D(φ)/∂φ_i.
Exercise: Cov[B_i(X), B_j(X)] = -∂²D(φ)/∂φ_i∂φ_j.
Exercise: Var[B_i(X)] = -∂²D(φ)/∂φ_i².
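These identities can be sanity-checked numerically. The Python sketch below (my own illustration) uses the Poisson family in canonical form, where φ = log θ and D(φ) = -e^φ, and compares finite-difference derivatives of D with the known moments E[X] = Var[X] = θ:

```python
import math

def D(phi):
    """D(φ) for the Poisson family in canonical form: φ = log θ, D(φ) = -e^φ."""
    return -math.exp(phi)

def d1(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

theta = 2.5
phi = math.log(theta)
print(-d1(D, phi))   # E[B(X)] = E[X], close to θ = 2.5
print(-d2(D, phi))   # Var[B(X)] = Var[X], close to θ = 2.5
```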
Example 3: Gamma
We already know that if X ~ Gamma(α, β) then E(X) = α/β and Var(X) = α/β².
    β^α x^{α-1} e^{-βx} / Γ(α) = exp{ -βx + (α - 1) log x + α log β - log Γ(α) },
so φ_1 = -β, φ_2 = α - 1, B_1(x) = x, B_2(x) = log x, and
    D(φ) = α log β - log Γ(α) = (φ_2 + 1) log(-φ_1) - log Γ(φ_2 + 1).
Then
    E[X] = -∂D(φ)/∂φ_1 = -(φ_2 + 1)/φ_1 = α/β
    Var[X] = -∂²D(φ)/∂φ_1² = (φ_2 + 1)/φ_1² = α/β².
Exercise: show E[log X] = ψ_0(α) - log β, where ψ_0 is the digamma function and Γ'(α) = Γ(α)ψ_0(α).
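The exercise can be checked by simulation. A Python sketch (my own, not from the slides): ψ_0 is approximated by a finite difference of math.lgamma (scipy.special.digamma would give it directly), and E[log X] by a Monte Carlo average; note that random.gammavariate takes a scale parameter, so rate β becomes scale 1/β:

```python
import math
import random

def digamma(a, h=1e-5):
    """ψ0(a) = d/da log Γ(a), approximated by a central finite difference."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

a, b = 3.0, 2.0
rng = random.Random(0)
n = 200000
# random.gammavariate(shape, scale), so scale = 1/b for rate b.
sample_mean = sum(math.log(rng.gammavariate(a, 1 / b)) for _ in range(n)) / n
print(sample_mean)            # Monte Carlo estimate of E[log X]
print(digamma(a) - math.log(b))  # theoretical value ψ0(α) - log β
```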
Cumulant Generating Function
In a scalar canonical exponential family (k = 1),
    E_X[e^{sB(X)}] = ∫ exp{ (φ + s)B(x) + C(x) + D(φ) } dx = exp{ D(φ) - D(φ + s) }.
If M_{B(X)}(s) is the moment generating function (mgf) for B(X), then
    log M_{B(X)}(s) = D(φ) - D(φ + s).
This is the cumulant generating function (defined as the log of the mgf) for the cumulants of B(X), i.e.
    log M_{B(X)}(s) = Σ_{r=1}^∞ κ_r s^r / r!,
where κ_1 = E(B(X)) and κ_2 = Var(B(X)). Exercise: prove this.
Example 4: Binomial
We already know that if X ~ Binom(n, p) then E(X) = np. Here
    φ = log(p/(1 - p)),  p = e^φ/(1 + e^φ),
and B_1(x) = x, D(φ) = n log(1 - p) = -n log(1 + e^φ). So
    log M_X(s) = D(φ) - D(φ + s) = -n log(1 + e^φ) + n log(1 + e^{φ+s}).
Therefore
    κ_1 = ∂/∂s log M_X(s) |_{s=0} = n e^φ/(1 + e^φ) = np.
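This calculation can be confirmed by differentiating the cumulant generating function numerically. A Python sketch (my own illustration), which also checks the second cumulant κ_2 = np(1 - p):

```python
import math

n, p = 12, 0.3
phi = math.log(p / (1 - p))

def K(s):
    """Cumulant generating function D(φ) - D(φ + s), with
    D(t) = -n log(1 + e^t) for the Binomial in canonical form."""
    D = lambda t: -n * math.log(1 + math.exp(t))
    return D(phi) - D(phi + s)

h = 1e-4
kappa1 = (K(h) - K(-h)) / (2 * h)           # first cumulant = mean
kappa2 = (K(h) - 2 * K(0) + K(-h)) / h ** 2  # second cumulant = variance
print(kappa1, n * p)            # both close to 3.6
print(kappa2, n * p * (1 - p))  # both close to 2.52
```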
Example 5: Skew-logistic distribution
Consider the real-valued random variable X with pdf
    f(x; θ) = θ e^x / (1 + e^x)^{θ+1}
            = exp{ -θ log(1 + e^x) + log(e^x/(1 + e^x)) + log θ },
so φ = -θ, B_1(x) = log(1 + e^x), C(x) = log(e^x/(1 + e^x)) and D(φ) = log θ = log(-φ). Then
    log M_{B(X)}(s) = D(φ) - D(φ + s) = log(-φ) - log(-(φ + s)),
so
    E[log(1 + e^X)] = -1/(φ + s) |_{s=0} = 1/θ
    Var[log(1 + e^X)] = 1/(φ + s)² |_{s=0} = 1/θ².
These results are harder to derive directly.
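A simulation check of these two moments (my own Python sketch, not from the slides), sampling X by inverting the cdf F(x) = 1 - (1 + e^x)^{-θ}:

```python
import math
import random

theta = 2.0
rng = random.Random(42)

def sample_skew_logistic(theta):
    """Inverse-cdf sampling: F(x) = 1 - (1 + e^x)^(-θ) inverts to
    x = log(u^(-1/θ) - 1), using u in place of 1 - u by symmetry."""
    u = rng.random()
    return math.log(u ** (-1.0 / theta) - 1.0)

b = [math.log(1.0 + math.exp(sample_skew_logistic(theta))) for _ in range(100000)]
mean = sum(b) / len(b)
var = sum((v - mean) ** 2 for v in b) / len(b)
print(mean, 1 / theta)          # sample mean of B(X) vs 1/θ
print(var, 1 / theta ** 2)      # sample variance of B(X) vs 1/θ²
```

In fact B(X) = log(1 + e^X) = -(1/θ) log U here, so B(X) ~ Exp(θ), consistent with the mean 1/θ and variance 1/θ² above.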
Family preserved under transformations
A smooth invertible transformation of a rv from an exponential family is also within the exponential family. If X → Y, Y = Y(X), then
    f_Y(y; θ) = f_X(x(y); θ) |∂x/∂y|
              = exp{ Σ_{j=1}^k A_j(θ)B_j(x(y)) + C(x(y)) + D(θ) } |∂x/∂y|.
The Jacobian depends only on y, and so the natural observation B(x(y)), the natural parameter A(θ), and D(θ) do not change, while C(x) → C(x(y)) + log|∂x/∂y|.
86 Exponential family conditions
1. The range of X, χ, does not depend on θ.
2. (A_1(θ), A_2(θ), ..., A_k(θ)) have continuous second derivatives for θ ∈ Θ.
3. If dim Θ = d, then the Jacobian

J(θ) = [ ∂A_i(θ)/∂θ_j ]

has full rank d for θ ∈ Θ.
[Part of the regularity conditions which we cite later.]
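Condition 3 is easy to verify symbolically for a concrete family. As an illustration (my example, not the lecturer's): for N(μ, σ²) the natural parameters are A_1 = μ/σ² and A_2 = -1/(2σ²).

```python
import sympy as sp

# Rank check of the Jacobian [dA_i/dtheta_j] for the N(mu, sigma^2) family,
# parametrised by theta = (mu, sigma).
mu, sigma = sp.symbols('mu sigma', positive=True)
A = sp.Matrix([mu / sigma**2, -1 / (2 * sigma**2)])
J = A.jacobian([mu, sigma])
print(J.rank())   # 2: full rank d = dim Theta, so condition 3 holds
```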
90 The family is minimal if no linear combination exists such that

λ_0 + λ_1 B_1(x) + ⋯ + λ_k B_k(x) = 0

for all x ∈ χ.

The dimension of the family is defined to be d, the rank of J(θ).

A minimal exponential family is said to be curved when d < k and linear when d = k. We refer to a (k, d) curved exponential family.

Example 6 : (X_1, X_2) independent, normal, unit variance, means (θ, c/θ), c known.

log f(x; θ) = x_1 θ + c x_2 / θ - θ^2 / 2 - c^2 θ^{-2} / 2 + const

is a (2, 1) curved exponential family.
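The (2, 1) claim in Example 6 can be read off from the Jacobian of the natural parameter map A(θ) = (θ, c/θ); a quick symbolic check (my own, not from the slides):

```python
import sympy as sp

# Example 6: k = 2 natural parameters A(theta) = (theta, c/theta), but a
# one-dimensional parameter theta, so J(theta) is a 2x1 column.
theta, c = sp.symbols('theta c', positive=True)
A = sp.Matrix([theta, c / theta])
J = A.jacobian([theta])   # column (1, -c/theta^2)
print(J.rank())           # 1 = d < k = 2, hence a (2, 1) curved family
```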
94 Linear relations among A's do not generate curvature
Take a k-parameter exponential family and impose A_1 = aA_2 + b for constants a and b. Then

Σ_{i=1}^{k} A_i B_i + C + D = Σ_{i=3}^{k} A_i B_i + A_2 (aB_1 + B_2) + (C + bB_1) + D.

This gives a (k-1)-parameter EF, not a (k, k-1)-CEF.
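The regrouping above is pure algebra; a symbolic confirmation for k = 2 (an illustration of mine, not part of the slides):

```python
import sympy as sp

# Impose A_1 = a*A_2 + b and regroup: the exponent collapses onto the single
# natural parameter A_2 with natural observation a*B_1 + B_2.
a, b, A2, B1, B2, C, D = sp.symbols('a b A2 B1 B2 C D')
exponent  = (a * A2 + b) * B1 + A2 * B2 + C + D
regrouped = A2 * (a * B1 + B2) + (C + b * B1) + D
print(sp.expand(exponent - regrouped))   # 0: the two forms agree identically
```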
95 Examples not in an exponential family
1 Uniform on [0, θ], θ > 0:

f(x; θ) = 1/θ, x ∈ [0, θ], θ > 0.

2 The Cauchy distribution:

f(x; θ) = [π(1 + (x - θ)^2)]^{-1}, x ∈ R.