Foundations of Statistical Inference

Jonathan Marchini
Department of Statistics, University of Oxford
MT 2013

Course arrangements

Lectures: M.2 and Th.10, SPR1, weeks 1-8.
Classes: Weeks 3-8, Friday 12-1, 4-5, 5-6, SPR1/2. Hand in solutions by 12 noon on Wednesday, SPR1.
Class Tutors: Jonathan Marchini, Charlotte Greenan.
Notes and problem sheets will be made available.

Books:
- Garthwaite, P. H., Jolliffe, I. T. and Jones, B. (2002) Statistical Inference, Oxford Science Publications.
- Leonard, T. and Hsu, J. S. (2005) Bayesian Methods, Cambridge University Press.
- Cox, D. R. (2006) Principles of Statistical Inference, Cambridge University Press.

This course builds on notes from Bob Griffiths and Geoff Nicholls.

Part A Statistics

The majority of the statistics that you have learned up to now falls under the philosophy of classical (or Frequentist) statistics. This theory makes the assumption that we can randomly take repeated samples of data from the same population. You learned about three types of statistical inference:

- Point estimation (maximum likelihood, bias, consistency, efficiency, information)
- Interval estimation (exact and approximate intervals using the CLT)
- Hypothesis testing (Neyman-Pearson lemma, uniformly most powerful tests, generalised likelihood ratio tests)

You also had an introduction to Bayesian Statistics:

- Posterior inference (Posterior $\propto$ Likelihood $\times$ Prior)
- Interval estimation (credible intervals, HPD intervals)
- Priors (conjugate priors, improper priors, Jeffreys prior)
- Hypothesis testing (marginal likelihoods, Bayes Factors)

Frequentist inference

In BS2a we develop the theory of point estimation further.

- Are there families of distributions about which we can make general statements? Exponential families.
- How can we summarise all the information in a dataset about a parameter $\theta$? Sufficiency and the Factorization Theorem.
- What are the limits of how well we can estimate a parameter $\theta$? The Cramér-Rao inequality (and bound).
- How can we find good estimators of a parameter $\theta$? The Rao-Blackwell Theorem and Lehmann-Scheffé Theorem.

Bayesian inference

Parameters are treated as random variables. Inference starts by specifying a prior distribution on $\theta$ based on prior beliefs. Having collected some data, we use Bayes' Theorem to update our beliefs and obtain a posterior distribution.

Quick Example
Suppose I give you a coin and tell you that it is a bit biased. We might use a Beta(4,4) distribution to represent our beliefs about $\theta$, the probability of heads. If we observe 30 heads and 10 tails, we can use probability theory to infer a posterior distribution for $\theta$ of Beta(34, 14). (Figure: the Beta(4,4) prior density and the Beta(34,14) posterior density.)
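As a quick sanity check of this conjugate update, here is a minimal sketch using scipy (the code and variable names are illustrative, not part of the course notes):

```python
from scipy import stats

# Prior beliefs about theta (the probability of heads): Beta(4, 4).
a, b = 4, 4
heads, tails = 30, 10

# Beta-Binomial conjugacy: posterior is Beta(a + heads, b + tails) = Beta(34, 14).
posterior = stats.beta(a + heads, b + tails)
print(posterior.mean())           # posterior mean 34/48 ~ 0.708
print(posterior.interval(0.95))   # 95% equal-tailed credible interval
```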

Computational techniques for Bayesian inference

It is not always possible to obtain an analytic solution when doing Bayesian inference, so we study approximate computational techniques in this course. These include:

- Markov chain Monte Carlo (MCMC) methods
- Approximations to marginal likelihoods (NEW): variational approximations, Laplace approximations, the Bayesian Information Criterion (BIC)
- The EM algorithm (NEW): useful in Frequentist and Bayesian inference for missing data problems

Decision theory

Quick Example
You have been exposed to a deadly virus. About 1/3 of people who are exposed to the virus are infected by it, and all those infected by it die unless they receive a vaccine. By the time any symptoms of the virus show up, it is too late for the vaccine to work. You are offered a vaccine for £500. Do you take it or not?

The most likely scenario is that you don't have the virus, but basing a decision on this ignores the costs (or losses) associated with the decisions. We would put a very high loss on dying!

Decision theory allows us to formalise this problem. It is closely connected to Bayesian ideas, and applicable to point estimation and hypothesis testing. Inference is based on specifying a number of possible actions we might take and a loss associated with each action. The theory involves determining a rule to make decisions using an overall measure of loss. An expected-loss comparison for this example is sketched below.
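A minimal sketch of that comparison, where the loss assigned to dying is an assumed number of my own (the notes only say it should be very high):

```python
# Assumed losses (hypothetical numbers): dying is vastly worse than paying £500.
p_infected = 1 / 3
loss_vaccine = 500          # take the vaccine: pay 500, survive either way
loss_of_dying = 5_000_000   # assumed loss attached to death

# Expected loss of declining: you die if and only if you were infected.
loss_decline = p_infected * loss_of_dying

print(loss_vaccine, loss_decline)  # 500 vs ~1,666,667: taking the vaccine minimises expected loss
```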

Notation

$X, Y, Z$: capital letters for random variables.
$x, y, z$: lower case letters for realisations of random variables.
$E_X(\cdot)$: expectation with respect to the random variable $X$.
$\boldsymbol{\phi} = \{\phi_1, \ldots, \phi_k\}$: sometimes we will use bold symbols to denote a vector of parameters.

Lecture 1 - Exponential families

Parametric families

$f(x; \theta)$, $\theta \in \Theta$: the probability density of a random variable (rv), which could be discrete or continuous. $\theta$ can be 1-dimensional or of higher dimension. Equivalent notation: $f_\theta(x)$, $f(x \mid \theta)$, $f(x, \theta)$.

Likelihood $L(\theta; x) = f(x; \theta)$ and log-likelihood $l(\theta; x) = \log(L)$.

Examples
1. Normal $N(\theta, 1)$: $f(x; \theta) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}(x-\theta)^2}$, $x \in \mathbb{R}$, $\theta \in \mathbb{R}$.
2. Poisson: $f(x; \theta) = \frac{\theta^x}{x!} e^{-\theta}$, $x = 0, 1, 2, \ldots$, $\theta > 0$.
3. Regression: $f(y; \theta) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{1}{2\sigma^2}\left(y_i - \sum_{j=1}^p x_{ij}\beta_j\right)^2}$, $y \in \mathbb{R}^n$, $\sigma > 0$, $\beta \in \mathbb{R}^p$, $\theta = \{\beta, \sigma\}$.

Exponential families of distributions

Definition 1 (GJJ 2.6, DRC 2.3)
A rv $X$ belongs to a $k$-parameter exponential family if its probability density function (pdf) can be written as
$$f(x; \theta) = \exp\Big\{\sum_{j=1}^k A_j(\theta) B_j(x) + C(x) + D(\theta)\Big\},$$
where $x \in \chi$, $\theta \in \Theta$; $A_1(\theta), \ldots, A_k(\theta), D(\theta)$ are functions of $\theta$ alone and $B_1(x), B_2(x), \ldots, B_k(x), C(x)$ are well-behaved functions of $x$ alone.

Exponential families are widely used in practice, for example in generalised linear models (see BS1a).

Example 1: Poisson

We want to put the Poisson distribution in the form (with $k = 1$)
$$f(x; \theta) = \exp\{A(\theta)B(x) + C(x) + D(\theta)\}.$$
We have
$$e^{-\theta}\theta^x / x! = e^{-\theta + x\log\theta - \log x!} = \exp\{(\log\theta)x - \log x! - \theta\}.$$
So $A(\theta) = \log\theta$, $B(x) = x$, $C(x) = -\log x!$, $D(\theta) = -\theta$.
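A quick numerical check (a sketch; the function name is my own) that this decomposition reproduces the Poisson pmf:

```python
import math
from scipy import stats

def poisson_ef_pmf(x, theta):
    """Poisson pmf assembled from its exponential-family pieces."""
    A = math.log(theta)       # A(theta) = log(theta)
    B = x                     # B(x) = x
    C = -math.lgamma(x + 1)   # C(x) = -log(x!)
    D = -theta                # D(theta) = -theta
    return math.exp(A * B + C + D)

x, theta = 3, 2.5
print(poisson_ef_pmf(x, theta))      # ~0.2138
print(stats.poisson.pmf(x, theta))   # agrees
```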

Examples of 1-parameter Exponential families

Binomial, Poisson, Normal, Exponential:

| Distn | $f(x; \theta)$ | $A(\theta)$ | $B(x)$ | $C(x)$ | $D(\theta)$ |
|---|---|---|---|---|---|
| $\text{Bin}(n, p)$ | $\binom{n}{x} p^x (1-p)^{n-x}$ | $\log\frac{p}{1-p}$ | $x$ | $\log\binom{n}{x}$ | $n\log(1-p)$ |
| $\text{Pois}(\theta)$ | $e^{-\theta}\theta^x / x!$ | $\log\theta$ | $x$ | $-\log(x!)$ | $-\theta$ |
| $N(\mu, 1)$ | $\frac{1}{\sqrt{2\pi}}\exp\{-\frac{(x-\mu)^2}{2}\}$ | $\mu$ | $x$ | $-x^2/2$ | $-\frac{1}{2}(\mu^2 + \log(2\pi))$ |
| $\text{Exp}(\theta)$ | $\theta e^{-\theta x}$ | $-\theta$ | $x$ | $0$ | $\log\theta$ |

Others: negative binomial, Pareto (with known minimum), Weibull (with known shape), Laplace (with known mean), log-normal, inverse Gaussian, beta, Dirichlet, Wishart. Exercise: check these distributions.
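Any row of the table can be checked the same way; a sketch for the $\text{Exp}(\theta)$ row (note scipy parameterises the exponential by scale $= 1/\theta$):

```python
import math
from scipy import stats

theta, x = 1.5, 0.7

# Exp(theta) row: A(theta) = -theta, B(x) = x, C(x) = 0, D(theta) = log(theta).
ef_value = math.exp(-theta * x + 0.0 + math.log(theta))

print(ef_value)                           # ~0.5249
print(stats.expon.pdf(x, scale=1/theta))  # agrees
```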

Example 2: a 2-parameter family (Gamma)

If $X \sim \text{Gamma}(\alpha, \beta)$ then let $\theta = (\alpha, \beta)$, so
$$f(x; \theta) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} = \exp\{\alpha\log\beta + (\alpha - 1)\log x - \beta x - \log\Gamma(\alpha)\} = \exp\big\{(\alpha - 1)\log x - \beta x - \log\big[\Gamma(\alpha)\beta^{-\alpha}\big]\big\}.$$
And we have $A_1(\theta) = \alpha - 1$, $B_1(x) = \log x$, $A_2(\theta) = -\beta$, $B_2(x) = x$.

Some other 2-parameter Exponential families

| Distribution | $f(x; \theta)$ | $A(\theta)$ | $B(x)$ | $C(x)$ | $D(\theta)$ |
|---|---|---|---|---|---|
| $N(\mu, \sigma^2)$ | $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $A_1(\theta) = -\frac{1}{2\sigma^2}$, $A_2(\theta) = \frac{\mu}{\sigma^2}$ | $B_1(x) = x^2$, $B_2(x) = x$ | $0$ | $-\frac{1}{2}\big(\log(2\pi\sigma^2) + \mu^2/\sigma^2\big)$ |
| Gamma | $\frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$ | $A_1(\theta) = \alpha - 1$, $A_2(\theta) = -\beta$ | $B_1(x) = \log x$, $B_2(x) = x$ | $0$ | $-\log\big[\Gamma(\alpha)\beta^{-\alpha}\big]$ |

Exponential family canonical form

Let $\phi_j = A_j(\theta)$, $j = 1, \ldots, k$:
$$f(x; \phi) = \exp\Big\{\sum_{j=1}^k \phi_j B_j(x) + C(x) + D(\phi)\Big\}.$$
$\phi_j$, $j = 1, \ldots, k$, are the canonical parameters, and $B_j$, $j = 1, \ldots, k$, are the canonical observations. These are sometimes called the natural parameters and observations.

Since $\int_{\mathbb{R}} f(x; \phi)\,dx = 1$, we have
$$\int_{\mathbb{R}} \exp\Big\{\sum_{j=1}^k \phi_j B_j(x) + C(x) + D(\phi)\Big\}\,dx = 1$$
$$\exp\{D(\phi)\} \int_{\mathbb{R}} \exp\Big\{\sum_{j=1}^k \phi_j B_j(x) + C(x)\Big\}\,dx = 1$$
$$\int_{\mathbb{R}} \exp\Big\{\sum_{j=1}^k \phi_j B_j(x) + C(x)\Big\}\,dx = \exp\{-D(\phi)\}.$$
Differentiate with respect to $\phi_i$:
$$\int_{\mathbb{R}} B_i(x) \exp\Big\{\sum_{j=1}^k \phi_j B_j(x) + C(x)\Big\}\,dx = -\frac{\partial D(\phi)}{\partial \phi_i}\,\exp\{-D(\phi)\}$$
$$\int_{\mathbb{R}} B_i(x) \exp\Big\{\sum_{j=1}^k \phi_j B_j(x) + C(x) + D(\phi)\Big\}\,dx = -\frac{\partial D(\phi)}{\partial \phi_i}$$
$$E[B_i(X)] = -\frac{\partial D(\phi)}{\partial \phi_i}$$
Exercise: $\operatorname{Cov}[B_i(X), B_j(X)] = -\dfrac{\partial^2 D(\phi)}{\partial \phi_i\,\partial \phi_j}$.
Exercise: $\operatorname{Var}[B_i(X)] = -\dfrac{\partial^2 D(\phi)}{\partial \phi_i^2}$.
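A numerical sanity check of $E[B_i(X)] = -\partial D(\phi)/\partial\phi_i$ for the $\text{Exp}(\theta)$ family (a sketch using finite differences and Monte Carlo in place of the analytic derivative):

```python
import numpy as np

# Exp(theta) in canonical form: phi = -theta, B(x) = x, D(phi) = log(-phi).
theta = 2.0
phi = -theta

def D(p):
    return np.log(-p)  # D(phi) = log(-phi), defined for phi < 0

# -dD/dphi by central finite differences; should equal E[B(X)] = E[X] = 1/theta.
h = 1e-6
print(-(D(phi + h) - D(phi - h)) / (2 * h))  # ~0.5

# Monte Carlo estimate of the same expectation (numpy uses scale = 1/theta).
rng = np.random.default_rng(0)
print(rng.exponential(scale=1/theta, size=10**6).mean())  # ~0.5
```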

Example 3: Gamma

We already know that if $X \sim \text{Gamma}(\alpha, \beta)$ then $E(X) = \frac{\alpha}{\beta}$ and $\operatorname{Var}(X) = \frac{\alpha}{\beta^2}$.
$$\frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} = \exp\{-\beta x + (\alpha - 1)\log x + \alpha\log\beta - \log\Gamma(\alpha)\}$$
$$\phi_1 = -\beta, \quad \phi_2 = \alpha - 1, \quad B_1(x) = x, \quad B_2(x) = \log x$$
$$D(\phi) = \alpha\log\beta - \log\Gamma(\alpha) = (\phi_2 + 1)\log(-\phi_1) - \log\Gamma(\phi_2 + 1)$$
$$E[X] = -\frac{\partial}{\partial\phi_1} D(\phi) = -\frac{\phi_2 + 1}{\phi_1} = \frac{\alpha}{\beta}$$
$$\operatorname{Var}[X] = -\frac{\partial^2}{\partial\phi_1^2} D(\phi) = \frac{\phi_2 + 1}{\phi_1^2} = \frac{\alpha}{\beta^2}$$
Exercise: show $E[\log X] = \psi_0(\alpha) - \log(\beta)$, where $\psi_0$ is the digamma function and $\Gamma'(\alpha) = \Gamma(\alpha)\psi_0(\alpha)$.
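The exercise can be checked by simulation; a sketch comparing a Monte Carlo estimate of $E[\log X]$ against scipy's digamma function:

```python
import numpy as np
from scipy.special import digamma

alpha, beta = 3.0, 2.0
rng = np.random.default_rng(1)

# numpy's gamma sampler takes shape alpha and scale = 1/beta.
x = rng.gamma(shape=alpha, scale=1/beta, size=10**6)

print(np.log(x).mean())               # Monte Carlo estimate of E[log X]
print(digamma(alpha) - np.log(beta))  # psi_0(alpha) - log(beta) ~ 0.2296
```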

Cumulant Generating Function

In a scalar canonical exponential family ($k = 1$),
$$E_X[e^{sB(X)}] = \int_{\mathbb{R}} \exp\{(\phi + s)B(x) + C(x) + D(\phi)\}\,dx = \exp\{D(\phi) - D(\phi + s)\}.$$
If $M_{B(X)}(s)$ is the Moment Generating Function (mgf) for $B(X)$, then
$$\log(M_{B(X)}(s)) = D(\phi) - D(\phi + s).$$
This is the cumulant generating function (defined as the log of the mgf) for the cumulants of $B(X)$, i.e.
$$\log(M_{B(X)}(s)) = \sum_{r=1}^{\infty} \kappa_r s^r / r!,$$
where $\kappa_1 = E(B(X))$ and $\kappa_2 = \operatorname{Var}(B(X))$. Exercise: prove this.

Example 4: Binomial

We already know that if $X \sim \text{Binom}(n, p)$ then $E(X) = np$.
$$\phi = \log\frac{p}{1-p} \quad \Longleftrightarrow \quad p = \frac{e^\phi}{1 + e^\phi}$$
and $B_1(x) = x$, $D(\phi) = n\log(1-p) = -n\log(1 + e^\phi)$.
$$\log(M_X(s)) = D(\phi) - D(\phi + s) = -n\log(1 + e^\phi) + n\log(1 + e^{\phi+s})$$
Therefore
$$\kappa_1 = \frac{\partial}{\partial s}\log(M_X(s))\Big|_{s=0} = n\,\frac{e^\phi}{1 + e^\phi} = np.$$
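The same answer drops out numerically; a sketch differentiating the CGF at $s = 0$ by finite differences (my own illustration, not from the notes):

```python
import numpy as np

n, p = 10, 0.3
phi = np.log(p / (1 - p))

def K(s):
    # K(s) = D(phi) - D(phi + s) with D(phi) = -n * log(1 + e^phi)
    return -n * np.log1p(np.exp(phi)) + n * np.log1p(np.exp(phi + s))

h = 1e-4
kappa1 = (K(h) - K(-h)) / (2 * h)           # first cumulant = mean
kappa2 = (K(h) - 2 * K(0) + K(-h)) / h**2   # second cumulant = variance
print(kappa1, n * p)            # ~3.0
print(kappa2, n * p * (1 - p))  # ~2.1
```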

Example 5: Skew-logistic distribution

Consider the real-valued random variable $X$ with pdf
$$f(x; \theta) = \frac{\theta e^x}{(1 + e^x)^{\theta+1}} = \exp\Big\{-\theta\log(1 + e^x) + \log\Big(\frac{e^x}{1 + e^x}\Big) + \log\theta\Big\}$$
and $\phi = -\theta$, $B_1(x) = \log(1 + e^x)$ and $D(\phi) = \log\theta = \log(-\phi)$.
$$\log(M_{B(X)}(s)) = D(\phi) - D(\phi + s) = \log(-\phi) - \log(-(\phi + s))$$
$$E(\log(1 + e^X)) = -\frac{1}{\phi + s}\Big|_{s=0} = \frac{1}{\theta}, \qquad \operatorname{Var}(\log(1 + e^X)) = \frac{1}{(\phi + s)^2}\Big|_{s=0} = \frac{1}{\theta^2}.$$
These results are harder to derive directly.
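These moments are also easy to confirm by simulation. A sketch using inverse-CDF sampling, assuming the CDF $F(x) = 1 - (1 + e^x)^{-\theta}$ implied by the density above:

```python
import numpy as np

theta = 4.0
rng = np.random.default_rng(2)
u = rng.uniform(size=10**6)

# Invert F(x) = 1 - (1 + e^x)^(-theta):  x = log((1 - u)^(-1/theta) - 1).
x = np.log((1 - u) ** (-1 / theta) - 1)

b = np.log1p(np.exp(x))         # canonical observation B(x) = log(1 + e^x)
print(b.mean(), 1 / theta)      # ~0.25
print(b.var(), 1 / theta**2)    # ~0.0625
```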

Family preserved under transformations

A smooth invertible transformation of a rv from the exponential family is also within the exponential family. If $X \to Y$, $Y = Y(X)$, then
$$f_Y(y; \theta) = f_X(x(y); \theta)\,|\partial x/\partial y| = \exp\Big\{\sum_{j=1}^k A_j(\theta)B_j(x(y)) + C(x(y)) + D(\theta)\Big\}\,|\partial x/\partial y|.$$
The Jacobian depends only on $y$, and so the natural observation $B(x(y))$, the natural parameter $A(\theta)$, and $D(\theta)$ do not change, while $C(x) \to C(x(y)) + \log|\partial x/\partial y|$.

Exponential family conditions

1. The range of $X$, $\chi$, does not depend on $\theta$.
2. $\big(A_1(\theta), A_2(\theta), \ldots, A_k(\theta)\big)$ have continuous second derivatives for $\theta \in \Theta$.
3. If $\dim\Theta = d$, then the Jacobian
$$J(\theta) = \left[\frac{\partial A_i(\theta)}{\partial \theta_j}\right]$$
has full rank $d$ for $\theta \in \Theta$.

[Part of the regularity conditions which we cite later.]

The family is minimal if no linear combination exists such that
$$\lambda_0 + \lambda_1 B_1(x) + \cdots + \lambda_k B_k(x) = 0$$
for all $x \in \chi$.

The dimension of the family is defined to be $d$, the rank of $J(\theta)$.

A minimal exponential family is said to be curved when $d < k$ and linear when $d = k$. We refer to a $(k, d)$ curved exponential family.

Example 6
$(X_1, X_2)$ independent, normal, unit variance, means $(\theta, c/\theta)$, $c$ known:
$$\log f(x; \theta) = x_1\theta + cx_2/\theta - \theta^2/2 - c^2\theta^{-2}/2 + \text{const}$$
is a $(2, 1)$ curved exponential family.

Linear relations among $A$'s do not generate curvature

Take a $k$-parameter exponential family and impose $A_1 = aA_2 + b$ for constants $a$ and $b$:
$$\sum_{i=1}^k A_i B_i + C + D = \sum_{i=3}^k A_i B_i + A_2(aB_1 + B_2) + (C + bB_1) + D.$$
This gives a $k-1$ parameter EF, not a $(k, k-1)$-CEF.

Examples not in an exponential family

1. Uniform on $[0, \theta]$, $\theta > 0$:
$$f(x; \theta) = \frac{1}{\theta}, \quad x \in [0, \theta],\ \theta > 0.$$
2. The Cauchy distribution:
$$f(x; \theta) = \big[\pi(1 + (x - \theta)^2)\big]^{-1}, \quad x \in \mathbb{R}.$$

Course arrangements. Foundations of Statistical Inference

Course arrangements. Foundations of Statistical Inference Course arrangements Foundations of Statistical Inference Jonathan Marchini Department of Statistics University of Oxford MT 013 Lectures M. and Th.10 SPR1 weeks 1-8 Classes Weeks 3-8 Friday 1-1, 4-5, 5-6

More information

Proof In the CR proof. and

Proof In the CR proof. and Question Under what conditions will we be able to attain the Cramér-Rao bound and find a MVUE? Lecture 4 - Consequences of the Cramér-Rao Lower Bound. Searching for a MVUE. Rao-Blackwell Theorem, Lehmann-Scheffé

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 20 Lecture 6 : Bayesian Inference

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB

More information

A Very Brief Summary of Bayesian Inference, and Examples

A Very Brief Summary of Bayesian Inference, and Examples A Very Brief Summary of Bayesian Inference, and Examples Trinity Term 009 Prof Gesine Reinert Our starting point are data x = x 1, x,, x n, which we view as realisations of random variables X 1, X,, X

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours

MATH c UNIVERSITY OF LEEDS Examination for the Module MATH2715 (January 2015) STATISTICAL METHODS. Time allowed: 2 hours MATH2750 This question paper consists of 8 printed pages, each of which is identified by the reference MATH275. All calculators must carry an approval sticker issued by the School of Mathematics. c UNIVERSITY

More information

Statistical Theory MT 2007 Problems 4: Solution sketches

Statistical Theory MT 2007 Problems 4: Solution sketches Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Chapter 5. Bayesian Statistics

Chapter 5. Bayesian Statistics Chapter 5. Bayesian Statistics Principles of Bayesian Statistics Anything unknown is given a probability distribution, representing degrees of belief [subjective probability]. Degrees of belief [subjective

More information

Statistical Theory MT 2006 Problems 4: Solution sketches

Statistical Theory MT 2006 Problems 4: Solution sketches Statistical Theory MT 006 Problems 4: Solution sketches 1. Suppose that X has a Poisson distribution with unknown mean θ. Determine the conjugate prior, and associate posterior distribution, for θ. Determine

More information

Compute f(x θ)f(θ) dθ

Compute f(x θ)f(θ) dθ Bayesian Updating: Continuous Priors 18.05 Spring 2014 b a Compute f(x θ)f(θ) dθ January 1, 2017 1 /26 Beta distribution Beta(a, b) has density (a + b 1)! f (θ) = θ a 1 (1 θ) b 1 (a 1)!(b 1)! http://mathlets.org/mathlets/beta-distribution/

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.

More information

Mathematical Statistics

Mathematical Statistics Mathematical Statistics Chapter Three. Point Estimation 3.4 Uniformly Minimum Variance Unbiased Estimator(UMVUE) Criteria for Best Estimators MSE Criterion Let F = {p(x; θ) : θ Θ} be a parametric distribution

More information

Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2

Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional

More information

Nonparametric Bayesian Methods - Lecture I

Nonparametric Bayesian Methods - Lecture I Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

Severity Models - Special Families of Distributions

Severity Models - Special Families of Distributions Severity Models - Special Families of Distributions Sections 5.3-5.4 Stat 477 - Loss Models Sections 5.3-5.4 (Stat 477) Claim Severity Models Brian Hartman - BYU 1 / 1 Introduction Introduction Given that

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

Principles of Statistics

Principles of Statistics Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 81 Paper 4, Section II 28K Let g : R R be an unknown function, twice continuously differentiable with g (x) M for

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014 Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT / 15

Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT / 15 Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 15 Contingency table analysis North Carolina State University

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

University of Chicago Graduate School of Business. Business 41901: Probability Final Exam Solutions

University of Chicago Graduate School of Business. Business 41901: Probability Final Exam Solutions Name: University of Chicago Graduate School of Business Business 490: Probability Final Exam Solutions Special Notes:. This is a closed-book exam. You may use an 8 piece of paper for the formulas.. Throughout

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

1 Introduction. P (n = 1 red ball drawn) =

1 Introduction. P (n = 1 red ball drawn) = Introduction Exercises and outline solutions. Y has a pack of 4 cards (Ace and Queen of clubs, Ace and Queen of Hearts) from which he deals a random of selection 2 to player X. What is the probability

More information

Statistics 3657 : Moment Generating Functions

Statistics 3657 : Moment Generating Functions Statistics 3657 : Moment Generating Functions A useful tool for studying sums of independent random variables is generating functions. course we consider moment generating functions. In this Definition

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Classical and Bayesian inference

Classical and Bayesian inference Classical and Bayesian inference AMS 132 January 18, 2018 Claudia Wehrhahn (UCSC) Classical and Bayesian inference January 18, 2018 1 / 9 Sampling from a Bernoulli Distribution Theorem (Beta-Bernoulli

More information

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018

CLASS NOTES Models, Algorithms and Data: Introduction to computing 2018 CLASS NOTES Models, Algorithms and Data: Introduction to computing 208 Petros Koumoutsakos, Jens Honore Walther (Last update: June, 208) IMPORTANT DISCLAIMERS. REFERENCES: Much of the material (ideas,

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014

Machine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014 Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,

More information

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Theory and Methods of Statistical Inference. PART I Frequentist theory and methods

Theory and Methods of Statistical Inference. PART I Frequentist theory and methods PhD School in Statistics cycle XXVI, 2011 Theory and Methods of Statistical Inference PART I Frequentist theory and methods (A. Salvan, N. Sartori, L. Pace) Syllabus Some prerequisites: Empirical distribution

More information

Conditional distributions

Conditional distributions Conditional distributions Will Monroe July 6, 017 with materials by Mehran Sahami and Chris Piech Independence of discrete random variables Two random variables are independent if knowing the value of

More information

PhD Qualifying Examination Department of Statistics, University of Florida

PhD Qualifying Examination Department of Statistics, University of Florida PhD Qualifying xamination Department of Statistics, University of Florida January 24, 2003, 8:00 am - 12:00 noon Instructions: 1 You have exactly four hours to answer questions in this examination 2 There

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

March 10, 2017 THE EXPONENTIAL CLASS OF DISTRIBUTIONS

March 10, 2017 THE EXPONENTIAL CLASS OF DISTRIBUTIONS March 10, 2017 THE EXPONENTIAL CLASS OF DISTRIBUTIONS Abstract. We will introduce a class of distributions that will contain many of the discrete and continuous we are familiar with. This class will help

More information

Theory and Methods of Statistical Inference. PART I Frequentist likelihood methods

Theory and Methods of Statistical Inference. PART I Frequentist likelihood methods PhD School in Statistics XXV cycle, 2010 Theory and Methods of Statistical Inference PART I Frequentist likelihood methods (A. Salvan, N. Sartori, L. Pace) Syllabus Some prerequisites: Empirical distribution

More information

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory

Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.

PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation. PARAMETER ESTIMATION: BAYESIAN APPROACH. These notes summarize the lectures on Bayesian parameter estimation.. Beta Distribution We ll start by learning about the Beta distribution, since we end up using

More information

Estimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators

Topics: parametric estimation; properties of estimators; minimum variance estimators; Cramer-Rao bound; maximum likelihood estimators; confidence intervals; Bayesian estimation.

Stat 5421 Lecture Notes Proper Conjugate Priors for Exponential Families Charles J. Geyer March 28, 2016

1. Theory. This section explains the theory of conjugate priors for exponential families of distributions, …
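A sketch of the construction the title announces (standard theory; my summary, not Geyer's wording): if the data have exponential-family density

    f(x | θ) = h(x) exp( θ t(x) - A(θ) ),

then priors of the form π(θ) ∝ exp( ν θ - n_0 A(θ) ) are conjugate: after observing x, the posterior keeps the same form with ν replaced by ν + t(x) and n_0 by n_0 + 1. The prior is proper when the normalizing integral ∫ exp( ν θ - n_0 A(θ) ) dθ is finite.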

Bayesian Methods for Machine Learning

CS 584: Big Data Analytics. Material adapted from Radford Neal's tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf) and Zoubin Ghahramani (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf), …

MAS223 Statistical Inference and Modelling Exercises

The exercises are grouped into sections, corresponding to chapters of the lecture notes. Within each section, exercises are divided into warm-up questions, …

1. Fisher Information

Let f(x | θ) be a density function with the property that log f(x | θ) is differentiable in θ throughout the open p-dimensional parameter set Θ ⊆ R^p; then the score statistic (or score …
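The definition the preview breaks off at can be completed in the standard way (added here for reference; not quoted from the source): the score is

    U(θ) = ∇_θ log f(X | θ),

and the Fisher information matrix is

    I(θ) = E_θ[ U(θ) U(θ)^T ] = -E_θ[ ∇²_θ log f(X | θ) ],

where the second equality holds under the usual regularity conditions (differentiation under the integral sign).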

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

To compare θ̂ and θ̃, two estimators of θ: say θ̂ is better than θ̃ if it …
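The comparison the snippet starts is conventionally made via mean squared error (a standard criterion, added here; the linked notes may phrase it differently):

    MSE_θ(θ̂) = E_θ[(θ̂ - θ)²] = Var_θ(θ̂) + (E_θ[θ̂] - θ)²,

and θ̂ is declared better than θ̃ at a given θ if MSE_θ(θ̂) ≤ MSE_θ(θ̃). An estimator can win for some θ and lose for others, which is exactly the phenomenon the binomial example illustrates.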

Density Estimation. Seungjin Choi

Department of Computer Science and Engineering, Pohang University of Science and Technology, 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea. seungjin@postech.ac.kr, http://mlg.postech.ac.kr/

Distributions of Functions of Random Variables. 5.1 Functions of One Random Variable

Sections: 5.1 Functions of One Random Variable; 5.2 Transformations of Two Random Variables; 5.3 Several Random Variables; 5.4 The Moment-Generating Function Technique.

Conjugate Priors: Beta and Normal (18.05, Spring 2014)

Review: continuous priors, discrete data. Bent coin: unknown probability θ of heads. Prior: f(θ) = 2θ on [0, 1]. Data: heads on one …

Estimation of reliability parameters from experimental data (Part 2). Prof. Enrico Zio

This lecture: life test data (t_1, t_2, ..., t_n); estimate the parameter θ of f_T(t | θ), for example the rate λ of f_T(t) = λ e^(-λt). Classical (frequentist …

STATISTICS SYLLABUS UNIT I

UNIT I (Probability Theory): definition; classical and axiomatic approaches; laws of total and compound probability; conditional probability; Bayes' theorem; random variable and its distribution …

Bayesian Statistics Part III: Building Bayes Theorem Part IV: Prior Specification

Michael Anderson, PhD, and Hélène Carabin, DVM, PhD; Department of Biostatistics and Epidemiology, The University of Oklahoma …

Brief Review of Probability

Maura, Department of Economics and Finance, Università Tor Vergata. Outline: 1. Distribution functions, quantiles and modes of a distribution; 2. Example; 3. Example; 4. Distributions.

Bayesian Models in Machine Learning

Lukáš Burget, Escuela de Ciencias Informáticas 2017, Buenos Aires, July 24-29, 2017. Frequentist vs. Bayesian. Frequentist point of view: probability is the frequency of …

Conjugate Priors: Beta and Normal Spring 2018

18.05, Spring 2018. Review: continuous priors, discrete data. Bent coin: unknown probability θ of heads. Prior: f(θ) = 2θ on [0, 1]. Data: heads on one toss. Question: find …
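Since the preview cuts off at the question, here is the standard completion (my calculation, using only the setup stated above):

    f(θ | heads) ∝ f(heads | θ) f(θ) = θ · 2θ = 2θ²,

and normalizing over [0, 1] gives f(θ | heads) = 3θ², a Beta(3, 1) posterior with posterior mean 3/4.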

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Estimation of Quantiles

Chapter 9, Estimation of Quantiles. The notion of quantiles was introduced in Section 3.2: recall that a quantile x_α for an r.v. X is a constant such that P(X ≤ x_α) = 1 - α (9.1). In this chapter we examine quantiles …
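As a quick numerical companion to definition (9.1), a minimal Python sketch (the exponential sample, the seed, and all names are my own invention, not from the linked chapter):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.exponential(scale=2.0, size=10_000)  # simulated lifetimes

    # With x_alpha defined by P(X <= x_alpha) = 1 - alpha,
    # the empirical (1 - alpha)-quantile of the sample estimates it.
    alpha = 0.05
    x_alpha_hat = np.quantile(sample, 1 - alpha)

    # Exact value for an Exponential with mean 2: x_alpha = -2 ln(alpha) ~ 5.99
    print(x_alpha_hat, -2.0 * np.log(alpha))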

Final Examination. STA 532: Statistical Inference. Wednesday, 2015 Apr 29, 7:00-10:00pm

This is a closed-book exam; books & phones on the floor. You may use a calculator and two pages of your own notes. Do not share calculators …

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017

Rasmus Waagepetersen, Department of Mathematics, Aalborg University, Denmark; April 10, 2017. Outline for today: a genetic example; Bayes' theorem; examples; priors; posterior summaries.

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

Let y_1, ..., y_n denote n independent observations on a response. Treat y_i as a realization of a random variable Y_i. In the general …

Overview. Probabilistic Interpretation of Linear Regression Maximum Likelihood Estimation Bayesian Estimation MAP Estimation

Probabilistic interpretation: linear regression. Assume output y is generated …
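The truncated sentence presumably continues with the usual Gaussian-noise model; the standard version (added here, not quoted from the slides) is

    y = w^T x + ε,  ε ~ N(0, σ²),

so the log-likelihood of w given pairs (x_i, y_i) is -(1/(2σ²)) Σ_i (y_i - w^T x_i)² + const. Maximizing it over w is exactly least squares, and MAP estimation under a Gaussian prior w ~ N(0, τ² I) adds a ridge penalty with weight σ²/τ².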

Continuous Random Variables

Saravanan Vijayakumaran (sarva@ee.iitb.ac.in), Department of Electrical Engineering, Indian Institute of Technology Bombay; February 27, 2013.

Final Examination. STA 215: Statistical Inference. Saturday, 2001 May 5, 9:00am 12:00 noon

Final Examination. STA 215: Statistical Inference. Saturday, 2001 May 5, 9:00am 12:00 noon Final Examination Saturday, 2001 May 5, 9:00am 12:00 noon This is an open-book examination, but you may not share materials. A normal distribution table, a PMF/PDF handout, and a blank worksheet are attached

More information

MIT Spring 2016

MIT Spring 2016 Dr. Kempthorne Spring 2016 1 Outline Building 1 Building 2 Definition Building Let X be a random variable/vector with sample space X R q and probability model P θ. The class of probability models P = {P

More information

Theory and Methods of Statistical Inference

Theory and Methods of Statistical Inference PhD School in Statistics cycle XXIX, 2014 Theory and Methods of Statistical Inference Instructors: B. Liseo, L. Pace, A. Salvan (course coordinator), N. Sartori, A. Tancredi, L. Ventura Syllabus Some prerequisites:

More information

Continuous random variables

Continuous r.v.'s take an uncountably infinite number of possible values. Examples: heights of people, weights of apples, diameters of bolts, life lengths of light-bulbs. We cannot …

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

Pattern Recognition and Machine Learning. Bishop Chapter 2: Probability Distributions

Presented by Cécile Amblard, Alex Kläser, and Jakob Verbeek, October 11, 2007. Probability distributions: general; density estimation: given a finite …

COMP90051 Statistical Machine Learning

Semester 2, 2017. Lecturer: Trevor Cohn. Topic 2: Statistical Schools (adapted from slides by Ben Rubinstein). Statistical schools of thought: the remainder of the lecture is to provide …

Bayesian Inference: Posterior Intervals

Simple values like the posterior mean E[θ | X] and posterior variance var[θ | X] can be useful in learning about θ. Quantiles of π(θ | X) (especially the posterior median) …
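Completing the thought in the standard way (added here; not quoted from the source): a 100(1 - α)% equal-tailed posterior (credible) interval for θ is [q_{α/2}, q_{1-α/2}], where q_p denotes the p-quantile of π(θ | X); by construction, P(q_{α/2} ≤ θ ≤ q_{1-α/2} | X) = 1 - α.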

Primer on statistics:

MLE, Confidence Intervals, and Hypothesis Testing. ryan.reece@gmail.com, http://rreece.github.io/. Insight Data Science - AI Fellows Workshop, Feb 16, 2018. Outline: 1. Maximum likelihood …

Mathematical Statistics: Homework problems

… Γ(x) = ∫₀^∞ t^(x-1) e^(-t) dt, and simplify the answer when possible (for example, when r is a positive even number); in particular, confirm that E X⁴ = 3.

General guideline: while working outside the classroom, use any help you want, including people, computer algebra systems, the Internet, and solution manuals, but make …
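A worked check of the E X⁴ = 3 claim for X ~ N(0, 1) (my derivation; the route intended by the homework may differ): substituting t = x²/2 gives

    E X^(2k) = (2/√(2π)) ∫₀^∞ x^(2k) e^(-x²/2) dx = 2^k Γ(k + 1/2) / √π,

and for k = 2 this is 4 Γ(5/2)/√π = 4 · (3/2)(1/2)√π / √π = 3.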

STAT Chapter 5 Continuous Distributions

STAT Chapter 5 Continuous Distributions STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

Statistics for scientists and engineers

February 2006. Contents: Introduction; Motivation (why study statistics?); Examples; …

LECTURE 5 NOTES. For t | p ~ Binomial(n, p) and p ~ Beta(a, b), the joint density of (t, p) is

    (n choose t) · Γ(a + b)/(Γ(a) Γ(b)) · p^(t+a-1) (1 - p)^(n-t+b-1),

and the marginal density of t is

    (n choose t) · Γ(a + b)/(Γ(a) Γ(b)) · Γ(t + a) Γ(n - t + b) / Γ(n + a + b).

1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ ∈ Θ is considered a fixed quantity. In the Bayesian approach, it is considered …
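Dividing the joint density by the marginal above gives the standard conjugacy result (added here, consistent with the formulas in the preview):

    p | t ~ Beta(t + a, n - t + b),

whose mean (t + a)/(n + a + b) is the usual Bayesian point estimate of p under squared-error loss.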

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Based on lectures by R. Weber; notes taken by Dexter Chua, Lent 2015. These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

Continuous Random Variables and Continuous Distributions

Expectation & Variance of Continuous Random Variables (§5.2); The Uniform Random Variable …