Estimators
John Dodson

Outline: Motivation, Contest, Sample, Estimator, Loss, Standard Error, Prior, Pseudo-Data, Bayesian Estimator


Practitioner Course: Portfolio Optimization, September 24, 2008


The Goal of Estimation

The goal of estimation is to assign numerical values to the parameters of a probability model.

Considerations: there are several risks to consider. What if the model is misspecified? What if the data are corrupt? These questions are addressed under the subject of robust statistics, which Meucci covers well but which we will not cover in this module.

Contest!

To motivate the discussion, I am going to pose a challenge.

Challenge: there is a sample of U(0, θ), for some parameter θ unknown to you, at fm503/docs/case1.dat. Provide an interval estimate for θ. The team whose interval includes the true θ and is the narrowest wins.

Hint: you can load the sample into MATLAB with the command sample=sscanf(urlread(<URL>),'%f');

In classical statistics, the term sample has two related meanings: (1) an (unordered) set of N values drawn from the sample space of some random variable X, and (2) a random variable consisting of N (independent) copies of some random variable X. You can think of the former as a realization of the latter. We can characterize the latter version of a sample, which we will denote hereafter by $Y_N = (X_1, \ldots, X_N)$, as a random variable with

$$f_{Y_N}(Y) = f_X(X_1) \cdots f_X(X_N)$$

because we have assumed that the draws are independent.
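To make the product form concrete, here is a minimal Python sketch, assuming X is normal; `normal_pdf` and `sample_log_density` are hypothetical helper names, not part of the course material. It checks that the log of the product of marginal densities equals the sum of the marginal log densities, which is the form one normally computes with, since a product of many small densities can underflow.

```python
import math
import random


def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def sample_log_density(ys, mu, sigma2):
    """Log density of an i.i.d. sample: the sum of the marginal log densities."""
    return sum(math.log(normal_pdf(x, mu, sigma2)) for x in ys)


random.seed(0)
mu, sigma2, N = 1.0, 4.0, 10
ys = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(N)]

# Product form f_X(X_1) * ... * f_X(X_N), evaluated directly
product = math.prod(normal_pdf(x, mu, sigma2) for x in ys)
print(math.isclose(math.log(product), sample_log_density(ys, mu, sigma2)))  # True
```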

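The characterization of the sample $Y_N$ can often be expressed as the characterization of a collection of partial results, $T_N = T(X_1, \ldots, X_N)$, called sufficient statistics.

Important Example: say $X \sim \mathrm{N}(\mu, \sigma^2)$ and we have a sample $Y_N = (X_1, \ldots, X_N)$. The density function of the sample is

$$f_{Y_N}(Y) = (2\pi\sigma^2)^{-N/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (X_i - \mu)^2 \right)$$

The form of this suggests $T_N = \left( \sum X_i, \sum X_i^2 \right) = (T_1, T_2)$, which yields

$$f_{T_N}(T) = \frac{(N T_2 - T_1^2)^{(N-3)/2}}{N^{N/2-1}\, 2^{N/2} \sqrt{\pi}\, \Gamma\left(\frac{N-1}{2}\right)} \exp\left( -\frac{N}{2} \left[ \frac{1}{\sigma^2} \left( \frac{T_2}{N} - 2\frac{T_1}{N}\mu + \mu^2 \right) + \log \sigma^2 \right] \right)$$

for $N T_2 > T_1^2$.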
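A small Python check of what makes $(T_1, T_2)$ sufficient here: the normal log-likelihood computed observation by observation equals the one computed from $t_1$, $t_2$ and $N$ alone, using $\sum_i (x_i - \mu)^2 = t_2 - 2\mu t_1 + N\mu^2$. This is a sketch with hypothetical function names and simulated data; it checks the factorization, not the $f_{T_N}$ formula itself.

```python
import math
import random


def log_likelihood_direct(xs, mu, sigma2):
    """Normal log density of the sample, summing over every observation."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


def log_likelihood_from_stats(t1, t2, n, mu, sigma2):
    """Same log density, using only the sufficient statistics (t1, t2) and n."""
    ss = t2 - 2 * mu * t1 + n * mu ** 2   # equals sum of (x_i - mu)^2
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)


random.seed(1)
xs = [random.gauss(0.5, 2.0) for _ in range(25)]
t1 = sum(xs)                  # T_1 = sum of X_i
t2 = sum(x * x for x in xs)   # T_2 = sum of X_i^2

for mu, sigma2 in [(0.0, 1.0), (0.5, 4.0), (-1.0, 9.0)]:
    a = log_likelihood_direct(xs, mu, sigma2)
    b = log_likelihood_from_stats(t1, t2, len(xs), mu, sigma2)
    print(mu, sigma2, round(a, 6), round(b, 6))
```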

Classical estimator: an estimator is a function of a sample. If the sample is considered to be random, the value of an estimator is a random variable subject to characterization. If the estimator is applied to an actual sample, consisting of N draws from the sample space, the value of the estimator is called an estimate.

Parameter: we will be mostly interested in estimating the parameters of a characterization, which we will denote generically by θ. For a univariate normal, for example, $\theta = (\mu, \sigma^2)$. We will denote the parameter estimator by $\hat\theta(Y_N)$, where $Y_N = (X_1, \ldots, X_N)$ is the sample represented by N independent copies of the random variable X, with a characterization parameterized by θ.

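Since $\hat\theta(Y_N)$ is a random variable, it is natural to explore its location and dispersion. In particular, we are interested in how far it can diverge from the (unknown) true value θ. So we introduce a norm with respect to some positive definite metric Q, such that $\|v\|^2 = v^\top Q v$ for any v in the sample space of θ.

Loss is the random variable $\|\hat\theta - \theta\|^2$.
Bias is the (unknown) value $\|\mathrm{E}\hat\theta - \theta\|$.
Inefficiency is the value $\sqrt{\mathrm{E}\|\hat\theta - \mathrm{E}\hat\theta\|^2}$.

There is usually a trade-off between bias and inefficiency. In fact, since the cross term $\mathrm{E}\left[(\hat\theta - \mathrm{E}\hat\theta)^\top Q (\mathrm{E}\hat\theta - \theta)\right]$ vanishes,

$$\mathrm{E}\,\mathrm{Loss} = \mathrm{Bias}^2 + \mathrm{Inef}^2$$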
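Because the estimator is a random variable, its bias and inefficiency can be approximated by simulation. The Python sketch below is a hypothetical illustration in the scalar case Q = 1: it draws many normal samples, applies the maximum likelihood estimator of σ² (which divides by N), and confirms that the Monte Carlo average loss splits exactly into Bias² + Inef².

```python
import random
import statistics


def mle_variance(xs):
    """Maximum likelihood estimate of sigma^2 for a normal sample (divides by N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


random.seed(2)
mu, sigma2, N, reps = 0.0, 4.0, 5, 20_000
theta = sigma2  # true parameter; scalar case, so the metric Q is just 1

# Many realizations of the estimator, which is itself a random variable
estimates = [mle_variance([random.gauss(mu, sigma2 ** 0.5) for _ in range(N)])
             for _ in range(reps)]

mean_est = statistics.fmean(estimates)
expected_loss = statistics.fmean((e - theta) ** 2 for e in estimates)
bias = abs(mean_est - theta)
inefficiency = statistics.fmean((e - mean_est) ** 2 for e in estimates) ** 0.5

print(f"E[Loss]         ~ {expected_loss:.4f}")
print(f"Bias^2 + Inef^2 ~ {bias ** 2 + inefficiency ** 2:.4f}")
print(f"Bias            ~ {bias:.3f}")
```

With N = 5 and σ² = 4, the simulated bias should come out near σ²/N = 0.8.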

Maximum Likelihood Estimation (MLE)

Since we have the distribution of the sample, perhaps in terms of sufficient statistics, it is natural to define an estimator for the parameters as the value of the parameters under which the observed sample is most likely.¹ That is,

$$\hat\theta(y) = \arg\max_\theta f_{Y_N|\theta}(y) = \arg\max_\theta f_{T_N|\theta}(t)$$

where the sample is $y = (x_1, \ldots, x_N)$ or $t = T(x_1, \ldots, x_N)$.

Important Example: consider the univariate normal from above. In terms of the sufficient statistics, the MLE (based on the density of $T_N$ given above) is

$$(\hat\mu, \hat\sigma^2) = \arg\min_{(\mu, \sigma^2)} \frac{1}{\sigma^2} \left( \frac{t_2}{N} - 2\frac{t_1}{N}\mu + \mu^2 \right) + \log \sigma^2$$

¹ This does not guarantee that the observation equals the mode.
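The arg min above can be checked numerically. This Python sketch, assuming simulated normal data and a hypothetical function name `mle_objective`, evaluates the objective on a coarse grid of (µ, σ²) values and compares the grid minimizer with the closed-form solution $\hat\mu = t_1/N$, $\hat\sigma^2 = t_2/N - (t_1/N)^2$. A brute-force grid is used only for transparency; in practice one would use the closed form or a proper optimizer.

```python
import math
import random


def mle_objective(mu, sigma2, t1, t2, n):
    """Objective: (1/sigma2)(t2/n - 2 (t1/n) mu + mu^2) + log sigma2."""
    return (t2 / n - 2 * (t1 / n) * mu + mu ** 2) / sigma2 + math.log(sigma2)


random.seed(3)
xs = [random.gauss(1.0, 1.5) for _ in range(50)]
n, t1, t2 = len(xs), sum(xs), sum(x * x for x in xs)

# Coarse grid: mu in [-1, 3] and sigma2 in [0.1, 6], both with step 0.01
mus = [i / 100 for i in range(-100, 301)]
sigma2s = [j / 100 for j in range(10, 601)]
best = min(((m, s) for m in mus for s in sigma2s),
           key=lambda p: mle_objective(p[0], p[1], t1, t2, n))

mu_closed = t1 / n
sigma2_closed = t2 / n - mu_closed ** 2
print("grid minimizer:", best)
print("closed form:   ", (round(mu_closed, 4), round(sigma2_closed, 4)))
```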

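Important Example (continued): the solution to this, i.e. the MLE for a univariate normal, is

$$\hat\mu = \frac{t_1}{N} = \bar{x}, \qquad \hat\sigma^2 = \frac{t_2}{N} - \left( \frac{t_1}{N} \right)^2 = \overline{x^2} - \bar{x}^2$$

where the bar denotes the sample average. This result extends to the multivariate case $X \in \mathbb{R}^M$, whereby the data x has M rows and N columns, and

$$\hat\mu = \frac{1}{N} x \mathbf{1}, \qquad \hat\Sigma = \frac{1}{N} x x^\top - \hat\mu \hat\mu^\top$$

Bias: we can see that the MLE is (slightly) biased:

$$\mathrm{E}\,\hat\mu = \mu, \qquad \mathrm{E}\,\hat\sigma^2 = \frac{N-1}{N} \sigma^2$$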
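A Python simulation of the bias result, assuming µ = 2, σ² = 4 and N = 5 purely for illustration; the helper `normal_mle` is a hypothetical name. Averaging the estimates over many simulated samples should give roughly µ and (N − 1)/N · σ² = 3.2.

```python
import random
import statistics


def normal_mle(xs):
    """Closed-form MLE for a univariate normal: (t1/N, t2/N - (t1/N)^2)."""
    n = len(xs)
    t1 = sum(xs)
    t2 = sum(x * x for x in xs)
    mu_hat = t1 / n
    return mu_hat, t2 / n - mu_hat ** 2


random.seed(4)
mu, sigma2, N, reps = 2.0, 4.0, 5, 50_000
estimates = [normal_mle([random.gauss(mu, sigma2 ** 0.5) for _ in range(N)])
             for _ in range(reps)]

mean_mu_hat = statistics.fmean(e[0] for e in estimates)
mean_sigma2_hat = statistics.fmean(e[1] for e in estimates)
print(f"average mu_hat     = {mean_mu_hat:.3f}   (theory: {mu})")
print(f"average sigma2_hat = {mean_sigma2_hat:.3f}   (theory: {(N - 1) / N * sigma2})")
```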

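Fisher Information

In general we cannot evaluate the characterization of the distribution of an estimator. An application of the Central Limit Theorem gives us a useful approximation for the MLE:

$$\sqrt{N} \left( \hat\theta(Y_N) - \theta \right) \xrightarrow{d} \mathrm{N}\left( 0, I^{-1}_{X|\theta} \right) \quad \text{as } N \to \infty$$

where I is the Fisher information matrix

$$I_{X|\theta} = \operatorname{cov}\left( \frac{\partial}{\partial\theta} \log f_{X|\theta}(X) \right) = -\mathrm{E}\left( \frac{\partial^2}{\partial\theta\, \partial\theta^\top} \log f_{X|\theta}(X) \right)$$

Important Example: for the univariate normal, this evaluates to

$$I_{X|(\mu,\sigma^2)} = \begin{pmatrix} 1/\sigma^2 & 0 \\ 0 & 1/(2\sigma^4) \end{pmatrix}$$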
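The first expression for the Fisher information, the covariance of the score, suggests a direct Monte Carlo check. This Python sketch, with hypothetical helper names and σ² = 2 chosen only for illustration, computes the score of the normal log density with respect to (µ, σ²) for many draws and compares its sample covariance with the diagonal matrix above.

```python
import random


def score(x, mu, sigma2):
    """Gradient of log f_{X|theta}(x) with respect to theta = (mu, sigma2) for a normal X."""
    d_mu = (x - mu) / sigma2
    d_sigma2 = -0.5 / sigma2 + (x - mu) ** 2 / (2 * sigma2 ** 2)
    return d_mu, d_sigma2


def mean(values):
    return sum(values) / len(values)


random.seed(5)
mu, sigma2, reps = 0.0, 2.0, 100_000
draws = [random.gauss(mu, sigma2 ** 0.5) for _ in range(reps)]
s_mu, s_s2 = zip(*(score(x, mu, sigma2) for x in draws))

# Fisher information = covariance matrix of the score
m_mu, m_s2 = mean(s_mu), mean(s_s2)
I_mumu = mean([(a - m_mu) ** 2 for a in s_mu])
I_s2s2 = mean([(b - m_s2) ** 2 for b in s_s2])
I_cross = mean([(a - m_mu) * (b - m_s2) for a, b in zip(s_mu, s_s2)])

print(f"I_mu,mu ~ {I_mumu:.4f}  (theory 1/sigma^2     = {1 / sigma2:.4f})")
print(f"I_s2,s2 ~ {I_s2s2:.4f}  (theory 1/(2 sigma^4) = {1 / (2 * sigma2 ** 2):.4f})")
print(f"I_mu,s2 ~ {I_cross:.4f}  (theory 0)")
```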

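Cramér-Rao Bound

The Cramér-Rao bound gives us a limit on the resolution of a classical estimator:

$$\operatorname{cov} \hat\theta(Y_N) \succeq \frac{\partial\, \mathrm{E}\hat\theta}{\partial\theta^\top} \, \frac{I^{-1}_{X|\theta}}{N} \, \left( \frac{\partial\, \mathrm{E}\hat\theta}{\partial\theta^\top} \right)^{\!\top}$$

with equality attained if the estimator is efficient.

Standard Error: the standard deviations of the margins of the estimator are called the standard errors,

$$\operatorname{se}(\hat\theta) = \sqrt{\operatorname{diag} \operatorname{cov} \hat\theta}$$

In the case of the univariate normal example, this is, componentwise,

$$\operatorname{se} \begin{pmatrix} \hat\mu \\ \hat\sigma^2 \end{pmatrix} \geq \begin{pmatrix} \sigma / \sqrt{N} \\ \sigma^2 \dfrac{N-1}{N \sqrt{N/2}} \end{pmatrix}$$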
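The standard-error bounds are simple arithmetic. This Python sketch (hypothetical helper names) compares them with the exact standard deviations of the normal MLE, using var(µ̂) = σ²/N and var(σ̂²) = 2(N − 1)σ⁴/N², which follow from σ̂² being distributed as σ²χ²_{N−1}/N. The bound is attained for µ̂, while for σ̂² the ratio of bound to exact value is √((N − 1)/N) < 1, so the bound is approached only as N grows.

```python
import math


def crb_standard_errors(sigma2, n):
    """Cramer-Rao lower bounds on se(mu_hat) and se(sigma2_hat) for the normal MLE.

    sigma2_hat is biased with d E[sigma2_hat] / d sigma2 = (N - 1) / N,
    so that factor multiplies the bound for sigma2_hat.
    """
    se_mu = math.sqrt(sigma2 / n)
    se_sigma2 = sigma2 * (n - 1) / (n * math.sqrt(n / 2))
    return se_mu, se_sigma2


def exact_standard_errors(sigma2, n):
    """Exact standard deviations of the normal MLE.

    Uses var(mu_hat) = sigma2 / N and var(sigma2_hat) = 2 (N - 1) sigma2^2 / N^2.
    """
    return math.sqrt(sigma2 / n), sigma2 * math.sqrt(2 * (n - 1)) / n


for n in (5, 20, 100):
    bound = crb_standard_errors(4.0, n)
    exact = exact_standard_errors(4.0, n)
    print(n, [round(b, 4) for b in bound], [round(e, 4) for e in exact])
```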

In Bayesian estimation we do not endow the sample with a characterization; rather, we endow the parameters with a characterization, described by hyper-parameters. This is the prior characterization. We then update the characterization by conditioning on the observed data using Bayes' rule, which leads to the posterior characterization, from which we can build estimates:

$$f_{\theta|Y_N}(\theta) \propto f_\theta(\theta)\, f_{Y_N|\theta}(y_N)$$

The approach is inherently biased. The prior is ideally based on beliefs about the results before any data have been observed. The approach is appropriate when the statistician is also a subject matter expert (such as you).
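Bayes' rule above can be applied numerically by discretizing θ. This Python sketch assumes a hypothetical setting that is not from the course: a normal mean θ with known variance 1, a standard normal prior, and four made-up observations. The grid posterior mean lands between the prior mean 0 and the sample mean 1.1 (analytically at 4.4/5 = 0.88), which illustrates the bias the slide mentions.

```python
import math


def grid_posterior(grid, log_prior, log_lik):
    """Normalized posterior weights on a grid: f(theta|y) proportional to f(theta) f(y|theta)."""
    log_post = [log_prior(th) + log_lik(th) for th in grid]
    peak = max(log_post)                       # subtract the max to avoid underflow
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical setting: unknown normal mean theta, known variance 1, prior theta ~ N(0, 1)
ys = [1.2, 0.8, 1.5, 0.9]


def log_prior(theta):
    return -0.5 * theta ** 2                   # N(0, 1), up to an additive constant


def log_lik(theta):
    return sum(-0.5 * (y - theta) ** 2 for y in ys)   # variance 1, constants dropped


grid = [i / 1000 for i in range(-4000, 4001)]
post = grid_posterior(grid, log_prior, log_lik)
post_mean = sum(th * p for th, p in zip(grid, post))
print(f"sample mean {sum(ys) / len(ys):.3f}, posterior mean {post_mean:.3f}")
```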

In principle, the characterization of the parameter prior can be completely arbitrary.

Conjugate prior: but a judicious choice exists, which is to choose a prior from a family that is closed under updates. Such a prior is termed the conjugate prior. With a conjugate prior, updating can be expressed as an algebraic transformation of the hyper-parameters, involving the prior values, the sufficient statistics, and the sample size.

Improper prior: it can be useful to consider a prior that has no information, for example a prior whose density is uniform over an unbounded parameter space, so that it cannot be normalized to integrate to one. This is termed an improper prior.

Important Example: there is a conjugate prior for the univariate normal. It has a somewhat complicated form, but it is nonetheless very useful and worth learning. It is termed the normal-inverse-Gamma distribution, and it is defined by the mixture

$$\mu \mid \sigma^2 \sim \mathrm{N}\left( \mu_0, \frac{\sigma^2}{\lambda_0} \right), \qquad \frac{1}{\sigma^2} \sim \mathrm{G}\left( \nu_0, \frac{1}{\sigma_0^2} \right)$$

The posterior is in the same family, with updated parameters based on the sufficient statistics $t_1, t_2$:

$$\lambda_N = \lambda_0 + N, \qquad \nu_N = \nu_0 + N, \qquad \mu_N = \frac{\lambda_0 \mu_0 + t_1}{\lambda_0 + N}, \qquad \sigma_N^2 = \sigma_0^2 + \lambda_0 \mu_0^2 + t_2 - \frac{(\lambda_0 \mu_0 + t_1)^2}{\lambda_0 + N}$$

and this can be generalized to the multinormal setting.
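The update formulas translate directly into code. In this Python sketch, `nig_update` and `suff_stats` are hypothetical names and the prior values and data batches are arbitrary. Updating on two batches in turn gives the same hyper-parameters as updating once on all the data, which is what closure under updates means in practice.

```python
def nig_update(mu0, lam0, nu0, sig2_0, t1, t2, n):
    """Normal-inverse-Gamma update from the sufficient statistics.

    Returns (mu_N, lambda_N, nu_N, sigma_N^2) in the same order as the prior arguments.
    """
    lam_n = lam0 + n
    nu_n = nu0 + n
    mu_n = (lam0 * mu0 + t1) / lam_n
    sig2_n = sig2_0 + lam0 * mu0 ** 2 + t2 - (lam0 * mu0 + t1) ** 2 / lam_n
    return mu_n, lam_n, nu_n, sig2_n


def suff_stats(xs):
    """(t1, t2, N) for a list of observations."""
    return sum(xs), sum(x * x for x in xs), len(xs)


prior = (0.0, 2.0, 3.0, 1.5)        # hypothetical (mu0, lambda0, nu0, sigma0^2)
batch_a = [0.9, 1.4, 0.3, 2.2]      # arbitrary illustrative data
batch_b = [1.1, -0.4, 0.8]

all_at_once = nig_update(*prior, *suff_stats(batch_a + batch_b))
one_then_other = nig_update(*nig_update(*prior, *suff_stats(batch_a)), *suff_stats(batch_b))
print(all_at_once)
print(one_then_other)
```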

Pseudo-Data

A useful application of the conjugate prior is to imagine that the prior itself is a posterior with respect to some imaginary (random) dataset $\tilde{Y} = (\tilde{X}_1, \ldots, \tilde{X}_{\tilde{N}})$, where each $\tilde{X}_i$ is drawn independently from a known distribution representing our beliefs:

$$f_{\theta|Y_N}(\theta) \propto f_{Y_N|\theta}(y)\, f_\theta(\theta) \propto f_{Y_N|\theta}(y)\, f_{\tilde{Y}_{\tilde{N}}|\theta}(\tilde{y})\, f^{\mathrm{im}}_\theta(\theta) = f_{Y_{N+\tilde{N}}|\theta}(y, \tilde{y})\, f^{\mathrm{im}}_\theta(\theta)$$

Application: if we want to simulate from the posterior, we simply append to the actual sample a pseudo-sample of variates drawn from the characterization of $\tilde{X}$, then proceed with classical estimation (e.g. MLE).
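For the normal model the pseudo-data reading can be made exact. In this Python sketch (simulated data, hypothetical variable names), setting λ₀ = Ñ, µ₀ to the pseudo-sample mean and σ₀² to its sum of squared deviations makes the conjugate-update values µ_N and σ_N²/(λ₀ + N) coincide with the classical MLE on the combined sample; the ν bookkeeping is left out.

```python
import random

random.seed(6)
y = [random.gauss(1.0, 2.0) for _ in range(20)]        # actual sample (simulated here)
y_tilde = [random.gauss(0.0, 2.0) for _ in range(10)]  # hypothetical pseudo-sample encoding beliefs

# Classical MLE on the combined (actual + pseudo) sample
combined = y + y_tilde
n_all = len(combined)
mu_mle = sum(combined) / n_all
s2_mle = sum((x - mu_mle) ** 2 for x in combined) / n_all

# Conjugate update with hyper-parameters read off the pseudo-sample:
# lam0 = N_tilde, mu0 = pseudo-sample mean, sig2_0 = pseudo-sample sum of squared deviations
lam0 = len(y_tilde)
mu0 = sum(y_tilde) / lam0
sig2_0 = sum((x - mu0) ** 2 for x in y_tilde)
t1, t2, n = sum(y), sum(x * x for x in y), len(y)
mu_n = (lam0 * mu0 + t1) / (lam0 + n)
sig2_n = sig2_0 + lam0 * mu0 ** 2 + t2 - (lam0 * mu0 + t1) ** 2 / (lam0 + n)

print(mu_mle, mu_n)                   # agree up to rounding
print(s2_mle, sig2_n / (lam0 + n))    # agree up to rounding
```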

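Bayesian Estimator

Once we know the posterior distribution $f_{\theta|Y_N}$, we still need to provide a result. Some naïve approaches include mode and modal dispersion, $\hat\theta = \arg\max_\theta f_{\theta|Y_N}(\theta)$, and mean and covariance, $\hat\theta = \mathrm{E}(\theta \mid Y_N)$.

A more sophisticated approach is to find the minimum-measure region for a given level of confidence 1 − α. In the univariate setting, this would be the interval $(\theta_0^*, \theta_1(\theta_0^*))$, where

$$\theta_0^* = \arg\min_{\theta_0} \left( \theta_1(\theta_0) - \theta_0 \right), \qquad \theta_1(\theta_0) = Q_{\theta|Y_N}\left( F_{\theta|Y_N}(\theta_0) + 1 - \alpha \right)$$

and F and Q are the posterior distribution function and quantile function.

Loss function: the ideal approach is to find a parameter value that minimizes the expected value of some loss function customized to the subsequent application.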
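When the posterior can only be simulated, the minimum-measure interval can be approximated from draws by sliding a window over the sorted draws and keeping the narrowest one. In this Python sketch the helper `shortest_interval` is hypothetical, and an exponential distribution merely stands in for a skewed posterior; for Exp(1) the exact shortest 90% interval is (0, ln 10) ≈ (0, 2.303), noticeably narrower than the equal-tailed one.

```python
import math
import random


def shortest_interval(draws, alpha):
    """Narrowest interval containing a fraction 1 - alpha of the draws.

    Sample analogue of minimizing theta_1(theta_0) - theta_0: slide a window
    of fixed count k over the sorted draws and keep the narrowest one.
    """
    xs = sorted(draws)
    n = len(xs)
    k = math.ceil((1 - alpha) * n)            # draws the window must cover
    _, i = min((xs[j + k - 1] - xs[j], j) for j in range(n - k + 1))
    return xs[i], xs[i + k - 1]


random.seed(7)
# Stand-in for posterior draws: a skewed Exp(1) distribution, for illustration only
draws = [random.expovariate(1.0) for _ in range(20_000)]
lo, hi = shortest_interval(draws, alpha=0.10)

xs = sorted(draws)
eq_lo, eq_hi = xs[int(0.05 * len(xs))], xs[int(0.95 * len(xs))]
print(f"shortest 90% interval:     ({lo:.3f}, {hi:.3f}), width {hi - lo:.3f}")
print(f"equal-tailed 90% interval: ({eq_lo:.3f}, {eq_hi:.3f}), width {eq_hi - eq_lo:.3f}")
```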
