Computational Methods in Inverse Problems

The problem is to infer the underlying probability distribution that gives rise to the data S.


1 Basic Problem of Statistical Inference

Assume that we have a set of observations $S = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^n$. The problem is to infer the underlying probability distribution that gives rise to the data $S$: statistical modeling, statistical analysis.

2 Parametric or Non-parametric?

Parametric problem: the underlying probability density has a specified form and depends on a number of parameters. The problem is to infer those parameters.

Non-parametric problem: no analytic expression for the probability density is available; the description consists of specifying the dependency/non-dependency structure of the data, and the distribution is explored numerically.

Typical situation for a parametric model: the distribution is the probability density of a random variable $X : \Omega \to \mathbb{R}^n$. The parametric setting is well suited to inverse problems and as a model for a learning process.

3 Law of Large Numbers

General result (a statistical law of nature): assume that $X_1, X_2, \ldots$ are independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$. Then, almost certainly,
$$\lim_{n \to \infty} \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right) = \mu.$$
"Almost certainly" means that, with probability one,
$$\lim_{n \to \infty} \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \mu,$$
$x_j$ being a realization of $X_j$.

4 Example

Sample $S = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^2$. Parametric model: the $x_j$ are realizations of $X \sim \mathcal{N}(x_0, \Gamma)$, with unknown mean $x_0 \in \mathbb{R}^2$ and covariance matrix $\Gamma \in \mathbb{R}^{2 \times 2}$. Probability density of $X$:
$$\pi(x \mid x_0, \Gamma) = \frac{1}{2\pi \det(\Gamma)^{1/2}} \exp\left(-\frac{1}{2}(x - x_0)^T \Gamma^{-1} (x - x_0)\right).$$
Problem: estimate the parameters $x_0$ and $\Gamma$.

5 The Law of Large Numbers suggests that we calculate
$$x_0 = E\{X\} \approx \frac{1}{N}\sum_{j=1}^{N} x_j = \widehat{x}_0. \tag{1}$$
Covariance matrix: observe that if $X_1, X_2, \ldots$ are i.i.d., so are $f(X_1), f(X_2), \ldots$ for any function $f : \mathbb{R}^2 \to \mathbb{R}^k$. Try
$$\Gamma = \operatorname{cov}(X) = E\{(X - x_0)(X - x_0)^T\} \approx \frac{1}{N}\sum_{j=1}^{N}(x_j - \widehat{x}_0)(x_j - \widehat{x}_0)^T = \widehat{\Gamma}. \tag{2}$$
Formulas (1) and (2) are known as the empirical mean and covariance, respectively.
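With the sample stored column-wise in a 2-by-N array S, the empirical mean and covariance (1) and (2) translate into a few lines of Matlab. This is a minimal sketch (the variable names xhat and Gammahat are ours, chosen to match the hats above), not code from the original slides:

N = size(S,2);                  % number of sample points
xhat = (1/N)*sum(S,2);          % empirical mean, formula (1)
CS = S - xhat*ones(1,N);        % centered sample
Gammahat = (1/N)*(CS*CS');      % empirical covariance, formula (2)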

6 Case 1: Gaussian Sample

7 Sample size $N = 200$. Eigenvalue decomposition of the covariance matrix:
$$\widehat{\Gamma} = U D U^T, \tag{3}$$
where $U \in \mathbb{R}^{2 \times 2}$ is an orthogonal matrix, $U^T = U^{-1}$, and $D \in \mathbb{R}^{2 \times 2}$ is diagonal,
$$U = [\,v_1 \;\; v_2\,], \qquad D = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix}, \qquad \widehat{\Gamma} v_j = \lambda_j v_j, \quad j = 1, 2.$$
Scaled eigenvectors: $v_{j,\text{scaled}} = 2\sqrt{\lambda_j}\, v_j$, where $\sqrt{\lambda_j}$ is the standard deviation (STD).
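The decomposition (3) and the scaled eigenvectors can be computed directly; a sketch assuming xhat and Gammahat from the snippet after slide 5:

[U,D] = eig(Gammahat);              % Gammahat = U*D*U'
v1 = 2*sqrt(D(1,1))*U(:,1);         % first scaled eigenvector (2 STDs long)
v2 = 2*sqrt(D(2,2))*U(:,2);         % second scaled eigenvector
plot(xhat(1)+[0 v1(1)], xhat(2)+[0 v1(2)], 'r-')   % principal axes drawn
hold on                                            % from the sample mean
plot(xhat(1)+[0 v2(1)], xhat(2)+[0 v2(2)], 'r-')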

8 Case 2: Non-Gaussian Sample

9 Estimate of Normality/Non-normality

Consider the sets
$$B_\alpha = \{x \in \mathbb{R}^2 \mid \pi(x) \geq \alpha\}, \qquad \alpha > 0.$$
If $\pi$ is Gaussian, $B_\alpha$ is an ellipse. Calculate the integral
$$P\{X \in B_\alpha\} = \int_{B_\alpha} \pi(x)\,dx. \tag{4}$$
We call $B_\alpha$ the credibility ellipse with credibility $p$, $0 < p < 1$, if
$$P\{X \in B_\alpha\} = p, \quad \text{giving } \alpha = \alpha(p). \tag{5}$$

10 Assume that the Gaussian density $\pi$ has the center of mass $\widehat{x}_0$ and covariance matrix $\widehat{\Gamma}$ estimated from the sample $S$ of size $N$. If $S$ is normally distributed,
$$\#\{x_j \in B_{\alpha(p)}\} \approx pN. \tag{6}$$
Deviations are due to non-normality.

11 How do we calculate this quantity? By the eigenvalue decomposition,
$$(x - \widehat{x}_0)^T \widehat{\Gamma}^{-1} (x - \widehat{x}_0) = (x - \widehat{x}_0)^T U D^{-1} U^T (x - \widehat{x}_0) = \|D^{-1/2} U^T (x - \widehat{x}_0)\|^2,$$
since $U$ is orthogonal, i.e., $U^{-1} = U^T$, and we wrote
$$D^{-1/2} = \begin{bmatrix} 1/\sqrt{\lambda_1} & \\ & 1/\sqrt{\lambda_2} \end{bmatrix}.$$
We introduce the change of variables
$$w = f(x) = W(x - \widehat{x}_0), \qquad W = D^{-1/2} U^T.$$

12 Write the integral in terms of the new variable $w$:
$$\int_{B_\alpha} \pi(x)\,dx = \frac{1}{2\pi \det(\widehat{\Gamma})^{1/2}} \int_{B_\alpha} \exp\left(-\frac{1}{2}(x - \widehat{x}_0)^T \widehat{\Gamma}^{-1}(x - \widehat{x}_0)\right) dx$$
$$= \frac{1}{2\pi \det(\widehat{\Gamma})^{1/2}} \int_{B_\alpha} \exp\left(-\frac{1}{2}\|W(x - \widehat{x}_0)\|^2\right) dx = \frac{1}{2\pi} \int_{f(B_\alpha)} \exp\left(-\frac{1}{2}\|w\|^2\right) dw,$$
where we used the fact that
$$dw = \det(W)\,dx = \frac{1}{\sqrt{\lambda_1 \lambda_2}}\,dx = \frac{1}{\det(\widehat{\Gamma})^{1/2}}\,dx.$$
Note: $\det(\widehat{\Gamma}) = \det(U D U^T) = \det(U^T U D) = \det(D) = \lambda_1 \lambda_2$.

13 The equiprobability curves of the density of $w$ are circles centered at the origin, i.e.,
$$f(B_\alpha) = D_\delta = \{w \in \mathbb{R}^2 \mid \|w\| < \delta\}$$
for some $\delta > 0$. To solve for $\delta$, integrate in polar coordinates $(r, \theta)$:
$$\frac{1}{2\pi} \int_{D_\delta} \exp\left(-\frac{1}{2}\|w\|^2\right) dw = \int_0^\delta \exp\left(-\frac{1}{2}r^2\right) r\,dr = 1 - \exp\left(-\frac{1}{2}\delta^2\right) = p,$$
implying that
$$\delta = \delta(p) = \sqrt{2 \log\left(\frac{1}{1-p}\right)}.$$
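For reference, the radii $\delta(p)$ for a few credibility levels follow from a one-line Matlab computation (our sketch; the numerical values are standard):

p = [0.5 0.9 0.95 0.99];
delta = sqrt(2*log(1./(1-p)))   % approx. 1.1774, 2.1460, 2.4477, 3.0349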

14 To see whether a sample point $x_j$ lies within the credibility ellipse with credibility $p$, it is enough to check whether the condition
$$\|w_j\| < \delta(p), \qquad w_j = W(x_j - \widehat{x}_0), \quad 1 \leq j \leq N,$$
holds. Plot
$$p \mapsto \frac{1}{N}\#\{x_j \in B_{\alpha(p)}\}.$$

15 Example

[Figure: the fraction $\frac{1}{N}\#\{x_j \in B_{\alpha(p)}\}$ plotted against $p$; panels "Gaussian" and "Non-Gaussian".]

16 Matlab code

N = length(S(1,:));                % Size of the sample
xmean = (1/N)*(sum(S'))';          % Mean of the sample
CS = S - xmean*ones(1,N);          % Centered sample
Gamma = (1/N)*CS*CS';              % Covariance matrix

% Whitening of the sample
[V,D] = eig(Gamma);                % Eigenvalue decomposition
W = diag([1/sqrt(D(1,1)); 1/sqrt(D(2,2))])*V';
WS = W*CS;                         % Whitened sample
normWS2 = sum(WS.^2);              % Squared norms of the whitened points

17 % Calculating the fraction of scatter points that are
% included in the credibility ellipses
rinside = zeros(11,1);
rinside(11) = N;
for j = 1:9
    delta2 = 2*log(1/(1-j/10));
    rinside(j+1) = sum(normWS2 < delta2);
end
rinside = (1/N)*rinside;
plot([0:10:100], rinside, 'k.-', 'MarkerSize', 12)

18 Which one of the following formulas?
$$\widehat{\Gamma} = \frac{1}{N}\sum_{j=1}^{N}(x_j - \widehat{x}_0)(x_j - \widehat{x}_0)^T, \quad \text{or} \quad \widehat{\Gamma} = \frac{1}{N}\sum_{j=1}^{N} x_j x_j^T - \widehat{x}_0 \widehat{x}_0^T.$$
The former, please. (The two are algebraically equal, but the latter is prone to cancellation errors in floating-point arithmetic when the mean is large compared to the spread of the sample.)

19 Example

Calibration of a measurement instrument:
- Measure a dummy load whose output is known
- Subtract it from the actual measurement
- Analyze the noise

Discrete sampling: the output is a vector of length $n$. The noise vector $x \in \mathbb{R}^n$ is a realization of $X : \Omega \to \mathbb{R}^n$. Estimate the mean and variance:
$$x_0 = \frac{1}{n}\sum_{j=1}^{n} x_j \quad \text{(offset)}, \qquad \sigma^2 = \frac{1}{n}\sum_{j=1}^{n}(x_j - x_0)^2.$$

20 Improving the Signal-to-Noise Ratio (SNR):
- Repeat the measurement
- Average
- Hope that the target is stationary

Averaged noise:
$$\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x^{(k)} \in \mathbb{R}^n.$$
How large must $N$ be to reduce the noise enough?

21 The averaged noise $\bar{x}$ is a realization of the random variable
$$\bar{X} = \frac{1}{N}\sum_{k=1}^{N} X^{(k)} \in \mathbb{R}^n.$$
If $X^{(1)}, X^{(2)}, \ldots$ are i.i.d., $\bar{X}$ is asymptotically Gaussian by the Central Limit Theorem, and its variance is
$$\operatorname{var}(\bar{X}) = \frac{\sigma^2}{N}.$$
Repeat until the variance is below a given threshold:
$$\frac{\sigma^2}{N} < \tau^2.$$
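The stopping criterion gives the required number of repetitions directly; a small sketch with example values for sigma2 and tau (both assumed, not from the slides):

sigma2 = 0.04;              % noise variance from the calibration measurement
tau = 0.05;                 % desired noise level
Nmin = ceil(sigma2/tau^2)   % smallest N with sigma^2/N < tau^2; here 16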

22 [Figure: averaged signal for two values of the number of averaged signals.]

23 [Figure: averaged signal for two further values of the number of averaged signals.]

24 Maximum Likelihood Estimator: the Frequentist's Approach

Parametric problem:
$$X \sim \pi_\theta(x) = \pi(x \mid \theta), \qquad \theta \in \mathbb{R}^k.$$
Independent realizations: assume that the observations $x_j$ are obtained independently. More precisely, $X_1, X_2, \ldots, X_N$ are i.i.d. and $x_j$ is a realization of $X_j$. Independence:
$$\pi(x_1, x_2, \ldots, x_N \mid \theta) = \pi(x_1 \mid \theta)\,\pi(x_2 \mid \theta) \cdots \pi(x_N \mid \theta),$$
or, briefly,
$$\pi(S \mid \theta) = \prod_{j=1}^{N} \pi(x_j \mid \theta).$$

25 The maximum likelihood (ML) estimator of $\theta$ is the parameter value that maximizes the probability of the outcome:
$$\theta_{\mathrm{ML}} = \arg\max_{\theta} \prod_{j=1}^{N} \pi(x_j \mid \theta).$$
Define
$$L(S \mid \theta) = -\log(\pi(S \mid \theta)).$$
The minimizer of $L(S \mid \theta)$ is the maximizer of $\pi(S \mid \theta)$.
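When $L(S \mid \theta)$ has no closed-form minimizer, it can be minimized numerically. A minimal sketch using Matlab's fminsearch, written here for the Gaussian model of the next slide with S a vector of scalar observations (an illustration of ours; for a robust implementation one would optimize over $\log\theta_2$ to keep the variance positive):

% Negative log-likelihood L(S|theta) for the Gaussian model, theta = [x0; sigma2]
negloglik = @(theta) sum((S - theta(1)).^2)/(2*theta(2)) ...
                   + (length(S)/2)*log(2*pi*theta(2));
theta0 = [mean(S); var(S)];                 % initial guess
thetaML = fminsearch(negloglik, theta0);    % numerical minimizer of L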

26 Example

Gaussian model:
$$\pi(x \mid x_0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - x_0)^2\right), \qquad \theta = \begin{bmatrix} x_0 \\ \sigma^2 \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}.$$
The likelihood function is
$$\prod_{j=1}^{N} \pi(x_j \mid \theta) = \left(\frac{1}{2\pi\theta_2}\right)^{N/2} \exp\left(-\frac{1}{2\theta_2}\sum_{j=1}^{N}(x_j - \theta_1)^2\right)$$
$$= \exp\left(-\frac{1}{2\theta_2}\sum_{j=1}^{N}(x_j - \theta_1)^2 - \frac{N}{2}\log(2\pi\theta_2)\right) = \exp(-L(S \mid \theta)).$$

27 We have
$$\nabla_\theta L(S \mid \theta) = \begin{bmatrix} \partial L/\partial\theta_1 \\ \partial L/\partial\theta_2 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\theta_2}\left(N\theta_1 - \displaystyle\sum_{j=1}^{N} x_j\right) \\ -\dfrac{1}{2\theta_2^2}\displaystyle\sum_{j=1}^{N}(x_j - \theta_1)^2 + \dfrac{N}{2\theta_2} \end{bmatrix}.$$
Setting $\nabla_\theta L(S \mid \theta) = 0$ gives
$$\widehat{x}_0 = \theta_{\mathrm{ML},1} = \frac{1}{N}\sum_{j=1}^{N} x_j, \qquad \widehat{\sigma}^2 = \theta_{\mathrm{ML},2} = \frac{1}{N}\sum_{j=1}^{N}(x_j - \theta_{\mathrm{ML},1})^2.$$

28 Example

Parametric (Poisson) model:
$$\pi(n \mid \theta) = \frac{\theta^n}{n!} e^{-\theta},$$
with sample $S = \{n_1, \ldots, n_N\}$, $n_k \in \mathbb{N}$, obtained by independent sampling. The likelihood is
$$\pi(S \mid \theta) = \prod_{k=1}^{N} \pi(n_k \mid \theta) = e^{-N\theta} \prod_{k=1}^{N} \frac{\theta^{n_k}}{n_k!},$$
and its negative logarithm is
$$L(S \mid \theta) = -\log\pi(S \mid \theta) = \sum_{k=1}^{N}\left(\theta - n_k \log\theta + \log n_k!\right).$$

29 Setting the derivative with respect to $\theta$ to zero,
$$\frac{\partial}{\partial\theta} L(S \mid \theta) = \sum_{k=1}^{N}\left(1 - \frac{n_k}{\theta}\right) = 0, \tag{7}$$
leads to
$$\theta_{\mathrm{ML}} = \frac{1}{N}\sum_{k=1}^{N} n_k.$$
Warning: for a Poisson variable the variance also equals $\theta$, so one could instead estimate
$$\operatorname{var}(n) \approx \frac{1}{N}\sum_{k=1}^{N} n_k^2 - \left(\frac{1}{N}\sum_{j=1}^{N} n_j\right)^2,$$
which in general is different from the estimate $\theta_{\mathrm{ML}}$ obtained above.
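A quick numerical check of the warning; a sketch of ours that assumes the Statistics Toolbox function poissrnd for generating the sample:

theta = 5; N = 1000;
n = poissrnd(theta, N, 1);          % N independent Poisson(theta) draws
thetaML = mean(n)                   % ML estimate, the empirical mean
varEst = mean(n.^2) - mean(n)^2     % empirical variance: close to theta,
                                    % but in general not equal to thetaML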

30 Assume that $\theta$ is known a priori to be relatively large. Use the Gaussian approximation:
$$\prod_{j=1}^{N} \pi_{\text{Poisson}}(n_j \mid \theta) \approx \left(\frac{1}{2\pi\theta}\right)^{N/2} \exp\left(-\frac{1}{2\theta}\sum_{j=1}^{N}(n_j - \theta)^2\right)$$
$$= \left(\frac{1}{2\pi}\right)^{N/2} \exp\left(-\frac{1}{2}\left(\frac{1}{\theta}\sum_{j=1}^{N}(n_j - \theta)^2 + N\log\theta\right)\right),$$
so, up to an irrelevant factor of $1/2$ that does not affect the minimizer,
$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^{N}(n_j - \theta)^2 + N\log\theta.$$

31 An approximation for $\theta_{\mathrm{ML}}$: minimize
$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^{N}(n_j - \theta)^2 + N\log\theta.$$
Write
$$\frac{\partial}{\partial\theta} L(S \mid \theta) = -\frac{1}{\theta^2}\sum_{j=1}^{N}(n_j - \theta)^2 - \frac{2}{\theta}\sum_{j=1}^{N}(n_j - \theta) + \frac{N}{\theta} = 0.$$
Multiplying by $-\theta^2$ and simplifying,
$$\sum_{j=1}^{N}(n_j - \theta)^2 + 2\theta\sum_{j=1}^{N}(n_j - \theta) - N\theta = \sum_{j=1}^{N} n_j^2 - N\theta^2 - N\theta = 0,$$
giving
$$\theta_{\mathrm{ML}} \approx -\frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{N}\sum_{j=1}^{N} n_j^2}.$$
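The approximation is immediate to evaluate; continuing our simulated Poisson sample n from the sketch after slide 29:

thetaApprox = -1/2 + sqrt(1/4 + mean(n.^2))
% For a Poisson(theta) sample, mean(n.^2) is close to theta^2 + theta,
% so thetaApprox is close to theta, and hence to mean(n).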

32 Example

Multivariate Gaussian model: $X \sim \mathcal{N}(x_0, \Gamma)$, where $x_0 \in \mathbb{R}^n$ is unknown and $\Gamma \in \mathbb{R}^{n \times n}$ is symmetric positive definite (SPD) and known.

Model reduction: assume that $x_0$ depends on hidden parameters $z \in \mathbb{R}^k$ through a linear equation,
$$x_0 = Az, \qquad A \in \mathbb{R}^{n \times k}, \quad z \in \mathbb{R}^k. \tag{8}$$
Model for an inverse problem: $z$ is the true physical quantity that, in the ideal case, is related to the observable $x_0$ through the linear model (8).

33 Noisy observations:
$$X = Az + E, \qquad E \sim \mathcal{N}(0, \Gamma).$$
Obviously,
$$E\{X\} = Az + E\{E\} = Az = x_0,$$
and
$$\operatorname{cov}(X) = E\{(X - Az)(X - Az)^T\} = E\{EE^T\} = \Gamma.$$
The probability density of $X$, given $z$, is
$$\pi(x \mid z) = \frac{1}{(2\pi)^{n/2}\det(\Gamma)^{1/2}} \exp\left(-\frac{1}{2}(x - Az)^T \Gamma^{-1}(x - Az)\right).$$

34 Independent observations: $S = \{x_1, \ldots, x_N\}$, $x_j \in \mathbb{R}^n$. The likelihood function
$$\prod_{j=1}^{N} \pi(x_j \mid z) \propto \exp\left(-\frac{1}{2}\sum_{j=1}^{N}(x_j - Az)^T \Gamma^{-1}(x_j - Az)\right)$$
is maximized by minimizing
$$L(S \mid z) = \frac{1}{2}\sum_{j=1}^{N}(x_j - Az)^T \Gamma^{-1}(x_j - Az) = \frac{N}{2} z^T\left[A^T \Gamma^{-1} A\right]z - z^T\left[A^T \Gamma^{-1}\sum_{j=1}^{N} x_j\right] + \frac{1}{2}\sum_{j=1}^{N} x_j^T \Gamma^{-1} x_j.$$

35 Setting the gradient to zero gives
$$\nabla_z L(S \mid z) = N\left[A^T \Gamma^{-1} A\right]z - A^T \Gamma^{-1}\sum_{j=1}^{N} x_j = 0,$$
i.e., the maximum likelihood estimator $z_{\mathrm{ML}}$ is a solution of the linear system
$$\left[A^T \Gamma^{-1} A\right]z = A^T \Gamma^{-1}\bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j.$$
The solution may fail to exist or to be unique; everything depends on the properties of the model reduction matrix $A \in \mathbb{R}^{n \times k}$.
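In Matlab, $z_{\mathrm{ML}}$ is obtained by forming and solving the normal equations; a minimal sketch of ours, assuming A, Gamma, and the n-by-N sample matrix S are given:

xbar = mean(S, 2);            % mean of the observations
B = A' * (Gamma \ A);         % A^T Gamma^{-1} A
b = A' * (Gamma \ xbar);      % A^T Gamma^{-1} xbar
zML = B \ b;                  % solve; requires A^T Gamma^{-1} A to be invertible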

36 Particular case: one observation, $S = \{x\}$,
$$L(z \mid x) = (x - Az)^T \Gamma^{-1}(x - Az).$$
Using the eigenvalue decomposition of the covariance matrix, $\Gamma = UDU^T$, or
$$\Gamma^{-1} = W^T W, \qquad W = D^{-1/2} U^T,$$
we have
$$L(z \mid x) = \|W(Az - x)\|^2.$$
Hence, the problem reduces to a weighted least squares problem.
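Equivalently, the whitened problem can be handed to a least squares solver; a sketch under the same assumptions, with x the single observation:

[U,D] = eig(Gamma);
W = diag(1./sqrt(diag(D))) * U';   % whitening matrix, W'*W = inv(Gamma)
zML = (W*A) \ (W*x);               % backslash solves min_z ||W(A*z - x)||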
