
Basic Problem of Statistical Inference

Assume that we have a set of observations $S = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^n$. The problem is to infer the underlying probability distribution that gives rise to the data $S$.

- Statistical modeling
- Statistical analysis

Parametric or non-parametric?

Parametric problem: the underlying probability density has a specified form and depends on a number of parameters. The problem is to infer those parameters.

Non-parametric problem: no analytic expression for the probability density is available. The description consists of defining the dependency/non-dependency structure of the data; numerical exploration.

Typical situation for a parametric model: the distribution is the probability density of a random variable $X : \Omega \to \mathbb{R}^n$.

- Parametric problem: suitable for inverse problems
- Model for a learning process

Law of Large Numbers

General result ("statistical law of nature"): assume that $X_1, X_2, \ldots$ are independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$. Then, almost certainly,
$$\lim_{n \to \infty} \frac{1}{n}\bigl(X_1 + X_2 + \cdots + X_n\bigr) = \mu.$$
"Almost certainly" means that, with probability one,
$$\lim_{n \to \infty} \frac{1}{n}\bigl(x_1 + x_2 + \cdots + x_n\bigr) = \mu,$$
$x_j$ being a realization of $X_j$.
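
A quick numerical illustration (a minimal Matlab sketch, not part of the lecture; uniform variables with known mean $\mu = 0.5$ are an arbitrary choice):

% Running mean of i.i.d. uniform(0,1) variables; the true mean is 0.5
n = 1e4;
x = rand(n,1);                       % realizations x_1, ..., x_n
runmean = cumsum(x)./(1:n)';         % (x_1 + ... + x_k)/k, k = 1,...,n
runmean(n)                           % close to 0.5 for large n
plot(1:n, runmean)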

Example

Sample $S = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^2$.

Parametric model: the $x_j$ are realizations of $X \sim \mathcal{N}(x_0, \Gamma)$, with unknown mean $x_0 \in \mathbb{R}^2$ and covariance matrix $\Gamma \in \mathbb{R}^{2\times 2}$.

Probability density of $X$:
$$\pi(x \mid x_0, \Gamma) = \frac{1}{2\pi\det(\Gamma)^{1/2}} \exp\Bigl(-\frac{1}{2}(x - x_0)^T \Gamma^{-1}(x - x_0)\Bigr).$$

Problem: estimate the parameters $x_0$ and $\Gamma$.

The Law of Large Numbers suggests that we calculate
$$x_0 = E\{X\} \approx \frac{1}{N}\sum_{j=1}^{N} x_j =: \widehat{x}_0. \qquad (1)$$
Covariance matrix: observe that if $X_1, X_2, \ldots$ are i.i.d., so are $f(X_1), f(X_2), \ldots$ for any function $f : \mathbb{R}^2 \to \mathbb{R}^k$. Try
$$\Gamma = \mathrm{cov}(X) = E\bigl\{(X - x_0)(X - x_0)^T\bigr\} \approx \frac{1}{N}\sum_{j=1}^{N}(x_j - \widehat{x}_0)(x_j - \widehat{x}_0)^T =: \widehat{\Gamma}. \qquad (2)$$
Formulas (1) and (2) are known as the empirical mean and covariance, respectively.
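
As a sanity check, (1) and (2) can be tested on a synthetic Gaussian sample (a Matlab sketch with arbitrary test parameters; the Cholesky factor is used only to generate correlated data, and the variable names are mine):

% Draw N samples from N(x0, Gamma) and compare with formulas (1) and (2)
N = 200;
x0 = [2; 3];                         % true mean
Gamma = [1.0 0.6; 0.6 0.5];          % true covariance (SPD)
R = chol(Gamma);                     % Gamma = R'*R
S = x0*ones(1,N) + R'*randn(2,N);    % sample, one observation per column
xhat = (1/N)*sum(S,2);               % empirical mean, formula (1)
CS = S - xhat*ones(1,N);             % centered sample
Gammahat = (1/N)*(CS*CS');           % empirical covariance, formula (2)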

Case 1: Gaussian sample

[Figure: two scatter plots of the Gaussian sample; both axes range from 0 to 5.]

Sample size $N = 200$. Eigenvectors of the covariance matrix:
$$\Gamma = UDU^T, \qquad (3)$$
where $U \in \mathbb{R}^{2\times 2}$ is an orthogonal matrix, $U^T = U^{-1}$, and $D \in \mathbb{R}^{2\times 2}$ is diagonal,
$$U = \begin{bmatrix} v_1 & v_2 \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix}, \qquad \Gamma v_j = \lambda_j v_j, \quad j = 1, 2.$$
Scaled eigenvectors: $v_{j,\mathrm{scaled}} = 2\sqrt{\lambda_j}\, v_j$, where $\sqrt{\lambda_j}$ = standard deviation (STD).
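
In Matlab, the decomposition (3) and the scaled eigenvectors can be obtained, e.g., as follows (a sketch that assumes the sample S and the empirical estimates xhat and Gammahat from the previous sketch; the variable names are mine):

% Eigenvalue decomposition Gammahat = U*D*U' and 2*STD scaled eigenvectors
[U,D] = eig(Gammahat);
lambda = diag(D);                    % eigenvalues lambda_1, lambda_2
vscaled = U*diag(2*sqrt(lambda));    % column j is 2*sqrt(lambda_j)*v_j
plot(S(1,:), S(2,:), 'b.')           % scatter plot of the sample
hold on
for j = 1:2                          % draw the scaled eigenvectors at the mean
    plot(xhat(1) + [0 vscaled(1,j)], xhat(2) + [0 vscaled(2,j)], 'r-')
end
hold off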

Case 2: Non-Gaussian sample

[Figure: two scatter plots of the non-Gaussian sample; both axes range from -1.5 to 1.5.]

Estimate of normality/non-normality

Consider the sets
$$B_\alpha = \bigl\{\, x \in \mathbb{R}^2 \mid \pi(x) \geq \alpha \,\bigr\}, \qquad \alpha > 0.$$
If $\pi$ is Gaussian, $B_\alpha$ is an ellipse (or empty). Calculate the integral
$$P\{X \in B_\alpha\} = \int_{B_\alpha} \pi(x)\, dx. \qquad (4)$$
We call $B_\alpha$ the credibility ellipse with credibility $p$, $0 < p < 1$, if
$$P\{X \in B_\alpha\} = p, \qquad \text{giving } \alpha = \alpha(p). \qquad (5)$$

Assume that the Gaussian density $\pi$ has as its center of mass and covariance matrix the values $\widehat{x}_0$ and $\widehat{\Gamma}$ estimated from the sample $S$ of size $N$. If $S$ is normally distributed,
$$\#\bigl\{\, x_j \in B_{\alpha(p)} \,\bigr\} \approx pN. \qquad (6)$$
Deviations are due to non-normality.

How do we calculate this quantity? Eigenvalue decomposition:
$$(x - x_0)^T \Gamma^{-1} (x - x_0) = (x - x_0)^T U D^{-1} U^T (x - x_0) = \bigl\|D^{-1/2} U^T (x - x_0)\bigr\|^2,$$
since $U$ is orthogonal, i.e., $U^{-1} = U^T$, and we wrote
$$D^{-1/2} = \begin{bmatrix} 1/\sqrt{\lambda_1} & \\ & 1/\sqrt{\lambda_2} \end{bmatrix}.$$
We introduce the change of variables
$$w = f(x) = W(x - x_0), \qquad W = D^{-1/2} U^T.$$

Write the integral in terms of the new variable $w$:
$$\int_{B_\alpha} \pi(x)\, dx = \frac{1}{2\pi\det(\Gamma)^{1/2}} \int_{B_\alpha} \exp\Bigl(-\frac{1}{2}(x - x_0)^T \Gamma^{-1}(x - x_0)\Bigr) dx$$
$$= \frac{1}{2\pi\det(\Gamma)^{1/2}} \int_{B_\alpha} \exp\Bigl(-\frac{1}{2}\bigl\|W(x - x_0)\bigr\|^2\Bigr) dx = \frac{1}{2\pi} \int_{f(B_\alpha)} \exp\Bigl(-\frac{1}{2}\|w\|^2\Bigr) dw,$$
where we used the fact that
$$dw = \det(W)\, dx = \frac{1}{\sqrt{\lambda_1\lambda_2}}\, dx = \frac{1}{\det(\Gamma)^{1/2}}\, dx.$$
Note: $\det(\Gamma) = \det(UDU^T) = \det(U^TUD) = \det(D) = \lambda_1\lambda_2$.

The equiprobability curves of the density of $w$ are circles centered at the origin, i.e.,
$$f(B_\alpha) = D_\delta = \bigl\{\, w \in \mathbb{R}^2 \mid \|w\| < \delta \,\bigr\}$$
for some $\delta > 0$. Solve for $\delta$: integrating in polar coordinates $(r, \theta)$,
$$\frac{1}{2\pi}\int_{D_\delta} \exp\Bigl(-\frac{1}{2}\|w\|^2\Bigr) dw = \int_0^\delta \exp\Bigl(-\frac{1}{2}r^2\Bigr) r\, dr = 1 - \exp\Bigl(-\frac{1}{2}\delta^2\Bigr) = p,$$
implying that
$$\delta = \delta(p) = \sqrt{2\log\Bigl(\frac{1}{1-p}\Bigr)}.$$
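
The expression for $\delta(p)$ is easy to verify by Monte Carlo with a standard bivariate Gaussian (a minimal Matlab sketch, not part of the lecture material):

% Fraction of standard 2D Gaussian samples with ||w|| < delta(p)
M = 1e5;
w = randn(2,M);
normw = sqrt(sum(w.^2));
for p = [0.5 0.9 0.95]
    delta = sqrt(2*log(1/(1-p)));
    frac = sum(normw < delta)/M;     % should be close to p
    fprintf('p = %.2f, empirical fraction = %.3f\n', p, frac);
end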

To see whether a sample point $x_j$ lies within the confidence ellipse with confidence $p$, it is enough to check whether the condition
$$\|w_j\| < \delta(p), \qquad w_j = W(x_j - x_0), \quad 1 \leq j \leq N,$$
holds. Plot
$$p \mapsto \frac{1}{N}\,\#\bigl\{\, x_j \in B_{\alpha(p)} \,\bigr\}.$$

Example

[Figure: fraction of sample points inside the credibility ellipse as a function of the credibility (in percent, 0 to 100), for the Gaussian and the non-Gaussian sample.]

Matlab code

% S is a 2 x N matrix with one observation per column
N = length(S(1,:));                  % Size of the sample
xmean = (1/N)*(sum(S'))';            % Mean of the sample
CS = S - xmean*ones(1,N);            % Centered sample
Gamma = (1/N)*(CS*CS');              % Covariance matrix

% Whitening of the sample
[V,D] = eig(Gamma);                  % Eigenvalue decomposition
W = diag([1/sqrt(D(1,1)); 1/sqrt(D(2,2))])*V';
WS = W*CS;                           % Whitened sample
normWS2 = sum(WS.^2);                % Squared norms of the whitened points

% Calculating the percentage of scatter points that are
% included in the confidence ellipses
rinside = zeros(11,1);
rinside(11) = N;
for j = 1:9
    delta2 = 2*log(1/(1-j/10));
    rinside(j+1) = sum(normWS2 < delta2);
end
rinside = (1/N)*rinside;
plot([0:10:100], rinside, 'k.-', 'MarkerSize', 12)

Which one of the following formulas should we use,
$$\widehat{\Gamma} = \frac{1}{N}\sum_{j=1}^N (x_j - \widehat{x}_0)(x_j - \widehat{x}_0)^T, \qquad \text{or} \qquad \widehat{\Gamma} = \frac{1}{N}\sum_{j=1}^N x_j x_j^T - \widehat{x}_0\,\widehat{x}_0^T\,?$$
The former, please.
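
The reason is numerical: the two expressions are algebraically equivalent, but when the mean is large compared with the spread of the data the second one subtracts two nearly equal large numbers and suffers from cancellation. A minimal one-dimensional Matlab illustration (the numbers are arbitrary, chosen only to expose the effect):

% Centered vs. uncentered variance formula
N = 1e5;
x = 1e8 + 0.01*randn(1,N);           % large mean, small spread
m = mean(x);
v1 = mean((x - m).^2)                % centered formula: close to 1e-4
v2 = mean(x.^2) - m^2                % uncentered formula: ruined by cancellation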

Example

Calibration of a measurement instrument:
- Measure a dummy load whose output is known
- Subtract it from the actual measurement
- Analyze the noise

Discrete sampling; the output is a vector of length $n$. The noise vector $x \in \mathbb{R}^n$ is a realization of $X : \Omega \to \mathbb{R}^n$. Estimate the mean and variance:
$$x_0 = \frac{1}{n}\sum_{j=1}^n x_j \quad \text{(offset)}, \qquad \sigma^2 = \frac{1}{n}\sum_{j=1}^n (x_j - x_0)^2.$$

Improving the Signal-to-Noise Ratio (SNR):
- Repeat the measurement
- Average
- Hope that the target is stationary

Averaged noise:
$$\bar{x} = \frac{1}{N}\sum_{k=1}^N x^{(k)} \in \mathbb{R}^n.$$
How large must $N$ be to reduce the noise enough?

The averaged noise $\bar{x}$ is a realization of the random variable
$$\bar{X} = \frac{1}{N}\sum_{k=1}^N X^{(k)} \in \mathbb{R}^n.$$
If $X^{(1)}, X^{(2)}, \ldots$ are i.i.d., $\bar{X}$ is asymptotically Gaussian by the Central Limit Theorem, and its variance is
$$\mathrm{var}(\bar{X}) = \frac{\sigma^2}{N}.$$
Repeat until the variance is below a given threshold,
$$\frac{\sigma^2}{N} < \tau^2.$$
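
A quick simulation of the variance reduction (a sketch; the noise level, vector length, and threshold are arbitrary illustration values):

% Averaging N repeated noise measurements reduces the variance to sigma^2/N
n = 50;                              % length of one measurement vector
sigma = 1;                           % noise STD of a single measurement
tau = 0.2;                           % target noise STD
N = ceil(sigma^2/tau^2);             % number of repetitions needed
X = sigma*randn(n,N);                % N independent noise vectors
xbar = mean(X,2);                    % averaged noise
var_single = var(X(:))               % roughly sigma^2
var_avg    = var(xbar)               % roughly sigma^2/N < tau^2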

[Figure: noise realizations for 1 and 5 averaged signals; sample index 0 to 50.]

[Figure: noise realizations for 10 and 25 averaged signals; sample index 0 to 50.]

Maximum Likelihood Estimator: the frequentist's approach

Parametric problem:
$$X \sim \pi_\theta(x) = \pi(x \mid \theta), \qquad \theta \in \mathbb{R}^k.$$
Independent realizations: assume that the observations $x_j$ are obtained independently. More precisely, $X_1, X_2, \ldots, X_N$ are i.i.d., and $x_j$ is a realization of $X_j$. Independence:
$$\pi(x_1, x_2, \ldots, x_N \mid \theta) = \pi(x_1 \mid \theta)\,\pi(x_2 \mid \theta)\cdots\pi(x_N \mid \theta),$$
or, briefly,
$$\pi(S \mid \theta) = \prod_{j=1}^N \pi(x_j \mid \theta).$$

The maximum likelihood (ML) estimator of $\theta$ is the parameter value that maximizes the probability of the outcome:
$$\theta_{\mathrm{ML}} = \arg\max_\theta \prod_{j=1}^N \pi(x_j \mid \theta).$$
Define
$$L(S \mid \theta) = -\log\bigl(\pi(S \mid \theta)\bigr).$$
The minimizer of $L(S \mid \theta)$ is the maximizer of $\pi(S \mid \theta)$.

Example

Gaussian model:
$$\pi(x \mid x_0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{1}{2\sigma^2}(x - x_0)^2\Bigr), \qquad \theta = \begin{bmatrix} x_0 \\ \sigma^2 \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}.$$
The likelihood function is
$$\prod_{j=1}^N \pi(x_j \mid \theta) = \Bigl(\frac{1}{2\pi\theta_2}\Bigr)^{N/2} \exp\Bigl(-\frac{1}{2\theta_2}\sum_{j=1}^N (x_j - \theta_1)^2\Bigr)$$
$$= \exp\Bigl(-\frac{1}{2\theta_2}\sum_{j=1}^N (x_j - \theta_1)^2 - \frac{N}{2}\log(2\pi\theta_2)\Bigr) = \exp\bigl(-L(S \mid \theta)\bigr).$$

We have
$$\nabla_\theta L(S \mid \theta) = \begin{bmatrix} \partial L/\partial\theta_1 \\ \partial L/\partial\theta_2 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\theta_2}\Bigl(N\theta_1 - \displaystyle\sum_{j=1}^N x_j\Bigr) \\[3mm] -\dfrac{1}{2\theta_2^2}\displaystyle\sum_{j=1}^N (x_j - \theta_1)^2 + \dfrac{N}{2\theta_2} \end{bmatrix}.$$
Setting $\nabla_\theta L(S \mid \theta) = 0$ gives
$$x_0 = \theta_{\mathrm{ML},1} = \frac{1}{N}\sum_{j=1}^N x_j, \qquad \sigma^2 = \theta_{\mathrm{ML},2} = \frac{1}{N}\sum_{j=1}^N (x_j - \theta_{\mathrm{ML},1})^2.$$
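
The closed-form estimates can be checked on simulated data (a minimal sketch with arbitrary true parameter values; note the 1/N normalization of the variance, not 1/(N-1)):

% ML estimates for a univariate Gaussian sample
N = 1000;
x0 = 1.5; sigma = 0.7;                       % true parameters (illustrative)
x = x0 + sigma*randn(N,1);                   % sample
theta1_ML = (1/N)*sum(x)                     % estimate of the mean x0
theta2_ML = (1/N)*sum((x - theta1_ML).^2)    % estimate of the variance sigma^2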

Example

Parametric model (Poisson):
$$\pi(n \mid \theta) = \frac{\theta^n}{n!}\, e^{-\theta},$$
sample $S = \{n_1, \ldots, n_N\}$, $n_k \in \mathbb{N}$, obtained by independent sampling. The likelihood density is
$$\pi(S \mid \theta) = \prod_{k=1}^N \pi(n_k \mid \theta) = e^{-N\theta} \prod_{k=1}^N \frac{\theta^{n_k}}{n_k!},$$
and its negative logarithm is
$$L(S \mid \theta) = -\log \pi(S \mid \theta) = \sum_{k=1}^N \bigl(\theta - n_k\log\theta + \log n_k!\bigr).$$

Setting the derivative with respect to $\theta$ to zero,
$$\frac{\partial}{\partial\theta} L(S \mid \theta) = \sum_{k=1}^N \Bigl(1 - \frac{n_k}{\theta}\Bigr) = 0, \qquad (7)$$
leading to
$$\theta_{\mathrm{ML}} = \frac{1}{N}\sum_{k=1}^N n_k.$$
Warning: for the Poisson distribution the variance equals the mean $\theta$, so one could also estimate
$$\mathrm{var}(n) \approx \frac{1}{N}\sum_{k=1}^N \Bigl(n_k - \frac{1}{N}\sum_{j=1}^N n_j\Bigr)^2,$$
which is in general different from the estimate $\theta_{\mathrm{ML}}$ obtained above.
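
A small numerical illustration of the warning (a sketch with made-up count data, so that no toolbox random number generator is needed):

% ML estimate vs. empirical variance for count data
nk = [3 5 4 7 2 6 5 4 3 5];                % observed counts n_k (illustrative)
N = length(nk);
theta_ML = (1/N)*sum(nk)                   % ML estimate: the sample mean
var_emp  = (1/N)*sum((nk - theta_ML).^2)   % empirical variance
% For a Poisson model both quantities estimate theta, but they differ in general.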

Assume that $\theta$ is known a priori to be relatively large. Use the Gaussian approximation:
$$\prod_{j=1}^N \pi_{\mathrm{Poisson}}(n_j \mid \theta) \approx \Bigl(\frac{1}{2\pi\theta}\Bigr)^{N/2} \exp\Bigl(-\frac{1}{2\theta}\sum_{j=1}^N (n_j - \theta)^2\Bigr)$$
$$= \Bigl(\frac{1}{2\pi}\Bigr)^{N/2} \exp\Bigl(-\frac{1}{2}\Bigl[\frac{1}{\theta}\sum_{j=1}^N (n_j - \theta)^2 + N\log\theta\Bigr]\Bigr),$$
so
$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^N (n_j - \theta)^2 + N\log\theta.$$

An approximation for $\theta_{\mathrm{ML}}$: minimize
$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^N (n_j - \theta)^2 + N\log\theta.$$
Write
$$\frac{\partial}{\partial\theta} L(S \mid \theta) = -\frac{1}{\theta^2}\sum_{j=1}^N (n_j - \theta)^2 - \frac{2}{\theta}\sum_{j=1}^N (n_j - \theta) + \frac{N}{\theta} = 0,$$
or, multiplying by $-\theta^2$ and simplifying,
$$\sum_{j=1}^N (n_j - \theta)^2 + 2\theta\sum_{j=1}^N (n_j - \theta) - N\theta = N\theta^2 + N\theta - \sum_{j=1}^N n_j^2 = 0,$$
giving
$$\theta \approx \Bigl(\frac{1}{4} + \frac{1}{N}\sum_{j=1}^N n_j^2\Bigr)^{1/2} - \frac{1}{2}.$$
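
The quality of the Gaussian approximation can be checked against the exact ML estimate (a sketch reusing the same made-up counts as above; both numbers estimate θ):

% Exact vs. approximate ML estimate of theta
nk = [3 5 4 7 2 6 5 4 3 5]; N = length(nk);    % same illustrative counts as above
theta_exact  = (1/N)*sum(nk);                  % sample mean
theta_approx = sqrt(1/4 + (1/N)*sum(nk.^2)) - 1/2;
fprintf('exact: %.4f, Gaussian approximation: %.4f\n', theta_exact, theta_approx);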

Example

Multivariate Gaussian model, $X \sim \mathcal{N}(x_0, \Gamma)$, where $x_0 \in \mathbb{R}^n$ is unknown and $\Gamma \in \mathbb{R}^{n\times n}$ is symmetric positive definite (SPD) and known.

Model reduction: assume that $x_0$ depends on hidden parameters $z \in \mathbb{R}^k$ through a linear equation,
$$x_0 = Az, \qquad A \in \mathbb{R}^{n\times k}, \quad z \in \mathbb{R}^k. \qquad (8)$$
Model for an inverse problem: $z$ is the true physical quantity that, in the ideal case, is related to the observable $x_0$ through the linear model (8).

Noisy observations:
$$X = Az + E, \qquad E \sim \mathcal{N}(0, \Gamma).$$
Obviously,
$$E\{X\} = Az + E\{E\} = Az = x_0,$$
and
$$\mathrm{cov}(X) = E\bigl\{(X - Az)(X - Az)^T\bigr\} = E\bigl\{EE^T\bigr\} = \Gamma.$$
The probability density of $X$, given $z$, is
$$\pi(x \mid z) = \frac{1}{(2\pi)^{n/2}\det(\Gamma)^{1/2}} \exp\Bigl(-\frac{1}{2}(x - Az)^T\Gamma^{-1}(x - Az)\Bigr).$$

Independent observations: $S = \{x_1, \ldots, x_N\}$, $x_j \in \mathbb{R}^n$. The likelihood function
$$\prod_{j=1}^N \pi(x_j \mid z) \propto \exp\Bigl(-\frac{1}{2}\sum_{j=1}^N (x_j - Az)^T\Gamma^{-1}(x_j - Az)\Bigr)$$
is maximized by minimizing
$$L(S \mid z) = \frac{1}{2}\sum_{j=1}^N (x_j - Az)^T\Gamma^{-1}(x_j - Az) = \frac{N}{2}\, z^T\bigl[A^T\Gamma^{-1}A\bigr]z - z^T\Bigl[A^T\Gamma^{-1}\sum_{j=1}^N x_j\Bigr] + \frac{1}{2}\sum_{j=1}^N x_j^T\Gamma^{-1}x_j.$$

Setting the gradient to zero gives
$$\nabla_z L(S \mid z) = N\bigl[A^T\Gamma^{-1}A\bigr]z - A^T\Gamma^{-1}\sum_{j=1}^N x_j = 0,$$
i.e., the maximum likelihood estimator $z_{\mathrm{ML}}$ is the solution of the linear system
$$\bigl[A^T\Gamma^{-1}A\bigr]z = A^T\Gamma^{-1}\bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{j=1}^N x_j.$$
The solution need not exist; everything depends on the properties of the model reduction matrix $A \in \mathbb{R}^{n\times k}$.

Particular case: one observation, S = { x }, L(z x) = (x Az) T Γ 1 (x Az). Eigenvalue decomposition of the covariance matrix, Γ = UDU T, or, Γ 1 = W T W, W = D 1/2 U T, we have L(z x) = W (Az x) 2. Hence, the problem reduces to a weighted least squares problem Computational Methods in Inverse Problems, Mat 1.3626 0-35