EE385 Class Notes (John Stensby). Updates at http://www.ece.uah.edu/courses/ee385/

Chapter 5: Moments and Conditional Statistics

Let X denote a random variable, and z = h(x) a function of x. Consider the transformation Z = h(X). We saw that we could express

    E[Z] = E[h(X)] = \int h(x) f_X(x) \, dx,    (5-1)

a method of calculating E[Z] that does not require knowledge of f_Z(z). It is possible to extend this method to transformations of two random variables. Given random variables X and Y, and a function z = g(x,y), form the new random variable

    Z = g(X,Y).    (5-2)

f_Z(z) denotes the density of Z. The expected value of Z is E[Z] = \int z f_Z(z) dz; however, this formula requires knowledge of f_Z, a density which may not be available. Instead, we can use

    E[Z] = E[g(X,Y)] = \iint g(x,y) f_{XY}(x,y) \, dx \, dy    (5-3)

to calculate E[Z] without having to obtain f_Z. This is a very useful result.

Covariance

The covariance C_XY of random variables X and Y is defined as

    C_{XY} = E[(X - \eta_X)(Y - \eta_Y)] = \iint (x - \eta_X)(y - \eta_Y) f_{XY}(x,y) \, dx \, dy,    (5-4)

where \eta_X = E[X] and \eta_Y = E[Y]. Note that C_XY can be expressed as

    C_{XY} = E[(X - \eta_X)(Y - \eta_Y)] = E[XY - \eta_X Y - \eta_Y X + \eta_X \eta_Y] = E[XY] - \eta_X \eta_Y.    (5-5)
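As a quick numerical check of (5-3) and (5-5), the short Python sketch below (an illustration added here, not part of the original notes; the particular g(x,y) and the jointly Gaussian sample generator are assumed choices) estimates E[g(X,Y)] directly from samples of (X,Y), never constructing f_Z, and compares the sample value of E[XY] - η_Xη_Y against the covariance used to generate the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example: (X, Y) jointly Gaussian with means (1, -2),
# standard deviations (2, 3), and correlation coefficient 0.6.
mean = [1.0, -2.0]
cov = [[4.0, 0.6 * 2 * 3],
       [0.6 * 2 * 3, 9.0]]
x, y = rng.multivariate_normal(mean, cov, size=200_000).T

# Equation (5-3): E[g(X,Y)] computed from the joint samples directly,
# without ever finding the density of Z = g(X,Y).
g = lambda x, y: x * y + y**2          # illustrative choice of g
print("Monte Carlo E[g(X,Y)]:", np.mean(g(x, y)))

# Equation (5-5): C_XY = E[XY] - eta_X * eta_Y
c_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print("sample C_XY:", c_xy, "  generating C_XY:", 0.6 * 2 * 3)
```

Any other g(x,y) can be substituted; the point is only that (5-3) needs the joint density (or joint samples), not f_Z.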

Correlation Coefficient

The correlation coefficient for random variables X and Y is defined as

    r_{XY} = \frac{C_{XY}}{\sigma_X \sigma_Y}.    (5-6)

r_XY is a measure of the statistical similarity between X and Y.

Theorem 5-1: The correlation coefficient must lie in the range -1 ≤ r_XY ≤ +1.

Proof: Let α denote any real number. Consider the parabolic equation

    g(\alpha) \equiv E[\{\alpha(X - \eta_X) + (Y - \eta_Y)\}^2] = \alpha^2 \sigma_X^2 + 2\alpha C_{XY} + \sigma_Y^2 \ge 0.    (5-7)

Note that g(α) ≥ 0 for all α; g is a parabola that opens upward. As a first case, suppose that there exists a value α_0 for which g(α_0) = 0 (see Fig. 5-1). Then α_0 is a repeated root of g(α) = 0. In the quadratic formula used to determine the roots of (5-7), the discriminant must be zero. That is, (2C_{XY})^2 - 4\sigma_X^2\sigma_Y^2 = 0, so that

    r_{XY}^2 = \frac{C_{XY}^2}{\sigma_X^2 \sigma_Y^2} = 1.

[Figure 5-1: Plot of g(α) = α²σ_X² + 2αC_XY + σ_Y² for the case in which the discriminant is zero; the parabola touches the α-axis at the repeated root α_0.]

Now, consider the case g(α) > 0 for all α; g has no real roots (see Fig. 5-2). This means that the discriminant must be negative (so the roots are complex valued). Hence, (2C_{XY})^2 - 4\sigma_X^2\sigma_Y^2 < 0, so that

[Figure 5-2: Plot of g(α) for the case in which the discriminant is negative; the parabola lies entirely above the α-axis.]

    r_{XY}^2 = \frac{C_{XY}^2}{\sigma_X^2 \sigma_Y^2} < 1.    (5-8)

Hence, in either case, -1 ≤ r_XY ≤ +1, as claimed.

Suppose an experiment yields values for X and Y. Consider that we perform the experiment many times, and plot the outcomes x and y on a two-dimensional plane. Some hypothetical results follow.

[Figure 5-3: Scatter plots of samples of X and Y with varying degrees of correlation: r_XY near -1, r_XY very small, and r_XY near +1.]
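The scatter plots of Figure 5-3 are easy to reproduce numerically. The sketch below (assumed, illustrative parameters; not part of the original notes) draws zero-mean, unit-variance Gaussian pairs for several values of the correlation parameter and estimates r_XY from the samples via (5-6); every estimate lands in [-1, +1], consistent with Theorem 5-1.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed zero-mean, unit-variance pairs; r is the target correlation.
for r in (-0.95, 0.0, 0.95):
    cov = [[1.0, r], [r, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=50_000).T
    # Sample estimate of r_XY = C_XY / (sigma_X * sigma_Y), eq. (5-6).
    r_hat = np.mean((x - x.mean()) * (y - y.mean())) / (x.std() * y.std())
    print(f"r = {r:+.2f}   estimated r_XY = {r_hat:+.3f}")
    # A scatter plot of (x, y) for each r reproduces the panels of Fig. 5-3.
```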

Notes:

1. If r_XY = ±1, then there exist constants a and b such that Y = aX + b in the mean-square sense (i.e., E[{Y - (aX + b)}²] = 0).

2. The addition of a constant to a random variable does not change the variance of the random variable. That is, σ_X² = VAR[X] = VAR[X + α] for any α.

3. Multiplication by a constant scales the variance of a random variable by the square of that constant. If VAR[X] = σ_X², then VAR[αX] = α²σ_X².

4. Adding constants to random variables X and Y does not change the covariance or correlation of these random variables. That is, X + α and Y + β have the same covariance and correlation coefficient as X and Y.

Correlation Coefficient for Gaussian Random Variables

Let zero-mean X and Y be jointly Gaussian with joint density

    f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-r^2}} \exp\left\{ -\frac{1}{2(1-r^2)}\left[ \frac{x^2}{\sigma_X^2} - 2r\frac{xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2} \right] \right\}.    (5-9)

We are interested in the correlation coefficient r_XY; we claim that r_XY = r, where r is just a parameter in the joint density (from statements given above, r is the correlation coefficient for the nonzero-mean case as well). First, note that C_XY = E[XY], since the means are zero. Now, show r_XY = r by establishing E[XY] = rσ_Xσ_Y, so that r_XY = C_XY/σ_Xσ_Y = E[XY]/σ_Xσ_Y = r. In the square brackets of f_XY is an expression that is quadratic in x/σ_X. Complete the square on this quadratic form to obtain

    \frac{x^2}{\sigma_X^2} - 2r\frac{xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2} = \left( \frac{x}{\sigma_X} - r\frac{y}{\sigma_Y} \right)^2 + (1-r^2)\frac{y^2}{\sigma_Y^2}.    (5-10)

Use this new quadratic form to obtain

    E[XY] = \iint xy \, f_{XY}(x,y) \, dx \, dy
          = \int \frac{y}{\sigma_Y\sqrt{2\pi}} e^{-y^2/2\sigma_Y^2} \left[ \int \frac{x}{\sigma_X\sqrt{2\pi(1-r^2)}} \exp\left( -\frac{(x - r\frac{\sigma_X}{\sigma_Y}y)^2}{2\sigma_X^2(1-r^2)} \right) dx \right] dy.    (5-11)

Note that the inner integral is an expected value calculation: it integrates x against a normal density with mean r(σ_X/σ_Y)y, so the inner integral evaluates to r(σ_X/σ_Y)y. Hence,

    E[XY] = r\frac{\sigma_X}{\sigma_Y} \int y^2 \frac{1}{\sigma_Y\sqrt{2\pi}} e^{-y^2/2\sigma_Y^2} \, dy = r\frac{\sigma_X}{\sigma_Y}\sigma_Y^2 = r\sigma_X\sigma_Y,    (5-12)

as desired. From this, we conclude that r_XY = r. (A numerical check of this identity appears at the end of this subsection.)

Uncorrelatedness and Orthogonality

Two random variables are uncorrelated if their covariance is zero. That is, they are uncorrelated if

    C_{XY} = r_{XY} = 0.    (5-13)

Since C_XY = E[XY] - E[X]E[Y], Equation (5-13) is equivalent to the requirement that E[XY] = E[X]E[Y]. Two random variables are called orthogonal if

    E[XY] = 0.    (5-14)
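As promised above, here is a brute-force numerical check of (5-11)-(5-12): the double integral of xy f_XY(x,y), evaluated as a Riemann sum over a wide grid, should return rσ_Xσ_Y. The parameter values, grid limits, and grid sizes below are assumptions made only for this illustration.

```python
import numpy as np

# Assumed parameters for the zero-mean jointly Gaussian density (5-9).
sx, sy, r = 1.5, 0.8, 0.7

# Tensor grid wide enough (several sigma) that the neglected tails are negligible.
x = np.linspace(-8 * sx, 8 * sx, 1201)
y = np.linspace(-8 * sy, 8 * sy, 1201)
X, Y = np.meshgrid(x, y, indexing="ij")

q = (X / sx) ** 2 - 2 * r * X * Y / (sx * sy) + (Y / sy) ** 2
f_xy = np.exp(-q / (2 * (1 - r**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - r**2))

dx, dy = x[1] - x[0], y[1] - y[0]
e_xy = np.sum(X * Y * f_xy) * dx * dy       # Riemann sum for E[XY], eq. (5-11)
print("numerical E[XY]:", e_xy, "   r*sigma_X*sigma_Y:", r * sx * sy)
```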

Theorem 5-2: If random variables X and Y are independent, then they are uncorrelated (independence ⇒ uncorrelatedness).

Proof: Let X and Y be independent. Then

    E[XY] = \iint xy \, f_{XY}(x,y) \, dx \, dy = \iint xy \, f_X(x) f_Y(y) \, dx \, dy = E[X]\,E[Y].    (5-15)

Therefore, X and Y are uncorrelated.

Note: The converse is not true in general. If X and Y are uncorrelated, then they are not necessarily independent (a small numerical illustration is given at the end of this subsection). This general rule has an exception for Gaussian random variables, a special case.

Theorem 5-3: For Gaussian random variables, uncorrelatedness is equivalent to independence (independence ⇔ uncorrelatedness for Gaussian random variables).

Proof: We have only to show that uncorrelatedness ⇒ independence. But this is easy. Let the correlation coefficient r = 0 (so that the two random variables are uncorrelated) in the joint Gaussian density (5-9). Note that the joint density then factors into a product of marginal densities.

Joint Moments

Joint moments of X and Y can be computed. These are defined as

    m_{kr} = E[X^k Y^r] = \iint x^k y^r f_{XY}(x,y) \, dx \, dy.    (5-16)

Joint central moments are defined as

    \mu_{kr} = E[(X - \eta_X)^k (Y - \eta_Y)^r] = \iint (x - \eta_X)^k (y - \eta_Y)^r f_{XY}(x,y) \, dx \, dy.    (5-17)
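Here is the illustration promised in the note after Theorem 5-2, under an assumed distribution for X: with X standard normal and Y = X², the covariance C_XY = E[XY] - E[X]E[Y] = E[X³] = 0, yet Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(8)

# X ~ N(0,1) (assumed); Y = X^2 is completely determined by X, hence dependent.
x = rng.normal(0.0, 1.0, 1_000_000)
y = x**2

c_xy = np.mean(x * y) - np.mean(x) * np.mean(y)   # C_XY = E[XY] - E[X]E[Y]
print("sample C_XY (should be ~0):", c_xy)

# Dependence shows up immediately by conditioning on an event involving X:
print("E[Y]:", y.mean(), "   E[Y | |X| > 2]:", y[np.abs(x) > 2].mean())
```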

Conditional Distributions/Densities

Let M denote an event with P(M) ≠ 0, and let X and Y be random variables. Recall that

    F_Y(y \mid M) = P[Y \le y \mid M] = \frac{P[Y \le y, M]}{P[M]}.    (5-18)

Now, event M can be defined in terms of the random variable X.

Example (5-1): Define M = [X ≤ x] and write

    F_Y(y \mid X \le x) = \frac{P[Y \le y, X \le x]}{P[X \le x]} = \frac{F_{XY}(x,y)}{F_X(x)}    (5-19)

    f_Y(y \mid X \le x) = \frac{\partial F_{XY}(x,y)/\partial y}{F_X(x)}.    (5-20)

Example (5-2): Define M = [x_1 < X ≤ x_2] and write

    F_Y(y \mid x_1 < X \le x_2) = \frac{P[x_1 < X \le x_2, Y \le y]}{P[x_1 < X \le x_2]} = \frac{F_{XY}(x_2,y) - F_{XY}(x_1,y)}{F_X(x_2) - F_X(x_1)}.    (5-21)

Example (5-3): Define M = [X = x], where f_X(x) ≠ 0. The quantity P[Y ≤ y, M]/P[M] can be indeterminate (i.e., 0/0) in this case (certainly, this is true for continuous X), so we must use

    F_Y(y \mid X = x) = \lim_{\Delta x \to 0^+} F_Y(y \mid x - \Delta x < X \le x).    (5-22)

From the previous example, this result can be written as

    F_Y(y \mid X = x) = \lim_{\Delta x \to 0^+} \frac{F_{XY}(x,y) - F_{XY}(x - \Delta x, y)}{F_X(x) - F_X(x - \Delta x)} = \lim_{\Delta x \to 0^+} \frac{[F_{XY}(x,y) - F_{XY}(x - \Delta x, y)]/\Delta x}{[F_X(x) - F_X(x - \Delta x)]/\Delta x} = \frac{\partial F_{XY}(x,y)/\partial x}{dF_X(x)/dx}.    (5-23)

From this last result, we conclude that the conditional density can be expressed as

    f_Y(y \mid X = x) = \frac{\partial}{\partial y} F_Y(y \mid X = x) = \frac{\partial^2 F_{XY}(x,y)/\partial x \, \partial y}{dF_X(x)/dx},    (5-24)

which yields

    f_Y(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)}.    (5-25)

Use the abbreviated notation f_Y(y|x) ≡ f_Y(y | X = x), Equation (5-25), and symmetry to write

    f_{XY}(x,y) = f_Y(y \mid x) f_X(x) = f_X(x \mid y) f_Y(y).    (5-26)

Use this form of the joint density with the formula before last to write

    f_Y(y \mid x) = \frac{f_X(x \mid y) f_Y(y)}{f_X(x)},    (5-27)

a result that is called Bayes' Theorem for densities.

Conditional Expectations

Let M denote an event, g(x) a function of x, and X a random variable. Then, the conditional expectation of g(X) given M is defined as

    E[g(X) \mid M] = \int g(x) f_X(x \mid M) \, dx.    (5-28)
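As a concrete illustration of (5-28) (an added sketch; the event M = [X > 1], the function g(x) = x², and X ~ N(0,1) are all assumed for the example), the code below computes E[g(X) | M] two ways: by conditioning Monte Carlo samples on M, and by integrating g(x) f_X(x|M) on a grid, where f_X(x|M) = f_X(x)·1[x > 1]/P[M].

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo: average g(X) = X^2 over only those samples for which M = [X > 1] occurred.
x = rng.normal(0.0, 1.0, size=2_000_000)
mask = x > 1.0
mc_value = np.mean(x[mask] ** 2)

# Equation (5-28): E[g(X)|M] = integral of g(x) f_X(x|M) dx, evaluated on a grid.
# The tail beyond x = 10 is negligible for a standard normal density.
grid = np.linspace(1.0, 10.0, 200_001)
dx = grid[1] - grid[0]
f_x = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
p_m = np.sum(f_x) * dx                          # P[M] = P[X > 1]
integral = np.sum(grid**2 * f_x) * dx / p_m     # eq. (5-28)

print("Monte Carlo E[X^2 | X > 1]:", mc_value)
print("grid integral, eq. (5-28) :", integral)
```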

For example, let X and Y denote random variables, and write the conditional mean of X given Y = y as

    \eta_{X|y} \equiv E[X \mid Y = y] \equiv E[X \mid y] = \int x f_X(x \mid y) \, dx.    (5-29)

Higher-order conditional moments can be defined in a similar manner. For example, the conditional variance is written as

    \sigma_{X|y}^2 \equiv E[(X - \eta_{X|y})^2 \mid Y = y] \equiv E[(X - \eta_{X|y})^2 \mid y] = \int (x - \eta_{X|y})^2 f_X(x \mid y) \, dx.    (5-30)

Remember that η_X|y and σ²_X|y are, in general, functions of the algebraic variable y.

Example (5-4): Let X and Y be zero-mean, jointly Gaussian random variables with

    f_{XY}(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-r^2}} \exp\left\{ -\frac{1}{2(1-r^2)}\left[ \frac{x^2}{\sigma_X^2} - 2r\frac{xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2} \right] \right\}.    (5-31)

Find f_X(x|y), η_X|y and σ²_X|y. We will accomplish this by factoring f_XY into the product f_X(x|y) f_Y(y). By completing the square on the quadratic, we can write

    \frac{x^2}{\sigma_X^2} - 2r\frac{xy}{\sigma_X\sigma_Y} + \frac{y^2}{\sigma_Y^2} = \left( \frac{x}{\sigma_X} - r\frac{y}{\sigma_Y} \right)^2 + (1-r^2)\frac{y^2}{\sigma_Y^2},    (5-32)

so that

    f_{XY}(x,y) = \underbrace{\frac{1}{\sigma_X\sqrt{2\pi(1-r^2)}} \exp\left( -\frac{(x - r\frac{\sigma_X}{\sigma_Y}y)^2}{2\sigma_X^2(1-r^2)} \right)}_{f_X(x|y)} \cdot \underbrace{\frac{1}{\sigma_Y\sqrt{2\pi}} \exp\left( -\frac{y^2}{2\sigma_Y^2} \right)}_{f_Y(y)}.    (5-33)

From this factorization, we observe that

    f_X(x \mid y) = \frac{1}{\sigma_X\sqrt{2\pi(1-r^2)}} \exp\left( -\frac{(x - r\frac{\sigma_X}{\sigma_Y}y)^2}{2\sigma_X^2(1-r^2)} \right).    (5-34)

Note that this conditional density is Gaussian! This conclusion leads to

    \eta_{X|y} = r\frac{\sigma_X}{\sigma_Y} y, \qquad \sigma_{X|y}^2 = \sigma_X^2(1 - r^2)    (5-35)

as the conditional mean and variance, respectively.

The variance σ_X² of a random variable X is a measure of uncertainty in the value of X. If σ_X² is small, it is highly likely that X will be found near its mean. The conditional variance σ²_X|y is a measure of uncertainty in the value of X given that Y = y. From (5-35), note that σ²_X|y → 0 as |r| → 1. As perfect correlation is approached, it becomes more likely that X will be found near its conditional mean η_X|y.

Example (5-5): Generalize the previous example to the non-zero mean case. Consider X and Y the same as above except for E[X] = η_X and E[Y] = η_Y. Now, define zero-mean Gaussian variables X_d and Y_d so that X = X_d + η_X, Y = Y_d + η_Y and

    f_{XY}(x,y) = f_{X_d Y_d}(x - \eta_X, y - \eta_Y)
                = \frac{1}{\sigma_X\sqrt{2\pi(1-r^2)}} \exp\left( -\frac{(x - \eta_X - r\frac{\sigma_X}{\sigma_Y}(y - \eta_Y))^2}{2\sigma_X^2(1-r^2)} \right) \cdot \frac{1}{\sigma_Y\sqrt{2\pi}} \exp\left( -\frac{(y - \eta_Y)^2}{2\sigma_Y^2} \right).    (5-36)

By Bayes' rule for density functions, it is easily seen that

    f_X(x \mid y) = \frac{1}{\sigma_X\sqrt{2\pi(1-r^2)}} \exp\left( -\frac{(x - \eta_X - r\frac{\sigma_X}{\sigma_Y}(y - \eta_Y))^2}{2\sigma_X^2(1-r^2)} \right).    (5-37)

Hence, the conditional mean and variance are

    \eta_{X|y} = \eta_X + r\frac{\sigma_X}{\sigma_Y}(y - \eta_Y), \qquad \sigma_{X|y}^2 = \sigma_X^2(1 - r^2),    (5-38)

respectively, for the case where X and Y are themselves nonzero mean. Note that (5-38) follows directly from (5-35), since

    E[X \mid Y = y] = E[X_d + \eta_X \mid Y_d + \eta_Y = y] = E[X_d \mid Y_d = y - \eta_Y] + \eta_X = r\frac{\sigma_X}{\sigma_Y}(y - \eta_Y) + \eta_X.    (5-39)
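Equation (5-38) is easy to verify by simulation. The sketch below (assumed parameter values; conditioning on Y = y₀ is approximated by keeping samples with Y inside a narrow window around y₀) compares the conditional sample mean and variance of X against η_X|y and σ²_X|y from (5-38).

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed parameters for a nonzero-mean jointly Gaussian pair (X, Y).
eta_x, eta_y = 2.0, -1.0
sx, sy, r = 1.0, 2.0, 0.8
cov = [[sx**2, r * sx * sy], [r * sx * sy, sy**2]]
x, y = rng.multivariate_normal([eta_x, eta_y], cov, size=1_000_000).T

# Condition on Y ~= y0 by keeping samples with Y in a narrow window.
y0, half_width = 0.5, 0.02
sel = np.abs(y - y0) < half_width
x_cond = x[sel]

print("conditional mean (samples):", x_cond.mean())
print("eq. (5-38) eta_X|y        :", eta_x + r * (sx / sy) * (y0 - eta_y))
print("conditional var  (samples):", x_cond.var())
print("eq. (5-38) sigma^2_X|y    :", sx**2 * (1 - r**2))
```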

Conditional Expected Value as a Transformation for a Random Variable

Let X and Y denote random variables. The conditional mean of random variable Y, given that X = x, is an "ordinary" function ϕ(x) of x. That is,

    \varphi(x) = E[Y \mid X = x] = E[Y \mid x] = \int y f_Y(y \mid x) \, dy.    (5-40)

In general, the function ϕ(x) can be plotted, integrated, differentiated, etc.; it is an "ordinary" function of x. For example, as we have just seen, if X and Y are jointly Gaussian, we know that

    \varphi(x) = E[Y \mid X = x] = \eta_Y + r\frac{\sigma_Y}{\sigma_X}(x - \eta_X),    (5-41)

a simple linear function of x.

Use ϕ(x) to transform random variable X. Now, ϕ(X) = E[Y|X] is a random variable. Be very careful with the notation: the random variable E[Y|X] is different from the function E[Y | X = x] ≡ E[Y|x] (note that E[Y | X = x] and E[Y|x] are used interchangeably). Find the expected value E[ϕ(X)] = E[E[Y|X]] of the random variable ϕ(X). In the usual way, we start this task by writing

    E[E[Y \mid X]] = \int E[Y \mid x] f_X(x) \, dx = \int \left[ \int y f_Y(y \mid x) \, dy \right] f_X(x) \, dx.    (5-42)

Now, since f_XY(x,y) = f_Y(y|x) f_X(x), we have

    E[E[Y \mid X]] = \iint y f_Y(y \mid x) f_X(x) \, dx \, dy = \iint y f_{XY}(x,y) \, dx \, dy = \int y f_Y(y) \, dy.    (5-43)

From this, we conclude that

    E[Y] = E[E[Y \mid X]].    (5-44)

The inner conditional expectation is conditioned on X; the outer expectation is over X. To emphasize this fact, the notation E_X[E[Y|X]] ≡ E[E[Y|X]] is sometimes used in the literature.

Example (5-6): Two fair dice are tossed until the combination 1 and 1 ("snake eyes") appears. Determine the average (i.e., expected) number of tosses required to hit snake eyes. To solve this problem, define the random variables

    1) N = number of tosses needed to hit snake eyes for the first time;
    2) H = 1 if snake eyes is hit on the first roll, and H = 0 if snake eyes is not hit on the first roll.

Note that H takes on only two values, with P[H = 1] = 1/36 and P[H = 0] = 35/36. Now, we can compute the average E[N] = E[E[N|H]], where the inner expectation is conditioned on H, and the outer expectation is an average over H. We write

    E[N] = E[E[N \mid H]] = E[N \mid H = 1]\,P[H = 1] + E[N \mid H = 0]\,P[H = 0].

Now, if H = 0, then snake eyes was not hit on the first toss, and the game starts over (at the second toss) with an average of E[N] additional tosses still required to hit snake eyes. Hence, E[N | H = 0] = 1 + E[N]. On the other hand, if H = 1, snake eyes was hit on the first roll, so E[N | H = 1] = 1. These two observations produce

    E[N] = E[N \mid H = 1]\,P[H = 1] + E[N \mid H = 0]\,P[H = 0] = 1 \cdot \frac{1}{36} + (1 + E[N]) \cdot \frac{35}{36} = \frac{35}{36}E[N] + 1,

and the conclusion E[N] = 36.
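The E[N] = 36 answer can also be checked by direct simulation of the game (an added illustration; the number of simulated games is an arbitrary choice).

```python
import numpy as np

rng = np.random.default_rng(2)

n_games = 20_000              # assumed number of simulated games
tosses_per_game = np.empty(n_games, dtype=np.int64)

for i in range(n_games):
    tosses = 0
    while True:
        tosses += 1
        d1, d2 = rng.integers(1, 7, size=2)   # two fair dice, faces 1..6
        if d1 == 1 and d2 == 1:               # "snake eyes"
            break
    tosses_per_game[i] = tosses

print("simulated E[N]:", tosses_per_game.mean(), "  (theory: 36)")
```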

Generalizations

This basic concept can be generalized. Again, X and Y denote random variables, and g(x,y) denotes a function of the algebraic variables x and y. The conditional mean

    \varphi(x) = E[g(X,Y) \mid X = x] = E[g(x,Y) \mid X = x] = \int g(x,y) f_Y(y \mid x) \, dy    (5-45)

is an "ordinary" function of the real variable x. Now, ϕ(X) = E[g(X,Y)|X] is a transformation of random variable X (again, be careful: E[g(X,Y)|X] is a random variable, and E[g(X,Y) | X = x] = E[g(x,Y)|x] = ϕ(x) is a function of x). We are interested in the expected value E[ϕ(X)] = E[E[g(X,Y)|X]], so we write

    E[\varphi(X)] = E\big[E[g(X,Y) \mid X]\big] = \int f_X(x) \left[ \int g(x,y) f_Y(y \mid x) \, dy \right] dx
                  = \iint g(x,y) f_Y(y \mid x) f_X(x) \, dy \, dx = \iint g(x,y) f_{XY}(x,y) \, dy \, dx = E[g(X,Y)],    (5-46)

where we have used f_XY(x,y) = f_Y(y|x) f_X(x), Bayes' law of densities. Hence, we conclude that

    E[g(X,Y)] = E\big[E[g(X,Y) \mid X]\big] = E_X\big[E[g(X,Y) \mid X]\big].    (5-47)

In this last equality, the inner conditional expectation is used to transform X; the outer expectation is over X.

Example (5-7): Let X and Y be jointly Gaussian with E[X] = E[Y] = 0, Var[X] = σ_X², Var[Y] = σ_Y² and correlation coefficient r. Find the conditional second moment E[X² | Y = y] ≡ E[X² | y]. First, note that

    Var[X \mid y] = E[X^2 \mid y] - \left( E[X \mid y] \right)^2.    (5-48)

Using the conditional mean and variance given by (5-35), we write

    E[X^2 \mid y] = Var[X \mid y] + \left( E[X \mid y] \right)^2 = \sigma_X^2(1 - r^2) + \left( r\frac{\sigma_X}{\sigma_Y} y \right)^2.    (5-49)

Example (5-8): Let X and Y be jointly Gaussian with E[X] = E[Y] = 0, Var[X] = σ_X², Var[Y] = σ_Y² and correlation coefficient r. Find

    E[XY] = E_Y[\varphi(Y)],    (5-50)

where

    \varphi(y) = E[XY \mid Y = y] = E[Xy \mid Y = y] = r\frac{\sigma_X}{\sigma_Y} y^2.    (5-51)

To accomplish this, substitute (5-51) into (5-50) to obtain

    E[XY] = E_Y[\varphi(Y)] = r\frac{\sigma_X}{\sigma_Y} E_Y[Y^2] = r\frac{\sigma_X}{\sigma_Y}\sigma_Y^2 = r\sigma_X\sigma_Y.    (5-52)

Application of Conditional Expectation: Bayesian Estimation

Let θ denote an unknown DC voltage (for example, the output of a thermocouple, strain gauge, etc.). We are trying to measure θ. Unfortunately, the measurement is obscured by additive noise n(t). At time t = T, we take a single sample of θ and noise; this sample is called z = θ + n(T). We model the noise sample n(T) as a random variable with known density f_n(n) (we

have abused the symbol n by using it simultaneously to denote a random quantity and an algebraic variable; such abuses are common in the literature). We model the unknown θ as a random variable with density f_θ(θ). Density f_θ(θ) is called the a priori density of θ, and it is known. In most cases, random variables θ and n(T) are independent, but this is not an absolute requirement (the independence assumption simplifies the analysis). Figure 5-4 depicts a block diagram that illustrates the generation of the voltage sample z.

[Figure 5-4: Noisy measurement of a DC voltage. The unknown voltage θ and additive noise n(t) are summed, and the sum is sampled at t = T to produce z = θ + n(T).]

From context in the discussion given below (and in the literature), the reader should be able to discern the current usage of the symbol z; he/she should be able to tell whether z denotes a random variable or a realization of a random variable (a particular sample outcome). Here (as is often the case in the literature), there is no need to use Z to denote the random variable and z to denote a particular value (sample outcome or realization) of the random variable.

We desire to use the measurement z to estimate the voltage θ. We need to develop an estimator that will take our measurement sample value z and give us an estimate θ̂(z) of the actual value of θ. Of course, there is some difference between the estimate θ̂ and the true value of θ; that is, there is an error voltage θ̃(z) ≡ θ̂(z) - θ. Finally, making errors costs us. C(θ̃(z)) denotes the cost incurred by using measurement z to estimate voltage θ; C is a known cost function.

The values of z and C(θ̃(z)) change from one sample to the next; they can be interpreted as random variables, as described above. Hence, it makes no sense to develop an estimator θ̂ that minimizes C(θ̃(z)). But it does make sense to choose/design/develop θ̂ with the goal of

minimizing E[C(θ̃(z))] = E[C(θ̂(z) - θ)], the expected or average cost associated with the estimation process. It is important to note that we are performing an ensemble average over all possible z and θ (random variables that we average over when computing E[C(θ̂(z) - θ)]). The estimator, denoted here as θ̂_b, that minimizes this average cost is called the Bayesian estimator. That is, the Bayesian estimator θ̂_b satisfies

    E[C(\hat{\theta}_b(z) - \theta)] \le E[C(\hat{\theta}(z) - \theta)] \quad \text{for all } \hat{\theta} \ne \hat{\theta}_b.    (5-53)

(θ̂_b is the "best" estimator. On the average, you "pay more" if you use any other estimator.)

Important Special Case: Mean Square Cost Function C(θ̃) = θ̃²

Let's use the squared-error cost function C(θ̃) = θ̃². Then, when estimator θ̂ is used, the average cost per decision is

    E[\tilde{\theta}^2] = \iint \left( \theta - \hat{\theta}(z) \right)^2 f_{\theta z}(\theta, z) \, d\theta \, dz = \int \left[ \int \left( \theta - \hat{\theta}(z) \right)^2 f(\theta \mid z) \, d\theta \right] f_z(z) \, dz.    (5-54)

For the outer integral of the last double integral, the integrand is a non-negative function of z. Hence, the average cost E[θ̃²] will be minimized if, for every value of z, we pick θ̂(z) to minimize the non-negative inner integral

    \int \left( \theta - \hat{\theta}(z) \right)^2 f(\theta \mid z) \, d\theta.    (5-55)

With respect to θ̂, differentiate this last integral, set the result to zero, and get

    \int \left( \theta - \hat{\theta}(z) \right) f(\theta \mid z) \, d\theta = 0.    (5-56)
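To see (5-54)-(5-56) in action, the sketch below discretizes an assumed posterior f(θ|z) (a uniform prior on θ pushed through a Gaussian likelihood, with all numerical values chosen only for illustration), evaluates the inner integral (5-55) on a grid of candidate estimates θ̂, and locates the minimizer; it lands on the posterior mean, which is exactly the conditional mean estimator derived next.

```python
import numpy as np

# Discretized posterior f(theta | z) for one fixed measurement z.
# Assumed example: uniform prior on [0, 4] for theta, Gaussian noise of
# standard deviation 0.5, observed z = 3.2 (all values illustrative).
theta = np.linspace(-2.0, 6.0, 4001)
dtheta = theta[1] - theta[0]
prior = np.where((theta >= 0.0) & (theta <= 4.0), 0.25, 0.0)
z, sigma = 3.2, 0.5
likelihood = np.exp(-(z - theta) ** 2 / (2 * sigma**2))
posterior = prior * likelihood
posterior /= np.sum(posterior) * dtheta          # normalize f(theta | z)

# Inner integral (5-55) as a function of the candidate estimate theta_hat.
candidates = np.linspace(0.0, 4.0, 2001)
cost = [np.sum((theta - th) ** 2 * posterior) * dtheta for th in candidates]

best = candidates[np.argmin(cost)]
posterior_mean = np.sum(theta * posterior) * dtheta
print("minimizer of (5-55):      ", best)
print("posterior mean E[theta|z]:", posterior_mean)
```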

Finally, solve this last result for the Bayesian estimator

    \hat{\theta}_b(z) = \int \theta f(\theta \mid z) \, d\theta = E[\theta \mid z].    (5-57)

That is, for the mean-square cost function, the Bayesian estimator is the mean of θ conditioned on the data z. Sometimes, we call (5-57) the conditional mean estimator.

As outlined above, we make a measurement and get a specific numerical value for z (i.e., we may interpret the numerical z as a specific realization of a random variable). This measured value can be used in (5-57) to obtain a numerical estimate of θ. On the other hand, suppose that we are interested in the average performance of our estimator (averaged over all possible measurements and all possible values of θ). Then, as discussed below, we treat z as a random variable and average θ̃²(z) = {θ̂_b(z) - θ}² over all possible measurements (values of z) and all possible values of θ; that is, we compute the variance of the estimation error. In doing this, we treat z as a random variable. However, we use the same symbol z regardless of the interpretation and use of (5-57). From context, we must determine whether z is being used to denote a random variable or a specific measurement (that is, a realization of a random variable).

Alternative Expression for θ̂_b

The conditional mean estimator can be expressed in a more convenient fashion. First, use Bayes' rule for densities (here, we interpret z as a random variable)

    f(\theta \mid z) = \frac{f(z \mid \theta) f_\theta(\theta)}{f_z(z)}    (5-58)

in the estimator formula (5-57) to obtain

    \hat{\theta}_b(z) = \int \theta \frac{f(z \mid \theta) f_\theta(\theta)}{f_z(z)} \, d\theta = \frac{\int \theta f(z \mid \theta) f_\theta(\theta) \, d\theta}{f_z(z)} = \frac{\int \theta f(z \mid \theta) f_\theta(\theta) \, d\theta}{\int f(z \mid \theta) f_\theta(\theta) \, d\theta},    (5-59)

a formulation that is used in applications.

Mean and Variance of the Estimation Error

For the conditional mean estimator, the estimation error is

    \tilde{\theta} = \theta - \hat{\theta}_b = \theta - E[\theta \mid z].    (5-60)

The mean value of θ̃ (averaged over all θ and all possible measurements z) is

    E[\tilde{\theta}] = E[\theta - \hat{\theta}_b] = E\big[\theta - E[\theta \mid z]\big] = E[\theta] - E\big[E[\theta \mid z]\big] = E[\theta] - E[\theta] = 0.    (5-61)

Equivalently, E[θ̂_b] = E[θ]; because of this, we say that θ̂_b is an unbiased estimator. Since E[θ̃] = 0, the variance of the estimation error is

    VAR[\tilde{\theta}] = E[\tilde{\theta}^2] = \iint \left( \theta - E[\theta \mid z] \right)^2 f(\theta, z) \, d\theta \, dz,    (5-62)

where f(θ,z) is the joint density that describes θ and z. We want VAR[θ̃] < VAR[θ]; otherwise, our estimator is of little value, since we could simply use E[θ] to estimate θ. In general, VAR[θ̃] is a measure of estimator performance.

Example (5-9): Bayesian Estimator for the Single-Sample Gaussian Case

Suppose that θ is N(θ₀, σ₀²) and n(T) is N(0, σ²). Also, assume that θ and n are independent. Find the conditional mean (Bayesian) estimator θ̂_b. First, when interpreted as a random variable, z = θ + n(T) is Gaussian with mean θ₀ and variance σ₀² + σ². Hence, from the conditional mean formula (5-38) for the Gaussian case, we have

    \hat{\theta}_b(z) = E[\theta \mid z] = \theta_0 + r_{\theta z}\frac{\sigma_0}{\sqrt{\sigma_0^2 + \sigma^2}}(z - \theta_0),    (5-63)

where r_θz is the correlation coefficient between θ and z. Now, we must find r_θz. Observe that

    r_{\theta z} = \frac{E[(\theta - \theta_0)(z - \theta_0)]}{\sigma_0\sqrt{\sigma_0^2 + \sigma^2}} = \frac{E[(\theta - \theta_0)([\theta - \theta_0] + n(T))]}{\sigma_0\sqrt{\sigma_0^2 + \sigma^2}} = \frac{E[(\theta - \theta_0)^2 + (\theta - \theta_0)n(T)]}{\sigma_0\sqrt{\sigma_0^2 + \sigma^2}} = \frac{E[(\theta - \theta_0)^2]}{\sigma_0\sqrt{\sigma_0^2 + \sigma^2}} = \frac{\sigma_0}{\sqrt{\sigma_0^2 + \sigma^2}},    (5-64)

since θ and n(T) are independent. Hence, the Bayesian estimator is

    \hat{\theta}_b(z) = \theta_0 + \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}(z - \theta_0).    (5-65)

The error is θ̃ = θ - θ̂_b, and E[θ̃] = 0, as shown by (5-61). That is, θ̂_b is an unbiased estimator, since its expected value is the mean of the quantity being estimated. The variance of θ̃ is

    VAR[\tilde{\theta}] = E[(\theta - \hat{\theta}_b)^2] = E\left[ \left( (\theta - \theta_0) - \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}(z - \theta_0) \right)^2 \right]
                        = E[(\theta - \theta_0)^2] - 2\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} E[(\theta - \theta_0)(z - \theta_0)] + \left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \right)^2 E[(z - \theta_0)^2].    (5-66)

Due to independence, we have

    E[(\theta - \theta_0)(z - \theta_0)] = E[(\theta - \theta_0)(\theta - \theta_0 + n(T))] = E[(\theta - \theta_0)(\theta - \theta_0)] = \sigma_0^2    (5-67)

    E[(z - \theta_0)^2] = E[(\theta - \theta_0 + n(T))^2] = \sigma_0^2 + \sigma^2.    (5-68)

Now, use (5-67) and (5-68) in (5-66) to obtain

    VAR[\tilde{\theta}] = \sigma_0^2 - 2\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2}\sigma_0^2 + \left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \right)^2\left[ \sigma_0^2 + \sigma^2 \right] = \sigma_0^2\frac{\sigma^2}{\sigma_0^2 + \sigma^2}.    (5-69)

As expected, the variance of the error θ̃ approaches zero as the noise average power (i.e., the variance) σ² → 0. On the other hand, as σ² → ∞, we have VAR[θ̃] → σ₀² (this is the noise-dominated case). As can be seen from (5-69), for all values of σ², we have VAR[θ̃] < VAR[θ] = σ₀², which means that θ̂_b will always outperform the simple approach of selecting the mean E[θ] = θ₀ as the estimate of θ.
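The closed-form results (5-65) and (5-69) can be checked by simulation. In the sketch below (assumed values for θ₀, σ₀, and σ), many independent (θ, z) pairs are drawn, the estimator (5-65) is applied to each, and the empirical error variance is compared with (5-69).

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed prior and noise parameters (illustrative values).
theta0, sigma0 = 1.0, 2.0     # theta ~ N(theta0, sigma0^2)
sigma = 1.5                   # n(T) ~ N(0, sigma^2), independent of theta

trials = 500_000
theta = rng.normal(theta0, sigma0, trials)
z = theta + rng.normal(0.0, sigma, trials)

# Bayesian (conditional mean) estimator, eq. (5-65).
theta_hat = theta0 + (sigma0**2 / (sigma0**2 + sigma**2)) * (z - theta0)
err = theta - theta_hat

print("mean of error (should be ~0):", err.mean())
print("VAR[error] (simulation):", err.var())
print("VAR[error] (eq. 5-69)  :", sigma0**2 * sigma**2 / (sigma0**2 + sigma**2))
print("prior VAR[theta]       :", sigma0**2)
```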

Example (5-10): Bayesian Estimator for the Multiple-Sample Gaussian Case

As given by (5-69), the variance (i.e., the uncertainty) of θ̂_b may be too large for some applications. We can use a sample mean (involving multiple samples) in the Bayesian estimator to lower its variance. Take multiple samples z(t_k) = θ + n(t_k), 1 ≤ k ≤ N (t_k, 1 ≤ k ≤ N, denote the times at which the samples are taken). Assume that the t_k are far enough apart in time that n(t_k) and n(t_j) are independent for t_k ≠ t_j (for example, this would be the case if the time intervals between samples are large compared to the reciprocal of the bandwidth of the noise n(t)). Define the sample mean of the collected data as

    \bar{z} = \frac{1}{N}\sum_{k=1}^{N} z(t_k) = \theta + \bar{n},    (5-70)

where

    \bar{n} = \frac{1}{N}\sum_{k=1}^{N} n(t_k)    (5-71)

is the sample mean of the noise. The quantity n̄ is Gaussian with mean E[n̄] = 0; due to independence, the variance is

    VAR[\bar{n}] = \frac{1}{N^2}\sum_{k=1}^{N} VAR[n(t_k)] = \frac{\sigma^2}{N}.    (5-72)

Note that z̄ = θ + n̄ has the same form regardless of the number of samples N. Hence, based on the data z̄, the Bayesian estimator for θ has the same form regardless of the number of samples. We can adapt (5-65) and write

    \hat{\theta}_b(\bar{z}) = \theta_0 + \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2/N}(\bar{z} - \theta_0).    (5-73)

That is, in the Bayesian estimator formula, use the sample mean z̄ instead of the single sample z. Adapt (5-69) to the multiple-sample case and write the variance of the error θ̃ = θ - θ̂_b as

    VAR[\tilde{\theta}] = \sigma_0^2\frac{\sigma^2/N}{\sigma_0^2 + \sigma^2/N}.    (5-74)

By making the number N of averaged samples large enough, we can average out the noise and make (5-74) arbitrarily small.
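The same experiment extends to the multiple-sample estimator (5-73); the sketch below (same assumed prior and noise parameters as the single-sample check) shows the empirical error variance tracking (5-74) and shrinking as N grows.

```python
import numpy as np

rng = np.random.default_rng(6)

# Same assumed prior/noise as the single-sample sketch, now with N samples
# averaged before estimation, as in (5-70)-(5-74).
theta0, sigma0, sigma = 1.0, 2.0, 1.5
trials = 200_000

for N in (1, 4, 16, 64):
    theta = rng.normal(theta0, sigma0, trials)
    noise = rng.normal(0.0, sigma, (trials, N))
    z_bar = theta + noise.mean(axis=1)                                    # eq. (5-70)
    theta_hat = theta0 + (sigma0**2 / (sigma0**2 + sigma**2 / N)) * (z_bar - theta0)  # eq. (5-73)
    var_sim = np.var(theta - theta_hat)
    var_theory = sigma0**2 * (sigma**2 / N) / (sigma0**2 + sigma**2 / N)  # eq. (5-74)
    print(f"N = {N:3d}   VAR[error] sim = {var_sim:.4f}   eq. (5-74) = {var_theory:.4f}")
```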

Conditional Multidimensional Gaussian Density

Let X be an n×1 Gaussian random vector with E[X] = 0 and a positive definite n×n covariance matrix Λ_X. Likewise, define Y as a zero-mean, m×1 Gaussian random vector with m×m positive definite covariance matrix Λ_Y. Also, define the n×m matrix Λ_XY = E[X Y^T]; note that Λ_YX = Λ_XY^T = E[Y X^T], an m×n matrix. Find the conditional density f(X|Y). First, define the (n+m)×1 super vector

    Z = \begin{bmatrix} X \\ Y \end{bmatrix},    (5-75)

which is obtained by stacking X on top of Y. The (n+m)×(n+m) covariance matrix for Z is

    \Lambda_Z = E[Z Z^T] = E\begin{bmatrix} X X^T & X Y^T \\ Y X^T & Y Y^T \end{bmatrix} = \begin{bmatrix} \Lambda_X & \Lambda_{XY} \\ \Lambda_{YX} & \Lambda_Y \end{bmatrix}.    (5-76)

The inverse of this matrix can be expressed as (observe that Λ_Z Λ_Z^{-1} = I)

    \Lambda_Z^{-1} = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix},    (5-77)

where A is n×n, B is n×m, and C is m×m. These intermediate block matrices are given by

    A = \left( \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \right)^{-1} = \Lambda_X^{-1}\left[ I + \Lambda_{XY} C \Lambda_{YX}\Lambda_X^{-1} \right]
    B = -A\Lambda_{XY}\Lambda_Y^{-1} = -\Lambda_X^{-1}\Lambda_{XY} C    (5-78)
    C = \left( \Lambda_Y - \Lambda_{YX}\Lambda_X^{-1}\Lambda_{XY} \right)^{-1} = \Lambda_Y^{-1}\left[ I + \Lambda_{YX} A \Lambda_{XY}\Lambda_Y^{-1} \right]

Now, the joint density is

    f(X, Y) = \frac{1}{(2\pi)^{(n+m)/2}\left| \Lambda_Z \right|^{1/2}} \exp\left\{ -\frac{1}{2}\begin{bmatrix} X^T & Y^T \end{bmatrix}\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} \right\}.    (5-79)

The marginal density is

    f(Y) = \frac{1}{(2\pi)^{m/2}\left| \Lambda_Y \right|^{1/2}} \exp\left\{ -\frac{1}{2} Y^T \Lambda_Y^{-1} Y \right\}.    (5-80)

From Bayes' Theorem for densities,

    f(X \mid Y) = \frac{f(X, Y)}{f(Y)} = \frac{1}{(2\pi)^{n/2}\left( \left| \Lambda_Z \right| / \left| \Lambda_Y \right| \right)^{1/2}} \exp\left\{ -\frac{1}{2}\left( \begin{bmatrix} X^T & Y^T \end{bmatrix}\begin{bmatrix} A & B \\ B^T & C \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} - Y^T\Lambda_Y^{-1}Y \right) \right\}.    (5-81)

However, straightforward but tedious matrix algebra yields

    \begin{bmatrix} X^T & Y^T \end{bmatrix}\begin{bmatrix} A & B \\ B^T & C - \Lambda_Y^{-1} \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} = X^T A X + X^T B Y + Y^T B^T X + Y^T(C - \Lambda_Y^{-1})Y
        = X^T[A X + B Y] + Y^T[B^T X + (C - \Lambda_Y^{-1})Y]
        = X^T A X + 2 X^T B Y + Y^T\left[ C - \Lambda_Y^{-1} \right]Y    (5-82)

(Note that the scalar identity Y^T B^T X = X^T B Y was used in obtaining this result.) Use the results from (5-78), B = -AΛ_XYΛ_Y^{-1} and C - Λ_Y^{-1} = Λ_Y^{-1}Λ_YX A Λ_XYΛ_Y^{-1}, to write

    \begin{bmatrix} X^T & Y^T \end{bmatrix}\begin{bmatrix} A & B \\ B^T & C - \Lambda_Y^{-1} \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} = X^T A X - 2 X^T A \Lambda_{XY}\Lambda_Y^{-1} Y + Y^T \Lambda_Y^{-1}\Lambda_{YX} A \Lambda_{XY}\Lambda_Y^{-1} Y.    (5-83)

To simplify the notation, define

    M \equiv \Lambda_{XY}\Lambda_Y^{-1} Y \quad \text{(an } n \times 1 \text{ vector)}
    Q \equiv A^{-1} = \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \quad \text{(an } n \times n \text{ matrix)}    (5-84)

so that the quadratic form becomes

    \begin{bmatrix} X^T & Y^T \end{bmatrix}\begin{bmatrix} A & B \\ B^T & C - \Lambda_Y^{-1} \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} = (X - M)^T Q^{-1} (X - M).    (5-85)

Now, we must find the quotient |Λ_Z| / |Λ_Y|. Write

    \Lambda_Z = \begin{bmatrix} \Lambda_X & \Lambda_{XY} \\ \Lambda_{YX} & \Lambda_Y \end{bmatrix} = \begin{bmatrix} \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} & \Lambda_{XY} \\ 0 & \Lambda_Y \end{bmatrix}\begin{bmatrix} I_n & 0 \\ \Lambda_Y^{-1}\Lambda_{YX} & I_m \end{bmatrix},    (5-86)

where I_m is the m×m identity matrix and I_n is the n×n identity matrix. Hence,

    \left| \Lambda_Z \right| = \left| \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \right|\left| \Lambda_Y \right|    (5-87)

    \frac{\left| \Lambda_Z \right|}{\left| \Lambda_Y \right|} = \left| \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \right| = \left| Q \right|.    (5-88)

Use Equations (5-85) and (5-88) in f(X|Y) to obtain

    f(X \mid Y) = \frac{1}{(2\pi)^{n/2}\left| Q \right|^{1/2}} \exp\left\{ -\frac{1}{2}(X - M)^T Q^{-1} (X - M) \right\},    (5-89)

where

    M \equiv \Lambda_{XY}\Lambda_Y^{-1} Y \quad \text{(an } n \times 1 \text{ vector)}
    Q \equiv A^{-1} = \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \quad \text{(an } n \times n \text{ matrix)}.    (5-90)

Vector M = E[X|Y] is the conditional expectation vector. Matrix Q = E[(X - M)(X - M)^T] is the conditional covariance matrix.

Generalizations to the Nonzero Mean Case

Suppose E[X] = M_X and E[Y] = M_Y. Then

    f(X \mid Y) = \frac{1}{(2\pi)^{n/2}\left| Q \right|^{1/2}} \exp\left\{ -\frac{1}{2}(X - M)^T Q^{-1} (X - M) \right\},    (5-91)

where

    M \equiv E[X \mid Y] = M_X + \Lambda_{XY}\Lambda_Y^{-1}(Y - M_Y) \quad \text{(an } n \times 1 \text{ vector)}
    Q \equiv E[(X - M)(X - M)^T] = \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \quad \text{(an } n \times n \text{ matrix)}.    (5-92)
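The conditional mean and covariance in (5-92) are straightforward to evaluate numerically. The sketch below (block sizes n = 2, m = 1 and all covariance entries are assumed, illustrative values) computes M and Q from the partitioned covariance matrix and then cross-checks them against a Monte Carlo estimate obtained by conditioning joint samples on Y near a chosen value.

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed sizes: X is n x 1 with n = 2, Y is m x 1 with m = 1.
# Assumed covariance blocks (any consistent positive definite choice works).
L_X  = np.array([[2.0, 0.6],
                 [0.6, 1.5]])          # Lambda_X  (n x n)
L_XY = np.array([[0.8],
                 [0.4]])               # Lambda_XY (n x m)
L_Y  = np.array([[1.2]])               # Lambda_Y  (m x m)
M_X  = np.array([1.0, -1.0])           # E[X]
M_Y  = np.array([0.5])                 # E[Y]

# Conditional mean and covariance from eq. (5-92), evaluated at Y = y_star.
y_star = np.array([1.3])
L_Y_inv = np.linalg.inv(L_Y)
M_cond = M_X + L_XY @ L_Y_inv @ (y_star - M_Y)
Q = L_X - L_XY @ L_Y_inv @ L_XY.T

# Monte Carlo check: sample Z = [X; Y] and condition on Y ~= y_star.
L_Z = np.block([[L_X, L_XY], [L_XY.T, L_Y]])
samples = rng.multivariate_normal(np.concatenate([M_X, M_Y]), L_Z, size=1_000_000)
sel = np.abs(samples[:, 2] - y_star[0]) < 0.02
x_cond = samples[sel, :2]

print("eq. (5-92) conditional mean:", M_cond)
print("sample conditional mean    :", x_cond.mean(axis=0))
print("eq. (5-92) conditional covariance:\n", Q)
print("sample conditional covariance:\n", np.cov(x_cond, rowvar=False))
```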