Module 2. Random Processes. Version 2, ECE IIT, Kharagpur


Lesson 9: Introduction to Statistical Signal Processing

After reading this lesson, you will learn about:
- Hypothesis testing
- Unbiased estimation based on minimum variance
- Mean Square Error (MSE)
- Cramer-Rao Lower Bound (CRLB)

As mentioned in Module #1, demodulating the received signal and taking the best possible decisions about the transmitted symbols are key operations in a digital communication receiver. We will discuss various modulation and demodulation schemes in subsequent modules (Modules #4 and #5). However, a considerable amount of generalization is possible at times and hence it is very useful to have an overview of the underlying principles of statistical signal processing. For example, the process of signal observation and decision making is better appreciated if the reader has an overview of what is known as hypothesis testing in detection theory. In the following, we present a brief treatise on some fundamental concepts and principles of statistical signal processing.

Hypothesis Testing

A hypothesis is a statement of a possible source of an observation. The observation in a statistical hypothesis test is a random variable. The process of deciding, from an observation, which hypothesis is true (or correct) is known as hypothesis testing.

Ex. 2.9.1: Suppose it is known that a modulator generates either a pulse P1 or a pulse P2 over a time interval of T seconds, and that r is received in the corresponding observation interval of T sec. Then the two hypotheses of interest may be:

H1: P1 is transmitted.
H2: P2 is transmitted.

Note that "P1 is not transmitted" is equivalent to "P2 is transmitted" and vice versa, as it is definitely known that either P1 or P2 has been transmitted.

Let us define a parameter (random variable) u which is generated in the demodulator as a response to the received signal r over the observation interval. The parameter u, being a function of an underlying random process, is defined as a random variable (a single measurement over T) and is called an observation in the context of hypothesis testing. An observation is also known as a decision variable in the jargon of digital communications. The domain (range of values) of u is known as the observation space; in the above example it is the one-dimensional real number axis. The relevant hypothesis testing problem is to decide whether H1 or H2 is correct upon observing u.

Next, to make decisions, the whole observation space is divided into appropriate decision regions so as to associate the possible decisions from an observation with these regions. Note that the observation space need not be the same as the range of the pulses P1 and P2; however, decision regions can always be identified for the purpose of decision making. If a decision does not match the corresponding true hypothesis, an error is said to have occurred. An important aim of a communication receiver is to identify the decision regions such that the (decision) errors are minimized.
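To make the idea of decision regions concrete, here is a minimal Python sketch of the example above. It assumes, purely for illustration, that pulse P1 maps to an observation mean of +1 and P2 to -1, with additive Gaussian noise of an assumed standard deviation; the threshold at u = 0 then splits the one-dimensional observation space into two decision regions. The numerical values are hypothetical and not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mapping of the two pulses onto the observation u (illustrative values only)
MEAN_H1, MEAN_H2 = +1.0, -1.0     # P1 -> +1, P2 -> -1
SIGMA = 0.5                       # assumed noise standard deviation

n_trials = 100_000
true_hyp = rng.choice(np.array(["H1", "H2"]), size=n_trials)
means = np.where(true_hyp == "H1", MEAN_H1, MEAN_H2)
u = means + SIGMA * rng.normal(size=n_trials)     # observation / decision variable

# Decision rule: the real axis (observation space) is split into two decision
# regions at the threshold u = 0; u >= 0 -> decide H1, u < 0 -> decide H2.
decisions = np.where(u >= 0.0, "H1", "H2")

error_rate = np.mean(decisions != true_hyp)
print(f"Empirical decision-error rate: {error_rate:.4f}")
```

Moving the threshold away from u = 0 changes the decision regions and, for this symmetric setup, increases the empirical error rate, which is exactly the trade-off the receiver designer tries to control.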

Estimation

There are occasions in the design of digital communication systems when one or more parameters of interest are to be estimated from a set of signal samples. For example, it is necessary to estimate the frequency (or the instantaneous phase) of a received carrier-modulated signal as closely as possible so that the principle of coherent demodulation may be used and the decision errors can be minimized.

Let us consider a set of random, discrete-time samples or data points, {x[0], x[1], x[2], ..., x[N-1]}, which depends (in some way) on an unknown parameter θ. The unknown parameter θ is of interest to us. Towards this, we express an estimator θ̂ as a function of the data points:

θ̂ = f(x[0], x[1], x[2], ..., x[N-1])        (2.9.1)

Obviously, the issue is to find a suitable function f(·) such that we can obtain θ from our knowledge of θ̂ as precisely as we should expect. Note that we stopped short of saying that, at the end of our estimation procedure, θ̂ will be equal to our parameter of interest θ.

Next, it is important to analyze and mathematically model the available data points, which will usually be finite in number. As the sample points are inherently random, it makes sense to model the data set in terms of some family of probability density functions (pdf), parameterized by the parameter θ: p(x[0], x[1], x[2], ..., x[N-1]; θ). This means that the family of pdfs depends on θ and that different values of θ may give rise to different pdfs. The distribution should be chosen so that the mathematical analysis for estimation is easier; a Gaussian pdf is a good choice on several occasions. In general, if the pdf of the data depends strongly on θ, the estimation process is likely to be more fruitful.

A classical estimation procedure assumes that the unknown parameter θ is deterministic, while the Bayesian approach allows the unknown parameter to be a random variable. In the Bayesian case, we say that the parameter being estimated is a realization of that random variable. If the set of sample values {x[0], x[1], x[2], ..., x[N-1]} is represented by a data vector x, a joint pdf of the following form is considered before formulating the procedure of estimation:

p(x, θ) = p(x|θ) · p(θ)        (2.9.2)

Here p(θ) is the prior pdf of θ, the knowledge of which we should have before observing the data, and p(x|θ) is the conditional pdf of x, conditioned on our knowledge of θ. With these interpretations, an estimator may now be considered as a rule that assigns a value to θ for each realization of the data vector x, and the estimate of θ is the value assigned for a specific realization of the data vector x.
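As a rough illustration of Eq. (2.9.2), the sketch below evaluates p(x|θ)·p(θ) on a grid of candidate θ values for a toy problem in which the data are Gaussian samples with unknown mean θ. The Gaussian N(0, 1) prior, the grid, and all numerical values are assumptions made only for this sketch. Normalizing the product over the grid gives a discretized posterior, from which a Bayesian estimate such as the posterior mean can be read off; the classical approach would instead treat θ as a fixed unknown and work with p(x; θ) alone.

```python
import numpy as np

rng = np.random.default_rng(1)

theta_true, sigma, N = 0.7, 1.0, 20            # assumed values for this sketch
x = theta_true + sigma * rng.normal(size=N)    # data vector x = (x[0], ..., x[N-1])

theta_grid = np.linspace(-2.0, 3.0, 1001)      # candidate values of theta

# log p(x | theta): Gaussian log-likelihood, dropping terms that do not depend on theta
log_lik = np.array([-0.5 * np.sum((x - t) ** 2) / sigma**2 for t in theta_grid])

# log p(theta): an assumed N(0, 1) prior, again up to a theta-independent constant
log_prior = -0.5 * theta_grid**2

# Joint pdf p(x, theta) = p(x | theta) p(theta), Eq. (2.9.2), up to normalization
log_joint = log_lik + log_prior
posterior = np.exp(log_joint - log_joint.max())
posterior /= np.trapz(posterior, theta_grid)   # normalize on the grid -> p(theta | x)

posterior_mean = np.trapz(theta_grid * posterior, theta_grid)
print(f"true theta = {theta_true:.3f}, posterior mean = {posterior_mean:.3f}")
```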

An important issue for any estimation problem is to assess the performance of an estimator. All such assessment is done statistically. Since the process of estimation involves multiple computations, the efficacy of an estimator is sometimes decided in a practical application by its associated computational complexity: an optimal estimator may need excessive computation, while the performance of a suboptimal estimator may be reasonably acceptable.

Unbiased Estimation based on Minimum Variance

Let us try to follow the principle of a classical estimation approach known as unbiased estimation. We expect the estimator to yield the true value of the unknown parameter on average, and the best estimator in this class will have the minimum variance of the estimation error (θ̂ - θ). Suppose we know that the unknown parameter θ lies within the interval θmin < θ < θmax. The estimator is said to be unbiased if E[θ̂] = θ for all θ in the interval θmin < θ < θmax.

A criterion for optimal estimation that is intuitive and easy to conceive is the mean square error (MSE), defined as:

MSE(θ̂) = E[(θ̂ - θ)²]        (2.9.3)

However, from practical considerations this criterion is not entirely satisfactory, as it may not be easy to formulate a good estimator (such as a minimum-MSE estimator) that can be expressed directly in terms of the data points. The MSE decomposes into a desired variance term and an undesired squared-bias term, and the bias generally makes the estimator depend on the unknown parameter θ. In summary, the intuitive mean square error (MSE) approach does not naturally lead to an optimum estimator unless the bias is removed, i.e. unless we formulate an unbiased estimator. A good approach, wherever applicable, is to constrain the bias term to zero and determine the estimator that minimizes the variance. Such an estimator is known as a minimum variance unbiased (MVU) estimator.
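The following Monte-Carlo sketch, with assumed signal and noise values, illustrates why the plain MSE criterion is awkward: a biased "shrinkage" estimator can beat the unbiased sample mean for some true values of θ and lose for others, so which estimator has lower MSE depends on the unknown θ itself. The shrinkage factor and all numbers are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, N, n_trials = 1.0, 10, 200_000
shrink = 0.8   # arbitrary shrinkage factor for the biased estimator

for theta in (0.2, 2.0):                            # two different true parameter values
    x = theta + sigma * rng.normal(size=(n_trials, N))
    theta_hat_unbiased = x.mean(axis=1)             # sample mean, E[theta_hat] = theta
    theta_hat_biased = shrink * theta_hat_unbiased  # biased toward zero, smaller variance

    for name, est in [("unbiased mean", theta_hat_unbiased),
                      ("shrunk mean  ", theta_hat_biased)]:
        bias = est.mean() - theta
        var = est.var()
        mse = np.mean((est - theta) ** 2)           # approximately var + bias**2
        print(f"theta={theta:.1f}  {name}: bias={bias:+.3f}  var={var:.4f}  MSE={mse:.4f}")
```

For θ = 0.2 the biased estimator has the smaller MSE, while for θ = 2.0 the unbiased sample mean wins, which is exactly why constraining the bias to zero and then minimizing the variance is the more workable route.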

The next relevant issue in classical parameter estimation is to find the estimator. Unfortunately, there is no single prescription for obtaining such an optimal estimator on all occasions. However, several powerful approaches are available and one has to select an appropriate method based on the situation at hand. One such approach is to determine the Cramer-Rao lower bound (CRLB) and to check whether an estimator at hand satisfies this bound. The CRLB is a bound, over the permissible range of the unknown parameter θ, such that the variance of any unbiased estimator is greater than or equal to this bound. Further, if an estimator exists whose variance equals the CRLB over the complete range of the unknown parameter θ, then that estimator is definitely a minimum variance unbiased estimator. The theory of the CRLB can be used with reasonable ease to check, for a given application, whether an estimator exists that satisfies the bound. This is important in the modeling and performance evaluation of several functional modules in a digital receiver. The CRLB theorem can be stated in multiple ways; we choose the simple form which is applicable when the unknown parameter is a scalar.

Theorem on the Cramer-Rao Lower Bound

The variance of any unbiased estimator θ̂ satisfies the following inequality, provided the assumptions stated below are valid:

var(θ̂) ≥ 1 / { -E[∂² ln p(x; θ) / ∂θ²] } = 1 / E[(∂ ln p(x; θ) / ∂θ)²]        (2.9.4)

Associated assumptions:
a) All statistical expectation operations are with respect to the pdf p(x; θ).
b) The pdf p(x; θ) satisfies the following regularity condition for all θ:

E[∂ ln p(x; θ) / ∂θ] = ∫ (∂ ln p(x; θ) / ∂θ) p(x; θ) dx = 0        (2.9.5)

Further, an unbiased estimator achieving the above bound can be found only if the following equality holds for some functions g(·) and I(·):

∂ ln p(x; θ) / ∂θ = I(θ) · (g(x) - θ)        (2.9.6)

In that case the unbiased estimator is given by θ̂ = g(x), and the resulting minimum variance of θ̂ is the reciprocal of I(θ), i.e. 1/I(θ).

It is interesting to note that I(θ) = -E[∂² ln p(x; θ) / ∂θ²] = E[(∂ ln p(x; θ) / ∂θ)²] is known as the Fisher information associated with the data vector x.
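As a concrete example of the theorem (the scenario and numbers are assumed for illustration), take N independent Gaussian samples with unknown mean θ and known variance σ². The score works out to ∂ ln p(x; θ)/∂θ = (N/σ²)(x̄ - θ), which has exactly the form of Eq. (2.9.6) with I(θ) = N/σ² and g(x) = x̄, so the sample mean attains the CRLB of σ²/N. The short sketch below checks this numerically by estimating the Fisher information as the Monte-Carlo average of the squared score and comparing the empirical variance of the sample mean against 1/I(θ).

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma, N, n_trials = 1.5, 2.0, 25, 200_000   # assumed values for this sketch

x = theta + sigma * rng.normal(size=(n_trials, N))

# Score: d/dtheta ln p(x; theta) = sum_n (x[n] - theta) / sigma^2 = (N/sigma^2)(mean(x) - theta)
score = (x - theta).sum(axis=1) / sigma**2

fisher_mc = np.mean(score**2)            # Monte-Carlo estimate of I(theta) = E[score^2]
fisher_exact = N / sigma**2              # analytic Fisher information for this model

var_sample_mean = x.mean(axis=1).var()   # empirical variance of g(x) = sample mean

print(f"Fisher information: Monte-Carlo {fisher_mc:.3f}  vs exact N/sigma^2 = {fisher_exact:.3f}")
print(f"var(sample mean)  : {var_sample_mean:.5f}  vs CRLB 1/I(theta) = {1/fisher_exact:.5f}")
```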

CRLB for Signals in White Gaussian Noise

Let us denote the time samples of a deterministic signal, which is some function of the unknown parameter θ, as y[n; θ]. Let us assume that we have received the samples after corruption by white Gaussian noise ω[n]:

x[n] = y[n; θ] + ω[n],    n = 1, 2, ..., N

We wish to determine an expression for the Cramer-Rao lower bound with the above knowledge. Note that we can express ω[n] as ω[n] = x[n] - y[n; θ]. As ω[n] follows a Gaussian probability distribution, we may write:

p(x; θ) = [1 / (2πσ²)^(N/2)] · exp[ -(1/(2σ²)) Σ_{n=1}^{N} (x[n] - y[n; θ])² ]        (2.9.7)

Taking the partial derivative of ln p(x; θ) with respect to θ gives:

∂ ln p(x; θ) / ∂θ = (1/σ²) Σ_{n=1}^{N} (x[n] - y[n; θ]) · ∂y[n; θ]/∂θ        (2.9.8)

Similarly, the second derivative leads to the following expression:

∂² ln p(x; θ) / ∂θ² = (1/σ²) Σ_{n=1}^{N} { (x[n] - y[n; θ]) · ∂²y[n; θ]/∂θ² - (∂y[n; θ]/∂θ)² }        (2.9.9)

Now, carrying out the expectation operation gives:

E[∂² ln p(x; θ) / ∂θ²] = -(1/σ²) Σ_{n=1}^{N} (∂y[n; θ]/∂θ)²        (2.9.10)

Finally, following the CRLB theorem, the bound is given by the following inequality:

var(θ̂) ≥ σ² / Σ_{n=1}^{N} (∂y[n; θ]/∂θ)²        (2.9.11)

Problems

Q.2.9.1) Justify why hypothesis testing may be useful in the study of digital communications.

Q.2.9.2) What is MSE? Explain its significance.
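As a closing numerical illustration of the bound in Eq. (2.9.11), and of the carrier-phase estimation problem mentioned at the start of the Estimation section, the sketch below assumes y[n; θ] = A·cos(2π·f0·n + θ) with known A, f0 and σ², evaluates σ²/Σ(∂y/∂θ)², and compares it with the Monte-Carlo variance of a simple grid-search maximum-likelihood phase estimate. All numerical values and the grid-search estimator are assumptions for this sketch, not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(4)
A, f0, sigma, N = 1.0, 0.08, 0.4, 64           # assumed signal and noise parameters
theta_true = 0.6                               # true phase in radians
n = np.arange(1, N + 1)

def y(theta):
    """Deterministic signal samples y[n; theta]."""
    return A * np.cos(2 * np.pi * f0 * n + theta)

# CRLB from Eq. (2.9.11): dy/dtheta = -A sin(2*pi*f0*n + theta)
dy_dtheta = -A * np.sin(2 * np.pi * f0 * n + theta_true)
crlb = sigma**2 / np.sum(dy_dtheta**2)

# Grid-search ML estimator: under white Gaussian noise, maximizing p(x; theta)
# is the same as minimizing sum_n (x[n] - y[n; theta])^2 over candidate phases.
theta_grid = np.linspace(-np.pi, np.pi, 4001)
templates = A * np.cos(2 * np.pi * f0 * n[None, :] + theta_grid[:, None])

n_trials = 2000
estimates = np.empty(n_trials)
for k in range(n_trials):
    x = y(theta_true) + sigma * rng.normal(size=N)
    sse = np.sum((x[None, :] - templates) ** 2, axis=1)
    estimates[k] = theta_grid[np.argmin(sse)]

# Up to Monte-Carlo error, the estimator variance should not fall below the bound.
print(f"CRLB for the phase      : {crlb:.5f} rad^2")
print(f"Variance of ML estimate : {estimates.var():.5f} rad^2")
```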