Technion Israel Institute of Technology, Department of Electrical Engineering
Estimation and Identification in Dynamical Systems (048825)
Lecture Notes, Fall 2009, Prof. N. Shimkin

2 Statistical Estimation: Basic Concepts

2.1 Probability

We briefly recall some basic notions and notations from probability theory that will be required in this chapter.

The Probability Space: The basic object in probability theory is the probability space (Ω, F, P), where Ω is the sample space (with sample points ω ∈ Ω), F is the sigma-field of possible events B ∈ F, and P is a probability measure, giving the probability P(B) of each possible event.

A (vector-valued) Random Variable (RV) is a mapping x : Ω → IR^n. x is also required to be measurable on (Ω, F), in the sense that x^{-1}(A) ∈ F for any open (or Borel) set A in IR^n.

In this course we shall not explicitly define the underlying probability space, but rather define the probability distributions of the RVs of interest.

Distribution and Density: For an RV x : Ω → IR^n, the (cumulative) probability distribution function (cdf) is defined as

    F_x(x) = P(x ≤ x) = P{ω : x(ω) ≤ x},   x ∈ IR^n.

The probability density function (pdf), if it exists, is given by

    p_x(x) = ∂^n F_x(x) / (∂x_1 ⋯ ∂x_n).

The RVs (x_1, ..., x_K) are independent if

    F_{x_1,...,x_K}(x_1, ..., x_K) = ∏_{k=1}^{K} F_{x_k}(x_k)

(and similarly for their densities).

Moments: The expected value (or mean) of x:

    x̄ = E(x) = ∫_{IR^n} x dF_x(x).

More generally, for a real function g on IR^n,

    E(g(x)) = ∫_{IR^n} g(x) dF_x(x).

The covariance matrices:

    cov(x) = E{(x − x̄)(x − x̄)^T},
    cov(x_1, x_2) = E{(x_1 − x̄_1)(x_2 − x̄_2)^T}.

When x is scalar, cov(x) is simply its variance. The RVs x_1 and x_2 are uncorrelated if cov(x_1, x_2) = 0.
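As a small numerical complement (an addition to the notes, not part of them), the following sketch estimates a mean vector and covariance matrix from samples; the use of NumPy and the particular distribution are assumptions chosen purely for illustration.

```python
import numpy as np

# Monte Carlo illustration of mean and covariance (all numerical values are arbitrary).
rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, size=100_000)
x2 = 2.0 * x1 + rng.normal(0.0, 0.5, size=100_000)   # correlated with x1

X = np.stack([x1, x2])        # 2 x N array of samples of the vector RV x = (x1, x2)
mean = X.mean(axis=1)         # sample estimate of E(x)
cov = np.cov(X)               # sample estimate of cov(x), a 2 x 2 matrix
print(mean)                   # close to [0, 0]
print(cov)                    # off-diagonal entry close to cov(x1, x2) = 2
```

For uncorrelated components the off-diagonal entry would be near zero.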

Gaussian RVs: A (non-degenerate) Gaussian RV x on IR^n has the density

    f_x(x) = 1 / ( (2π)^{n/2} det(Σ)^{1/2} ) · exp( −½ (x − m)^T Σ^{-1} (x − m) ).

It follows that m = E(x), Σ = cov(x). We denote x ∼ N(m, Σ).

x_1 and x_2 are jointly Gaussian if the random vector (x_1; x_2) is Gaussian. It holds that:

1. x Gaussian ⟹ all linear combinations ∑_i a_i x_i are Gaussian.
2. x Gaussian ⟹ y = Ax is Gaussian.
3. x_1, x_2 jointly Gaussian and uncorrelated ⟹ x_1, x_2 are independent.

Conditioning: For two events A, B, with P(B) > 0, define:

    P(A|B) = P(A ∩ B) / P(B).

The conditional distribution of x given y:

    F_{x|y}(x|y) = P(x ≤ x | y = y) := lim_{ε→0} P(x ≤ x | y − ε < y < y + ε).

The conditional density:

    p_{x|y}(x|y) = ∂^n F_{x|y}(x|y) / (∂x_1 ⋯ ∂x_n) = p_{x,y}(x, y) / p_y(y).

In the following we simply write p(x|y) etc. when no confusion arises.
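As a numerical check of the Gaussian density formula above (an added illustration, assuming NumPy/SciPy are available; the mean, covariance, and evaluation point are arbitrary), one can compare a direct evaluation with SciPy's multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative (arbitrary) mean and covariance of a 2-dimensional Gaussian RV x.
m = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])     # point at which to evaluate the density

# Direct evaluation of f_x(x) = (2*pi)^(-n/2) det(Sigma)^(-1/2) exp(-1/2 (x-m)^T Sigma^{-1} (x-m)).
n = len(m)
d = x - m
quad = d @ np.linalg.solve(Sigma, d)
f_direct = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

# Same value via SciPy's implementation.
f_scipy = multivariate_normal(mean=m, cov=Sigma).pdf(x)
print(f_direct, f_scipy)      # the two numbers agree
```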

Conditional Expectation:

    E(x | y = y) = ∫_{IR^n} x p(x|y) dx.

Obviously, this is a function of y: E(x | y = y) = g(y). Therefore, E(x|y) := g(y) is an RV, and a function of y.

Basic properties:

- Smoothing: E(E(x|y)) = E(x).
- Orthogonality principle: E([x − E(x|y)] h(y)) = 0 for every scalar function h.
- E(x|y) = E(x) if x and y are independent.

Bayes Rule:

    p(x|y) = p(x, y) / p(y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx.
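To make Bayes' rule and the conditional mean concrete, here is a small grid-based sketch (an illustrative addition, not from the notes; NumPy, the N(0, 4) prior, the unit-variance Gaussian likelihood, and the observed value are all arbitrary assumptions):

```python
import numpy as np

# Grid approximation of p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx, and of E(x | y = y).
xs = np.linspace(-10.0, 10.0, 2001)
dx = xs[1] - xs[0]

prior = np.exp(-0.5 * xs**2 / 4.0)        # p(x): N(0, 4), unnormalized
prior /= prior.sum() * dx

y = 1.3                                   # observed value
lik = np.exp(-0.5 * (y - xs)**2)          # p(y|x) for y = x + v, v ~ N(0, 1)

posterior = lik * prior
posterior /= posterior.sum() * dx         # divide by ∫ p(y|x) p(x) dx

cond_mean = np.sum(xs * posterior) * dx   # E(x | y = y) on the grid
print(cond_mean)                          # ≈ (4/5) * 1.3 = 1.04 for this jointly Gaussian pair
```

For this jointly Gaussian choice the exact conditional mean is (4/5)·y, which the grid approximation reproduces.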

2.2 The Estimation Problem

The basic estimation problem is: compute an estimate x̂ for an unknown quantity x ∈ X = IR^n, based on the measurements y = (y_1, ..., y_m) ∈ IR^m.

Obviously, we need a model that relates y to x. For example,

    y = h(x) + v,

where h is a known function and v a noise (or error) vector.

An estimator x̂ for x is a function x̂ : y ↦ x̂(y). The value of x̂(y) at a specific observed value y is an estimate of x.

Under different statistical assumptions, we have the following major solution concepts:

(i) Deterministic framework: Here we simply look for x that minimizes the error in y ≈ h(x). The most common criterion is the squared norm:

    min_x ||y − h(x)||^2 = min_x ∑_{i=1}^{m} (y_i − h_i(x))^2.

This is the well-known (non-linear) least-squares (LS) problem (a small numerical sketch of this criterion appears after the list of frameworks below).

(ii) Non-Bayesian framework: Assume that y is a random function of x. For example, y = h(x) + v, with v an RV. More generally, we are given, for each fixed x, the pdf p(y|x) (i.e., y ∼ p(·|x)). No statistical assumptions are made on x. The main solution concept here is the MLE.

(iii) Bayesian framework: Here we assume that both y and x are RVs with known joint statistics. The main solution concepts here are the MAP estimator and the optimal (MMSE) estimator.

A problem related to estimation is the regression problem: given measurements (x_k, y_k), k = 1, ..., N, find a function h that gives the best fit y_k ≈ h(x_k). Here h is the regressor, or regression function. We shall not consider this problem directly in this course.
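Referring back to framework (i), the following sketch solves a non-linear least-squares fit numerically. It is an illustrative addition (not from the notes): the decay-curve model h, its parameters, and the use of SciPy's least_squares routine are all assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

# Nonlinear LS: min_x ||y - h(x)||^2 for a hypothetical measurement model h.
def h(x, t):
    # x[0]: amplitude, x[1]: decay rate of an exponential curve (illustrative choice).
    return x[0] * np.exp(-x[1] * t)

t = np.linspace(0.0, 5.0, 50)
x_true = np.array([2.0, 0.7])
rng = np.random.default_rng(0)
y = h(x_true, t) + 0.05 * rng.standard_normal(t.shape)   # noisy measurements

res = least_squares(lambda x: y - h(x, t), x0=np.array([1.0, 1.0]))
print(res.x)   # LS estimate of x, close to x_true
```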

2.3 The Bayes Framework

In the Bayesian setting, we are given:

(i) p_x(x), the prior distribution of x.
(ii) p_{y|x}(y|x), the conditional distribution of y given x = x.

Note that p(y|x) is often specified through an equation such as y = h(x, v) or y = h(x) + v, with v an RV, but this is immaterial for the theory.

We can now compute the posterior probability of x:

    p(x|y) = p(y|x) p(x) / ∫ p(y|x) p(x) dx.

Given p(x|y), what would be a reasonable choice for x̂? The two common choices are:

(i) The mean of x according to p(x|y): x̂(y) = E(x|y) = ∫ x p(x|y) dx.
(ii) The most likely value of x according to p(x|y): x̂(y) = arg max_x p(x|y).

The first leads to the MMSE estimator, the second to the MAP estimator.

2.4 The MMSE Estimator

The Mean Square Error (MSE) of an estimator x̂ is given by

    MSE(x̂) = E( ||x − x̂(y)||^2 ).

The Minimum Mean Square Error (MMSE) estimator, x̂_MMSE, is the one that minimizes the MSE.

Theorem: x̂_MMSE(y) = E(x | y = y).

Remarks:

1. Recall that the conditional expectation E(x|y) satisfies the orthogonality principle (see above). This gives an easy proof of the theorem.

2. The MMSE estimator is unbiased: E(x̂_MMSE(y)) = E(x).

3. The posterior MSE is defined (for every y) as

    MSE(x̂ | y) = E( ||x − x̂(y)||^2 | y = y ),

with minimal value MMSE(y). Note that

    MSE(x̂) = E( E( ||x − x̂(y)||^2 | y ) ) = ∫_y MSE(x̂ | y) p(y) dy.

Since MSE(x̂ | y) can be minimized for each y separately, it follows that minimizing the MSE is equivalent to minimizing the posterior MSE for every y.

Some shortcomings of the MMSE estimator are:

- Hard to compute (except for special cases).

- May be inappropriate for multi-modal distributions.
- Requires the prior p(x), which may not be available.

Example: The Gaussian Case. Let x and y be jointly Gaussian RVs with means E(x) = m_x, E(y) = m_y, and covariance matrix

    cov(x, y) = [ Σ_xx  Σ_xy ]
                [ Σ_yx  Σ_yy ].

By direct calculation, the posterior distribution p_{x | y=y} is Gaussian, with mean

    m_{x|y} = m_x + Σ_xy Σ_yy^{-1} (y − m_y),

and covariance

    Σ_{x|y} = Σ_xx − Σ_xy Σ_yy^{-1} Σ_yx.

(If Σ_yy^{-1} does not exist, it may be replaced by the pseudo-inverse.) Note that the posterior covariance Σ_{x|y} does not depend on the actual value y of y!

It follows immediately that for the Gaussian case,

    x̂_MMSE(y) ≡ E(x | y = y) = m_{x|y},

and the associated posterior MMSE equals

    MMSE(y) = E( ||x − x̂_MMSE(y)||^2 | y = y ) = trace(Σ_{x|y}).

Note that here x̂_MMSE is a linear function of y. Also, the posterior MMSE does not depend on y.
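The Gaussian posterior formulas above translate directly into a few lines of code. This is an illustrative sketch added to the notes; NumPy and the specific means and covariance blocks are arbitrary assumptions.

```python
import numpy as np

# MMSE estimate and posterior covariance for jointly Gaussian (x, y).
# Means and covariance blocks below are arbitrary illustrative values (x in IR^2, y in IR^1).
m_x = np.array([0.0, 0.0])
m_y = np.array([0.0])
S_xx = np.array([[4.0, 1.0],
                 [1.0, 2.0]])
S_xy = np.array([[2.0],
                 [0.5]])
S_yy = np.array([[3.0]])

def gaussian_posterior(y):
    gain = S_xy @ np.linalg.inv(S_yy)     # Σ_xy Σ_yy^{-1}
    x_hat = m_x + gain @ (y - m_y)        # posterior mean m_{x|y} = MMSE estimate
    S_post = S_xx - gain @ S_xy.T         # posterior covariance Σ_{x|y} (independent of y)
    return x_hat, S_post

x_hat, S_post = gaussian_posterior(np.array([1.5]))
print(x_hat)               # MMSE estimate of x given y = 1.5
print(np.trace(S_post))    # posterior MMSE = trace(Σ_{x|y})
```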

2.5 The Linear MMSE Estimator

When the MMSE estimator is too complicated we may settle for the best linear estimator. Thus, we look for x̂ of the form

    x̂(y) = Ay + b

that minimizes

    MSE(x̂) = E( ||x − x̂(y)||^2 ).

The solution may be easily obtained by differentiation, and has exactly the same form as the MMSE estimator for the Gaussian case:

    x̂_L(y) = m_x + Σ_xy Σ_yy^{-1} (y − m_y).

Note:

- The LMMSE estimator depends only on the first- and second-order statistics of x and y.
- The linear MMSE estimator does not, in general, minimize the posterior MSE, namely MSE(x̂ | y). This holds only in the Gaussian case, where the LMMSE and MMSE estimators coincide.
- The orthogonality principle here is: E( (x − x̂_L(y)) L(y)^T ) = 0 for every linear function L(y) = Ay + b of y.
- The LMMSE estimator is unbiased: E(x̂_L(y)) = E(x).
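Because the LMMSE estimator uses only means and covariances, it can be built from sample statistics even when the joint distribution is unknown or non-Gaussian. The sketch below is an added illustration; the data-generating model (uniform x, cubic measurement) and the use of NumPy are arbitrary assumptions.

```python
import numpy as np

# LMMSE estimator built from first- and second-order statistics only (estimated from samples).
# The data-generating model (uniform x, y = x^3 + noise) is a deliberately non-Gaussian example.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=200_000)
y = x**3 + 0.1 * rng.standard_normal(x.shape)

m_x, m_y = x.mean(), y.mean()
S_xy = np.mean((x - m_x) * (y - m_y))
S_yy = y.var()

def x_hat_L(y_obs):
    # x_L(y) = m_x + S_xy S_yy^{-1} (y - m_y)
    return m_x + S_xy / S_yy * (y_obs - m_y)

print(x_hat_L(0.5))   # linear estimate of x given the observation y = 0.5
```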

2.6 The MAP Estimator

Still in the Bayesian setting, the MAP (Maximum A-Posteriori) estimator is defined as

    x̂_MAP(y) = arg max_x p(x|y).

Noting that

    p(x|y) = p(x, y) / p(y) = p(x) p(y|x) / p(y),

we obtain the equivalent characterizations:

    x̂_MAP(y) = arg max_x p(x, y) = arg max_x p(x) p(y|x).

Motivation: find the value of x which has the highest probability according to the posterior p(x|y).

Example: In the Gaussian case, with p(x|y) ∼ N(m_{x|y}, Σ_{x|y}), we have:

    x̂_MAP(y) = arg max_x p(x|y) = m_{x|y} ≡ E(x | y = y).

Hence, x̂_MAP ≡ x̂_MMSE for this case.
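Outside the Gaussian case the two estimators generally differ. The grid sketch below is an added illustration (the exponential prior, Gaussian likelihood, observed value, and NumPy are assumptions); it computes both estimates for a skewed posterior:

```python
import numpy as np

# MAP vs. MMSE on a grid, for a skewed posterior where the two estimates differ.
xs = np.linspace(0.0, 10.0, 5001)
dx = xs[1] - xs[0]

prior = np.exp(-xs)                    # p(x) = e^{-x}, x >= 0 (exponential prior)
y = 2.0                                # observed value
lik = np.exp(-0.5 * (y - xs)**2)       # p(y|x) for y = x + v, v ~ N(0, 1)

post = prior * lik
post /= post.sum() * dx                # normalized posterior p(x|y) on the grid

x_map = xs[np.argmax(post)]            # arg max_x p(x|y)
x_mmse = np.sum(xs * post) * dx        # E(x | y = y)
print(x_map, x_mmse)                   # MAP = 1.0; MMSE is larger (about 1.29)
```

Here the posterior is a normal density truncated to x ≥ 0, so its mode sits below its mean.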

2.7 The ML Estimator

The MLE is defined in a non-Bayesian setting:

- No prior p(x) is given. In fact, x need not be random.
- The distribution p(y|x) of y given x is given as before.

The MLE is defined by:

    x̂_ML(y) = arg max_{x ∈ X} p(y|x).

It is convenient to define the likelihood function L_y(x) = p(y|x) and the log-likelihood function Λ_y(x) = log L_y(x), and then we have

    x̂_ML(y) = arg max_{x ∈ X} L_y(x) ≡ arg max_{x ∈ X} Λ_y(x).

Note:

- Often x is denoted as θ in this context.
- Motivation: the value of x that makes y most likely. This justification is merely heuristic!
- Compared with the MAP estimator, x̂_MAP(y) = arg max_x p(x) p(y|x), we see that the MLE lacks the weighting of p(y|x) by p(x).
- The power of the MLE lies in its simplicity and good asymptotic behavior.

Example 1: y is exponentially distributed with rate x > 0, namely x = E(y)^{-1}. Thus:

    F(y|x) = (1 − e^{−xy}) 1{y ≥ 0},
    p(y|x) = x e^{−xy} 1{y ≥ 0},
    x̂_ML(y) = arg max_{x ≥ 0} x e^{−xy}.

Setting d/dx (x e^{−xy}) = 0 gives x = y^{-1}, hence x̂_ML(y) = y^{-1}.

Example 2 (Gaussian case):

    y = Hx + v   (y ∈ IR^m, x ∈ IR^n),   v ∼ N(0, R_v),

    L_y(x) = p(y|x) = (1/c) exp( −½ (y − Hx)^T R_v^{-1} (y − Hx) ),

    log L_y(x) = c' − ½ (y − Hx)^T R_v^{-1} (y − Hx),

    x̂_ML = arg min_x (y − Hx)^T R_v^{-1} (y − Hx).

This is a (weighted) LS problem! By differentiation,

    H^T R_v^{-1} (y − Hx) = 0,

so that

    x̂_ML = (H^T R_v^{-1} H)^{-1} H^T R_v^{-1} y

(assuming that H^T R_v^{-1} H is invertible; in particular, m ≥ n).
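Example 2's closed-form weighted LS solution is straightforward to verify numerically. The sketch below is an added illustration; NumPy, the random H, the diagonal R_v, and x_true are all arbitrary assumptions.

```python
import numpy as np

# Weighted LS / ML estimate for the linear Gaussian model y = Hx + v, v ~ N(0, R_v).
rng = np.random.default_rng(3)
m, n = 20, 2
H = rng.standard_normal((m, n))                  # known measurement matrix (arbitrary)
R_v = np.diag(rng.uniform(0.1, 1.0, size=m))     # known noise covariance (arbitrary, diagonal)
x_true = np.array([1.0, -2.0])
y = H @ x_true + rng.multivariate_normal(np.zeros(m), R_v)

Rinv = np.linalg.inv(R_v)
# x_ML = (H^T R_v^{-1} H)^{-1} H^T R_v^{-1} y
x_ml = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv @ y)
print(x_ml)   # close to x_true
```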

2.8 Bias and Covariance

Since the measurement y is random, the estimate x̂ = x̂(y) is a random variable, and we can consider its mean and covariance.

The conditional mean of x̂ is given by

    m̂(x) = E(x̂ | x) := E(x̂ | x = x) = ∫ x̂(y) p(y|x) dy.

The bias of x̂ is defined as

    b(x) = E(x̂ | x) − x.

The estimator x̂ is (conditionally) unbiased if b(x) = 0 for every x ∈ X.

The covariance matrix of x̂ is

    cov(x̂ | x) = E( (x̂ − E(x̂ | x)) (x̂ − E(x̂ | x))^T | x = x ).

In the scalar case, it follows by orthogonality that

    MSE(x̂ | x) := E( (x − x̂)^2 | x ) = E( (x − E(x̂ | x) + E(x̂ | x) − x̂)^2 | x ) = cov(x̂ | x) + b(x)^2.

Thus, if x̂ is conditionally unbiased, MSE(x̂ | x) = cov(x̂ | x). Similarly, if x is vector-valued, then

    MSE(x̂ | x) = trace( cov(x̂ | x) ) + ||b(x)||^2.

In the Bayesian case, we say that x̂ is unbiased if E(x̂(y)) = E(x). Note that the first expectation is over both x and y.
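The decomposition MSE = covariance + bias² is easy to confirm by simulation. The sketch below is an added illustration; the measurement model, the deliberately biased estimator 0.9·mean(y), and NumPy are all assumptions.

```python
import numpy as np

# Monte Carlo check of MSE(x_hat|x) = cov(x_hat|x) + b(x)^2 for a scalar, deliberately biased estimator.
# Model: y_i = x + v_i with v_i ~ N(0, 1), i = 1..N; estimator: x_hat = 0.9 * mean(y).
x = 2.0
N, reps = 10, 200_000
rng = np.random.default_rng(4)
y = x + rng.standard_normal((reps, N))     # reps independent measurement batches
x_hat = 0.9 * y.mean(axis=1)               # the biased estimator, evaluated on each batch

bias = x_hat.mean() - x                    # b(x) = E(x_hat|x) - x      (about -0.2)
var = x_hat.var()                          # cov(x_hat|x)               (about 0.081)
mse = np.mean((x_hat - x)**2)              # E((x - x_hat)^2 | x)
print(mse, var + bias**2)                  # the two values agree
```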

2.9 The Cramer-Rao Lower Bound (CRLB)

The CRLB gives a lower bound on the MSE of any (unbiased) estimator. For illustration, we mention here the non-Bayesian version, with a scalar parameter x.

Assume that x̂ is conditionally unbiased, namely E_x(x̂(y)) = x. (We use here E_x(·) for E(· | x = x).) Then

    MSE(x̂ | x) = E_x{ (x̂(y) − x)^2 } ≥ J(x)^{-1},

where J is the Fisher information:

    J(x) = −E_x{ ∂^2 ln p(y|x) / ∂x^2 } = E_x{ ( ∂ ln p(y|x) / ∂x )^2 }.

An (unbiased) estimator that meets the above CRLB is said to be efficient.
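A simple Monte Carlo check of the bound: for n i.i.d. measurements y_i = x + v_i with v_i ∼ N(0, σ²), the Fisher information is J(x) = n/σ², and the sample mean is unbiased and attains the bound. The model, the numerical values, and the use of NumPy are illustrative assumptions added to the notes.

```python
import numpy as np

# CRLB check for y_i = x + v_i, v_i ~ N(0, sigma^2), i = 1..n: here J(x) = n / sigma^2,
# and the unbiased estimator x_hat = mean(y) attains the bound (it is efficient).
x, sigma, n, reps = 1.0, 2.0, 25, 200_000
rng = np.random.default_rng(5)
y = x + sigma * rng.standard_normal((reps, n))

x_hat = y.mean(axis=1)
mse = np.mean((x_hat - x)**2)      # Monte Carlo MSE of the sample-mean estimator
crlb = sigma**2 / n                # 1 / J(x) for this model
print(mse, crlb)                   # both close to 0.16
```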

2.10 Asymptotic Properties of the MLE

Suppose x is estimated based on multiple i.i.d. samples: y = y^n = (y_1, ..., y_n), with

    p(y^n | x) = ∏_{i=1}^{n} p_0(y_i | x).

For each n ≥ 1, let x̂_n denote an estimator based on y^n. For example, x̂_n = x̂_n^{ML}. We consider the asymptotic properties of {x̂_n} as n → ∞.

Definitions: The (non-Bayesian) estimator sequence {x̂_n} is termed:

- Consistent if: lim_{n→∞} x̂_n(y^n) = x (w.p. 1).
- Asymptotically unbiased if: lim_{n→∞} E_x( x̂_n(y^n) ) = x.
- Asymptotically efficient if it satisfies the CRLB as n → ∞, in the sense that: lim_{n→∞} J_n(x) · MSE(x̂_n | x) = 1.

Here MSE(x̂_n | x) = E_x( (x̂_n(y^n) − x)^2 ), and J_n is the Fisher information for y^n. For i.i.d. observations, J_n = n J_1.

The ML estimator x̂_n^{ML} is both asymptotically unbiased and asymptotically efficient (under mild technical conditions).
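These asymptotics can be observed for the exponential-rate MLE of Example 1, x̂_n = 1/mean(y): for that model J_1(x) = 1/x², so J_n(x)·MSE should approach 1. The simulation below is an added illustration; NumPy, the true rate, and the batch sizes are assumptions.

```python
import numpy as np

# Asymptotics of the MLE from Example 1 (exponential rate): x_hat_n = 1 / mean(y_1, ..., y_n).
# For this model J_1(x) = 1/x^2, so J_n(x) * MSE(x_hat_n) should approach 1 as n grows.
x_true, reps = 2.0, 20_000
rng = np.random.default_rng(6)

for n in (5, 50, 500):
    y = rng.exponential(scale=1.0 / x_true, size=(reps, n))   # i.i.d. samples with rate x_true
    x_hat = 1.0 / y.mean(axis=1)                              # ML estimate for each batch
    mse = np.mean((x_hat - x_true)**2)
    Jn = n / x_true**2                                        # Fisher information of y^n
    print(n, x_hat.mean(), Jn * mse)    # mean tends to x_true, and J_n * MSE tends to 1
```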
