Technion Israel Institute of Technology, Department of Electrical Engineering
Estimation and Identification in Dynamical Systems (048825)
Lecture Notes, Fall 2009, Prof. N. Shimkin

2 Statistical Estimation: Basic Concepts

2.1 Probability

We briefly recall some basic notions and notations from probability theory that will be required in this chapter.

The Probability Space: The basic object in probability theory is the probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the sample space (with sample points $\omega \in \Omega$), $\mathcal{F}$ is the sigma-field of possible events $B \in \mathcal{F}$, and $P$ is a probability measure, giving the probability $P(B)$ of each possible event.

A (vector-valued) Random Variable (RV) is a mapping $x : \Omega \to \mathbb{R}^n$. $x$ is also required to be measurable on $(\Omega, \mathcal{F})$, in the sense that $x^{-1}(A) \in \mathcal{F}$ for any open (or Borel) set $A$ in $\mathbb{R}^n$.

In this course we shall not explicitly define the underlying probability space, but rather define the probability distributions of the RVs of interest.
Distribution and Density: For an RV $x : \Omega \to \mathbb{R}^n$, the (cumulative) probability distribution function (cdf) is defined as
$$F_x(x) = P(x \le x) = P\{\omega : x(\omega) \le x\}, \qquad x \in \mathbb{R}^n.$$
The probability density function (pdf), if it exists, is given by
$$p_x(x) = \frac{\partial^n F_x(x)}{\partial x_1 \cdots \partial x_n}.$$
The RVs $(x_1, \dots, x_K)$ are independent if
$$F_{x_1, \dots, x_K}(x_1, \dots, x_K) = \prod_{k=1}^{K} F_{x_k}(x_k)$$
(and similarly for their densities).

Moments: The expected value (or mean) of $x$:
$$\bar{x} = E(x) = \int_{\mathbb{R}^n} x \, dF_x(x).$$
More generally, for a real function $g$ on $\mathbb{R}^n$,
$$E(g(x)) = \int_{\mathbb{R}^n} g(x) \, dF_x(x).$$
The covariance matrices:
$$\mathrm{cov}(x) = E\{(x - \bar{x})(x - \bar{x})^T\},$$
$$\mathrm{cov}(x_1, x_2) = E\{(x_1 - \bar{x}_1)(x_2 - \bar{x}_2)^T\}.$$
When $x$ is scalar, then $\mathrm{cov}(x)$ is simply its variance. The RVs $x_1$ and $x_2$ are uncorrelated if $\mathrm{cov}(x_1, x_2) = 0$.
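As a quick numerical illustration (our addition, not part of the original notes), these moments can be estimated from samples; the two-dimensional model below is invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw N samples of a 2-dimensional RV x with dependent components.
N = 100_000
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(scale=0.8, size=N)   # correlated with x1
x = np.stack([x1, x2], axis=1)                   # shape (N, 2)

mean = x.mean(axis=0)                # sample estimate of E(x)
cov = np.cov(x, rowvar=False)        # sample estimate of cov(x)

print("mean:", mean)                 # close to [0, 0]
print("cov:\n", cov)                 # off-diagonal close to cov(x1, x2) = 0.5
```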
Gaussian RVs: A (non-degenerate) Gaussian RV on $\mathbb{R}^n$ has the density
$$f_x(x) = \frac{1}{(2\pi)^{n/2} \det(\Sigma)^{1/2}} \, e^{-\frac{1}{2}(x - m)^T \Sigma^{-1} (x - m)}.$$
It follows that $m = E(x)$, $\Sigma = \mathrm{cov}(x)$. We denote $x \sim N(m, \Sigma)$.

$x_1$ and $x_2$ are jointly Gaussian if the random vector $(x_1; x_2)$ is Gaussian. It holds that:
1. $x$ Gaussian $\Rightarrow$ all linear combinations $\sum_i a_i x_i$ are Gaussian.
2. $x$ Gaussian $\Rightarrow$ $y = Ax$ is Gaussian.
3. $x_1, x_2$ jointly Gaussian and uncorrelated $\Rightarrow$ $x_1, x_2$ are independent.

Conditioning: For two events $A, B$, with $P(B) > 0$, define:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
The conditional distribution of $x$ given $y$:
$$F_{x|y}(x \mid y) = P(x \le x \mid y = y) := \lim_{\epsilon \to 0} P(x \le x \mid y - \epsilon < y < y + \epsilon).$$
The conditional density:
$$p_{x|y}(x \mid y) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{x|y}(x \mid y) = \frac{p_{x,y}(x, y)}{p_y(y)}.$$
In the following we simply write $p(x \mid y)$ etc. when no confusion arises.
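Returning to the Gaussian density at the top of this subsection, it translates directly into code. A minimal sketch; `gauss_pdf` is our own helper name and the example numbers are invented:

```python
import numpy as np

def gauss_pdf(x, m, Sigma):
    """Density of N(m, Sigma) at point x, following the formula above."""
    n = len(m)
    d = x - m
    quad = d @ np.linalg.solve(Sigma, d)          # (x-m)^T Sigma^{-1} (x-m)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

m = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(gauss_pdf(np.array([1.0, -1.0]), m, Sigma))  # density at the mean
```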
Conditional Expectation:
$$E(x \mid y = y) = \int_{\mathbb{R}^n} x \, p(x \mid y) \, dx.$$
Obviously, this is a function of $y$: $E(x \mid y = y) = g(y)$. Therefore, $E(x \mid y) = g(y)$ is an RV, and a function of $y$.

Basic properties:
- Smoothing: $E(E(x \mid y)) = E(x)$.
- Orthogonality principle: $E([x - E(x \mid y)] \, h(y)) = 0$ for every scalar function $h$.
- $E(x \mid y) = E(x)$ if $x$ and $y$ are independent.

Bayes' Rule:
$$p(x \mid y) = \frac{p(x, y)}{p(y)} = \frac{p(y \mid x) p(x)}{\int p(y \mid x) p(x) \, dx}.$$
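The smoothing and orthogonality properties above are easy to check by Monte Carlo. In this sketch (our own construction, not from the notes) we pick a model where $E(x \mid y)$ is known in closed form: $y \sim N(0, 1)$ and $x = y^2 + w$ with $w$ independent of $y$, so $E(x \mid y) = y^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
y = rng.normal(size=N)
w = rng.normal(size=N)
x = y**2 + w                           # by construction, E(x|y) = y^2

g = y**2                               # g(y) = E(x|y)
print(x.mean(), g.mean())              # smoothing: E(E(x|y)) = E(x), both near 1
print(np.mean((x - g) * np.sin(y)))    # orthogonality: near 0 for h(y) = sin(y)
```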
2.2 The Estimation Problem

The basic estimation problem is: compute an estimate $\hat{x}$ for an unknown quantity $x \in X = \mathbb{R}^n$, based on the measurements $y = (y_1, \dots, y_m) \in \mathbb{R}^m$.

Obviously, we need a model that relates $y$ to $x$. For example,
$$y = h(x) + v,$$
where $h$ is a known function, and $v$ a noise (or error) vector.

An estimator $\hat{x}$ for $x$ is a function $\hat{x} : y \mapsto \hat{x}(y)$. The value of $\hat{x}(y)$ at a specific observed value $y$ is an estimate of $x$.

Under different statistical assumptions, we have the following major solution concepts:

(i) Deterministic framework: Here we simply look for $x$ that minimizes the error in $y \approx h(x)$. The most common criterion is the square norm:
$$\min_x \|y - h(x)\|^2 = \min_x \sum_{i=1}^{m} |y_i - h_i(x)|^2.$$
This is the well-known (non-linear) least-squares (LS) problem; see the sketch below.
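A concrete instance of the nonlinear LS problem, sketched with `scipy.optimize.least_squares`; the exponential-decay model $h$ and all numbers here are invented for illustration:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(2)

# Invented model: h(x)_i = x0 * exp(-x1 * t_i), observed with additive noise.
t = np.linspace(0, 4, 30)
x_true = np.array([2.0, 0.7])
y = x_true[0] * np.exp(-x_true[1] * t) + 0.05 * rng.normal(size=t.size)

def residuals(x):
    # y - h(x): least_squares minimizes the squared norm of this vector.
    return y - x[0] * np.exp(-x[1] * t)

sol = least_squares(residuals, x0=np.array([1.0, 1.0]))
print(sol.x)   # should be close to x_true
```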
(ii) Non-Bayesian framework: Assume that $y$ is a random function of $x$. For example, $y = h(x) + v$, with $v$ an RV. More generally, we are given, for each fixed $x$, the pdf $p(y \mid x)$ (i.e., $y \sim p(\cdot \mid x)$). No statistical assumptions are made on $x$. The main solution concept here is the MLE.

(iii) Bayesian framework: Here we assume that both $y$ and $x$ are RVs with known joint statistics. The main solution concepts here are the MAP estimator and the optimal (MMSE) estimator.

A problem related to estimation is the regression problem: given measurements $(x_k, y_k)_{k=1}^{N}$, find a function $h$ that gives the best fit $y_k \approx h(x_k)$. Here $h$ is the regressor, or regression function. We shall not consider this problem directly in this course.
2.3 The Bayes Framework

In the Bayesian setting, we are given:
(i) $p_x(x)$: the prior distribution of $x$.
(ii) $p_{y|x}(y \mid x)$: the conditional distribution of $y$ given $x = x$.

Note that $p(y \mid x)$ is often specified through an equation such as $y = h(x, v)$ or $y = h(x) + v$, with $v$ an RV, but this is immaterial for the theory.

We can now compute the posterior probability of $x$:
$$p(x \mid y) = \frac{p(y \mid x) p(x)}{\int p(y \mid x) p(x) \, dx}.$$

Given $p(x \mid y)$, what would be a reasonable choice for $\hat{x}$? The two common choices are:

(i) The mean of $x$ according to $p(x \mid y)$:
$$\hat{x}(y) = E(x \mid y) = \int x \, p(x \mid y) \, dx.$$

(ii) The most likely value of $x$ according to $p(x \mid y)$:
$$\hat{x}(y) = \arg\max_x p(x \mid y).$$

The first leads to the MMSE estimator, the second to the MAP estimator.
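For scalar $x$, the posterior and both point estimates can be computed numerically on a grid. A minimal sketch, assuming an invented Gaussian prior and measurement model:

```python
import numpy as np

# Invented scalar model: prior x ~ N(0,1); measurement y = x + v, v ~ N(0, 0.5^2).
xs = np.linspace(-5.0, 5.0, 2001)
dx = xs[1] - xs[0]
y_obs = 1.3                                      # observed value of y

prior = np.exp(-0.5 * xs**2)                     # p(x), up to a constant
lik = np.exp(-0.5 * ((y_obs - xs) / 0.5)**2)     # p(y|x), up to a constant

post = prior * lik
post /= post.sum() * dx                          # normalized posterior p(x|y)

x_mean = (xs * post).sum() * dx                  # posterior mean (MMSE estimate)
x_mode = xs[np.argmax(post)]                     # posterior mode (MAP estimate)
print(x_mean, x_mode)   # both near 1.3 * 0.8 = 1.04 for this Gaussian model
```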
2.4 The MMSE Estimator

The Mean Square Error (MSE) of an estimator $\hat{x}$ is given by
$$\mathrm{MSE}(\hat{x}) = E(\|x - \hat{x}(y)\|^2).$$
The Minimal Mean Square Error (MMSE) estimator, $\hat{x}_{\mathrm{MMSE}}$, is the one that minimizes the MSE.

Theorem: $\hat{x}_{\mathrm{MMSE}}(y) = E(x \mid y = y)$.

Remarks:
1. Recall that the conditional expectation $E(x \mid y)$ satisfies the orthogonality principle (see above). This gives an easy proof of the theorem.
2. The MMSE estimator is unbiased: $E(\hat{x}_{\mathrm{MMSE}}(y)) = E(x)$.
3. The posterior MSE is defined (for every $y$) as
$$\mathrm{MSE}(\hat{x} \mid y) = E(\|x - \hat{x}(y)\|^2 \mid y = y),$$
with minimal value $\mathrm{MMSE}(y)$. Note that
$$\mathrm{MSE}(\hat{x}) = E\big(E(\|x - \hat{x}(y)\|^2 \mid y)\big) = \int_y \mathrm{MSE}(\hat{x} \mid y) \, p(y) \, dy.$$
Since $\mathrm{MSE}(\hat{x} \mid y)$ can be minimized for each $y$ separately, it follows that minimizing the MSE is equivalent to minimizing the posterior MSE for every $y$.

Some shortcomings of the MMSE estimator are:
- It is hard to compute (except for special cases).
- It may be inappropriate for multi-modal distributions.
- It requires the prior $p(x)$, which may not be available.

Example: The Gaussian Case. Let $x$ and $y$ be jointly Gaussian RVs with means $E(x) = m_x$, $E(y) = m_y$, and covariance matrix
$$\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}.$$
By direct calculation, the posterior distribution $p_{x|y=y}$ is Gaussian, with mean
$$m_{x|y} = m_x + \Sigma_{xy} \Sigma_{yy}^{-1} (y - m_y),$$
and covariance
$$\Sigma_{x|y} = \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}.$$
(If $\Sigma_{yy}^{-1}$ does not exist, it may be replaced by the pseudo-inverse.) Note that the posterior covariance $\Sigma_{x|y}$ does not depend on the actual value $y$ of $y$!

It follows immediately that for the Gaussian case,
$$\hat{x}_{\mathrm{MMSE}}(y) \equiv E(x \mid y = y) = m_{x|y},$$
and the associated posterior MMSE equals
$$\mathrm{MMSE}(y) = E(\|x - \hat{x}_{\mathrm{MMSE}}(y)\|^2 \mid y = y) = \mathrm{trace}(\Sigma_{x|y}).$$
Note that here $\hat{x}_{\mathrm{MMSE}}$ is a linear function of $y$. Also, the posterior MMSE does not depend on $y$.
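These formulas translate directly into code. A minimal sketch; the function name and the example numbers are our own:

```python
import numpy as np

def gaussian_conditional(m_x, m_y, S_xx, S_xy, S_yy, y):
    """Posterior mean and covariance of x given y, for jointly Gaussian (x, y)."""
    K = S_xy @ np.linalg.inv(S_yy)        # use pinv(S_yy) if S_yy is singular
    m_post = m_x + K @ (y - m_y)          # m_x + Sigma_xy Sigma_yy^{-1} (y - m_y)
    S_post = S_xx - K @ S_xy.T            # Sigma_xx - Sigma_xy Sigma_yy^{-1} Sigma_yx
    return m_post, S_post

# Example: scalar x and y, treated as 1-vectors.
m_post, S_post = gaussian_conditional(
    m_x=np.array([0.0]), m_y=np.array([0.0]),
    S_xx=np.array([[1.0]]), S_xy=np.array([[0.8]]), S_yy=np.array([[2.0]]),
    y=np.array([1.0]))
print(m_post, S_post)   # 0.8/2 * 1 = 0.4 and 1 - 0.8^2/2 = 0.68
```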
2.5 The Linear MMSE Estimator

When the MMSE estimator is too complicated we may settle for the best linear estimator. Thus, we look for $\hat{x}$ of the form
$$\hat{x}(y) = Ay + b$$
that minimizes
$$\mathrm{MSE}(\hat{x}) = E(\|x - \hat{x}(y)\|^2).$$
The solution may be easily obtained by differentiation, and has exactly the same form as the MMSE estimator for the Gaussian case:
$$\hat{x}_L(y) = m_x + \Sigma_{xy} \Sigma_{yy}^{-1} (y - m_y).$$

Notes:
- The LMMSE estimator depends only on the first and second order statistics of $x$ and $y$.
- The LMMSE estimator does not minimize the posterior MSE, namely $\mathrm{MSE}(\hat{x} \mid y)$. This holds only in the Gaussian case, where the LMMSE and MMSE estimators coincide.
- The orthogonality principle here is:
$$E\big((x - \hat{x}_L(y)) \, L(y)^T\big) = 0,$$
for every linear function $L(y) = Ay + b$ of $y$.
- The LMMSE estimator is unbiased: $E(\hat{x}_L(y)) = E(x)$.
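Since only first- and second-order statistics are needed, the LMMSE estimator can be built from sample moments even for a non-Gaussian pair, where it generally differs from the optimal MMSE estimator. A sketch with an invented scalar model:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000

# Non-Gaussian pair: x uniform, y a nonlinear function of x plus noise.
x = rng.uniform(-1, 1, size=N)
y = x**3 + 0.1 * rng.normal(size=N)

m_x, m_y = x.mean(), y.mean()
S_xy = np.mean((x - m_x) * (y - m_y))
S_yy = np.var(y)

A = S_xy / S_yy                       # scalar case of Sigma_xy Sigma_yy^{-1}
x_hat = m_x + A * (y - m_y)           # LMMSE estimate for each sample

# Orthogonality check: the error is uncorrelated with any linear function of y.
print(np.mean((x - x_hat) * y))       # near 0
print(np.mean((x - x_hat)**2))        # linear MMSE (not the optimal MMSE here)
```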
2.6 The MAP Estimator

Still in the Bayesian setting, the MAP (Maximum A-Posteriori) estimator is defined as
$$\hat{x}_{\mathrm{MAP}}(y) = \arg\max_x p(x \mid y).$$
Noting that
$$p(x \mid y) = \frac{p(x, y)}{p(y)} = \frac{p(x) \, p(y \mid x)}{p(y)},$$
we obtain the equivalent characterizations:
$$\hat{x}_{\mathrm{MAP}}(y) = \arg\max_x p(x, y) = \arg\max_x p(x) \, p(y \mid x).$$

Motivation: Find the value of $x$ which has the highest probability according to the posterior $p(x \mid y)$.

Example: In the Gaussian case, with $p(x \mid y) \sim N(m_{x|y}, \Sigma_{x|y})$, we have:
$$\hat{x}_{\mathrm{MAP}}(y) = \arg\max_x p(x \mid y) = m_{x|y} \equiv E(x \mid y = y).$$
Hence, $\hat{x}_{\mathrm{MAP}} \equiv \hat{x}_{\mathrm{MMSE}}$ for this case.
2.7 The ML Estimator

The MLE is defined in a non-Bayesian setting: no prior $p(x)$ is given; in fact, $x$ need not be random. The distribution $p(y \mid x)$ of $y$ given $x$ is given as before. The MLE is defined by:
$$\hat{x}_{\mathrm{ML}}(y) = \arg\max_{x \in X} p(y \mid x).$$
It is convenient to define the likelihood function $L_y(x) = p(y \mid x)$ and the log-likelihood function $\Lambda_y(x) = \log L_y(x)$, and then we have
$$\hat{x}_{\mathrm{ML}}(y) = \arg\max_{x \in X} L_y(x) \equiv \arg\max_{x \in X} \Lambda_y(x).$$

Notes:
- Often $x$ is denoted as $\theta$ in this context.
- Motivation: the MLE is the value of $x$ that makes the observed $y$ most likely. This justification is merely heuristic!
- Compared with the MAP estimator, $\hat{x}_{\mathrm{MAP}}(y) = \arg\max_x p(x) p(y \mid x)$, we see that the MLE lacks the weighting of $p(y \mid x)$ by the prior $p(x)$.
- The power of the MLE lies in its simplicity and its good asymptotic behavior.
Example 1: $y$ is exponentially distributed with rate $x > 0$, namely $x = E(y)^{-1}$. Thus:
$$F(y \mid x) = (1 - e^{-xy}) \, 1\{y \ge 0\},$$
$$p(y \mid x) = x e^{-xy} \, 1\{y \ge 0\},$$
$$\hat{x}_{\mathrm{ML}}(y) = \arg\max_{x \ge 0} \, x e^{-xy}.$$
Setting $\frac{d}{dx}(x e^{-xy}) = 0$ gives $x = y^{-1}$, hence $\hat{x}_{\mathrm{ML}}(y) = y^{-1}$.

Example 2 (Gaussian case):
$$y = Hx + v \qquad (y \in \mathbb{R}^m, \; x \in \mathbb{R}^n), \qquad v \sim N(0, R_v),$$
$$L_y(x) = p(y \mid x) = \frac{1}{c} \, e^{-\frac{1}{2}(y - Hx)^T R_v^{-1} (y - Hx)},$$
$$\log L_y(x) = -\log c - \tfrac{1}{2}(y - Hx)^T R_v^{-1} (y - Hx),$$
$$\hat{x}_{\mathrm{ML}} = \arg\min_x \, (y - Hx)^T R_v^{-1} (y - Hx).$$
This is a (weighted) LS problem! By differentiation,
$$H^T R_v^{-1} (y - Hx) = 0 \quad \Longrightarrow \quad \hat{x}_{\mathrm{ML}} = (H^T R_v^{-1} H)^{-1} H^T R_v^{-1} y$$
(assuming that $H^T R_v^{-1} H$ is invertible; in particular, $m \ge n$).
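Example 2 in code: a sketch of the weighted-LS solution, with $H$, $R_v$, and the data invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(4)

n, m = 2, 50
H = np.column_stack([np.ones(m), np.arange(m, dtype=float)])   # m x n, full rank
x_true = np.array([1.0, 0.3])
R_v = np.diag(rng.uniform(0.5, 2.0, size=m))                   # known noise covariance
v = rng.multivariate_normal(np.zeros(m), R_v)
y = H @ x_true + v

# ML / weighted LS: x_hat = (H^T R_v^{-1} H)^{-1} H^T R_v^{-1} y
W = np.linalg.inv(R_v)
x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
print(x_hat)   # close to x_true
```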
2.8 Bias and Covariance

Since the measurement $y$ is random, the estimate $\hat{x} = \hat{x}(y)$ is a random variable, and we can consider its mean and covariance. The conditional mean of $\hat{x}$ is given by
$$\hat{m}(x) = E(\hat{x} \mid x) \equiv E(\hat{x} \mid x = x) = \int \hat{x}(y) \, p(y \mid x) \, dy.$$
The bias of $\hat{x}$ is defined as
$$b(x) = E(\hat{x} \mid x) - x.$$
The estimator $\hat{x}$ is (conditionally) unbiased if $b(x) = 0$ for every $x \in X$. The covariance matrix of $\hat{x}$ is
$$\mathrm{cov}(\hat{x} \mid x) = E\big((\hat{x} - E(\hat{x} \mid x))(\hat{x} - E(\hat{x} \mid x))^T \mid x = x\big).$$

In the scalar case, it follows by orthogonality that
$$\mathrm{MSE}(\hat{x} \mid x) \equiv E((x - \hat{x})^2 \mid x) = E\big((x - E(\hat{x} \mid x) + E(\hat{x} \mid x) - \hat{x})^2 \mid x\big) = \mathrm{cov}(\hat{x} \mid x) + b(x)^2.$$
Thus, if $\hat{x}$ is conditionally unbiased, $\mathrm{MSE}(\hat{x} \mid x) = \mathrm{cov}(\hat{x} \mid x)$. Similarly, if $x$ is vector-valued, then
$$\mathrm{MSE}(\hat{x} \mid x) = \mathrm{trace}(\mathrm{cov}(\hat{x} \mid x)) + \|b(x)\|^2.$$

In the Bayesian case, we say that $\hat{x}$ is unbiased if $E(\hat{x}(y)) = E(x)$. Note that the first expectation is over both $x$ and $y$.
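The decomposition $\mathrm{MSE} = \mathrm{cov} + b^2$ is easy to verify by simulation. A sketch using a deliberately biased (shrinkage) estimator of a scalar parameter; all the choices below are ours:

```python
import numpy as np

rng = np.random.default_rng(5)
x = 2.0                                      # true (nonrandom) parameter
n, trials = 10, 200_000

y = x + rng.normal(size=(trials, n))         # y_i = x + v_i, v_i ~ N(0, 1)
x_hat = 0.8 * y.mean(axis=1)                 # shrinkage estimator: biased

bias = x_hat.mean() - x                      # near 0.8*x - x = -0.4
cov = x_hat.var()                            # near 0.8^2 / n = 0.064
mse = np.mean((x_hat - x)**2)
print(mse, cov + bias**2)                    # the two should agree
```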
2.9 The Cramér-Rao Lower Bound (CRLB)

The CRLB gives a lower bound on the MSE of any (unbiased) estimator. For illustration, we mention here the non-Bayesian version, with a scalar parameter $x$.

Assume that $\hat{x}$ is conditionally unbiased, namely $E_x(\hat{x}(y)) = x$ (we use here $E_x(\cdot)$ for $E(\cdot \mid x = x)$). Then
$$\mathrm{MSE}(\hat{x} \mid x) = E_x\{(\hat{x}(y) - x)^2\} \ge J(x)^{-1},$$
where $J$ is the Fisher information:
$$J(x) = -E_x\left\{\frac{\partial^2 \ln p(y \mid x)}{\partial x^2}\right\} = E_x\left\{\left(\frac{\partial \ln p(y \mid x)}{\partial x}\right)^2\right\}.$$
An (unbiased) estimator that meets the above CRLB is said to be efficient.
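For the exponential example of Section 2.7, $\ln p(y \mid x) = \ln x - xy$, so the score is $1/x - y$ and $J(x) = 1/x^2$. The sketch below (our own check, not in the notes) compares the Monte Carlo estimate of $E_x\{(\partial \ln p / \partial x)^2\}$ with $1/x^2$:

```python
import numpy as np

rng = np.random.default_rng(6)
x = 1.5                                      # true rate
y = rng.exponential(scale=1 / x, size=1_000_000)

# Score: d/dx ln p(y|x) = d/dx (ln x - x*y) = 1/x - y
J_mc = np.mean((1 / x - y)**2)               # E[(score)^2]
print(J_mc, 1 / x**2)                        # both near 1/x^2, so the CRLB is x^2
```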
2.10 Asymptotic Properties of the MLE

Suppose $x$ is estimated based on multiple i.i.d. samples: $y = y^n = (y_1, \dots, y_n)$, with
$$p(y^n \mid x) = \prod_{i=1}^{n} p_0(y_i \mid x).$$
For each $n \ge 1$, let $\hat{x}_n$ denote an estimator based on $y^n$; for example, $\hat{x}_n = \hat{x}_n^{\mathrm{ML}}$. We consider the asymptotic properties of $\{\hat{x}_n\}$ as $n \to \infty$.

Definitions: The (non-Bayesian) estimator sequence $\{\hat{x}_n\}$ is termed:
- Consistent if: $\lim_{n \to \infty} \hat{x}_n(y^n) = x$ (w.p. 1).
- Asymptotically unbiased if: $\lim_{n \to \infty} E_x(\hat{x}_n(y^n)) = x$.
- Asymptotically efficient if it satisfies the CRLB as $n \to \infty$, in the sense that:
$$\lim_{n \to \infty} J_n(x) \, \mathrm{MSE}(\hat{x}_n \mid x) = 1.$$
Here $\mathrm{MSE}(\hat{x}_n \mid x) = E_x((\hat{x}_n(y^n) - x)^2)$, and $J_n$ is the Fisher information for $y^n$. For i.i.d. observations, $J_n = n J_1$.

The ML estimator $\hat{x}_n^{\mathrm{ML}}$ is both asymptotically unbiased and asymptotically efficient (under mild technical conditions); a small Monte Carlo illustration follows.
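A Monte Carlo sketch of these properties (our own) for the exponential-rate MLE $\hat{x}_n = n / \sum_i y_i$: asymptotic efficiency predicts $n J_1(x) \, \mathrm{MSE}(\hat{x}_n \mid x) \to 1$, and the mean of $\hat{x}_n$ should approach the true $x$:

```python
import numpy as np

rng = np.random.default_rng(7)
x = 1.5                                       # true rate of the exponential
J1 = 1 / x**2                                 # per-sample Fisher information

for n in [10, 100, 1000]:
    y = rng.exponential(scale=1 / x, size=(5000, n))
    x_hat = 1 / y.mean(axis=1)                # MLE of the rate: n / sum(y_i)
    mse = np.mean((x_hat - x)**2)
    print(n, x_hat.mean(), n * J1 * mse)      # mean -> x, product -> 1
```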