Computational Methods in Inverse Problems

The problem is to infer the underlying probability distribution that gives rise to the data S.


1 Basic Problem of Statistical Inference

Assume that we have a set of observations $S = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^n$. The problem is to infer the underlying probability distribution that gives rise to the data $S$: statistical modeling, statistical analysis.

2 Parametric or Non-parametric?

Parametric problem: the underlying probability density has a specified form and depends on a number of parameters. The problem is to infer those parameters.

Non-parametric problem: no analytic expression for the probability density is available; the description consists of specifying the dependency/non-dependency structure of the data, and the distribution is explored numerically.

Typical situation for a parametric model: the distribution is the probability density of a random variable $X : \Omega \to \mathbb{R}^n$. The parametric setting is well suited to inverse problems and as a model for a learning process.

3 Law of Large Numbers

General result (a statistical law of nature): assume that $X_1, X_2, \ldots$ are independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$. Then, almost certainly,
$$\lim_{n \to \infty} \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right) = \mu.$$
"Almost certainly" means that, with probability one,
$$\lim_{n \to \infty} \frac{1}{n}\left(x_1 + x_2 + \cdots + x_n\right) = \mu,$$
$x_j$ being a realization of $X_j$.

4 Example

Sample $S = \{x_1, x_2, \ldots, x_N\}$, $x_j \in \mathbb{R}^2$. Parametric model: the $x_j$ are realizations of $X \sim \mathcal{N}(x_0, \Gamma)$, with unknown mean $x_0 \in \mathbb{R}^2$ and covariance matrix $\Gamma \in \mathbb{R}^{2 \times 2}$. Probability density of $X$:
$$\pi(x \mid x_0, \Gamma) = \frac{1}{2\pi \det(\Gamma)^{1/2}} \exp\left(-\frac{1}{2}(x - x_0)^T \Gamma^{-1} (x - x_0)\right).$$
Problem: estimate the parameters $x_0$ and $\Gamma$.

5 The Law of Large Numbers suggests that we calculate
$$x_0 = E\{X\} \approx \frac{1}{N}\sum_{j=1}^{N} x_j = \widehat{x}_0. \tag{1}$$
Covariance matrix: observe that if $X_1, X_2, \ldots$ are i.i.d., so are $f(X_1), f(X_2), \ldots$ for any function $f : \mathbb{R}^2 \to \mathbb{R}^k$. Try
$$\Gamma = \operatorname{cov}(X) = E\{(X - x_0)(X - x_0)^T\} \approx \frac{1}{N}\sum_{j=1}^{N}(x_j - \widehat{x}_0)(x_j - \widehat{x}_0)^T = \widehat{\Gamma}. \tag{2}$$
Formulas (1) and (2) are known as the empirical mean and covariance, respectively.
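With the sample stored column-wise in a 2-by-N array S, the empirical mean and covariance (1) and (2) translate into a few lines of Matlab. This is a minimal sketch (the variable names xhat and Gammahat are ours, chosen to match the hats above), not code from the original slides:

N = size(S,2);                  % number of sample points
xhat = (1/N)*sum(S,2);          % empirical mean, formula (1)
CS = S - xhat*ones(1,N);        % centered sample
Gammahat = (1/N)*(CS*CS');      % empirical covariance, formula (2)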

6 Case 1: Gaussian Sample

7 Sample size $N = 200$. Eigenvalue decomposition of the covariance matrix:
$$\widehat{\Gamma} = U D U^T, \tag{3}$$
where $U \in \mathbb{R}^{2 \times 2}$ is an orthogonal matrix, $U^T = U^{-1}$, and $D \in \mathbb{R}^{2 \times 2}$ is diagonal,
$$U = [\,v_1 \;\; v_2\,], \qquad D = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix}, \qquad \widehat{\Gamma} v_j = \lambda_j v_j, \quad j = 1, 2.$$
Scaled eigenvectors: $v_{j,\text{scaled}} = 2\sqrt{\lambda_j}\, v_j$, where $\sqrt{\lambda_j}$ is the standard deviation (STD).
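The decomposition (3) and the scaled eigenvectors can be computed directly; a sketch assuming xhat and Gammahat from the snippet after slide 5:

[U,D] = eig(Gammahat);              % Gammahat = U*D*U'
v1 = 2*sqrt(D(1,1))*U(:,1);         % first scaled eigenvector (2 STDs long)
v2 = 2*sqrt(D(2,2))*U(:,2);         % second scaled eigenvector
plot(xhat(1)+[0 v1(1)], xhat(2)+[0 v1(2)], 'r-')   % principal axes drawn
hold on                                            % from the sample mean
plot(xhat(1)+[0 v2(1)], xhat(2)+[0 v2(2)], 'r-')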

8 Case 2: Non-Gaussian Sample

9 Estimate of Normality/Non-normality

Consider the sets
$$B_\alpha = \{x \in \mathbb{R}^2 \mid \pi(x) \geq \alpha\}, \qquad \alpha > 0.$$
If $\pi$ is Gaussian, $B_\alpha$ is an ellipse. Calculate the integral
$$P\{X \in B_\alpha\} = \int_{B_\alpha} \pi(x)\,dx. \tag{4}$$
We call $B_\alpha$ the credibility ellipse with credibility $p$, $0 < p < 1$, if
$$P\{X \in B_\alpha\} = p, \quad \text{giving } \alpha = \alpha(p). \tag{5}$$

10 Assume that the Gaussian density $\pi$ has the center of mass $\widehat{x}_0$ and covariance matrix $\widehat{\Gamma}$ estimated from the sample $S$ of size $N$. If $S$ is normally distributed,
$$\#\{x_j \in B_{\alpha(p)}\} \approx pN. \tag{6}$$
Deviations are due to non-normality.

11 How do we calculate this quantity? By the eigenvalue decomposition,
$$(x - \widehat{x}_0)^T \widehat{\Gamma}^{-1} (x - \widehat{x}_0) = (x - \widehat{x}_0)^T U D^{-1} U^T (x - \widehat{x}_0) = \|D^{-1/2} U^T (x - \widehat{x}_0)\|^2,$$
since $U$ is orthogonal, i.e., $U^{-1} = U^T$, and we wrote
$$D^{-1/2} = \begin{bmatrix} 1/\sqrt{\lambda_1} & \\ & 1/\sqrt{\lambda_2} \end{bmatrix}.$$
We introduce the change of variables
$$w = f(x) = W(x - \widehat{x}_0), \qquad W = D^{-1/2} U^T.$$

12 Write the integral in terms of the new variable $w$:
$$\int_{B_\alpha} \pi(x)\,dx = \frac{1}{2\pi \det(\widehat{\Gamma})^{1/2}} \int_{B_\alpha} \exp\left(-\frac{1}{2}(x - \widehat{x}_0)^T \widehat{\Gamma}^{-1}(x - \widehat{x}_0)\right) dx$$
$$= \frac{1}{2\pi \det(\widehat{\Gamma})^{1/2}} \int_{B_\alpha} \exp\left(-\frac{1}{2}\|W(x - \widehat{x}_0)\|^2\right) dx = \frac{1}{2\pi} \int_{f(B_\alpha)} \exp\left(-\frac{1}{2}\|w\|^2\right) dw,$$
where we used the fact that
$$dw = \det(W)\,dx = \frac{1}{\sqrt{\lambda_1 \lambda_2}}\,dx = \frac{1}{\det(\widehat{\Gamma})^{1/2}}\,dx.$$
Note: $\det(\widehat{\Gamma}) = \det(U D U^T) = \det(U^T U D) = \det(D) = \lambda_1 \lambda_2$.

13 The equiprobability curves of the density of $w$ are circles centered at the origin, i.e.,
$$f(B_\alpha) = D_\delta = \{w \in \mathbb{R}^2 \mid \|w\| < \delta\}$$
for some $\delta > 0$. To solve for $\delta$, integrate in polar coordinates $(r, \theta)$:
$$\frac{1}{2\pi} \int_{D_\delta} \exp\left(-\frac{1}{2}\|w\|^2\right) dw = \int_0^\delta \exp\left(-\frac{1}{2}r^2\right) r\,dr = 1 - \exp\left(-\frac{1}{2}\delta^2\right) = p,$$
implying that
$$\delta = \delta(p) = \sqrt{2 \log\left(\frac{1}{1-p}\right)}.$$
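For reference, the radii $\delta(p)$ for a few credibility levels follow from a one-line Matlab computation (our sketch; the numerical values are standard):

p = [0.5 0.9 0.95 0.99];
delta = sqrt(2*log(1./(1-p)))   % approx. 1.1774, 2.1460, 2.4477, 3.0349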

14 To see whether a sample point $x_j$ lies within the credibility ellipse with credibility $p$, it is enough to check whether the condition
$$\|w_j\| < \delta(p), \qquad w_j = W(x_j - \widehat{x}_0), \quad 1 \leq j \leq N,$$
holds. Plot
$$p \mapsto \frac{1}{N}\#\{x_j \in B_{\alpha(p)}\}.$$

15 Example

[Figure: the fraction $\frac{1}{N}\#\{x_j \in B_{\alpha(p)}\}$ plotted against $p$; panels "Gaussian" and "Non-Gaussian".]

16 Matlab code

N = length(S(1,:));                % Size of the sample
xmean = (1/N)*(sum(S'))';          % Mean of the sample
CS = S - xmean*ones(1,N);          % Centered sample
Gamma = (1/N)*CS*CS';              % Covariance matrix

% Whitening of the sample
[V,D] = eig(Gamma);                % Eigenvalue decomposition
W = diag([1/sqrt(D(1,1)); 1/sqrt(D(2,2))])*V';
WS = W*CS;                         % Whitened sample
normWS2 = sum(WS.^2);              % Squared norms of the whitened points

17 % Calculating the fraction of scatter points that are
% included in the credibility ellipses
rinside = zeros(11,1);
rinside(11) = N;
for j = 1:9
    delta2 = 2*log(1/(1-j/10));
    rinside(j+1) = sum(normWS2 < delta2);
end
rinside = (1/N)*rinside;
plot([0:10:100], rinside, 'k.-', 'MarkerSize', 12)

18 Which one of the following formulas?
$$\widehat{\Gamma} = \frac{1}{N}\sum_{j=1}^{N}(x_j - \widehat{x}_0)(x_j - \widehat{x}_0)^T, \quad \text{or} \quad \widehat{\Gamma} = \frac{1}{N}\sum_{j=1}^{N} x_j x_j^T - \widehat{x}_0 \widehat{x}_0^T.$$
The former, please. (The two are algebraically equal, but the latter is prone to cancellation errors in floating-point arithmetic when the mean is large compared to the spread of the sample.)

19 Example

Calibration of a measurement instrument:
- Measure a dummy load whose output is known
- Subtract it from the actual measurement
- Analyze the noise

Discrete sampling: the output is a vector of length $n$. The noise vector $x \in \mathbb{R}^n$ is a realization of $X : \Omega \to \mathbb{R}^n$. Estimate the mean and variance:
$$x_0 = \frac{1}{n}\sum_{j=1}^{n} x_j \quad \text{(offset)}, \qquad \sigma^2 = \frac{1}{n}\sum_{j=1}^{n}(x_j - x_0)^2.$$

20 Improving the Signal-to-Noise Ratio (SNR):
- Repeat the measurement
- Average
- Hope that the target is stationary

Averaged noise:
$$\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x^{(k)} \in \mathbb{R}^n.$$
How large must $N$ be to reduce the noise enough?

21 The averaged noise $\bar{x}$ is a realization of the random variable
$$\bar{X} = \frac{1}{N}\sum_{k=1}^{N} X^{(k)} \in \mathbb{R}^n.$$
If $X^{(1)}, X^{(2)}, \ldots$ are i.i.d., $\bar{X}$ is asymptotically Gaussian by the Central Limit Theorem, and its variance is
$$\operatorname{var}(\bar{X}) = \frac{\sigma^2}{N}.$$
Repeat until the variance is below a given threshold:
$$\frac{\sigma^2}{N} < \tau^2.$$
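The stopping criterion gives the required number of repetitions directly; a small sketch with example values for sigma2 and tau (both assumed, not from the slides):

sigma2 = 0.04;              % noise variance from the calibration measurement
tau = 0.05;                 % desired noise level
Nmin = ceil(sigma2/tau^2)   % smallest N with sigma^2/N < tau^2; here 16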

22 [Figure: averaged signal for two values of the number of averaged signals.]

23 [Figure: averaged signal for two further values of the number of averaged signals.]

24 Maximum Likelihood Estimator: the Frequentist's Approach

Parametric problem:
$$X \sim \pi_\theta(x) = \pi(x \mid \theta), \qquad \theta \in \mathbb{R}^k.$$
Independent realizations: assume that the observations $x_j$ are obtained independently. More precisely, $X_1, X_2, \ldots, X_N$ are i.i.d. and $x_j$ is a realization of $X_j$. Independence:
$$\pi(x_1, x_2, \ldots, x_N \mid \theta) = \pi(x_1 \mid \theta)\,\pi(x_2 \mid \theta) \cdots \pi(x_N \mid \theta),$$
or, briefly,
$$\pi(S \mid \theta) = \prod_{j=1}^{N} \pi(x_j \mid \theta).$$

25 The maximum likelihood (ML) estimator of $\theta$ is the parameter value that maximizes the probability of the outcome:
$$\theta_{\mathrm{ML}} = \arg\max_{\theta} \prod_{j=1}^{N} \pi(x_j \mid \theta).$$
Define
$$L(S \mid \theta) = -\log(\pi(S \mid \theta)).$$
The minimizer of $L(S \mid \theta)$ is the maximizer of $\pi(S \mid \theta)$.
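When $L(S \mid \theta)$ has no closed-form minimizer, it can be minimized numerically. A minimal sketch using Matlab's fminsearch, written here for the Gaussian model of the next slide with S a vector of scalar observations (an illustration of ours; for a robust implementation one would optimize over $\log\theta_2$ to keep the variance positive):

% Negative log-likelihood L(S|theta) for the Gaussian model, theta = [x0; sigma2]
negloglik = @(theta) sum((S - theta(1)).^2)/(2*theta(2)) ...
                   + (length(S)/2)*log(2*pi*theta(2));
theta0 = [mean(S); var(S)];                 % initial guess
thetaML = fminsearch(negloglik, theta0);    % numerical minimizer of L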

26 Example

Gaussian model:
$$\pi(x \mid x_0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - x_0)^2\right), \qquad \theta = \begin{bmatrix} x_0 \\ \sigma^2 \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}.$$
The likelihood function is
$$\prod_{j=1}^{N} \pi(x_j \mid \theta) = \left(\frac{1}{2\pi\theta_2}\right)^{N/2} \exp\left(-\frac{1}{2\theta_2}\sum_{j=1}^{N}(x_j - \theta_1)^2\right)$$
$$= \exp\left(-\frac{1}{2\theta_2}\sum_{j=1}^{N}(x_j - \theta_1)^2 - \frac{N}{2}\log(2\pi\theta_2)\right) = \exp(-L(S \mid \theta)).$$

27 We have
$$\nabla_\theta L(S \mid \theta) = \begin{bmatrix} \partial L/\partial\theta_1 \\ \partial L/\partial\theta_2 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{\theta_2}\left(N\theta_1 - \displaystyle\sum_{j=1}^{N} x_j\right) \\ -\dfrac{1}{2\theta_2^2}\displaystyle\sum_{j=1}^{N}(x_j - \theta_1)^2 + \dfrac{N}{2\theta_2} \end{bmatrix}.$$
Setting $\nabla_\theta L(S \mid \theta) = 0$ gives
$$\widehat{x}_0 = \theta_{\mathrm{ML},1} = \frac{1}{N}\sum_{j=1}^{N} x_j, \qquad \widehat{\sigma}^2 = \theta_{\mathrm{ML},2} = \frac{1}{N}\sum_{j=1}^{N}(x_j - \theta_{\mathrm{ML},1})^2.$$

28 Example

Parametric (Poisson) model:
$$\pi(n \mid \theta) = \frac{\theta^n}{n!} e^{-\theta},$$
with sample $S = \{n_1, \ldots, n_N\}$, $n_k \in \mathbb{N}$, obtained by independent sampling. The likelihood is
$$\pi(S \mid \theta) = \prod_{k=1}^{N} \pi(n_k \mid \theta) = e^{-N\theta} \prod_{k=1}^{N} \frac{\theta^{n_k}}{n_k!},$$
and its negative logarithm is
$$L(S \mid \theta) = -\log\pi(S \mid \theta) = \sum_{k=1}^{N}\left(\theta - n_k \log\theta + \log n_k!\right).$$

29 Setting the derivative with respect to $\theta$ to zero,
$$\frac{\partial}{\partial\theta} L(S \mid \theta) = \sum_{k=1}^{N}\left(1 - \frac{n_k}{\theta}\right) = 0, \tag{7}$$
leads to
$$\theta_{\mathrm{ML}} = \frac{1}{N}\sum_{k=1}^{N} n_k.$$
Warning: for a Poisson variable the variance also equals $\theta$, so one could instead estimate
$$\operatorname{var}(n) \approx \frac{1}{N}\sum_{k=1}^{N} n_k^2 - \left(\frac{1}{N}\sum_{j=1}^{N} n_j\right)^2,$$
which in general is different from the estimate $\theta_{\mathrm{ML}}$ obtained above.
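A quick numerical check of the warning; a sketch of ours that assumes the Statistics Toolbox function poissrnd for generating the sample:

theta = 5; N = 1000;
n = poissrnd(theta, N, 1);          % N independent Poisson(theta) draws
thetaML = mean(n)                   % ML estimate, the empirical mean
varEst = mean(n.^2) - mean(n)^2     % empirical variance: close to theta,
                                    % but in general not equal to thetaML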

30 Assume that $\theta$ is known a priori to be relatively large. Use the Gaussian approximation:
$$\prod_{j=1}^{N} \pi_{\text{Poisson}}(n_j \mid \theta) \approx \left(\frac{1}{2\pi\theta}\right)^{N/2} \exp\left(-\frac{1}{2\theta}\sum_{j=1}^{N}(n_j - \theta)^2\right)$$
$$= \left(\frac{1}{2\pi}\right)^{N/2} \exp\left(-\frac{1}{2}\left(\frac{1}{\theta}\sum_{j=1}^{N}(n_j - \theta)^2 + N\log\theta\right)\right),$$
so, up to an irrelevant factor of $1/2$ that does not affect the minimizer,
$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^{N}(n_j - \theta)^2 + N\log\theta.$$

31 An approximation for $\theta_{\mathrm{ML}}$: minimize
$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^{N}(n_j - \theta)^2 + N\log\theta.$$
Write
$$\frac{\partial}{\partial\theta} L(S \mid \theta) = -\frac{1}{\theta^2}\sum_{j=1}^{N}(n_j - \theta)^2 - \frac{2}{\theta}\sum_{j=1}^{N}(n_j - \theta) + \frac{N}{\theta} = 0.$$
Multiplying by $-\theta^2$ and simplifying,
$$\sum_{j=1}^{N}(n_j - \theta)^2 + 2\theta\sum_{j=1}^{N}(n_j - \theta) - N\theta = \sum_{j=1}^{N} n_j^2 - N\theta^2 - N\theta = 0,$$
giving
$$\theta_{\mathrm{ML}} \approx -\frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{N}\sum_{j=1}^{N} n_j^2}.$$
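The approximation is immediate to evaluate; continuing our simulated Poisson sample n from the sketch after slide 29:

thetaApprox = -1/2 + sqrt(1/4 + mean(n.^2))
% For a Poisson(theta) sample, mean(n.^2) is close to theta^2 + theta,
% so thetaApprox is close to theta, and hence to mean(n).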

32 Example

Multivariate Gaussian model: $X \sim \mathcal{N}(x_0, \Gamma)$, where $x_0 \in \mathbb{R}^n$ is unknown and $\Gamma \in \mathbb{R}^{n \times n}$ is symmetric positive definite (SPD) and known.

Model reduction: assume that $x_0$ depends on hidden parameters $z \in \mathbb{R}^k$ through a linear equation,
$$x_0 = Az, \qquad A \in \mathbb{R}^{n \times k}, \quad z \in \mathbb{R}^k. \tag{8}$$
Model for an inverse problem: $z$ is the true physical quantity that, in the ideal case, is related to the observable $x_0$ through the linear model (8).

33 Noisy observations:
$$X = Az + E, \qquad E \sim \mathcal{N}(0, \Gamma).$$
Obviously,
$$E\{X\} = Az + E\{E\} = Az = x_0,$$
and
$$\operatorname{cov}(X) = E\{(X - Az)(X - Az)^T\} = E\{EE^T\} = \Gamma.$$
The probability density of $X$, given $z$, is
$$\pi(x \mid z) = \frac{1}{(2\pi)^{n/2}\det(\Gamma)^{1/2}} \exp\left(-\frac{1}{2}(x - Az)^T \Gamma^{-1}(x - Az)\right).$$

34 Independent observations: $S = \{x_1, \ldots, x_N\}$, $x_j \in \mathbb{R}^n$. The likelihood function
$$\prod_{j=1}^{N} \pi(x_j \mid z) \propto \exp\left(-\frac{1}{2}\sum_{j=1}^{N}(x_j - Az)^T \Gamma^{-1}(x_j - Az)\right)$$
is maximized by minimizing
$$L(S \mid z) = \frac{1}{2}\sum_{j=1}^{N}(x_j - Az)^T \Gamma^{-1}(x_j - Az) = \frac{N}{2} z^T\left[A^T \Gamma^{-1} A\right]z - z^T\left[A^T \Gamma^{-1}\sum_{j=1}^{N} x_j\right] + \frac{1}{2}\sum_{j=1}^{N} x_j^T \Gamma^{-1} x_j.$$

35 Setting the gradient to zero gives
$$\nabla_z L(S \mid z) = N\left[A^T \Gamma^{-1} A\right]z - A^T \Gamma^{-1}\sum_{j=1}^{N} x_j = 0,$$
i.e., the maximum likelihood estimator $z_{\mathrm{ML}}$ is a solution of the linear system
$$\left[A^T \Gamma^{-1} A\right]z = A^T \Gamma^{-1}\bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{j=1}^{N} x_j.$$
The solution may fail to exist or to be unique; everything depends on the properties of the model reduction matrix $A \in \mathbb{R}^{n \times k}$.
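In Matlab, $z_{\mathrm{ML}}$ is obtained by forming and solving the normal equations; a minimal sketch of ours, assuming A, Gamma, and the n-by-N sample matrix S are given:

xbar = mean(S, 2);            % mean of the observations
B = A' * (Gamma \ A);         % A^T Gamma^{-1} A
b = A' * (Gamma \ xbar);      % A^T Gamma^{-1} xbar
zML = B \ b;                  % solve; requires A^T Gamma^{-1} A to be invertible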

36 Particular case: one observation, $S = \{x\}$,
$$L(z \mid x) = (x - Az)^T \Gamma^{-1}(x - Az).$$
Using the eigenvalue decomposition of the covariance matrix, $\Gamma = UDU^T$, or
$$\Gamma^{-1} = W^T W, \qquad W = D^{-1/2} U^T,$$
we have
$$L(z \mid x) = \|W(Az - x)\|^2.$$
Hence, the problem reduces to a weighted least squares problem.
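Equivalently, the whitened problem can be handed to a least squares solver; a sketch under the same assumptions, with x the single observation:

[U,D] = eig(Gamma);
W = diag(1./sqrt(diag(D))) * U';   % whitening matrix, W'*W = inv(Gamma)
zML = (W*A) \ (W*x);               % backslash solves min_z ||W(A*z - x)||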
