1 Basic Problem of Statistical Inference

Assume that we have a set of observations $S = \{x_1, x_2, \dots, x_N\}$, $x_j \in \mathbb{R}^n$. The problem is to infer the underlying probability distribution that gives rise to the data $S$. This comprises two tasks: statistical modeling and statistical analysis.
2 Parametric or Non-parametric?

Parametric problem: the underlying probability density has a specified form and depends on a number of parameters. The problem is to infer those parameters.

Non-parametric problem: no analytic expression for the probability density is available. The description consists of defining the dependency/non-dependency of the data; numerical exploration.

Typical situation for a parametric model: the distribution is the probability density of a random variable $X : \Omega \to \mathbb{R}^n$. The parametric problem is suitable for inverse problems and as a model for a learning process.
3 Law of Large Numbers

General result (a statistical law of nature): assume that $X_1, X_2, \dots$ are independent and identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$. Then, almost certainly,

$$\lim_{n\to\infty} \frac{1}{n}\,(X_1 + X_2 + \cdots + X_n) = \mu.$$

"Almost certainly" means that with probability one,

$$\lim_{n\to\infty} \frac{1}{n}\,(x_1 + x_2 + \cdots + x_n) = \mu,$$

$x_j$ being a realization of $X_j$.
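To see the law in action, here is a minimal MATLAB sketch (all numbers hypothetical): the running mean of an i.i.d. sample settles near the true mean $\mu$ as $n$ grows.

% Running mean of an i.i.d. sample approaching the true mean mu.
mu = 3; sigma = 2; n = 1e5;
x = mu + sigma*randn(n, 1);          % hypothetical i.i.d. Gaussian sample
runningMean = cumsum(x)./(1:n)';     % (x_1 + ... + x_k)/k for each k
abs(runningMean(end) - mu)           % small for large n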
4 Example: Gaussian Sample

Parametric model: the points of $S = \{x_1, x_2, \dots, x_N\}$, $x_j \in \mathbb{R}^2$, are realizations of

$$X \sim \mathcal{N}(x_0, \Gamma),$$

with unknown mean $x_0 \in \mathbb{R}^2$ and covariance matrix $\Gamma \in \mathbb{R}^{2\times 2}$. Probability density of $X$:

$$\pi(x \mid x_0, \Gamma) = \frac{1}{2\pi \det(\Gamma)^{1/2}} \exp\Big(-\frac{1}{2}(x - x_0)^{\mathsf T} \Gamma^{-1}(x - x_0)\Big).$$

Problem: estimate the parameters $x_0$ and $\Gamma$.
5 The Law of Large Numbers suggests that we estimate

$$x_0 = \mathrm{E}\{X\} \approx \frac{1}{N}\sum_{j=1}^N x_j = \hat{x}_0. \quad (1)$$

Covariance matrix: observe that if $X_1, X_2, \dots$ are i.i.d., so are $f(X_1), f(X_2), \dots$ for any function $f : \mathbb{R}^2 \to \mathbb{R}^k$. Try

$$\Gamma = \mathrm{cov}(X) = \mathrm{E}\big\{(X - x_0)(X - x_0)^{\mathsf T}\big\} \approx \frac{1}{N}\sum_{j=1}^N (x_j - \hat{x}_0)(x_j - \hat{x}_0)^{\mathsf T} = \hat{\Gamma}. \quad (2)$$

Formulas (1) and (2) are known as the empirical mean and covariance, respectively.
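In MATLAB, with the sample stored column-wise as in the code later in these notes, (1) and (2) take a few lines; this is a sketch with a hypothetical sample.

% Empirical mean and covariance, formulas (1) and (2).
% S is a 2-by-N matrix whose columns are the sample points x_j.
N = 200;
S = randn(2, N);                     % hypothetical sample
x0 = (1/N)*sum(S, 2);                % empirical mean (1)
CS = S - x0*ones(1, N);              % centered sample
Gamma = (1/N)*(CS*CS');              % empirical covariance (2)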
6 Case 1: Gaussian Sample [figure]
7 Sample size $N = 200$. Eigenvectors of the covariance matrix:

$$\hat{\Gamma} = U D U^{\mathsf T}, \quad (3)$$

where $U \in \mathbb{R}^{2\times 2}$ is an orthogonal matrix ($U^{\mathsf T} = U^{-1}$) and $D \in \mathbb{R}^{2\times 2}$ is diagonal,

$$U = \begin{bmatrix} v_1 & v_2 \end{bmatrix}, \quad D = \begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix}, \quad \hat{\Gamma} v_j = \lambda_j v_j, \quad j = 1, 2.$$

Scaled eigenvectors: $v_{j,\mathrm{scaled}} = 2\sqrt{\lambda_j}\, v_j$, where $\sqrt{\lambda_j}$ = standard deviation (STD).
8 Case 2: Non-Gaussian Sample [figure]
9 Estimate of Normality/Non-normality

Consider the sets

$$B_\alpha = \{x \in \mathbb{R}^2 \mid \pi(x) \ge \alpha\}, \quad \alpha > 0.$$

If $\pi$ is Gaussian, $B_\alpha$ is an ellipse. Calculate the integral

$$P\{X \in B_\alpha\} = \int_{B_\alpha} \pi(x)\,dx. \quad (4)$$

We call $B_\alpha$ the credibility ellipse with credibility $p$, $0 < p < 1$, if

$$P\{X \in B_\alpha\} = p, \quad (5)$$

giving $\alpha = \alpha(p)$.
10 Assume that the Gaussian density $\pi$ has the center of mass $\hat{x}_0$ and covariance matrix $\hat{\Gamma}$ estimated from the sample $S$ of size $N$. If $S$ is normally distributed,

$$\#\{x_j \in B_{\alpha(p)}\} \approx pN. \quad (6)$$

Deviations are due to non-normality.
11 How do we calculate the quantity $(x - \hat{x}_0)^{\mathsf T} \hat{\Gamma}^{-1} (x - \hat{x}_0)$? Eigenvalue decomposition:

$$(x - \hat{x}_0)^{\mathsf T} \hat{\Gamma}^{-1} (x - \hat{x}_0) = (x - \hat{x}_0)^{\mathsf T} U D^{-1} U^{\mathsf T} (x - \hat{x}_0) = \big\|D^{-1/2} U^{\mathsf T} (x - \hat{x}_0)\big\|^2,$$

since $U$ is orthogonal, i.e., $U^{-1} = U^{\mathsf T}$, and we wrote

$$D^{-1/2} = \begin{bmatrix} 1/\sqrt{\lambda_1} & \\ & 1/\sqrt{\lambda_2} \end{bmatrix}.$$

We introduce the change of variables

$$w = f(x) = W(x - \hat{x}_0), \quad W = D^{-1/2} U^{\mathsf T}.$$
12 Write the integral in terms of the new variable $w$:

$$\int_{B_\alpha} \pi(x)\,dx = \frac{1}{2\pi \det(\hat{\Gamma})^{1/2}} \int_{B_\alpha} \exp\Big(-\frac{1}{2}(x - \hat{x}_0)^{\mathsf T} \hat{\Gamma}^{-1}(x - \hat{x}_0)\Big)\,dx$$
$$= \frac{1}{2\pi \det(\hat{\Gamma})^{1/2}} \int_{B_\alpha} \exp\Big(-\frac{1}{2}\|W(x - \hat{x}_0)\|^2\Big)\,dx = \frac{1}{2\pi} \int_{f(B_\alpha)} \exp\Big(-\frac{1}{2}\|w\|^2\Big)\,dw,$$

where we used the fact that

$$dw = \det(W)\,dx = \frac{1}{\sqrt{\lambda_1 \lambda_2}}\,dx = \frac{1}{\det(\hat{\Gamma})^{1/2}}\,dx.$$

Note: $\det(\hat{\Gamma}) = \det(U D U^{\mathsf T}) = \det(U^{\mathsf T} U D) = \det(D) = \lambda_1 \lambda_2$.
13 The equiprobability curves of the density of $w$ are circles centered at the origin, i.e.,

$$f(B_\alpha) = D_\delta = \{w \in \mathbb{R}^2 \mid \|w\| < \delta\}$$

for some $\delta > 0$. Solve for $\delta$: integrating in polar coordinates $(r, \theta)$,

$$\frac{1}{2\pi} \int_{D_\delta} \exp\Big(-\frac{1}{2}\|w\|^2\Big)\,dw = \int_0^\delta \exp\Big(-\frac{1}{2}r^2\Big)\,r\,dr = 1 - \exp\Big(-\frac{1}{2}\delta^2\Big) = p,$$

implying that

$$\delta = \delta(p) = \sqrt{2\log\Big(\frac{1}{1-p}\Big)}.$$
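A quick Monte Carlo sanity check of this formula (a sketch; the sample size is hypothetical): for whitened Gaussian points, a fraction $p$ should fall inside the disk of radius $\delta(p)$.

% Monte Carlo check of delta(p) for whitened Gaussian samples in R^2.
p = 0.9;
delta = sqrt(2*log(1/(1 - p)));
w = randn(2, 1e5);                   % standard Gaussian sample in R^2
frac = mean(sum(w.^2) < delta^2)     % should be approximately p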
14 To see whether a sample point $x_j$ lies within the credibility ellipse with credibility $p$, it is enough to check the condition

$$\|w_j\| < \delta(p), \quad w_j = W(x_j - \hat{x}_0), \quad 1 \le j \le N.$$

Plot

$$p \mapsto \frac{1}{N}\,\#\{x_j \in B_{\alpha(p)}\}.$$
15 Example [figures: Gaussian sample vs. non-Gaussian sample]
16 Matlab code

N = length(S(1,:));                  % Size of the sample
xmean = (1/N)*(sum(S')');            % Mean of the sample
CS = S - xmean*ones(1,N);            % Centered sample
Gamma = (1/N)*(CS*CS');              % Covariance matrix

% Whitening of the sample
[V,D] = eig(Gamma);                  % Eigenvalue decomposition
W = diag([1/sqrt(D(1,1)); 1/sqrt(D(2,2))])*V';
WS = W*CS;                           % Whitened sample
normWS2 = sum(WS.^2);                % Squared norms ||w_j||^2
17 % Calculating the percentage of scatter points that are
% included in the credibility ellipses
rinside = zeros(11,1);
rinside(11) = N;                     % p = 1: all points
for j = 1:9
    delta2 = 2*log(1/(1 - j/10));    % delta(p)^2 for p = j/10
    rinside(j+1) = sum(normWS2 < delta2);
end
rinside = (1/N)*rinside;
plot([0:10:100], rinside, 'k.-', 'MarkerSize', 12)
18 Which one of the following formulae?

$$\hat{\Gamma} = \frac{1}{N}\sum_{j=1}^N (x_j - \hat{x}_0)(x_j - \hat{x}_0)^{\mathsf T}, \qquad \text{or} \qquad \hat{\Gamma} = \frac{1}{N}\sum_{j=1}^N x_j x_j^{\mathsf T} - \hat{x}_0 \hat{x}_0^{\mathsf T}.$$

The former, please: the two are algebraically equal, but the latter forms a small difference of two large quantities and is therefore prone to cancellation errors in floating-point arithmetic.
19 Example: Calibration of a Measurement Instrument

Measure a dummy load whose output is known; subtract it from the actual measurement; analyze the noise. Discrete sampling: the output is a vector of length $n$, and the noise vector $x \in \mathbb{R}^n$ is a realization of $X : \Omega \to \mathbb{R}^n$. Estimate the mean and variance over the components $x_j$:

$$x_0 = \frac{1}{n}\sum_{j=1}^n x_j \ \text{(offset)}, \qquad \sigma^2 = \frac{1}{n}\sum_{j=1}^n (x_j - x_0)^2.$$
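As a minimal sketch (hypothetical offset and noise level), the two estimates for a recorded noise vector:

% Offset and noise variance of a dummy-load noise record.
x = 0.1 + 0.05*randn(1024, 1);       % hypothetical noise vector, offset 0.1
x0 = mean(x);                        % offset estimate
sigma2 = mean((x - x0).^2);          % variance estimate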
20 Improving the Signal-to-Noise Ratio (SNR): repeat the measurement, average, and hope that the target is stationary. Averaged noise:

$$\bar{x} = \frac{1}{N}\sum_{k=1}^N x^{(k)} \in \mathbb{R}^n.$$

How large must $N$ be to reduce the noise enough?
21 The averaged noise $\bar{x}$ is a realization of the random variable

$$\bar{X} = \frac{1}{N}\sum_{k=1}^N X^{(k)} \in \mathbb{R}^n.$$

If $X^{(1)}, X^{(2)}, \dots$ are i.i.d., $\bar{X}$ is asymptotically Gaussian by the Central Limit Theorem, and its variance is

$$\mathrm{var}(\bar{X}) = \frac{\sigma^2}{N}.$$

Repeat until the variance is below a given threshold, $\sigma^2/N < \tau^2$, i.e., until $N > \sigma^2/\tau^2$.
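A minimal MATLAB sketch of the averaging idea (signal, noise level, and threshold all hypothetical):

% Noise reduction by averaging N repeated measurements.
n = 512; N = 100; sigma = 0.3; tau = 0.05;
Nrequired = ceil(sigma^2/tau^2);                % smallest N with sigma^2/N < tau^2
t = linspace(0, 1, n)';
signal = sin(2*pi*5*t);                         % hypothetical stationary target
X = signal*ones(1, N) + sigma*randn(n, N);      % N noisy repetitions
xavg = (1/N)*sum(X, 2);                         % averaged measurement
std(xavg - signal)                              % approximately sigma/sqrt(N)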
22-23 [Figures: signal traces for different numbers of averaged signals.]
24 Maximum Likelihood Estimator: the Frequentist's Approach

Parametric problem:

$$X \sim \pi_\theta(x) = \pi(x \mid \theta), \quad \theta \in \mathbb{R}^k.$$

Independent realizations: assume that the observations $x_j$ are obtained independently. More precisely: $X_1, X_2, \dots, X_N$ are i.i.d., and $x_j$ is a realization of $X_j$. Independence:

$$\pi(x_1, x_2, \dots, x_N \mid \theta) = \pi(x_1 \mid \theta)\,\pi(x_2 \mid \theta)\cdots\pi(x_N \mid \theta),$$

or, briefly,

$$\pi(S \mid \theta) = \prod_{j=1}^N \pi(x_j \mid \theta).$$
25 The maximum likelihood (ML) estimator of $\theta$ is the parameter value that maximizes the probability of the outcome:

$$\theta_{\mathrm{ML}} = \arg\max_\theta \prod_{j=1}^N \pi(x_j \mid \theta).$$

Define $L(S \mid \theta) = -\log(\pi(S \mid \theta))$. The minimizer of $L(S \mid \theta)$ is the maximizer of $\pi(S \mid \theta)$.
26 Example: Gaussian Model

$$\pi(x \mid x_0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big(-\frac{1}{2\sigma^2}(x - x_0)^2\Big), \quad \theta = \begin{bmatrix} x_0 \\ \sigma^2 \end{bmatrix} = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}.$$

The likelihood function is

$$\prod_{j=1}^N \pi(x_j \mid \theta) = \Big(\frac{1}{2\pi\theta_2}\Big)^{N/2} \exp\Big(-\frac{1}{2\theta_2}\sum_{j=1}^N (x_j - \theta_1)^2\Big)$$
$$= \exp\Big(-\frac{1}{2\theta_2}\sum_{j=1}^N (x_j - \theta_1)^2 - \frac{N}{2}\log(2\pi\theta_2)\Big) = \exp(-L(S \mid \theta)).$$
27 We have

$$\nabla_\theta L(S \mid \theta) = \begin{bmatrix} \partial L/\partial\theta_1 \\ \partial L/\partial\theta_2 \end{bmatrix} = \begin{bmatrix} -\dfrac{1}{\theta_2}\displaystyle\sum_{j=1}^N (x_j - \theta_1) \\[3mm] -\dfrac{1}{2\theta_2^2}\displaystyle\sum_{j=1}^N (x_j - \theta_1)^2 + \dfrac{N}{2\theta_2} \end{bmatrix}.$$

Setting $\nabla_\theta L(S \mid \theta) = 0$ gives

$$\hat{x}_0 = \theta_{\mathrm{ML},1} = \frac{1}{N}\sum_{j=1}^N x_j, \qquad \hat{\sigma}^2 = \theta_{\mathrm{ML},2} = \frac{1}{N}\sum_{j=1}^N (x_j - \theta_{\mathrm{ML},1})^2.$$
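The closed-form estimates can be checked numerically; this sketch (with a hypothetical sample) minimizes $L(S \mid \theta)$ using MATLAB's fminsearch, parametrizing the variance as $\theta_2 = e^s$ to keep it positive during the search.

% Numerical check of the Gaussian ML estimates.
x = 2 + 1.5*randn(1000, 1);                          % hypothetical sample
N = numel(x);
L = @(t) sum((x - t(1)).^2)/(2*exp(t(2))) ...
       + (N/2)*log(2*pi*exp(t(2)));                  % L(S|theta), theta_2 = exp(t(2))
tML = fminsearch(L, [0; 0]);
thetaML = [tML(1); exp(tML(2))];                     % numerical minimizer
closedForm = [mean(x); mean((x - mean(x)).^2)];      % should agree with thetaML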
28 Example: Poisson Model

Parametric model

$$\pi(n \mid \theta) = \frac{\theta^n}{n!}\,e^{-\theta},$$

sample $S = \{n_1, \dots, n_N\}$, $n_k \in \mathbb{N}$, obtained by independent sampling. The likelihood density is

$$\pi(S \mid \theta) = \prod_{k=1}^N \pi(n_k \mid \theta) = e^{-N\theta} \prod_{k=1}^N \frac{\theta^{n_k}}{n_k!},$$

and its negative logarithm is

$$L(S \mid \theta) = -\log \pi(S \mid \theta) = \sum_{k=1}^N \big(\theta - n_k \log\theta + \log n_k!\big).$$
29 Setting the derivative with respect to $\theta$ to zero,

$$\frac{\partial}{\partial\theta} L(S \mid \theta) = \sum_{k=1}^N \Big(1 - \frac{n_k}{\theta}\Big) = 0, \quad (7)$$

leads to

$$\theta_{\mathrm{ML}} = \frac{1}{N}\sum_{k=1}^N n_k.$$

Warning: the empirical variance,

$$\mathrm{var}(n) \approx \frac{1}{N}\sum_{k=1}^N n_k^2 - \Big(\frac{1}{N}\sum_{j=1}^N n_j\Big)^2,$$

is in general different from the estimate $\theta_{\mathrm{ML}}$ obtained above, even though for a Poisson variable the mean and the variance are both equal to $\theta$.
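A small numerical illustration of the warning (a sketch; poissrnd is in the Statistics Toolbox): both quantities estimate $\theta$, but they do not coincide on a finite sample.

% Poisson sample: ML estimate (empirical mean) vs. empirical variance.
n = poissrnd(7, 1, 5000);            % hypothetical sample, true theta = 7
thetaML = mean(n);                   % ML estimate of theta
varEmp = mean(n.^2) - mean(n)^2;     % close to, but not equal to, thetaML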
30 Assume that $\theta$ is known a priori to be relatively large. Use the Gaussian approximation:

$$\prod_{j=1}^N \pi_{\mathrm{Poisson}}(n_j \mid \theta) \approx \Big(\frac{1}{2\pi\theta}\Big)^{N/2} \exp\Big(-\frac{1}{2\theta}\sum_{j=1}^N (n_j - \theta)^2\Big)$$
$$= \Big(\frac{1}{2\pi}\Big)^{N/2} \exp\Big(-\frac{1}{2}\Big[\frac{1}{\theta}\sum_{j=1}^N (n_j - \theta)^2 + N\log\theta\Big]\Big),$$

so we take

$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^N (n_j - \theta)^2 + N\log\theta$$

(the overall factor $1/2$ is dropped, since it does not affect the minimizer).
31 An approximation for $\theta_{\mathrm{ML}}$: minimize

$$L(S \mid \theta) = \frac{1}{\theta}\sum_{j=1}^N (n_j - \theta)^2 + N\log\theta.$$

Write

$$\frac{\partial}{\partial\theta} L(S \mid \theta) = -\frac{1}{\theta^2}\sum_{j=1}^N (n_j - \theta)^2 - \frac{2}{\theta}\sum_{j=1}^N (n_j - \theta) + \frac{N}{\theta} = 0,$$

or, multiplying by $-\theta^2$,

$$\sum_{j=1}^N (n_j - \theta)^2 + 2\theta\sum_{j=1}^N (n_j - \theta) - N\theta = 0.$$

Expanding the squares, this reduces to

$$N\theta^2 + N\theta - \sum_{j=1}^N n_j^2 = 0,$$

giving

$$\theta = -\frac{1}{2} + \sqrt{\frac{1}{4} + \frac{1}{N}\sum_{j=1}^N n_j^2}.$$
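For large $\theta$ the two estimates are close; a brief comparison sketch (again assuming the Statistics Toolbox for poissrnd):

% Exact ML estimate vs. the Gaussian-approximation estimate.
n = poissrnd(50, 1, 2000);                       % hypothetical sample, theta = 50
thetaExact = mean(n);                            % exact ML estimate
thetaApprox = -1/2 + sqrt(1/4 + mean(n.^2));     % root of the quadratic above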
32 Example: Multivariate Gaussian Model

$X \sim \mathcal{N}(x_0, \Gamma)$, where $x_0 \in \mathbb{R}^n$ is unknown and $\Gamma \in \mathbb{R}^{n\times n}$ is symmetric positive definite (SPD) and known.

Model reduction: assume that $x_0$ depends on hidden parameters $z \in \mathbb{R}^k$ through a linear equation,

$$x_0 = Az, \quad A \in \mathbb{R}^{n\times k}, \quad z \in \mathbb{R}^k. \quad (8)$$

Model for an inverse problem: $z$ is the true physical quantity that, in the ideal case, is related to the observable $x_0$ through the linear model (8).
33 Noisy observations:

$$X = Az + E, \quad E \sim \mathcal{N}(0, \Gamma).$$

Obviously,

$$\mathrm{E}\{X\} = Az + \mathrm{E}\{E\} = Az = x_0,$$

and

$$\mathrm{cov}(X) = \mathrm{E}\big\{(X - Az)(X - Az)^{\mathsf T}\big\} = \mathrm{E}\big\{E E^{\mathsf T}\big\} = \Gamma.$$

The probability density of $X$, given $z$, is

$$\pi(x \mid z) = \frac{1}{(2\pi)^{n/2}\det(\Gamma)^{1/2}} \exp\Big(-\frac{1}{2}(x - Az)^{\mathsf T}\Gamma^{-1}(x - Az)\Big).$$
34 Independent observations: $S = \{x_1, \dots, x_N\}$, $x_j \in \mathbb{R}^n$. The likelihood function

$$\prod_{j=1}^N \pi(x_j \mid z) \propto \exp\Big(-\frac{1}{2}\sum_{j=1}^N (x_j - Az)^{\mathsf T}\Gamma^{-1}(x_j - Az)\Big)$$

is maximized by minimizing

$$L(S \mid z) = \frac{1}{2}\sum_{j=1}^N (x_j - Az)^{\mathsf T}\Gamma^{-1}(x_j - Az) = \frac{N}{2}\, z^{\mathsf T}\big[A^{\mathsf T}\Gamma^{-1}A\big]z - z^{\mathsf T}\Big[A^{\mathsf T}\Gamma^{-1}\sum_{j=1}^N x_j\Big] + \frac{1}{2}\sum_{j=1}^N x_j^{\mathsf T}\Gamma^{-1}x_j.$$
35 Setting the gradient to zero gives

$$\nabla_z L(S \mid z) = N\big[A^{\mathsf T}\Gamma^{-1}A\big]z - A^{\mathsf T}\Gamma^{-1}\sum_{j=1}^N x_j = 0,$$

i.e., the maximum likelihood estimator $z_{\mathrm{ML}}$ is a solution of the linear system

$$\big[A^{\mathsf T}\Gamma^{-1}A\big]z = A^{\mathsf T}\Gamma^{-1}\bar{x}, \qquad \bar{x} = \frac{1}{N}\sum_{j=1}^N x_j.$$

A unique solution need not exist; everything depends on the properties of the model reduction matrix $A \in \mathbb{R}^{n\times k}$.
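A sketch of the whole estimation step in MATLAB, with a hypothetical model reduction matrix and noise covariance:

% ML estimate z_ML from the normal equations.
n = 8; k = 3; N = 50;
A = randn(n, k);                                   % hypothetical model reduction matrix
Gamma = 0.1*eye(n);                                % hypothetical SPD noise covariance
ztrue = randn(k, 1);
S = A*ztrue*ones(1, N) + chol(Gamma)'*randn(n, N); % simulated noisy observations
xbar = (1/N)*sum(S, 2);
zML = (A'*(Gamma\A)) \ (A'*(Gamma\xbar));          % solve [A'G^{-1}A] z = A'G^{-1} xbar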
36 Particular case: one observation, $S = \{x\}$,

$$L(z \mid x) = (x - Az)^{\mathsf T}\Gamma^{-1}(x - Az).$$

Using the eigenvalue decomposition of the covariance matrix, $\Gamma = U D U^{\mathsf T}$, or

$$\Gamma^{-1} = W^{\mathsf T}W, \quad W = D^{-1/2}U^{\mathsf T},$$

we have

$$L(z \mid x) = \|W(Az - x)\|^2.$$

Hence, the problem reduces to a weighted least squares problem.
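The whitened formulation translates directly into code; a self-contained sketch (hypothetical $A$, $\Gamma$, and observation):

% Weighted least squares via whitening, one observation.
n = 8; k = 3;
A = randn(n, k);                                % hypothetical model reduction matrix
Gamma = 0.1*eye(n);                             % hypothetical SPD covariance
x = A*randn(k, 1) + chol(Gamma)'*randn(n, 1);   % one noisy observation
[U, D] = eig(Gamma);                            % Gamma = U*D*U'
W = diag(1./sqrt(diag(D)))*U';                  % Gamma^{-1} = W'*W
zML = (W*A) \ (W*x);                            % minimizer of ||W(Az - x)||^2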