The Multivariate Gaussian Distribution


9.07 INTRODUCTION TO STATISTICS FOR BRAIN AND COGNITIVE SCIENCES
Lecture 4: The Multivariate Gaussian Distribution
Emery N. Brown

Analysis of Background Magnetoencephalogram Noise
Courtesy of Simona Temereanca, MGH Martinos Center for Biomedical Imaging

Magnetoencephalogram Background Noise: Boxplot and Q-Q Plot
Why are these data Gaussian? Answer: the Central Limit Theorem.
[Figure 4G: 1.6 seconds of two time series of MEG recordings, front sensor and back sensor]

[Figure 4H: Scatterplot of MEG recordings]
[Figure 4I: Histograms and Q-Q plots of MEG recordings, front sensor and back sensor]

Case 2: A Probability Model for Spike Sorting

The data are tetrode recordings (four electrodes) of the peak voltages (mV) corresponding to putative spike events from a rat hippocampal neuron recorded during a texture-sensitivity behavioral task. Each of the 15,600 spike events recorded during the 50 minutes is a four-vector. The objective is to develop a probability model that describes the cluster of spike events coming from a single neuron. Such a model provides the basis for a spike-sorting algorithm.

Acknowledgments: Data provided by Sujith Vijayan and Matt Wilson; technical assistance by Julie Scott.

[Figure: Tetrode recordings from a neuron]

[Figure: Time-series plot of tetrode recordings]
[Figure: Six bivariate plots of tetrode channel recordings]

[Figure: Histograms of spike events by channel, Channels 1-4]
[Figure: Box plots of spike events by channel (voltage), Channels 1-4]

DATA: The Tetrode Recordings

$x_k = (x_{k,1}, x_{k,2}, x_{k,3}, x_{k,4})'$

are the four peak voltages recorded on the $k$-th spike event for $k = 1, \ldots, K$, where $K$ is the total number of spike events.

GAUSSIAN PROBABILITY MODEL

Four-Variate Gaussian Model

$$f(x_k \mid \mu, W) = (2\pi)^{-2} |W|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(x_k - \mu)' W^{-1} (x_k - \mu)\right\}, \quad k = 1, \ldots, K$$

Mean: $\mu = (\mu_1, \mu_2, \mu_3, \mu_4)'$

Covariance matrix (symmetric):

$$W = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 & \sigma_{34} \\ \sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_4^2 \end{pmatrix}$$
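The four-variate density above can be evaluated numerically and checked against a standard library. This is a minimal sketch: the mean vector and covariance matrix here are made up for illustration, not the lecture's estimates.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical illustrative parameters (NOT the tetrode estimates):
mu = np.array([0.1, 0.2, 0.15, 0.12])        # mean vector
W = np.array([[2.0, 0.7, 0.5, 0.6],          # symmetric, positive-definite
              [0.7, 2.0, 0.6, 0.5],          # covariance matrix
              [0.5, 0.6, 2.0, 0.6],
              [0.6, 0.5, 0.6, 2.0]])

def gaussian_pdf(x, mu, W):
    """Four-variate Gaussian density evaluated directly from the formula
    (2*pi)^(-2) |W|^(-1/2) exp{-(1/2)(x-mu)' W^{-1} (x-mu)}."""
    d = x - mu
    n = len(mu)
    norm_const = (2 * np.pi) ** (-n / 2) * np.linalg.det(W) ** (-0.5)
    return norm_const * np.exp(-0.5 * d @ np.linalg.solve(W, d))

x = np.array([0.3, 0.1, 0.2, 0.0])
p_formula = gaussian_pdf(x, mu, W)
p_scipy = multivariate_normal(mean=mu, cov=W).pdf(x)
print(p_formula, p_scipy)  # the two evaluations agree
```

Using `np.linalg.solve` rather than forming `W^{-1}` explicitly is the numerically preferred way to compute the quadratic form.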

N-Multivariate Gaussian Model Factoids
1. A Gaussian probability density is completely defined by its mean vector and covariance matrix.
2. All marginal probability densities are univariate Gaussian.
3. It is frequently used because it is i) analytically and computationally tractable and ii) suggested by the Central Limit Theorem.
4. Any linear combination of the components is Gaussian (a characterization).

Central Limit Theorem
The distribution of a sum of random quantities, such that the contribution of any individual quantity goes to zero as the number of quantities being summed becomes large (goes to infinity), will be Gaussian.
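The Central Limit Theorem can be illustrated by simulation: sums of many small, independent, non-Gaussian contributions look Gaussian. The summand distribution and sample sizes below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(7)
n_terms, n_sums = 100, 20_000

# Each summand is Uniform(-0.5, 0.5): mean 0, variance 1/12, and clearly
# non-Gaussian on its own. Sum n_terms of them, n_sums times.
sums = rng.uniform(-0.5, 0.5, size=(n_sums, n_terms)).sum(axis=1)

# Standardize by the theoretical sd of the sum, sqrt(n_terms / 12).
z = sums / np.sqrt(n_terms / 12)

# If the CLT holds, z should behave like a standard Gaussian:
# mean near 0, sd near 1, and roughly 68% of values within 1 sd.
print(z.mean(), z.std(), np.mean(np.abs(z) < 1))
```

A Q-Q plot of `z` against standard normal quantiles, as in the MEG figures above, would show the same agreement graphically.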

N-Multivariate Gaussian Model Factoids
[Figure: Bivariate Gaussian distribution; a cross-section is an ellipse; each marginal distribution is univariate Gaussian; cumulative distribution function]

Univariate Gaussian Model Factoids

Gaussian probability density function:
$$f(x) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$

Standard Gaussian probability density function ($\mu = 0$, $\sigma^2 = 1$):
$$f(x) = (2\pi)^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}x^2\right\}$$

Standard cumulative Gaussian distribution function:
$$\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}u^2\right\}\, du$$
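The standard density and CDF can be computed with nothing beyond the standard library; $\Phi$ has no closed form but follows from the error-function identity $\Phi(x) = \tfrac{1}{2}\,(1 + \mathrm{erf}(x/\sqrt{2}))$.

```python
import math

def std_pdf(x):
    """Standard Gaussian density (2*pi)^(-1/2) exp(-x^2/2)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def phi(x):
    """Standard Gaussian CDF Phi(x) via Phi(x) = (1 + erf(x/sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(std_pdf(0.0))  # about 0.3989
print(phi(0.0))      # 0.5 exactly, by symmetry
print(phi(1.96))     # about 0.975, the familiar 95% two-sided point
```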

Univariate Gaussian Model Factoids
- $\mu$ is the mean (location); $\sigma$ is the standard deviation (scale).
- Any Gaussian distribution can be converted into a standard Gaussian distribution ($\mu = 0$, $\sigma = 1$).
- Approximately 68% of the area lies within 1 sd of the mean, 95% within 2 sd, and 99% within 2.58 sd.

ESTIMATION

Joint distribution of the four-variate Gaussian model:
$$f(x \mid \mu, W) = \prod_{k=1}^{K} f(x_k \mid \mu, W) = (2\pi)^{-2K} |W|^{-\frac{K}{2}} \exp\left\{-\tfrac{1}{2}\sum_{k=1}^{K}(x_k - \mu)' W^{-1} (x_k - \mu)\right\}$$
where $x = (x_1, \ldots, x_K)$.

Log likelihood:
$$\log f(x \mid \mu, W) = -2K\log(2\pi) - \frac{K}{2}\log|W| - \frac{1}{2}\sum_{k=1}^{K}(x_k - \mu)' W^{-1} (x_k - \mu)$$
where $K$ is the number of spike events in the data set.
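The joint log likelihood can be evaluated term by term from the formula and checked against a library implementation. This sketch uses simulated data and made-up parameters, not the tetrode recordings.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic stand-in for K spike events (illustrative parameters only).
rng = np.random.default_rng(0)
mu = np.zeros(4)
W = 0.5 * np.eye(4) + 0.1            # covariance: 0.6 on diagonal, 0.1 off
K = 100
x = rng.multivariate_normal(mu, W, size=K)   # K x 4 data matrix

# Log likelihood from the formula:
# -2K log(2*pi) - (K/2) log|W| - (1/2) sum_k (x_k - mu)' W^{-1} (x_k - mu)
d = x - mu
quad = np.einsum('ki,ij,kj->', d, np.linalg.inv(W), d)  # sum of quadratic forms
loglik = (-2 * K * np.log(2 * np.pi)
          - 0.5 * K * np.log(np.linalg.det(W))
          - 0.5 * quad)

# Cross-check: sum of per-observation log densities.
loglik_scipy = multivariate_normal(mu, W).logpdf(x).sum()
print(loglik, loglik_scipy)  # the two agree
```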

ESTIMATION

For Gaussian observations the maximum likelihood and method-of-moments estimates are the same.

Sample mean: $\hat{\mu}_i = K^{-1}\sum_{k=1}^{K} x_{k,i}$

Sample variance: $\hat{\sigma}_i^2 = K^{-1}\sum_{k=1}^{K} (x_{k,i} - \hat{\mu}_i)^2$

Sample covariance: $\hat{\sigma}_{i,j} = K^{-1}\sum_{k=1}^{K} (x_{k,i} - \hat{\mu}_i)(x_{k,j} - \hat{\mu}_j)$

for $i = 1, \ldots, 4$ and $j = 1, \ldots, 4$.

Sample correlation: $\hat{\rho}_{i,j} = \dfrac{\hat{\sigma}_{i,j}}{\hat{\sigma}_i \hat{\sigma}_j}$

CONFIDENCE INTERVALS FOR THE PARAMETER ESTIMATES OF THE MARGINAL GAUSSIAN DISTRIBUTIONS

The Fisher information matrix is
$$I(\theta) = -E\left[\frac{\partial^2 L}{\partial \theta^2}\right]$$
where $\theta_i = (\mu_i, \sigma_i^2)$, with $I(\mu_i) = K/\sigma_i^2$ and $I(\sigma_i^2) = K/(2\sigma_i^4)$.

The confidence interval is
$$\hat{\theta}_i \pm z_{\alpha/2}\, [I(\hat{\theta}_i)]^{-\frac{1}{2}}$$
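The estimates and the Fisher-information confidence interval for each marginal mean are easy to compute in a few lines. The data below are simulated stand-ins with invented parameters; with the mean's Fisher information $K/\sigma_i^2$, the 95% interval is $\hat{\mu}_i \pm 1.96\,\hat{\sigma}_i/\sqrt{K}$.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 15_600                               # number of spike events, as in the lecture
true_mu = np.array([0.5, 0.6, 0.4, 0.45])   # hypothetical parameters
true_sd = np.array([0.2, 0.2, 0.2, 0.2])
x = rng.normal(true_mu, true_sd, size=(K, 4))

mu_hat = x.mean(axis=0)                  # sample means: K^-1 sum_k x_{k,i}
var_hat = x.var(axis=0)                  # sample variances (1/K normalization = MLE)
corr_hat = np.corrcoef(x, rowvar=False)  # 4 x 4 sample correlation matrix

# 95% CI for each marginal mean: mu_hat_i +/- z_{0.025} * [I(mu_hat_i)]^{-1/2}
# with I(mu_i) = K / sigma_i^2, i.e. half-width 1.96 * sigma_hat_i / sqrt(K).
half_width = 1.96 * np.sqrt(var_hat / K)
ci = np.stack([mu_hat - half_width, mu_hat + half_width], axis=1)
print(mu_hat)
print(ci)
```

With K = 15,600 the intervals are very narrow, which is why the lecture's exercise of computing them from the reported variances is instructive.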

Four-Variate Gaussian Model Parameter Estimates

Sample mean vector:
$$\bar{x} = \begin{pmatrix} 0.0015 \\ 0.0019 \\ 0.0011 \\ 0.0009 \end{pmatrix}$$

Sample covariance matrix:
$$1.0 \times 10^{-7} \times \begin{pmatrix} 0.2322 & 0.1724 & 0.1503 & 0.1570 \\ 0.1724 & 0.2304 & 0.1560 & 0.1387 \\ 0.1503 & 0.1560 & 0.2126 & 0.1466 \\ 0.1570 & 0.1387 & 0.1466 & 0.2130 \end{pmatrix}$$

Sample correlation matrix:
$$\begin{pmatrix} 1.00 & 0.74 & 0.68 & 0.71 \\ 0.74 & 1.00 & 0.70 & 0.63 \\ 0.68 & 0.70 & 1.00 & 0.69 \\ 0.71 & 0.63 & 0.69 & 1.00 \end{pmatrix}$$

Marginal Gaussian Parameter Estimates and Confidence Intervals
(An exercise: compute the confidence intervals.)

Sample mean vector: $(0.0015,\; 0.0019,\; 0.0011,\; 0.0009)'$

Sample variances: $1.0 \times 10^{-7} \times (0.2322,\; 0.2304,\; 0.2126,\; 0.2130)$

[Figure: Histograms of spike events by channel, Channels 1-4]
[Figure: Empirical and model estimates of the marginal cumulative distributions]

[Figure: Six bivariate plots of tetrode channel recordings with 95% probability contours]
[Figure: Q-Q plots]

GOODNESS-OF-FIT

Kolmogorov-Smirnov tests.

A chi-squared test: separate the bivariate data into deciles and compute
$$\chi^2 = \sum_{i=1}^{10} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_9$$
where $O_i$ is the observed number of observations in decile $i$ and $E_i$ is the expected number of observations in decile $i$.
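A univariate version of this decile test can be sketched as follows, using simulated data in place of a channel's recordings: the decile boundaries come from the fitted Gaussian, so each decile is expected to hold 10% of the observations.

```python
import numpy as np
from scipy.stats import norm, chi2

# Simulated stand-in for one channel's voltages (made-up parameters).
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=2_000)

# Fit the Gaussian and take its 10%, 20%, ..., 90% quantiles as decile edges.
mu_hat, sd_hat = x.mean(), x.std()
edges = norm.ppf(np.linspace(0.1, 0.9, 9), loc=mu_hat, scale=sd_hat)

# Observed counts in each of the 10 deciles; expected count is K/10.
observed = np.histogram(x, bins=np.concatenate(([-np.inf], edges, [np.inf])))[0]
expected = len(x) / 10.0

# Chi-squared statistic, compared against chi-squared with 9 df.
stat = np.sum((observed - expected) ** 2 / expected)
p_value = chi2.sf(stat, df=9)
print(stat, p_value)
```

A large p-value indicates no evidence against the Gaussian fit; the bivariate version in the lecture partitions the plane with equiprobability contours instead of quantile edges.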

[Figure: Q-Q plots, Channels 1-4]
[Figure: K-S plots, Channels 1-4]

[Figure: Six bivariate plots of tetrode channel recordings with 95% probability contours]
[Figure: Bivariate plots of tetrode channel recordings, Channel 4 vs. 1 and Channel 3 vs. 2]

[Figure: Bivariate plots of tetrode channel recordings with Gaussian equiprobability contours, Channel 4 vs. 1 and Channel 3 vs. 2]
(An exercise: carry out the chi-squared test.)

Linear Combinations of Gaussian Random Variables are Gaussian

If $X \sim N(\mu, W)$, where $X = (x_1, x_2, x_3, x_4)'$, and
$$Y = \sum_{i=1}^{4} c_i x_i$$
where $c = (c_1, c_2, c_3, c_4)'$, then
$$Y \sim N\left(\sum_{i=1}^{4} c_i \mu_i,\; c'Wc\right)$$
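The result that $Y = c'X$ is Gaussian with mean $c'\mu$ and variance $c'Wc$ can be checked by simulation. The coefficient vector and parameters below are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical four-variate Gaussian parameters (illustration only).
mu = np.array([1.0, 2.0, 0.5, 1.5])
W = np.array([[1.0, 0.3, 0.2, 0.1],
              [0.3, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])
c = np.array([0.5, -0.25, 0.25, -0.5])   # arbitrary coefficient vector

X = rng.multivariate_normal(mu, W, size=200_000)
Y = X @ c                                # Y_k = sum_i c_i x_{k,i}

# Sample mean and variance of Y should match c'mu and c'Wc.
print(Y.mean(), c @ mu)
print(Y.var(), c @ W @ c)
```

Histogramming `Y` and overlaying the $N(c'\mu,\, c'Wc)$ density would confirm the shape, not just the first two moments.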

Linear Combination Analysis
[Figure: Channel 2 and the linear combination with coefficients (0.3528, -0.2523, -0.2093, -2.1318)]

CONCLUSION

The data seem well approximated by a four-variate Gaussian model. The marginal probability density of Channel 4 gives the best Gaussian fit. The Central Limit Theorem most likely explains why the Gaussian model works here.

Epilogue
Can you think of another real example of Gaussian data in neuroscience?

MIT OpenCourseWare https://ocw.mit.edu 9.07 Statistics for Brain and Cognitive Science Fall 2016 For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.