Notes on the Multivariate Normal and Related Topics


Version: July 10, 2013

Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions and sampling distributions. One might say that anyone worth knowing knows these distinctions. We start with the simple univariate framework and then move on to the multivariate context.

A) Begin by positing a population of interest that is operationalized by some random variable, say $X$. In this Theory World framework, $X$ is characterized by parameters, such as the expectation of $X$, $\mu = E(X)$, or its variance, $\sigma^2 = V(X)$. The random variable $X$ has a (population) distribution, which for us will typically be assumed normal.

B) A sample is generated by taking observations on $X$, say, $X_1, \ldots, X_n$, considered independent and identically distributed as $X$, i.e., they are exact copies of $X$. In this Data World context, statistics are functions of the sample, and therefore, characterize the sample: the sample mean, $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$; the sample variance, $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})^2$, with some possible variation in dividing by $n - 1$ to generate an unbiased estimator of $\sigma^2$. The statistics, $\hat{\mu}$ and $\hat{\sigma}^2$, are point estimators of $\mu$ and $\sigma^2$. They are random variables by themselves, so they have distributions that are called sampling distributions. The general problem of statistical inference is to ask what the sample statistics, $\hat{\mu}$ and $\hat{\sigma}^2$, tell us about their population counterparts, such as $\mu$ and $\sigma^2$. Can we obtain some notion of accuracy from the sampling distribution, e.g., confidence intervals?
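To make the Data World side concrete, here is a small simulation sketch (my addition, not part of the original notes) in Python/NumPy; the population values $\mu = 5$, $\sigma^2 = 4$ and the sample size $n = 50$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n = 5.0, 4.0, 50                   # illustrative population parameters

x = rng.normal(mu, np.sqrt(sigma2), size=n)    # X_1, ..., X_n i.i.d. N(mu, sigma^2)

mu_hat = x.mean()                              # sample mean (divides by n)
sigma2_hat = ((x - mu_hat) ** 2).mean()        # ML-style sample variance (divides by n)
sigma2_unbiased = x.var(ddof=1)                # divides by n - 1; unbiased for sigma^2
print(mu_hat, sigma2_hat, sigma2_unbiased)

# Repeating the experiment traces out the sampling distribution of mu_hat;
# its variance should be close to sigma^2 / n.
means = np.array([rng.normal(mu, np.sqrt(sigma2), n).mean() for _ in range(10000)])
print(means.var(), sigma2 / n)
```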

The multivariate problem is generally the same but with more notation:

A) The population (Theory World) is characterized by a collection of $p$ random variables, $\mathbf{X} = [X_1, X_2, \ldots, X_p]'$, with parameters: $\mu_i = E(X_i)$; $\sigma_i^2 = V(X_i)$; $\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = E[(X_i - E(X_i))(X_j - E(X_j))]$; or the correlation between $X_i$ and $X_j$, $\rho_{ij} = \sigma_{ij}/\sigma_i \sigma_j$.

B) The sample (Data World) is defined by $n$ independent observations on the random vector $\mathbf{X}$, with the observations placed into an $n \times p$ data matrix (e.g., subject by variable) that we also (with a slight abuse of notation) denote by a bold-face capital letter, $\mathbf{X}_{n \times p}$:

$$\mathbf{X}_{n \times p} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n' \end{bmatrix}.$$

The statistics corresponding to the parameters of the population are: $\hat{\mu}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}$; $\hat{\sigma}_i^2 = \frac{1}{n}\sum_{k=1}^{n} (x_{ki} - \hat{\mu}_i)^2$, with again some possible variation to a division by $n - 1$ for an unbiased estimate; $\hat{\sigma}_{ij} = \frac{1}{n}\sum_{k=1}^{n} (x_{ki} - \hat{\mu}_i)(x_{kj} - \hat{\mu}_j)$; and $\hat{\rho}_{ij} = \hat{\sigma}_{ij}/\hat{\sigma}_i \hat{\sigma}_j$.

To obtain a good sense of what the estimators tell us about the population parameters, we will have to make some assumption about the population, e.g., $[X_1, \ldots, X_p]'$ has a multivariate normal distribution. As we will see, this assumption leads to some very nice results.
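The sample quantities above map directly onto standard NumPy calls; the sketch below (my addition; the values of $n$, $p$, $\boldsymbol{\mu}$, and $\boldsymbol{\Sigma}$ are invented for illustration) computes the mean vector, the ML covariance matrix, and the correlation matrix from a simulated data matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3                                   # illustrative sizes
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.4],
                  [0.8, 1.0, 0.3],
                  [0.4, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=n)  # the n x p data matrix

mu_hat = X.mean(axis=0)                         # vector of sample means
Sigma_hat = np.cov(X, rowvar=False, ddof=0)     # ML covariance (divides by n)
R_hat = np.corrcoef(X, rowvar=False)            # sample correlation matrix
print(mu_hat, Sigma_hat, R_hat, sep="\n")
```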

0.1 Developing the Multivariate Normal Distribution

Suppose $X_1, \ldots, X_p$ are $p$ continuous random variables with density functions, $f_1(x_1), \ldots, f_p(x_p)$, and distribution functions, $F_1(x_1), \ldots, F_p(x_p)$, where

$$P(X_i \le x_i) = F_i(x_i) = \int_{-\infty}^{x_i} f_i(t)\,dt.$$

We define a $p$-dimensional random variable (or random vector) as the vector, $\mathbf{X} = [X_1, \ldots, X_p]'$; $\mathbf{X}$ has the joint cumulative distribution function

$$F(x_1, \ldots, x_p) = \int_{-\infty}^{x_p} \cdots \int_{-\infty}^{x_1} f(x_1, \ldots, x_p)\,dx_1 \cdots dx_p.$$

If the random variables, $X_1, \ldots, X_p$, are independent, then the joint density and cumulative distribution functions factor:

$$f(x_1, \ldots, x_p) = f_1(x_1) \cdots f_p(x_p) \quad \text{and} \quad F(x_1, \ldots, x_p) = F_1(x_1) \cdots F_p(x_p).$$

The independence property will not be assumed in the following; in fact, the whole idea is to investigate what type of dependency exists in the random vector $\mathbf{X}$.

When $\mathbf{X} = [X_1, X_2, \ldots, X_p]' \sim \mathrm{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the joint density function has the form

$$\phi(x_1, \ldots, x_p) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\{-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\}.$$

In the univariate case, the density provides the usual bell-shaped curve:

$$\phi(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{-\tfrac{1}{2} [(x - \mu)/\sigma]^2\}.$$
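As a quick numerical check (my addition), the sketch below evaluates the MVN density formula directly and compares it with scipy.stats.multivariate_normal; the particular $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and evaluation point are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.5, 0.5])

# Density computed from the formula above
p = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
phi_formula = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

# Density from scipy's implementation; the two should agree
phi_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(phi_formula, phi_scipy)
```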

Given the MVN assumption on the generation of the data matrix, $\mathbf{X}_{n \times p}$, we know the sampling distribution of the vector of means:

$$\hat{\boldsymbol{\mu}} = \begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \vdots \\ \hat{\mu}_p \end{bmatrix} \sim \mathrm{MVN}(\boldsymbol{\mu}, (1/n)\boldsymbol{\Sigma}).$$

In the univariate case, we have that $\hat{\mu} \sim N(\mu, (1/n)\sigma^2)$. As might be expected, a Central Limit Theorem (CLT) states that these results also hold asymptotically when the MVN assumption is relaxed. Also, if we estimate the variance-covariance matrix, $\boldsymbol{\Sigma}$, with divisions by $n - 1$ rather than $n$, and denote the result as $\hat{\boldsymbol{\Sigma}}$, then $E(\hat{\boldsymbol{\Sigma}}) = \boldsymbol{\Sigma}$.

Linear combinations of random variables form the backbone of all of multivariate statistics. For example, suppose $\mathbf{X} = [X_1, X_2, \ldots, X_p]' \sim \mathrm{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, and consider the following $q$ linear combinations:

$$Z_1 = c_{11} X_1 + \cdots + c_{1p} X_p$$
$$\vdots$$
$$Z_q = c_{q1} X_1 + \cdots + c_{qp} X_p$$

Then the vector $\mathbf{Z} = [Z_1, Z_2, \ldots, Z_q]' \sim \mathrm{MVN}(\mathbf{C}\boldsymbol{\mu}, \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}')$, where

$$\mathbf{C}_{q \times p} = \begin{bmatrix} c_{11} & \cdots & c_{1p} \\ \vdots & & \vdots \\ c_{q1} & \cdots & c_{qp} \end{bmatrix}.$$

These same results hold in the sample if we observe the $q$ linear combinations, $[Z_1, Z_2, \ldots, Z_q]'$: i.e., the sample mean vector is $\mathbf{C}\hat{\boldsymbol{\mu}}$ and the sample variance-covariance matrix is $\mathbf{C}\hat{\boldsymbol{\Sigma}}\mathbf{C}'$.
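The $\mathbf{Z} \sim \mathrm{MVN}(\mathbf{C}\boldsymbol{\mu}, \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}')$ result is easy to verify by simulation; in this sketch (my addition; the matrix $\mathbf{C}$ and the population parameters are arbitrary choices), the sample moments of the transformed data should sit close to $\mathbf{C}\boldsymbol{\mu}$ and $\mathbf{C}\boldsymbol{\Sigma}\mathbf{C}'$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

# q = 2 linear combinations of the p = 3 variables
C = np.array([[1.0, -1.0, 0.0],               # Z1 = X1 - X2
              [1/3,  1/3, 1/3]])              # Z2 = average of X1, X2, X3

X = rng.multivariate_normal(mu, Sigma, size=n)
Z = X @ C.T                                   # each row holds [Z1, Z2]

print(Z.mean(axis=0), C @ mu)                 # sample vs population mean
print(np.cov(Z, rowvar=False), C @ Sigma @ C.T)  # sample vs population covariance
```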

0.2 Multiple Regression and Partial Correlation (in the Population)

Suppose we partition our vector of $p$ random variables as follows:

$$\mathbf{X} = [X_1, X_2, \ldots, X_p]' = [[X_1, X_2, \ldots, X_k], [X_{k+1}, X_{k+2}, \ldots, X_p]]' \equiv [\mathbf{X}_1', \mathbf{X}_2']'.$$

Partitioning the mean vector and variance-covariance matrix in the same way,

$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}; \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{12}' & \boldsymbol{\Sigma}_{22} \end{bmatrix},$$

we have $\mathbf{X}_1 \sim \mathrm{MVN}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$ and $\mathbf{X}_2 \sim \mathrm{MVN}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$. $\mathbf{X}_1$ and $\mathbf{X}_2$ are statistically independent vectors if and only if $\boldsymbol{\Sigma}_{12} = \mathbf{0}_{k \times (p-k)}$, the $k \times (p-k)$ zero matrix.

What I call the Master Theorem refers to the conditional density of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$; it is multivariate normal with mean vector of order $k \times 1$,

$$\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

and variance-covariance matrix of order $k \times k$,

$$\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{12}'.$$

If we denote the $(i,j)^{th}$ partial covariance element in this latter matrix ("holding all variables in the second set constant") as $\sigma_{ij \cdot (k+1) \cdots p}$, then the partial correlation of variables $i$ and $j$, holding all variables in the second set constant, is

$$\rho_{ij \cdot (k+1) \cdots p} = \frac{\sigma_{ij \cdot (k+1) \cdots p}}{\sqrt{\sigma_{ii \cdot (k+1) \cdots p}\, \sigma_{jj \cdot (k+1) \cdots p}}}.$$
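The Master Theorem is just a few lines of linear algebra in practice. Below is a sketch (my addition; the $4 \times 4$ covariance matrix and the conditioning value $\mathbf{x}_2$ are invented for illustration) that computes the conditional mean, the matrix of partial covariances, and one partial correlation.

```python
import numpy as np

# Illustrative p = 4 setup: first set = (X1, X2), second set = (X3, X4), so k = 2
mu = np.zeros(4)
Sigma = np.array([[4.0, 1.2, 1.0, 0.5],
                  [1.2, 3.0, 0.8, 0.6],
                  [1.0, 0.8, 2.5, 0.4],
                  [0.5, 0.6, 0.4, 2.0]])
k = 2

S11, S12, S22 = Sigma[:k, :k], Sigma[:k, k:], Sigma[k:, k:]
S22_inv = np.linalg.inv(S22)

x2 = np.array([1.0, -1.0])                     # an observed value of X_2
cond_mean = mu[:k] + S12 @ S22_inv @ (x2 - mu[k:])
cond_cov = S11 - S12 @ S22_inv @ S12.T         # partial (co)variances

# Partial correlation of X1 and X2, holding X3 and X4 constant
rho_12_34 = cond_cov[0, 1] / np.sqrt(cond_cov[0, 0] * cond_cov[1, 1])
print(cond_mean, cond_cov, rho_12_34, sep="\n")
```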

Notice that in the formulas just given, the inverse of $\boldsymbol{\Sigma}_{22}$ must exist; otherwise, we have what is called multicollinearity, and solutions don't exist (or are very unstable numerically if the inverse almost doesn't exist). So, as one moral: don't use both total and subtest scores in the same set of independent variables.

If the set $\mathbf{X}_1$ contains just one random variable, $X_1$, then the mean of the conditional distribution can be given as

$$E(X_1) + \boldsymbol{\sigma}_{12}' \boldsymbol{\Sigma}_{22}^{-1} (\mathbf{x}_2 - \boldsymbol{\mu}_2),$$

where $\boldsymbol{\sigma}_{12}'$ is $1 \times (p-1)$ and of the form $[\mathrm{Cov}(X_1, X_2), \ldots, \mathrm{Cov}(X_1, X_p)]$. This is nothing but our old regression equation written out in matrix notation. If we let $\boldsymbol{\beta}' = \boldsymbol{\sigma}_{12}' \boldsymbol{\Sigma}_{22}^{-1}$, then the conditional mean (predicted $X_1$) is equal to

$$E(X_1) + \boldsymbol{\beta}'(\mathbf{x}_2 - \boldsymbol{\mu}_2) = E(X_1) + \beta_1 (x_2 - \mu_2) + \cdots + \beta_{p-1} (x_p - \mu_p).$$

The covariance matrix, $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\Sigma}_{12}'$, takes on the form

$$\mathrm{Var}(X_1) - \boldsymbol{\sigma}_{12}' \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\sigma}_{12},$$

i.e., the variation in $X_1$ that is not explained. If we take the explained variation, $\boldsymbol{\sigma}_{12}' \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\sigma}_{12}$, and consider its proportion to the total variance, $\mathrm{Var}(X_1)$, we define the squared multiple correlation coefficient:

$$\rho^2_{1 \cdot 2 \cdots p} = \frac{\boldsymbol{\sigma}_{12}' \boldsymbol{\Sigma}_{22}^{-1} \boldsymbol{\sigma}_{12}}{\mathrm{Var}(X_1)}.$$

In fact, the linear combination, $\boldsymbol{\beta}'\mathbf{X}_2$, has the highest correlation of any linear combination with $X_1$; and this correlation is the positive square root of the squared multiple correlation coefficient.
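For the one-variable case, the population regression weights and the squared multiple correlation come straight from the partitioned covariance matrix; this sketch (my addition, with an invented $3 \times 3$ covariance matrix and mean vector) mirrors the formulas above.

```python
import numpy as np

# X1 regressed on (X2, X3); illustrative population parameters
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[3.0, 1.0, 0.8],
                  [1.0, 2.0, 0.4],
                  [0.8, 0.4, 1.5]])

var_x1 = Sigma[0, 0]                           # Var(X_1)
sigma_12 = Sigma[0, 1:]                        # [Cov(X1, X2), Cov(X1, X3)]
S22_inv = np.linalg.inv(Sigma[1:, 1:])

beta = S22_inv @ sigma_12                      # regression coefficients
r2 = sigma_12 @ S22_inv @ sigma_12 / var_x1    # squared multiple correlation

x2 = np.array([0.5, 1.5])                      # an observed (X2, X3)
predicted_x1 = mu[0] + beta @ (x2 - mu[1:])    # conditional mean of X1
residual_var = var_x1 - sigma_12 @ S22_inv @ sigma_12  # unexplained variation
print(beta, r2, predicted_x1, residual_var)
```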

0.3 Moving to the Sample

Up to this point, the concepts of regression, and kindred ideas, have been discussed only in terms of population parameters. We now have the task of obtaining estimates of these various quantities based on a sample; also, associated inference procedures need to be developed. Generally, we will rely on maximum-likelihood estimation and the related likelihood-ratio tests.

To define how maximum-likelihood estimation proceeds, we will first give a series of general steps, and then operationalize them for a univariate normal distribution example.

A) Let $X_1, \ldots, X_n$ be univariate observations on $X$ (independent and continuous, and depending on some parameters, $\theta_1, \ldots, \theta_k$). The density function of $X_i$ is denoted as $f(x_i; \theta_1, \ldots, \theta_k)$.

B) The likelihood of observing $x_1, \ldots, x_n$ for values of $X_1, \ldots, X_n$ is the joint density of the $n$ observations, and because the observations are independent,

$$L(\theta_1, \ldots, \theta_k) \equiv \prod_{i=1}^{n} f(x_i; \theta_1, \ldots, \theta_k),$$

i.e., we assume $x_1, \ldots, x_n$ are already observed, and that $L$ is a function of the parameters only. Now, choose parameter values such that $L(\theta_1, \ldots, \theta_k)$ is at a maximum, i.e., the probability of observing that particular sample is maximized by the choice of $\theta_1, \ldots, \theta_k$. Generally, it is easier and equivalent to maximize the log-likelihood, using $l(\theta_1, \ldots, \theta_k) \equiv \log(L(\theta_1, \ldots, \theta_k))$.

C) Generally, the maximum values are found through differentiation, and by setting the partial derivatives equal to zero. Explicitly, we find $\hat{\theta}_1, \ldots, \hat{\theta}_k$ such that

$$\frac{\partial}{\partial \theta_j} l(\theta_1, \ldots, \theta_k) = 0, \quad \text{for } j = 1, \ldots, k.$$

(We should probably check second derivatives to see if we have a maximum, but this is almost always true, anyways.)

Example: Suppose $X_i \sim N(\mu, \sigma^2)$, with density

$$f(x_i; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\{-(x_i - \mu)^2 / 2\sigma^2\}.$$

The likelihood has the form

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\{-(x_i - \mu)^2 / 2\sigma^2\} = (1/\sqrt{2\pi})^n (1/\sigma^2)^{n/2} \exp\{-(1/2\sigma^2) \sum_{i=1}^{n} (x_i - \mu)^2\}.$$

The log-likelihood reduces:

$$l(\mu, \sigma^2) = \log L(\mu, \sigma^2) = \log(1/\sqrt{2\pi})^n + \log(1/\sigma^2)^{n/2} - (1/2\sigma^2) \sum_{i=1}^{n} (x_i - \mu)^2 = \text{constant} - (n/2)\log\sigma^2 - (1/2\sigma^2) \sum_{i=1}^{n} (x_i - \mu)^2.$$

The partial derivatives have the form:

$$\frac{\partial l(\mu, \sigma^2)}{\partial \mu} = -(1/2\sigma^2) \sum_{i=1}^{n} 2(x_i - \mu)(-1),$$

and

$$\frac{\partial l(\mu, \sigma^2)}{\partial \sigma^2} = -(n/2)(1/\sigma^2) - (1/2(\sigma^2)^2)(-1) \sum_{i=1}^{n} (x_i - \mu)^2.$$

Setting these two expressions to zero gives

$$\hat{\mu} = \sum_{i=1}^{n} x_i / n; \qquad \hat{\sigma}^2 = (1/n) \sum_{i=1}^{n} (x_i - \hat{\mu})^2.$$
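Since closed-form solutions are rare outside textbook cases, it helps to see the same estimates recovered numerically. The sketch below (my addition; the simulated data and optimizer setup are illustrative) minimizes the negative log-likelihood with scipy.optimize.minimize and compares the answer with the closed-form $\hat{\mu}$ and $\hat{\sigma}^2$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=200)       # simulated data; true mu = 10, sigma = 2

def neg_loglik(theta):
    mu, log_sigma2 = theta                # optimize log(sigma^2) so sigma^2 > 0
    sigma2 = np.exp(log_sigma2)
    return 0.5 * len(x) * np.log(sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]))
mu_hat_num, sigma2_hat_num = res.x[0], np.exp(res.x[1])

mu_hat = x.mean()                         # closed-form ML estimates from above
sigma2_hat = ((x - mu_hat) ** 2).mean()
print(mu_hat_num, mu_hat)
print(sigma2_hat_num, sigma2_hat)
```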

Maximum likelihood (ML) estimates are generally consistent, asymptotically normal (for large $n$), and efficient; a function of the sufficient statistics; and have an invariance to operations performed on them, e.g., the ML estimate for $\sigma$ is just $\sqrt{\hat{\sigma}^2}$. As the ML estimator for $\sigma^2$ shows, ML estimators are not necessarily unbiased.

Another Example: Suppose $X_1, \ldots, X_n$ are observations on the Poisson discrete random variable $X$ (having outcomes $0, 1, \ldots$):

$$P(X_i = x_i) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}.$$

The likelihood is

$$L(\lambda) = \prod_{i=1}^{n} P(X_i = x_i) = \frac{e^{-n\lambda} \lambda^{\sum_i x_i}}{\prod_i x_i!},$$

and the log-likelihood

$$l(\lambda) = \log L(\lambda) = -n\lambda + \left(\sum_i x_i\right) \log\lambda - \log \prod_i x_i!.$$

The partial derivative

$$\frac{\partial l(\lambda)}{\partial \lambda} = -n + (1/\lambda) \sum_{i=1}^{n} x_i,$$

when set to zero, gives

$$\hat{\lambda} = \sum_{i=1}^{n} x_i / n.$$
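The same pattern works for the Poisson case; here is a brief check (my addition; the true $\lambda$ and the search grid are arbitrary) that the sample mean maximizes the Poisson log-likelihood.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)
x = rng.poisson(3.5, size=500)            # simulated counts; true lambda = 3.5

def loglik(lam):
    # l(lambda) = -n*lambda + (sum_i x_i) log(lambda) - sum_i log(x_i!)
    return -len(x) * lam + x.sum() * np.log(lam) - gammaln(x + 1).sum()

lam_hat = x.mean()                        # closed-form ML estimate from above
grid = np.linspace(2.5, 4.5, 201)         # crude grid search as a sanity check
print(lam_hat, grid[np.argmax([loglik(l) for l in grid])])
```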

0.3.1 Likelihood Ratio Tests

Besides using the likelihood concept to find good point estimates, the likelihood can also be used to find good tests of hypotheses. We will develop this idea in terms of a simple example: Suppose $X_1 = x_1, \ldots, X_n = x_n$ denote values obtained on $n$ independent observations on a $N(\mu, \sigma^2)$ random variable, where $\sigma^2$ is assumed known. The standard test of $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ would compare $(\bar{x} - \mu_0)^2/(\sigma^2/n)$ to a $\chi^2$ random variable with one degree of freedom. Or, if we chose the significance level to be .05, we could reject $H_0$ if $|\bar{x} - \mu_0|/(\sigma/\sqrt{n}) \ge 1.96$.

We now develop this same test using likelihood ideas: The likelihood of observing $x_1, \ldots, x_n$, $L(\mu)$, is only a function of $\mu$ (because $\sigma^2$ is assumed to be known):

$$L(\mu) = (1/\sigma\sqrt{2\pi})^n \exp\{-(1/2\sigma^2) \sum_{i=1}^{n} (x_i - \mu)^2\}.$$

Under $H_0$, $L(\mu)$ is at a maximum for $\mu = \mu_0$; under $H_1$, $L(\mu)$ is at a maximum for $\mu = \hat{\mu}$, the maximum likelihood estimator. If $H_0$ is true, $L(\hat{\mu})$ and $L(\mu_0)$ should be close to each other; if $H_0$ is false, $L(\hat{\mu})$ should be much larger than $L(\mu_0)$. Thus, the decision rule would be to reject $H_0$ if $L(\mu_0)/L(\hat{\mu}) \le \lambda$, where $\lambda$ is some number less than 1.00 and chosen to obtain a particular $\alpha$ level. The ratio

$$L(\mu_0)/L(\hat{\mu}) = \exp\{-\tfrac{1}{2} (\hat{\mu} - \mu_0)^2/(\sigma^2/n)\}.$$

Thus, we could rephrase the decision rule: reject $H_0$ if $(\hat{\mu} - \mu_0)^2/(\sigma^2/n) \ge -2\log(\lambda)$, or if $|\bar{x} - \mu_0|/(\sigma/\sqrt{n}) \ge \sqrt{-2\log\lambda}$.
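As a small worked illustration (my addition; the simulated data, $\mu_0$, and known $\sigma^2$ are invented), the sketch below computes the likelihood-ratio statistic for this normal-mean test, its equivalent $z$ form, and a $\chi^2_1$ $p$-value.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
sigma2, mu0, n = 4.0, 0.0, 40
x = rng.normal(0.5, np.sqrt(sigma2), size=n)   # data; true mean is 0.5, not mu0

mu_hat = x.mean()

# -2 log(L(mu0) / L(mu_hat)) = (mu_hat - mu0)^2 / (sigma^2 / n)
lrt_stat = (mu_hat - mu0) ** 2 / (sigma2 / n)

z = abs(mu_hat - mu0) / np.sqrt(sigma2 / n)    # equivalent z form of the rule
p_value = chi2.sf(lrt_stat, df=1)              # reference distribution under H0
print(lrt_stat, z, p_value, z >= 1.96)
```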

Thus, for a .05 level of significance, choose $\lambda$ so that $1.96 = \sqrt{-2\log\lambda}$.

Generally, we can phrase likelihood ratio tests as:

$$-2\log(\text{likelihood ratio}) \sim \chi^2_{\nu - \nu_0},$$

where $\nu$ is the dimension of the parameter space generally, and $\nu_0$ is the dimension under $H_0$.

0.3.2 Estimation

To obtain estimates of the various quantities we need, merely replace the variances and covariances by their maximum likelihood estimates. This process generates sample partial correlations or covariances; sample squared multiple correlations; sample regression parameters; and so on.
