LECTURE 2 NOTES. 1. Minimal sufficient statistics.

Size: px
Start display at page:

Download "LECTURE 2 NOTES. 1. Minimal sufficient statistics."

Transcription

1 LECTURE 2 NOTES 1. Minimal sufficient statistics. Definition 1.1 (minimal sufficient statistic). A statistic t := φ(x) is a minimial sufficient statistic for the parametric model F if 1. it is sufficient. 2. it can be expressed as a function of any other sufficient statistic; i.e. for any other sufficient statistic t, there is some function g such that t = g(t ). Recall passing a random variable x through a function g is a form of data reduction: a statistic t = g(t) has no more information than t. Thus a minimal sufficient statistic is a sufficient statistic with the least information. That is, it is an optimal form of data reduction. Like sufficient statistics, minimal sufficient statistics are not unique. If t is minimal sufficient, g(t) for any invertible g is also minimal sufficient. In terms of partitions of the sample space, a minimal sufficient statistic induces the coarsest (hence minimal) partition of the sample space among all sufficient statistics. Although minimal sufficient statistics are not unique, the minimal sufficient partition is unique. Going back to the coin tossing example, consider the statistic { (1.1) φ 100x x 2 + x 3 x 1 + x 2 + x 3 = 2 (x 1, x 2, x 3 ) = x 1 + x 2 + x 3 otherwise. Table 1 gives the conditional distribution of (x 1, x 2, x 3 ) t. It is sufficient because the conditional distribution does not depend on p. However, it is not minimal because it is not a function of x 1 + x 2 + x 3, which we know is also sufficient. Theorem 1.2. A statistic t := φ(x) is a miminal sufficent statistic for a parametric model F if the ratio f θ(x 1 ) f θ (x 2 ) does not depend on θ if and only if φ(x 1 ) = φ(x 2 ). Proof. We shall prove the theorem when the densities in the model have common support. 1

2 2 STAT 201B Table 1 The conditional distribution of three coin tosses given t. We see that t is not a minimal sufficient statistic: (x 1, x 2, x 2) t f p(x t ) (0,0,0) 0 1 (1,0,0) 1 1/3 (0,1,0) 1 1/3 (0,0,1) 1 1/3 (1,1,0) (0,1,1) 11 1 (1,0,1) (1,1,1) 3 1 First, we show that t is sufficient. For each set A t = {x : φ(x) = t} in the partition of induced by t, consider a point x t A t. The density has the form f θ (x) = f θ(x) f θ (x φ(x) ) f θ(x φ(x) ). By assumption, the ratio f θ(x) f θ (x ) 1. x φ(x) is a function of x, so f θ(x) f θ (x ) 2. f θ (x φ(x) ) depends only on φ(x). does not depend on θ. Further, depends only on x. Thus, by the factorization theorem, t = φ(x) is a sufficient statistic. We complete the proof by showing that t is minimal. It suffices to show that φ (x 1 ) = φ (x 2 ) for any sufficient statistic t := φ (x) implies φ(x 1 ) = φ(x 2 ). By the factorization theorem, the density has the form f θ (x) = g θ (φ (x))h(x) For any x 1, choose x 2 such that φ (x 1 ) = φ (x 2 ). We observe the ratio f θ (x 1 ) f θ (x 2 ) = g θ(φ (x 1 ))h(x 1 ) g θ (φ (x 2 ))h(x 2 ) = h(x 1) h(x 2 ) does not depend on θ. By assumption, this implies φ(x 1 ) = φ(x 2 ). Example 1.3. Let x i i.i.d. Poi(θ). We showed that t := i n] x i is a sufficient statistic for the Poisson family. The ratio f θ (x) f θ (x ) = i n] x i! i n] x i! θ i n] x i x i,

3 LECTURE 2 NOTES 3 which does not depend on θ if and only if t = t. By Theorem 1.2, t is a minimal sufficient statistic. 2. Exponential families. Definition 2.1. A set of densities is an exponential family if the densities are of the form (2.1) f θ (x) = exp ( η(θ) T φ(x) a(θ) ) h(x), where η(θ) R p are the natural parameters. φ(x) R p are sufficient statistics a(θ) := log exp ( η(θ) T φ(x) ) h(x)dx is the log-partition function. h : R is the base measure. Exponential families are of statistical relevance because many common distributions are exponential families. For example, 1. binomial: f p (x) = ( ) n x p x (1 p) n x = exp ( x log p + (n x) log(1 p) )( n x = exp ( x log p 1 p + n log(1 p))( n x). p 1 p. The natural parameter is the logit of p: log 2. Poisson: f λ (x) = λx x! e λ = exp (x log λ λ) 1 x!. 3. Gaussian (σ 2 = 1): 4. Gaussian (unknown σ 2 ): f µ,σ 2(x) = exp ( x2 ( µ = exp σ 2 f µ (x) = 1 2π exp ( 1(x µ)2) = exp ( µx 1 2 µ2) e x2 /2 2π. 2 + µx µ2 2σ 2 σ 2 σ 2 ] T x x 2 2 log σ ) 1 2σ 2 2π ] µ2 2σ 2 log σ ) ) 1. 2π

4 4 STAT 201B As all-inclusive as exponential families seem, there are notable exceptions. For example, the unif(0, θ) family is not an exponential family. More generally, parametric models whose support depends on the parameter are not exponential families. Often, it is convenient to re-parametrize an exponential family in terms of its natural parameters η. Doing so gives the canonical form of the exponential family: f η (x) = exp ( η T φ(x) a(η) ) h(x). For a set of sufficient statistics φ and a base measure h, the set of all valid natural parameters is the natural parameter space: H := { η R p : exp ( η T φ(x) ) h(x)dx < }. Lemma 2.2. The log-partition function of an exponential family in canonical form is convex. Proof. Let η 1, η 2 H. For any α (0, 1), a(αη 1 + (1 α)η 2 ) = log exp ( (αη 1 + (1 α)η 2 ) T φ(x) ) h(x)dx = log exp ( η1 T φ(x) ) α ( exp η T 2 φ(x) ) 1 α h(x)dx. By Hölder s inequality, log ( ( ( exp η T 1 φ(x) ) h(x) ) ) α α ( ( ( α dx exp η T 2 φ(x) ) h(x) ) ) 1 α 1 α 1 α dx = αa(η 1 ) + (1 α)a(η 2 ). Since a(η 1 ) and a(η 2 ) are finite by assumption, so is a(αη 1 + (1 α)η 2 ). Lemma 2.3. The log-partition function a : Θ R is convex and infinitely differentiable. Further, its derivatives are ] (2.2) a(θ) = E θ φ(x), 2 ] (2.3) a(θ) = cov θ φ(x). Proof. The proof of Lemma 2.2 actually shows a is convex. To complete the proof, we assume exchanging expectation and differentiation is permitted; verifying the validity of the assumption is done by a tedious argument

5 LECTURE 2 NOTES 5 that appeals to the dominated convergence theorem. We defer the details to the appendix. We exchange expectation and differentiation to obtain i a(θ) = φ i(x) exp ( θ T φ(x) ) h(x)dx exp (θt φ(x)) h(x)dx = φ i (x) exp ( θ T φ(x) a(θ) ) h(x)dx = E θ φi (x) ] for any i p], which estabishes (2.2). Exchanging expectation and differentiation again, 2 i,ja(θ) = φ i (x)(φ j (x) j a(θ)) exp ( θ T φ(x) a(θ) ) h(x)dx = φ i (x)φ j (x) exp ( θ T φ(x) a(θ) ) h(x)dx j a(θ) φ i (x) exp ( θ T φ(x) a(θ) ) h(x)dx = E θ φi (x)φ j (x) ] E θ φi (x) ] E θ φj (x) ], for any i, j p], which estabishes (2.3). The superficial dimension of an exponential family may be reduced by re-parametrization in two cases: 1. T := φ( ) is a subset of an affine set (in R p ): there are a R p and b R such that a T φ(x) = b for any x H is a subset of an affine set (in R p ). In the first case, two parameters η 1, η 2 H may correspond to the same density, making the model unidentifiable. Indeed, the densities that correspond to η and η + a are proportional to each other: f η+a (x) exp ( (η + a) T φ(x) ) h(x) = exp ( η T φ(x) + b ) h(x) exp ( η T φ(x) ) h(x). In fact, any parameter η + γa for some γ R is associated with the same density. In the second case, η obeys linear inequality constraints, the values 1 An affine set is the set of solutions to a linear system: {x R n : Ax = b} for some A R m n and b R m.

6 6 STAT 201B of some parameters are determined by the values of other parameters. Thus some of the parameters are extraneous. If neither case holds, the exponential family is minimal. Definition 2.4. An exponential family in canonical form is minimal if 1. the image of under φ is not a subset of an affine set. 2. H is not a subset of an affine set. Minimal exponential families are classified into two types: full-rank and curved. A minimal exponential family is full-rank when its natural parameter space H contains an open set. Otherwise, a minimal exponential family is curved. By the factorization theorem, φ(x) is a sufficient statistic for densities of the form f η (x) = exp ( η T φ(x) a(η) ) h(x). If the exponential family is minimal, then φ(x) is a minimal sufficient statistic. Theorem 2.5. If F is a minimal exponential family, then t = φ(x) is a minimal sufficient statistic. Proof. By Theorem 3.2, it suffices to show the ratio fη(x 1) f η(x 2 ) in η if and only if φ(x 1 ) = φ(x 2 ). If φ(x 1 ) = φ(x 2 ), the ratio is Assume fη(x 1) f η(x 2 ) We simplify to obtain f η (x 1 ) f η (x 2 ) = exp ( η T (φ(x 1 ) φ(x 2 )) ) = 1. is constant in η. That is f η1 (x 1 ) f η1 (x 2 ) = f η 2 (x 1 ) f η2 (x 2 ) for any η 1, η 2 H. (η 1 η 2 ) T (φ(x 1 ) φ(x 2 )) = 0. is constant By the definition of minimal exponential family, H is not a subset of an affine set. Since subspaces are affine sets, Θ is not a subset of any subspace, which implies span(θ) = R p.

7 LECTURE 2 NOTES 7 Since { (η1 η 2 ) T ( φ(x i ) φ(x i) ) = 0 for all η 1, η 2 H } we deduce { η T ( φ(x i ) φ(x i) ) = 0 for all η span(h) }, η T ( φ(x i ) φ(x i )) = 0 for all η R p, which allows us to conclude φ(x i ) = φ(x i ) = 0. The joint distribution of i.i.d. samples from a p-dimensional exponential family is another p-dimensional exponential family: f θ (x i ) = exp ( η(θ) T φ(x i ) a(θ) ) h(x i ) i n] i n] = exp ( η(θ) T ( i n] φ(x i) ) ) a(θ) i n] h(x i ) By the factorization theorem, i n] φ(x i) is a sufficient statistic. We remark that its dimension does not grow with the sample size. In fact, i.i.d. exponential families are the only parametric models that admit a sufficient statistic whose dimension does not depend on the sample size. The Pitman- Koopman-Darmois 2 theorem says that if {x i } i n] are i.i.d. samples from a continuous density f θ in a parametric model that consists of densities whose support does not depend on the parameter, then there is a sufficient statistic whose dimension does not depend on the sample size if and only if the parametric model is an exponential family. Lemma A.1. APPENDI A If two functions f 1 and f 2 are 1. proportional to each other: f 1 (x) = cf 2 (x) for any x 2. f 1(x)dx = f 2(x)dx, then they are equal: f 1 (x) = f 2 (x) for any x. Proof. Indeed, by integrating f 1 (x) = cf 2 (x) and rearranging, we have f 1(x)dx f 2(x)dx = c. By the assumption that their integrals agree, we deduce c = 1, which shows that f 1 (x) = f 2 (x). 2 One of the namesakes of the theorem is Dr. Jim Pitman s father, Dr. Edwin Pitman.

8 8 STAT 201B Lemma A.2. We have at any η int(h). Let f θ (x) be a member of a canonical exponential family. i z(η) = i exp(η T φ(x)) ] h(x)dx Proof. We show that exchanging differentiation and integration is justified at the origin. By the definition of the derivative, i z(0) = lim n = lim n z(e i ) z(0) e (e j) T φ(x) 1 h(x)dx, where { } is any decreasing sequence that converges to zero. We assume d 0 := sup n is small enough so that e j δ 0 int(h). We recognize the limit of the integrand is i exp(η T φ(x)) ] h(x). To justify exchanging lim n and integration, we appeal to the dominated convergence theorem. Let d 0 := sup n. The (absolute value of the) integrand is at most e (e j) T φ(x) 1 h(x) e (e j) T φ(x) (e j ) T φ(x) h(x) e (e jδ 0 ) T φ(x) (e j δ 0 ) T φ(x) h(x) δ 0 exp2 (e jδ 0 ) T φ(x) h(x) δ 0 exp2 (e jδ 0 ) T φ(x) + exp 2 (e j δ 0 ) T φ(x) h(x) δ 0 We check that the bound is integrable: exp 2 (e jδ 0 ) T φ(x) + exp 2 (e j δ 0 ) T φ(x) h(x)dx δ 0 = z(2e j δ 0 ) + z( 2e j δ 0 ), ( e x 1 e x x ) ( x e x ) ( e x e x + e x)

9 LECTURE 2 NOTES 9 which, by the assumption that e j δ 0 int(h) is finite. Thus we are free to exchange lim n and integration to conclude i z(0) = lim n = = lim n e (e j) T φ(x) 1 h(x)dx e (e j) T φ(x) 1 h(x)dx i exp(η T φ(x)) ] 0 h(x)dx. Yuekai Sun Berkeley, California November 29, 2015

ECE 275B Homework # 1 Solutions Version Winter 2015

ECE 275B Homework # 1 Solutions Version Winter 2015 ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2

More information

ECE 275B Homework # 1 Solutions Winter 2018

ECE 275B Homework # 1 Solutions Winter 2018 ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,

More information

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS

LECTURE 11: EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS LECTURE : EXPONENTIAL FAMILY AND GENERALIZED LINEAR MODELS HANI GOODARZI AND SINA JAFARPOUR. EXPONENTIAL FAMILY. Exponential family comprises a set of flexible distribution ranging both continuous and

More information

MIT Spring 2016

MIT Spring 2016 Dr. Kempthorne Spring 2016 1 Outline Building 1 Building 2 Definition Building Let X be a random variable/vector with sample space X R q and probability model P θ. The class of probability models P = {P

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

ST5215: Advanced Statistical Theory

ST5215: Advanced Statistical Theory Department of Statistics & Applied Probability Monday, September 26, 2011 Lecture 10: Exponential families and Sufficient statistics Exponential Families Exponential families are important parametric families

More information

Introduction: exponential family, conjugacy, and sufficiency (9/2/13)

Introduction: exponential family, conjugacy, and sufficiency (9/2/13) STA56: Probabilistic machine learning Introduction: exponential family, conjugacy, and sufficiency 9/2/3 Lecturer: Barbara Engelhardt Scribes: Melissa Dalis, Abhinandan Nath, Abhishek Dubey, Xin Zhou Review

More information

Lecture 17: Minimal sufficiency

Lecture 17: Minimal sufficiency Lecture 17: Minimal sufficiency Maximal reduction without loss of information There are many sufficient statistics for a given family P. In fact, X (the whole data set) is sufficient. If T is a sufficient

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

Lecture 16 November Application of MoUM to our 2-sided testing problem

Lecture 16 November Application of MoUM to our 2-sided testing problem STATS 300A: Theory of Statistics Fall 2015 Lecture 16 November 17 Lecturer: Lester Mackey Scribe: Reginald Long, Colin Wei Warning: These notes may contain factual and/or typographic errors. 16.1 Recap

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

Exponential Families

Exponential Families Exponential Families David M. Blei 1 Introduction We discuss the exponential family, a very flexible family of distributions. Most distributions that you have heard of are in the exponential family. Bernoulli,

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Lecture 22: Variance and Covariance

Lecture 22: Variance and Covariance EE5110 : Probability Foundations for Electrical Engineers July-November 2015 Lecture 22: Variance and Covariance Lecturer: Dr. Krishna Jagannathan Scribes: R.Ravi Kiran In this lecture we will introduce

More information

March 10, 2017 THE EXPONENTIAL CLASS OF DISTRIBUTIONS

March 10, 2017 THE EXPONENTIAL CLASS OF DISTRIBUTIONS March 10, 2017 THE EXPONENTIAL CLASS OF DISTRIBUTIONS Abstract. We will introduce a class of distributions that will contain many of the discrete and continuous we are familiar with. This class will help

More information

Lecture 3: Random variables, distributions, and transformations

Lecture 3: Random variables, distributions, and transformations Lecture 3: Random variables, distributions, and transformations Definition 1.4.1. A random variable X is a function from S into a subset of R such that for any Borel set B R {X B} = {ω S : X(ω) B} is an

More information

Last Lecture - Key Questions. Biostatistics Statistical Inference Lecture 03. Minimal Sufficient Statistics

Last Lecture - Key Questions. Biostatistics Statistical Inference Lecture 03. Minimal Sufficient Statistics Last Lecture - Key Questions Biostatistics 602 - Statistical Inference Lecture 03 Hyun Min Kang January 17th, 2013 1 How do we show that a statistic is sufficient for θ? 2 What is a necessary and sufficient

More information

Math Solutions to homework 5

Math Solutions to homework 5 Math 75 - Solutions to homework 5 Cédric De Groote November 9, 207 Problem (7. in the book): Let {e n } be a complete orthonormal sequence in a Hilbert space H and let λ n C for n N. Show that there is

More information

18.175: Lecture 15 Characteristic functions and central limit theorem

18.175: Lecture 15 Characteristic functions and central limit theorem 18.175: Lecture 15 Characteristic functions and central limit theorem Scott Sheffield MIT Outline Characteristic functions Outline Characteristic functions Characteristic functions Let X be a random variable.

More information

Classical Estimation Topics

Classical Estimation Topics Classical Estimation Topics Namrata Vaswani, Iowa State University February 25, 2014 This note fills in the gaps in the notes already provided (l0.pdf, l1.pdf, l2.pdf, l3.pdf, LeastSquares.pdf). 1 Min

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Chapter 8.8.1: A factorization theorem

Chapter 8.8.1: A factorization theorem LECTURE 14 Chapter 8.8.1: A factorization theorem The characterization of a sufficient statistic in terms of the conditional distribution of the data given the statistic can be difficult to work with.

More information

1 Probability Model. 1.1 Types of models to be discussed in the course

1 Probability Model. 1.1 Types of models to be discussed in the course Sufficiency January 18, 016 Debdeep Pati 1 Probability Model Model: A family of distributions P θ : θ Θ}. P θ (B) is the probability of the event B when the parameter takes the value θ. P θ is described

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.436J/15.085J Fall 2008 Lecture 8 10/1/2008 CONTINUOUS RANDOM VARIABLES Contents 1. Continuous random variables 2. Examples 3. Expected values 4. Joint distributions

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

STAT 801: Mathematical Statistics. Hypothesis Testing

STAT 801: Mathematical Statistics. Hypothesis Testing STAT 801: Mathematical Statistics Hypothesis Testing Hypothesis testing: a statistical problem where you must choose, on the basis o data X, between two alternatives. We ormalize this as the problem o

More information

MIT Spring 2016

MIT Spring 2016 Exponential Families II MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Exponential Families II 1 Exponential Families II 2 : Expectation and Variance U (k 1) and V (l 1) are random vectors If A (m k),

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

18.175: Lecture 3 Integration

18.175: Lecture 3 Integration 18.175: Lecture 3 Scott Sheffield MIT Outline Outline Recall definitions Probability space is triple (Ω, F, P) where Ω is sample space, F is set of events (the σ-algebra) and P : F [0, 1] is the probability

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Hypothesis testing is a statistical problem where you must choose, on the basis of data X, between two alternatives. We formalize this as the problem of choosing between two

More information

Patterns of Scalable Bayesian Inference Background (Session 1)

Patterns of Scalable Bayesian Inference Background (Session 1) Patterns of Scalable Bayesian Inference Background (Session 1) Jerónimo Arenas-García Universidad Carlos III de Madrid jeronimo.arenas@gmail.com June 14, 2017 1 / 15 Motivation. Bayesian Learning principles

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Chapter 1. Statistical Spaces

Chapter 1. Statistical Spaces Chapter 1 Statistical Spaces Mathematical statistics is a science that studies the statistical regularity of random phenomena, essentially by some observation values of random variable (r.v.) X. Sometimes

More information

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank

More information

1. Let A be a 2 2 nonzero real matrix. Which of the following is true?

1. Let A be a 2 2 nonzero real matrix. Which of the following is true? 1. Let A be a 2 2 nonzero real matrix. Which of the following is true? (A) A has a nonzero eigenvalue. (B) A 2 has at least one positive entry. (C) trace (A 2 ) is positive. (D) All entries of A 2 cannot

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Order Statistics and Distributions

Order Statistics and Distributions Order Statistics and Distributions 1 Some Preliminary Comments and Ideas In this section we consider a random sample X 1, X 2,..., X n common continuous distribution function F and probability density

More information

40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology

40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology Singapore University of Design and Technology Lecture 9: Hypothesis testing, uniformly most powerful tests. The Neyman-Pearson framework Let P be the family of distributions of concern. The Neyman-Pearson

More information

Lecture 3. David Aldous. 31 August David Aldous Lecture 3

Lecture 3. David Aldous. 31 August David Aldous Lecture 3 Lecture 3 David Aldous 31 August 2015 This size-bias effect occurs in other contexts, such as class size. If a small Department offers two courses, with enrollments 90 and 10, then average class (faculty

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell,

If there exists a threshold k 0 such that. then we can take k = k 0 γ =0 and achieve a test of size α. c 2004 by Mark R. Bell, Recall The Neyman-Pearson Lemma Neyman-Pearson Lemma: Let Θ = {θ 0, θ }, and let F θ0 (x) be the cdf of the random vector X under hypothesis and F θ (x) be its cdf under hypothesis. Assume that the cdfs

More information

18.175: Lecture 13 Infinite divisibility and Lévy processes

18.175: Lecture 13 Infinite divisibility and Lévy processes 18.175 Lecture 13 18.175: Lecture 13 Infinite divisibility and Lévy processes Scott Sheffield MIT Outline Poisson random variable convergence Extend CLT idea to stable random variables Infinite divisibility

More information

Chapter 4. Theory of Tests. 4.1 Introduction

Chapter 4. Theory of Tests. 4.1 Introduction Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

STAT 830 Hypothesis Testing

STAT 830 Hypothesis Testing STAT 830 Hypothesis Testing Richard Lockhart Simon Fraser University STAT 830 Fall 2018 Richard Lockhart (Simon Fraser University) STAT 830 Hypothesis Testing STAT 830 Fall 2018 1 / 30 Purposes of These

More information

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, 2018 BORIS S. MORDUKHOVICH 1 and NGUYEN MAU NAM 2 Dedicated to Franco Giannessi and Diethard Pallaschke with great respect Abstract. In

More information

Chapter 1. Sets and probability. 1.3 Probability space

Chapter 1. Sets and probability. 1.3 Probability space Random processes - Chapter 1. Sets and probability 1 Random processes Chapter 1. Sets and probability 1.3 Probability space 1.3 Probability space Random processes - Chapter 1. Sets and probability 2 Probability

More information

18.175: Lecture 8 Weak laws and moment-generating/characteristic functions

18.175: Lecture 8 Weak laws and moment-generating/characteristic functions 18.175: Lecture 8 Weak laws and moment-generating/characteristic functions Scott Sheffield MIT 18.175 Lecture 8 1 Outline Moment generating functions Weak law of large numbers: Markov/Chebyshev approach

More information

Lecture Notes 3 Convergence (Chapter 5)

Lecture Notes 3 Convergence (Chapter 5) Lecture Notes 3 Convergence (Chapter 5) 1 Convergence of Random Variables Let X 1, X 2,... be a sequence of random variables and let X be another random variable. Let F n denote the cdf of X n and let

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

Lecture 26: Likelihood ratio tests

Lecture 26: Likelihood ratio tests Lecture 26: Likelihood ratio tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f θ0 (X) > c 0 for

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

I. ANALYSIS; PROBABILITY

I. ANALYSIS; PROBABILITY ma414l1.tex Lecture 1. 12.1.2012 I. NLYSIS; PROBBILITY 1. Lebesgue Measure and Integral We recall Lebesgue measure (M411 Probability and Measure) λ: defined on intervals (a, b] by λ((a, b]) := b a (so

More information

Introducing the Normal Distribution

Introducing the Normal Distribution Department of Mathematics Ma 3/13 KC Border Introduction to Probability and Statistics Winter 219 Lecture 1: Introducing the Normal Distribution Relevant textbook passages: Pitman [5]: Sections 1.2, 2.2,

More information

Lecture 11: Random Variables

Lecture 11: Random Variables EE5110: Probability Foundations for Electrical Engineers July-November 2015 Lecture 11: Random Variables Lecturer: Dr. Krishna Jagannathan Scribe: Sudharsan, Gopal, Arjun B, Debayani The study of random

More information

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3)

u( x) = g( y) ds y ( 1 ) U solves u = 0 in U; u = 0 on U. ( 3) M ath 5 2 7 Fall 2 0 0 9 L ecture 4 ( S ep. 6, 2 0 0 9 ) Properties and Estimates of Laplace s and Poisson s Equations In our last lecture we derived the formulas for the solutions of Poisson s equation

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Gaussian Models (9/9/13)

Gaussian Models (9/9/13) STA561: Probabilistic machine learning Gaussian Models (9/9/13) Lecturer: Barbara Engelhardt Scribes: Xi He, Jiangwei Pan, Ali Razeen, Animesh Srivastava 1 Multivariate Normal Distribution The multivariate

More information

Numerical Solutions to Partial Differential Equations

Numerical Solutions to Partial Differential Equations Numerical Solutions to Partial Differential Equations Zhiping Li LMAM and School of Mathematical Sciences Peking University The Residual and Error of Finite Element Solutions Mixed BVP of Poisson Equation

More information

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4

STA 732: Inference. Notes 2. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 STA 73: Inference Notes. Neyman-Pearsonian Classical Hypothesis Testing B&D 4 1 Testing as a rule Fisher s quantification of extremeness of observed evidence clearly lacked rigorous mathematical interpretation.

More information

Fourier Transform & Sobolev Spaces

Fourier Transform & Sobolev Spaces Fourier Transform & Sobolev Spaces Michael Reiter, Arthur Schuster Summer Term 2008 Abstract We introduce the concept of weak derivative that allows us to define new interesting Hilbert spaces the Sobolev

More information

Generalized Linear Models and Exponential Families

Generalized Linear Models and Exponential Families Generalized Linear Models and Exponential Families David M. Blei COS424 Princeton University April 12, 2012 Generalized Linear Models x n y n β Linear regression and logistic regression are both linear

More information

USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION

USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION USING FUNCTIONAL ANALYSIS AND SOBOLEV SPACES TO SOLVE POISSON S EQUATION YI WANG Abstract. We study Banach and Hilbert spaces with an eye towards defining weak solutions to elliptic PDE. Using Lax-Milgram

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

MATH 426, TOPOLOGY. p 1.

MATH 426, TOPOLOGY. p 1. MATH 426, TOPOLOGY THE p-norms In this document we assume an extended real line, where is an element greater than all real numbers; the interval notation [1, ] will be used to mean [1, ) { }. 1. THE p

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Part A Examination. Statistics: Theory Paper McGill University Faculty of Science Department of Mathematics and Statistics Part A Examination Statistics: Theory Paper Date: 10th May 2015 Instructions Time: 1pm-5pm Answer only two questions from Section

More information

1 Exercises for lecture 1

1 Exercises for lecture 1 1 Exercises for lecture 1 Exercise 1 a) Show that if F is symmetric with respect to µ, and E( X )

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

(Practice Version) Midterm Exam 2

(Practice Version) Midterm Exam 2 EECS 126 Probability and Random Processes University of California, Berkeley: Fall 2014 Kannan Ramchandran November 7, 2014 (Practice Version) Midterm Exam 2 Last name First name SID Rules. DO NOT open

More information

CS Lecture 19. Exponential Families & Expectation Propagation

CS Lecture 19. Exponential Families & Expectation Propagation CS 6347 Lecture 19 Exponential Families & Expectation Propagation Discrete State Spaces We have been focusing on the case of MRFs over discrete state spaces Probability distributions over discrete spaces

More information

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS

POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS POISSON PROCESSES 1. THE LAW OF SMALL NUMBERS 1.1. The Rutherford-Chadwick-Ellis Experiment. About 90 years ago Ernest Rutherford and his collaborators at the Cavendish Laboratory in Cambridge conducted

More information

Lecture 5: The Bellman Equation

Lecture 5: The Bellman Equation Lecture 5: The Bellman Equation Florian Scheuer 1 Plan Prove properties of the Bellman equation (In particular, existence and uniqueness of solution) Use this to prove properties of the solution Think

More information

Trace Class Operators and Lidskii s Theorem

Trace Class Operators and Lidskii s Theorem Trace Class Operators and Lidskii s Theorem Tom Phelan Semester 2 2009 1 Introduction The purpose of this paper is to provide the reader with a self-contained derivation of the celebrated Lidskii Trace

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Miscellaneous Errors in the Chapter 6 Solutions

Miscellaneous Errors in the Chapter 6 Solutions Miscellaneous Errors in the Chapter 6 Solutions 3.30(b In this problem, early printings of the second edition use the beta(a, b distribution, but later versions use the Poisson(λ distribution. If your

More information

10. Composite Hypothesis Testing. ECE 830, Spring 2014

10. Composite Hypothesis Testing. ECE 830, Spring 2014 10. Composite Hypothesis Testing ECE 830, Spring 2014 1 / 25 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve unknown parameters

More information

4. CONTINUOUS RANDOM VARIABLES

4. CONTINUOUS RANDOM VARIABLES IA Probability Lent Term 4 CONTINUOUS RANDOM VARIABLES 4 Introduction Up to now we have restricted consideration to sample spaces Ω which are finite, or countable; we will now relax that assumption We

More information

Introduction & Random Variables. John Dodson. September 3, 2008

Introduction & Random Variables. John Dodson. September 3, 2008 & September 3, 2008 Statistics Statistics Statistics Course Fall sequence modules portfolio optimization Bill Barr fixed-income markets Chris Bemis calibration & simulation introductions flashcards how

More information

Banach Spaces V: A Closer Look at the w- and the w -Topologies

Banach Spaces V: A Closer Look at the w- and the w -Topologies BS V c Gabriel Nagy Banach Spaces V: A Closer Look at the w- and the w -Topologies Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we discuss two important, but highly non-trivial,

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /

More information

4 Expectation & the Lebesgue Theorems

4 Expectation & the Lebesgue Theorems STA 205: Probability & Measure Theory Robert L. Wolpert 4 Expectation & the Lebesgue Theorems Let X and {X n : n N} be random variables on a probability space (Ω,F,P). If X n (ω) X(ω) for each ω Ω, does

More information

Functional Analysis Exercise Class

Functional Analysis Exercise Class Functional Analysis Exercise Class Week 9 November 13 November Deadline to hand in the homeworks: your exercise class on week 16 November 20 November Exercises (1) Show that if T B(X, Y ) and S B(Y, Z)

More information

Introducing the Normal Distribution

Introducing the Normal Distribution Department of Mathematics Ma 3/103 KC Border Introduction to Probability and Statistics Winter 2017 Lecture 10: Introducing the Normal Distribution Relevant textbook passages: Pitman [5]: Sections 1.2,

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

Lecture 8: Information Theory and Statistics

Lecture 8: Information Theory and Statistics Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

Chapter 2: Random Variables

Chapter 2: Random Variables ECE54: Stochastic Signals and Systems Fall 28 Lecture 2 - September 3, 28 Dr. Salim El Rouayheb Scribe: Peiwen Tian, Lu Liu, Ghadir Ayache Chapter 2: Random Variables Example. Tossing a fair coin twice:

More information

t x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.

t x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3. Mathematical Statistics: Homewor problems General guideline. While woring outside the classroom, use any help you want, including people, computer algebra systems, Internet, and solution manuals, but mae

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM

ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM c 2007-2016 by Armand M. Makowski 1 ENEE 621 SPRING 2016 DETECTION AND ESTIMATION THEORY THE PARAMETER ESTIMATION PROBLEM 1 The basic setting Throughout, p, q and k are positive integers. The setup With

More information

New Class of duality models in discrete minmax fractional programming based on second-order univexities

New Class of duality models in discrete minmax fractional programming based on second-order univexities STATISTICS, OPTIMIZATION AND INFORMATION COMPUTING Stat., Optim. Inf. Comput., Vol. 5, September 017, pp 6 77. Published online in International Academic Press www.iapress.org) New Class of duality models

More information

The Sherrington-Kirkpatrick model

The Sherrington-Kirkpatrick model Stat 36 Stochastic Processes on Graphs The Sherrington-Kirkpatrick model Andrea Montanari Lecture - 4/-4/00 The Sherrington-Kirkpatrick (SK) model was introduced by David Sherrington and Scott Kirkpatrick

More information