Introduction to Estimation Theory


Introduction to Statistics (IST), the slides, set 1
Introduction to Estimation Theory
Richard D. Gill
Dept. of Mathematics, Leiden University
September 18, 2009

Outline:
- Statistical Models (Data, Parameters: Maths; Data, Parameters: Less Maths)
- The i.i.d. Case
- Statistics and Estimators (statistics and estimators are functions of the data)
- Estimation Theory: Good Estimation
- Statistics and Cookery
- Parameters and Parametrizations
- The Chef Discusses Cookery

Statistical Models: Data, Parameters: Maths

A statistical model is a probability model $(\Omega, \mathcal{F}, P_\theta)$ in which the probability measure $P_\theta$ is indexed by an unknown parameter $\theta \in \Theta$, together with a specification of the observed data $X$: a random variable on the probability space, with values $x$ in some outcome space $(\mathcal{X}, \mathcal{B})$.

Almost all readers can safely forget about the specifications of the collections of events $\mathcal{F}$ and $\mathcal{B}$. But for completeness: $\mathcal{F}$ is a nice collection of nice subsets $A$ of the sample space $\Omega$, and $\mathcal{B}$ is a nice collection of nice subsets $B$ of the outcome space $\mathcal{X}$. These are simply those sets $A \subseteq \Omega$ and $B \subseteq \mathcal{X}$ for which it makes sense to talk about $P_\theta(A)$ and $P_\theta(X \in B) = P_\theta(\{\omega : X(\omega) \in B\})$, respectively. Students of measure theory will know what I mean by "nice", and will also realise that we can usually safely forget about these qualifications. So I'll do this slide all over again (next slide, please!).

Statistical Models: Data, Parameters: Less Maths

A statistical model is a probability space on which the probability measure $P_\theta$ is indexed by an unknown parameter $\theta$, together with a specification of the observed data $X$: a random element on the probability space, with values $x$ in some outcome space $\mathcal{X}$.

Statistical Models: Data, Parameters: Less Maths (cont.)

The idea behind this is that we observe the values taken by a collection of random variables defined on this probability space, collected into a random vector or matrix (or whatever) $X$. Sometimes $x$ is called the data, sometimes the dataset.

More precisely (and actually, without loss of generality), $X$ is a list, or a list of lists, or ..., of ordinary random variables, i.e., random variables taking values in the set of real numbers $\mathbb{R} = (-\infty, \infty)$, or extended real numbers $\overline{\mathbb{R}} = [-\infty, +\infty]$, or in some subset thereof. Examples are: the nonnegative reals $[0, \infty)$; the integers $\mathbb{Z}$; the nonnegative integers including $+\infty$, $\overline{\mathbb{N}}$; the numbers $\{0, 1, \ldots, 10\}$. There is sometimes disagreement as to whether the natural numbers include 0 or not.

Statistical Models: Data, Parameters: Less Maths (cont.)

Both $X$ and a possible realization thereof, $x$, are often called the data. Most often $X$ is a real random vector or a real random matrix, i.e., a vector or a matrix of ordinary (possibly extended real valued) random variables defined on the sample space $(\Omega, \mathcal{F})$. Sometimes it is just one ordinary real random variable! Sometimes it is something of much more complicated structure or shape.

We see a realization $X(\omega) = x$, but we don't know which $\theta$ underlies the probability measure $P_\theta$ on $\Omega$ which actually directed the Goddess Fortuna in her choice of $\omega$. We don't even get to see the actual choice $\omega$ (or at least, for the moment, we are forgetting we saw it); otherwise we would have called $\omega$ the observed data.

I am here assuming that the model is true or correct. We can also study the case that the actual probability measure which directed Fortuna is not one of the $P_\theta$'s but some other probability measure, say $Q$.

Statistical Models: Data, Parameters: Less Maths (cont.)

Supposing $X$ takes values $x$ in some space $\mathcal{X}$ with $\sigma$-algebra $\mathcal{B}$ of measurable sets $B \subseteq \mathcal{X}$, the only relevant things for us are the induced probability distributions $P_\theta^X$ of $X$, defined by
$$P_\theta^X(B) = P_\theta(X \in B), \quad B \subseteq \mathcal{X},\ B \in \mathcal{B}.$$
Thus we are interested in the induced indexed family of probability spaces $(\mathcal{X}, \mathcal{B}, (P_\theta^X : \theta \in \Theta))$, where we get to observe which realized dataset, a point $x \in \mathcal{X}$, has been chosen by Fortuna, whose choice was made under the law of one of the probability measures or distributions $P_\theta^X$. We don't know which $\theta$, through the distribution $P_\theta^X$, actually directed Fortuna in her choice of the actually observed $x$.

In many cases, these probability distributions are continuous distributions on $\mathcal{X} = \mathbb{R}^n$ for some $n$, i.e., have probability densities $f(x; \theta)$. In many other cases, they are discrete, i.e., concentrated on some (at most) countable set of outcomes, and therefore have probability mass functions $p(x; \theta)$.
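To make the induced distribution concrete, here is a minimal Python sketch (assuming NumPy and SciPy; the normal model, the fixed $\theta$, and the event $B$ are illustrative choices, not from the slides) comparing the exact value of $P_\theta^X(B)$ with a Monte Carlo approximation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu, sigma = 2.0, 1.0  # one fixed theta, for illustration

# The induced distribution of X assigns to each event B the probability
# P_theta^X(B) = P_theta(X in B).  Here B = (1, 3], an interval in R.
exact = stats.norm.cdf(3.0, mu, sigma) - stats.norm.cdf(1.0, mu, sigma)

# Monte Carlo approximation: simulate omega -> X(omega) many times.
x = rng.normal(mu, sigma, size=100_000)
approx = np.mean((x > 1.0) & (x <= 3.0))

print(f"exact P(X in B) = {exact:.4f}, simulated = {approx:.4f}")
```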

Statistical Models: Data, Parameters: Less Maths (cont.)

We will usually not have to distinguish between these two cases. In the sequel we'll often write $p$ or $f$, whether we are talking about densities, mass functions, or both... Indeed, from the point of view of measure theory, both an ordinary multivariate probability density and an ordinary discrete probability mass function are densities of the corresponding probability measure with respect to some reference measure (ordinary $n$-dimensional hypervolume measure, and ordinary counting measure, respectively).

A statistical model is therefore often specified by merely saying: suppose the data $X = x$ has density $f(x; \theta)$ on $\mathcal{X}$, for some $\theta \in \Theta$. Or we might say (next slide, please!)

Statistical Models: Data, Parameters: In a Nutshell

A statistical model for observed data $X = x$, with unknown parameter $\theta$, is defined by
$$X \sim f(x; \theta), \quad x \in \mathcal{X}; \quad \text{for some } \theta \in \Theta.$$
Example: one observation from an unknown normal distribution,
$$Y \sim \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \frac{(y-\mu)^2}{\sigma^2} \right), \quad y \in \mathbb{R}; \quad (\mu, \sigma) \in (-\infty, \infty) \times (0, \infty).$$
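As a minimal sketch of what "a model is a family of densities" means in practice (Python with SciPy assumed; the observed value and candidate parameters are made up for illustration), we can evaluate the normal density above at one observation for several candidate values of $(\mu, \sigma)$:

```python
from scipy import stats

# One observation y from an unknown normal distribution N(mu, sigma^2).
y = 1.7  # an illustrative observed value

# Evaluate the model density f(y; mu, sigma) at a few candidate parameters.
for mu, sigma in [(0.0, 1.0), (2.0, 1.0), (2.0, 0.5)]:
    density = stats.norm.pdf(y, loc=mu, scale=sigma)
    print(f"f(y; mu={mu}, sigma={sigma}) = {density:.4f}")
```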

The i.i.d. Case

If $\mathbf{X} = (X_1, \ldots, X_n)$, where the $X_i$ are independent and identically distributed random variables (or vectors, or ...), then everything depends on the sample size $n$ and the family of probability distributions of just one of the $X_i$'s, which (apparently) depends on some parameter $\theta$. Often these probability distributions have a density or mass function $f(\cdot\,; \theta)$. We write $X$ to stand for any one of the $X_i$ and $x$ for a possible value of $X$. Our data $\mathbf{X} = \mathbf{x}$ is of the form $\mathbf{x} = (x_1, \ldots, x_n)$, and $\mathbf{X}$ has (joint) density or mass function $\prod_{i=1}^n f_X(x_i; \theta)$. Since $f$ often stands for a generic probability density, and since now there are an awful lot of probability densities in the picture, I have started to append an index to $f$, to indicate whose density I'm talking about.

Statistical Models in the i.i.d. Case

A statistical model often comes down to consideration of a family of probability densities (or mass functions) of data $\mathbf{x}$, indexed by a parameter $\theta$: $f(\mathbf{x}; \theta)$. In the i.i.d. case, $\mathbf{x} = (x_1, \ldots, x_n)$, and
$$f(\mathbf{x}; \theta) = f_{\mathbf{X}}(\mathbf{x}; \theta) = \prod_{i=1}^n f_{X_i}(x_i; \theta) = \prod_{i=1}^n f_X(x_i; \theta).$$
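A short sketch of the product formula (Python with NumPy/SciPy assumed; the normal model, sample size, and parameter values are illustrative). In practice one computes the joint density on the log scale, turning the product into a sum:

```python
import numpy as np
from scipy import stats

# Illustrative i.i.d. sample x = (x_1, ..., x_n) from a N(mu, sigma^2) model.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)

def log_joint_density(x, mu, sigma):
    """log f(x; theta) = sum_i log f_X(x_i; theta) in the i.i.d. case."""
    return np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

print(log_joint_density(x, mu=2.0, sigma=1.0))  # log joint density under one theta
print(log_joint_density(x, mu=0.0, sigma=1.0))  # typically much smaller here
```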

Statistics and Estimators: Estimands, Estimators, Estimates

A statistic $T$ is a function of the data $X$, taking values in some measurable space $\mathcal{T}$. Thus $T = T(X) = T(X(\omega))$; i.e., we can think of $T$ both as a function defined on the outcome space of $X$ and as a function defined on $\Omega$; hence we can think of $T$ as a random variable (or vector, or ...) defined on $(\mathcal{X}, \mathcal{B}, P_\theta^X)$ or on $(\Omega, \mathcal{F}, P_\theta)$.

An estimator of $\theta$ is a statistic $\hat\theta$ taking values in $\Theta$. A realized value of $\hat\theta$, which is of course just some point in $\Theta$, might often be denoted by the same symbol; but it has a different name: it is then called an estimate. What we are estimating, $\theta$, is called the estimand.
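For instance (a minimal sketch; the normal model, the true value $\mu_0$, and the sample size are assumptions for illustration), the sample mean is a statistic, and used as an estimator of $\mu$ it yields an estimate once the data are realized:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data X = (X_1, ..., X_n), i.i.d. N(mu_0, 1), with mu_0 = 2.0 playing theta_0.
x = rng.normal(loc=2.0, scale=1.0, size=100)

# The statistic T(X) = mean(X) is an estimator of the estimand mu;
# its realized value on this dataset is an estimate.
mu_hat = np.mean(x)
print(f"estimate of mu: {mu_hat:.3f}")
```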

Estimation Theory: The Art of Good Estimation

A main task of estimation theory is to tell us how to find good estimators. But what is a good estimator?
- We suppose that the data is generated by one particular but unknown value of $\theta$, often denoted $\theta_0$.
- Thus $\hat\theta$ is a random variable taking values in $\Theta$, with a probability distribution depending on $\theta_0$.
- An estimator is thus a recipe for converting data $x$ into an estimate $\hat\theta$.
- A good estimator ends up lying close to $\theta_0$ with large probability, whatever $\theta_0$ might actually be.
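One way to make "close to $\theta_0$ with large probability" operational is simulation. The sketch below (assuming a normal model with known $\theta_0$ and comparing two common estimators, the sample mean and the sample median; all settings are illustrative) estimates, for each estimator, the probability of landing within a tolerance of $\theta_0$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps, tol = 2.0, 100, 10_000, 0.2

# Sampling distributions of two estimators of theta0 under N(theta0, 1).
means = np.empty(reps)
medians = np.empty(reps)
for r in range(reps):
    x = rng.normal(loc=theta0, scale=1.0, size=n)
    means[r] = np.mean(x)
    medians[r] = np.median(x)

# "Close to theta0 with large probability": estimate P(|theta_hat - theta0| < tol).
print(f"P(|mean   - theta0| < {tol}): {np.mean(np.abs(means - theta0) < tol):.3f}")
print(f"P(|median - theta0| < {tol}): {np.mean(np.abs(medians - theta0) < tol):.3f}")
```

Under this normal model the mean typically wins; under other models the ordering can reverse, which is exactly the point of the next two slides.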

Statistics and Cookery

As is well known, if you want to tell lies, you hire a statistician to cook the numbers for you. Estimation theory is about recipes for recipes. In other words: meta-cookery (meta-lying).

Anyone can use a cookery book. Only a good theoretical statistician can keep on inventing good new recipes, especially when, for some reason or other, she has to use a new kind of ingredients. Top statisticians are like five-star chefs. This class is a beginners' class in top-chef school.

The proof of the pudding is in the eating. We investigate whether $\hat\theta$ is usually close to $\theta_0$, whatever $\theta_0$ might be, or not. If it is, then this statistic does not lie. Still, if another estimator is usually closer, then the first, though telling the truth, is not telling the whole truth. The best estimator tells the truth, only the truth, and the whole truth, as much as is mathematically possible, so help me Goddess Fortuna.

Statistics and Cookery (cont.)

Unfortunately, you can please some of the people all of the time, or all of the people some of the time, but not all of the people all of the time. Tastes differ. Some estimators are better when $\theta_0$ is somewhere over here, but worse when $\theta_0$ is actually over there. There is almost never an estimator which is absolutely best, independent of $\theta_0$. We have to make some kind of compromise. By definition, there is no best compromise. It's a matter of taste, of negotiation.

Parameters and Parametrizations

There are parameters and parameters. Often we are interested in estimating some component of a vector parameter $\theta$, or indeed any other function of $\theta$. If $\varphi = \varphi(\theta)$ is this so-called parameter of interest, then we often call $\varphi$ itself a "parameter"; and a statistic $\hat\varphi$ which takes values in the same space as $\varphi$ can be called an estimator of $\varphi$.

An important point is that we can often usefully think of estimating some feature of $P_\theta$, i.e., some function of $\theta$, without first figuring out how to estimate $\theta$. If, however, $\hat\theta$ is an estimator of $\theta$ and $\varphi = \varphi(\theta)$ is the parameter of interest, then $\hat\varphi = \varphi(\hat\theta)$ is called the plug-in estimator of $\varphi$.

Not all good estimators of a parameter $\varphi$ are plug-in estimators. Some good estimators are plug-in estimators based on lousy estimators of $\theta$.
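A small sketch of the plug-in idea (Python with SciPy assumed; the normal model and the target $\varphi(\theta) = P(X > 3)$ are illustrative choices): estimate $\theta = (\mu, \sigma)$ first, then plug into $\varphi$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.0, size=200)  # data from theta0 = (2.0, 1.0)

# Estimate theta = (mu, sigma) first ...
mu_hat = np.mean(x)
sigma_hat = np.std(x, ddof=1)

# ... then plug into the parameter of interest phi(theta) = P(X > 3).
phi_hat = stats.norm.sf(3.0, loc=mu_hat, scale=sigma_hat)  # plug-in estimator
print(f"plug-in estimate of P(X > 3): {phi_hat:.3f}")      # true value is about 0.159
```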

Interest Parameters and Nuisance Parameters

If we can write $\theta = (\varphi, \psi)$, where we are only interested in $\varphi$, then we call $\varphi$ the interest parameter and $\psi$ the nuisance parameter. In the normal example above, if we only care about the mean, then $\theta = (\mu, \sigma)$ with interest parameter $\varphi = \mu$ and nuisance parameter $\psi = \sigma$.

Parameters and Parametrizations (cont.)

Replacing $\theta$ by some one-to-one function of $\theta$ is called a reparametrization of the model. The model is really the same; we just happen to describe it differently.

At the most abstract level, a statistical model is just a particular family of probability distributions of $X$. These distributions themselves form a parametrization of the model. Parameters of interest are interesting functions (some might say: functionals) of the unknown probability distribution. Nuisance parameters are everything else which needs to be specified in order to fix the probability distribution. Unidentified parameters are functions of $\theta$ which can't be expressed as functions of $P_\theta^X$. Identified parameters can be.
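As a sketch of reparametrization (the log-scale parametrization $\eta = (\mu, \log \sigma)$ is one common choice, assumed here for illustration): the map is one-to-one, so both parametrizations describe exactly the same family of distributions:

```python
import numpy as np
from scipy import stats

def density_theta(y, mu, sigma):
    """Density under the original parametrization theta = (mu, sigma)."""
    return stats.norm.pdf(y, loc=mu, scale=sigma)

def density_eta(y, mu, log_sigma):
    """Same model under the one-to-one reparametrization eta = (mu, log sigma)."""
    return stats.norm.pdf(y, loc=mu, scale=np.exp(log_sigma))

# Both parametrizations assign exactly the same density to any point:
y = 1.3
print(density_theta(y, mu=2.0, sigma=0.5))
print(density_eta(y, mu=2.0, log_sigma=np.log(0.5)))  # identical value
```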

The Chef Discusses Cookery: Back to Basics

This is where the underlying probability space turns up again. We often define models by describing probability mechanisms which lead to the observed data, but which depend on plenty of unobserved things. In fact, we are often most interested in those unobserved things, or rather, in features of their distribution. If we did the wrong experiment, however, the parameters of interest won't be functions of the distribution of the data, even though they did appear in the definition of the underlying probability model of the phenomenon of interest.

The Chef Discusses Cookery: A Statistician's Life Is Interesting

Often in statistics there are two parts of the underlying probability model: the part corresponding to the phenomenon of interest, and the part corresponding to what we get to see of it in our experiment, our sample, whatever...

Thus our experimental design, sampling design, observational frame, whatever..., can introduce more randomness and more unknown parameters into the distribution of the data. It can make our life easier or more difficult, especially depending on whether we are smart or dumb (if we are lucky enough to have any choice in the matter). This is certainly the thing that makes a statistician's life interesting. An ancient Chinese curse is: "May you live in interesting times."

The Chef Discusses Cookery: A Mathematician's Apology

Pedantic mathematicians object to our habit of using the same symbol to stand for completely different things at the same time, e.g., both for a function and for a possible value of that function. We also often use illegal (lazy physicists') notation, in which the name for an object determines its nature, and in which the same symbol is used for different objects but the context in which it appears hopefully determines which variety is meant. We realise that this might be a subjective criterion.

I shall be even more wicked still: I will attach whichever subscripts or superscripts to objects I think are important to emphasize, and leave them off again after a few slides, when we all realise what we are talking about. Arguments will change into indices (sub- or superscripts, for no particular reason except local aesthetics), and later disappear altogether.

One has the choice of either writing everything out explicitly, which means a huge burden of complex notation and might actually obscure the intended meaning, or of using various kinds of unofficial shorthands, which allow insiders to quickly see what is meant but can be impossible for outsiders to decipher.

The Chef Discusses Cookery: A Statistician's Exhortation

In this respect, statistics is much like physics: we abuse mathematics as much as we use it. One person's laziness or carelessness is another person's efficiency or clarity. Explicitness can obscure meaning just as much as implicitness can. We always have to make a compromise. At the end of the day, we just have to get used to the language which, after a long cultural development, is at this moment the most convenient compromise for some subset of insiders. In a hundred years' time it will no longer be a good compromise. C'est la vie!