STAT 512 sp 2018 Summary Sheet


Karl B. Gregory, Spring 2018

1. Transformations of a random variable

Let $X$ be a rv with support $\mathcal{X}$ and let $g$ be a function mapping $\mathcal{X}$ to $\mathcal{Y}$ with inverse mapping $g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}$ for all $A \subset \mathcal{Y}$.

If $X$ is a discrete rv, the rv $Y = g(X)$ is discrete and has pmf given by
\[
p_Y(y) = \begin{cases} \sum_{x \in g^{-1}(y)} p_X(x) & \text{for } y \in \mathcal{Y} \\ 0 & \text{for } y \notin \mathcal{Y}. \end{cases}
\]

If $X$ has pdf $f_X$, then the cdf of $Y = g(X)$ is given by
\[
F_Y(y) = \int_{\{x :\, g(x) < y\}} f_X(x)\,dx, \quad -\infty < y < \infty.
\]

If $X$ has pdf $f_X$ which is continuous on $\mathcal{X}$ and if $g$ is monotone and $g^{-1}$ has a continuous derivative, then the pdf of $Y = g(X)$ is given by
\[
f_Y(y) = \begin{cases} f_X(g^{-1}(y)) \left| \dfrac{d}{dy} g^{-1}(y) \right| & \text{for } y \in \mathcal{Y} \\ 0 & \text{for } y \notin \mathcal{Y}. \end{cases}
\]

If $X$ has mgf $M_X(t)$, the mgf of $Y = aX + b$ is given by $M_Y(t) = e^{bt} M_X(at)$.

If $X$ has continuous cdf $F_X$, the rv $U = F_X(X)$ has the uniform distribution on $(0,1)$.

To generate $X \sim F_X$, generate $U \sim \text{Uniform}(0,1)$ and set $X = F_X^{-1}(U)$, where $F_X^{-1}(u) = \inf\{x : F_X(x) \geq u\}$. If $F_X$ is strictly increasing, $F_X^{-1}$ satisfies
\[
x = F_X^{-1}(u) \iff u = F_X(x)
\]
for all $0 < u < 1$ and $-\infty < x < \infty$.
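Illustration (not part of the original sheet): a minimal NumPy/SciPy sketch of the inverse-cdf method for the Exponential(1) distribution, whose cdf $F(x) = 1 - e^{-x}$ is strictly increasing, so $F^{-1}(u) = -\log(1-u)$. The sample size and seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Inverse-cdf (probability integral transform) sampling for Exponential(1):
# F(x) = 1 - exp(-x) is strictly increasing, so F^{-1}(u) = -log(1 - u).
u = rng.uniform(size=100_000)   # U ~ Uniform(0, 1)
x = -np.log(1.0 - u)            # X = F^{-1}(U) ~ Exponential(1)

# Sample mean and variance should both be close to 1 for Exponential(1).
print(x.mean(), x.var())

# Conversely, F_X(X) should look Uniform(0, 1): a Kolmogorov-Smirnov test
# against the uniform cdf should not reject.
print(stats.kstest(1.0 - np.exp(-x), "uniform"))
```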

2. Transformations of multiple random variables

Let $(X_1, X_2)$ be a pair of continuous random variables with pdf $f_{(X_1,X_2)}$ having joint support $\mathcal{X} = \{(x_1,x_2) : f_{(X_1,X_2)}(x_1,x_2) > 0\}$ and let $Y_1 = g_1(X_1,X_2)$ and $Y_2 = g_2(X_1,X_2)$, where $g_1$ and $g_2$ are one-to-one functions taking arguments in $\mathcal{X}$ and returning values in the set $\mathcal{Y} = \{(y_1,y_2) : y_1 = g_1(x_1,x_2),\ y_2 = g_2(x_1,x_2),\ (x_1,x_2) \in \mathcal{X}\}$. Then the joint pdf $f_{(Y_1,Y_2)}$ of $(Y_1,Y_2)$ is given by
\[
f_{(Y_1,Y_2)}(y_1,y_2) = \begin{cases} f_{(X_1,X_2)}\big(g_1^{-1}(y_1,y_2),\, g_2^{-1}(y_1,y_2)\big)\, |J(y_1,y_2)| & \text{for } (y_1,y_2) \in \mathcal{Y} \\ 0 & \text{for } (y_1,y_2) \notin \mathcal{Y}, \end{cases}
\]
where $g_1^{-1}$ and $g_2^{-1}$ are the functions satisfying
\[
y_1 = g_1(x_1,x_2) \text{ and } y_2 = g_2(x_1,x_2) \iff x_1 = g_1^{-1}(y_1,y_2) \text{ and } x_2 = g_2^{-1}(y_1,y_2)
\]
and
\[
J(y_1,y_2) = \begin{vmatrix} \frac{\partial}{\partial y_1} g_1^{-1}(y_1,y_2) & \frac{\partial}{\partial y_2} g_1^{-1}(y_1,y_2) \\[4pt] \frac{\partial}{\partial y_1} g_2^{-1}(y_1,y_2) & \frac{\partial}{\partial y_2} g_2^{-1}(y_1,y_2) \end{vmatrix},
\]
provided $J(y_1,y_2)$ is not equal to zero for all $(y_1,y_2) \in \mathcal{Y}$. For real numbers $a,b,c,d$,
\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
\]

Let $X_1, \ldots, X_n$ be mutually independent rvs such that $X_i$ has mgf $M_{X_i}(t)$ for $i = 1, \ldots, n$. Then the mgf of $Y = X_1 + \cdots + X_n$ is given by
\[
M_Y(t) = \prod_{i=1}^n M_{X_i}(t).
\]

3. Random samples, sums of rvs, and order statistics

A random sample is a collection of mutually independent rvs all having the same distribution. This common distribution is called the population distribution, and quantities describing the population distribution are called population parameters. A sample statistic is a function of the rvs in the random sample, and its distribution is called its sampling distribution. What we can learn about population parameters from sample statistics has everything to do with their sampling distributions.

Random samples might be introduced in any of the following ways:

Let $X_1, \ldots, X_n$ be a random sample from a population with such-and-such distribution.

Let $X_1, \ldots, X_n$ be independent copies of the random variable $X$, where $X$ has such-and-such distribution.
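Illustration (not part of the original sheet): a minimal Monte Carlo check of the mgf-of-a-sum fact. For $X_1, X_2, X_3$ iid Exponential with mean 1, the product of the individual mgfs is $(1-t)^{-3}$ for $t < 1$, which we can compare to a simulation estimate of $M_Y(t) = E\,e^{tY}$ for $Y = X_1 + X_2 + X_3$; the grid of $t$ values and the number of replicates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rep = 200_000
x = rng.exponential(scale=1.0, size=(n_rep, 3))  # three iid Exponential(1) columns
y = x.sum(axis=1)                                # Y = X1 + X2 + X3

t_grid = np.array([-1.0, -0.5, 0.0, 0.25, 0.4])  # keep t well below 1 so the mgf exists
mc_mgf = np.array([np.mean(np.exp(t * y)) for t in t_grid])  # Monte Carlo E[exp(tY)]
product_mgf = (1.0 - t_grid) ** (-3)             # product of three Exponential(1) mgfs

print(np.column_stack([t_grid, mc_mgf, product_mgf]))
```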

Let $X_1, \ldots, X_n$ be IID rvs from such-and-such distribution (IID means independent and identically distributed).

Let $X_1, \ldots, X_n \overset{\text{iid}}{\sim}$ such-and-such distribution.

Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$ and define the sample mean and sample variance as
\[
\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n), \qquad S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2.
\]
We have $E\bar{X}_n = \mu$, $\operatorname{Var}\bar{X}_n = \sigma^2/n$, and $ES_n^2 = \sigma^2$.

For a random sample $X_1, \ldots, X_n$, the order statistics are the ordered data values, represented as
\[
X_{(1)} < X_{(2)} < \cdots < X_{(n)}.
\]

Let $X_{(1)} < \cdots < X_{(n)}$ be the order statistics of a random sample from a population with cdf $F_X$ and pdf $f_X$. Then the pdf of the $k$th order statistic $X_{(k)}$ is given by
\[
f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,[F_X(x)]^{k-1}[1 - F_X(x)]^{n-k} f_X(x)
\]
for $k = 1, \ldots, n$. In particular, the minimum $X_{(1)}$ and the maximum $X_{(n)}$ have the pdfs
\[
f_{X_{(1)}}(x) = n[1 - F_X(x)]^{n-1} f_X(x), \qquad f_{X_{(n)}}(x) = n[F_X(x)]^{n-1} f_X(x).
\]

Let $X_{(1)} < \cdots < X_{(n)}$ be the order statistics of a random sample from a population with cdf $F_X$ and pdf $f_X$. Then the joint pdf of the $j$th and $k$th order statistics $(X_{(j)}, X_{(k)})$, with $1 \leq j < k \leq n$, is given by
\[
f_{(X_{(j)}, X_{(k)})}(u, v) = \frac{n!}{(j-1)!\,(k-j-1)!\,(n-k)!}\, f_X(u) f_X(v)\,[F_X(u)]^{j-1}\,[F_X(v) - F_X(u)]^{k-j-1}\,[1 - F_X(v)]^{n-k}
\]
for $-\infty < u < v < \infty$. In particular, the joint pdf of the minimum and maximum $(X_{(1)}, X_{(n)})$ is
\[
f_{(X_{(1)}, X_{(n)})}(u, v) = n(n-1) f_X(u) f_X(v)\,[F_X(v) - F_X(u)]^{n-2}
\]
for $-\infty < u < v < \infty$.

4. Pivot quantities and sampling from the Normal distribution

A pivot quantity is a function of sample statistics and population parameters which has a known distribution (not depending on unknown parameters).
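Illustration (not part of the original sheet): a minimal simulation check of the pdf of the maximum. For $n$ iid Uniform$(0,1)$ rvs, $f_{X_{(n)}}(x) = n x^{n-1}$ on $(0,1)$, which is the Beta$(n,1)$ pdf, so the simulated maximum should match a Beta$(n,1)$ distribution and have mean $n/(n+1)$. The choices $n = 5$ and the number of replicates are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_rep = 5, 100_000

# Maximum of n iid Uniform(0,1) rvs; its pdf is n x^{n-1} on (0,1),
# i.e. the Beta(n, 1) pdf, with mean n / (n + 1).
x_max = rng.uniform(size=(n_rep, n)).max(axis=1)

print(x_max.mean(), n / (n + 1))                  # simulated vs exact mean
print(stats.kstest(x_max, stats.beta(n, 1).cdf))  # compare whole distributions
```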

A $(1-\alpha)100\%$ confidence interval (CI) for an unknown parameter $\theta \in \mathbb{R}$ is an interval $(L, U)$ where $L$ and $U$ are random variables such that $P_{(L,U)}(L < \theta < U) = 1 - \alpha$, for $\alpha \in (0,1)$.

Important distributions and upper quantile notation:

The Normal$(0,1)$ distribution has pdf
\[
\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty,
\]
and for $\xi \in (0,1)$ we denote by $z_\xi$ the value satisfying $P_Z(Z > z_\xi) = \xi$, where $Z \sim \text{Normal}(0,1)$.

The chi-square distributions $\chi^2_\nu$, $\nu = 1, 2, \ldots$, have the pdfs
\[
f_X(x; \nu) = \frac{1}{\Gamma(\nu/2)\,2^{\nu/2}}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0,
\]
and for $\xi \in (0,1)$ we denote by $\chi^2_{\nu,\xi}$ the value satisfying $P_X(X > \chi^2_{\nu,\xi}) = \xi$, where $X \sim \chi^2_\nu$. The parameter $\nu$ is called the degrees of freedom.

The $t$ distributions $t_\nu$, $\nu = 1, 2, \ldots$, have the pdfs
\[
f_T(t; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty,
\]
and for $\xi \in (0,1)$ we denote by $t_{\nu,\xi}$ the value satisfying $P_T(T > t_{\nu,\xi}) = \xi$, where $T \sim t_\nu$. The parameter $\nu$ is called the degrees of freedom.

The $F$ distributions $F_{\nu_1,\nu_2}$, $\nu_1, \nu_2 = 1, 2, \ldots$, have the pdfs
\[
f_R(r; \nu_1, \nu_2) = \frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\!\left(\frac{\nu_1}{2}\right)\Gamma\!\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} r^{\nu_1/2 - 1} \left(1 + \frac{\nu_1}{\nu_2} r\right)^{-(\nu_1+\nu_2)/2}, \quad r > 0,
\]
and for $\xi \in (0,1)$ we denote by $F_{\nu_1,\nu_2,\xi}$ the value satisfying $P_R(R > F_{\nu_1,\nu_2,\xi}) = \xi$, where $R \sim F_{\nu_1,\nu_2}$. The parameters $\nu_1$ and $\nu_2$ are called, respectively, the numerator degrees of freedom and the denominator degrees of freedom.

Let $Z \sim \text{Normal}(0,1)$ and $X \sim \chi^2_\nu$ be independent rvs. Then
\[
T = \frac{Z}{\sqrt{X/\nu}} \sim t_\nu.
\]

Let $X_1 \sim \chi^2_{\nu_1}$ and $X_2 \sim \chi^2_{\nu_2}$ be independent rvs. Then
\[
R = \frac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2}.
\]

Let $X_1, \ldots, X_n$ be a random sample from the Normal$(\mu, \sigma^2)$ distribution. Then

$\sqrt{n}(\bar{X}_n - \mu)/\sigma \sim \text{Normal}(0,1)$, giving
\[
P_{\bar{X}_n}\!\left( \bar{X}_n - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}_n + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha.
\]

$(n-1)S_n^2/\sigma^2 \sim \chi^2_{n-1}$, giving
\[
P_{S_n^2}\!\left( \frac{(n-1)S_n^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)S_n^2}{\chi^2_{n-1,1-\alpha/2}} \right) = 1 - \alpha.
\]
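Illustration (not part of the original sheet): a minimal SciPy sketch of the two exact intervals above, computed from a simulated Normal sample; the parameter values, sample size, and seed are made up for illustration. Note that the sheet's upper quantile $\chi^2_{\nu,\xi}$ corresponds to `stats.chi2.ppf(1 - xi, nu)`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma = 10.0, 2.0
x = rng.normal(loc=mu, scale=sigma, size=25)  # hypothetical Normal(10, 4) sample
n, alpha = x.size, 0.05
xbar, s2 = x.mean(), x.var(ddof=1)

# Exact CI for mu when sigma is known:  xbar +/- z_{alpha/2} * sigma / sqrt(n)
z = stats.norm.ppf(1 - alpha / 2)
print((xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n)))

# Exact CI for sigma^2: ((n-1)S^2/chi2_{n-1,alpha/2}, (n-1)S^2/chi2_{n-1,1-alpha/2}),
# where chi2_{nu,xi} is the upper-xi quantile.
chi2_upper_a2 = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
chi2_upper_1ma2 = stats.chi2.ppf(alpha / 2, df=n - 1)
print(((n - 1) * s2 / chi2_upper_a2, (n - 1) * s2 / chi2_upper_1ma2))
```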

$\sqrt{n}(\bar{X}_n - \mu)/S_n \sim t_{n-1}$, giving
\[
P_{(\bar{X}_n, S_n)}\!\left( \bar{X}_n - t_{n-1,\alpha/2}\frac{S_n}{\sqrt{n}} < \mu < \bar{X}_n + t_{n-1,\alpha/2}\frac{S_n}{\sqrt{n}} \right) = 1 - \alpha.
\]

Consider two independent random samples $X_1, \ldots, X_{n_1}$ from the Normal$(\mu_1, \sigma_1^2)$ distribution and $Y_1, \ldots, Y_{n_2}$ from the Normal$(\mu_2, \sigma_2^2)$ distribution with sample means $\bar{X}_{n_1}$ and $\bar{Y}_{n_2}$ and sample variances $S_1^2$ and $S_2^2$, respectively. Then
\[
\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1},
\]
giving
\[
P_{(S_1^2, S_2^2)}\!\left( \frac{S_1^2}{S_2^2} F_{n_2-1,\,n_1-1,\,1-\alpha/2} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2} F_{n_2-1,\,n_1-1,\,\alpha/2} \right) = 1 - \alpha.
\]

5. Parametric estimation and properties of estimators

Parametric framework: Let $X_1, \ldots, X_n$ have a joint distribution which depends on a finite number of parameters $\theta_1, \ldots, \theta_d$, the values of which are unknown and lie in the spaces $\Theta_1, \ldots, \Theta_d$, respectively, where $\Theta_k \subset \mathbb{R}$, for $k = 1, \ldots, d$. To know the joint distribution of $X_1, \ldots, X_n$, it is sufficient to know the values of $\theta_1, \ldots, \theta_d$, so we define estimators $\hat{\theta}_1, \ldots, \hat{\theta}_d$ of $\theta_1, \ldots, \theta_d$ based on $X_1, \ldots, X_n$.

Nonparametric framework: Let $X_1, \ldots, X_n$ have a joint distribution which depends on an infinite number of parameters. For example, suppose $X_1, \ldots, X_n$ is a random sample from a distribution with pdf $f_X$, where we do not specify any functional form for $f_X$. Then we may regard the value of the pdf $f_X(x)$ at every point $x \in \mathbb{R}$ as a parameter, and since there are an infinite number of values of $x \in \mathbb{R}$, there are infinitely many parameters.

The bias of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as $\operatorname{Bias} \hat{\theta} = E\hat{\theta} - \theta$. An estimator is called unbiased if $\operatorname{Bias} \hat{\theta} = 0$, that is, if $E\hat{\theta} = \theta$.

The standard error of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as $\operatorname{SE} \hat{\theta} = \sqrt{\operatorname{Var} \hat{\theta}}$, so the standard error of an estimator is simply its standard deviation.

The mean squared error (MSE) of an estimator $\hat{\theta}$ of a parameter $\theta$ is defined as $\operatorname{MSE} \hat{\theta} = E(\hat{\theta} - \theta)^2$, so the MSE of $\hat{\theta}$ is the expected squared distance between $\hat{\theta}$ and $\theta$. We have
\[
\operatorname{MSE} \hat{\theta} = \operatorname{Var} \hat{\theta} + (\operatorname{Bias} \hat{\theta})^2.
\]

If $\hat{\theta}$ is an unbiased estimator of $\theta$, it is not generally true that $\tau(\hat{\theta})$ is an unbiased estimator of $\tau(\theta)$.

6. Large-sample properties of estimators, consistency and WLLN
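Illustration (not part of the original sheet): a minimal Monte Carlo sketch of bias, variance, and MSE, and of the decomposition $\operatorname{MSE} = \operatorname{Var} + \operatorname{Bias}^2$, comparing the usual sample variance (divisor $n-1$) with the biased divisor-$n$ version; all settings are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2, n_rep = 10, 4.0, 100_000

# Two estimators of sigma^2 from a Normal(0, 4) sample of size n = 10:
# the unbiased S_n^2 (divisor n-1) and the biased version with divisor n.
x = rng.normal(scale=np.sqrt(sigma2), size=(n_rep, n))
s2_unbiased = x.var(axis=1, ddof=1)
s2_biased = x.var(axis=1, ddof=0)

for name, est in [("divisor n-1", s2_unbiased), ("divisor n", s2_biased)]:
    bias = est.mean() - sigma2
    var = est.var()
    mse = np.mean((est - sigma2) ** 2)
    # mse should equal var + bias^2 up to Monte Carlo error
    print(name, bias, var, mse, var + bias ** 2)
```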

A sequence of estimators $\{\hat{\theta}_n\}_{n\geq 1}$ of a parameter $\theta \in \Theta \subset \mathbb{R}$ is called consistent if for every $\epsilon > 0$ and every $\theta \in \Theta$,
\[
\lim_{n \to \infty} P(|\hat{\theta}_n - \theta| < \epsilon) = 1.
\]

The weak law of large numbers (WLLN): Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2 < \infty$. Then $\bar{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$ is a consistent estimator of $\mu$.

A sequence of estimators $\{\hat{\theta}_n\}_{n\geq 1}$ for $\theta$ is consistent if

(a) $\lim_{n \to \infty} \operatorname{Var} \hat{\theta}_n = 0$

(b) $\lim_{n \to \infty} \operatorname{Bias} \hat{\theta}_n = 0$

Suppose $\{\hat{\theta}_{1,n}\}_{n\geq 1}$ and $\{\hat{\theta}_{2,n}\}_{n\geq 1}$ are sequences of estimators for $\theta_1$ and $\theta_2$, respectively. Then

(a) $\hat{\theta}_{1,n} \pm \hat{\theta}_{2,n}$ is consistent for $\theta_1 \pm \theta_2$.

(b) $\hat{\theta}_{1,n}\hat{\theta}_{2,n}$ is consistent for $\theta_1\theta_2$.

(c) $\hat{\theta}_{1,n}/\hat{\theta}_{2,n}$ is consistent for $\theta_1/\theta_2$, so long as $\theta_2 \neq 0$.

(d) For any continuous function $\tau: \mathbb{R} \to \mathbb{R}$, $\tau(\hat{\theta}_{1,n})$ is consistent for $\tau(\theta_1)$.

(e) For any sequences $\{a_n\}_{n\geq 1}$ and $\{b_n\}_{n\geq 1}$ such that $\lim_{n\to\infty} a_n = 1$ and $\lim_{n\to\infty} b_n = 0$, $a_n\hat{\theta}_{1,n} + b_n$ is consistent for $\theta_1$.

If $X_1, \ldots, X_n$ is a random sample from a distribution with mean $\mu$, variance $\sigma^2 < \infty$, and 4th moment $\mu_4 < \infty$, the sample variance $S_n^2$ is consistent for $\sigma^2$.

If $X_1, \ldots, X_n$ is a random sample from the Bernoulli$(p)$ distribution and $\hat{p} = \frac{1}{n}(X_1 + \cdots + X_n)$, then $\hat{p}(1-\hat{p})$ is a consistent estimator of $p(1-p)$.

For a consistent sequence of estimators $\{\hat{\theta}_n\}_{n\geq 1}$ of $\theta$, we often write $\hat{\theta}_n \overset{p}{\to} \theta$, where $\overset{p}{\to}$ denotes something called convergence in probability. We often say "$\hat{\theta}_n$ converges in probability to $\theta$."

7. Large-sample pivot quantities and central limit theorem

A sequence of random variables $Y_1, Y_2, \ldots$ with cdfs $F_{Y_1}, F_{Y_2}, \ldots$ is said to converge in distribution to the random variable $Y \sim F_Y$ if
\[
\lim_{n \to \infty} F_{Y_n}(y) = F_Y(y)
\]
for all $y \in \mathbb{R}$ at which $F_Y$ is continuous. We express convergence in distribution with the notation $Y_n \overset{D}{\to} Y$ and we refer to the distribution with cdf $F_Y$ as the asymptotic distribution of the sequence of random variables $Y_1, Y_2, \ldots$ Convergence in distribution is a sense in which a random variable (we can think of a quantity computed on larger and larger samples) behaves more and more like another random variable (as the sample size grows).

A large-sample pivot quantity is a function of sample statistics and population parameters for which the asymptotic distribution is fully known (not depending on unknown parameters).
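Illustration (not part of the original sheet): a minimal simulation sketch of consistency in the sense of the definition above, estimating $P(|\bar{X}_n - \mu| < \epsilon)$ for increasing $n$ with Exponential(1) data (so $\mu = 1$); the tolerance $\epsilon$, the sample sizes, and the number of replicates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, eps, n_rep = 1.0, 0.1, 2_000

# WLLN illustration: the probability that the sample mean is within eps of mu
# should approach 1 as n grows.
for n in [10, 100, 1000, 10000]:
    xbar = rng.exponential(scale=mu, size=(n_rep, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) < eps))
```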

The Central Limit Theorem (CLT): Let $X_1, \ldots, X_n$ be a random sample from any distribution and suppose the distribution has mean $\mu$ and variance $\sigma^2 < \infty$. Then
\[
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1).
\]

Application to CIs: The above means that for any $\alpha \in (0,1)$,
\[
\lim_{n\to\infty} P\!\left( -z_{\alpha/2} < \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2} \right) = 1 - \alpha,
\]
giving that $\bar{X}_n \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is an approximate $(1-\alpha)100\%$ CI for $\mu$ as long as $n$ is large. A rule of thumb says $n \geq 30$ is large.

Slutsky's Theorem: Let $X_n \overset{D}{\to} X$ and $Y_n \overset{p}{\to} c$ for a constant $c$. Then

(a) $X_n Y_n \overset{D}{\to} cX$

(b) $X_n + Y_n \overset{D}{\to} X + c$

Corollary of Slutsky's Theorem: Let $X_1, \ldots, X_n$ be a random sample from any distribution and suppose the distribution has mean $\mu$ and variance $\sigma^2 < \infty$. Moreover, let $\hat{\sigma}_n$ be a consistent estimator of $\sigma$ based on $X_1, \ldots, X_n$. Then
\[
\frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1).
\]

Application to CIs: The above means that for any $\alpha \in (0,1)$,
\[
\lim_{n\to\infty} P\!\left( -z_{\alpha/2} < \frac{\bar{X}_n - \mu}{\hat{\sigma}_n/\sqrt{n}} < z_{\alpha/2} \right) = 1 - \alpha,
\]
giving that $\bar{X}_n \pm z_{\alpha/2}\frac{\hat{\sigma}_n}{\sqrt{n}}$ is an approximate $(1-\alpha)100\%$ CI for $\mu$ as long as $n$ is large. A rule of thumb says $n \geq 30$ is large.

Let $X_1, \ldots, X_n$ be a random sample from any distribution and suppose the distribution has mean $\mu$, variance $\sigma^2 < \infty$, and 4th moment $\mu_4 < \infty$. Then
\[
\frac{\bar{X}_n - \mu}{S_n/\sqrt{n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1),
\]
so that for large $n$, $\bar{X}_n \pm z_{\alpha/2}\frac{S_n}{\sqrt{n}}$ is an approximate $(1-\alpha)100\%$ CI for $\mu$.
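Illustration (not part of the original sheet): a minimal coverage simulation for the large-sample CI $\bar{X}_n \pm z_{\alpha/2} S_n/\sqrt{n}$ when the population is skewed (Exponential(1), so $\mu = 1$); the sample size and number of replicates are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
alpha, n, n_rep = 0.05, 50, 20_000
z = stats.norm.ppf(1 - alpha / 2)

# Coverage of the approximate CI  xbar +/- z_{alpha/2} * S_n / sqrt(n)
# for the mean of an Exponential(1) population (true mu = 1).
x = rng.exponential(scale=1.0, size=(n_rep, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
half = z * s / np.sqrt(n)
coverage = np.mean((xbar - half < 1.0) & (1.0 < xbar + half))
print(coverage)  # should be close to 0.95 for n = 50
```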

Let $X_1, \ldots, X_n$ be a random sample from the Bernoulli$(p)$ distribution and let $\hat{p}_n = \bar{X}_n$. Then
\[
\frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1-\hat{p}_n)/n}} \overset{D}{\to} Z, \quad Z \sim \text{Normal}(0,1),
\]
so that for large $n$ (a rule of thumb is to require $\min\{n\hat{p}_n,\, n(1-\hat{p}_n)\} \geq 5$),
\[
\hat{p}_n \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1-\hat{p}_n)}{n}}
\]
is an approximate $(1-\alpha)100\%$ CI for $p$.

Summary of $(1-\alpha)100\%$ CIs for $\mu$: Let $X_1, \ldots, X_n$ be independent rvs with the same distribution as the rv $X$, where $EX = \mu$, $\operatorname{Var} X = \sigma^2$.

- $X \sim \text{Normal}(\mu, \sigma^2)$:
  - $\sigma$ known: $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ (exact)
  - $\sigma$ unknown: $\bar{X} \pm t_{n-1,\alpha/2}\frac{S_n}{\sqrt{n}}$ (exact)
- $X$ non-Normal:
  - $n \geq 30$:
    - $\sigma$ known: $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ (approx)
    - $\sigma$ unknown: $\bar{X} \pm z_{\alpha/2}\frac{S_n}{\sqrt{n}}$ (approx)
  - $n < 30$: Beyond the scope of this course

8. Classical two-sample results comparing means, proportions, and variances

Summary of $(1-\alpha)100\%$ CIs for $\mu_1 - \mu_2$: Let $X_1, \ldots, X_{n_1}$ be independent rvs with the same distribution as the rv $X$, where $EX = \mu_1$ and $\operatorname{Var} X = \sigma_1^2$, and let $Y_1, \ldots, Y_{n_2}$ be independent rvs with the same distribution as the rv $Y$, where $EY = \mu_2$ and $\operatorname{Var} Y = \sigma_2^2$.
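Illustration (not part of the original sheet): a minimal sketch of the approximate CI for a Bernoulli proportion, including the rule-of-thumb check; the true $p$, sample size, and seed are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p_true, alpha = 200, 0.3, 0.05

x = rng.binomial(1, p_true, size=n)   # hypothetical Bernoulli(0.3) sample
p_hat = x.mean()

# Rule of thumb before using the normal approximation:
print(min(n * p_hat, n * (1 - p_hat)) >= 5)

z = stats.norm.ppf(1 - alpha / 2)
half = z * np.sqrt(p_hat * (1 - p_hat) / n)
print((p_hat - half, p_hat + half))   # approximate 95% CI for p
```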

Case (i): $\sigma_1^2 \neq \sigma_2^2$:

- Normal populations, $\min\{n_1, n_2\} < 30$: $\bar{X} - \bar{Y} \pm t_{\hat{\nu},\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$
- Non-Normal populations, $\min\{n_1, n_2\} < 30$: Beyond the scope of this course.
- $\min\{n_1, n_2\} \geq 30$: $\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$

In the above,
\[
\hat{\nu} = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{S_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{S_2^2}{n_2}\right)^2}.
\]

Case (ii): $\sigma_1^2 = \sigma_2^2$:

- Normal populations, $\min\{n_1, n_2\} < 30$: $\bar{X} - \bar{Y} \pm t_{n_1+n_2-2,\alpha/2}\, S_{\text{pooled}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
- Non-Normal populations, $\min\{n_1, n_2\} < 30$: Beyond the scope of this course.
- $\min\{n_1, n_2\} \geq 30$: $\bar{X} - \bar{Y} \pm z_{\alpha/2}\, S_{\text{pooled}}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

where $S_{\text{pooled}}^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$ is the pooled sample variance.
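Illustration (not part of the original sheet): a minimal sketch of the Case (i) small-sample interval with the approximate degrees of freedom $\hat{\nu}$; the two simulated Normal samples and their parameters are made up for illustration. See the code after this list.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
alpha = 0.05
x = rng.normal(5.0, 2.0, size=15)   # hypothetical Normal(5, 4) sample, n1 = 15
y = rng.normal(4.0, 3.0, size=12)   # hypothetical Normal(4, 9) sample, n2 = 12
n1, n2 = x.size, y.size
s1, s2 = x.var(ddof=1), y.var(ddof=1)

# Case (i), unequal variances: approximate (Welch-type) degrees of freedom nu_hat
se2 = s1 / n1 + s2 / n2
nu_hat = se2 ** 2 / ((s1 / n1) ** 2 / (n1 - 1) + (s2 / n2) ** 2 / (n2 - 1))
t_crit = stats.t.ppf(1 - alpha / 2, df=nu_hat)
diff = x.mean() - y.mean()
print((diff - t_crit * np.sqrt(se2), diff + t_crit * np.sqrt(se2)))
```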

A $(1-\alpha)100\%$ CI for $p_1 - p_2$: Let $X_1, \ldots, X_{n_1} \overset{\text{iid}}{\sim} \text{Bernoulli}(p_1)$ and $Y_1, \ldots, Y_{n_2} \overset{\text{iid}}{\sim} \text{Bernoulli}(p_2)$ and let $\hat{p}_1 = \frac{1}{n_1}(X_1 + \cdots + X_{n_1})$ and $\hat{p}_2 = \frac{1}{n_2}(Y_1 + \cdots + Y_{n_2})$. Then for large $n_1$ and $n_2$ (rule of thumb is $\min\{n_1\hat{p}_1,\, n_1(1-\hat{p}_1)\} \geq 5$ and $\min\{n_2\hat{p}_2,\, n_2(1-\hat{p}_2)\} \geq 5$),
\[
\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\]
is an approximate $(1-\alpha)100\%$ CI for $p_1 - p_2$.

A $(1-\alpha)100\%$ CI for $\sigma_1^2/\sigma_2^2$: Consider two independent random samples $X_1, \ldots, X_{n_1}$ from the Normal$(\mu_1, \sigma_1^2)$ distribution and $Y_1, \ldots, Y_{n_2}$ from the Normal$(\mu_2, \sigma_2^2)$ distribution with sample variances $S_1^2$ and $S_2^2$, respectively. Then
\[
\left( \frac{S_1^2}{S_2^2} F_{n_2-1,\,n_1-1,\,1-\alpha/2},\ \frac{S_1^2}{S_2^2} F_{n_2-1,\,n_1-1,\,\alpha/2} \right)
\]
is a $(1-\alpha)100\%$ CI for $\sigma_1^2/\sigma_2^2$.

9. Sample size calculations

Let $\hat{\theta}$ be an estimator of a parameter $\theta$ and let $\hat{\theta} \pm \eta$ be a CI for $\theta$. Then the quantity $\eta$ is called the margin of error (ME) of the CI.

Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. The smallest sample size $n$ such that the ME of the CI $\bar{X}_n \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is at most $M$ is
\[
n = \left\lceil \left(\frac{z_{\alpha/2}\,\sigma}{M}\right)^2 \right\rceil,
\]
where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$. Plug in an estimate $\hat{\sigma}$ of $\sigma$ from a previous study to make the calculation.

Let $X_1, \ldots, X_n$ be a random sample from the Bernoulli$(p)$ distribution. The smallest sample size $n$ such that the ME of the CI
\[
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}
\]
is at most $M$ is
\[
n = \left\lceil \left(\frac{z_{\alpha/2}}{M}\right)^2 p(1-p) \right\rceil.
\]
Plug in an estimate $\hat{p}$ of $p$ from a previous study to make the calculation or use $p = 1/2$ to err on the side of larger $n$.

Let $X_1, \ldots, X_{n_1}$ be iid rvs with mean $\mu_1$ and variance $\sigma_1^2$ and let $Y_1, \ldots, Y_{n_2}$ be iid rvs with mean $\mu_2$ and variance $\sigma_2^2$. To find the smallest $n = n_1 + n_2$ such that the ME of the CI
\[
\bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
\]
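Illustration (not part of the original sheet): a minimal sketch of the two one-sample size calculations above; the target margins of error and the plugged-in planning values ($\hat{\sigma} = 3$, $p = 1/2$) are made up for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)

# Smallest n so that the ME of the CI for mu is at most M = 0.5,
# using a guess sigma_hat = 3 from a (hypothetical) previous study.
sigma_hat, M = 3.0, 0.5
n_mean = int(np.ceil((z * sigma_hat / M) ** 2))
print(n_mean)

# Smallest n so that the ME of the CI for p is at most M = 0.03,
# using the conservative choice p = 1/2.
p, M = 0.5, 0.03
n_prop = int(np.ceil((z / M) ** 2 * p * (1 - p)))
print(n_prop)
```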

is at most $M$, compute
\[
n = \left\lceil \left(\frac{z_{\alpha/2}}{M}\right)^2 (\sigma_1 + \sigma_2)^2 \right\rceil
\]
and set
\[
n_1 = \frac{\sigma_1}{\sigma_1 + \sigma_2}\, n \quad \text{and} \quad n_2 = \frac{\sigma_2}{\sigma_1 + \sigma_2}\, n.
\]
Plug in estimates $\hat{\sigma}_1$ and $\hat{\sigma}_2$ for $\sigma_1$ and $\sigma_2$ from a previous study to make the calculation.

Let $X_1, \ldots, X_{n_1}$ be a random sample from the Bernoulli$(p_1)$ distribution and let $Y_1, \ldots, Y_{n_2}$ be a random sample from the Bernoulli$(p_2)$ distribution. To find the smallest $n = n_1 + n_2$ such that the ME of the CI
\[
\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}
\]
is at most $M$, compute
\[
n = \left\lceil \left(\frac{z_{\alpha/2}}{M}\right)^2 \left( \sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)} \right)^2 \right\rceil
\]
and set
\[
n_1 = \frac{\sqrt{p_1(1-p_1)}}{\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}}\, n \quad \text{and} \quad n_2 = \frac{\sqrt{p_2(1-p_2)}}{\sqrt{p_1(1-p_1)} + \sqrt{p_2(1-p_2)}}\, n.
\]
Plug in estimates $\hat{p}_1$ and $\hat{p}_2$ for $p_1$ and $p_2$ from a previous study to make the calculation.

10. First principles of estimation, sufficiency and Rao-Blackwell theorem

Let $X_1, \ldots, X_n$ have a joint distribution depending on the parameter $\theta \in \Theta \subset \mathbb{R}^d$. A statistic $T(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if it carries all the information about $\theta$ contained in $X_1, \ldots, X_n$. Specifically, $T(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if the joint distribution of $X_1, \ldots, X_n$ conditional on the value of $T(X_1, \ldots, X_n)$ does not depend on $\theta$.

We can establish sufficiency using the following results:

For discrete rvs $(X_1, \ldots, X_n) \sim p_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta)$, $T(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if for all $(x_1, \ldots, x_n)$ in the support of $(X_1, \ldots, X_n)$, the ratio
\[
\frac{p_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta)}{p_T(T(x_1, \ldots, x_n); \theta)}
\]
is free of $\theta$, where $p_T(t; \theta)$ is the pmf of $T(X_1, \ldots, X_n)$.

For continuous rvs $(X_1, \ldots, X_n) \sim f_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta)$, $T(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if for all $(x_1, \ldots, x_n)$ in the support of $(X_1, \ldots, X_n)$, the ratio
\[
\frac{f_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta)}{f_T(T(x_1, \ldots, x_n); \theta)}
\]
is free of $\theta$, where $f_T(t; \theta)$ is the pdf of $T(X_1, \ldots, X_n)$.
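Illustration (not part of the original sheet): a minimal numerical sketch of the ratio criterion for sufficiency, using an assumed example. For $X_1, \ldots, X_n$ iid Poisson$(\lambda)$, $T = \sum_i X_i \sim \text{Poisson}(n\lambda)$, and the ratio of the joint pmf to the pmf of $T$ evaluated at $T(x)$ should be the same number for every $\lambda$.

```python
import numpy as np
from scipy import stats

# Sufficiency ratio criterion for T = sum(X_i) with Poisson(lam) data:
# the ratio p_{(X_1,...,X_n)}(x; lam) / p_T(T(x); lam) should not depend on lam.
x = np.array([2, 0, 3, 1, 1])   # a hypothetical observed sample
n, t = x.size, x.sum()

for lam in [0.5, 1.0, 4.0]:
    joint = np.prod(stats.poisson.pmf(x, lam))  # joint pmf at the observed x
    p_T = stats.poisson.pmf(t, n * lam)         # pmf of T = sum(X_i) at T(x)
    print(lam, joint / p_T)                     # same value for every lam
```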

The factorization theorem gives another way to see whether a statistic is sufficient for $\theta$:

For discrete rvs $(X_1, \ldots, X_n) \sim p_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta)$, $T(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t; \theta)$ and $h(x_1, \ldots, x_n)$ such that
\[
p_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta) = g(T(x_1, \ldots, x_n); \theta)\, h(x_1, \ldots, x_n)
\]
for all $(x_1, \ldots, x_n)$ in the support of $(X_1, \ldots, X_n)$ and all $\theta \in \Theta$.

For continuous rvs $(X_1, \ldots, X_n) \sim f_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta)$, $T(X_1, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t; \theta)$ and $h(x_1, \ldots, x_n)$ such that
\[
f_{(X_1,\ldots,X_n)}(x_1, \ldots, x_n; \theta) = g(T(x_1, \ldots, x_n); \theta)\, h(x_1, \ldots, x_n)
\]
for all $(x_1, \ldots, x_n)$ in the support of $(X_1, \ldots, X_n)$ and all $\theta \in \Theta$.

We also consider statistics with multiple components which take the form $T(X_1, \ldots, X_n) = (T_1(X_1, \ldots, X_n), \ldots, T_K(X_1, \ldots, X_n))$ for some $K$. For example, $T(X_1, \ldots, X_n) = (X_{(1)}, \ldots, X_{(n)})$ is the set of order statistics (which is always sufficient); here $K = n$.

Minimum-variance unbiased estimator (MVUE): Let $\hat{\theta}$ be an unbiased estimator of $\theta \in \Theta \subset \mathbb{R}$. If $\operatorname{Var} \hat{\theta} \leq \operatorname{Var} \tilde{\theta}$ for every unbiased estimator $\tilde{\theta}$ of $\theta$, then $\hat{\theta}$ is a MVUE.

Rao-Blackwell Theorem: Let $\tilde{\theta}$ be an estimator of $\theta \in \Theta \subset \mathbb{R}$ such that $E\tilde{\theta} = \theta$ and let $T$ be a sufficient statistic for $\theta$. Define $\hat{\theta} = E[\tilde{\theta} \mid T]$. Then $\hat{\theta}$ is an estimator of $\theta$ such that $E\hat{\theta} = \theta$ and $\operatorname{Var} \hat{\theta} \leq \operatorname{Var} \tilde{\theta}$.

Interpretation: An unbiased estimator of $\theta$ can always be improved by taking into account the value of a sufficient statistic for $\theta$. Therefore, a MVUE must be a function of a sufficient statistic.

The MVUE for a parameter $\theta$ is (essentially) unique (for all situations considered in this course); to find it, do the following:

(a) Find a sufficient statistic for $\theta$.

(b) Find a function of the sufficient statistic which is unbiased for $\theta$. This is the MVUE.

This also works for finding the MVUE of a function $\tau(\theta)$ of $\theta$. In step (b), just find a function of the sufficient statistic which is unbiased for $\tau(\theta)$.

11. MoMs and MLEs

Let $X_1, \ldots, X_n$ be independent rvs with the same distribution as the rv $X$, where $X$ has a distribution depending on the parameters $\theta_1, \ldots, \theta_d$. If the first $d$ moments $EX, \ldots, EX^d$ of $X$ are finite, then the method of moments (MoM) estimators $\hat{\theta}_1, \ldots, \hat{\theta}_d$ are the values of $\theta_1, \ldots, \theta_d$ which solve the following system of equations:
\[
\begin{aligned}
m_1 &:= \frac{1}{n}\sum_{i=1}^n X_i = EX =: \mu_1(\theta_1, \ldots, \theta_d) \\
&\ \,\vdots \\
m_d &:= \frac{1}{n}\sum_{i=1}^n X_i^d = EX^d =: \mu_d(\theta_1, \ldots, \theta_d)
\end{aligned}
\]
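Illustration (not part of the original sheet): a minimal Monte Carlo sketch of the Rao-Blackwell improvement, using an assumed example. For $X_1, \ldots, X_n$ iid Bernoulli$(p)$, $\tilde{\theta} = X_1$ is unbiased for $p$, $T = \sum_i X_i$ is sufficient, and $E[X_1 \mid T] = T/n = \hat{p}$, whose variance is smaller by a factor of $n$.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, n_rep = 20, 0.3, 100_000

# Rao-Blackwell: condition the crude unbiased estimator X_1 on the sufficient
# statistic T = sum(X_i); the result is E[X_1 | T] = T/n = p_hat.
x = rng.binomial(1, p, size=(n_rep, n))
theta_tilde = x[:, 0]        # crude unbiased estimator X_1
theta_hat = x.mean(axis=1)   # Rao-Blackwellized estimator T/n

print(theta_tilde.mean(), theta_hat.mean())  # both approximately p (unbiased)
print(theta_tilde.var(), theta_hat.var())    # variance drops from p(1-p) to p(1-p)/n
```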

Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf $f_X(x; \theta_1, \ldots, \theta_d)$ or pmf $p_X(x; \theta_1, \ldots, \theta_d)$ which depends on some parameters $(\theta_1, \ldots, \theta_d) \in \Theta \subset \mathbb{R}^d$. Then the likelihood function is defined as
\[
L(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n) = \begin{cases} \prod_{i=1}^n f_X(X_i; \theta_1, \ldots, \theta_d), & \text{if } X_1, \ldots, X_n \overset{\text{iid}}{\sim} f_X(x; \theta_1, \ldots, \theta_d) \\[4pt] \prod_{i=1}^n p_X(X_i; \theta_1, \ldots, \theta_d), & \text{if } X_1, \ldots, X_n \overset{\text{iid}}{\sim} p_X(x; \theta_1, \ldots, \theta_d). \end{cases}
\]
Moreover, the log-likelihood function is defined as
\[
\ell(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n) = \log L(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n).
\]

Let $X_1, \ldots, X_n$ be rvs with likelihood function $L(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n)$ for some parameters $(\theta_1, \ldots, \theta_d) \in \Theta \subset \mathbb{R}^d$. Then the maximum likelihood estimators (MLEs) of $\theta_1, \ldots, \theta_d$ are the values $\hat{\theta}_1, \ldots, \hat{\theta}_d$ which maximize the likelihood function over all $(\theta_1, \ldots, \theta_d) \in \Theta$. That is,
\[
(\hat{\theta}_1, \ldots, \hat{\theta}_d) = \operatorname*{argmax}_{(\theta_1, \ldots, \theta_d) \in \Theta} L(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n) = \operatorname*{argmax}_{(\theta_1, \ldots, \theta_d) \in \Theta} \ell(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n).
\]

If $\ell(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n)$ is differentiable and has a single maximum in the interior of $\Theta$, then we can find the MLEs $\hat{\theta}_1, \ldots, \hat{\theta}_d$ by solving the following system of equations:
\[
\begin{aligned}
\frac{\partial}{\partial \theta_1} \ell(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n) &= 0 \\
&\ \,\vdots \\
\frac{\partial}{\partial \theta_d} \ell(\theta_1, \ldots, \theta_d; X_1, \ldots, X_n) &= 0.
\end{aligned}
\]

The MLE is always a function of a sufficient statistic.

If $\hat{\theta}$ is the MLE for $\theta$, then $\tau(\hat{\theta})$ is the MLE for $\tau(\theta)$, for any function $\tau$.
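Illustration (not part of the original sheet): a minimal numerical MLE sketch for the Gamma$(\alpha, \beta)$ model in the sheet's parameterization (mean $\alpha\beta$), maximizing the log-likelihood by minimizing its negative with a general-purpose optimizer; the simulated data and starting values are made up for illustration. The last line checks the invariance property for $\tau(\alpha,\beta) = \alpha\beta$.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(10)
x = rng.gamma(shape=3.0, scale=2.0, size=500)  # hypothetical Gamma(alpha=3, beta=2) sample

# Negative log-likelihood for the Gamma(alpha, beta) model; theta = (alpha, beta).
def neg_loglik(theta):
    alpha, beta = theta
    if alpha <= 0 or beta <= 0:
        return np.inf
    return -np.sum(stats.gamma.logpdf(x, a=alpha, scale=beta))

res = optimize.minimize(neg_loglik, x0=np.array([1.0, 1.0]), method="Nelder-Mead")
alpha_hat, beta_hat = res.x
print(alpha_hat, beta_hat)

# Invariance: the MLE of the mean tau(alpha, beta) = alpha*beta is alpha_hat*beta_hat,
# which for the Gamma model coincides with the sample mean.
print(alpha_hat * beta_hat, x.mean())
```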

Table 1: Commonly encountered pmfs and pdfs along with their mgfs, expected values, and variances.

Bernoulli$(p)$: $p_X(x; p) = p^x(1-p)^{1-x}$, $x = 0, 1$; $M_X(t) = pe^t + (1-p)$; $EX = p$; $\operatorname{Var} X = p(1-p)$.

Binomial$(n, p)$: $p_X(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \ldots, n$; $M_X(t) = [pe^t + (1-p)]^n$; $EX = np$; $\operatorname{Var} X = np(1-p)$.

Geometric$(p)$: $p_X(x; p) = (1-p)^{x-1}p$, $x = 1, 2, \ldots$; $M_X(t) = \dfrac{pe^t}{1-(1-p)e^t}$; $EX = \dfrac{1}{p}$; $\operatorname{Var} X = \dfrac{1-p}{p^2}$.

Negative Binomial$(r, p)$: $p_X(x; p, r) = \binom{x-1}{r-1}(1-p)^{x-r}p^r$, $x = r, r+1, \ldots$; $M_X(t) = \left[\dfrac{pe^t}{1-(1-p)e^t}\right]^r$; $EX = \dfrac{r}{p}$; $\operatorname{Var} X = \dfrac{r(1-p)}{p^2}$.

Poisson$(\lambda)$: $p_X(x; \lambda) = e^{-\lambda}\lambda^x/x!$, $x = 0, 1, \ldots$; $M_X(t) = e^{\lambda(e^t - 1)}$; $EX = \lambda$; $\operatorname{Var} X = \lambda$.

Hypergeometric$(N, M, K)$: $p_X(x; N, M, K) = \binom{M}{x}\binom{N-M}{K-x}\big/\binom{N}{K}$, $x = 0, 1, \ldots, K$; $M_X(t)$: complicadísimo!; $EX = \dfrac{KM}{N}$; $\operatorname{Var} X = \dfrac{KM(N-K)(N-M)}{N^2(N-1)}$.

Discrete uniform on $\{1, \ldots, K\}$: $p_X(x; K) = \dfrac{1}{K}$, $x = 1, \ldots, K$; $M_X(t) = \dfrac{1}{K}\sum_{x=1}^K e^{tx}$; $EX = \dfrac{K+1}{2}$; $\operatorname{Var} X = \dfrac{(K+1)(K-1)}{12}$.

Discrete uniform on $x_1, \ldots, x_n$: $p_X(x; x_1, \ldots, x_n) = \dfrac{1}{n}$, $x = x_1, \ldots, x_n$; $M_X(t) = \dfrac{1}{n}\sum_{i=1}^n e^{tx_i}$; $EX = \bar{x} = \dfrac{1}{n}\sum_{i=1}^n x_i$; $\operatorname{Var} X = \dfrac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$.

Normal$(\mu, \sigma^2)$: $f_X(x; \mu, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$, $-\infty < x < \infty$; $M_X(t) = e^{\mu t + \sigma^2 t^2/2}$; $EX = \mu$; $\operatorname{Var} X = \sigma^2$.

Gamma$(\alpha, \beta)$: $f_X(x; \alpha, \beta) = \dfrac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}\exp(-x/\beta)$, $0 < x < \infty$; $M_X(t) = (1-\beta t)^{-\alpha}$; $EX = \alpha\beta$; $\operatorname{Var} X = \alpha\beta^2$.

Exponential$(\lambda)$: $f_X(x; \lambda) = \dfrac{1}{\lambda}\exp(-x/\lambda)$, $0 < x < \infty$; $M_X(t) = (1-\lambda t)^{-1}$; $EX = \lambda$; $\operatorname{Var} X = \lambda^2$.

Chi-square$(\nu)$: $f_X(x; \nu) = \dfrac{1}{\Gamma(\nu/2)2^{\nu/2}}x^{\nu/2-1}\exp(-x/2)$, $0 < x < \infty$; $M_X(t) = (1-2t)^{-\nu/2}$; $EX = \nu$; $\operatorname{Var} X = 2\nu$.

Beta$(\alpha, \beta)$: $f_X(x; \alpha, \beta) = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$, $0 < x < 1$; $M_X(t) = 1 + \sum_{k=1}^\infty \dfrac{t^k}{k!}\prod_{r=0}^{k-1}\dfrac{\alpha+r}{\alpha+\beta+r}$; $EX = \dfrac{\alpha}{\alpha+\beta}$; $\operatorname{Var} X = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$.

Table 2: The multinoulli and multinomial pmfs and the bivariate Normal pdf.

Multinoulli:
\[
p_{(X_1,\ldots,X_K)}(x_1, \ldots, x_K; p_1, \ldots, p_K) = p_1^{x_1}\cdots p_K^{x_K}, \quad (x_1, \ldots, x_K) \in \Big\{\{0,1\}^K : \textstyle\sum_{k=1}^K x_k = 1\Big\}.
\]

Multinomial:
\[
p_{(Y_1,\ldots,Y_K)}(y_1, \ldots, y_K; n, p_1, \ldots, p_K) = \frac{n!}{y_1!\cdots y_K!}\, p_1^{y_1}\cdots p_K^{y_K}, \quad (y_1, \ldots, y_K) \in \Big\{\{0,1,\ldots,n\}^K : \textstyle\sum_{k=1}^K y_k = n\Big\}.
\]

Bivariate Normal:
\[
f_{(X,Y)}(x, y; \mu_X, \mu_Y, \sigma_X^2, \sigma_Y^2, \rho) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\!\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right).
\]
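Illustration (not part of the original sheet): a small sanity check of a few rows of Table 1 using scipy.stats, which uses the same parameterizations for the three families checked here; the parameter values are arbitrary.

```python
from scipy import stats

# Binomial(n, p): mean np, variance np(1-p)
print(stats.binom(n=10, p=0.3).mean(), 10 * 0.3)
print(stats.binom(n=10, p=0.3).var(), 10 * 0.3 * 0.7)

# Gamma(alpha, beta) with scale = beta: mean alpha*beta, variance alpha*beta^2
print(stats.gamma(a=3.0, scale=2.0).mean(), 3.0 * 2.0)
print(stats.gamma(a=3.0, scale=2.0).var(), 3.0 * 2.0 ** 2)

# Poisson(lambda): mean and variance both lambda
print(stats.poisson(4.0).mean(), stats.poisson(4.0).var())
```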