Statistical Inference


Statistical Inference. Professor Abolfazl Safikhani, School of Social Work, Columbia University. Notes by Yiqiao Yin in LaTeX, December 18, 2016. Abstract: These are the notes for Statistical Inference at Columbia University. I am working for Professor Abolfazl Safikhani on this project, covering the lectures he gave in the Fall semester of 2016.

This is dedicated to Professor Abolfazl Safikhani.

Preface

Starting with a background in trading, I am no stranger to the knowledge of statistics. Just like trading, research in realms such as asset pricing and behavioral finance requires a great deal of theory in probability and statistics as well. With this purpose, I came to Columbia University to attend this course, knowing the essence of this subject. Luckily, Professor Safikhani is teaching this course, and I have decided to put everything down from his lectures. It is a solid project not just for me but also for future generations. The aim of the course is to describe the two aspects of statistics, estimation and inference, in some detail. The topics include maximum likelihood estimation, Bayesian inference, confidence intervals, bootstrap methods, statistical hypothesis testing, etc.

Yiqiao Yin, Fall 2016 at Columbia University

Contents

1 Pre-requisite: Different Types of Distributions
    Bernoulli Distribution; Binomial Distribution; Poisson Distribution; Normal Distribution; Gamma Distribution; Beta Distribution; Multinomial Distribution
2 Estimation
    Motivation; Statistical Inference; Method of Moments Estimators; Method of Maximum Likelihood (Properties of M.L.E.s; Computational Methods for Approximating M.L.E.s); Principles of Estimation; Sufficient Statistics; Bayesian Paradigm (Prior Distribution; Posterior Distribution; Sampling from a Bernoulli Distribution; Sampling from a Poisson Distribution; Sampling from an Exponential Distribution); Bayes Estimators (Sampling from a Normal Distribution)
3 Sampling Distribution of Estimators
    Sampling Distribution of a Statistic; The Gamma and the χ² Distributions (The Gamma Distributions; The Chi-squared Distribution); Sampling from a Normal Population; The t-Distribution; Confidence Intervals; The Cramer-Rao Information Inequality; Large Sample Properties of the M.L.E.
4 Hypothesis Testing
    Hypothesis Testing (Power Function and Types of Error; Significance Level; P-Value; The Complete Power Function); Comparing the Means of Two Normal Distributions (Two-sample t-test; One-sided Alternatives; Two-sided Alternatives); Comparing the Variances of Two Normal Distributions (F-test); Likelihood Ratio Test
5 Nonparametrics
    Tests of Goodness-of-Fit (The χ²-Test); Likelihood Ratio Tests for Proportions; Goodness-of-Fit for Composite Hypotheses; The Sample Distribution Function; The Kolmogorov-Smirnov Goodness-of-Fit Test
6 Bootstrap
    Bootstrap in General (Parametric Bootstrap; The Nonparametric Bootstrap)
7 Linear Regression Models
    Method of Least Squares; Normal Equations; Normal Simple Linear Regression (Inference); Linear Models with Normal Errors; One-Way Analysis of Variance (ANOVA)

1 Pre-requisite: Different Types of Distributions

1.1 Bernoulli Distribution

Definition 1.1. Bernoulli Distribution. A random variable X has the Bernoulli distribution with parameter p (0 ≤ p ≤ 1) if X can take only the values 0 and 1 and the probabilities are Pr(X = 1) = p and Pr(X = 0) = 1 − p. The p.f. of X can be written as follows:

    f(x | p) = p^x (1 − p)^{1−x}  for x = 0, 1;  f(x | p) = 0 otherwise.

If X has the Bernoulli distribution with parameter p, then X² and X are the same random variable. It follows that

    E(X) = 1 · p + 0 · (1 − p) = p,
    E(X²) = E(X) = p,
    Var(X) = E(X²) − [E(X)]² = p(1 − p).

Moreover, the m.g.f. of X is

    φ(t) = E(e^{tX}) = p e^t + (1 − p)  for −∞ < t < ∞.

1.2 Binomial Distribution

Definition 1.2. Binomial Distribution. A random variable X has the binomial distribution with parameters n and p if X has a discrete distribution for which the p.f. is as follows:

    f(x | n, p) = (n choose x) p^x (1 − p)^{n−x}  for x = 0, 1, 2, ..., n;  f(x | n, p) = 0 otherwise.

Theorem 1.3. If the random variables X_1, ..., X_n form n Bernoulli trials with parameter p, and if X = X_1 + ... + X_n, then X has the binomial distribution with parameters n and p. Then we have

    E(X) = Σ_{i=1}^n E(X_i) = np,
    Var(X) = Σ_{i=1}^n Var(X_i) = np(1 − p),
    φ(t) = E(e^{tX}) = Π_{i=1}^n E(e^{tX_i}) = (p e^t + 1 − p)^n.
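As a quick numerical sanity check on Theorem 1.3, here is a minimal Python sketch (NumPy assumed; the parameter values and the seed are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 20, 0.3, 100_000

# Sum of n Bernoulli(p) trials, repeated many times.
bernoulli_sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

# Theory: the sum is Binomial(n, p) with mean np and variance np(1 - p).
print(bernoulli_sums.mean(), n * p)             # both ≈ 6.0
print(bernoulli_sums.var(), n * p * (1 - p))    # both ≈ 4.2
```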

1.3 Poisson Distribution

Definition 1.4. Poisson Distribution. Let λ > 0. A random variable X has the Poisson distribution with mean λ if the p.f. of X is as follows:

    f(x | λ) = e^{−λ} λ^x / x!  for x = 0, 1, 2, ...;  f(x | λ) = 0 otherwise.

Theorem 1.5. Mean. The mean of the distribution with p.f. equal to the above equation is λ.

Theorem 1.6. Variance. The variance of the Poisson distribution with mean λ is also λ.

Theorem 1.7. Moment Generating Function. The m.g.f. of the Poisson distribution with mean λ is φ(t) = e^{λ(e^t − 1)}.

Definition 1.8. Moment Generating Function of the Negative Binomial. If X has the negative binomial distribution with parameters r and p (the number of failures before the r-th success in Bernoulli(p) trials), then the m.g.f. of X is as follows:

    φ(t) = ( p / (1 − (1 − p) e^t) )^r  for t < log(1 / (1 − p)).

Theorem 1.9. Mean and Variance. If X has the negative binomial distribution with parameters r and p, then the mean and the variance of X must be

    E(X) = r(1 − p)/p  and  Var(X) = r(1 − p)/p².

Remark. The mean and variance of the geometric distribution with parameter p are the special case of the equations above with r = 1.

1.4 Normal Distribution

Definition. Normal Distributions. A random variable X has the normal distribution with mean µ and variance σ² (−∞ < µ < ∞ and σ > 0) if X has a continuous distribution with the following p.d.f.:

    f(x | µ, σ²) = (1 / ((2π)^{1/2} σ)) exp[ −(1/2) ((x − µ)/σ)² ]  for −∞ < x < ∞.

Definition. Lognormal Distribution. If log(X) has the normal distribution with mean µ and variance σ², we say that X has the lognormal distribution with parameters µ and σ².

1.5 Gamma Distribution

Definition. Gamma Function. For each positive number α, let the value Γ(α) be defined by the following integral:

    Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.

The function Γ defined above for α > 0 is called the gamma function.

Definition. Gamma Distributions. Let α and β be positive numbers. A random variable X has the gamma distribution with parameters α and β if X has a continuous distribution for which the p.d.f. is

    f(x | α, β) = (β^α / Γ(α)) x^{α−1} e^{−βx}  for x > 0;  f(x | α, β) = 0 for x ≤ 0.

Theorem. Moments. Let X have the gamma distribution with parameters α and β. For k = 1, 2, ...,

    E(X^k) = Γ(α + k) / (β^k Γ(α)) = α(α + 1)···(α + k − 1) / β^k.

In particular, E(X) = α/β and Var(X) = α/β².

Theorem. Moment Generating Function. Let X have the gamma distribution with parameters α and β. The m.g.f. of X is

    φ(t) = ( β / (β − t) )^α  for t < β.

1.6 Beta Distribution

Definition. Beta Function. For each positive α and β, define

    B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx.

The function B is called the beta function.

Theorem. For all α, β > 0,

    B(α, β) = Γ(α)Γ(β) / Γ(α + β).

Definition. Beta Distributions. Let α, β > 0 and let X be a random variable with p.d.f.

    f(x | α, β) = (Γ(α + β) / (Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}  for 0 < x < 1;  f(x | α, β) = 0 otherwise.
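The identity B(α, β) = Γ(α)Γ(β)/Γ(α + β) is easy to check numerically against the defining integral. A small sketch, assuming SciPy is available (the test pairs are arbitrary):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta, gamma

# Check B(a, b) = Gamma(a)Gamma(b)/Gamma(a + b) and the defining integral.
for a, b in [(2.0, 3.0), (1.5, 0.8), (4.5, 1.2)]:
    lhs = beta(a, b)
    rhs = gamma(a) * gamma(b) / gamma(a + b)
    integral, _ = quad(lambda x, a=a, b=b: x**(a - 1) * (1 - x)**(b - 1), 0, 1)
    print(np.isclose(lhs, rhs), np.isclose(lhs, integral))   # True True
```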

Theorem. Suppose that P has the beta distribution with parameters α and β, and the conditional distribution of X given P = p is the binomial distribution with parameters n and p. Then the conditional distribution of P given X = x is the beta distribution with parameters α + x and β + n − x.

Theorem. Moments. Suppose that X has the beta distribution with parameters α and β. Then for each positive integer k,

    E(X^k) = α(α + 1)···(α + k − 1) / ((α + β)(α + β + 1)···(α + β + k − 1)).

In particular,

    E(X) = α / (α + β),
    Var(X) = αβ / ((α + β)² (α + β + 1)).

1.7 Multinomial Distribution

Definition. Multinomial Distribution. A discrete random vector X = (X_1, ..., X_k) whose p.f. is given as

    f(x | n, p) = Pr(X = x) = Pr(X_1 = x_1, ..., X_k = x_k)
               = (n choose x_1, ..., x_k) p_1^{x_1} ··· p_k^{x_k}  if x_1 + ... + x_k = n;  0 otherwise,

has the multinomial distribution with parameters n and p = (p_1, ..., p_k).

Theorem. Means, Variances, and Covariances. Let the random vector X have the multinomial distribution with parameters n and p. The means and variances of the coordinates of X are

    E(X_i) = n p_i  and  Var(X_i) = n p_i (1 − p_i)  for i = 1, ..., k.

Also, the covariances between the coordinates are

    Cov(X_i, X_j) = −n p_i p_j  for i ≠ j.

2 Estimation

2.1 Motivation

Statistical inference is concerned with making probabilistic statements about unknown quantities. The goal is to learn about the unknown quantities after observing some data that we believe contain relevant information.

For an example, consider a company that sells electronic components and is interested in knowing how long each component is likely to last. They collect data on components that have been used under typical conditions. They choose to use the family of exponential distributions to model the length of time, in years, from when a component is put into service until it fails. The company believes that, if they knew the failure rate θ, then X = (X_1, ..., X_n) would be i.i.d. random variables having the exponential distribution with parameter θ. In this case, by the L.L.N., X̄_n converges in probability to the expectation 1/θ (the reciprocal of the failure rate), i.e.

    X̄_n →^P 1/θ.

Then by the continuous mapping theorem (see the remark below), 1/X̄_n converges in probability to θ. By the C.L.T., we know that

    √n (X̄_n − 1/θ) →^D N(0, θ^{−2}),

where Var(X_1) = θ^{−2}. By the delta method, we can show that

    √n (1/X̄_n − θ) →^D N(0, (θ²)² θ^{−2}) = N(0, θ²),

where we have considered g(x) = 1/x, with g′(x) = −1/x².

Remark 2.1. This is Theorem 6.2.5, the continuous mapping theorem, from textbook [1].

Theorem 2.2. (Continuous Functions of Random Variables). If Z_n →^P b, and if g(z) is a function that is continuous at z = b, then g(Z_n) →^P g(b).
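To see the delta method at work, one can simulate the plug-in estimator 1/X̄_n and compare the spread of √n(1/X̄_n − θ) against the predicted N(0, θ²) limit. A minimal sketch, assuming NumPy (θ, n, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 400, 10_000

# Exponential(theta) lifetimes: mean 1/theta, variance 1/theta^2.
x = rng.exponential(scale=1 / theta, size=(reps, n))
theta_hat = 1 / x.mean(axis=1)          # plug-in estimator 1/X̄ for each sample

# Delta method predicts sqrt(n)(theta_hat - theta) ≈ N(0, theta^2).
z = np.sqrt(n) * (theta_hat - theta)
print(z.mean(), z.var(), theta**2)      # variance ≈ 4.0
```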

2.2 Statistical Inference

Definition 2.3. (Statistical Model). A statistical model is (1) an identification of random variables of interest, (2) a specification of a joint distribution or a family of possible joint distributions for the observable random variables, (3) the identification of any parameters of those distributions that are assumed unknown, and (4) (Bayesian approach, if desired) a specification of a (joint) distribution for the unknown parameter(s).

Here we can discuss a little the difference between frequentist and Bayesian statistics. Frequentists treat data as a repeatable random sample, hence the word frequency. They consider parameters to be fixed, remaining constant during the repeatable process. For Bayesians, the parameters are unknown and are described probabilistically; analysis is done conditioning on the observed data, and the data are treated as fixed.

Definition 2.4. (Statistical Inference). A statistical inference is a procedure that produces a probabilistic statement about some or all parts of a statistical model.

Definition 2.5. (Parameter Space). In a problem of statistical inference, a characteristic or combination of characteristics that determine the joint distribution for the random variables of interest is called a parameter of the distribution. The set Ω of all possible values of a parameter θ, or of a vector of parameters θ = (θ_1, ..., θ_k), is called the parameter space.

To understand this definition, consider the following examples. The family of binomial distributions has parameters n and p. The family of normal distributions is parameterized by the mean µ and variance σ² of each distribution (so θ = (µ, σ²) can be considered a pair of parameters, and Ω = R × R₊). The family of exponential distributions is parameterized by the rate parameter θ (the failure rate must be positive: Ω will be the set of all positive numbers). The parameter space Ω must contain all possible values of the parameters in a given problem. For example, suppose that n patients are going to be given a treatment for a condition and that we will observe for each patient whether or not they recover. For each patient i = 1, 2, ..., n, let X_i = 1 if patient i recovers, and let X_i = 0 if not. As a collection of possible distributions for X_1, X_2, ..., we could choose to say that the X_i are i.i.d. having the Bernoulli distribution with parameter p, for 0 ≤ p ≤ 1. In this case, the parameter p is known to lie in the closed interval [0, 1], and this interval could be taken as the parameter space. Notice also that by the L.L.N., p is the limit as n → ∞ of the proportion of the first n patients who recover.

Definition 2.6. (Statistic). Suppose that the observable random variables of interest are X_1, ..., X_n. Let ϕ be a real-valued function of n real variables. Then the random variable T = ϕ(X_1, ..., X_n) is called a statistic. For example, one can consider the sample mean X̄_n = (1/n) Σ_{i=1}^n X_i, the maximum X_{(n)} of the values X_1, ..., X_n, and the sample variance S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)² of the values X_1, ..., X_n.

Definition 2.7. (Estimator/Estimate). Let X_1, ..., X_n be observable data whose joint distribution is indexed by a parameter θ taking values in a subset Ω of the real line. An estimator θ̂_n of the parameter θ is a real-valued function θ̂_n = ϕ(X_1, ..., X_n). If {X_1 = x_1, ..., X_n = x_n} is observed, then ϕ(x_1, ..., x_n) is called the estimate of θ.

Definition 2.8. (Estimator/Estimate). Let X_1, ..., X_n be observable data whose joint distribution is indexed by a parameter θ taking values in a subset Ω of k-dimensional space, i.e. Ω ⊂ R^k. Let h : Ω → R^d be a function from Ω into d-dimensional space. Define ψ = h(θ). An estimator of ψ is a function g(X_1, ..., X_n) that takes values in d-dimensional space. If {X_1 = x_1, ..., X_n = x_n} are observed, then g(x_1, ..., x_n) is called the estimate of ψ.

Remark 2.9. When h in the above definition is the identity function h(θ) = θ, then ψ = θ and we are estimating the original parameter θ. When h(θ) is one coordinate of θ, then the ψ that we are estimating is just that one coordinate.

Definition. (Consistent in probability estimator). A sequence of estimators θ̂_n that converges in probability to the unknown value of the parameter being estimated, as n → ∞, is called a consistent sequence of estimators; i.e., θ̂_n is consistent if and only if for every ɛ > 0,

    P(|θ̂_n − θ| > ɛ) → 0  as n → ∞.

Notice that in this section we will cover three types of estimators: (1) Method of Moments Estimators, (2) Maximum Likelihood Estimators, and (3) Bayes Estimators.

2.3 Method of Moments Estimators

Definition. (Method of Moments Estimator/Estimate). Assume that X_1, ..., X_n form a random sample from a distribution that is indexed by a k-dimensional parameter θ and that has at least k finite moments. For j = 1, ..., k, let µ_j(θ) = E(X_1^j | θ). Suppose that the function µ(θ) = (µ_1(θ), ..., µ_k(θ)) is a one-to-one function of θ. Let M(µ_1, ..., µ_k) denote the inverse function; that is, for all θ,

    θ = M(µ_1(θ), ..., µ_k(θ)).

Define the sample moments by

    m_j = (1/n) Σ_{i=1}^n X_i^j  for j = 1, ..., k.

The method of moments estimator of θ is M(m_1, ..., m_k). The usual way of implementing the method of moments is to set up the k equations m_j = µ_j(θ) and then solve for θ.

Consider the following example. Suppose that X_1, ..., X_n are i.i.d. Γ(α, β) with α, β > 0. Thus, θ = (α, β) ∈ Ω := R₊ × R₊. The first two moments of this distribution are

    µ_1(θ) = α/β,  µ_2(θ) = α(α + 1)/β²,

which implies that

    α = µ_1² / (µ_2 − µ_1²),  β = µ_1 / (µ_2 − µ_1²).

The MOM says that we replace the right-hand sides of these equations by the sample moments and then solve for α and β. In this case, we get

    α̂ = m_1² / (m_2 − m_1²),  β̂ = m_1 / (m_2 − m_1²).

For another example, let X_1, ..., X_n be from a N(µ, σ²) distribution. Thus, θ = (µ, σ²). What is the MOM estimator of θ? The solution is the following. Consider µ_1 = E(X_1) and µ_2 = E(X_1²). Clearly, the parameter θ can be expressed as a function of the first two population moments, since

    µ = µ_1,  σ² = µ_2 − µ_1².

To get MOM estimates of µ and σ², we plug in the sample moments. Thus

    µ̂ = m_1 = X̄_n,  σ̂² = (1/n) Σ_{j=1}^n X_j² − X̄_n² = (1/n) Σ_{i=1}^n (X_i − X̄_n)².

Remark. MOM estimators can be understood as plug-in estimates: to get an estimate θ̂ of θ = M(µ_1, µ_2, ..., µ_k), we plug in the estimates m_i of the µ_i's. An observation on the result is the following: if M is continuous, then the fact that m_i converges in probability to µ_i, for every i, entails that

    θ̂ = M(m_1, m_2, ..., m_k) →^P M(µ_1, µ_2, ..., µ_k) = θ.

Thus MOM estimators are consistent under mild assumptions.
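As a concrete check of the Γ(α, β) example above, the following sketch (NumPy assumed; parameter values and seed are illustrative) computes the MOM estimates from simulated data. Note that NumPy parameterizes the gamma distribution by a scale, which is 1/β under the rate convention used here:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta_, n = 3.0, 1.5, 10_000

# NumPy's gamma uses scale = 1/beta under the rate parameterization.
x = rng.gamma(shape=alpha, scale=1 / beta_, size=n)

m1, m2 = x.mean(), (x**2).mean()          # sample moments
alpha_hat = m1**2 / (m2 - m1**2)          # invert mu_1 = a/b, mu_2 = a(a+1)/b^2
beta_hat = m1 / (m2 - m1**2)

print(alpha_hat, beta_hat)                # ≈ (3.0, 1.5) for large n
```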

Hence, we have the following theorem.

Theorem. If M is continuous, then since m_i converges in probability to µ_i for every i,

    θ̂ = M(m_1, ..., m_k) →^P M(µ_1, ..., µ_k) = θ.

Proof: By the L.L.N., the sample moments converge in probability to the population moments µ_1(θ), ..., µ_k(θ). The generalization of the continuous mapping theorem to functions of k variables implies that M(·) evaluated at the sample moments converges in probability to θ. That is, the MOM estimator converges in probability to θ.

Remark. Recall Theorem 6.2.5, the Continuous Mapping Theorem, from text [1] (a.k.a. Continuous Functions of Random Variables): if Z_n →^P b, and if g(z) is a function that is continuous at z = b, then g(Z_n) →^P g(b). Similarly, it is almost as easy to show that if Z_n →^P b and Y_n →^P c, and if g(z, y) is continuous at (z, y) = (b, c), then g(Z_n, Y_n) →^P g(b, c). Indeed, this theorem extends to any finite number k of sequences that converge in probability and a continuous function of k variables.

Remark. In general, we might be interested in estimating g(θ) where g is some known function of θ; in such a case, the MOM estimate of g(θ) is g(θ̂), where θ̂ is the MOM estimate of θ.

For an example, let X_1, ..., X_n be the indicators of n Bernoulli trials with success probability θ. We are going to find a MOM estimator of θ. Note that θ is the probability of success and satisfies

    θ = E(X_1),  θ = E(X_1²).

Thus, we can get MOMs of θ based on both the first and the second moments:

    θ̂_MOM = X̄_n,  and  θ̂_MOM = (1/n) Σ_{j=1}^n X_j² = (1/n) Σ_{j=1}^n X_j = X̄_n.

Thus, the MOM estimates obtained in these two different ways coincide. Note that Var_θ(X̄_n) = θ(1 − θ)/n, and a MOM estimate of Var_θ(X̄_n) is obtained as

    Var_θ(X̄_n)_MOM = X̄_n (1 − X̄_n)/n.

Here the MOM estimate based on the second moment µ_2 coincides with the MOM estimate based on µ_1 because of the 0-1 nature of the X_i's (which entails that X_i² = X_i). However, this is not necessarily the case; the MOM estimate of a certain parameter may not be unique, as illustrated below.

For an example, consider X_1, ..., X_n i.i.d. Poisson(λ). Find the MOM estimator of λ. We know that E(X_1) = µ_1 = λ and Var(X_1) = µ_2 − µ_1² = λ. Thus µ_2 = λ + λ². Now a MOM estimate of λ is clearly given by λ̂ = m_1 = X̄_n; thus a MOM estimate of µ_2 = λ² + λ is given by X̄_n² + X̄_n. On the other hand, the obvious MOM estimate of µ_2 is m_2 = (1/n) Σ_{j=1}^n X_j². However, these two estimates are not necessarily equal; in other words, it is not necessarily the case that X̄_n² + X̄_n = (1/n) Σ_{j=1}^n X_j². This illustrates one of the disadvantages of MOM estimates: they may not be uniquely defined.

For another example, consider n systems with failure times X_1, ..., X_n assumed to be i.i.d. Exp(λ). Find the MOM estimators of λ. One can show that E(X_1) = 1/λ and E(X_1²) = 2/λ². Therefore, we have

    λ = 1/µ_1 = √(2/µ_2).

The above equations lead to two different MOM estimators for λ: the estimate based on the first moment is λ̂ = 1/m_1, and the estimate based on the second moment is λ̂ = √(2/m_2). Once again, note the non-uniqueness of the estimates.
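The exponential example can be checked by simulation: the two MOM estimates are both consistent but do not agree on any finite sample. A small sketch, assuming NumPy (λ, n, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n = 2.0, 200

x = rng.exponential(scale=1 / lam, size=n)
m1, m2 = x.mean(), (x**2).mean()

lam_hat_1 = 1 / m1                 # MOM estimate from the first moment
lam_hat_2 = np.sqrt(2 / m2)        # MOM estimate from the second moment

# Both are consistent for lambda, but they differ in any finite sample.
print(lam_hat_1, lam_hat_2)
```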

We finish up this section with some key observations about method of moments estimates. First, the MOM principle generally leads to procedures that are easy to compute and which are therefore valuable as preliminary estimates. Second, for large sample sizes, these estimates are likely to be close to the value being estimated (consistency). Third, the prime disadvantage is that they do not always provide a unique estimate, as illustrated above.

2.4 Method of Maximum Likelihood

As before, we have i.i.d. observations X_1, ..., X_n with common probability density or mass function f(x, θ), where θ is a Euclidean parameter indexing the class of distributions being considered. The goal is to estimate θ or some Φ(θ), where Φ is some known function of θ.

Definition. (Likelihood function). The likelihood function for the sample X_n = (X_1, ..., X_n) is

    L_n(θ, X_n) = Π_{i=1}^n f(X_i, θ).

This is simply the joint density (or mass function), but we now think of it as a function of θ for a fixed X_n, namely the X_n that is realized.

The heuristic is the following. Suppose for the moment that the X_i's are discrete, so that f is actually a p.m.f.; then L_n(X_n, θ) is exactly the probability that the observed data are realized. We now seek that θ̂ ∈ Ω for which L_n(X_n, θ) is maximized (assume that it exists). Thus θ̂ is the value of the parameter that maximizes the likelihood function, or in other words, makes the observed data most likely. It makes sense to pick θ̂ as a guess for θ. When the X_i's are continuous and f(x, θ) is in fact a density, we do the same thing: maximize the likelihood function as before and prescribe the maximizer as an estimate of θ. For obvious reasons, θ̂ is called a maximum likelihood estimate (M.L.E.). Notice that θ̂ is itself a deterministic function of (X_1, ..., X_n) and is therefore a random variable. There is nothing that guarantees that θ̂ is unique, even if it exists. In cases of multiple maximizers, we choose one which is more desirable according to some sensible criterion.

For an example, suppose that X_1, ..., X_n are i.i.d. Poisson(θ), θ > 0. Find the M.L.E. of θ. The solution is to write

    L_n(θ, X_n) = Π_{i=1}^n e^{−θ} θ^{X_i} / X_i! = C(X_n) e^{−nθ} θ^{Σ_{i=1}^n X_i},

where C(X_n) = 1/Π_{i=1}^n X_i! does not depend on θ. To maximize this expression, we set

    (∂/∂θ) log L_n(θ, X_n) = 0.

This yields

    (∂/∂θ) [ −nθ + (Σ_{i=1}^n X_i) log θ ] = 0,  that is,  −n + (Σ_{i=1}^n X_i)/θ = 0,

showing that θ̂ = X̄_n. It can be checked, by computing the second derivative at θ̂, that the stationary point indeed gives a (unique) maximum, or by noting that the log-likelihood is a strictly concave function.
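The closed form θ̂ = X̄_n can be confirmed by maximizing the log-likelihood numerically. A minimal sketch, assuming SciPy (the constant log(X_i!) term is dropped since it does not affect the maximizer; all values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = rng.poisson(lam=3.7, size=500)

# Negative log-likelihood of Poisson(theta), dropping the log(x!) constant.
def neg_loglik(theta):
    return theta * len(x) - x.sum() * np.log(theta)

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50), method="bounded")
print(res.x, x.mean())    # numerical maximizer ≈ sample mean, as derived above
```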

For another example, let X_1, ..., X_n be i.i.d. Ber(θ), where 0 ≤ θ ≤ 1. We want to find the M.L.E. of θ. Now,

    L_n(θ, X_n) = Π_{i=1}^n θ^{X_i} (1 − θ)^{1−X_i} = θ^{Σ X_i} (1 − θ)^{n − Σ X_i} = θ^{n X̄_n} (1 − θ)^{n(1 − X̄_n)}.

Maximizing L_n(θ, X_n) is equivalent to maximizing log L_n(θ, X_n). Now,

    log L_n(θ, X_n) = n X̄_n log θ + n(1 − X̄_n) log(1 − θ).

We split the maximization problem into the following three cases. First, X̄_n = 1; this means that we observed a success in every trial. It is not difficult to see that in this case the M.L.E. is θ̂ = 1, which is compatible with intuition. Second, X̄_n = 0; this means that we observed a failure in every trial. It is not difficult to see that in this case the M.L.E. is θ̂ = 0, also compatible with intuition. Last, 0 < X̄_n < 1; in this case it is easy to see that the function log L_n(θ, X_n) goes to −∞ as θ approaches 0 or 1, so that for purposes of maximization we can restrict to 0 < θ < 1. To maximize log L_n(θ, X_n), we solve the equation

    (∂/∂θ) log L_n(θ, X_n) = 0,

which yields

    n X̄_n / θ − n(1 − X̄_n)/(1 − θ) = 0.

This gives θ = X̄_n, which can be checked to be a maximum by computing the second derivative at X̄_n, or by noticing that log L_n(θ, X_n) is concave in θ so that the function attains a maximum at X̄_n. Thus the M.L.E. is θ̂ = X̄_n, which is just the sample proportion of 1's. Thus, in every case, the M.L.E. is the sample proportion of 1's. Note that this is also the MOM estimate of θ.

For another example, suppose X_1, ..., X_n are i.i.d. Unif([0, θ]) random variables, where θ > 0. We want to obtain the M.L.E. of θ. The likelihood function is

    L_n(θ, X_n) = Π_{i=1}^n (1/θ) I_{[0,θ]}(X_i) = (1/θ^n) Π_{i=1}^n I_{[X_i, ∞)}(θ) = (1/θ^n) I_{[max_{i=1,...,n} X_i, ∞)}(θ).

It is then clear that L_n(θ, X_n) equals 1/θ^n, a decreasing function of θ, for θ ≥ max_{i=1,...,n} X_i, and is 0 otherwise. By plotting the graph of this function, you can see that θ̂ = max_{i=1,...,n} X_i. Here differentiation will not help you to get the M.L.E., because the likelihood function is not differentiable at the point where it attains its maximum.

For another example, suppose that X_1, ..., X_n are i.i.d. N(µ, σ²). We want to find the M.L.E.s of the mean µ and the variance σ². The likelihood function is

    L_n(µ, σ², X_n) = (1/((2π)^{n/2} σ^n)) exp( −(1/(2σ²)) Σ_{i=1}^n (X_i − µ)² ).

It is easy to see that

    log L_n(µ, σ², X_n) = −(n/2) log σ² − (1/(2σ²)) Σ_{i=1}^n (X_i − µ)² + c,  c some constant,
                        = −(n/2) log σ² − (1/(2σ²)) Σ_{i=1}^n (X_i − X̄_n)² − (n/(2σ²)) (X̄_n − µ)².

To maximize the above expression with respect to µ and σ², we proceed as follows. For any (µ, σ²) we have

    log L_n(µ, σ², X_n) ≤ log L_n(X̄_n, σ², X_n),

showing that we can choose µ̂_M.L.E. = X̄_n. Then we need to maximize log L_n(X̄_n, σ², X_n) with respect to σ² to find σ̂²_M.L.E.. Now,

    log L_n(X̄_n, σ², X_n) = −(n/2) log σ² − (1/(2σ²)) Σ_{i=1}^n (X_i − X̄_n)².

Differentiating with respect to σ² and setting the derivative to zero gives σ̂²_M.L.E. = (1/n) Σ_{i=1}^n (X_i − X̄_n)². The fact that this actually gives a global maximizer follows from the fact that the second derivative at σ̂² is negative. Notice that the MOM estimates coincide with the M.L.E.s.

For the next example, we tweak the above situation a bit. Suppose now that we restrict the parameter space so that µ has to be non-negative, i.e., µ ≥ 0. Thus we seek to maximize log L_n(µ, σ², X_n) subject to the constraints µ ≥ 0 and σ² > 0. If X̄_n ≥ 0, the M.L.E. is (X̄_n, σ̂²) as before. In case X̄_n < 0, we proceed as follows. For fixed σ², the function log L_n(µ, σ², X_n), as a function of µ, attains a maximum at X̄_n and then falls off as a parabola on either side. The µ ≥ 0 for which log L_n(µ, σ², X_n) is largest is therefore 0; thus µ̂_M.L.E. = 0, and log L_n(µ̂, σ², X_n) is then given by

    log L_n(µ̂, σ², X_n) = −(n/2) log σ² − (1/(2σ²)) Σ_{i=1}^n (X_i − X̄_n)² − (n/(2σ²)) X̄_n²
                        = −(n/2) log σ² − (1/(2σ²)) Σ_{i=1}^n X_i².

Proceeding as before (by differentiation), it is shown that σ̂²_M.L.E. = (1/n) Σ_{i=1}^n X_i². Thus, the M.L.E.s can be written as

    (µ̂_M.L.E., σ̂²_M.L.E.) = I_{(−∞,0)}(X̄_n) · (0, (1/n) Σ_{i=1}^n X_i²) + I_{[0,∞)}(X̄_n) · (X̄_n, σ̂²).

For an example of the non-uniqueness of the M.L.E., suppose that X_1, ..., X_n form a random sample from the uniform distribution on the interval [θ, θ + 1], where θ ∈ R is unknown. We want to find the M.L.E. of θ. Here

    L_n(θ) = Π_{i=1}^n I_{[θ,θ+1]}(X_i).

The condition that θ ≤ X_i for all i = 1, ..., n is equivalent to the condition that θ ≤ min{X_1, ..., X_n} = X_{(1)}. Similarly, the condition that X_i ≤ θ + 1 for all i = 1, ..., n is equivalent to the condition that θ ≥ max{X_1, ..., X_n} − 1 = X_{(n)} − 1. Thus the likelihood can be written as

    L_n(θ) = I_{[X_{(n)}−1, X_{(1)}]}(θ).

Hence, it is possible to select as an M.L.E. any value of θ in the interval [X_{(n)} − 1, X_{(1)}], and thus the M.L.E. is not unique.

For another example, consider a random variable X that can come with equal probability either from a N(0, 1) or from a N(µ, σ²), where both µ and σ are unknown. Thus, the p.d.f. f(·, µ, σ²) of X is given by

    f(x, µ, σ²) = (1/2) [ (1/√(2π)) e^{−x²/2} + (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)} ].

Suppose now that X_1, ..., X_n form a random sample from this distribution. As usual, the likelihood function is L_n(µ, σ²) = Π_{i=1}^n f(X_i, µ, σ²). We want to find the M.L.E. of θ = (µ, σ²). Let X_k denote one of the observed values. If we let µ = X_k and let σ² → 0, we find that L_n(µ, σ²) → ∞. Note that 0 is not a permissible estimate of σ², because we know in advance that σ > 0. Since the likelihood function can be made arbitrarily large by choosing µ = X_k and choosing σ² arbitrarily close to 0, it follows that the M.L.E. does not exist.

2.4.1 Properties of M.L.E.s

Theorem. (Invariance Property of M.L.E.s). If θ̂ is the M.L.E. of θ and if h is any function, then h(θ̂) is the M.L.E. of h(θ).

Remark. Recall the corresponding theorem in text [1]: let θ̂ be an M.L.E. of θ, and let g(θ) be a function of θ. Then the M.L.E. of g(θ) is g(θ̂).

Consistency is another property of M.L.E.s. Consider an estimation problem in which a random sample is to be taken from a distribution involving a parameter θ. Then, under certain conditions, which are typically satisfied in practical problems, the sequence of M.L.E.s is consistent, i.e.

    θ̂_n →^P θ  as n → ∞.

2.4.2 Computational Methods for Approximating M.L.E.s

Consider an example. Suppose that X_1, ..., X_n are i.i.d. from a Gamma distribution for which the p.d.f. is as follows:

    f(x, α) = (1/Γ(α)) x^{α−1} e^{−x}  for x > 0.

The likelihood function is

    L_n(α) = (1/Γ(α)^n) ( Π_{i=1}^n X_i )^{α−1} e^{−Σ_{i=1}^n X_i},

and thus the log-likelihood is

    l_n(α) = log L_n(α) = −n log Γ(α) + (α − 1) Σ_{i=1}^n log(X_i) − Σ_{i=1}^n X_i.

The M.L.E. of α will be the value of α that satisfies the equation

    l_n′(α) = −n Γ′(α)/Γ(α) + Σ_{i=1}^n log(X_i) = 0.

There is no closed-form solution, so the equation must be solved numerically.

Newton's Method. Let f(x) be a real-valued function of a real variable, and suppose that we wish to solve the equation f(x) = 0. Let x_0 be an initial guess at the solution. Newton's method replaces the initial guess with the updated guess

    x_1 = x_0 − f(x_0)/f′(x_0).

The rationale behind Newton's method is: approximate the curve by the line tangent to the curve passing through the point (x_0, f(x_0)). The approximating line crosses the horizontal axis at the revised guess x_1. Typically, one replaces the initial guess with the revised guess and iterates Newton's method until the results stabilize.
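Applied to the likelihood equation above, Newton's method needs Γ′(α)/Γ(α), which is the digamma function ψ(α), and its derivative, the trigamma function ψ′(α). A minimal sketch, assuming SciPy (sample size, starting point, and tolerance are arbitrary):

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(5)
x = rng.gamma(shape=2.5, scale=1.0, size=1000)   # Gamma(alpha = 2.5), rate 1

mean_log_x = np.log(x).mean()

# Solve l'(alpha) = -n*digamma(alpha) + sum(log x_i) = 0 by Newton's method,
# working with the equation divided through by n.
alpha = 1.0                                      # initial guess
for _ in range(50):
    f = -digamma(alpha) + mean_log_x             # l'(alpha) / n
    fprime = -polygamma(1, alpha)                # l''(alpha) / n  (trigamma)
    step = f / fprime
    alpha -= step
    if abs(step) < 1e-10:
        break

print(alpha)                                     # ≈ 2.5
```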

Figure 1: Graphical illustration of Newton's method.

The EM Algorithm. According to text [1], there are a number of complicated situations in which it is difficult to compute the M.L.E. Many of these situations involve forms of missing data. The term missing data can refer to several different types of information. The most obvious would be observations that we had planned or hoped to observe but were not observed. For example, imagine that we planned to collect both heights and weights for a sample of athletes. For reasons that might be beyond our control, it is possible that we observed both heights and weights for most of the athletes, but only heights for one subset of athletes and only weights for another subset. If we model the heights and weights as having a bivariate normal distribution, we might want to compute the M.L.E. of the parameters of that distribution. The EM algorithm is an iterative method for approximating M.L.E.s when missing data make it difficult to find the M.L.E.s in closed form.

One begins at stage 0 with an initial parameter vector θ^{(0)}. To move from stage j to stage j + 1, one first writes the full-data log-likelihood, which is what the logarithm of the likelihood function would be if we had observed the missing data. The values of the missing data appear in the full-data log-likelihood as random variables rather than as observed values. The E step of the EM algorithm is the following: compute the conditional distribution of the missing data given the observed data as if the parameter θ were equal to θ^{(j)}, and then compute the conditional mean of the full-data log-likelihood, treating θ as constant and the missing data as random variables. The E step gets rid of the unobserved random variables from the full-data log-likelihood and leaves θ where it was. For the M step, choose θ^{(j+1)} to maximize the expected value of the full-data log-likelihood that you just computed. The M step takes you to stage j + 1. Ideally, the maximization step is no harder than it would be if the missing data had actually been observed.
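The bivariate-normal missing-data problem above is somewhat long to code, so here instead is a compact EM sketch for a different classic instance of the same idea: a two-component Gaussian mixture, where the unobserved component labels play the role of the missing data. The E step computes the posterior label probabilities; the M step maximizes the expected full-data log-likelihood in closed form. All parameter values and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
# Two-component Gaussian mixture; the unobserved component labels are the
# "missing data" in the sense of the EM description above.
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Initial parameter vector theta^(0): weights, means, variances.
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def normal_pdf(x, m, v):
    return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

for _ in range(200):
    # E step: posterior probability that each point came from each component.
    dens = w * normal_pdf(x[:, None], mu, var)        # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M step: maximize the expected full-data log-likelihood.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(w, mu, var)   # ≈ [0.3, 0.7], [-2, 3], [1, 1]
```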

2.5 Principles of Estimation

We need to set up the problem first. Our data X_1, ..., X_n are i.i.d. observations from the distribution P_θ, where θ ∈ Ω, the parameter space (Ω is assumed to be a subset of k-dimensional Euclidean space). We need to assume identifiability: θ_1 ≠ θ_2 implies P_{θ_1} ≠ P_{θ_2}.

Here is the estimation problem. Consider, for now, the problem of estimating g(θ), where g is some function of θ. In many cases g(θ) = θ itself. Generally g(θ) will describe some important aspect of the distribution P_θ. Our estimator of g(θ) will be some function of our observed data X_n = (X_1, ..., X_n). In general, there will be different estimators of g(θ) which may all seem reasonable from different perspectives; the question then becomes one of finding the most optimal one. This requires an objective measure of the performance of the estimator. If T_n estimates g(θ), a criterion that naturally suggests itself is the distance of T_n from g(θ). Good estimators are those for which |T_n − g(θ)| is generally small. Since T_n is a random variable, no deterministic statement can be made about the absolute deviation; however, what we can expect of a good estimator is a high chance of remaining close to g(θ). Also, as n, the sample size, increases, we get hold of more information and hence expect to be able to do a better job of estimating g(θ). These notions, when coupled together, give rise to the consistency requirement for a sequence of estimators T_n: as n increases, T_n ought to converge in probability to g(θ) (under the probability distribution P_θ). In other words, for any ɛ > 0,

    P_θ(|T_n − g(θ)| > ɛ) → 0.

The above is clearly a large-sample property; what it says is that with probability increasing to 1, T_n estimates g(θ) to any pre-determined level of accuracy. However, the consistency condition alone does not tell us anything about how well we are performing for any particular sample size, or about the rate at which the above probability goes to 0.

For a fixed sample size n, how do we measure the performance of an estimator T_n? One option is to obtain an average measure of the error; in other words, average |T_n − g(θ)| over all possible realizations of T_n. The resulting quantity is then still a function of θ but no longer random. It is called the mean absolute error, written as

    M.A.E. = E_θ(|T_n − g(θ)|).

However, it is more common to avoid absolute deviations and work with the square of the deviation, integrated out as before over the distribution of T_n. This is called the mean squared error (M.S.E.):

    M.S.E.(T_n, g(θ)) = E_θ[(T_n − g(θ))²].

This is meaningful only if the above quantity is finite for all θ. Good estimators are those for which the M.S.E. is generally not too high, whatever the value of θ. There is a standard decomposition of the M.S.E. that helps us understand its components.

We have

    M.S.E.(T_n, g(θ)) = E_θ[(T_n − g(θ))²]
                      = E_θ[(T_n − E_θ(T_n) + E_θ(T_n) − g(θ))²]
                      = E_θ[(T_n − E_θ(T_n))²] + (E_θ(T_n) − g(θ))² + 2 E_θ[(T_n − E_θ(T_n))(E_θ(T_n) − g(θ))]
                      = Var_θ(T_n) + b(T_n, g(θ))²,

where b(T_n, g(θ)) = E_θ(T_n) − g(θ) is the bias of T_n as an estimator of g(θ). The cross-product term above vanishes, since E_θ(T_n) − g(θ) is a constant and E_θ[T_n − E_θ(T_n)] = 0.

The bias measures, on average, by how much T_n overestimates or underestimates g(θ). If we think of the expectation E_θ(T_n) as the center of the distribution of T_n, then the bias measures by how much the center deviates from the target. The variance of T_n measures how closely T_n is clustered around its center. Ideally, one would like to minimize both simultaneously, but unfortunately this is rarely possible.

Two estimators T_n and S_n can be compared on the basis of their M.S.E.s. We say T_n dominates S_n as an estimator if

    M.S.E.(T_n, θ) ≤ M.S.E.(S_n, θ)  for all θ ∈ Ω.

In this situation we say that S_n is inadmissible in the presence of T_n. The use of the term inadmissible hardly needs explanation: if, for all possible values of the parameter, we incur less error using T_n instead of S_n as an estimate of g(θ), then clearly there is no point in considering S_n as an estimator at all.

Remark. Continuing along this line of thought, is there an estimator that improves upon all others? In other words, is there an estimator that makes every other estimator inadmissible? The answer is no, except in certain pathological situations. As we have noted before, it is generally not possible to find a universally best estimator.

One way to try to construct optimal estimators is to restrict oneself to a subclass of estimators and try to find the best possible estimator in this subclass. One arrives at subclasses of estimators by constraining them to meet some desirable requirements. One such requirement is that of unbiasedness. Below, we provide a formal definition.

Definition. (Unbiased estimator). An estimator T_n of g(θ) is said to be unbiased if E_θ(T_n) = g(θ) for all possible values of θ, that is, b(T_n, g(θ)) = 0 for all θ ∈ Ω.

Thus, unbiased estimators, on average, hit the target value. This seems to be a reasonable constraint to impose on an estimator and indeed produces meaningful estimates in a variety of situations.

Remark. Notice that for an unbiased estimator T_n, the M.S.E. under θ is simply the variance of T_n under θ. In a large class of models, it is possible to find an unbiased estimator of g(θ) that has the smallest possible variance among all possible unbiased estimators. Such an estimate is called a minimum variance unbiased estimator (MVUE). Here is a formal definition.
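The decomposition M.S.E. = variance + bias² can be verified empirically. A small sketch, assuming NumPy, using the (biased) estimator X_{(n)} of θ from a Unif(0, θ) sample (all numerical values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 5.0, 20, 200_000

# Estimate theta from Unif(0, theta) samples with the (biased) M.L.E. X_(n).
x = rng.uniform(0, theta, size=(reps, n))
t = x.max(axis=1)

mse = ((t - theta) ** 2).mean()
var, bias2 = t.var(), (t.mean() - theta) ** 2

# Decomposition: MSE = variance + bias^2.
print(mse, var + bias2)    # the two agree up to simulation noise
```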

Definition. (Minimum Variance Unbiased Estimator). We call S_n an M.V.U.E. of g(θ) if (i) E_θ(S_n) = g(θ) for all θ ∈ Ω, and (ii) if T_n is any unbiased estimate of g(θ), then Var_θ(S_n) ≤ Var_θ(T_n).

Here are a few examples to illustrate some of the various concepts.

(a) Consider X_1, ..., X_n i.i.d. N(µ, σ²). A natural unbiased estimator of g(θ) = µ is X̄_n, the sample mean. It is also consistent for µ by the W.L.L.N. It can be shown that this is also the M.V.U.E. of µ; in other words, any other unbiased estimate of µ will have a larger variance than X̄_n. Recall that the variance of X̄_n is simply σ²/n. Consider now the estimation of σ². Two estimates of this that we have considered in the past are

    σ̂² = (1/n) Σ_{i=1}^n (X_i − X̄_n)²,  formula (i)

and

    s² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)²,  formula (ii).

Of these, σ̂² is not unbiased for σ², but s² is. In fact, s² is also the M.V.U.E. of σ².

(b) Let X_1, ..., X_n be i.i.d. from some underlying density function or mass function f(x, θ). Let g(θ) = E_θ(X_1). The sample mean X̄_n is always an unbiased estimate of g(θ). Whether it is the M.V.U.E. or not depends on the underlying structure of the model.

(c) Suppose that X_1, ..., X_n are i.i.d. Ber(θ). It can be shown that X̄_n is the M.V.U.E. of θ. Now define g(θ) = θ/(1 − θ). This is a quantity of interest because it is precisely the odds in favor of heads in a coin-tossing event. It can be shown that there is no unbiased estimator of g(θ) in this model. However, an intuitively appealing estimate of g(θ) is T_n = X̄_n/(1 − X̄_n). It is not unbiased for g(θ); however, it does converge in probability to g(θ).

Remark. This example illustrates an important point: unbiased estimators may not always exist. Hence imposing unbiasedness as a constraint may not be meaningful in all situations.

(d) Unbiased estimators are not always better than biased estimators. Remember that it is the M.S.E. that gauges the performance of the estimator, and a biased estimator may actually outperform an unbiased one owing to a significantly smaller variance. Consider X_1, ..., X_n i.i.d. Unif([0, θ]). Here Ω = (0, ∞). A natural estimate of θ is the maximum of the X_i's, which we denote by X_{(n)}. Another estimate of θ is obtained by observing that X̄_n is an unbiased estimate of θ/2, the common mean of the X_i's; hence 2X̄_n is an unbiased estimate of θ. One can show that X_{(n)} outperforms 2X̄_n in the sense of M.S.E. by an order of magnitude. The best unbiased estimator (M.V.U.E.) of θ is (1 + 1/n) X_{(n)}. The solution is the following:

    M.S.E.(2X̄_n, θ) = Var(2X̄_n) = θ²/(3n),
    M.S.E.((1 + 1/n) X_{(n)}, θ) = Var((1 + 1/n) X_{(n)}) = θ²/(n(n + 2)),
    M.S.E.(X_{(n)}, θ) = nθ²/((n + 1)²(n + 2)) + θ²/(n + 1)² = 2θ²/((n + 1)(n + 2)),

where in the last equality the two terms are the variance and the squared bias.
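A simulation confirms the ordering of these three estimators. A minimal sketch, assuming NumPy (θ, n, and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
theta, n, reps = 1.0, 50, 100_000

x = rng.uniform(0, theta, size=(reps, n))
est = {
    "2 * mean": 2 * x.mean(axis=1),
    "max": x.max(axis=1),
    "(1 + 1/n) * max": (1 + 1 / n) * x.max(axis=1),
}
for name, t in est.items():
    print(name, ((t - theta) ** 2).mean())

# Theory for theta = 1, n = 50:
#   theta^2/(3n)              ≈ 6.7e-3   (2 * mean)
#   2 theta^2/((n+1)(n+2))    ≈ 7.5e-4   (max)
#   theta^2/(n(n+2))          ≈ 3.8e-4   ((1 + 1/n) * max)
```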

2.6 Sufficient Statistics

Sometimes there may not be any M.L.E., or there may be more than one. Even when an M.L.E. is unique, it may not be a suitable estimator (as in the Unif(0, θ) example, where the M.L.E. always underestimates the value of θ). In such problems, the search for a good estimator must be extended beyond the methods that have been introduced thus far. This section discusses the concept of a sufficient statistic, which can be used to simplify the search for a good estimator in many problems.

Suppose that in a specific estimation problem, two statisticians A and B must estimate the value of the parameter θ. Statistician A can observe the values of the observations X_1, ..., X_n in a random sample, and statistician B cannot observe the individual values of X_1, ..., X_n but can learn the value of a certain statistic T = ϕ(X_1, ..., X_n). Then statistician A can choose any function of the observations X_1, ..., X_n as an estimator of θ (including a function of T), but statistician B can use only a function of T. Hence, it follows that A will generally be able to find a better estimator than B.

Sometimes, however, B will be able to do just as well as A. In such cases, the single function T = ϕ(X_1, ..., X_n) will in some sense summarize all the information contained in the random sample about θ, and knowledge of the individual values of X_1, ..., X_n will be irrelevant in the search for a good estimator of θ. A statistic T having this property is called a sufficient statistic. A statistic is sufficient with respect to a statistical model P_θ and its associated unknown parameter θ if it provides all the information on θ; e.g., if no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter.

Definition. (Sufficient Statistic). Let X_1, ..., X_n be a random sample from a distribution indexed by a parameter θ ∈ Ω. Let T be a statistic. Suppose that, for every θ ∈ Ω and every possible value t of T, the conditional joint distribution of X_1, ..., X_n given that T = t (at θ) depends only on t and not on θ. That is, for each t, the conditional distribution of X_1, ..., X_n given T = t is the same for all θ. Then we say that T is a sufficient statistic for the parameter θ.

If T is sufficient, and one observed only T instead of (X_1, ..., X_n), one could, at least in principle, simulate random variables (X_1′, ..., X_n′) with the same joint distribution.

In this sense, T is sufficient for obtaining as much information about θ as one could get from (X_1, ..., X_n). We shall now present a simple method for finding a sufficient statistic that can be applied in many problems.

Theorem. (Factorization Criterion). Let X_1, ..., X_n form a random sample from either a continuous distribution or a discrete distribution for which the p.d.f. or the p.m.f. is f(x, θ), where the value of θ is unknown and belongs to a given parameter space Ω. A statistic T = r(X_1, ..., X_n) is a sufficient statistic for θ if and only if the joint p.d.f. or the joint p.m.f. f_n(x, θ) of (X_1, ..., X_n) can be factored as follows for all values of x = (x_1, ..., x_n) ∈ R^n and all values of θ ∈ Ω:

    f_n(x, θ) = u(x) v(r(x), θ),

where (i) u and v are both non-negative, (ii) the function u may depend on x but does not depend on θ, and (iii) the function v will depend on θ but depends on the observed value x only through the value of the statistic r(x).

For an example, suppose that X_1, ..., X_n are i.i.d. Poi(θ), θ > 0. Then, for all non-negative integers x_1, ..., x_n, the joint p.m.f. f_n(x, θ) of (X_1, ..., X_n) is

    f_n(x, θ) = Π_{i=1}^n e^{−θ} θ^{x_i} / x_i! = ( 1 / Π_{i=1}^n x_i! ) e^{−nθ} θ^{Σ x_i}.

Thus, we take

    u(x) = 1 / Π_{i=1}^n x_i!,  r(x) = Σ_{i=1}^n x_i,  v(t, θ) = e^{−nθ} θ^t.

It follows that T = Σ_{i=1}^n X_i is a sufficient statistic for θ.

For another example, suppose that X_1, ..., X_n are i.i.d. Gamma(α, β), α, β > 0, where α is known and β is unknown. The joint p.d.f. is

    f_n(x, β) = [ (1/Γ(α)^n) ( Π_{i=1}^n x_i )^{α−1} ] · [ β^{nα} exp(−βt) ],  t = Σ_{i=1}^n x_i,

with the first factor playing the role of u(x) and the second that of v(t, β). The sufficient statistic is T = Σ_{i=1}^n X_i.

For another example, suppose that X_1, ..., X_n are i.i.d. Gamma(α, β), α, β > 0, where α is unknown and β is known. The joint p.d.f. in this exercise is the same as that given in the previous exercise. However, since the unknown parameter is now α instead of β, the appropriate factorization is now

    f_n(x, α) = [ exp(−β Σ_{i=1}^n x_i) ] · [ (β^{nα} / Γ(α)^n) t^{α−1} ],  t = Π_{i=1}^n x_i,

with the first factor as u(x) and the second as v(t, α). The sufficient statistic is T = Π_{i=1}^n X_i.

2.7 Bayesian Paradigm

2.7.1 Prior Distribution

Definition. (Prior Distribution). Suppose that one has a statistical model with parameter θ. If one treats θ as random, then the distribution that one assigns to θ before observing the data is called its prior distribution. Thus, θ is now random and will be denoted by Θ. If the prior distribution of Θ is continuous, then its p.d.f. is called the prior p.d.f. of Θ.

For an example, let Θ denote the probability of obtaining a head when a certain coin is tossed. We discuss two cases. First, suppose that it is known that the coin either is fair or has a head on each side. Then Θ takes only two values, namely 1/2 and 1. If the prior probability that the coin is fair is 0.8, then the prior p.m.f. of Θ is ζ(1/2) = 0.8 and ζ(1) = 0.2. Second, suppose that Θ can take any value in (0, 1), with a prior distribution given by a Beta distribution with parameters (1, 1).

Suppose that the observable data X_1, ..., X_n are modeled as a random sample from a distribution indexed by θ. Let f(· | θ) denote the p.m.f./p.d.f. of a single random variable under the distribution indexed by θ. When we treat the unknown parameter Θ as random, the joint distribution of the observable random variables (i.e., the data) indexed by θ is understood as the conditional distribution of the data given Θ = θ. Thus, in general we will have X_1, ..., X_n | Θ = θ i.i.d. with p.d.f./p.m.f. f(· | θ), and Θ ∼ ζ, i.e.,

    f_n(x | θ) = f(x_1 | θ) ··· f(x_n | θ),

where f_n is the joint conditional distribution of X_n = (X_1, ..., X_n) given Θ = θ.

2.7.2 Posterior Distribution

Definition. (Posterior Distribution). Consider a statistical inference problem with parameter θ and random variables X_1, ..., X_n to be observed. The conditional distribution of Θ given X_1, ..., X_n is called the posterior distribution of θ. The conditional p.m.f./p.d.f. of Θ given X_1 = x_1, ..., X_n = x_n is called the posterior p.m.f./p.d.f. of θ and is usually denoted by ζ(· | x_1, ..., x_n).

Theorem. Suppose that the random variables X_1, ..., X_n form a random sample from a distribution for which the p.d.f./p.m.f. is f(· | θ). Suppose also that the value of the parameter θ is unknown and the prior p.d.f./p.m.f. of θ is ζ(·). Then the posterior p.d.f./p.m.f. of θ is

    ζ(θ | x) = f(x_1 | θ) ··· f(x_n | θ) ζ(θ) / g_n(x),  θ ∈ Ω,

where g_n is the marginal joint p.d.f./p.m.f. of X_1, ..., X_n.

2.7.3 Sampling from a Bernoulli Distribution

Theorem. Suppose that X_1, ..., X_n form a random sample from the Bernoulli distribution with mean θ, where 0 < θ < 1 is unknown. Suppose that the prior distribution of Θ is Beta(α, β), where α, β > 0. Then the posterior distribution of Θ given X_i = x_i, for i = 1, ..., n, is Beta(α + Σ_{i=1}^n x_i, β + n − Σ_{i=1}^n x_i).

Proof: The joint p.m.f. of the data is

    f_n(x | θ) = f(x_1 | θ) ··· f(x_n | θ) = Π_{i=1}^n θ^{x_i} (1 − θ)^{1−x_i} = θ^{Σ x_i} (1 − θ)^{n − Σ x_i}.

Therefore the posterior density of Θ | X_1 = x_1, ..., X_n = x_n is given by

    ζ(θ | x) ∝ θ^{α−1} (1 − θ)^{β−1} θ^{Σ x_i} (1 − θ)^{n − Σ x_i} = θ^{α + Σ x_i − 1} (1 − θ)^{β + n − Σ x_i − 1},  θ ∈ (0, 1).

Thus, Θ | X_1 = x_1, ..., X_n = x_n ∼ Beta(α + Σ_{i=1}^n x_i, β + n − Σ_{i=1}^n x_i). Q.E.D.

2.7.4 Sampling from a Poisson Distribution

Theorem. Suppose that X_1, ..., X_n form a random sample from the Poisson distribution with mean θ > 0, where θ is unknown. Suppose that the prior distribution of Θ is Gamma(α, β), where α, β > 0. Then the posterior distribution of Θ given X_i = x_i, for i = 1, ..., n, is Gamma(α + Σ_{i=1}^n x_i, β + n).

Definition. Let X_1, ..., X_n be conditionally i.i.d. given Θ = θ with common p.m.f./p.d.f. f(· | θ), where θ ∈ Ω. Let Ψ be a family of possible distributions over the parameter space Ω. Suppose that no matter which prior distribution ζ we choose from Ψ, no matter how many observations X_n = (X_1, ..., X_n) we observe, and no matter what their observed values x = (x_1, ..., x_n) are, the posterior distribution ζ(· | x) is a member of Ψ. Then Ψ is called a conjugate family of prior distributions for samples from the distributions f(· | θ).
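The Beta-Bernoulli update is a one-line computation. A minimal sketch, assuming NumPy (the prior hyperparameters and the seed are illustrative); the posterior mean shown here anticipates the Bayes estimator of Section 2.8:

```python
import numpy as np

rng = np.random.default_rng(9)
theta_true, n = 0.3, 100
x = rng.binomial(1, theta_true, size=n)

# Beta(alpha, beta) prior; posterior is Beta(alpha + sum x, beta + n - sum x).
alpha, beta_ = 2.0, 2.0
alpha_post = alpha + x.sum()
beta_post = beta_ + n - x.sum()

posterior_mean = alpha_post / (alpha_post + beta_post)
print(posterior_mean)   # shrinks the sample proportion toward the prior mean 0.5
```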

2.7.5 Sampling from an Exponential Distribution

Consider the following example. Suppose that the distribution of the lifetime of fluorescent tubes of a certain type is the exponential distribution with parameter θ. Suppose that X_1, ..., X_n is a random sample of lamps of this type. Also suppose that Θ ∼ Gamma(α, β), for known α, β. Then

    f_n(x | θ) = Π_{i=1}^n θ e^{−θ x_i} = θ^n e^{−θ Σ x_i}.

Then the posterior distribution of Θ given the data is

    ζ(θ | x) ∝ θ^n e^{−θ Σ x_i} θ^{α−1} e^{−βθ} = θ^{n+α−1} e^{−(β + Σ x_i) θ}.

Therefore, Θ | X_n = x ∼ Gamma(α + n, β + Σ_{i=1}^n x_i).

2.8 Bayes Estimators

An estimator of a parameter is some function of the data that we hope is close to the parameter, i.e., θ̂ ≈ θ. Let X_1, ..., X_n be data whose joint distribution is indexed by a parameter θ ∈ Ω. Let δ(X_1, ..., X_n) be an estimator of θ.

Definition. A loss function is a real-valued function of two variables, L(θ, a), where θ ∈ Ω and a ∈ R. The interpretation is that the statistician loses L(θ, a) if the parameter equals θ and the estimate equals a. For example, the squared error loss is

    L(θ, a) = (θ − a)²,

and the absolute error loss is

    L(θ, a) = |θ − a|.

Suppose that ζ(·) is a prior p.d.f./p.m.f. of θ ∈ Ω. Consider first the problem of estimating θ without being able to observe the data. If the statistician chooses a particular estimate a, then her expected loss will be

    E[L(θ, a)] = ∫_Ω L(θ, a) ζ(θ) dθ.

It is sensible that the statistician wishes to choose an estimate a for which the expected loss is minimized.

Definition. Suppose now that the statistician can observe the value x of the data X_n before estimating θ, and let ζ(· | x) denote the posterior p.d.f. of θ ∈ Ω. For each estimate a that the statistician might use, her expected loss in this case will be

    E[L(θ, a) | x] = ∫_Ω L(θ, a) ζ(θ | x) dθ.

Hence, the statistician should now choose an estimate a for which the above expectation is minimized. For each possible value x of X_n, let δ*(x) denote a value of the estimate a for which the expected loss is minimized. Then the function δ*(X_n) is called the Bayes estimator of θ. Once X_n = x is observed, δ*(x) is called the Bayes estimate of θ. Thus, a Bayes estimator is an estimator that is chosen to minimize the posterior mean of some measure of how far the estimator is from the parameter.

Corollary. Let θ ∈ Ω ⊂ R. Suppose that the squared error loss function is used and the posterior mean of Θ, i.e., E(Θ | X_n), is finite. Then the Bayes estimator of θ is

    δ*(X_n) = E(Θ | X_n).

For example, suppose that X_1, ..., X_n form a random sample from the Bernoulli distribution with mean θ, where 0 < θ < 1 is unknown. Suppose that the prior distribution of Θ is Beta(α, β), where α, β > 0. Recall that Θ | X_1 = x_1, ..., X_n = x_n ∼ Beta(α + Σ x_i, β + n − Σ x_i), and thus we have

    δ*(X_n) = (α + Σ_{i=1}^n X_i) / (α + β + n).

2.8.1 Sampling from a Normal Distribution

Theorem. Suppose that X_1, ..., X_n form a random sample from N(µ, σ²), where µ is unknown and the value of the variance σ² > 0 is known. Suppose that Θ ∼ N(µ_0, v_0²). Then

    Θ | X_1 = x_1, ..., X_n = x_n ∼ N(µ_1, v_1²),

where

    µ_1 = (σ² µ_0 + n v_0² x̄_n) / (σ² + n v_0²),  and  v_1² = σ² v_0² / (σ² + n v_0²).

Proof: The joint density has the form

    f_n(x | θ) ∝ exp[ −(1/(2σ²)) Σ_{i=1}^n (x_i − θ)² ].

Thus, by omitting a factor that involves x_1, ..., x_n but does not depend on θ, we may rewrite f_n(x | θ) as

    f_n(x | θ) ∝ exp[ −(n/(2σ²)) (θ − x̄_n)² ].

Since the prior density has the form

    ζ(θ) ∝ exp[ −(1/(2v_0²)) (θ − µ_0)² ],

it follows that the posterior p.d.f. ζ(θ | x) satisfies

    ζ(θ | x) ∝ exp[ −(n/(2σ²)) (θ − x̄_n)² − (1/(2v_0²)) (θ − µ_0)² ].

Completing the square establishes the following identity:

    (n/σ²)(θ − x̄_n)² + (1/v_0²)(θ − µ_0)² = (1/v_1²)(θ − µ_1)² + (n/(σ² + n v_0²))(x̄_n − µ_0)².

The last term on the right side does not involve θ. Thus,

    ζ(θ | x) ∝ exp[ −(1/(2v_1²)) (θ − µ_1)² ],

and under squared error loss we have

    δ*(X_n) = (σ² µ_0 + n v_0² X̄_n) / (σ² + n v_0²).  Q.E.D.
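The normal-normal update can be coded directly from the theorem. A minimal sketch, assuming NumPy (all numerical values are illustrative); note that µ_1 is a precision-weighted compromise between the prior mean µ_0 and the sample mean X̄_n:

```python
import numpy as np

rng = np.random.default_rng(10)
mu_true, sigma, n = 1.5, 2.0, 40
x = rng.normal(mu_true, sigma, size=n)

mu0, v0_sq = 0.0, 1.0                      # N(mu0, v0^2) prior on the mean
xbar = x.mean()

v1_sq = sigma**2 * v0_sq / (sigma**2 + n * v0_sq)
mu1 = (sigma**2 * mu0 + n * v0_sq * xbar) / (sigma**2 + n * v0_sq)

# Bayes estimate under squared error loss = posterior mean mu1,
# which sits between the prior mean and the sample mean.
print(mu1, v1_sq, xbar)
```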

Let us finish this section with a corollary.

Corollary. Let θ ∈ Ω ⊂ R. Suppose that the absolute error loss function is used. Then the Bayes estimator of θ, δ*(X_n), equals the median of the posterior distribution of Θ.

3 Sampling Distribution of Estimators

3.1 Sampling Distribution of a Statistic

A statistic is a function of the data, and hence is itself a random variable with a distribution. This distribution is called its sampling distribution. It tells us what values the statistic is likely to assume and how likely it is to take these values. Suppose that X_1, ..., X_n are i.i.d. from the distribution P_θ, where θ ∈ Ω ⊂ R^k. Let T_n be a statistic, i.e., suppose that T_n = ϕ(X_1, ..., X_n). Assume that T_n ∼ F_θ, where F_θ is the c.d.f. of T_n (depending on θ). The distribution of T_n (with θ fixed) is called the sampling distribution of T_n. Thus, the sampling distribution has c.d.f. F_θ.

3.2 The Gamma and the χ² Distributions

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Stat410 Probability and Statistics II (F16)

Stat410 Probability and Statistics II (F16) Some Basic Cocepts of Statistical Iferece (Sec 5.) Suppose we have a rv X that has a pdf/pmf deoted by f(x; θ) or p(x; θ), where θ is called the parameter. I previous lectures, we focus o probability problems

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Distribution of Random Samples & Limit theorems

Distribution of Random Samples & Limit theorems STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

1.010 Uncertainty in Engineering Fall 2008

1.010 Uncertainty in Engineering Fall 2008 MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values of the secod half Biostatistics 6 - Statistical Iferece Lecture 6 Fial Exam & Practice Problems for the Fial Hyu Mi Kag Apil 3rd, 3 Hyu Mi Kag Biostatistics 6 - Lecture 6 Apil 3rd, 3 / 3 Rao-Blackwell

More information

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett

Lecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett Lecture Note 8 Poit Estimators ad Poit Estimatio Methods MIT 14.30 Sprig 2006 Herma Beett Give a parameter with ukow value, the goal of poit estimatio is to use a sample to compute a umber that represets

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS

January 25, 2017 INTRODUCTION TO MATHEMATICAL STATISTICS Jauary 25, 207 INTRODUCTION TO MATHEMATICAL STATISTICS Abstract. A basic itroductio to statistics assumig kowledge of probability theory.. Probability I a typical udergraduate problem i probability, we

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio


Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability


Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be


Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios


Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio


Lecture 2: Monte Carlo Simulation

Lecture 2: Monte Carlo Simulation STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?


This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the


Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible


Unbiased Estimation. February 7-12, 2008

Unbiased Estimation. February 7-12, 2008 Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom


Lecture 11 and 12: Basic estimation theory

Lecture 11 and 12: Basic estimation theory Lecture ad 2: Basic estimatio theory Sprig 202 - EE 94 Networked estimatio ad cotrol Prof. Kha March 2 202 I. MAXIMUM-LIKELIHOOD ESTIMATORS The maximum likelihood priciple is deceptively simple. Louis


Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%


Statistical Theory MT 2009 Problems 1: Solution sketches

Statistical Theory MT 2009 Problems 1: Solution sketches Statistical Theory MT 009 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. (a) Let 0 < θ < ad put f(x, θ) = ( θ)θ x ; x = 0,,,... (b) (c) where


Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may



Statistical Theory MT 2008 Problems 1: Solution sketches

Statistical Theory MT 2008 Problems 1: Solution sketches Statistical Theory MT 008 Problems : Solutio sketches. Which of the followig desities are withi a expoetial family? Explai your reasoig. a) Let 0 < θ < ad put fx, θ) = θ)θ x ; x = 0,,,... b) c) where α


7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite


Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student's t distribution

Econ 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1. Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio



Properties of Point Estimators and Methods of Estimation

Properties of Point Estimators and Methods of Estimation CHAPTER 9 Properties of Poit Estimators ad Methods of Estimatio 9.1 Itroductio 9. Relative Efficiecy 9.3 Cosistecy 9.4 Sufficiecy 9.5 The Rao Blackwell Theorem ad Miimum-Variace Ubiased Estimatio 9.6 The


Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 22 CS 70 Discrete Mathematics for CS Sprig 2007 Luca Trevisa Lecture 22 Aother Importat Distributio The Geometric Distributio Questio: A biased coi with Heads probability p is tossed repeatedly util the first


Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio


Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters


Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key


Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................


Review Questions, Chapters 8, 9. f(y) = 0 elsewhere. f_{Y_(1)}(y) = n (e^{-y/θ})^{n-1} (1/θ) e^{-y/θ} = (n/θ) e^{-ny/θ}

Review Questions, Chapters 8, 9. f(y) = 0, elsewhere. F (y) = f Y(1) = n ( e y/θ) n 1 1 θ e y/θ = n θ e yn Stat 366 Lab 2 Solutios (September 2, 2006) page TA: Yury Petracheko, CAB 484, yuryp@ualberta.ca, http://www.ualberta.ca/ yuryp/ Review Questios, Chapters 8, 9 8.5 Suppose that Y, Y 2,..., Y deote a radom


Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios


Lecture 9: September 19

Lecture 9: September 19 36-700: Probability ad Mathematical Statistics I Fall 206 Lecturer: Siva Balakrisha Lecture 9: September 9 9. Review ad Outlie Last class we discussed: Statistical estimatio broadly Pot estimatio Bias-Variace


(A sequence also can be thought of as the list of function values attained for a function f: N → X, where f(n) = x_n for n ≥ 1.)

(A sequence also can be thought of as the list of function values attained for a function f :ℵ X, where f (n) = x n for n 1.) x 1 x N +k x N +4 x 3 MATH 337 Sequeces Dr. Neal, WKU Let X be a metric space with distace fuctio d. We shall defie the geeral cocept of sequece ad limit i a metric space, the apply the results i particular to some special


Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of


Last Lecture. Wald Test

Last Lecture. Wald Test Last Lecture Biostatistics 602 - Statistical Iferece Lecture 22 Hyu Mi Kag April 9th, 2013 Is the exact distributio of LRT statistic typically easy to obtai? How about its asymptotic distributio? For testig


Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet


Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull


IIT JAM Mathematical Statistics (MS) 2006 SECTION A

IIT JAM Mathematical Statistics (MS) 2006 SECTION A IIT JAM Mathematical Statistics (MS) 6 SECTION A. If a > for ad lim a / L >, the which of the followig series is ot coverget? (a) (b) (c) (d) (d) = = a = a = a a + / a lim a a / + = lim a / a / + = lim


Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence

Chapter 3. Strong convergence. 3.1 Definition of almost sure convergence Chapter 3 Strog covergece As poited out i the Chapter 2, there are multiple ways to defie the otio of covergece of a sequece of radom variables. That chapter defied covergece i probability, covergece i


Seunghee Ye Ma 8: Week 5 Oct 28

Seunghee Ye Ma 8: Week 5 Oct 28 Week 5 Summary I Sectio, we go over the Mea Value Theorem ad its applicatios. I Sectio 2, we will recap what we have covered so far this term. Topics Page Mea Value Theorem. Applicatios of the Mea Value


Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,


Lecture 3: MLE and Regression

Lecture 3: MLE and Regression STAT/Q SCI 403: Itroductio to Resamplig Methods Sprig 207 Istructor: Ye-Chi Che Lecture 3: MLE ad Regressio 3. Parameters ad Distributios Some distributios are idexed by their uderlyig parameters. Thus,


1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet


SOME THEORY AND PRACTICE OF STATISTICS by Howard G. Tucker

SOME THEORY AND PRACTICE OF STATISTICS by Howard G. Tucker SOME THEORY AND PRACTICE OF STATISTICS by Howard G. Tucker CHAPTER 9. POINT ESTIMATION 9. Covergece i Probability. The bases of poit estimatio have already bee laid out i previous chapters. I chapter 5


6. Order Statistics. Definitions. Suppose again that we have a basic random experiment, and that X is a real-valued random variable...


Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?


Approximations and more PMFs and PDFs

Approximations and more PMFs and PDFs Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.


Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to


LECTURE NOTES 9. 1 Point Estimation. 1.1 The Method of Moments

LECTURE NOTES 9. 1 Point Estimation. 1.1 The Method of Moments LECTURE NOTES 9 Poit Estimatio Uder the hypothesis that the sample was geerated from some parametric statistical model, a atural way to uderstad the uderlyig populatio is by estimatig the parameters of


This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class


Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We


Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to


Advanced Stochastic Processes.

Advanced Stochastic Processes. Advaced Stochastic Processes. David Gamarik LECTURE 2 Radom variables ad measurable fuctios. Strog Law of Large Numbers (SLLN). Scary stuff cotiued... Outlie of Lecture Radom variables ad measurable fuctios.


Simulation. Two Rules For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump


Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.


CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio


Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,


First Year Quantitative Comp Exam, Spring. Part I - 203A. f_X(x) = 0 otherwise

First Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >



FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals


32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio


Department of Mathematics

Department of Mathematics Departmet of Mathematics Ma 3/103 KC Border Itroductio to Probability ad Statistics Witer 2017 Lecture 19: Estimatio II Relevat textbook passages: Larse Marx [1]: Sectios 5.2 5.7 19.1 The method of momets


Recap. Good statistics, cont. Sufficiency. What are good statistics? 2/20/14

Recap! Good statistics, cont.! Sufficiency! What are good statistics?! 2/20/14 Recap Cramér-Rao iequality Best ubiased estimators What are good statistics? Parameter: ukow umber that we are tryig to get a idea about usig a sample X 1,,X Statistic: A fuctio of the sample. It is a


Direction: This test is worth 250 points. You are required to complete this test within 50 minutes.

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes. Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely


Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece


EE 4TM4: Digital Communications II Probability Theory

EE 4TM4: Digital Communications II Probability Theory 1 EE 4TM4: Digital Commuicatios II Probability Theory I. RANDOM VARIABLES A radom variable is a real-valued fuctio defied o the sample space. Example: Suppose that our experimet cosists of tossig two fair


MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet


The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore V(x̄) = (1/n²) Σ V(x_i) = (1/n²) n σ² = σ²/n.

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2. SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample


Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for


Solutions: Homework 3

Solutions: Homework 3 Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe


Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,


MAT1026 Calculus II Basic Convergence Tests for Series

MAT1026 Calculus II Basic Convergence Tests for Series MAT026 Calculus II Basic Covergece Tests for Series Egi MERMUT 202.03.08 Dokuz Eylül Uiversity Faculty of Sciece Departmet of Mathematics İzmir/TURKEY Cotets Mootoe Covergece Theorem 2 2 Series of Real


1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by


Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar. Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.


Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman:

Problems from 9th edition of Probability and Statistical Inference by Hogg, Tanis and Zimmerman: Math 224 Fall 2017 Homework 4 Drew Armstrog Problems from 9th editio of Probability ad Statistical Iferece by Hogg, Tais ad Zimmerma: Sectio 2.3, Exercises 16(a,d),18. Sectio 2.4, Exercises 13, 14. Sectio


Element sampling: Part 2

Element sampling: Part 2 Chapter 4 Elemet samplig: Part 2 4.1 Itroductio We ow cosider uequal probability samplig desigs which is very popular i practice. I the uequal probability samplig, we ca improve the efficiecy of the resultig


NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS

NANYANG TECHNOLOGICAL UNIVERSITY: SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS, AO-LEVEL MATHEMATICS. STRUCTURE OF EXAMINATION PAPER: There will be one 2-hour paper consisting of 4 questions.


DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set


The standard deviation of the mean

The standard deviation of the mean Physics 6C Fall 20 The stadard deviatio of the mea These otes provide some clarificatio o the distictio betwee the stadard deviatio ad the stadard deviatio of the mea.. The sample mea ad variace Cosider


ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],


MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED

MATH 472 / SPRING 2013 ASSIGNMENT 2: DUE FEBRUARY 4 FINALIZED MATH 47 / SPRING 013 ASSIGNMENT : DUE FEBRUARY 4 FINALIZED Please iclude a cover sheet that provides a complete setece aswer to each the followig three questios: (a) I your opiio, what were the mai ideas


Convergence of random variables. (telegram style notes) P.J.C. Spreij

Convergence of random variables. (telegram style notes) P.J.C. Spreij Covergece of radom variables (telegram style otes).j.c. Spreij this versio: September 6, 2005 Itroductio As we kow, radom variables are by defiitio measurable fuctios o some uderlyig measurable space


Fall 2013 MTH431/531 Real analysis Section Notes

Fall 2013 MTH431/531 Real analysis Section Notes Fall 013 MTH431/531 Real aalysis Sectio 8.1-8. Notes Yi Su 013.11.1 1. Defiitio of uiform covergece. We look at a sequece of fuctios f (x) ad study the coverget property. Notice we have two parameters


A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as
