S T A T I S T I C S. Jan Ámos Víšek 2008/09. Institute of Economic Studies, Faculty of Social Sciences Charles University, Prague


1 S T A T I S T I C S (The Thirteenth Lecture), 2008/09. Institute of Economic Studies, Faculty of Social Sciences, Charles University, Prague. visek/statistika/

2 Content of lecture
1 Recalling an example - an inspiration for the summer term: the regression model
2 What an estimator is - definition
3 Basic types of estimators - point versus interval ones

3 Regression model. In the second lecture we considered the REGRESSION MODEL. In one of the previous lectures we recalled it and used it as a motivation, e.g. for introducing the Varadarajan theorem. Do you remember what it is?

4 Regression model
REGRESSION MODEL: Y_i = X_i' β^0 + ε_i = Σ_{j=1}^p X_{ij} β^0_j + ε_i, i = 1, 2, ..., n, where (for the i-th object)
Y_i - response variable (vysvětlovaná veličina),
X_i ∈ R^p - explanatory variables (vysvětlující proměnné),
β^0 - regression coefficients (regresní koeficienty),
ε_i - error term (chybový člen).
What do we need to learn?
1 A construction of an estimator β̂ of the regression coefficients β^0.
2 A test of the hypothesis that β^0 = 0.
Galton, F. (1886): Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, vol. 15.

5 Regression model. These two tasks - estimating the regression coefficients and testing statistical hypotheses - will be our topics for the rest of the summer term. Keep in mind: To guess is cheap, to guess wrongly is (extremely) expensive. (Chinese proverb)

6 Returning once again to the REGRESSION MODEL
Y_i = X_i' β^0 + ε_i = Σ_{j=1}^p X_{ij} β^0_j + ε_i, i = 1, 2, ..., n.
Let us analyze the situation and decide what to do:
1 Somebody brought data, say
(Y^(n), X^(n)) = (y_1, x_11, ..., x_1p; y_2, x_21, ..., x_2p; ...; y_n, x_n1, ..., x_np).
2 She/he assumes that we shall create a (regression) model. A (very) first step is to estimate β^0 by an estimate β̂ = β̂((Y^(n), X^(n))).

7 Conclusion and further considerations:
1 The estimate is to be a function of the data, i.e. the estimate is a mapping (zobrazení) from the observational space to the parameter space. Formally, β̂((Y^(n), X^(n))): (Y^(n), X^(n)) → R^p.
2 As the data are a realization of some r.v.'s, it would be reasonable to express the estimated knowledge about the true value of the parameter in probabilistic assertions. We shall plug the random variables Y^(n)(ω) and X^(n)(ω) into the estimate β̂((Y^(n), X^(n))). So we obtain a mapping β̂(Y^(n), X^(n)): Ω → R^p. We shall call it an estimator (after adding some details, see the next slide).

8 So: To be able to formulate and to prove probabilistic assertions, the estimator has to be measurable.
DEFINITION: Estimator. An estimator is a random variable.
REMARK 1. The values of the estimator fall into a parameter space which is usually a part of an l-dimensional Euclidean space (for a (general) framework, see the next slide).
REMARK 2. Notice that the (numerical) value of the estimator was called (on the previous slide) an estimate.

9 The English used by statisticians (all over the world) has two words for the Czech "odhad": estimator and estimate. Some authors use the word estimator for β̂(Y^(n)(ω), X^(n)(ω)): Ω → R^p, while the word estimate for the value of the estimator at given data. We shall keep this convention as it facilitates reading and understanding of the text.

10 A (general) framework for establishing an estimator can be:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 We shall assume that the parameter space Θ (we'll use it in what follows) is a part of R^p (for some p ∈ N).
3 Consider a sequence of (i.i.d.) r.v.'s {X_n}_{n=1}^∞ governed by an unknown d.f. from a family {F_θ(x)}_{θ ∈ Θ}. (θ is theta and Θ is capital theta; in Czech, théta and velké théta.)

11 A (general) framework for establishing an estimator can be (continued):
1 By some technique of constructing estimators, establish (for each n ∈ N) an estimator, i.e. a mapping θ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ which is (A, B)-measurable.
2 Prove (all) required properties of the estimator θ̂_n (we'll discuss them later).
3 Find a reliable, sufficiently quick algorithm for evaluating θ̂_n and implement it (or have it implemented).

12 REMARK 3. Let's assume we have data x^(n) = (x_1, x_2, ..., x_n) at hand. We can view it so that somebody selected an ω_0 ∈ Ω and informed us that the values of the first n r.v.'s of the sequence of (i.i.d.) r.v.'s {X_n}_{n=1}^∞ at ω_0 are just x_1, x_2, ..., x_n, i.e. (X_1(ω_0), X_2(ω_0), ..., X_n(ω_0)) = (x_1, x_2, ..., x_n). We may evaluate the estimate as θ̂_n = θ̂_n(x^(n)) = θ̂_n(x_1, x_2, ..., x_n) = θ̂_n(X_1(ω_0), X_2(ω_0), ..., X_n(ω_0)).

13 Consider an example of Normal distributions: {F_θ(x)}_{θ ∈ Θ} = {Φ_{μ,σ²}(x)}, μ ∈ R, σ² ∈ R^+, so θ = (μ, σ²), Θ = R × R^+, and hence θ̂_n = (θ̂_{n1}, θ̂_{n2}) = (μ̂_n, σ̂²_n). If μ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → R and σ̂²_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → R^+, we speak about a point estimator, and we assume that with large probability, for some small δ > 0, μ̂_n ∈ (μ - δ, μ + δ) and σ̂²_n ∈ (σ² - δ, σ² + δ).

14 If μ̂_n is an interval, i.e. μ̂_n: Ω → R × R, and hence μ̂_n(X_1(ω), ..., X_n(ω)) = (μ̂_{n1}(X_1(ω), ..., X_n(ω)), μ̂_{n2}(X_1(ω), ..., X_n(ω))), and similarly σ̂²_n: Ω → R^+ × R^+ with σ̂²_n(X_1(ω), ..., X_n(ω)) = (σ̂²_{n1}(X_1(ω), ..., X_n(ω)), σ̂²_{n2}(X_1(ω), ..., X_n(ω))), we speak about an interval estimator, and we assume that with large probability (μ̂_{n1}(X_1(ω), ..., X_n(ω)), μ̂_{n2}(X_1(ω), ..., X_n(ω))) ∋ μ and (σ̂²_{n1}(X_1(ω), ..., X_n(ω)), σ̂²_{n2}(X_1(ω), ..., X_n(ω))) ∋ σ².

15 What is the history of studying estimation?

16 What is the history of studying estimation?
Pierre Simon Laplace, Adrien Marie Legendre, Carl Friedrich Gauss, Thomas Bayes, Ronald Aylmer Fisher, Edwin James Pitman

17 Types of estimators
1 Method of moments (momentová metoda)
2 Maximum likelihood (maximálně věrohodné odhady)
3 The least squares (nejmenší čtverce)
4 Minimum distance (metoda minimalizující vzdálenost)
5 Many others - minimum volume, weighted, etc.

18 Let us recall - as an inspiring example - THE SECOND KOLMOGOROV THEOREM (alternative conditions for the strong law of large numbers): Let {X_n}_{n=1}^∞ be a sequence of i.i.d. r.v.'s. Then E X_1 = μ exists iff (1/n) Σ_{l=1}^n X_l → μ a.e. as n → ∞.

19 Another example - COROLLARY: Let {X_n}_{n=1}^∞ be a sequence of i.i.d. r.v.'s. Then E X_1² = σ² + μ² exists iff (1/n) Σ_{l=1}^n X_l² → σ² + μ² a.e. as n → ∞. Then also (1/n) Σ_{l=1}^n X_l² - ((1/n) Σ_{l=1}^n X_l)² → σ² a.e. as n → ∞.

20 More generally - LEMMA: Let {X_n}_{n=1}^∞ be a sequence of i.i.d. r.v.'s governed by an unknown d.f. from a family {F_θ(x)}_{θ ∈ Θ}. Assume that for some k ∈ N, E X_1^k exists and that there is a function h such that θ = h(E X_1^k, E X_1^{k-1}, ..., E X_1). Then h((1/n) Σ_{l=1}^n X_l^k, (1/n) Σ_{l=1}^n X_l^{k-1}, ..., (1/n) Σ_{l=1}^n X_l) → θ a.e. as n → ∞. This way of constructing estimators is called the method of moments.
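A small added sketch (not from the slides) of the lemma with k = 1, for the Exponential(λ) family: since E X_1 = 1/λ, we have θ = h(E X_1) with h(t) = 1/t, and the method-of-moments estimator is one over the sample mean. The true λ and the sample size below are arbitrary choices.

```python
import numpy as np

# Method of moments for Exponential(lambda): lambda = h(E X_1) = 1 / E X_1,
# so the estimator plugs the empirical first moment into h.
rng = np.random.default_rng(1)
lam_true = 2.5
x = rng.exponential(scale=1.0 / lam_true, size=100_000)

lam_hat = 1.0 / x.mean()  # h applied to the first empirical moment
print(round(lam_hat, 3))
```

By the lemma, lam_hat converges to the true λ almost everywhere as the sample grows.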

21 EXAMPLE OF ESTIMATORS CONSTRUCTED BY THE METHOD OF MOMENTS. Consider once again the example of Normal distributions {Φ_{μ,σ²}(x)}, μ ∈ R, σ² ∈ R^+. By the Kolmogorov theorem, (1/n) Σ_{l=1}^n X_l → μ a.e. as n → ∞, as well as (1/n) Σ_{l=1}^n X_l² - ((1/n) Σ_{l=1}^n X_l)² → σ² a.e. as n → ∞.

22 Putting μ̂_n = (1/n) Σ_{l=1}^n X_l and σ̂²_n = (1/n) Σ_{l=1}^n X_l² - ((1/n) Σ_{l=1}^n X_l)², we have μ̂_n → μ and σ̂²_n → σ² a.e. as n → ∞. REMARK 4. μ̂_n and σ̂²_n are examples of estimators constructed by the method of moments. Such estimators work for a wide class of families of d.f.'s, namely for all families for which the corresponding moments exist.
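A simulation sketch of the estimators above: compute μ̂_n and σ̂²_n from i.i.d. Normal draws and check that they approach the true μ and σ². The true values and the sample size are arbitrary choices for the illustration.

```python
import numpy as np

# Method-of-moments estimators for the Normal family, as on the slide:
# mu_hat = (1/n) sum X_l,  sigma2_hat = (1/n) sum X_l^2 - mu_hat^2.
rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
x = rng.normal(mu, sigma, size=200_000)

mu_hat = x.mean()                           # first empirical moment
sigma2_hat = (x ** 2).mean() - mu_hat ** 2  # second empirical moment minus mean^2
print(round(mu_hat, 2), round(sigma2_hat, 2))
```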

23 Maximum likelihood (MLE) - Pierre Simon Laplace
IT IS REMARKABLE THAT A SCIENCE WHICH BEGAN WITH THE CONSIDERATION OF GAMES OF CHANCE SHOULD HAVE BECOME THE MOST IMPORTANT OBJECT OF HUMAN KNOWLEDGE. P. S. LAPLACE: Théorie Analytique des Probabilités
Laplace, P. S. (1774): Mémoire sur la probabilité des causes par les évènemens. Mémoires de l'Académie royale des sciences presentés par divers savans 6.
Pierre Simon Laplace was the first scientist who began to study likelihood.

24 MLE - framework and definition. A framework for establishing the MLE can be as follows:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 We shall assume that the parameter space is again a part of R^p (for some p ∈ N).
3 Consider a sequence of absolutely continuous i.i.d. r.v.'s {X_n}_{n=1}^∞ governed by an unknown d.f. whose density is from a family {f_θ(x)}_{θ ∈ Θ}.
4 Then define the maximum likelihood estimator by θ̂_n(X^(n)(ω)) = arg max_{θ ∈ Θ} Π_{l=1}^n f_θ(X_l(ω)).

25 MLE - evaluation. Let's rewrite here the last line of the previous slide: θ̂_n(X^(n)(ω)) = arg max_{θ ∈ Θ} Π_{l=1}^n f_θ(X_l(ω)). Due to the monotonicity of ln(x), the point θ ∈ Θ at which Π_{l=1}^n f_θ(X_l(ω)) and ln(Π_{l=1}^n f_θ(X_l(ω))) attain their maximum is the same. Moreover, ln(Π_{l=1}^n f_θ(X_l(ω))) = Σ_{l=1}^n ln(f_θ(X_l(ω))).

26 MLE - evaluation. So, finally, θ̂_n(X^(n)(ω)) = arg max_{θ ∈ Θ} Σ_{l=1}^n ln(f_θ(X_l(ω))). Having at hand data x^(n) = (x_1, x_2, ..., x_n), we can calculate the value of the estimate as θ̂_n(x^(n)) = arg max_{θ ∈ Θ} Σ_{l=1}^n ln(f_θ(x_l)). It is frequently (in many textbooks and monographs) written as θ̂_n = arg max_{θ ∈ Θ} Σ_{l=1}^n ln(f_θ(x_l)).

27 MLE - evaluation. Assuming that ∂f_θ(x)/∂θ exists, one of the (possibly more) solutions of Σ_{l=1}^n (1/f_θ(x_l)) ∂f_θ(x_l)/∂θ = 0 (1) is equal to θ̂_n. REMARK 5. In the case that (1) has more solutions, there are some recommendations what to do. We shall discuss them later.

28 THE FIRST EXAMPLE OF the maximum likelihood estimator. Consider again the family of Normal distributions {Φ_{μ,σ²}(x)}, μ ∈ R, σ² ∈ R^+. Then (μ̂_n, σ̂²_n)^(ML) = arg max_{μ ∈ R, σ ∈ R^+} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²)). We shall see that we can look for μ̂_n^(ML) and σ̂²_n^(ML) separately. So, let us consider at first μ̂_n^(ML) = arg max_{μ ∈ R} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²)).

29
μ̂_n^(ML) = arg max_{μ ∈ R} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²))
= arg max_{μ ∈ R} Σ_{l=1}^n { -log(σ √(2π)) - (X_l - μ)²/(2σ²) }
= arg max_{μ ∈ R} Σ_{l=1}^n { -(X_l - μ)²/(2σ²) }
= arg max_{μ ∈ R} { -Σ_{l=1}^n (X_l - μ)² } = arg min_{μ ∈ R} Σ_{l=1}^n (X_l - μ)²

30 So we have μ̂_n^(ML) = arg min_{μ ∈ R} Σ_{l=1}^n (X_l - μ)². We are going to solve ∂/∂μ Σ_{l=1}^n (X_l - μ)² = 0, i.e. -2 Σ_{l=1}^n (X_l - μ) = 0, hence Σ_{l=1}^n X_l = nμ, and finally μ̂_n^(ML) = (1/n) Σ_{l=1}^n X_l.

31 So we have found that μ̂_n^(ML) = arg min_{μ ∈ R} Σ_{l=1}^n (X_l - μ)² = (1/n) Σ_{l=1}^n X_l, i.e. the sum Σ_{l=1}^n (X_l - μ)² attains its minimum if we put μ = (1/n) Σ_{l=1}^n X_l = x̄ (the last equality is an introduction of notation).
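An added numerical check of the derivation: over a grid of candidate μ values, the sum of squares Σ (X_l - μ)² is smallest (up to grid resolution) exactly at the sample mean x̄. The data and the grid are arbitrary choices.

```python
import numpy as np

# Verify that sum (X_l - mu)^2 is minimized at mu = x_bar by brute force.
rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=1_000)
x_bar = x.mean()

grid = np.linspace(x_bar - 3.0, x_bar + 3.0, 2001)       # candidate mu values
sums = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)   # sum of squares per mu
mu_best = float(grid[int(np.argmin(sums))])
print(round(mu_best - x_bar, 4))
```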

32 THE SECOND EXAMPLE OF the maximum likelihood estimator. Similarly as in the previous example, σ̂²_n^(ML) = arg max_{σ ∈ R^+} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²)).

33 So σ̂²_n^(ML) = arg max_{σ ∈ R^+} Σ_{l=1}^n { -log(σ √(2π)) - (X_l - μ)²/(2σ²) } = arg max_{σ ∈ R^+} { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - μ)²/(2σ²) }. As the sum Σ_{l=1}^n (X_l - μ)² enters the previous expression with a minus sign, we have to minimize it in order to maximize the whole expression, i.e. according to the previous slide we have to put μ = x̄: σ̂²_n^(ML) = arg max_{σ ∈ R^+} { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - x̄)²/(2σ²) }.

34 It means that σ̂²_n^(ML) is among the solutions of the equation ∂/∂σ² { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - x̄)²/(2σ²) } = 0. As the mapping σ → σ² is one-to-one for σ ∈ R^+, we may solve ∂/∂σ { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - x̄)²/(2σ²) } = 0, i.e. -n/σ + (1/σ³) Σ_{l=1}^n (X_l - x̄)² = 0.

35 It gives nσ² = Σ_{l=1}^n (X_l - x̄)² and finally σ̂²_n = (1/n) Σ_{l=1}^n (X_l - x̄)². We shall see later, however, that a better estimator is σ̂²_n = (1/(n-1)) Σ_{l=1}^n (X_l - x̄)².
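An added sketch checking the closed form: maximize the Normal log-likelihood in σ on a grid (with μ fixed at x̄) and compare with (1/n) Σ (X_l - x̄)². The true parameters and the grid are arbitrary choices.

```python
import numpy as np

# Grid-maximize  -n log(sigma sqrt(2 pi)) - sum (X_l - x_bar)^2 / (2 sigma^2)
# over sigma and compare the maximizer's square with the closed-form MLE.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 2.0, size=5_000)
x_bar = x.mean()
n = x.size
ss = ((x - x_bar) ** 2).sum()

sigma_grid = np.linspace(0.5, 4.0, 3501)
loglik = -n * np.log(sigma_grid * np.sqrt(2.0 * np.pi)) - ss / (2.0 * sigma_grid ** 2)
sigma2_grid = float(sigma_grid[int(np.argmax(loglik))] ** 2)
sigma2_closed = ss / n
print(round(sigma2_grid, 3), round(sigma2_closed, 3))
```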

36 The least squares (LS) - Adrien Marie Legendre & Carl Friedrich Gauss
Legendre, A. M. (1805): Nouvelles méthodes pour la détermination des orbites des comètes. Paris, Courcier.
Gauss, C. F. (1809): Theoria motus corporum coelestium. Hamburg, Perthes et Besser.

37 Somebody brought data.

38 Hertzsprung-Russell diagram of the star cluster CYG OB1 (in the direction of Cygnus)

39 r(β) = y - x'β, r²(β) = (y - x'β)²

40 Recalling: Somebody brought data, say
(Y^(n), X^(n)) = (y_1, 1, x_11, ..., x_1p; y_2, 1, x_21, ..., x_2p; ...; y_n, 1, x_n1, ..., x_np).

41 r_i(β) = y_i - x_i'β, r_i²(β) = (y_i - x_i'β)²,
β̂^(LS) = arg min_{β ∈ R^p} Σ_{i=1}^n r_i²(β) = arg min_{β ∈ R^p} Σ_{i=1}^n (y_i - x_i'β)²,
β̂^(LS) = (X'X)^{-1} X'Y

42 DEFINITION: The least squares estimator. The least squares estimator is given as a solution of the extremal problem β̂^(LS) = arg min_{β ∈ R^p} Σ_{i=1}^n r_i²(β) = arg min_{β ∈ R^p} Σ_{i=1}^n (y_i - x_i'β)². REMARK 6. If p = 1 (see the data on the previous-but-one slide), the extremal problem turns into μ̂^(LS) = arg min_{μ ∈ R} Σ_{i=1}^n r_i²(μ) = arg min_{μ ∈ R} Σ_{i=1}^n (y_i - μ)².
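An added sketch of β̂^(LS) = (X'X)^{-1} X'Y: simulate a small regression (the true β^0 and the noise level are arbitrary choices) and compare the normal-equations solution with numpy's least-squares solver.

```python
import numpy as np

# Least squares via the normal equations versus a QR/SVD-based solver.
rng = np.random.default_rng(4)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column first
beta0 = np.array([1.0, -2.0, 0.5])
y = X @ beta0 + rng.normal(scale=0.3, size=n)

beta_ne = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'Y via a linear solve
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # numpy's built-in LS solver
print(np.round(beta_ne, 3))
```

In practice the solver route is preferred over forming (X'X)^{-1} explicitly, since it is numerically more stable for ill-conditioned X.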


44 Hertzsprung-Russell diagram of the star cluster CYG OB1 (in the direction of Cygnus (Labuť)). Number of stars = 47. Humphreys, R. M. (1978): Studies of luminous stars in nearby galaxies. Supergiants and O stars in the Milky Way. Astrophysical Journal Supplement Ser., 38.


48 Francis Ysidro Edgeworth

49 Francis Ysidro Edgeworth The method of the Least Squares is seen to be our best course when we have thrown overboard a certain portion of our data - a sort of sacrifice which has often to be made by those who sail upon the stormy seas of Probability. F. Y. EDGEWORTH

50 Francis Ysidro Edgeworth

51 A (general) framework for establishing a minimal distance estimator can be:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 Let the parameter space be Θ ⊂ R^p (p ∈ N).
3 Consider a family of d.f.'s {F_θ(x)}_{θ ∈ Θ}.
4 Somebody brought data, say x^(n) = (x_1, x_2, ..., x_n).
5 Let us draw the empirical distribution function (e.d.f.).

52 Empirical d.f.: F_emp(x) = (1/n) Σ_{i=1}^n I{x_i < x}
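A minimal sketch of the e.d.f. above, keeping the slides' strict-inequality convention I{x_i < x}: the function simply counts the fraction of observations below x. The toy data are an arbitrary choice.

```python
import numpy as np

# Empirical distribution function: F_emp(x) = (1/n) * #{ x_i < x }.
def f_emp(data, x):
    data = np.asarray(data, dtype=float)
    return float(np.mean(data < x))

data = [1.0, 2.0, 2.0, 5.0]
print(f_emp(data, 0.5), f_emp(data, 2.0), f_emp(data, 10.0))  # 0.0 0.25 1.0
```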

53 The empirical d.f. and a theoretical d.f., evidently with too large a variance.

54 The empirical d.f. and a theoretical d.f., maybe already with the right variance, but evidently with too small a location (shift) parameter.

55 The empirical d.f. and a theoretical d.f., hopefully with the right variance, but evidently with too large a location parameter.

56 The empirical d.f. and a theoretical d.f., already fitting the e.d.f.

57 DEFINITION: The minimal distance estimator. The estimator given as a solution of the extremal problem θ̂^(MD) = arg min_{θ ∈ Θ} max_{x ∈ R} |F_emp(x) - F_θ(x)| belongs to the class of minimal distance estimators, but very frequently it is called the minimax estimator. REMARK 7. The expression max_{x ∈ R} |F_emp(x) - F_θ(x)| represents a measure of the distance between F_emp(x) and F_θ(x). Let's try to consider this idea in a bit more general way.

58 Let F(x) and G(x) be two d.f.'s. Then d(F, G) is called a (measure of) distance of F(x) and G(x) if
1 d(F, G) ≥ 0,
2 d(F, F) = 0,
3 for any d.f. H(x), d(F, G) ≤ d(F, H) + d(H, G).
The measure of distance max_{x ∈ R} |F(x) - G(x)| is frequently used. It was first studied by Kolmogorov and Smirnov.

59 Kolmogorov-Smirnov distance. DEFINITION: Kolmogorov-Smirnov distance. Let F(x) and G(x) be two d.f.'s. The (measure of) distance given by d_KS(F(x), G(x)) = max_{x ∈ R} |F(x) - G(x)| is called the Kolmogorov-Smirnov distance.
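An added sketch computing the Kolmogorov-Smirnov distance between the e.d.f. of a sample and a continuous theoretical d.f. F. Because the e.d.f. is a step function and F is continuous, the supremum is attained at a jump, so it suffices to compare F with the e.d.f. just below and just above each order statistic. The standard Normal d.f. and the sample size are arbitrary choices.

```python
import math
import numpy as np

def phi(x):  # standard Normal d.f. via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance(data, F):
    xs = np.sort(np.asarray(data, dtype=float))
    n = xs.size
    Fx = np.array([F(v) for v in xs])
    upper = np.arange(1, n + 1) / n - Fx  # e.d.f. just above each jump minus F
    lower = Fx - np.arange(0, n) / n      # F minus e.d.f. just below each jump
    return float(max(upper.max(), lower.max()))

rng = np.random.default_rng(5)
d = ks_distance(rng.normal(size=2_000), phi)
print(round(d, 3))
```

For a correctly specified F the distance is small, of order 1/√n, which is exactly the scaling in the Kolmogorov-Smirnov limit theorem a few slides below.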

60 Consider a sequence of (i.i.d.) r.v.'s {X_n}_{n=1}^∞ governed by a d.f. F(x) and the sequence of empirical d.f.'s F^(n)_emp(x) = (1/n) Σ_{i=1}^n I{X_i(ω) < x}. Direct computation gives: E I{X_i(ω) < x} = 1 · P(X_i(ω) < x) = F(x). Then the strong law of large numbers implies: (1/n) Σ_{i=1}^n (I{X_i(ω) < x} - F(x)) = F^(n)_emp(x) - F(x) → 0 a.s. as n → ∞.

61 Under the same framework we have: E I²{X_i(ω) < x} = E I{X_i(ω) < x} = F(x), so var(I{X_i(ω) < x}) = F(x)(1 - F(x)) = σ² < ∞. Then the CLT implies: (1/(σ √n)) Σ_{i=1}^n (I{X_i(ω) < x} - F(x)) → N(0, 1) in distribution, i.e. √n (F^(n)_emp(x) - F(x)) → N(0, σ²) in distribution.

62 Under the same framework Kolmogorov and Smirnov proved: P(√n sup_{x ∈ R} |F^(n)_emp(x) - F(x)| ≤ x) → Q(x), where Q(x) = 0 for x ≤ 0 and Q(x) = Σ_{l=-∞}^{∞} (-1)^l exp(-2l²x²) for x > 0.

63 What is the Kolmogorov-Smirnov distance?

64 The answer is: the yellow segment.

65 Consider one fixed red d.f. F(x) and a green sequence of d.f.'s {F_n(x)}_{n=1}^∞. Assume that for n → ∞ the brown segment converges to 0. Does the green sequence of d.f.'s converge to the red d.f.? The answer is (unfortunately, and against a natural feeling) NO (!?).

66 Verify it.

67 That is why Prokhorov proposed the following. DEFINITION: Prokhorov distance. For two d.f.'s F(x) and G(x), the value π(F(x), G(x)) = inf_{ε > 0} {ε : F(x) < G(x + ε) + ε and G(x) < F(x + ε) + ε for all x ∈ R} is called the Prokhorov distance.

68 So, the Prokhorov distance is given by the horizontal (or vertical) yellow segment. Verify it.

69 Now, the green sequence converges to the red one in the Prokhorov distance. Verify it.

70 Cramér-von Mises distance. An alternative to the K-S and Prokhorov distances may be: DEFINITION: Cramér-von Mises distance. Let F(x) and G(x) be two d.f.'s. The (measure of) distance given by d_CM(F(x), G(x)) = ∫ [F(x) - G(x)]² dG(x) is called the Cramér-von Mises distance. REMARK 8. There are many other proposals of distances between d.f.'s. About one of them, the so-called χ²-distance, we shall speak in the next term in Statistics III.
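A numeric sketch of the Cramér-von Mises distance: substituting u = G(x) turns the integral of (F - G)² dG into an integral over (0, 1) of (F(G^{-1}(u)) - u)², which we approximate on a grid. The Normal d.f.'s, the crude bisection quantile routine, and the grid size are all arbitrary choices for the illustration.

```python
import math
import numpy as np

def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_quantile(u, mu=0.0, sigma=1.0):
    # crude bisection inverse of norm_cdf, enough for a sketch
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid, mu, sigma) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def cvm_distance(F, G_quantile, m=2_000):
    # midpoint rule for the integral over (0,1) of (F(G^{-1}(u)) - u)^2 du
    u = (np.arange(m) + 0.5) / m
    diffs = np.array([F(G_quantile(v)) - v for v in u])
    return float((diffs ** 2).mean())

same = cvm_distance(norm_cdf, norm_quantile)
shifted = cvm_distance(lambda x: norm_cdf(x, mu=1.0), norm_quantile)
print(round(same, 6), round(shifted, 4))
```

The distance of a d.f. to itself is (numerically) zero, while shifting the location by one gives a clearly positive value.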

71 So, we may continue a bit more generally. Let d(F_emp(x), F_θ(x)) be a (measure of) distance of F_emp(x) and F_θ(x) (which need not necessarily be a metric). DEFINITION: The minimal distance estimator. The estimator given as a solution of the extremal problem θ̂^(MD) = arg min_{θ ∈ Θ} d(F_emp(x), F_θ(x)) is called a minimal distance estimator.
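An added sketch of the definition with d chosen as the Kolmogorov-Smirnov distance and the Normal location family F_θ(x) = Φ(x - θ): scan a grid of θ values and keep the one whose d.f. is closest to the e.d.f. The true θ, the sample size, and the grid are arbitrary choices.

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance(xs_sorted, F):
    # sup |F_emp - F| for continuous F, checked at the order statistics
    n = xs_sorted.size
    Fx = np.array([F(v) for v in xs_sorted])
    return float(max((np.arange(1, n + 1) / n - Fx).max(),
                     (Fx - np.arange(0, n) / n).max()))

rng = np.random.default_rng(6)
theta_true = 1.5
xs = np.sort(rng.normal(loc=theta_true, size=1_000))

grid = np.linspace(0.0, 3.0, 301)
dists = [ks_distance(xs, lambda x, t=t: norm_cdf(x - t)) for t in grid]
theta_md = float(grid[int(np.argmin(dists))])
print(round(theta_md, 2))
```

With this particular d, the grid minimizer is exactly the minimax estimator of the earlier slide, restricted to the grid.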

72 An alternative way of constructing a minimal distance estimator can be:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 Let the parameter space be Θ ⊂ R^p (p ∈ N).
3 Consider a family of densities {f_θ(x)}_{θ ∈ Θ}.
4 Somebody brought data, say x^(n) = (x_1, x_2, ..., x_n).
5 Let us draw the histogram, see the next slide.

73 This is a histogram. We may try to fit one density from our family to it.

74 This is a density with too small a variance.

75 This is a density with too large a variance.

76 These are densities with too small and too large location parameters, respectively.

77 Finally, we have found the fit.

78 So, we may continue a bit more generally. Let d(f_emp(x), f_θ(x)) be a (measure of) distance of f_emp(x) and f_θ(x) (which need not necessarily be a metric). DEFINITION: The minimal distance estimator. The estimator given as a solution of the extremal problem θ̂^(MD) = arg min_{θ ∈ Θ} d(f_emp(x), f_θ(x)) is called a minimal distance estimator.

79 EXAMPLE: Let f(x) and g(x) be two densities and α ∈ (1, ∞). Then the value div_α(f, g) = { ∫ |f(x) - g(x)|^α dx }^{1/α} is called the α-divergence of the densities f(x) and g(x).
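A numeric sketch of the α-divergence above, read as an L^α distance between densities, for two Normal densities approximated by a trapezoidal rule on a grid. The choice of α = 2, the Normal densities, and the integration range are arbitrary.

```python
import math
import numpy as np

def norm_pdf(x, mu=0.0):
    return np.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def alpha_divergence(f, g, alpha=2.0, lo=-10.0, hi=10.0, m=20_001):
    # trapezoidal rule for ( integral of |f - g|^alpha dx )^(1/alpha)
    xs = np.linspace(lo, hi, m)
    vals = np.abs(f(xs) - g(xs)) ** alpha
    dx = xs[1] - xs[0]
    integral = float(((vals[:-1] + vals[1:]) / 2.0 * dx).sum())
    return integral ** (1.0 / alpha)

same = alpha_divergence(norm_pdf, norm_pdf)
shifted = alpha_divergence(norm_pdf, lambda x: norm_pdf(x, mu=1.0))
print(round(same, 6), round(shifted, 3))
```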

80 SUMMARY. Notions to keep in mind to understand the next lectures and to pass the exam:
1 point and interval estimators,
2 the method of moments, the least squares, maximum likelihood,
3 minimal distance estimators - examples and a general point of view.

81 SUMMARY (continued). Keep in mind that we have denoted an estimator by the same letter as the estimated parameter, but with a hat above the given letter, e.g.:
1 θ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ,
2 or β̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ,
3 or μ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ,
4 or σ̂²_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ.
We shall keep this convention for the rest of the term.

82 SUMMARY (continued). What we are going to do in the next lecture:
1 we'll continue studying point estimation,
2 plausible properties of estimators and their sense.

83 End of the Thirteenth Lecture. Thanks for your attention.


Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Chapter 7. Hypothesis Testing

Chapter 7. Hypothesis Testing Chapter 7. Hypothesis Testing Joonpyo Kim June 24, 2017 Joonpyo Kim Ch7 June 24, 2017 1 / 63 Basic Concepts of Testing Suppose that our interest centers on a random variable X which has density function

More information

5.2 Expounding on the Admissibility of Shrinkage Estimators

5.2 Expounding on the Admissibility of Shrinkage Estimators STAT 383C: Statistical Modeling I Fall 2015 Lecture 5 September 15 Lecturer: Purnamrita Sarkar Scribe: Ryan O Donnell Disclaimer: These scribe notes have been slightly proofread and may have typos etc

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11 Econometrics A Keio University, Faculty of Economics Simple linear model (2) Simon Clinet (Keio University) Econometrics A October 16, 2018 1 / 11 Estimation of the noise variance σ 2 In practice σ 2 too

More information

A Brief History of Statistics (Selected Topics)

A Brief History of Statistics (Selected Topics) A Brief History of Statistics (Selected Topics) ALPHA Seminar August 29, 2017 2 Origin of the word Statistics Derived from Latin statisticum collegium ( council of state ) Italian word statista ( statesman

More information

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests)

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests) Dr. Maddah ENMG 617 EM Statistics 10/15/12 Nonparametric Statistics (2) (Goodness of fit tests) Introduction Probability models used in decision making (Operations Research) and other fields require fitting

More information

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Parameter Estimation of the Stable GARCH(1,1)-Model

Parameter Estimation of the Stable GARCH(1,1)-Model WDS'09 Proceedings of Contributed Papers, Part I, 137 142, 2009. ISBN 978-80-7378-101-9 MATFYZPRESS Parameter Estimation of the Stable GARCH(1,1)-Model V. Omelchenko Charles University, Faculty of Mathematics

More information

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22 MLE and GMM Li Zhao, SJTU Spring, 2017 Li Zhao MLE and GMM 1 / 22 Outline 1 MLE 2 GMM 3 Binary Choice Models Li Zhao MLE and GMM 2 / 22 Maximum Likelihood Estimation - Introduction For a linear model y

More information

Bayesian Linear Regression [DRAFT - In Progress]

Bayesian Linear Regression [DRAFT - In Progress] Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory

More information

1 Review of The Learning Setting

1 Review of The Learning Setting COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Minimum Message Length Analysis of the Behrens Fisher Problem

Minimum Message Length Analysis of the Behrens Fisher Problem Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Statistics and Econometrics I

Statistics and Econometrics I Statistics and Econometrics I Point Estimation Shiu-Sheng Chen Department of Economics National Taiwan University September 13, 2016 Shiu-Sheng Chen (NTU Econ) Statistics and Econometrics I September 13,

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

Chapter 11. Hypothesis Testing (II)

Chapter 11. Hypothesis Testing (II) Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let

More information

Chapter 4. Theory of Tests. 4.1 Introduction

Chapter 4. Theory of Tests. 4.1 Introduction Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Lecture 2. Simple linear regression

Lecture 2. Simple linear regression Lecture 2. Simple linear regression Jesper Rydén Department of Mathematics, Uppsala University jesper@math.uu.se Regression and Analysis of Variance autumn 2014 Overview of lecture Introduction, short

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Cosmological parameters A branch of modern cosmological research focuses on measuring

More information

Large Sample Properties & Simulation

Large Sample Properties & Simulation Large Sample Properties & Simulation Quantitative Microeconomics R. Mora Department of Economics Universidad Carlos III de Madrid Outline Large Sample Properties (W App. C3) 1 Large Sample Properties (W

More information

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that Lecture 28 28.1 Kolmogorov-Smirnov test. Suppose that we have an i.i.d. sample X 1,..., X n with some unknown distribution and we would like to test the hypothesis that is equal to a particular distribution

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1 Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1

More information

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population Lecture 5 1 Lecture 3 The Population Variance The population variance, denoted σ 2, is the sum of the squared deviations about the population mean divided by the number of observations in the population,

More information