Solutions, 2018–2019. Contact: sei@mist.i.u-tokyo.ac.jp, http://www.stat.t.u-tokyo.ac.jp/~sei/lec-j.html

1.1  (frequency table not recovered)

1.2

1.3  See the R help: ?boxplot points to boxplot.stats, which does the computation. Quoting ?boxplot.stats: "The two hinges are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise." The hinges for small data sets are as follows (lower hinge, median, upper hinge):

    data x                    hinges
    1                         1, 1, 1
    1, 2                      1, 1.5, 2
    1, 2, 3                   1.5, 2.0, 2.5
    1, 2, 3, 4                1.5, 2.5, 3.5
    1, 2, 3, 4, 5             2, 3, 4
    1, 2, 3, 4, 5, 6          2, 3.5, 5
    1, 2, 3, 4, 5, 6, 7       2.5, 4.0, 5.5
    1, 2, 3, 4, 5, 6, 7, 8    2.5, 4.5, 6.5
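A quick check in R of the quoted behaviour; fivenum returns the same hinges that boxplot.stats uses, so the two can be compared with quantile directly:

    # Compare Tukey's hinges (as used by boxplot.stats / fivenum)
    # with the quartiles quantile(x, c(1,3)/4) for x = 1:n
    for (n in 1:8) {
      x <- 1:n
      h <- fivenum(x)[c(2, 4)]               # lower and upper hinge
      q <- unname(quantile(x, c(1, 3) / 4))  # first and third quartile (default type 7)
      cat("n =", n, " hinges:", h, " quartiles:", q, "\n")
    }
    # For odd n the hinges coincide with these quartiles; for even n they differ.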

1.4  Choose $a_i, b_i, c_i, d_i$ so that $a_1 d_1 > b_1 c_1$ and $a_2 d_2 > b_2 c_2$ but $(a_1+a_2)(d_1+d_2) < (b_1+b_2)(c_1+c_2)$ (an instance of Simpson's paradox). The original figure plots the points $(a_1, b_1)$, $(a_2, b_2)$, $(a_1+a_2, b_1+b_2)$, $(c_1, d_1)$, $(c_2, d_2)$ and $(c_1+c_2, d_1+d_2)$.

2.1, 2.2

2.3  0.65. With $np = 1$ fixed, $\binom{n}{k}p^k(1-p)^{n-k} \approx \frac{1}{k!}e^{-1}$ (the Poisson approximation).

2.4  We obtain an estimate of the probability and its standard error as follows: $\hat p = 0.3118$, $\sqrt{\hat p(1-\hat p)/N} = 0.0046$, which depend on the random seed. Here $N = 10^4$ denotes the number of experiments. The value we want to compute is
$$p = \sum_{\substack{i+j+k+l+m+r=10,\\ \max(i,j,k,l,m,r)=4}} \frac{10!}{i!\,j!\,k!\,l!\,m!\,r!}\left(\frac{1}{6}\right)^{10}.$$
One can obtain $p = 18774000/6^{10} = 0.3104876$ by a brute-force method (an R sketch is given below). If you are interested in a faster algorithm, refer to C. J. Corrado (2011), The exact distribution of the maximum, minimum and the range of Multinomial/Dirichlet and Multivariate Hypergeometric frequencies, Stat. Comput., 21, 349–359.

3.1, 3.2

3.3  $\hat y(x) = 7.1186 + 0.3763\,x$.

3.4  $\hat y(t) = 20.4 - 8.32\cos(2\pi t/12) - 5.95\sin(2\pi t/12)$.

3.5  $\hat a = \bar y$, $\hat b_i = r_{x_i y}\, s_y/s_{x_i}$.
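Both computations in 2.4 can be reproduced in R. The event is read here as "the most frequent face among 10 fair dice appears exactly 4 times", an interpretation of the constraint $\max(i,\dots,r)=4$ in the formula for $p$ rather than of the original problem statement:

    # Monte Carlo estimate with N = 10^4 trials (depends on the seed)
    set.seed(1)
    N <- 1e4
    hit <- replicate(N, max(tabulate(sample(1:6, 10, replace = TRUE), nbins = 6)) == 4)
    p.hat <- mean(hit)
    c(p.hat, sqrt(p.hat * (1 - p.hat) / N))

    # Exact value by brute force over all count vectors with sum 10 and maximum 4
    grid  <- as.matrix(expand.grid(rep(list(0:4), 6)))
    ok    <- rowSums(grid) == 10 & apply(grid, 1, max) == 4
    coefs <- apply(grid[ok, , drop = FALSE], 1, function(v) factorial(10) / prod(factorial(v)))
    sum(coefs) / 6^10     # 0.3104876 = 18774000 / 6^10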

3.5 (continued)  By direct computation, the regression equations are $\hat y(x_1, x_2) = x_1 + 3x_2$ and $\hat y(x_1) = \frac{3}{5} - \frac{19}{35}\,x_1$, respectively. The sign of the coefficient of $x_1$ is changed.

3.6  $P^2 = PP = P$ and $PX = X$.

3.7  Let $X = QR$ be the QR decomposition of $X$. Then the regression coefficient vector is
$$\hat\beta = (X^\top X)^{-1}X^\top y = (R^\top Q^\top QR)^{-1}R^\top Q^\top y = R^{-1}Q^\top y.$$
Let $z = Q^\top y$. Since $R$ is an upper triangular matrix, the equation $R\hat\beta = z$ is quickly solved by backward substitution. This algorithm is numerically more stable than solving the normal equations directly. In terms of numerical linear algebra, the condition number [1] of $R$ is much smaller than that of $X^\top X$. Here we only give an example (an R sketch is given below). Let
$$X = \begin{pmatrix} 1 & 100 \\ 0 & 1 \end{pmatrix}, \qquad y = \begin{pmatrix} 1 \\ 0.03 \end{pmatrix}.$$
Then the two equations $R\hat\beta = Q^\top y$ and $X^\top X\hat\beta = X^\top y$ are
$$\begin{pmatrix} 1 & 100 \\ 0 & 1 \end{pmatrix}\begin{pmatrix}\hat\beta_1\\ \hat\beta_2\end{pmatrix} = \begin{pmatrix}1\\ 0.03\end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} 1 & 100 \\ 100 & 10001 \end{pmatrix}\begin{pmatrix}\hat\beta_1\\ \hat\beta_2\end{pmatrix} = \begin{pmatrix}1\\ 100.03\end{pmatrix},$$
respectively. Examine the Gaussian elimination method. What happens if the 100.03 is rounded to 100.0?

[1] The condition number of a square matrix is defined as the ratio of the maximum singular value to the minimum singular value. A linear equation with a large condition number is hard to solve numerically.

4.1, 4.2

4.3  See the following table.

    decomposition                        defined for            R function
    spectral decomposition               symmetric matrices     eigen
    singular value decomposition (SVD)   any matrix             svd
    Cholesky decomposition               positive definite      chol
    QR decomposition                     any matrix             qr

Other decompositions of square matrices include the Jordan canonical form, the Schur canonical form, the LU decomposition and the Sylvester canonical form. The spectral decomposition of a general square matrix is available only if the eigenvectors span the whole space.
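Returning to 3.7, a small R sketch fitting the same least-squares problem through the QR decomposition and through the normal equations (qr.coef, solve and kappa are base R):

    # Least squares via QR versus via the normal equations
    X <- matrix(c(1, 0, 100, 1), nrow = 2)    # columns (1, 0) and (100, 1)
    y <- c(1, 0.03)

    beta.qr <- qr.coef(qr(X), y)              # solves R beta = Q'y by back substitution
    beta.ne <- solve(t(X) %*% X, t(X) %*% y)  # normal equations X'X beta = X'y

    # Both agree here, but the normal equations square the condition number
    # (kappa(X'X) = kappa(X)^2), so they lose accuracy first.
    kappa(X, exact = TRUE)
    kappa(t(X) %*% X, exact = TRUE)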

4.4  Denote the spectral decomposition of $K$ by $K = \sum_{i=1}^n \lambda_i q_i q_i^\top$. Let $r = \min(n, p)$ and assume that $\lambda_1 > \cdots > \lambda_r > 0$. Then, for $1 \le i \le r$, the scores of the $i$-th principal component are given by $\sqrt{\lambda_i}\, q_i$ (an R sketch is given below). Indeed, let $X = \sum_{i=1}^r d_i u_i v_i^\top$ be the singular value decomposition. Then we have $K = \sum_{i=1}^r d_i^2 u_i u_i^\top$, and therefore $d_i = \sqrt{\lambda_i}$ and $u_i = q_i$ for $1 \le i \le r$.

4.5, 4.6, 5.1, 5.2, 5.3

5.4  $f(x) = \frac{81}{85}\,x_1 + x_2 - 1$.

5.5, 5.6

5.7  ROC curve with AUC = 0.75 (figure: true positive rate plotted against false positive rate, both ranging over 0.0–1.0).

5.8  If $(x, y)$ is a point on the ROC curve of $(X, Y)$, the corresponding point after the transformation is $(1-y, 1-x)$; the ROC curve is therefore symmetric about the line $y = 1 - x$, and the AUC is unchanged.

5.9, 6.1

6.2  Let $\hat h = \hat h(X_1, \dots, X_n)$ be an estimator of $h(\theta)$. Unbiasedness $h(\theta) = E_\theta[\hat h]$ means
$$h(\theta) = \sum_{x \in \{0,1\}^n} \hat h(x_1, \dots, x_n)\prod_{t=1}^n \theta^{x_t}(1-\theta)^{1-x_t}, \qquad \theta \in (0, 1).$$
For $h(\theta) = 1/\theta$ this is impossible: as $\theta \to 0$ the right-hand side converges to $\hat h(0, \dots, 0)$, which is finite, while $1/\theta \to \infty$.

6.3  $E[\hat\mu] = \sum_{i=1}^n w_i \mu = \mu$ holds if and only if $\sum_{i=1}^n w_i = 1$; the variance is then minimized under this constraint by a Lagrange multiplier argument.

6.4
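Returning to 4.4, a numerical check in R under the reading that $K = XX^\top$ for a centred data matrix $X$ (which is what the SVD argument indicates):

    # PCA scores from the spectral decomposition of K = X X' versus the SVD of X
    set.seed(1)
    n <- 6; p <- 3
    X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)

    K  <- X %*% t(X)
    ev <- eigen(K, symmetric = TRUE)
    sv <- svd(X)
    r  <- min(n, p)

    all.equal(ev$values[1:r], sv$d[1:r]^2)               # lambda_i = d_i^2
    scores.K   <- ev$vectors[, 1:r] %*% diag(sqrt(ev$values[1:r]))
    scores.svd <- sv$u %*% diag(sv$d)                    # the usual PC scores X v_i
    # The two sets of scores agree up to the sign of each column.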

6.5  (i) 1, (ii) $1/\theta$, (iii) $1/2$.

6.6  Writing $f_Y(y;\theta) = f_X(g(y);\theta)\,g'(y)$,
$$I_Y(\theta) = \int f_Y(y;\theta)\{\partial_\theta \log f_Y(y;\theta)\}^2\,dy = \int f_X(g(y);\theta)\,g'(y)\{\partial_\theta \log f_X(g(y);\theta)\}^2\,dy = \int f_X(x;\theta)\{\partial_\theta \log f_X(x;\theta)\}^2\,dx = I_X(\theta).$$

6.7
$$E[\{\partial_\theta \log f(X;\theta)\}^2] = \int \{\partial_\theta f(x;\theta)\}\{\partial_\theta \log f(x;\theta)\}\,dx = \partial_\theta\left\{\int f(x;\theta)\,\partial_\theta \log f(x;\theta)\,dx\right\} - \int f(x;\theta)\,\partial_\theta^2 \log f(x;\theta)\,dx = -E[\partial_\theta^2 \log f(X;\theta)],$$
where the term in braces vanishes because $\int f(x;\theta)\,\partial_\theta \log f(x;\theta)\,dx = \int \partial_\theta f(x;\theta)\,dx = 0$.

6.8

6.9  (i) $E_\theta[X] = \{(\theta - 1) + \theta + (\theta + 1)\}/3 = \theta$.
(ii) An estimator $\varphi(X)$ is unbiased if and only if $\{\varphi(\theta-1) + \varphi(\theta) + \varphi(\theta+1)\}/3 = \theta$ for all $\theta \in \mathbf{Z} = \{0, \pm 1, \dots\}$. Besides $\varphi(x) = x$, another solution is $\varphi(-1) = \varphi(0) = \varphi(1) = 0$, $\varphi(2) = \varphi(3) = \varphi(4) = 3$, $\varphi(5) = \varphi(6) = \varphi(7) = 6$, and so on; this $\varphi(X)$ has $V_\theta[\varphi(X)] = 0$ at $\theta = 0$, and the shifted version with $\varphi(0) = \varphi(1) = \varphi(2) = 0$, ... has zero variance at $\theta = 1$. An MVUE $\hat\theta$ would therefore need $V_\theta[\hat\theta] = 0$ at every $\theta$, forcing $\hat\theta(0) = \hat\theta(1) = \hat\theta(2) = \hat\theta(3) = \cdots$, which contradicts unbiasedness; hence no MVUE exists.

6.10  For the $N(\theta, 1)$ model, $\hat\theta = \bar X$ is unbiased for $\theta$, but $\hat\theta^2$ is not unbiased for $\theta^2$: $E[\hat\theta^2] = \theta^2 + 1/n$.

7.1

7.2  Let $\hat\theta$ be the MLE, so that $L(\theta) \le L(\hat\theta)$ for every $\theta$. Under a reparametrization $\varphi = h(\theta)$ the likelihood becomes $L(h^{-1}(\varphi))$, and for every $\varphi$ we have $L(h^{-1}(\varphi)) \le L(h^{-1}(\hat\varphi))$ with $\hat\varphi = h(\hat\theta)$, because $h^{-1}(\hat\varphi) = \hat\theta$. Hence the MLE of $\varphi$ is $h(\hat\theta)$.

7.3  Using $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$,
$$E[X] = \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^\alpha e^{-\beta x}\,dx = \frac{1}{\beta\Gamma(\alpha)}\int_0^\infty z^\alpha e^{-z}\,dz = \frac{\Gamma(\alpha+1)}{\beta\Gamma(\alpha)} = \frac{\alpha}{\beta},$$
$$V[X] = E[X^2] - E[X]^2 = \int_0^\infty \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha+1} e^{-\beta x}\,dx - \frac{\alpha^2}{\beta^2} = \frac{(\alpha+1)\alpha}{\beta^2} - \frac{\alpha^2}{\beta^2} = \frac{\alpha}{\beta^2}.$$
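A quick Monte Carlo check of the moments in 7.3 (rgamma parameterizes the Gamma distribution by shape $\alpha$ and rate $\beta$):

    # E[X] = alpha/beta and V[X] = alpha/beta^2 for Gamma(alpha, beta)
    set.seed(1)
    alpha <- 3; beta <- 2
    x <- rgamma(1e6, shape = alpha, rate = beta)
    c(mean(x), alpha / beta)     # both about 1.5
    c(var(x),  alpha / beta^2)   # both about 0.75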

7.4, 7.5

7.6  (ii) Negative binomial: $f(x;p) = \binom{r+x-1}{x}\exp\{x\log(1-p) + r\log p\}$, so $\theta = \log(1-p)$, $s(x) = x$ and $\psi(\theta) = -r\log p = -r\log(1 - e^\theta)$.
(iii) Multinomial: with $x_k = 1 - \sum_{i=1}^{k-1} x_i$,
$$f(x;p) = \exp\left\{\sum_{i=1}^{k-1} x_i \log(p_i/p_k) + \log p_k\right\},$$
so $\theta_i = \log(p_i/p_k)$, $s_i(x) = x_i$ ($1 \le i \le k-1$) and $\psi(\theta) = -\log p_k = \log\left(1 + \sum_{i=1}^{k-1} e^{\theta_i}\right)$.

7.7  For the exponential family $f(x;\theta) = a(x)\,e^{\theta s(x) - \psi(\theta)}$:
(i) $I(\theta) = -E_\theta[\partial_\theta^2 \log f(X;\theta)] = E_\theta[\psi''(\theta)] = \psi''(\theta)$.
(ii) From $E_\theta[\partial_\theta \log f(X;\theta)] = 0$ we get $\mu(\theta) = E_\theta[s(X)] = \psi'(\theta)$; by 7.1, $\psi''(\theta) > 0$, so $\mu(\theta)$ is strictly increasing.
(iii) $I(\mu) = I(\theta)/(d\mu/d\theta)^2$; by (i) and (ii), $I(\mu) = 1/\psi''(\theta)$.
(iv) $V_\theta[s(X_t)] = \psi''(\theta) = 1/I(\mu)$ (cf. 7.1), so the Cramér–Rao bound for $\mu$ is attained.

7.8, 7.9

8.1  $E[\cos(2\pi X_1)] = \int_0^1 \cos(2\pi x)\,dx = 0$ and $V[\cos(2\pi X_1)] = E[\cos^2(2\pi X_1)] = \int_0^1 \cos^2(2\pi x)\,dx = \frac12$. Hence $Z_n/\sqrt{n} \to N(0, 1/2)$ in distribution, where $Z_n = \sum_{t=1}^n \cos(2\pi X_t)$ (an R sketch is given below).

8.2, 8.3

8.4  (i) $\sqrt{n}(\hat p - p) \to N(0, p(1-p))$ in distribution. (ii) $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$.

8.5

8.6  With $\bar X = 0.99$, the estimate is $\hat\theta = 2(1 - \bar X) = 0.02$. The standard error of $\bar X$ is $0.134$, hence that of $\hat\theta$ is $2 \times 0.134 = 0.268$, and the 95% confidence interval is $0.02 \pm 1.96 \times 0.268 = 0.02 \pm 0.53$. Alternatively, $V[\hat\theta] = \frac{4}{n}V[X_1] = \frac{4}{n}\left(1 - \frac{\theta}{2}\right)\frac{\theta}{2}$; plugging in $\hat\theta = 0.02$ gives $V[\hat\theta]^{1/2} \approx 0.20$.

8.7

9.1  For the significance levels 0.05, 0.01, 0.001, the rejection region $R = \{x : |\bar X| \ge c\}$ uses $c = 1.96/\sqrt{n}$, $2.58/\sqrt{n}$, $3.29/\sqrt{n}$, respectively; the one-sided counterparts are $1.64/\sqrt{n}$, $2.33/\sqrt{n}$, $3.09/\sqrt{n}$.

9.2
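Returning to 8.1, a simulation of the limit in R:

    # CLT for Z_n = sum of cos(2*pi*X_t), X_t ~ Uniform(0, 1)
    set.seed(1)
    n <- 1000; reps <- 5000
    z <- replicate(reps, sum(cos(2 * pi * runif(n))) / sqrt(n))
    c(mean(z), var(z))    # approximately 0 and 1/2
    # qqnorm(z) shows an approximately normal shape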

9.3  (i) For the Bernoulli model, $L(\theta) = \prod_{t=1}^n \theta^{x_t}(1-\theta)^{1-x_t}$ and $\hat\theta = n^{-1}\sum_{t=1}^n x_t$, so
$$\mathrm{LLR} = 2\log\frac{L(\hat\theta)}{L(\theta_0)} = 2\sum_{t=1}^n\left\{x_t\log\frac{\hat\theta}{\theta_0} + (1-x_t)\log\frac{1-\hat\theta}{1-\theta_0}\right\} = 2n\left\{\hat\theta\log\frac{\hat\theta}{\theta_0} + (1-\hat\theta)\log\frac{1-\hat\theta}{1-\theta_0}\right\}.$$
(ii) For the Poisson model, $\mathrm{LLR} = 2n\left\{\hat\theta\log\frac{\hat\theta}{\theta_0} - \hat\theta + \theta_0\right\}$.
(iii) $\mathrm{LLR} = 2n\left\{\log\frac{\hat\theta}{\theta_0} + \frac{\theta_0}{\hat\theta} - 1\right\}$.
(iv) For the normal model, $\mathrm{LLR} = n\left\{-\log\frac{\hat\sigma^2}{\sigma_0^2} - 1 + \frac{\hat\sigma^2 + (\hat\mu - \mu_0)^2}{\sigma_0^2}\right\}$.
In general, for an exponential family $f(x;\theta) = a(x)\,e^{\theta s(x) - \psi(\theta)}$ and the null hypothesis $\theta = \theta_0$,
$$\mathrm{LLR} = 2\sum_{t=1}^n \log\frac{f(x_t;\hat\theta)}{f(x_t;\theta_0)} = 2n\left\{(\hat\theta - \theta_0)\psi'(\hat\theta) - \psi(\hat\theta) + \psi(\theta_0)\right\}, \qquad \psi'(\hat\theta) = n^{-1}\sum_{t=1}^n s(x_t).$$

9.4, 9.5, 9.6

9.7  (i) The unrestricted MLE is $\hat\theta = x/n$. Under the null hypothesis $\theta_1 = \theta_3$, the MLE is $\tilde\theta_1 = \tilde\theta_3 = (x_1 + x_3)/(2n)$ and $\tilde\theta_2 = 1 - 2\tilde\theta_1$.
(ii) For $x = (17, 10, 13)$ and $n = 40$, $\hat\theta = \left(\frac{17}{40}, \frac{10}{40}, \frac{13}{40}\right)$ and $\tilde\theta = \left(\frac{x_1+x_3}{2n}, \frac{x_2}{n}, \frac{x_1+x_3}{2n}\right) = \left(\frac{15}{40}, \frac{10}{40}, \frac{15}{40}\right)$. The log-likelihood ratio statistic is
$$T(x) = 2\left\{17\log\frac{17}{15} + 10\log\frac{10}{10} + 13\log\frac{13}{15}\right\} = 0.535.$$
This is smaller than 3.84, the 5% point of the $\chi^2$ distribution with 1 degree of freedom, so the null hypothesis is not rejected; the p-value is 0.465 (an R sketch is given below).

9.8, 10.1

10.2  The likelihood function is $L(\mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\|y - \mu\|^2/(2\sigma^2)}$, $\mu \in M$, $\sigma^2 > 0$. The maximum likelihood estimator (MLE) of $\mu \in M$ and $\sigma^2 > 0$ is given by $\hat\mu = Py$ and $\hat\sigma^2 = \|y - Py\|^2/n$. Note that $\hat\sigma^2$ is not unbiased. Similarly, the MLE under the null hypothesis $\mu \in M_0$ is $\hat\mu_0 = P_0 y$ and $\hat\sigma_0^2 = \|y - P_0 y\|^2/n$. Then the log-likelihood ratio test statistic is
$$2\log\frac{L(\hat\mu, \hat\sigma^2)}{L(\hat\mu_0, \hat\sigma_0^2)} = -n\log\hat\sigma^2 - \frac{\|y - \hat\mu\|^2}{\hat\sigma^2} + n\log\hat\sigma_0^2 + \frac{\|y - \hat\mu_0\|^2}{\hat\sigma_0^2} = -n\log\hat\sigma^2 + n\log\hat\sigma_0^2 = n\log\frac{\|y - P_0 y\|^2}{\|y - P y\|^2}.$$
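Returning to 9.7 (ii), the statistic and the p-value can be reproduced in R:

    # Log-likelihood ratio test of theta1 = theta3 for the counts (17, 10, 13)
    x <- c(17, 10, 13); n <- sum(x)
    theta.hat   <- x / n
    theta.tilde <- c((x[1] + x[3]) / (2 * n), x[2] / n, (x[1] + x[3]) / (2 * n))
    T.stat  <- 2 * sum(x * log(theta.hat / theta.tilde))
    p.value <- pchisq(T.stat, df = 1, lower.tail = FALSE)
    c(T.stat, p.value)    # approximately 0.535 and 0.465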

10.3  $R^2 = \|Py - P_0 y\|^2 / \|y - P_0 y\|^2$ and the F statistic is
$$F(y) = \frac{\|Py - P_0 y\|^2/(p - p_0)}{\|y - Py\|^2/(n - p)}.$$
Since $\|y - P_0 y\|^2 = \|y - Py\|^2 + \|Py - P_0 y\|^2$,
$$R^2 = \frac{\|Py - P_0 y\|^2}{\|y - Py\|^2 + \|Py - P_0 y\|^2} = \frac{\frac{p - p_0}{n - p}F(y)}{1 + \frac{p - p_0}{n - p}F(y)},$$
so $R^2$ is a monotone increasing function of $F(y)$.

10.4  A statistical model for a paired sample is $X_i \sim N(\mu_i, \sigma^2/2)$ and $Y_i \sim N(\mu_i + a, \sigma^2/2)$, where $\mu_i$ and $a$ are unknown. The null hypothesis is $a = 0$. The t-test statistic is
$$T(x, y) = \frac{\sqrt{n}(\bar y - \bar x)}{\hat\sigma}, \qquad \hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n \left(y_i - x_i - (\bar y - \bar x)\right)^2,$$
with $n - 1$ degrees of freedom. A statistical model for unpaired two samples is $X_i \sim N(\mu, \sigma^2)$ and $Y_j \sim N(\mu + a, \sigma^2)$, where $\mu$ and $a$ are unknown. The null hypothesis is $a = 0$. Note that $\mu$ cannot depend on the index $i$, in contrast to the paired samples. The t-test statistic is
$$T'(x, y) = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,\frac{\bar y - \bar x}{\hat\sigma}, \qquad \hat\sigma^2 = \frac{1}{n_1 + n_2 - 2}\left(\sum_{i=1}^{n_1}(x_i - \bar x)^2 + \sum_{j=1}^{n_2}(y_j - \bar y)^2\right),$$
with $n_1 + n_2 - 2$ degrees of freedom. The estimate $\hat\sigma^2$ is called the pooled variance. Even if $n_1 = n_2$, the statistic $T'(x, y)$ is different from $T(x, y)$. Indeed, if $n_1 = n_2 = n$,
$$T'(x, y) = \frac{\sqrt{n}(\bar y - \bar x)}{\hat\tau}, \qquad \hat\tau^2 = \frac{1}{n-1}\sum_{i=1}^n\left\{(x_i - \bar x)^2 + (y_i - \bar y)^2\right\}.$$
It is easy to see that $T(x, y) > T'(x, y)$ if and only if $x$ and $y$ have positive correlation. For example, let $n_1 = n_2 = 2$, $(x_1, y_1) = (0, 0)$ and $(x_2, y_2) = (50, 51)$. Then $T(x, y) = 1$ and $T'(x, y) = 0.014$. The one-sided p-values are 0.25 and 0.495, respectively.
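The toy example at the end of 10.4 can be checked directly with t.test in R (one-sided alternatives reproduce the quoted p-values):

    # Paired versus unpaired (pooled) two-sample t-test on the toy data of 10.4
    x <- c(0, 50); y <- c(0, 51)
    t.test(y, x, paired = TRUE, alternative = "greater")     # t = 1,     p = 0.25
    t.test(y, x, var.equal = TRUE, alternative = "greater")  # t = 0.014, p = 0.495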

111 n e β x t y t n e β x t y t Lβ =, Lβ = e eβ x t 1 + e β x t y t! t=1 t=1 113 Y 1, Y 2 µ 1, µ 2 Y 1 + Y 2 µ 1 + µ 2 Y 1 + Y 2 Y 1, Y 2 PY 1 = y 1, Y 2 = y 2 PY 1 + Y 2 = y 1 + y 2 = µ y 1 1 y 1! e µ 1 µy2 2 y 2! e µ 2 µ 1 + µ 2 y 1+y 2 = e y 1 + y 2! µ 1 µ 2 114 i 87 ii y 1 + y 2! y 1!y 2! µ1 µ 1 + µ 2 y1 µ2 µ 1 + µ 2 Model fy ϕ ay, ϕ ψη ψ 1 µ Normal linear 2πϕ 1/2 e y η2 /2ϕ σ 2 2πϕ 1/2 e y2 /2ϕ η 2 /2 µ Logistic e ηy /e η + 1 1 1 loge η + 1 logµ/1 µ Poisson e ηy /y!e eη 1 1/y! e η log µ y2 115 116 Here is a part of the output: Coefficients: Estimate Std Error z value Pr> z Intercept -2421513 1206251-2007 00447 * stadiumhome 0420067 0218898 1919 00550 rank1 0051114 0024695 2070 00385 * rank2 0003833 0002246 1707 00879 --- Signif codes: 0 *** 0001 ** 001 * 005 01 1 The z value is the ratio of the estimate to the standard error For example, the z value of the intercept is 2421513/1206251 = 2007 Its p-value is P Z 2007 = 00447, where Z N0, 1 The variable stadium is a factor object and automatically encoded as 1 if stadium == Home and 0 if stadium == Away In the three explanatory variables, only rank1 is 5% significant 121 122 rg rf = 2 fx logfx/gxdx 0, g = f 68 123 Let ŷ k t be the fitted values predicted values of y t for each model k = 0, 1,, 5 The squared prediction error is n 1 n t=1 ỹ t ŷ k t 2, where n = 12 The AIC of the model k is given by AICk = n log ˆσ k 2 + 22k + 2, where ˆσ2 k = n 1 n t=1 y t ŷ k t 2 is the MLE of the variance parameter σ 2 By numerical computation, we obtain the following table of the prediction error and AIC k 0 1 2 3 4 5 prediction error 4888 121 134 108 107 094 AIC 5072 845 899 961 1057 297 9

The number $k$ which minimizes the prediction error is 5, and the $k$ which minimizes AIC is also 5. However, there is a large gap between the two models $k = 0$ and $k = 1$. Furthermore, in practice, the number of parameters of the model minimizing AIC is recommended to be at most $n/2$, where $n$ is the sample size. Then we may select the model $k = 1$.

12.4  The AIC values (up to an additive constant) of all submodels are shown in the following table, where "123" denotes the submodel using the variables $x_1, x_2, x_3$, and so on.

    model   1234    123     124     134     234
    AIC     61.71   61.07   83.37   71.12   60.60

    model   12      13      14      23      24      34
    AIC     82.66   71.56   87.66   60.10   81.91   69.39

    model   1       2       3       4       (none)
    AIC     86.33   81.19   69.63   85.79   84.39

The submodel selected by the backward selection method is 23, and the linear predictor is
$$\log\frac{\mu}{1-\mu} = 1.424 - 6.751\times 10^{-5}\,(\text{GDP per capita}) + 1.22\times 10^{-2}\,(\text{population density}),$$
where $\mu$ denotes the probability that the country is in Asia.

12.5  We first show that $E[\|P(Y - \mu)\|^2] = p$ for any orthogonal projection matrix $P$ onto a $p$-dimensional subspace. Indeed,
$$E[\|P(Y-\mu)\|^2] = E[(Y-\mu)^\top P^\top P(Y-\mu)] = E[\mathrm{tr}\{P(Y-\mu)(Y-\mu)^\top P^\top\}] \qquad (\mathrm{tr}(AB) = \mathrm{tr}(BA))$$
$$= \mathrm{tr}\{P\,E[(Y-\mu)(Y-\mu)^\top]\,P^\top\} = \mathrm{tr}(PP^\top) \qquad (Y - \mu \sim N(0, I_n))$$
$$= \mathrm{tr}(P^2) = \mathrm{tr}(P) = p.$$
(i) Since $Y$ and $\tilde Y$ are i.i.d., we have
$$E[\|\tilde Y - PY\|^2] = E[\|(\tilde Y - \mu) + (\mu - P\mu) + (P\mu - PY)\|^2] = E[\|\tilde Y - \mu\|^2] + \|\mu - P\mu\|^2 + E[\|P(Y - \mu)\|^2] = n + \|\mu - P\mu\|^2 + p.$$
(ii) In a similar manner, we obtain
$$E[\|Y - PY\|^2] = E[\|(I_n - P)Y\|^2] = E[\|(I_n - P)(Y - \mu)\|^2] + \|(I_n - P)\mu\|^2 = n - p + \|\mu - P\mu\|^2.$$
(iii) The log-likelihood function is
$$\log L(\mu) = -\frac{n}{2}\log(2\pi) - \frac12\|Y - \mu\|^2.$$
The MLE of $\mu$ in the subspace $M$ is $\hat\mu = PY$. Therefore the AIC of the model $M$ is the same as $\|Y - PY\|^2 + 2p$ except for the constant term $n\log(2\pi)$. Finally, we obtain from the results of (ii) and (i):
$$E[\|Y - PY\|^2 + 2p] = \|\mu - P\mu\|^2 + n - p + 2p = E[\|\tilde Y - PY\|^2].$$
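A simulation sketch of the identity proved in 12.5; the projection is taken onto the column space of an arbitrary design matrix, and the particular X and mu below are made up for illustration:

    # Check E||Ytilde - P Y||^2 = E[ ||Y - P Y||^2 + 2p ] for Y, Ytilde iid N(mu, I_n)
    set.seed(1)
    n <- 20; p <- 3
    X  <- matrix(rnorm(n * p), n, p)
    P  <- X %*% solve(t(X) %*% X, t(X))   # orthogonal projection onto col(X)
    mu <- rnorm(n)                        # true mean, not necessarily in col(X)

    reps <- 20000
    lhs <- rhs <- numeric(reps)
    for (r in 1:reps) {
      Y  <- mu + rnorm(n)
      Yt <- mu + rnorm(n)
      lhs[r] <- sum((Yt - P %*% Y)^2)
      rhs[r] <- sum((Y - P %*% Y)^2) + 2 * p
    }
    c(mean(lhs), mean(rhs))   # the two averages agree (both equal n + ||mu - P mu||^2 + p)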