sei@mist.i.u-tokyo.ac.jp

In R, see ?boxplot. The actual computation is carried out by boxplot.stats, whose help page explains which version of the quartiles is used:

"The two hinges are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise."

The hinges (lower hinge, median, upper hinge), to be compared with quantile(x, c(1,3)/4), are:

    x                          hinges
    1                          1, 1, 1
    1, 2                       1, 1.5, 2
    1, 2, 3                    1.5, 2.0, 2.5
    1, 2, 3, 4                 1.5, 2.5, 3.5
    1, 2, 3, 4, 5              2, 3, 4
    1, 2, 3, 4, 5, 6           2, 3.5, 5
    1, 2, 3, 4, 5, 6, 7        2.5, 4.0, 5.5
    1, 2, 3, 4, 5, 6, 7, 8     2.5, 4.5, 6.5
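As a cross-check of the table above, the hinges can be computed directly as medians of the lower and upper halves of the sorted data, where both halves include the overall median when $n$ is odd. The text works in R; the following is an equivalent sketch in Python (the function names here are ours, not from the text):

```python
def median(v):
    """Median of a list of numbers."""
    v = sorted(v)
    n, mid = len(v), len(v) // 2
    return v[mid] if n % 2 == 1 else (v[mid - 1] + v[mid]) / 2

def hinges(x):
    """Tukey's hinges: medians of the lower/upper halves.
    For odd n the overall median belongs to both halves."""
    x = sorted(x)
    n = len(x)
    half = (n + 1) // 2          # lower half includes the median when n is odd
    return median(x[:half]), median(x), median(x[n - half:])

for n in range(1, 9):
    print(list(range(1, n + 1)), hinges(list(range(1, n + 1))))
```

Running this reproduces every row of the table, e.g. hinges([1, 2, 3, 4]) is (1.5, 2.5, 3.5).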
1.4 One can have $a_1 d_1 > b_1 c_1$ and $a_2 d_2 > b_2 c_2$, and yet $(a_1 + a_2)(d_1 + d_2) < (b_1 + b_2)(c_1 + c_2)$, for suitable positive $a_i, b_i, c_i, d_i$ (Simpson's paradox). (Figure: the vectors $(a_1, b_1)$, $(a_2, b_2)$ and their sum $(a_1 + a_2, b_1 + b_2)$, together with $(c_1, d_1)$, $(c_2, d_2)$ and $(c_1 + c_2, d_1 + d_2)$; the inequalities compare the slopes of these vectors.)

With $p = 1/n$, $\binom{n}{k} p^k (1-p)^{n-k} \to \frac{1}{k!} e^{-1}$ as $n \to \infty$ (the Poisson limit).

2.4 We obtain an estimate of the probability and its standard error as follows: $\hat p = 0.3118$ and $\sqrt{\hat p(1-\hat p)/N} = 0.0046$, which depend on the random seed. Here $N = 10^4$ denotes the number of experiments. The value we want to compute is
$$p = \sum_{\substack{i+j+k+l+m+r = 10 \\ \max(i,j,k,l,m,r) = 4}} \frac{10!}{i!\,j!\,k!\,l!\,m!\,r!} \left(\frac{1}{6}\right)^{10}.$$
One can obtain $p$ exactly by a brute-force method. If you are interested in a faster algorithm, refer to C. J. Corrado (2011), "The exact distribution of the maximum, minimum and the range of multinomial/Dirichlet and multivariate hypergeometric frequencies", Stat. Comput., 21.

The fitted regression equations are $\hat y(x) = \cdots + \cdots x$ and $\hat y(t) = \cdots + \cdots \cos(2\pi t/12) + \cdots \sin(2\pi t/12)$.

3.5 $\hat a = \bar y$, $\hat b_i = r_{x_i y}\, s_y / s_{x_i}$.
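The Monte Carlo estimate in 2.4 is easy to reproduce. The following Python sketch (the text presumably uses R; the seed and loop structure are our own choices) estimates the probability that the maximum frequency among the six faces in ten rolls of a fair die equals 4:

```python
import random

random.seed(1)
N = 10_000                        # number of experiments
hits = 0
for _ in range(N):
    counts = [0] * 6
    for _ in range(10):           # ten rolls of a fair die
        counts[random.randrange(6)] += 1
    if max(counts) == 4:          # event: the maximum frequency equals 4
        hits += 1

p_hat = hits / N
se = (p_hat * (1 - p_hat) / N) ** 0.5
print(p_hat, se)                  # should be near 0.3118 and 0.0046
```

The exact answer is the multinomial sum displayed above; the simulation agrees with it to within a couple of standard errors.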
By direct computation, the regression equations are $\hat y(x_1, x_2) = \cdots + \cdots x_1 + 3 x_2$ and $\hat y(x_1) = \cdots + \cdots x_1$, respectively. The sign of the coefficient of $x_1$ is changed.

$P^2 = P$, $P^\top = P$, and $PX = X$.

Let $X = QR$ be the QR decomposition of $X$. Then the regression coefficient vector is $\hat\beta = (X^\top X)^{-1} X^\top y = (R^\top Q^\top Q R)^{-1} R^\top Q^\top y = R^{-1} Q^\top y$. Let $z = Q^\top y$. Since $R$ is an upper triangular matrix, the equation $R\hat\beta = z$ is quickly solved by backward substitution. This algorithm is numerically more stable than solving the normal equations directly. In terms of numerical linear algebra, the condition number(1) of $R$ is much smaller than that of $X^\top X$. Here we only give an example: let $X = \cdots$ and $y = \cdots$. Then the two equations $R\hat\beta = Q^\top y$ and $X^\top X\hat\beta = X^\top y$ are $\cdots$, respectively. Examine the Gaussian elimination method: what happens if $\cdots$ is rounded to 1.000?

4.1 See the following table.

                                          R function   applicable to
    spectral decomposition                eigen        square
    singular value decomposition (SVD)    svd          any
    Cholesky decomposition                chol         symmetric positive definite
    QR decomposition                      qr           any

Other decompositions include the Jordan canonical form, the Schur canonical form, the LU decomposition, and the Sylvester canonical form. The spectral decomposition is available only if the eigenvectors span the whole space.

(1) The condition number of a square matrix is defined as the ratio of the maximum singular value to the minimum singular value. A linear equation with a large condition number is hard to solve numerically.
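The claim that the QR route and the normal equations give the same coefficients, and that the condition number of $X^\top X$ is the square of that of $X$, can be checked numerically. A Python/NumPy sketch with an arbitrary well-conditioned matrix (not the example elided above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

# QR route: X = QR, then solve the triangular system R beta = Q^T y
Q, R = np.linalg.qr(X)                  # reduced QR: Q is 20x3, R is 3x3
beta_qr = np.linalg.solve(R, Q.T @ y)   # back-substitution on triangular R

# Normal-equation route: (X^T X) beta = X^T y
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(beta_qr, beta_ne))    # same solution on this example

# cond(X^T X) = cond(X)^2, which is why the QR route is more stable
print(np.allclose(np.linalg.cond(X.T @ X), np.linalg.cond(X) ** 2))
```

For an ill-conditioned $X$ the two routes can differ substantially, which is the point of the rounding question above.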
4.4 Denote the spectral decomposition of $K$ by $K = \sum_{i=1}^n \lambda_i q_i q_i^\top$. Let $r = \min(n, p)$ and assume that $\lambda_1 > \cdots > \lambda_r > 0$. Then, for $1 \le i \le r$, the scores of the $i$-th principal component are given by $\sqrt{\lambda_i}\, q_i$. Indeed, let $X = \sum_{i=1}^r d_i u_i v_i^\top$ be the singular value decomposition. Then we have $K = \sum_{i=1}^r d_i^2 u_i u_i^\top$ and therefore $d_i = \sqrt{\lambda_i}$ and $u_i = q_i$ for $1 \le i \le r$.

4.5, 4.6 $f(x) = x_1 + \cdots$

(Figure: an ROC curve, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis; AUC = 0.75.)

5.8 If $(x, y)$ is a point of the ROC curve of $(X, Y)$, then $(1 - y, 1 - x)$ is a point of the ROC curve with the roles exchanged: the curve is reflected across the line $y = 1 - x$, and the AUC is preserved accordingly.

5.9 Let $\hat h = \hat h(X_1, \ldots, X_n)$ be an unbiased estimator of $h(\theta)$:
$$h(\theta) = E_\theta[\hat h] = \sum_{x \in \{0,1\}^n} \hat h(x_1, \ldots, x_n) \prod_{t=1}^n \theta^{x_t} (1-\theta)^{1-x_t}, \quad \theta \in (0, 1).$$
For $h(\theta) = 1/\theta$: as $\theta \to 0$ the right-hand side converges to $\hat h(0, \ldots, 0)$, which is finite, whereas $1/\theta \to \infty$. Hence no unbiased estimator exists.

6.3 $E[\hat\mu] = \sum_{i=1}^n w_i \mu = \mu$ because $\sum_{i=1}^n w_i = 1$. The variance is minimized subject to this constraint by the method of Lagrange multipliers.

6.4
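The identities $d_i^2 = \lambda_i$ and $u_i = q_i$ in 4.4 can be verified numerically. A Python/NumPy sketch with arbitrary data (note that singular vectors and eigenvectors are only determined up to a sign):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 5
X = rng.standard_normal((n, p))
X = X - X.mean(axis=0)             # centered data matrix

K = X @ X.T                        # Gram matrix K = X X^T
lam, Q = np.linalg.eigh(K)         # eigh returns ascending eigenvalues
lam, Q = lam[::-1], Q[:, ::-1]     # reorder to descending

U, d, Vt = np.linalg.svd(X, full_matrices=False)

r = min(n, p)
print(np.allclose(d[:r] ** 2, lam[:r]))   # d_i^2 = lambda_i
for i in range(r):
    # scores d_i u_i agree with sqrt(lambda_i) q_i up to sign
    s1, s2 = d[i] * U[:, i], np.sqrt(lam[i]) * Q[:, i]
    assert np.allclose(s1, s2) or np.allclose(s1, -s2)
```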
6.5 (i) $1$, (ii) $1/\theta$, (iii) $1/2$.

6.6 With the change of variables $x = g(y)$ (so that $f_Y(y;\theta) = f_X(g(y);\theta)\, g'(y)$, where the Jacobian does not depend on $\theta$),
$$I_Y(\theta) = \int f_Y(y;\theta)\{\partial_\theta \log f_Y(y;\theta)\}^2\, dy = \int f_X(g(y);\theta)\, g'(y)\{\partial_\theta \log f_X(g(y);\theta)\}^2\, dy = \int f_X(x;\theta)\{\partial_\theta \log f_X(x;\theta)\}^2\, dx = I_X(\theta).$$

6.7
$$E[\{\partial_\theta \log f(X;\theta)\}^2] = \int \{\partial_\theta f(x;\theta)\}\{\partial_\theta \log f(x;\theta)\}\, dx = \partial_\theta\left\{\int f(x;\theta)\, \partial_\theta \log f(x;\theta)\, dx\right\} - \int f(x;\theta)\, \partial_\theta^2 \log f(x;\theta)\, dx = -E[\partial_\theta^2 \log f(X;\theta)],$$
where the first term vanishes because $\int f(x;\theta)\, \partial_\theta \log f(x;\theta)\, dx = \int \partial_\theta f(x;\theta)\, dx = 0$.

6.9 (i) $E_\theta[X] = \{(\theta - 1) + \theta + (\theta + 1)\}/3 = \theta$. (ii) An estimator $\phi(x)$ is unbiased iff $\{\phi(\theta-1) + \phi(\theta) + \phi(\theta+1)\}/3 = \theta$ for every $\theta \in \mathbb{Z} = \{0, \pm 1, \ldots\}$. For example, starting from $\phi(-1) = \phi(0) = \phi(1) = 0$, one is forced to $\phi(2) = \phi(3) = \phi(4) = 3$, $\phi(5) = \phi(6) = \phi(7) = 6$, and so on. An MVUE $\hat\theta$ would have to attain variance $V_\theta[\hat\theta] = 0$ at both $\theta = 0$ and $\theta = 1$, which forces $\hat\theta(-1) = \hat\theta(0) = \hat\theta(1) = 0$ and $\hat\theta(0) = \hat\theta(1) = \hat\theta(2) = 1$ simultaneously, a contradiction; hence no MVUE exists.

6.10 For $N(\theta, 1)$, $\hat\theta = \bar X$ is unbiased for $\theta$, but $\hat\theta^2$ is not unbiased for $\theta^2$: $E[\hat\theta^2] = \theta^2 + 1/n$. For any $\theta \ne \hat\theta$ we have $L(\theta) < L(\hat\theta)$. If $\phi = h(\theta)$ is a one-to-one reparametrization, the likelihood as a function of $\phi$ is $L(h^{-1}(\phi))$, and $L(h^{-1}(\phi)) \le L(h^{-1}(\hat\phi))$ holds exactly when $h^{-1}(\hat\phi) = \hat\theta$; hence the MLE transforms as $\hat\phi = h(\hat\theta)$.

7.3 Using $\Gamma(\alpha + 1) = \alpha\Gamma(\alpha)$:
$$E[X] = \int_0^\infty \frac{\beta^\alpha x^\alpha e^{-\beta x}}{\Gamma(\alpha)}\, dx = \frac{1}{\beta\,\Gamma(\alpha)} \int_0^\infty z^\alpha e^{-z}\, dz = \frac{\Gamma(\alpha+1)}{\beta\,\Gamma(\alpha)} = \frac{\alpha}{\beta},$$
$$V[X] = E[X^2] - E[X]^2 = \int_0^\infty \frac{\beta^\alpha x^{\alpha+1} e^{-\beta x}}{\Gamma(\alpha)}\, dx - \frac{\alpha^2}{\beta^2} = \frac{(\alpha+1)\alpha}{\beta^2} - \frac{\alpha^2}{\beta^2} = \frac{\alpha}{\beta^2}.$$
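The moments $E[X] = \alpha/\beta$ and $V[X] = \alpha/\beta^2$ derived in 7.3 can be checked by simulation. A Python sketch using the standard library's gamma sampler (note that random.gammavariate takes the scale $1/\beta$, not the rate $\beta$; the parameter values are arbitrary):

```python
import random

random.seed(0)
alpha, beta = 3.5, 2.0            # shape and rate

n = 200_000
xs = [random.gammavariate(alpha, 1 / beta) for _ in range(n)]  # scale = 1/rate
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

print(mean, alpha / beta)         # E[X] = alpha/beta = 1.75
print(var, alpha / beta ** 2)     # V[X] = alpha/beta^2 = 0.875
```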
7.4 (i) $\cdots$ (ii) The negative binomial distribution is an exponential family:
$$f(x; p) = \binom{r+x-1}{x} \exp\{x \log(1-p) + r \log p\},$$
with $\theta = \log(1-p)$, $s(x) = x$, and $\psi(\theta) = -r \log p = -r \log(1 - e^\theta)$. (iii) For the multinomial distribution, with $x_k = 1 - \sum_{i=1}^{k-1} x_i$,
$$f(x; p) = \exp\left\{\sum_{i=1}^{k-1} x_i \log(p_i/p_k) + \log p_k\right\},$$
with $\theta_i = \log(p_i/p_k)$, $s_i(x) = x_i$ ($1 \le i \le k-1$), and $\psi(\theta) = -\log p_k = \log(1 + \sum_{i=1}^{k-1} e^{\theta_i})$.

7.7 Let $f(x;\theta) = a(x)\, e^{\theta s(x) - \psi(\theta)}$. (i) $I(\theta) = -E_\theta[\partial_\theta^2 \log f(X;\theta)] = E_\theta[\psi''(\theta)] = \psi''(\theta)$. (ii) From $E_\theta[\partial_\theta \log f(X;\theta)] = 0$, $\mu(\theta) := E_\theta[s(X)] = \psi'(\theta)$. By 7.1, $\psi''(\theta) > 0$, so $\mu(\theta)$ is strictly increasing. (iii) $I(\mu) = I(\theta)/(d\mu/d\theta)^2$; by (i) and (ii), $I(\mu) = 1/\psi''(\theta)$. (iv) $V_\theta[s(X_t)] = \psi''(\theta) = 1/I(\mu)$, so the sample mean of the $s(X_t)$ attains the Cramér–Rao bound for $\mu$.

$E[\cos(2\pi X_1)] = \int_0^1 \cos(2\pi x)\, dx = 0$ and $V[\cos(2\pi X_1)] = E[\cos^2(2\pi X_1)] = \int_0^1 \cos^2(2\pi x)\, dx = \frac{1}{2}$. Hence $Z_n/\sqrt{n} \to N(0, 1/2)$ in distribution.

(i) $N(0, p(1-p))$. (ii) $\hat p \pm 1.96\sqrt{\hat p(1-\hat p)/n}$.

With $\bar X = 0.99$, $\hat\theta = 2(1 - \bar X) = 0.02$. Since $V[\hat\theta] = \frac{4}{n} V[X_1] = \frac{4}{n}(1 - \theta/2)(\theta/2)$, plugging in $\hat\theta = 0.02$ gives the 95% confidence interval $0.02 \pm 0.053$.

9.1 For significance levels $0.05$, $0.01$, $0.001$, the two-sided rejection regions are $R = \{|\bar X| \ge c\}$ with $c = 1.96/\sqrt{n}$, $2.58/\sqrt{n}$, $3.29/\sqrt{n}$, respectively; for the one-sided test, $c = 1.64/\sqrt{n}$, $2.33/\sqrt{n}$, $3.09/\sqrt{n}$.

9.2
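The critical values 1.96, 2.58, 3.29 (and 1.64, 2.33, 3.09 for the one-sided test) quoted in 9.1 are standard normal quantiles. A quick check in Python using the standard library:

```python
from statistics import NormalDist

z = NormalDist()                          # standard normal
for alpha in (0.05, 0.01, 0.001):
    two_sided = z.inv_cdf(1 - alpha / 2)  # c with P(|Z| >= c) = alpha
    one_sided = z.inv_cdf(1 - alpha)      # c with P(Z >= c) = alpha
    print(alpha, round(two_sided, 2), round(one_sided, 2))
# prints 1.96/1.64, 2.58/2.33, 3.29/3.09, matching the text
```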
9.3 (i) $L(\theta) = \prod_{t=1}^n \theta^{x_t}(1-\theta)^{1-x_t}$ and $\hat\theta = n^{-1}\sum_{t=1}^n x_t$, so
$$\mathrm{LLR} = 2\log\frac{L(\hat\theta)}{L(\theta_0)} = 2\sum_{t=1}^n\left\{x_t \log\frac{\hat\theta}{\theta_0} + (1-x_t)\log\frac{1-\hat\theta}{1-\theta_0}\right\} = 2n\left\{\hat\theta\log\frac{\hat\theta}{\theta_0} + (1-\hat\theta)\log\frac{1-\hat\theta}{1-\theta_0}\right\}.$$
(ii) $\mathrm{LLR} = 2n\{\hat\theta\log(\hat\theta/\theta_0) - \hat\theta + \theta_0\}$ (Poisson). (iii) $\mathrm{LLR} = 2n\{\log(\hat\theta/\theta_0) - 1 + \theta_0/\hat\theta\}$ (exponential). (iv) $\mathrm{LLR} = n\{\log(\sigma_0^2/\hat\sigma^2) - 1 + (\hat\sigma^2 + (\hat\mu - \mu_0)^2)/\sigma_0^2\}$ (normal). For the exponential family $f(x;\theta) = a(x)e^{\theta s(x) - \psi(\theta)}$, the LLR for testing $\theta = \theta_0$ is
$$\mathrm{LLR} = 2\sum_{t=1}^n \log\frac{f(x_t;\hat\theta)}{f(x_t;\theta_0)} = 2n\{(\hat\theta - \theta_0)\psi'(\hat\theta) - \psi(\hat\theta) + \psi(\theta_0)\}, \quad \text{where } \psi'(\hat\theta) = n^{-1}\sum_{t=1}^n s(x_t).$$

9.4, 9.5 (i) The MLE is $\hat\theta = x/n$. Under the restriction $\theta_1 = \theta_3$, the MLE is $\bar\theta_1 = \bar\theta_3 = (x_1 + x_3)/(2n)$, $\bar\theta_2 = 1 - 2\bar\theta_1$. (ii) With $x = (17, 10, 13)$ and $n = 40$, the MLE is $\hat\theta = x/n = (17/40, 10/40, 13/40)$ and $\bar\theta = ((x_1+x_3)/(2n),\ x_2/n,\ (x_1+x_3)/(2n)) = (15/40, 10/40, 15/40)$. The test statistic is
$$T(x) = 2\left(17\log\frac{17}{15} + 10\log\frac{10}{10} + 13\log\frac{13}{15}\right) = 0.535,$$
which is smaller than the 5% critical value $3.84$ of the $\chi_1^2$ distribution, so the null hypothesis is not rejected.

10.1, 10.2 The likelihood function is $L(\mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} e^{-\|y-\mu\|^2/(2\sigma^2)}$, $\mu \in M$, $\sigma^2 > 0$. The maximum likelihood estimator (MLE) of $\mu \in M$ and $\sigma^2 > 0$ is given by $\hat\mu = Py$ and $\hat\sigma^2 = \|y - Py\|^2/n$. Note that $\hat\sigma^2$ is not unbiased. Similarly, the MLE under the null hypothesis $\mu \in M_0$ is $\hat\mu_0 = P_0 y$ and $\hat\sigma_0^2 = \|y - P_0 y\|^2/n$. Then the log-likelihood ratio test statistic is
$$2\log\frac{L(\hat\mu, \hat\sigma^2)}{L(\hat\mu_0, \hat\sigma_0^2)} = -n\log\hat\sigma^2 - \frac{\|y - \hat\mu\|^2}{\hat\sigma^2} + n\log\hat\sigma_0^2 + \frac{\|y - \hat\mu_0\|^2}{\hat\sigma_0^2} = -n\log\hat\sigma^2 + n\log\hat\sigma_0^2 = n\log\frac{\|y - P_0 y\|^2}{\|y - P y\|^2},$$
where the middle step uses $\|y - \hat\mu\|^2/\hat\sigma^2 = \|y - \hat\mu_0\|^2/\hat\sigma_0^2 = n$.
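The statistic $T(x) = 0.535$ in 9.5 above can be reproduced directly (a Python sketch):

```python
import math

x = [17, 10, 13]                  # observed counts, n = 40
n = sum(x)

theta_hat = [xi / n for xi in x]                 # unrestricted MLE
t1 = (x[0] + x[2]) / (2 * n)                     # MLE under theta_1 = theta_3
theta_bar = [t1, x[1] / n, t1]                   # = (15/40, 10/40, 15/40)

# log-likelihood ratio statistic: 2 * sum_i x_i log(theta_hat_i / theta_bar_i)
T = 2 * sum(xi * math.log(h / b) for xi, h, b in zip(x, theta_hat, theta_bar))
print(round(T, 3))                # 0.535, below the 5% critical value 3.84
```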
10.3 Since $P P_0 = P_0$, we have $\|y - P_0 y\|^2 = \|y - Py\|^2 + \|Py - P_0 y\|^2$. With $R^2 = \|Py - P_0 y\|^2/\|y - P_0 y\|^2$ and the F statistic $F(y) = \frac{\|Py - P_0 y\|^2/(p - p_0)}{\|y - Py\|^2/(n - p)}$, it follows that
$$R^2 = \frac{\frac{p - p_0}{n - p} F(y)}{1 + \frac{p - p_0}{n - p} F(y)},$$
so $R^2$ is an increasing function of $F(y)$.

10.4 A statistical model for a paired sample is $X_i \sim N(\mu_i, \sigma^2/2)$ and $Y_i \sim N(\mu_i + a, \sigma^2/2)$, where $\mu_i$ and $a$ are unknown. The null hypothesis is $a = 0$. The t-test statistic is
$$T(x, y) = \frac{\sqrt{n}(\bar y - \bar x)}{\hat\sigma}, \quad \hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n \{y_i - x_i - (\bar y - \bar x)\}^2,$$
with $n - 1$ degrees of freedom. A statistical model for unpaired two samples is $X_i \sim N(\mu, \sigma^2)$ and $Y_j \sim N(\mu + a, \sigma^2)$, where $\mu$ and $a$ are unknown. The null hypothesis is $a = 0$. Note that $\mu$ cannot depend on the index $i$, in contrast to the paired samples. The t-test statistic is
$$T'(x, y) = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, \frac{\bar y - \bar x}{\hat\sigma'}, \quad \hat\sigma'^2 = \frac{1}{n_1 + n_2 - 2}\left\{\sum_{i=1}^{n_1}(x_i - \bar x)^2 + \sum_{j=1}^{n_2}(y_j - \bar y)^2\right\},$$
with $n_1 + n_2 - 2$ degrees of freedom. The estimate $\hat\sigma'^2$ is called the pooled variance. Even if $n_1 = n_2$, the statistic $T'(x, y)$ is different from $T(x, y)$. Indeed, if $n_1 = n_2 = n$,
$$T'(x, y) = \frac{\sqrt{n}(\bar y - \bar x)}{\hat\tau}, \quad \hat\tau^2 = \frac{1}{n-1}\sum_{i=1}^n\{(x_i - \bar x)^2 + (y_i - \bar y)^2\}.$$
It is easy to see that $|T(x, y)| > |T'(x, y)|$ if and only if $x$ and $y$ have positive sample correlation. For example, let $n_1 = n_2 = 2$, $(x_1, y_1) = (0, 0)$ and $(x_2, y_2) = (50, 51)$. Then $T(x, y) = 1$ and $T'(x, y) = 0.014$. The p-value for each statistic is $0.25$ and $0.495$, respectively.

10.5 Let $y_{it}$ ($1 \le i \le 3$, $1 \le t \le 4$) be the observed data. The statistical model is $Y_{it} = a_i + \varepsilon_{it}$, $\varepsilon_{it} \sim N(0, \sigma^2)$. The F-test statistic for the null hypothesis $a_1 = a_2 = a_3$ is
$$F = \frac{\sum_{i=1}^3 \sum_{t=1}^4 (\bar y_i - \bar y)^2/(3-1)}{\sum_{i=1}^3 \sum_{t=1}^4 (y_{it} - \bar y_i)^2/(12-3)} = \frac{83.5/2}{50.5/9} = \frac{41.75}{5.61} = 7.44.$$
In summary, we obtain the following analysis-of-variance (ANOVA) table:

                 sum of squares   degrees of freedom   variance   F-value   p-value
    motor             83.5                2              41.75      7.44     0.012
    residuals         50.5                9               5.61
    total            134.0               11

The p-value is smaller than 0.05 (i.e., significant at the level 0.05), and therefore we reject the null hypothesis $a_1 = a_2 = a_3$. In fact, the motor A3 seems to have better performance than the others, since $\bar y_1 = 15.52$, $\bar y_2 = 15.72$ and $\bar y_3 = \cdots$
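The numerical example in 10.4 ($T = 1$ for the paired test, $T' = 0.014$ for the unpaired test) can be reproduced as follows (a Python sketch of the formulas above):

```python
import math

x = [0.0, 50.0]
y = [0.0, 51.0]
n = len(x)

# paired t statistic: work with the differences d_i = y_i - x_i
d = [yi - xi for xi, yi in zip(x, y)]
dbar = sum(d) / n
s2 = sum((di - dbar) ** 2 for di in d) / (n - 1)
T_paired = math.sqrt(n) * dbar / math.sqrt(s2)

# unpaired t statistic with the pooled variance
xbar, ybar = sum(x) / n, sum(y) / n
pooled = (sum((xi - xbar) ** 2 for xi in x)
          + sum((yi - ybar) ** 2 for yi in y)) / (2 * n - 2)
T_unpaired = math.sqrt(n * n / (2 * n)) * (ybar - xbar) / math.sqrt(pooled)

print(T_paired, round(T_unpaired, 3))   # 1.0 and 0.014
```

The strongly correlated pairs make the paired test far more sensitive here, which is exactly the point of the example.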
11.1 The likelihood functions are
$$L(\beta) = \prod_{t=1}^n \frac{(e^{\beta^\top x_t})^{y_t}}{1 + e^{\beta^\top x_t}} \quad \text{(logistic regression)}, \qquad L(\beta) = \prod_{t=1}^n \frac{(e^{\beta^\top x_t})^{y_t}\, e^{-e^{\beta^\top x_t}}}{y_t!} \quad \text{(Poisson regression)}.$$

11.3 If $Y_1$ and $Y_2$ are independent Poisson variables with means $\mu_1$ and $\mu_2$, then $Y_1 + Y_2$ is Poisson with mean $\mu_1 + \mu_2$, and conditionally on $Y_1 + Y_2$, the variable $Y_1$ is binomial:
$$\frac{P(Y_1 = y_1, Y_2 = y_2)}{P(Y_1 + Y_2 = y_1 + y_2)} = \frac{\frac{\mu_1^{y_1}}{y_1!}e^{-\mu_1}\,\frac{\mu_2^{y_2}}{y_2!}e^{-\mu_2}}{\frac{(\mu_1+\mu_2)^{y_1+y_2}}{(y_1+y_2)!}e^{-(\mu_1+\mu_2)}} = \frac{(y_1+y_2)!}{y_1!\, y_2!}\left(\frac{\mu_1}{\mu_1+\mu_2}\right)^{y_1}\left(\frac{\mu_2}{\mu_1+\mu_2}\right)^{y_2}.$$

The exponential-family form of the generalized linear models:

    Model           $f(y)$                                       $\phi$       $a(y,\phi)$                           $\psi(\eta)$         $(\psi')^{-1}(\mu)$
    Normal linear   $(2\pi\phi)^{-1/2}e^{-(y-\eta)^2/(2\phi)}$   $\sigma^2$   $(2\pi\phi)^{-1/2}e^{-y^2/(2\phi)}$   $\eta^2/2$           $\mu$
    Logistic        $e^{\eta y}/(e^\eta + 1)$                    $1$          $1$                                   $\log(e^\eta + 1)$   $\log(\mu/(1-\mu))$
    Poisson         $e^{\eta y}/(y!\, e^{e^\eta})$               $1$          $1/y!$                                $e^\eta$             $\log\mu$

Here is a part of the output:

    Coefficients:
                 Estimate  Std. Error  z value  Pr(>|z|)
    (Intercept)     ...        ...       ...      ...    *
    stadiumHome     ...        ...       ...      ...
    rank1           ...        ...       ...      ...    *
    rank2           ...        ...       ...      ...
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

The z value is the ratio of the estimate to the standard error. For example, the z value of the intercept is (estimate)/(standard error) = 2.007. Its p-value is $P(|Z| \ge 2.007) = 0.0447$, where $Z \sim N(0, 1)$. The variable stadium is a factor object and is automatically encoded as 1 if stadium == "Home" and 0 if stadium == "Away". Of the three explanatory variables, only rank1 is 5% significant.

$r(g) - r(f) = 2\int f(x)\log\frac{f(x)}{g(x)}\, dx \ge 0$, with equality if and only if $g = f$.

Let $\hat y_t^{(k)}$ be the fitted values (predicted values) of $y_t$ for each model $k = 0, 1, \ldots, 5$. The squared prediction error is $n^{-1}\sum_{t=1}^n (\tilde y_t - \hat y_t^{(k)})^2$, where $n = 12$. The AIC of the model $k$ is given by $\mathrm{AIC}(k) = n\log\hat\sigma_k^2 + 2(k+2)$, where $\hat\sigma_k^2 = n^{-1}\sum_{t=1}^n (y_t - \hat y_t^{(k)})^2$ is the MLE of the variance parameter $\sigma^2$. By numerical computation, we obtain the following table of the prediction error and the AIC:

    k     prediction error     AIC
    0           ...            ...
    ...         ...            ...
    5           ...            ...
The number $k$ which minimizes the prediction error is 5, and the $k$ which minimizes the AIC is also 5. However, there is a large gap between the two models $k = 0$ and $k = 1$. Furthermore, in practice, the number of parameters of the model minimizing the AIC is recommended to be at most $n/2$, where $n$ is the sample size. Then we may select the model $k = 1$.

12.3 The AIC values (up to an additive constant) of all submodels are shown in the following table, where 123 denotes the submodel using the variables $x_1, x_2, x_3$, and so on:

    model   AIC     model   AIC     model   AIC
     ...    ...      ...    ...      ...    ...

The submodel selected by the backward selection method is 23, and the linear predictor is
$$\log\frac{\mu}{1-\mu} = \cdots + \cdots \times (\text{GDP per capita}) + \cdots \times (\text{population density}),$$
where $\mu$ denotes the probability that the country is in Asia.

12.5 We first show that $E[\|P(Y - \mu)\|^2] = p$ for any orthogonal projection matrix $P$ onto a $p$-dimensional subspace. Indeed,
$$E[\|P(Y-\mu)\|^2] = E[(Y-\mu)^\top P^\top P (Y-\mu)] = E[\mathrm{tr}\{P(Y-\mu)(Y-\mu)^\top P^\top\}] \quad (\mathrm{tr}(AB) = \mathrm{tr}(BA))$$
$$= \mathrm{tr}\{P\, E[(Y-\mu)(Y-\mu)^\top]\, P^\top\} = \mathrm{tr}(P P^\top) \quad (Y - \mu \sim N(0, I_n))$$
$$= \mathrm{tr}(P^2) = \mathrm{tr}(P) = p.$$
(i) Since $Y$ and $\tilde Y$ are i.i.d., we have
$$E[\|\tilde Y - PY\|^2] = E[\|(\tilde Y - \mu) + (\mu - P\mu) + (P\mu - PY)\|^2] = E[\|\tilde Y - \mu\|^2] + \|\mu - P\mu\|^2 + E[\|P(Y - \mu)\|^2] = n + \|\mu - P\mu\|^2 + p.$$
(ii) In a similar manner, we obtain
$$E[\|Y - PY\|^2] = E[\|(I_n - P)Y\|^2] = E[\|(I_n - P)(Y - \mu)\|^2] + \|(I_n - P)\mu\|^2 = n - p + \|\mu - P\mu\|^2.$$
(iii) The log-likelihood function is
$$\log L(\mu) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\|Y - \mu\|^2.$$
The MLE of $\mu$ in the subspace $M$ is $\hat\mu = PY$. Therefore the AIC of the model $M$ is the same as $\|Y - PY\|^2 + 2p$ except for the constant term $n\log(2\pi)$. Finally, we obtain from the results of (ii) and (i):
$$E[\|Y - PY\|^2 + 2p] = \|\mu - P\mu\|^2 + n - p + 2p = E[\|\tilde Y - PY\|^2].$$
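The key step $E\|P(Y - \mu)\|^2 = \mathrm{tr}(P) = p$ in 12.5 can be illustrated numerically: the trace identity is exact for any orthogonal projection, and a simulation recovers the expectation (a Python/NumPy sketch with an arbitrary projection):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3

# orthogonal projection onto the column space of an arbitrary n x p matrix
A = rng.standard_normal((n, p))
Q, _ = np.linalg.qr(A)            # columns of Q: orthonormal basis of col(A)
P = Q @ Q.T

# P is symmetric and idempotent, and tr(P) = p
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
print(round(np.trace(P), 10))     # 3.0

# Monte Carlo check of E||P(Y - mu)||^2 = tr(P) = p
Z = rng.standard_normal((50_000, n))      # rows are Y - mu ~ N(0, I_n)
norms = ((Z @ P) ** 2).sum(axis=1)
print(norms.mean())               # close to 3
```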
More informationChapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests
Chapter 8: Hypothesis Testing Lecture 9: Likelihood ratio tests Throughout this chapter we consider a sample X taken from a population indexed by θ Θ R k. Instead of estimating the unknown parameter, we
More informationBayesian Inference. Chapter 9. Linear models and regression
Bayesian Inference Chapter 9. Linear models and regression M. Concepcion Ausin Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering
More informationMath 152. Rumbos Fall Solutions to Assignment #12
Math 52. umbos Fall 2009 Solutions to Assignment #2. Suppose that you observe n iid Bernoulli(p) random variables, denoted by X, X 2,..., X n. Find the LT rejection region for the test of H o : p p o versus
More informationChapter 17: Undirected Graphical Models
Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)
More informationTopic 19 Extensions on the Likelihood Ratio
Topic 19 Extensions on the Likelihood Ratio Two-Sided Tests 1 / 12 Outline Overview Normal Observations Power Analysis 2 / 12 Overview The likelihood ratio test is a popular choice for composite hypothesis
More informationSection 4.6 Simple Linear Regression
Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationCh. 5 Hypothesis Testing
Ch. 5 Hypothesis Testing The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s, early 30s, complementing Fisher s work on estimation. As in estimation,
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationMATH5745 Multivariate Methods Lecture 07
MATH5745 Multivariate Methods Lecture 07 Tests of hypothesis on covariance matrix March 16, 2018 MATH5745 Multivariate Methods Lecture 07 March 16, 2018 1 / 39 Test on covariance matrices: Introduction
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationSimple and Multiple Linear Regression
Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where
More informationwhere x and ȳ are the sample means of x 1,, x n
y y Animal Studies of Side Effects Simple Linear Regression Basic Ideas In simple linear regression there is an approximately linear relation between two variables say y = pressure in the pancreas x =
More informationNotes on the Multivariate Normal and Related Topics
Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions
More informationMIT Spring 2016
Generalized Linear Models MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Generalized Linear Models 1 Generalized Linear Models 2 Generalized Linear Model Data: (y i, x i ), i = 1,..., n where y i : response
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More information