The Written Master's Examination


The Written Master's Examination
Option Statistics and Probability
SPRING 9

Full points may be obtained for correct answers to 8 questions. Each numbered question (which may have several parts) is worth the same number of points. All answers will be graded, but the score for the examination will be the sum of the scores of your best 8 solutions.

Use separate answer sheets for each question. DO NOT PUT YOUR NAME ON YOUR ANSWER SHEETS. When you have finished, insert all your answer sheets into the envelope provided, then seal and print your name on it.

Any student whose answers need clarification may be required to submit to an oral examination.

MS Exam, Option Probability and Statistics, SPRING 9

1. (Stat 4) Let $U$ and $V$ be two independent, standard normal random variables.
(a) Find the joint distribution of $U + V$ and $U - V$.
(b) Find the distribution of $\dfrac{U + V}{\sqrt{(U - V)^2}}$.

2. (Stat 4) Let $X_1, \dots, X_n$ be independent random variables with $X_i$ distributed as $N(\beta a_i, \sigma^2)$, for $i = 1, \dots, n$, where $a_1, \dots, a_n$ are known (non-random) real numbers at least one of which is non-zero and $\sigma^2$ is known.
(i) Derive the maximum likelihood estimator (mle) of $\beta$.
(ii) Is the mle unbiased? Justify your answer.
(iii) Find the Rao-Cramer lower bound for unbiased estimators of $\beta$.
(iv) Is the mle of $\beta$ efficient (minimum variance unbiased)? Justify your answer.

3. (Stat 4) Consider one random variable $X$ that has a binomial distribution with $n = 4$ and $p = \theta$.
[1] Find the most powerful test of size $\alpha = 1/16$ based on $X$ only for $H_0: \theta = 0.5$ vs. $H_1: \theta = 0.75$.
[2] Calculate the power of your test.
[3] Justify that your test is the most powerful one of size $1/16$.

4. (Stat 46) A chemist wishes to test the effect of four chemical agents on the strength of a particular type of cloth. Because there might be variability from one bolt to another, the chemist decides to use a randomized block design, with the bolts of cloth considered as blocks. She selects five bolts and applies all four chemicals in random order to each bolt. The resulting tensile strengths follow.

Bolt         1    2    3    4    5
Chemical A   66   47   6    6    55
Chemical B   68   48   69   7    55
Chemical C   66   56   7    68   58
Chemical D   6    47   64   63   5

Describe the basic model of the Friedman test, give its test statistic $S$, and compute $S$. Explain also why the Friedman test is here more appropriate than the Kruskal-Wallis test.

5. (Stat 43) A client has a finite population of 6 units and has funds to survey only 3 units for the purpose of estimating the mean and its related standard deviation. There are two sampling plans to be considered by two survey practitioners.

Sampling Plan 1 by Survey Practitioner 1: Is a simple random sampling plan of size 3 without replacement, SRS(6, 3). Recall that this is a uniform sampling plan with all 20 samples of size 3 in its support.

Sampling Plan 2 by Survey Practitioner 2: Is a controlled sampling plan with only 14 samples:
A- The following six samples are excluded from being surveyed, that is, they have zero probability of selection: {1,2,3}, {,4,5}, {,5,6}, {,3,5}, {,4,6}, {3,4,6}.
B- The following six samples are assigned probability of 2/20 of selection each: {1,2,5}, {,3,5}, {,4,6}, {,3,4}, {,3,6}, {4,5,6}.
C- The remaining 8 samples are assigned probability of 1/20 of selection each.

Remark: Note that the total probability over the chosen 14 samples is 1 and the survey is not a uniform sampling plan since six of the samples are assigned twice as much probability as the remaining 8 samples.

Suppose both practitioners implemented their survey plans and the sample {1, 2, 5} is selected by both of them and the related survey data are: $Y_1$ = , $Y_2$ = 96, and $Y_5$ = 3.
1- What are the HT estimators and their values of the population mean under both sampling plans?
2- What are the standard deviations of your estimator under both sampling plans?
3- If you were the consulting statistician, which of the above two sampling plans would you recommend to your client and why?

6. (Stat 46) The number of customers entering a store on a given day is Poisson distributed with mean $\lambda$ = . The amount of money spent by a customer is uniformly distributed over (,). Find the mean and the variance of the amount of money that the store takes in on a given day.

7. (Stat 47) Home Depot stocks wooden planks of length 9 feet. Home builders buy such planks and cut them according to their needs. A home builder needs planks of length 7 feet, planks of length 5 feet, 3 planks of length 4 feet. Suppose only the following 3 cutting patterns are used.

7 Feet 5 Feet with cutting patterns 3 4 Feet 3

(a) How many 9 feet planks are needed to meet the requirements using only these cutting patterns? What is the total length of the wasted portion?
(b) Use the revised simplex method to check whether there is a potential cutting pattern that would reduce the number of planks needed.
(c) Determine which pattern has to be replaced by the entry of the new cutting pattern.
(d) How many 9 feet planks are needed to meet the requirements using only the new cutting patterns after discontinuing one of the old patterns?

8. (Stat 47) Show that $x_1$ = 5/6, $x_2$ = 5/ , $x_3$ = 7/6 is optimal to the linear programming problem: max 9 + 4 + 7 3 such that + + 3 6 3 5 + 4 + 3 3 5,, unrestricted.

9. (Stat 473) On a linear forest path of unit length covered with bushes, Joe chooses secretly a location $x$ to hide and Bob chooses a location $y$ to hide. The payoff to Joe from Bob is $|x - y| - (x - y)^2$. Show that the strategy which chooses a random point $y$ in $[0, 1]$ with density $f(y) = 1$, $0 \le y \le 1$, gives an expectation $= 1/6$ for all $x$. Find the value of the game.

10. (Stat 48) A dairy company would like to investigate if the milk is contaminated. In order to investigate a possible shipment (batch) effect, the company selects 5 shipments at random. After processing each batch, 6 cartons of milk are selected at random and are stored for several days. Then the square roots of the bacteria counts are recorded and denoted by $Y_{ij}$, where $i = 1, \dots, 5$ and $j = 1, \dots, 6$.
(a) If the shipment effect is denoted as $\tau_i$, write down the ANOVA model and specify the required distribution of the random components in the model. Please state the hypotheses.
(b) Complete the following table and conclude at the given significance level 0.01.

Sources     df    SS      MS    F
Shipment          83.3
Error
Total             36.

[Given: F(0.01, 5, 30) = 3.70, F(0.01, 5, 25) = 3.85, F(0.01, 4, 25) = 4.177]
(c) Estimate the variance component(s).

11. (Stat 48) In linear regression analysis, the leave-one-out cross validation (LOOCV) procedure is often used to estimate the prediction accuracy of the fitted regression model. Let $(X, Y) = \{(x_i, y_i);\ i = 1, \dots, n\}$ be a dataset with $x_i = (1, x_{i1}, \dots, x_{ip})' \in R^{p+1}$ and $y_i \in R$, and let $(X_{(i)}, Y_{(i)})$ be the modified dataset without the $i$-th row. A set of prediction values
$$\hat y_{(i)} = x_i' (X_{(i)}' X_{(i)})^{-1} X_{(i)}' Y_{(i)}$$
are obtained and the LOOCV estimate is defined as
$$\mathrm{LOOCV} = n^{-1} \sum_{i=1}^{n} e_{(i)}^2 \quad \text{with} \quad e_{(i)} = y_i - \hat y_{(i)}.$$
(a) Show that
$$(X_{(i)}' X_{(i)})^{-1} = (X'X)^{-1} + \frac{(X'X)^{-1} x_i x_i' (X'X)^{-1}}{1 - x_i' (X'X)^{-1} x_i}.$$
Hint: $(A - zz')^{-1} = A^{-1} + \dfrac{A^{-1} z z' A^{-1}}{1 - z' A^{-1} z}$, assuming that $A - zz'$ is invertible.
(b) Show that $e_{(i)} = \dfrac{e_i}{1 - h_i}$; $i = 1, \dots, n$, where $H = X (X'X)^{-1} X'$, $h_i$ is the $i$-th diagonal element of $H$ and $e = (e_1, \dots, e_n)' = (I - H) Y$.

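Statistics 4&48 MS Exam (Junhui Wang), Spring Semester 9

1. Let $U$ and $V$ be two independent, standard normal random variables.
(a) Find the joint distribution of $U + V$ and $U - V$.
(b) Find the distribution of $\dfrac{U + V}{\sqrt{(U - V)^2}}$.

[Solution]
(a) Since $(U, V)' \sim N(0, I_2)$, we have
$$\begin{pmatrix} U + V \\ U - V \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} U \\ V \end{pmatrix} \sim N(0, 2 I_2).$$
(b) From (a), $U + V \sim N(0, 2)$ and $U - V \sim N(0, 2)$, and thus
$$\frac{U + V}{\sqrt 2} \sim N(0, 1) \quad \text{and} \quad \left( \frac{U - V}{\sqrt 2} \right)^2 \sim \chi^2(1).$$
From (a), $U + V$ and $U - V$ are independent, which implies that $(U + V)/\sqrt 2$ and $\left((U - V)/\sqrt 2\right)^2$ are independent as well. Therefore, we have
$$\frac{U + V}{\sqrt{(U - V)^2}} = \frac{(U + V)/\sqrt 2}{\sqrt{\left((U - V)/\sqrt 2\right)^2}} \sim t(1)$$
based on the definition of the $t$ distribution.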
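A quick numerical check of the solution above can be sketched as follows. This is only an illustration, not part of the original solution: the helper names `covariance_of_transform` and `simulate_ratio`, the seed, and the number of draws are assumptions. It confirms that the transformation matrix gives covariance $2 I_2$, and that the simulated ratio falls in $[-1, 1]$ about half the time, as a $t(1)$ (Cauchy) variable should.

```python
import random

# Linear map taking (U, V) to (U + V, U - V).
A = [[1, 1], [1, -1]]


def covariance_of_transform(a):
    """Return a @ a^T, the covariance of a @ Z when Z ~ N(0, I)."""
    rows, cols = len(a), len(a[0])
    return [[sum(a[i][k] * a[j][k] for k in range(cols)) for j in range(rows)]
            for i in range(rows)]


def simulate_ratio(n_draws, seed=0):
    """Draw (U + V) / sqrt((U - V)^2) for independent standard normals."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((u + v) / abs(u - v))
    return draws


cov = covariance_of_transform(A)
ratios = simulate_ratio(20000)
# For a t(1) variable T, P(|T| <= 1) = 1/2.
frac_inside = sum(1 for r in ratios if abs(r) <= 1.0) / len(ratios)
print(cov, frac_inside)
```

The off-diagonal zeros in `cov` are what give independence of $U + V$ and $U - V$ under joint normality.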

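Stat 4, Estimation problem. Spring 9

Let $X_1, \dots, X_n$ be independent random variables with $X_i$ distributed as $N(\beta a_i, \sigma^2)$, for $i = 1, \dots, n$, where $a_1, \dots, a_n$ are known (non-random) real numbers at least one of which is non-zero and $\sigma^2$ is known.
(i) Derive the maximum likelihood estimator (mle) of $\beta$.
(ii) Is the mle unbiased? Justify your answer.
(iii) Find the Rao-Cramer lower bound for unbiased estimators of $\beta$.
(iv) Is the mle of $\beta$ efficient (minimum variance unbiased)? Justify your answer.

Solution:
(i) The likelihood is
$$L(\beta) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \beta a_i)^2 \right\}.$$
The mle $\hat\beta$ minimizes $h(\beta) = \sum_{i=1}^{n} (x_i - \beta a_i)^2$. The equation $h'(\beta) = 0$ has the solution
$$\hat\beta = \frac{\sum_{i=1}^{n} a_i x_i}{\sum_{i=1}^{n} a_i^2}.$$
Also, $h''(\hat\beta) > 0$. Hence $\hat\beta$ is the mle of $\beta$.
(ii) $E\hat\beta = \dfrac{\sum a_i (\beta a_i)}{\sum a_i^2} = \beta$ for all $\beta$; hence $\hat\beta$ is unbiased.
(iii) The information in the sample $X_1, \dots, X_n$ is
$$I(\beta) = -E\left[ \frac{\partial^2 \ln L(\beta)}{\partial \beta^2} \right] = \frac{\sum_{i=1}^{n} a_i^2}{\sigma^2}.$$
Hence the Rao-Cramer lower bound for the variance of an unbiased estimator of $\beta$ is $\sigma^2 / \sum_{i=1}^{n} a_i^2$.
(iv) $V(\hat\beta) = \dfrac{\sum a_i^2 \sigma^2}{\left(\sum a_i^2\right)^2} = \dfrac{\sigma^2}{\sum a_i^2}$, the Rao-Cramer lower bound. Hence $\hat\beta$ is minimum variance unbiased.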
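The two closed forms in this solution, the mle and the Rao-Cramer bound, are simple enough to compute directly. The sketch below is illustrative only: the function names `mle_beta` and `cramer_rao_bound` and the sample values are hypothetical, not from the exam.

```python
def mle_beta(x, a):
    """MLE of beta when X_i ~ N(beta * a_i, sigma^2): sum(a_i x_i) / sum(a_i^2)."""
    return sum(ai * xi for ai, xi in zip(a, x)) / sum(ai * ai for ai in a)


def cramer_rao_bound(a, sigma2):
    """Rao-Cramer lower bound sigma^2 / sum(a_i^2); also the variance of the MLE."""
    return sigma2 / sum(ai * ai for ai in a)


# Hypothetical data: x_i = 2 * a_i exactly, so the MLE should recover beta = 2.
a_values = [1.0, 2.0, 3.0]
x_values = [2.0, 4.0, 6.0]
beta_hat = mle_beta(x_values, a_values)
bound = cramer_rao_bound(a_values, sigma2=7.0)
print(beta_hat, bound)
```

Because the variance of the mle equals the bound for every $\beta$, no separate variance function is needed.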

[Stat 4, chap. 8, 9] (Jie Yang)

Consider one random variable $X$ that has a binomial distribution with $n = 4$ and $p = \theta$.
[1] Find the most powerful test of size $\alpha = 1/16$ for $H_0: \theta = 0.5$ vs. $H_1: \theta = 0.75$.
[2] Calculate the power of your test.
[3] Justify that your test is the most powerful one of size $1/16$.

[Solution]
[1] Since $X$ follows Binomial$(4, \theta)$, the likelihood is
$$L(\theta; x) = f(x; \theta) = \binom{4}{x} \theta^x (1 - \theta)^{4 - x}, \quad x = 0, 1, 2, 3, 4.$$
Therefore,
$$L(0.5; x) = \binom{4}{x} \frac{1}{16}, \quad L(0.75; x) = \binom{4}{x} \frac{3^x}{4^4}, \quad \frac{L(0.5; x)}{L(0.75; x)} = \frac{16}{3^x}.$$

x                        0       1        2        3         4
L(0.5; x)/L(0.75; x)     16      16/3     16/9     16/27     16/81
Pr(X = x | H_0)          1/16    4/16     6/16     4/16      1/16
Pr(X = x | H_1)          1/256   12/256   54/256   108/256   81/256

By the Neyman-Pearson theorem, the test with rejection region $C = \{x : L(0.5; x)/L(0.75; x) \le k\}$ is the most powerful one of size $\Pr(X \in C \mid H_0)$. Let $k = 16/81$; then $C = \{x : x = 4\}$ leads to the most powerful test of size $1/16$.
[2] The power is $\Pr(X = 4 \mid H_1) = 81/256$.
[3] The test is guaranteed by the Neyman-Pearson theorem to be the most powerful test. Alternative way to justify that: another test with rejection region $\{x : x = 0\}$ is of size $1/16$ too. The corresponding power is only $1/256$.
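The likelihood-ratio table and the resulting size and power can be reproduced with exact rational arithmetic. The following is a minimal sketch; the helper name `binom_pmf` and the variable names are illustrative choices, not from the original solution.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x, n, theta):
    """Binomial(n, theta) probability of observing x, kept exact with Fraction."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


n = 4
theta0, theta1 = Fraction(1, 2), Fraction(3, 4)

# One row per x: (x, likelihood ratio L0/L1, Pr under H0, Pr under H1).
table = []
for x in range(n + 1):
    p0 = binom_pmf(x, n, theta0)
    p1 = binom_pmf(x, n, theta1)
    table.append((x, p0 / p1, p0, p1))

# Neyman-Pearson: reject H0 when the ratio is at most k = 16/81.
k = Fraction(16, 81)
rejection_region = [x for x, ratio, _, _ in table if ratio <= k]
size = sum(p0 for x, _, p0, _ in table if x in rejection_region)
power = sum(p1 for x, _, _, p1 in table if x in rejection_region)

for row in table:
    print(row)
print(rejection_region, size, power)
```

Raising `k` to the next ratio, 16/27, would add $x = 3$ to the region and push the size to 5/16, which is why no non-randomized region other than $\{4\}$ and $\{0\}$ has size exactly 1/16.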

Solution Statistics 43-MS Exam Spring Semester 9

A client has a finite population of 6 units and has funds to survey only 3 units for the purpose of estimating the mean and its related standard deviation. There are two sampling plans to be considered by two survey practitioners.

Sampling Plan 1 by Survey Practitioner 1: Is a simple random sampling plan of size 3 without replacement, SRS(6, 3). Recall that this is a uniform sampling plan with all 20 samples of size 3 in its support.

Sampling Plan 2 by Survey Practitioner 2: Is a controlled sampling plan with only 14 samples:
A- The following six samples are excluded from being surveyed, that is, they have zero probability of selection: {1,2,3}, {,4,5}, {,5,6}, {,3,5}, {,4,6}, {3,4,6}.
B- The following six samples are assigned probability of 2/20 of selection each: {1,2,5}, {,3,5}, {,4,6}, {,3,4}, {,3,6}, {4,5,6}.
C- The remaining 8 samples are assigned probability of 1/20 of selection each.

Remark: Note that the total probability over the chosen 14 samples is 1 and the survey is not a uniform sampling plan since six of the samples are assigned twice as much probability as the remaining 8 samples.

Suppose both practitioners implemented their survey plans and the sample {1, 2, 5} is selected by both of them and the related survey data are: $Y_1$ = , $Y_2$ = 96, and $Y_5$ = 3.

1- What are the HT estimators and their values of the population mean under both sampling plans?
Answer: We observe that the first-order inclusion probabilities for both sampling designs are constant (3/6) for all 6 units. Thus, the HT estimator of the population mean under both sampling plans is simply the sample mean: ($Y_1$ + 96 + $Y_5$)/3.

2- What are the standard deviations of your estimator under both sampling plans?
Answer: We observe also that the second-order inclusion probabilities for both sampling plans are constant, $3(3-1)/[6(6-1)] = 1/5$, for all 15 distinct pairs. Thus, the design-unbiased estimator of the variance of the HT estimator is simply [(sample variance)/3] [1 - 3/6]. Compute the sample variance of the three observations, plug it into the expression, and take its positive square root.

3- If you were the consulting statistician, which of the above two sampling plans would you recommend to your client and why?
Answer: If there is no need or reason to exclude those samples which survey design 2 has excluded from the support of the sampling plan, then the first survey plan, which is SRS(6, 3), is preferred due to its statistical optimality (see the sampling book by Hedayat and Sinha). In addition, the statistician does not have to justify the exclusion of those samples which survey design 2 has excluded. However, if those samples which have received zero probability of selection under survey design 2 are to be excluded from the survey, then clearly the second survey design should be recommended.
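The inclusion probabilities quoted in the answers for plan 1 can be checked by brute-force enumeration of SRS(6, 3). The sketch below does that and also shows why the HT estimator of the mean collapses to the sample mean when every unit has inclusion probability 3/6. The function name `ht_mean` and the three observation values are hypothetical, chosen only for illustration; the controlled plan is not enumerated because several of its samples are not legible in the transcription.

```python
from fractions import Fraction
from itertools import combinations

N, n = 6, 3
units = range(1, N + 1)

# SRS(6, 3): every one of the C(6, 3) samples is equally likely.
samples = list(combinations(units, n))
prob = Fraction(1, len(samples))

# First- and second-order inclusion probabilities by enumeration.
first_order = {i: sum(prob for s in samples if i in s) for i in units}
second_order = {
    pair: sum(prob for s in samples if set(pair) <= set(s))
    for pair in combinations(units, 2)
}


def ht_mean(values, inclusion_probs, population_size):
    """Horvitz-Thompson estimator of the population mean."""
    total = sum(Fraction(y) / pi for y, pi in zip(values, inclusion_probs))
    return total / population_size


# Hypothetical observations on three sampled units, each with pi = 3/6.
y_sample = [10, 20, 30]
estimate = ht_mean(y_sample, [Fraction(n, N)] * n, N)
print(len(samples), first_order[1], second_order[(1, 2)], estimate)
```

With constant first-order probabilities $n/N$, the weights $1/\pi_i = N/n$ cancel the division by $N$, leaving $\sum y_i / n$.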

Solutions to OR Problems in Masters Exam Spring 9
T.E.S. Raghavan, Department of Mathematics, Statistics and Computer Science, 85, South Morgan Street #57, University of Illinois at Chicago, Chicago, IL 667-745. Email: ter@uic.edu
April 7, 9

Solution to Problem 7: Stat 47

From 9 feet boards we want planks 7 feet long, planks 5 feet long and 3 planks 4 feet long, only using cutting patterns 7 feet 5 feet using patterns 3,,. 4 feet 3

Say we need $x_1, x_2, x_3$ planks cut in the respective patterns. Solving 3 = 3 and rounding to the next highest integer, we get $x_1$ = 6, $x_2$ = 5, $x_3$ = 9. Since the fractional sum is itself > 7 we want to see whether we can manage with 8 planks. To check this we have to find a strict improvement, and this must correspond to a new cutting pattern brought in and one of the above patterns eliminated.

By the revised simplex method we solve for $uB = c_B$, where $u$ is the row vector and $c_B$ is the row vector [1, 1, 1]. We get $u = [u_1, u_2, u_3]$ = [ 4, u 7 =, ] 7 7. We now look for a cutting pattern $(a_1, a_2, a_3)'$

such that 4 7 $a_1$ + 7 $a_2$ + 7 $a_3$ > , 7$a_1$ + 5$a_2$ + 4$a_3$ 9. That is, 4$a_1$ + $a_2$ + $a_3$ > 7, 7$a_1$ + 5$a_2$ + 4$a_3$ 9.

The knapsack algorithm gives weights for $a_1, a_2, a_3$ as 4 7, 5, 4 respectively. Thus choosing $a_1$ = , $a_2$ = , $a_3$ = gives the new cutting pattern.

Using the revised simplex method we determine which pattern of the current basis has to be dropped. To do this we need to solve $Bd = a$, where we solve 3 d d d 3 =. We get $d_1$ = 4 7, $d_2$ = 5 7, $d_3$ = 7. The new solution clearly has to scrap the last cutting pattern, as $d_3$ alone is positive. Thus the new cutting patterns are 7 feet 5 feet 4 feet using patterns 3, 3,.

The new cuts, to the nearest integer not below the fractional solution, are $x_1$ = 6, $x_2$ = 9, $x_3$ = 3. This meets the demands and we waste (9)(8) ()(7) ()(5) (3)(4) = 3. Previously we wasted (9)(9) ()(7) ()(5) (3)(4) = 49 feet.

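Solution to Problem 8, Stat 47

The LP problem max 9 + 4 + 7 3 such that + + 3 3 6 5 + 4 + 3 + + 3 5,, 3 unrestricted has as its dual min 6y + y + 5y 3 such that y + 5y + y 3 = 9 y + 4y + y 3 = 4 3y + y + y 3 = 7 y, y, y 3, which has feasible solution $y_1$ = , $y_2$ = , $y_3$ = 4. Since $x_1$ = 5 6, $x_2$ = 5, $x_3$ = 7 6 is feasible for the primal and gives the value of the objective function as 44, and so does the dual solution, the two are optimal for the two problems by the duality theorem.

Solution to Problem 9, Stat 473

Given the payoff function $K(x, y) = |x - y| - (x - y)^2$, $0 \le x, y \le 1$, the expected payoff to player I using $x$ while player II chooses a $y$ at random using density $f(y) = 1$, $0 \le y \le 1$, is given by
$$\int_0^1 K(x, y)\,dy = \int_0^x (x - y)\,dy + \int_x^1 (y - x)\,dy - \int_0^1 (x^2 - 2xy + y^2)\,dy$$
$$= \left[ xy - \frac{y^2}{2} \right]_{y=0}^{x} + \left[ \frac{y^2}{2} - xy \right]_{y=x}^{1} - \left[ x^2 y - x y^2 + \frac{y^3}{3} \right]_{y=0}^{1}$$
$$= \frac{x^2}{2} + \left( \frac{1}{2} - x + \frac{x^2}{2} \right) - \left( x^2 - x + \frac{1}{3} \right) = \frac{1}{6}.$$
By the symmetry of the payoff function, the same uniform strategy is optimal for player I, giving an expectation $1/6$ for all $y$. Thus the value of the game is $1/6$.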
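The constant expectation in Problem 9 is easy to confirm numerically. The sketch below integrates the payoff against the uniform density with a midpoint rule; the function names `payoff` and `expected_payoff`, the grid size, and the test points are illustrative assumptions rather than part of the original solution.

```python
def payoff(x, y):
    """K(x, y) = |x - y| - (x - y)^2, the payoff to the player choosing x."""
    return abs(x - y) - (x - y) ** 2


def expected_payoff(x, n_cells=2000):
    """Midpoint-rule approximation of the integral of K(x, y) over y in [0, 1]."""
    h = 1.0 / n_cells
    return sum(payoff(x, (k + 0.5) * h) for k in range(n_cells)) * h


test_points = (0.0, 0.25, 0.5, 0.9, 1.0)
values = [expected_payoff(x) for x in test_points]
for x, v in zip(test_points, values):
    print(x, round(v, 6))
```

Every printed value should sit close to 1/6 regardless of $x$, which is exactly the equalizing property that makes the uniform strategy optimal.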

STAT 48 - Spring 9 (Jing Wang)

10. A dairy company would like to investigate if the milk is contaminated. In order to investigate a possible shipment (batch) effect, the company selects 5 shipments at random. After processing each batch, 6 cartons of milk are selected at random and are stored for several days. Then the square roots of the bacteria counts are recorded and denoted by $Y_{ij}$, where $i = 1, \dots, 5$ and $j = 1, \dots, 6$.

(a) If the shipment effect is denoted as $\tau_i$, write down the ANOVA model and specify the distribution of the random components in the model. Please state the hypotheses.
Solution: The shipment (batch) effect is a random effect. Hence a random-effect ANOVA model can be written as
$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \dots, 5, \ j = 1, \dots, 6.$$
In addition, $\tau_i$ and $\varepsilon_{ij}$ are independently distributed with respective normal distributions, $\tau_i \sim N(0, \sigma_\tau^2)$ and $\varepsilon_{ij} \sim N(0, \sigma^2)$ for any $i$ and $j$. To investigate whether the batch effect is significant is equivalent to testing $H_0: \sigma_\tau^2 = 0$ against $H_1: \sigma_\tau^2 > 0$.

(b) Complete the following table and conclude based on the 0.01 significance level.
Solution:

Sources     df    SS      MS      F
Shipment    4     803     200.8   9.0
Error       25    557     22.3
Total       29    1360

Since the observed statistic $F_o = 9.0 > F(0.01, 4, 25) = 4.177$, the test is significant at level $\alpha = 0.01$, i.e. there is considerable variation among the shipments.

(c) Estimate the variance component(s).
Solution: The error variance can be estimated by the MSE directly, $\hat\sigma^2 = MS_{Error} = 22.3$. The variance component for the random effect is estimated as follows (the sample size for each batch is the same):
$$\hat\sigma_\tau^2 = \frac{1}{n} \left( MS_{Treatment} - MS_{Error} \right) = \frac{1}{6} (200.8 - 22.3) = 29.75.$$
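The table arithmetic in (b) and (c) can be sketched in a few lines. Note the assumption: the sums of squares are taken here as 803 for shipments and 557 for error, which is a reading of a damaged table rather than a verified figure, and the function name `one_way_random_effects` is hypothetical.

```python
def one_way_random_effects(ss_treatment, ss_error, n_groups, n_per_group):
    """Balanced one-way random-effects ANOVA: mean squares, F, variance components."""
    df_treatment = n_groups - 1
    df_error = n_groups * (n_per_group - 1)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "df_treatment": df_treatment,
        "df_error": df_error,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "f_stat": ms_treatment / ms_error,
        # Method-of-moments estimate: (MS_treatment - MS_error) / n per group.
        "var_tau": (ms_treatment - ms_error) / n_per_group,
        "var_error": ms_error,
    }


# Assumed sums of squares (see the note above); 5 shipments, 6 cartons each.
result = one_way_random_effects(803.0, 557.0, n_groups=5, n_per_group=6)
for name, value in result.items():
    print(name, round(value, 3))
```

With unrounded mean squares the variance component comes out near 29.745, matching the 29.75 obtained from the rounded values 200.8 and 22.3.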

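11. In linear regression analysis, the leave-one-out cross validation (LOOCV) procedure is often used to estimate the prediction accuracy of the fitted regression model. Let $(X, Y) = \{(x_i, y_i);\ i = 1, \dots, n\}$ be a dataset with $x_i = (1, x_{i1}, \dots, x_{ip})' \in R^{p+1}$ and $y_i \in R$, and let $(X_{(i)}, Y_{(i)})$ be the modified dataset without the $i$-th row. A set of prediction values $\hat y_{(i)} = x_i' (X_{(i)}' X_{(i)})^{-1} X_{(i)}' Y_{(i)}$ are obtained and the LOOCV estimate is defined as
$$\mathrm{LOOCV} = n^{-1} \sum_{i=1}^{n} e_{(i)}^2 = n^{-1} \sum_{i=1}^{n} (y_i - \hat y_{(i)})^2.$$
(a) Show that
$$(X_{(i)}' X_{(i)})^{-1} = (X'X)^{-1} + \frac{(X'X)^{-1} x_i x_i' (X'X)^{-1}}{1 - x_i' (X'X)^{-1} x_i}.$$
Hint: $(A - zz')^{-1} = A^{-1} + \dfrac{A^{-1} z z' A^{-1}}{1 - z' A^{-1} z}$, assuming that $A - zz'$ is invertible.
(b) Show that $e_{(i)} = \dfrac{e_i}{1 - h_i}$; $i = 1, \dots, n$, where $H = X (X'X)^{-1} X'$, $h_i$ is the $i$-th diagonal element of $H$ and $e = (e_1, \dots, e_n)' = (I - H) Y$.

[Solution]
(a) The equality follows immediately from the fact that $X_{(i)}' X_{(i)} = X'X - x_i x_i'$ and the formula in the Hint.
(b) Note that $X_{(i)}' Y_{(i)} = X'Y - x_i y_i$ and $h_i = x_i' (X'X)^{-1} x_i$. We thus have
$$e_{(i)} = y_i - \hat y_{(i)} = y_i - x_i' (X_{(i)}' X_{(i)})^{-1} X_{(i)}' Y_{(i)}$$
$$= y_i - x_i' \left[ (X'X)^{-1} + \frac{(X'X)^{-1} x_i x_i' (X'X)^{-1}}{1 - h_i} \right] (X'Y - x_i y_i)$$
$$= y_i - x_i' (X'X)^{-1} X'Y + x_i' (X'X)^{-1} x_i \, y_i - \frac{h_i \, x_i' (X'X)^{-1} X'Y}{1 - h_i} + \frac{h_i^2 \, y_i}{1 - h_i}$$
$$= y_i - x_i' \hat\beta + h_i y_i - \frac{h_i \, x_i' \hat\beta}{1 - h_i} + \frac{h_i^2 \, y_i}{1 - h_i}$$
$$= \frac{y_i - x_i' \hat\beta}{1 - h_i} = \frac{e_i}{1 - h_i},$$
where $\hat\beta = (X'X)^{-1} X'Y$.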
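The identity in (b) can be checked numerically in the special case of simple linear regression ($p = 1$), where the leverage has the well-known closed form $h_i = 1/n + (x_i - \bar x)^2 / S_{xx}$. The sketch below compares brute-force refitting against the shortcut. The data and the function names (`fit_line`, `loo_residuals_bruteforce`, `loo_residuals_shortcut`) are hypothetical and only illustrate the result; they are not from the exam.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1


def loo_residuals_bruteforce(xs, ys):
    """Refit without observation i, then predict it: e_(i) = y_i - yhat_(i)."""
    out = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append(ys[i] - (b0 + b1 * xs[i]))
    return out


def loo_residuals_shortcut(xs, ys):
    """Single full fit: e_(i) = e_i / (1 - h_i) with h_i the leverage."""
    n = len(xs)
    b0, b1 = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    out = []
    for x, y in zip(xs, ys):
        h = 1.0 / n + (x - xbar) ** 2 / sxx
        out.append((y - (b0 + b1 * x)) / (1.0 - h))
    return out


# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
brute = loo_residuals_bruteforce(xs, ys)
short = loo_residuals_shortcut(xs, ys)
loocv = sum(e * e for e in short) / len(short)
print(brute, short, loocv)
```

The practical point of the identity is visible here: the shortcut needs one fit, while the brute-force version needs $n$.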