UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours


UNIVERSITY OF TORONTO Faculty of Arts and Science
December 2005 Examinations
STA437H1F / STA1005HF
Duration - 3 hours

AIDS ALLOWED: (to be supplied by the student) Non-programmable calculator. One handwritten 8.5'' x 11'' aid sheet (both sides may be used).

Please show your work clearly in the space provided. You may use the back of the pages if necessary, but you must remain organized.

Last name: First name: Student number:

Benchmarks of 5% significance and 95% confidence may be used when evaluating results/conclusions. Assume multivariate normality of data whenever necessary.

PLEASE CHECK AND MAKE SURE THAT THERE ARE NO MISSING PAGES IN THIS BOOKLET.

1 2 3 4 5 6 7 8 9 10 Total

1) [10] The students in a school take the three courses Math I, Stat I and Math II. Let X1, X2 and X3 denote the scores of a student in these three courses (Math I, Stat I and Math II respectively). We have the following information on the three variables:

Mean of X1 (µ_X1) = 75, standard deviation of X1 (σ_X1) = 8
Mean of X2 (µ_X2) = 70, standard deviation of X2 (σ_X2) = 10
Mean of X3 (µ_X3) = 60, standard deviation of X3 (σ_X3) =
Correlation of X1 and X2 (ρ_12) = 0.
Correlation of X1 and X3 (ρ_13) = 0.5
Correlation of X2 and X3 (ρ_23) = 0.4

Assume that (X1, X2, X3) has a multivariate normal distribution with the above parameters. Given that a student has an average of 70 for Math I and Math II (i.e. (X1 + X3)/2 = 70), find the conditional probability that this student will get an A grade for Stat I. (Note: in order to get an A, the student has to score 80 or above.) (Time estimate: 20 minutes) (Solution on hard copy)
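A minimal R sketch of the calculation question 1 asks for (not the official solution, which is on hard copy): condition X2 on the single variable W = (X1 + X3)/2 and apply the usual normal conditional mean and variance. The values sig3 = 12 and rho12 = 0.3 below are placeholders for the two figures that are illegible above; substitute the exam's actual values.

# Sketch: P(X2 >= 80 | (X1 + X3)/2 = 70) under the stated multivariate normal model.
mu   <- c(75, 70, 60)                        # means of X1 (Math I), X2 (Stat I), X3 (Math II)
sig  <- c(8, 10, 12)                         # sds; sig[3] = 12 is an assumed placeholder
rho12 <- 0.3; rho13 <- 0.5; rho23 <- 0.4     # rho12 = 0.3 is an assumed placeholder
Sigma <- diag(sig) %*% matrix(c(1, rho12, rho13,
                                rho12, 1, rho23,
                                rho13, rho23, 1), 3, 3) %*% diag(sig)
a <- c(0.5, 0, 0.5)                          # W = (X1 + X3)/2 = a'X
mu_W  <- sum(a * mu)
var_W <- drop(t(a) %*% Sigma %*% a)
cov_X2_W  <- drop(Sigma[2, ] %*% a)
cond_mean <- mu[2] + cov_X2_W / var_W * (70 - mu_W)
cond_var  <- Sigma[2, 2] - cov_X2_W^2 / var_W
1 - pnorm(80, mean = cond_mean, sd = sqrt(cond_var))   # P(A grade | average of 70)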

2) [10] In a study on counterfeit notes, the lengths (x1) and widths (x2) of 100 counterfeit notes were measured. The sample mean vector and the sample covariance matrix are given below:

x-bar = ( 214.8, 130.0 )',   S = [ 0.4    0.0   ]
                                 [ 0.0    0.065 ]

Genuine notes have a mean length of 215 and a mean width of 130 (in the same units as the measurements above). Test whether the mean dimensions (i.e. length and width) of the counterfeit notes are significantly different from those of genuine notes. Assume bivariate normality of data.
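A hedged R sketch of the one-sample Hotelling T^2 test question 2 calls for. The numbers are copied from the (partly illegible) statement above and should be treated as placeholders.

# One-sample Hotelling's T^2 test of H0: mu = mu0 from summary statistics.
n    <- 100
xbar <- c(214.8, 130.0)                    # sample mean vector of the counterfeit notes
S    <- matrix(c(0.4, 0.0,
                 0.0, 0.065), 2, 2)        # sample covariance matrix
mu0  <- c(215, 130)                        # genuine-note dimensions under H0
p    <- length(mu0)
T2   <- n * t(xbar - mu0) %*% solve(S) %*% (xbar - mu0)
F0   <- (n - p) / ((n - 1) * p) * T2       # T^2 converted to an F statistic
pval <- 1 - pf(F0, p, n - p)
c(T2 = T2, F = F0, p.value = pval)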

3) The nutritional content of 10 diets was analyzed in two labs (lab A and lab B). Each lab measured two variables V1 and V2 on each diet. The objective is to compare the measurements in the two labs. Let x_i and y_i denote the observation vectors on diet i from lab A and lab B respectively, and let d_i = x_i - y_i. Some useful summary statistics are given below (in usual notation):

d-bar = ( .0 , 0.80 )'

W_d = sum of d_i d_i' = [ 4.55   9.86  ]        W_d^{-1} = [  0.50   -0.67  ]
                        [ 9.86   7.888 ]                   [ -0.67    0.909 ]

W_d^{-1} has eigen pairs (λ_i, e_i), i = 1, 2, where
λ_1 = .65,   e_1 = ( 0.588, -0.809 )'
λ_2 = 0.047, e_2 = ( 0.809,  0.588 )'

Assume that d_i ~ N2(µ_d, Σ_d), where µ_d = (µ_1, µ_2)'.

a) [10] Test whether there is a significant difference (on average) between the measurements in the two labs.

b) [10] Test the hypothesis H0: µ_d = 0 against the alternative µ_i >= 0, i = 1, 2, with µ_i > 0 for at least one i = 1, 2. (Assume that any possible deviations of the mean components from the null hypothesis need not be of equal magnitude.)

Sol
> db = c(.0, 0.80)
> db
[1] .0 0.80
> db = matrix(db)
> db
     [,1]
[1,]  .0
[2,] 0.80
> wd = matrix(c(4.55, 9.86, 9.86, 7.888), nrow = 2, ncol = 2, byrow = TRUE)
> wd
     [,1]  [,2]
[1,] 4.55  9.86
[2,] 9.86  7.888
> solve(wd)
         [,1]      [,2]
[1,]  0.5050  -0.66648
[2,] -0.66648  0.90867

> eigen(solve(wd))
$values
[1] .6985 0.04670746

$vectors
          [,1]     [,2]
[1,]  0.588470  0.80868
[2,] -0.80868   0.588470

> n = 10
> f = n - 1
> f
[1] 9
> p = 2
> sd = wd - n * db %*% t(db)
> sd
        [,1]    [,2]
[1,]  .806    .64978
[2,]  .64978  .4799
> T_sq = n * t(db) %*% solve(sd) %*% db
> f0 = ((f - p + 1) / (f * p)) * T_sq
> f0
     [,1]
[1,] .970
> fa = qf(0.95, p, f - p + 1)
> fa
[1] 4.45897
> # Do not reject H0
> p = matrix(c(0.588, 0.809, -0.809, 0.588), nrow = 2, ncol = 2, byrow = TRUE)
> p
       [,1]   [,2]
[1,]  0.588  0.809
[2,] -0.809  0.588
> l = diag(c(.65, 0.047))
> l
      [,1]  [,2]
[1,]  .65   0.000
[2,] 0.000  0.047
> l^0.5
          [,1]       [,2]
[1,]  .68        0.0000000
[2,] 0.0000000   0.2167948

> a = p %*% l^0.5 %*% t(p)
> a
         [,1]      [,2]
[1,]  0.54589  -0.45687
[2,] -0.45687   0.896067
> u = sqrt(10) * a %*% db
> u
        [,1]
[1,] 0.67590
[2,] 0.66854
> ub_sq = t(u) %*% u
> ub_sq
      [,1]
[1,] 0.808
> # Read Table B with p = 2 and n = 10.
> # This value is 0.40 and so reject H0.
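For reference (not part of the original paper), the standard covariance-matrix form of the paired Hotelling T^2 recipe behind part (a) is restated below; the transcript above works directly with the sums-of-squares matrix W_d, so its intermediate quantities are scaled differently.

S_d = \frac{W_d - n\,\bar{d}\,\bar{d}'}{n-1}, \qquad
T^2 = n\,\bar{d}'\,S_d^{-1}\,\bar{d}, \qquad
\frac{n-p}{(n-1)\,p}\,T^2 \sim F_{p,\;n-p} \quad \text{under } H_0:\ \mu_d = 0.

With n = 10 and p = 2 this gives the F_{2,8} reference distribution, whose 95th percentile (4.4590) appears as "fa" in the transcript.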

4) [5] In an experiment, three varieties (A, B and C) of rice are sown in 18 plots, where each variety of rice is assigned at random to six plots. Two variables are measured after six weeks: y1, the height of the plant, and y2, the number of tillers per plant. Some useful summary statistics (in usual notation) are given below:

Variety:              A               B               C
Mean vector y-bar:    ( 58.7, 5.67 )' ( 50.8, 5.50 )' ( 54.8, 5.8 )'

Covariance matrices:
S_A = [ 7.7    0.67 ]    S_B = [ 6.7   0.0  ]    S_C = [ 4.97   .0   ]
      [ 0.67    .07 ]          [ 0.0    .90 ]          [  .0    0.57 ]

SSTR = [ 6.78   4.00 ]
       [ 4.00   0.   ]

Assuming that all the observations are independently normally distributed with common covariance matrix Σ and the i-th variety has mean µ_i = (µ_{i,1}, µ_{i,2})', i = 1, 2, 3, test the hypothesis H: µ_1 = µ_2 = µ_3 against the alternative A: at least one mean vector is different, at the 5% level of significance. Give a Bonferroni-type 95% confidence interval for µ_{1,1} - µ_{1,2} - µ_{2,1} + µ_{2,2}. Assume that we are interested in computing k = 3 such confidence intervals, but it is enough to present only this one since we don't have time to compute the other two. (t_{15, 0.05/6} = 2.694)
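A hedged R sketch of the one-way MANOVA computation question 4 asks for, working from summary matrices: the within-groups SSCP matrix W pools the (n_i - 1) S_i, and Wilks' lambda compares |W| to |W + SSTR|. The matrix entries below are illustrative placeholders patterned on the (partly illegible) tables above; substitute the exam's values.

# One-way MANOVA (Wilks' lambda) from summary statistics: 3 varieties, 6 plots each, 2 variables.
n_i <- 6; g <- 3; p <- 2; N <- n_i * g
S1 <- matrix(c(7.7, 0.67, 0.67, 1.07), 2, 2)       # placeholder entries
S2 <- matrix(c(6.7, 0.10, 0.10, 0.90), 2, 2)
S3 <- matrix(c(4.97, 1.0, 1.0, 0.57), 2, 2)
W  <- (n_i - 1) * (S1 + S2 + S3)                   # within-groups SSCP, df = N - g = 15
B  <- matrix(c(16.78, 4.00, 4.00, 10.2), 2, 2)     # SSTR (between-groups SSCP), df = g - 1 = 2
Lambda <- det(W) / det(W + B)                      # Wilks' lambda
# For p = 2 an exact F transformation of Wilks' lambda is available:
F0 <- ((N - g - 1) / (g - 1)) * (1 - sqrt(Lambda)) / sqrt(Lambda)
pval <- 1 - pf(F0, 2 * (g - 1), 2 * (N - g - 1))
c(Lambda = Lambda, F = F0, p.value = pval)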

5) In a study on counterfeit notes, the investigators collected data on four variables: left length, right length, bottom length and top length. The objective of the study was to investigate the possibility of classifying the notes as real or fake. Some SAS output (using PROC DISCRIM) is given below:

The DISCRIM Procedure

Observations  149    DF Total            148
Variables       4    DF Within Classes   147
Classes         2    DF Between Classes    1

Class Level Information
        Variable                                       Prior
type    Name      Frequency   Weight     Proportion    Probability
fake    fake         61       61.0000    0.409396      0.500000
real    real         88       88.0000    0.590604      0.500000

Pairwise Generalized Squared Distances Between Groups
D^2(i|j) = (X-bar_i - X-bar_j)' COV^{-1} (X-bar_i - X-bar_j)

Generalized Squared Distance to type
From type      fake       real
fake           0          .0790
real           .0790      0

Linear Discriminant Function
Constant = -0.5 X-bar_j' COV^{-1} X-bar_j     Coefficient Vector = COV^{-1} X-bar_j

Linear Discriminant Function for type
Variable      fake           real
Constant      -907           -980
left          75.6560        75.76897
right         694.64696      69.540
bottom        -90.08         -96.6407
top           -9.9590        -7.788

a) [5] Use this information to classify a note with left length 0., right length 9.9, bottom length 9.9 and top length 0. (call this observation x_o).

b) [5] Let D_o^2(x-bar_i) denote the estimated Mahalanobis squared distance between x_o and x-bar_i (i = 1, 2, with 1 = fake and 2 = real). Give the value of D_o^2(x-bar_1) - D_o^2(x-bar_2).
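A hedged R sketch of how parts (a)-(b) use the SAS coefficients: evaluate each group's linear discriminant score (constant + coefficients times x) at the new note and assign it to the group with the larger score. All numeric values below are placeholders standing in for the partly illegible figures above; substitute the exact ones.

# Classify a new note with the linear discriminant functions from PROC DISCRIM.
const <- c(fake = -1907, real = -1980)                 # "Constant" row (leading digits assumed)
coef  <- cbind(fake = c(75.6560, 694.64696, -90.08, -9.9590),
               real = c(75.76897, 693.540, -96.6407, -7.788))
rownames(coef) <- c("left", "right", "bottom", "top")
x_o   <- c(left = 130.1, right = 129.9, bottom = 9.9, top = 10.2)  # assumed measurements
score <- const + drop(t(coef) %*% x_o)                 # one discriminant score per group
score
names(which.max(score))                                # classify into the group with the larger score
# Part (b): with a common covariance matrix, D_o^2(xbar_1) - D_o^2(xbar_2) = 2 * (score["real"] - score["fake"]).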

c) [5] Estimate the probability of misclassifying an observation from population 1 (fake notes).
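One standard way to answer part (c) under the normality and equal-prior assumptions (a hedged note, not text from the paper) is the plug-in normal estimate based on the generalized squared distance printed by SAS:

\widehat{P}(\text{fake classified as real}) = \Phi\!\left(-\tfrac{1}{2}\sqrt{D^2}\right),
\qquad D^2 = (\bar{x}_{\text{fake}} - \bar{x}_{\text{real}})'\,\text{COV}^{-1}\,(\bar{x}_{\text{fake}} - \bar{x}_{\text{real}}),

with D^2 read off the "Generalized Squared Distance" table above.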

d) [5] The estimated coefficients in the SAS output above were obtained using equal prior probabilities and equal costs of misclassification. Suppose we now want to set the prior probabilities to reflect the fact that only 1 percent of the notes are fake (and 99% are real), and assume that the cost of misclassifying a fake note as real is 10 times greater than the cost of misclassifying a real note as fake. Classify the observation in part (a) (i.e. x_o) into one population using this additional information on prior probabilities and misclassification costs.
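A hedged sketch of the decision rule part (d) calls for: with priors p1 (fake) and p2 (real) and misclassification costs c(2|1) (fake called real) and c(1|2) (real called fake), the minimum expected cost rule assigns the note to the fake population when the difference of the linear discriminant scores exceeds log[(c(1|2) p2) / (c(2|1) p1)]. The scores come from the sketch after parts (a)-(b); the numbers below are from the reconstructed statement.

# Minimum expected-cost classification with unequal priors and costs.
p_fake <- 0.01; p_real <- 0.99                  # prior probabilities
c_fake_as_real <- 10; c_real_as_fake <- 1       # relative misclassification costs
threshold <- log((c_real_as_fake * p_real) / (c_fake_as_real * p_fake))   # log(9.9), about 2.29
classify_note <- function(score_fake, score_real) {
  if (score_fake - score_real >= threshold) "fake" else "real"
}
# Example: classify_note(score["fake"], score["real"]) using the scores computed earlier.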

Miscellaneous

6) [ ] X1 and X2 are normal random variables, both having mean 0 and variance 1. State whether the following statements are true or false.

a) (X1, X2)' has a bivariate normal distribution. (True / False) (Circle one)
False (Not always; true if X1 and X2 are independent.)

b) For all a != 0, a'x, where x = (X1, X2)', has a normal distribution. (True / False)
False (similar to (a) above)

7) (X1, X2, X3)' ~ N3(µ, Σ), where µ = [1, 2, 3]',

Σ = [ 2  1  0 ]          Σ^{-1} = [  0.75  -0.50   0.25 ]
    [ 1  2  1 ]    and            [ -0.50   1.00  -0.50 ]
    [ 0  1  2 ]                   [  0.25  -0.50   0.75 ]

a) [7] State whether the following statements are true or false.

i) ((X1 - 1)^2 + (X2 - 2)^2)/2 has a χ² distribution with 2 degrees of freedom. (True / False)
(False)

ii) ((X1 - 1)^2 + (X3 - 3)^2)/2 has a χ² distribution with 2 degrees of freedom. (True / False)
(True)

iii) X1 ~ N(1, 2) and X2 ~ N(2, 2). (True / False)
(True)

iv) X2 - X1 ~ N(1, 2). (True / False)
(True)

v) X1 + X2 + X3 and X1 - 2X2 + X3 are independent. (True / False)
Ans: False. Reason: Cov(X1 + X2 + X3, X1 - 2X2 + X3) = -2.

vi) (X1 + X2 - X3)^2/6 + (X1 - X2 + X3 - 2)^2/2 has a χ² distribution with 2 degrees of freedom. (True / False)

Ans: True.

( X1 + X2 - X3 )        ( ( 0 )   [ 6  0 ] )
( X1 - X2 + X3 )  ~  N2 ( ( 2 ) , [ 0  2 ] )

so the two linear combinations are independent, and each squared, standardized term is χ² with 1 degree of freedom.

vii) The probability density at (2, 3, 0) is greater than that at (1, 2, 0). (True / False)

Sol: False.
Reason:
> a = matrix(c(2, 3, 0), nrow = 3, ncol = 1)
> a
     [,1]
[1,]    2
[2,]    3
[3,]    0
> b = matrix(c(1, 2, 0), nrow = 3, ncol = 1)
> b
     [,1]
[1,]    1
[2,]    2
[3,]    0
> s = matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), nrow = 3, ncol = 3, byrow = TRUE)
> s
     [,1] [,2] [,3]
[1,]    2    1    0
[2,]    1    2    1
[3,]    0    1    2
> m = matrix(c(1, 2, 3), nrow = 3, ncol = 1)
> m
     [,1]
[1,]    1
[2,]    2
[3,]    3
> t(a - m) %*% solve(s) %*% (a - m)
     [,1]
[1,]    9
> t(b - m) %*% solve(s) %*% (b - m)
     [,1]
[1,] 6.75

(1, 2, 0) is closer to µ = (1, 2, 3)' (Mahalanobis distance = 6.75) than (2, 3, 0) (which has Mahalanobis distance 9) and therefore has the higher density.
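A brief check, not in the original paper, of the covariance algebra behind items (v) and (vi): for linear combinations, var(a'X) = a'Σa and cov(a'X, b'X) = a'Σb. A minimal R sketch using the Σ of question 7:

# Verify the variances/covariances used in items (v) and (vi) of question 7.
Sigma <- matrix(c(2, 1, 0,
                  1, 2, 1,
                  0, 1, 2), 3, 3)
a1 <- c(1, 1, -1)     # X1 + X2 - X3
a2 <- c(1, -1, 1)     # X1 - X2 + X3
a3 <- c(1, -2, 1)     # X1 - 2*X2 + X3 (item v)
ones <- c(1, 1, 1)    # X1 + X2 + X3
c(var1  = t(a1) %*% Sigma %*% a1,      # 6
  var2  = t(a2) %*% Sigma %*% a2,      # 2
  cov12 = t(a1) %*% Sigma %*% a2,      # 0, so the two combinations in (vi) are independent
  cov_v = t(ones) %*% Sigma %*% a3)    # -2, so the pair in item (v) is not independent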

b) [ ] Find E(Q), where Q = X1² + X2² + X3² + X1X2 - 4X2X3.
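A worked reminder (not from the paper) of the identity this part turns on; c_{12} and c_{23} below stand for whatever coefficients the cross terms of Q carry:

E(X_i^2) = \sigma_{ii} + \mu_i^2, \qquad E(X_i X_j) = \sigma_{ij} + \mu_i \mu_j \quad (i \ne j),

so

E(Q) = \sum_{k=1}^{3} \big(\sigma_{kk} + \mu_k^2\big) + c_{12}\big(\sigma_{12} + \mu_1\mu_2\big) + c_{23}\big(\sigma_{23} + \mu_2\mu_3\big),

with the σ and µ values read from the Σ and µ given in question 7.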

8) [4] Let x_1, x_2, ..., x_20 be independently and identically distributed N(µ, Σ) (µ, Σ known) random vectors with

x-bar = (1/20) * sum_{i=1}^{20} x_i,   S = (1/19) * sum_{i=1}^{20} (x_i - x-bar)(x_i - x-bar)',

and C a given matrix of constants. Give the distribution (with the values of parameters) of the following statistics:

a) (x-bar - µ)' Σ^{-1} (x-bar - µ)
b) (x-bar - µ)' S^{-1} (x-bar - µ)
c) (x-bar - µ)' C' (C Σ C')^{-1} C (x-bar - µ)
d) (x-bar - µ)' C' (C S C')^{-1} C (x-bar - µ)

9) [ ] In a profile analysis of two normal populations with mean vectors (unknown) µ_1 and µ_2, we test H1: µ_1 - µ_2 = γ1 and, if H1 is accepted, then test H2: γ = 0. State whether the following statements are true or false.

a) If we accept H1 at α = 0.05, then for any r x p matrix C, the test of H: C(µ_1 - µ_2) = 0 will also be accepted at α = 0.05. (True / False)
(False)

b) If we accept both H1 and H2 at α = 0.05, then the test of H: µ_1 = µ_2 (with a Hotelling's T² test) will also be accepted at α = 0.05. (True / False)
(False)

c) If H: µ_1 = µ_2 (with a Hotelling's T² test) is accepted at α = 0.05, then in a profile analysis, both H1 and H2 will also be accepted at α = 0.05. (True / False)
(False)
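For orientation on question 9 (a hedged gloss, not part of the paper): in a two-sample profile analysis the first hypothesis says the profiles are parallel and the second, given parallelism, that they coincide. In contrast-matrix form, with C the (p-1) x p differencing matrix,

H_1:\; C(\mu_1 - \mu_2) = 0 \quad \text{(parallel profiles, equivalently } \mu_1 - \mu_2 = \gamma \mathbf{1}\text{)},
\qquad
H_2:\; \gamma = 0 \quad \text{(coincident profiles, i.e. } \mu_1 = \mu_2 \text{ given } H_1\text{)}.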

10) [6] Eight patients were given a certain drug and the change in their blood pressure was measured. The 95% confidence region for the mean change in blood pressure is given by

{ (µ_1, µ_2) : (µ_1 + 0.75)² - .707 (µ_1 + 0.75)(µ_2 - .5) + .68 (µ_2 - .5)² <= .9 }.

State whether the following statements are true or false.

a) The p-value for the Hotelling's T² test of H_0: µ = ( , )' against H_1: µ != ( , )' is greater than 0.05. (True / False)

b) The p-value for the Hotelling's T² test of H_0: µ = ( , )' against H_1: µ != ( , )' is greater than 0.05. (True / False)

c) The null hypothesis H_0: µ = (0, )' will be rejected (using a Hotelling's T² test) at any α < 0.05. (True / False)
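A hedged note on the reasoning these true/false items rely on (the standard confidence-region / test duality, not text from the paper):

\mu_0 \in \text{95\% confidence region} \iff \text{the level-}0.05\text{ Hotelling } T^2 \text{ test of } H_0:\ \mu = \mu_0 \text{ is not rejected} \iff p\text{-value} > 0.05.

So a point inside the ellipse corresponds to a p-value above 0.05, while a point outside is rejected at the 5% level (and at any larger α, though not necessarily at every smaller α).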