Chapter 2 Review of Statistics (Summarized from Heij, Chapter 2)


2.1 Descriptive Statistics

Presentation of statistics: there are many methods to present the information in a data set, such as tables, graphs, and pictures. The important information is the center of the data, its variation, and its distribution. The center of the data can be presented by the average (mean), mode, and median. The variation can be presented by the range, standard deviation (SD), coefficient of variation (CV), inter-quartile range, and mean absolute deviation (MAD). The distribution of the data can be presented by a histogram and percentiles; sometimes a box plot is used to visualize the distribution. In the case of many variables, we may be interested in the relationship between two series, for which the covariance and the correlation coefficient are the measurements. The following list presents the methods.

a) Summary of data

Statistic                   Formula
Mean                        x̄ = (1/n) Σᵢ xᵢ
Variance                    s² = (1/(n-1)) Σᵢ (xᵢ - x̄)²
Standard deviation          s = √s²
Range                       Range = X_Max - X_Min
Mean absolute deviation     MAD = (1/n) Σᵢ |xᵢ - x̄|
Coefficient of variation    CV = s / x̄
Covariance                  Cov_xy = (1/(n-1)) Σᵢ (xᵢ - x̄)(yᵢ - ȳ)
Correlation coefficient     ρ = Cov_xy / (s_x s_y)

Use the CH_XM0STU.dta; see Appendix B of [Heij, page 749] for details.
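The notes use Stata below; as a cross-check of the formulas above, here is a minimal Python sketch (the data values are made up for illustration):

```python
import math

x = [2.1, 2.8, 3.0, 3.5, 2.6]   # hypothetical FGPA-like values
y = [5.0, 6.2, 6.8, 7.1, 5.9]   # hypothetical SAT-like values
n = len(x)

mean_x = sum(x) / n                                        # x-bar
var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)      # sample variance
sd_x = math.sqrt(var_x)                                    # standard deviation
rng = max(x) - min(x)                                      # range
mad = sum(abs(xi - mean_x) for xi in x) / n                # mean absolute deviation
cv = sd_x / mean_x                                         # coefficient of variation

mean_y = sum(y) / n
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
rho = cov_xy / (sd_x * sd_y)                               # correlation coefficient

print(mean_x, sd_x, rho)
```

Note the (n-1) divisor in the sample variance and covariance, matching the table above; the mean and MAD divide by n.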

b) Tables and cross tables
c) Histogram plots
d) Graphs

Example: use the CH_XM0STU.dta to present the descriptive statistics.

A. Summary of data

. use "D:\Folder Name\ch_xm0stu.dta", clear

a) To find the summary of the data:

. su fgpa satm satv fem sata

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        fgpa |       609     2.79796     .460375        1.5       3.97
        satm |       609      6.4844     .595765          4        7.9
        satv |       609     5.56486     .673975        3.2        7.6
         fem |       609     .387505    .4875846          0          1
        sata |       609     5.90665      .50939          4        7.6

b) To find the summary of the data in detail:

. su fgpa, detail
(This will show the detail of that data.)

                            fgpa
-------------------------------------------------------------
      Percentiles      Smallest
 1%        1.934            1.5
 5%         2.04            1.7
10%        2.196          1.805       Obs                 609
25%        2.485          1.888       Sum of Wgt.         609

50%        2.773                      Mean            2.79796
                        Largest       Std. Dev.       .460375
75%        3.126           3.87
90%        3.439           3.88       Variance        .211945
95%         3.65          3.948       Skewness         .67886
99%        3.803           3.97       Kurtosis          2.505

c) To find the correlation matrix:

. corr fgpa satm satv sata
(obs=609)

             |     fgpa     satm     satv     sata
-------------+------------------------------------
        fgpa |   1.0000
        satm |   0.1950   1.0000
        satv |   0.0900   0.2878   1.0000
        sata |   0.1749   0.7748   0.2884   1.0000

B. Histogram

. histogram fgpa
(bin=24, start=1.5, width=.1029167)

. cdf fgpa
(This command is downloadable; search for cdf.)

[Figures: the PDF (histogram) and the CDF of fgpa]

C. Box plot

. graph box fgpa satm satv sata

Note that, before you start analyzing data, please check the data carefully. In some cases, data have errors such as missing values, overvalues, and double counting. Make sure that your data are very clean.

2.2 Random Variable

2.2.1 Single Random Variable

A random variable (RV) is a variable that can take different outcomes depending upon the state of nature, which is defined by probability. RVs can be classified into two groups: discrete RVs and continuous RVs. A function that presents the probability corresponding to the outcomes is called a probability mass function (in the discrete case) or a probability density function (in the continuous case). We can calculate the mean and variance from the PMF/PDF and CDF by the following summary table.

                         Discrete RV                            Continuous RV
Probability              f(v) = P(V = v)                        P(a ≤ V ≤ b) = ∫ₐᵇ f(v) dv
Mean (expected value)    μ = Σᵢ vᵢ pᵢ                           μ = ∫ v f(v) dv
Variance                 σ² = E[(v - μ)²] = Σᵢ (vᵢ - μ)² pᵢ     σ² = ∫ (v - μ)² f(v) dv
Cumulative distribution  F(v) = P(V ≤ v) = Σ_{vᵢ ≤ v} pᵢ        F(v) = ∫_{-∞}^{v} f(u) du

2.2.2 Multivariate Distribution or Joint RV

When two or more variables have a relationship (a joint distribution), the following table shows the important notations. Given V = {v₁, v₂, ...} and W = {w₁, w₂, ...}:

CDF                       F(v, w) = P(V ≤ v, W ≤ w) = Σ_{(i,j): vᵢ ≤ v, wⱼ ≤ w} pᵢⱼ
Joint density function    P(a₁ ≤ V ≤ b₁, a₂ ≤ W ≤ b₂) = ∫_{a₁}^{b₁} ∫_{a₂}^{b₂} f(v, w) dw dv
Covariance                Cov_vw = E[(v - μ_v)(w - μ_w)] = ∫∫ (v - μ_v)(w - μ_w) f(v, w) dv dw
Marginal distribution     f(v) = ∫ f(v, w) dw
Conditional distribution  f(v | W = w) = f(v | w) = f(v, w) / f(w)
Conditional expectation   E[V | W = w] = E[V | w] = ∫ v f(v | w) dv
Independence              f(v, w) = f(v) f(w)
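The discrete-RV formulas above can be verified directly on a small PMF (a Python sketch with a made-up distribution):

```python
# Mean and variance of a discrete RV from its PMF (formulas above)
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]   # a hypothetical PMF, sums to 1

mu = sum(v * p for v, p in zip(values, probs))              # mu = sum v_i * p_i
var = sum((v - mu) ** 2 * p for v, p in zip(values, probs)) # sigma^2 = sum (v_i - mu)^2 * p_i

print(mu, var)
```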

2.2.3 Probability Distributions

There are many important distributions that we may use in the near future. Once we know the probability distribution, we can calculate the mean and variance by using the formulas in topics 2.2.1 and 2.2.2.

Distribution                        Functional form
Binomial distribution               P[y = v] = C(n, v) pᵛ (1 - p)ⁿ⁻ᵛ
Normal distribution, N(μ, σ²)       f(v) = (1/(σ√(2π))) exp(-(v - μ)²/(2σ²)), where -∞ < v < ∞; parameters μ, σ²
Multivariate normal, N(μ, Σ)        f(v) = (2π)^(-p/2) (det Σ)^(-1/2) exp(-(1/2)(v - μ)' Σ⁻¹ (v - μ)); parameters μ, Σ
Chi-squared distribution            Given Y₁, Y₂, ..., Y_J a set of independent standard normals, with Yⱼ = (Xⱼ - μ)/σ, ξ = Σ_{j=1}^{J} Yⱼ² ~ χ²(J), with J degrees of freedom.
t-distribution                      Given X ~ N(0, 1) and ξ ~ χ²(J), t = X / √(ξ/J) ~ t(J).
F-distribution                      Given ξ₁ ~ χ²(J₁) and ξ₂ ~ χ²(J₂), F = (ξ₁/J₁)/(ξ₂/J₂) ~ F(J₁, J₂), with J₁ (numerator) and J₂ (denominator) degrees of freedom.

For the normal distribution, there are properties about adding and multiplying, as follows. Given y ~ N(μ, σ²):
a) (ay + b) ~ N(aμ + b, a²σ²)
b) (y - μ)/σ ~ N(0, 1)
c) We call b) the standard normal, with density φ(v) = (1/√(2π)) e^(-v²/2).

For the multivariate normal distribution, there are similar properties. Given y ~ N(μ, Σ):
a) (Ay + b) ~ N(Aμ + b, AΣA')
b) Conditional distribution: y₁ | y₂ = v ~ N(μ₁ + Σ₁₂Σ₂₂⁻¹(v - μ₂), Σ₁₁ - Σ₁₂Σ₂₂⁻¹Σ₂₁)
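Property a) for the univariate normal can be checked by simulation (a Python sketch; the constants μ = 2, σ = 3, a = 0.5, b = 1 are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(1)
mu, sigma, a, b = 2.0, 3.0, 0.5, 1.0

y = [random.gauss(mu, sigma) for _ in range(100_000)]  # y ~ N(2, 9)
z = [a * yi + b for yi in y]                           # ay + b

# Sample moments should be close to the theoretical N(a*mu + b, a^2 * sigma^2),
# i.e. mean a*mu + b = 2.0 and sd |a|*sigma = 1.5
print(statistics.mean(z), statistics.stdev(z))
```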

2.3 Estimation

2.3.1 Notation

Let f_θ(y₁, y₂, ..., y_n) be the joint probability of the outcomes yᵢ of a random variable from n observations (i = 1, 2, ..., n). We call θ the parameters, and the set of distributions {f_θ : θ ∈ Θ} the model. Θ is the symbol for the possible values of the unknown parameters. We can only estimate the numeric value of θ with some estimation method, so we get an estimated value of the parameter, the estimator (θ̂). A statistic is a given function g(y₁, y₂, ..., y_n): a numerical expression that can be evaluated from the observations alone. An estimator is a statistic that is used to guess the unknown parameter. The estimator may or may not be close to the true parameter, and sometimes we do not know whether it is. We have to use an appropriate model and estimation method to make a good guess.

2.3.2 Estimation Methods

a) The method of moments

Suppose that θ contains k unknown parameters; the distribution then implies definitions of the population moments in terms of θ. If k moments are selected, θ can be solved from the k equations. Note that the estimators may be different for different choices of moments.

Example. From the data CH_XM0STU.dta, FGPA has mean ȳ = 2.793 and SD s = 0.460, so the variance is s² = 0.212, the skewness 0.68, and the kurtosis 2.5. The first moment is ȳ = 2.793; the second (central) moment is m₂ = ((n-1)/n) s² ≈ 0.211. If FGPA is assumed normally distributed, the first and second moments equal μ and σ², so the moment estimators become μ̂ = 2.793 and σ̂² = 0.211. If instead we use the fourth moment (which is 3σ⁴ for a normal), and the sample kurtosis equals the sample fourth moment divided by s⁴, then m₄ = K s⁴ = 2.5 × 0.460⁴ ≈ 0.112, and the estimate implied by m₄ = 3σ̂⁴, namely σ̂² = √(m₄/3) ≈ 0.193, is different from the estimate based on the second moment.

b) Least squares

Example. Given an RV sample (y₁, y₂, ..., y_n) from a distribution with unknown parameters mean μ and variance σ². Let εᵢ = yᵢ - μ; it follows that (ε₁, ε₂, ..., ε_n) are i.i.d. (identically and independently distributed) with mean zero and variance σ² (you can prove this statement by using the properties in 2.2), i.e. εᵢ ~ IID(0, σ²). The model is then yᵢ = μ + εᵢ with εᵢ ~ IID(0, σ²).
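The arithmetic in the method-of-moments example can be reproduced in a few lines (a Python sketch; s and K are the values quoted in the text):

```python
# Reproducing the method-of-moments arithmetic for FGPA (values from the text)
s = 0.460   # sample standard deviation of FGPA
K = 2.5     # sample kurtosis of FGPA

var_hat_2nd = s ** 2              # estimate of sigma^2 from the 2nd moment, ~0.212
m4 = K * s ** 4                   # sample 4th moment = K * s^4, ~0.112
var_hat_4th = (m4 / 3) ** 0.5     # from m4 = 3*sigma^4: sigma^2 = sqrt(m4/3), ~0.193

print(round(var_hat_2nd, 3), round(var_hat_4th, 3))
```

The two estimates of σ² (0.212 vs 0.193) differ, illustrating that moment estimators depend on the choice of moments.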

The least squares estimator estimates the value of μ that minimizes the value of the function S(μ) = Σᵢ (yᵢ - μ)². We can express this as μ̂ = argmin_μ Σᵢ (yᵢ - μ)². The estimator μ̂ is the solution of this minimization problem, which can be found from the first-order condition of the least squares function, -2 Σᵢ (yᵢ - μ) = 0, and thus μ̂ = ȳ.

c) Maximum likelihood

With {f_θ : θ ∈ Θ} the joint probability of (y₁, y₂, ..., y_n), for every value of θ the distribution gives a certain value of the probability f_θ(y₁, y₂, ..., y_n) for the given observations. The likelihood function is defined by L(θ) = f_θ(y₁, y₂, ..., y_n), where θ ∈ Θ. The objective is to find the estimator θ̂ that returns the maximum value of the likelihood function, i.e.

θ̂ = argmax_θ f_θ(y₁, y₂, ..., y_n).

To simplify the process, we can transform the likelihood function with an invertible transformation h(·), the model then being expressed as the set of distributions {f_ψ : ψ ∈ Ψ}, where f_ψ = f_{h⁻¹(ψ)} and Ψ = h(Θ). Suppose θ̂ and ψ̂ are the maximum likelihood estimators of θ and ψ respectively; then θ̂ = h⁻¹(ψ̂). Often we use the logarithm as the transformation of the likelihood function, giving what we call the log-likelihood function.

Example. Given yᵢ ~ NID(μ, σ²) with n observations and unknown parameters θ = (μ, σ²). The likelihood function is

L(μ, σ²) = Πᵢ (1/(σ√(2π))) exp(-(yᵢ - μ)²/(2σ²)).

For simplicity, we take the logarithm of this likelihood function and maximize

log L(μ, σ²) = -(n/2) log(2π) - (n/2) log(σ²) - (1/(2σ²)) Σᵢ (yᵢ - μ)².

The first-order conditions are ∂log L/∂μ = (1/σ²) Σᵢ (yᵢ - μ) = 0 and ∂log L/∂σ² = -n/(2σ²) + (1/(2σ⁴)) Σᵢ (yᵢ - μ)² = 0. Thus μ̂_ML = (1/n) Σᵢ yᵢ = ȳ and σ̂²_ML = (1/n) Σᵢ (yᵢ - ȳ)² = ((n-1)/n) s². In this case, we assume that the RV has the homoskedasticity property, which implies constant variance.
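The closed-form ML solutions above are easy to verify numerically (a Python sketch with a made-up sample):

```python
import statistics

y = [1.2, 2.0, 2.5, 3.1, 1.8, 2.6]   # hypothetical sample
n = len(y)

mu_ml = sum(y) / n                                   # MLE of mu: the sample mean
sig2_ml = sum((yi - mu_ml) ** 2 for yi in y) / n     # MLE of sigma^2: divides by n

# Relation to the unbiased sample variance: sigma^2_ML = ((n-1)/n) * s^2
s2 = statistics.variance(y)                          # s^2, divides by n-1
print(mu_ml, sig2_ml, (n - 1) / n * s2)
```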

To ensure the maximum condition, we have to check that the Hessian matrix

H(θ) = [ ∂²log L/∂μ²      ∂²log L/∂μ∂σ²  ]   =   [ -n/σ²                    -(1/σ⁴) Σᵢ (yᵢ - μ)            ]
       [ ∂²log L/∂σ²∂μ    ∂²log L/∂(σ²)² ]       [ -(1/σ⁴) Σᵢ (yᵢ - μ)      n/(2σ⁴) - (1/σ⁶) Σᵢ (yᵢ - μ)² ]

is negative definite when evaluated at μ̂_ML and σ̂²_ML.

2.3.3 Statistical Properties

The important properties for evaluating estimators are consistency, bias, and efficiency. The first piece of terminology is the data generating process (DGP), which creates (y₁, y₂, ..., y_n) with a specific distribution f_{θ₀}, where θ₀ ∈ Θ. We estimate θ ∈ Θ and get θ̂ from the RV sample (y₁, y₂, ..., y_n), so θ̂ itself, under this DGP, also depends on θ₀. The estimator would be perfect if P[θ̂ = θ₀] = 1. We can measure the deviation of θ̂ from θ₀ by the mean squared error (MSE) and the bias. The MSE is defined by MSE(θ̂) = E[(θ̂ - θ₀)²] = Var(θ̂) + (E[θ̂] - θ₀)², and the bias is defined as E[θ̂] - θ₀. An unbiased estimator is an estimator θ̂ such that E[θ̂] - θ₀ = 0, or E[θ̂] = θ₀.

We can check the efficiency of an estimator by using the variance as the measurement: within the same class of (linear) unbiased estimators, we try to minimize the variance. The Cramér-Rao lower bound states that every unbiased estimator θ̂ satisfies

Var(θ̂) ≥ ( E[ (d log L(θ)/dθ)² ] )⁻¹ = ( -E[ d² log L(θ)/dθ² ] )⁻¹.

We can use matrix notation in the case of more than one parameter, as in the following list:

Var(θ̂) = E[ (θ̂ - E[θ̂])(θ̂ - E[θ̂])' ],   MSE(θ̂) = E[ (θ̂ - θ₀)(θ̂ - θ₀)' ]
I₀ = E[ (d log L(θ)/dθ)(d log L(θ)/dθ)' ] = -E[ d² log L(θ)/dθ dθ' ]   (Fisher information)

Assignment. Given yᵢ ~ NID(μ, σ²) with n observations and unknown parameters θ = (μ, σ²),* investigate a) the unbiasedness of the MLE, b) the variance and efficiency of the MLE, and c) the simulated sampling distribution with 10 observations, 1,000 times, comparing two estimators: the sample median for μ and the sample variance s² for σ². (See Section 2.5.)

* In this case, we assume that the RV has the homoskedasticity property, which implies constant variance.
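The bias concept can be illustrated by Monte Carlo: σ̂²_ML = ((n-1)/n)s² is biased downward, while s² is unbiased. A Python sketch (sample size and repetition count are arbitrary choices):

```python
import random

random.seed(42)
n, reps = 10, 20_000   # small samples make the bias visible
ml, unb = 0.0, 0.0

for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]   # true sigma^2 = 1
    ybar = sum(y) / n
    ssq = sum((yi - ybar) ** 2 for yi in y)
    ml += ssq / n          # MLE: divides by n
    unb += ssq / (n - 1)   # unbiased s^2: divides by n-1

ml /= reps
unb /= reps
# E[sigma^2_ML] = ((n-1)/n) * sigma^2 = 0.9, while E[s^2] = 1.0
print(round(ml, 2), round(unb, 2))
```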

2.3.4 Asymptotic Properties

In general, we do not know the exact distribution of the data. There are at least two solutions: specifying the DGP, or using asymptotic properties. However, the asymptotic properties rely on a large sample size, which is a disadvantage of this method. The properties are the following.

A. Consistency

Let θ be a parameter and θ̂_n an estimator of θ based on a sample of size n. Under the assumption that the data are generated by the parameter θ₀, the estimator is consistent if it converges in probability to θ₀, i.e.

lim_{n→∞} P[ |θ̂_n - θ₀| < δ ] = 1 for every δ > 0, or plim θ̂_n = θ₀.

In addition, sufficient conditions for consistency are lim_{n→∞} E[θ̂_n] = θ₀ and lim_{n→∞} Var(θ̂_n) = 0.

There are some rules for the calculation of probability limits. Given plim y_n = c₁ and plim z_n = c₂ with c₂ ≠ 0, where y_n and z_n are two sequences of RVs, then:
a) plim(y_n + z_n) = c₁ + c₂
b) plim(y_n z_n) = c₁c₂
c) plim(y_n / z_n) = c₁/c₂
d) plim g(y_n) = g(c₁), where g(·) is a continuous function.
These properties are also compatible with matrix sequences.

B. Law of Large Numbers (LLN)

If the data are an RV sample from the population, the sample moments converge to the population moments: if yᵢ ~ IID, i = 1, 2, ..., n, with finite population mean E[yᵢ] = μ, then plim (1/n) Σᵢ yᵢ = μ, and plim (1/n) Σᵢ (yᵢ - ȳ)^r = μ_r, where μ_r is the given population moment.

C. Central Limit Theorem (CLT)

We can define convergence in distribution for an RV sequence y_n with cumulative distributions F_n(·): y_n →d F(·) if lim_{n→∞} F_n(v) = F(v) for all v at which F(·) is continuous; F(·) is the asymptotic distribution.
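The LLN can be seen directly by letting the sample size grow (a Python sketch; the Uniform(0,1) distribution and the sample sizes are illustrative choices):

```python
import random

random.seed(7)

def sample_mean(n):
    # Mean of n Uniform(0,1) draws; the population mean is 0.5
    return sum(random.random() for _ in range(n)) / n

# The deviation |ybar - mu| tends to shrink as n grows (LLN / consistency)
devs = {n: abs(sample_mean(n) - 0.5) for n in (10, 1_000, 100_000)}
print(devs)
```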

The main result of the CLT is that the sample mean is asymptotically normally distributed:

z_n = √n (ȳ - μ)/σ →d z ~ N(0, 1).

This is called the CLT. Suppose yᵢ is a random sample from a p-dimensional distribution with mean vector μ and covariance matrix Σ; then √n (ȳ - μ) →d N(0, Σ). Suppose A_n is a sequence of p × q matrices of RVs and y_n is a sequence of p-vectors of RVs; if plim A_n = A and y_n →d N(0, Σ), then A_n' y_n →d N(0, A'ΣA).

2.4 Hypothesis Testing

There are several important hypothesis tests, such as tests of the mean and of the variance. We list only the frequent tests. You should review the p-value, how to reject the null hypothesis, and the critical region.

Testing         Hypothesis                          Statistic
Mean            H₀: μ = μ₀ vs H₁: μ ≠ μ₀            Known population variance: √n (ȳ - μ₀)/σ ~ N(0, 1)
                                                    Unknown population variance: √n (ȳ - μ₀)/s ~ t(n-1)
Variance        H₀: σ² = σ₀² vs H₁: σ² > σ₀²        (n-1) s²/σ₀² = Σᵢ (yᵢ - ȳ)²/σ₀² ~ χ²(n-1)
Two variances   H₀: σ₁² = σ₂² vs H₁: σ₁² ≠ σ₂²      s₁²/s₂² ~ F(n₁-1, n₂-1)

2.5 Computational: Random Number Generation (Stata Commands)

We can generate random numbers with a specific distribution in Stata. The process is simple but powerful.

A. A single random number from a specific distribution

. set seed 100
. scalar u = runiform()
(This command is for the uniform(0,1) distribution only.)
. dis u
(This command displays the value of u, which is a scalar.)
.6796649
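The mean test with unknown population variance can be computed in a few lines (a Python sketch; the sample and μ₀ are made up for illustration):

```python
import math

# One-sample test of H0: mu = mu0 with unknown population variance (t statistic)
y = [2.9, 3.1, 2.7, 3.4, 3.0, 2.8, 3.2]   # hypothetical sample
mu0 = 3.0

n = len(y)
ybar = sum(y) / n
s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))   # sample SD
t = math.sqrt(n) * (ybar - mu0) / s                          # ~ t(n-1) under H0

print(round(t, 3))
```

Here the statistic is small, so this hypothetical sample would not reject H₀ at conventional critical values of t(6).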

B. Generate a series of RVs with a specific distribution

. quietly set obs 100
. set seed 100
. gen x = runiform()
. su x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |       100      .53984     .284797     .00995   .9983786

From the normal distribution:

. quietly set obs 100
. set seed 100
. gen uniform = runiform()
. gen stnormal = rnormal()
(standard normal)
. gen normal2_3 = rnormal(2,3)
(normal with mean = 2 and sd = 3)
. tabstat uniform stnormal normal2_3, stat(mean sd skew kurt min max) col(stat)

    variable |      mean        sd  skewness  kurtosis       min       max
-------------+-------------------------------------------------------------
     uniform |    .53984   .284797   -.76387   1.90469    .00995  .9983786
    stnormal |   .050938  1.060347  -.837998  2.468348 -2.403888  2.531584
   normal2_3 |  1.956556    2.8978   .070708    2.9353  -5.60774  8.575759
----------------------------------------------------------------------------

/* Show histogram with normal density line */
. histogram stnormal, normal
(bin=10, start=-2.4038885, width=.493547)
. histogram normal2_3, normal
(bin=10, start=-5.607736, width=1.483033)

From other distributions:

. quietly set obs 100
. set seed 100
. gen xt = rt(10)
(from the t-distribution with df = 10)
. gen xchi = rchi2(10)
(from the χ² distribution with df = 10)
. gen cf = rchi2(10)/10
(numerator of F(10,5))
. gen cfd = rchi2(5)/5
(denominator of F(10,5))
. gen xf = cf/cfd
(from the F distribution F(10,5))
. su xt xchi xf

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          xt |       100    .0470486       .9664    -2.8440  2.600535
        xchi |       100     9.39344       4.003     2.8358   21.3339
          xf |       100    1.735886    2.675786     .47364 14.659898

Draw histograms with kernel density plots:

. twoway (histogram xchi) (kdensity xchi), title("Chi Square Df = 10")
. graph save "D:\Folder Name\graph1.gph", replace
(file D:\Folder Name\graph1.gph saved)
. twoway (histogram xf) (kdensity xf), title("F(10,5)")
. graph save "D:\Folder Name\graph2.gph", replace
(file D:\Folder Name\graph2.gph saved)
. graph combine "D:\Folder Name\graph1.gph" "D:\Folder Name\graph2.gph"
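The F-draw recipe above (a ratio of scaled chi-squares) follows directly from the definitions in 2.2.3, and can be mimicked in Python using sums of squared standard normals (seed and repetition count are arbitrary):

```python
import random

random.seed(3)

def chi2(df):
    # chi-squared draw as a sum of df squared standard normals
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

# F(10,5) draw = (chi2(10)/10) / (chi2(5)/5), mirroring the Stata recipe above
draws = [(chi2(10) / 10) / (chi2(5) / 5) for _ in range(50_000)]

mean_f = sum(draws) / len(draws)
print(mean_f)   # theoretical mean of F(10,5) is 5/(5-2) = 1.67
```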

C. Simulation with Stata

C.1 First, write the program:

. program TestModel, rclass
  1.     drop _all
  2.     quietly set obs 30
  3.     gen x = runiform()
  4.     su x
  5.     return scalar meanofsample = r(mean)
  6. end

. set seed 100
. TestModel

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           x |        30    .5459987    .2803788    .054637  .9983786

. return list
scalars:
    r(meanofsample) = .5459987563873

Note: the expected value of Uniform(0,1) is 0.5 and the variance is 0.0833.

C.2 Second, simulate with the command simulate.
Syntax: simulate [exp_list], reps(#) [options] : command

. simulate xbar=r(meanofsample), seed(100) reps(10000) nodots: TestModel
      command: TestModel
         xbar: r(meanofsample)

. su xbar

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        xbar |     10000    .4995835    .0533809   .3008736   .699056

. twoway (histogram xbar) (kdensity xbar)
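The same Monte Carlo experiment can be sketched in Python (the function name test_model is a hypothetical stand-in for the Stata program above): the simulated means should center on 0.5 with standard deviation √(1/12/30) ≈ 0.053, just as in the Stata output.

```python
import random
import statistics

random.seed(100)

def test_model(n=30):
    # Mean of n Uniform(0,1) draws, like the Stata program TestModel
    return sum(random.random() for _ in range(n)) / n

xbar = [test_model() for _ in range(10_000)]

print(statistics.mean(xbar))    # ~ 0.5
print(statistics.stdev(xbar))   # ~ sqrt((1/12)/30) ~ 0.053
```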

2.6 Homework and Empirical Assignment

Assignment
1. Show that the mean and variance of the Bernoulli distribution are equal to p and p(1-p), respectively.
2. Show that the mean and variance of the binomial distribution are equal to np and np(1-p), respectively.
3. (Empirical simulation) The DGP is the t-distribution t(3), whose variance is 3 and mean is 0. We focus on the construction of an interval estimate of the mean and the corresponding test.
   a) Generate a sample of n = 10 independent draws from the t(3) distribution. Let ȳ be the sample mean and s the sample standard deviation. Compute the interval ȳ ± 2s/√n, and reject the null hypothesis of zero mean if and only if this interval does not include 0.
   b) Repeat the simulation 10,000 times and compute the number of times that the null hypothesis is rejected.
   c) Change the sample size to n = 100 and n = 1,000 instead of 10.
   d) Give an explanation of your findings.
4. Suppose that n pairs of outcomes (xᵢ, yᵢ) of the variables x and y have been observed. Prove that the sample correlation coefficient is invariant under the transformations xᵢ* = a₁xᵢ + b₁ and yᵢ* = a₂yᵢ + b₂ for all i = 1, 2, ..., n, with a₁ > 0 and a₂ > 0 positive constants.

2.7 Reading List
[Heij Chapter 2] [Verbeek Appendix B]

2.8 Reference
No additional reference.