Lecture Notes 3

1. Types of economic variables

(i) Continuous variable: takes on a continuum in the sample space, such as all points on a line or all real numbers.
  Example: GDP, pollution concentration, etc.
(ii) Discrete variable: takes a finite number of values or a countably infinite number, such as all positive integers.
  Example: Number of workers, etc.
(iii) Categorical data: grouped according to some quality or attribute.
  Example: Sex or type of automobile.

2. Review of statistics

(i) Population: the total group or set of elements of interest.
  - Sample: a subset of the population.
  - We usually collect samples because it is too costly to survey the entire population.
  - Example: College student survey in Kazakhstan. The population is all college students.
  - Probability: the relative frequency of occurrence of an event after repeated trials or experiments.
  - Probability lies between 0 and 1.
  - The probabilities of all events have to sum to 1.
  - Example: A 60% chance of rain today implies a 40% chance of no rain, summing to one or 100%.
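(ii) Probability distribution function (PDF): a function that associates each value of a discrete random variable with the probability that this value will occur. For a continuous random variable, f(x) is a density and probabilities come from integrating it.
  - Denoted as p(x) or f(x).
  - Cumulative probability distribution function (CDF): the integral of a probability function.
  - Denoted by a capital letter, such as P(x) or F(x).

    $P(x) = \int_{-\infty}^{x} f(t)\,dt$

  - If you sum over all probabilities, then it has to equal one.

    $P(\infty) = \int_{-\infty}^{\infty} f(t)\,dt = 1$

  - The normal probability distribution is shown below.

[Figure: the normal probability distribution]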
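The relationship between a PDF and its CDF can be checked numerically. The sketch below is only an illustration: the helper name `cdf_by_integration`, the lower cutoff of -10 standing in for negative infinity, and the step count are all assumptions, and the trapezoid rule is one of several integration methods that would serve. It integrates the standard normal density and compares the result with the exact CDF from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def cdf_by_integration(dist, x, lower=-10.0, steps=20_000):
    """Approximate P(x) = integral of f(t) dt from `lower` to x (trapezoid rule).

    `lower` stands in for negative infinity; the normal density is
    negligible that far into the tail.
    """
    width = (x - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(x))
    for i in range(1, steps):
        total += dist.pdf(lower + i * width)
    return total * width


standard_normal = NormalDist(mu=0.0, sigma=1.0)

approx = cdf_by_integration(standard_normal, 1.0)      # integrate f(t) up to x = 1
exact = standard_normal.cdf(1.0)                       # built-in CDF for comparison
total_area = cdf_by_integration(standard_normal, 10.0) # whole curve, should be near 1

print(f"integrated: {approx:.6f}  exact: {exact:.6f}  total area: {total_area:.6f}")
```

The integrated value and the built-in CDF should agree to several decimal places, and the area under the whole curve should be very close to one.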
(iii) Descriptive statistical measures: describe a sample or population.
  Measures of central tendency
  - Mean: calculated by

    $\mu_x = \sum_{i} x_i f(x_i)$

    where f(x) is the pdf and x is the random variable. This is also the expected value.
  - If every observation is equally likely, then

    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

    where n is the number of observations.
  - Median: the middle point or observation when the data are ordered from smallest to largest.
  - Mode: the value that occurs most often in a distribution; the peak of the distribution. For a continuous distribution, use calculus to find the maximum value.
  - Range: the difference between the largest value in the sample (the maximum) and the smallest value (the minimum), $R = x_{\max} - x_{\min}$.
  - Variance: a measure of deviation from the mean, denoted by $\sigma^2$.
  - Variance has a problem: if the units are dollars, then the variance is in dollars squared.
  - Dividing by n - 1 gives the sample variance.
  - Degrees of freedom (df): the amount of information you have, i.e. the number of observations.
  - Since you estimated the mean in order to compute the variance, you lose one piece of information.
  - In regression, you have k parameters, so df = n - k, because you estimated k parameters.
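The descriptive measures above can be computed with Python's standard library. This is a minimal sketch: the `sample` values are hypothetical and chosen so the arithmetic is easy to follow by hand. It also shows that the probability-weighted mean and the equally-weighted mean agree when the pdf is built from observed frequencies.

```python
import statistics
from collections import Counter

sample = [2, 4, 4, 5, 7, 9, 11]  # hypothetical observations
n = len(sample)

# Mean when every observation is equally likely: (1/n) * sum of x_i
mean = sum(sample) / n

# Mean as an expected value: sum of x * f(x), with f(x) the relative frequency
pmf = {value: count / n for value, count in Counter(sample).items()}
expected_value = sum(value * prob for value, prob in pmf.items())

median = statistics.median(sample)       # middle observation of the sorted data
mode = statistics.mode(sample)           # most frequent value
data_range = max(sample) - min(sample)   # R = x_max - x_min

# Sample variance divides by n - 1 because the mean was estimated first
sample_variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
sample_std = sample_variance ** 0.5      # same units as the data

print(mean, expected_value, median, mode, data_range)
print(sample_variance, round(sample_std, 4))
```

The hand-computed variance can be cross-checked against `statistics.variance(sample)`, which also uses the n - 1 divisor.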
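    $\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

    where $\bar{x}$ denotes the mean of the sample.
  - Standard deviation: the variability or spread of the data around its mean. Has the same units as the variable (i.e. miles, dollars, kilometers, etc.).

    $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$

(iv) The Normal Distribution or Gaussian Distribution: the most common distribution.
  - Bell-shaped curve.
  - Regression: you do not need a distribution to estimate parameters.
  - However, if you are testing the statistical significance of a parameter, then you need a distribution.

[Figure: a symmetric distribution (top) and asymmetric distributions (bottom)]

  - The top distribution is symmetric, thus the mean = median = mode.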
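  - The bottom distributions are not symmetric, so mean ≠ median ≠ mode.

The Normal Distribution
  - We have a relationship between probabilities and the mean and standard deviation.
  - Write it as $x \sim N(\mu_x, \sigma_x)$.
  - Statisticians have a trick to make all normal distributions standard, which is $z \sim N(0, 1)$.
  - The z distribution is below:

[Figure: the standard normal (z) distribution]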
  - The transformation is:

    $z = \frac{x - \mu_x}{\sigma_x}$

  - In the old days, people carried tables that had the probability for particular z values.
  - Excel can calculate this easily: =NORMSDIST(z)
  - Note this returns the probability from negative infinity to the z value.

(v) Standard Error: the variability in the mean when taking repeated samples.
  - The true parameter, $\beta$, is unknown.
  - Each time I take a different sample from the population, I get a different estimate for the parameter.
  - Example:
      Mean of first sample is 3.5
      Mean of second sample is 4.2
      Mean of third sample is 2.9
    The spread of these estimates is the std. error.
  - Since I cannot keep taking samples, nor can I take observations of the whole population, I would like to know how my estimator varies.

    $z = \frac{\hat{\beta} - \beta}{\sigma_{\hat{\beta}}}$

  - Very similar to the z-transformation.
  - I can use this relationship for hypothesis testing.
  - Each hypothesis has two parts:
      Null: the hypothesis of interest, $H_0$
      Alternative: the complement of the null, $H_A$
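The z-transformation and the idea of a standard error described above can be sketched in a few lines. The helper names `z_score` and `prob_below` and the N(100, 15) example are assumptions made for illustration. The three sample means are illustrative values read from the example in the notes. `NormalDist().cdf(z)` plays the role of the Excel lookup: it returns the probability from negative infinity up to z.

```python
from statistics import NormalDist, stdev


def z_score(x, mu, sigma):
    """Standardize x drawn from N(mu, sigma): z = (x - mu) / sigma."""
    return (x - mu) / sigma


def prob_below(x, mu, sigma):
    """Probability that a N(mu, sigma) variable falls below x, via the z value."""
    return NormalDist().cdf(z_score(x, mu, sigma))


# Hypothetical variable x ~ N(100, 15); how likely is a value below 130?
z = z_score(130, 100, 15)
p = prob_below(130, 100, 15)

# Spread of the estimates across repeated samples (illustrative means)
sample_means = [3.5, 4.2, 2.9]
spread_of_means = stdev(sample_means)

print(f"z = {z:.2f}, P(x < 130) = {p:.4f}")
print(f"spread of the three sample means = {spread_of_means:.4f}")
```

Standardizing first and then using the standard normal gives the same probability as working with N(100, 15) directly, which is why a single z table was enough.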
  - A hypothesis has to incorporate all possible outcomes.
  - Example (incomplete):
      $H_0: \beta = 4$
      $H_A: \beta > 4$
    What about values below 4?
  - A properly stated null and alternative hypothesis cover all alternatives.
  - Example: Two-tail test
      $H_0: \beta = 0$
      $H_A: \beta \neq 0$
    This is very common and is automatically computed for linear regression. Does it appear the parameter equals zero? If so, the x variable has no impact on y.
  - Example: One-tail test
      $H_0: \beta < 0$
      $H_A: \beta \geq 0$
  - We have a problem: we do not know what the standard deviation is and have to use a different distribution.

(vi) t-distribution: a symmetric bell-shaped distribution.
  - The problem is we do not know the parameter, $\beta$.
  - Further, we do not know the standard deviation either, i.e. $\sigma$.
  - A little fatter and shorter than the standard bell curve.
  - The shape depends on the degrees of freedom.
  - Degrees of freedom is the amount of information, i.e. the number of observations.
  - We had to estimate the mean to compute the standard deviation, so we subtract 1.
  - Regression: each parameter estimate has a standard error. With k parameters, df = n - k.
  - As the degrees of freedom approach infinity, the t-distribution approaches the standard normal distribution.
  - The t-test is

    $t = \frac{\hat{\beta} - \beta}{\hat{\sigma}_{\hat{\beta}}}$

    where
      $\hat{\beta}$ is the variable of interest
      $\beta$ is the null hypothesis value
      $\hat{\sigma}_{\hat{\beta}}$ is an appropriate estimate of the standard deviation of $\hat{\beta}$
      n is the number of observations.

Example: Homework #1
  - Does the data support that $\beta = -2$?
  - We choose a level of significance, $\alpha$. Usually $\alpha = 0.05$.
  - Two-tail test:
      $H_0: \beta = -2$
      $H_A: \beta \neq -2$
  - Now calculate the degrees of freedom: 60 observations and two estimated parameters, so df = 58.
  - Find the critical value for the t value.
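  - Remember it is a two-tail test, so put half of alpha into each tail.
  - Be careful with Excel: =TINV is already two-tailed, so pass the full alpha. Use =TINV(0.05, 58).
  - It returns $t_c = 2.00$. The c is for critical value.
  - Now calculate the t-statistic. From the regression output, take the standard error and the parameter estimate:
      Std. error = 0.20
      Parameter estimate b = -0.51

    $t = \frac{\hat{\beta} - \beta}{\hat{\sigma}_{\hat{\beta}}} = \frac{-0.51 - (-2)}{0.20} \approx 7.4$

  - Reject $H_0$ if t > 2.00 or t < -2.00.
  - Fail to reject if -2.00 < t < 2.00.
  - Reject $H_0$ and conclude that the parameter does not equal -2.

Let's do the most common two-tail test:
      $H_0: \beta = 0$
      $H_A: \beta \neq 0$

    $t = \frac{\hat{\beta} - \beta}{\hat{\sigma}_{\hat{\beta}}} = \frac{-0.51 - 0}{0.20} \approx -2.5$

  - Since t < -2.00, reject $H_0$ and conclude that the parameter does not equal zero.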
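The two tests above follow the same recipe, which can be sketched as a pair of small functions. The function names are hypothetical. The inputs (estimate -0.51, standard error 0.20, critical value 2.00, 60 observations, 2 parameters) are the values as read from the worked example, so treat them as assumptions about the original regression output. The critical value is taken from the notes rather than computed, because Python's standard library has no t-distribution.

```python
def t_statistic(estimate, null_value, std_error):
    """t = (estimate - null hypothesis value) / standard error."""
    return (estimate - null_value) / std_error


def two_tail_decision(t, critical_value):
    """Reject H0 when t falls in either tail beyond the critical value."""
    return "reject H0" if abs(t) > critical_value else "fail to reject H0"


n_observations, k_parameters = 60, 2
df = n_observations - k_parameters   # 58 degrees of freedom

T_CRITICAL = 2.00                    # from the notes: alpha = 0.05, two-tail, df = 58
estimate, std_error = -0.51, 0.20    # values as read from the worked example

t_vs_minus_two = t_statistic(estimate, -2.0, std_error)  # H0: beta = -2
t_vs_zero = t_statistic(estimate, 0.0, std_error)        # H0: beta = 0

print(df)
print(round(t_vs_minus_two, 2), two_tail_decision(t_vs_minus_two, T_CRITICAL))
print(round(t_vs_zero, 2), two_tail_decision(t_vs_zero, T_CRITICAL))
```

Because the inputs are rounded, the computed statistics match the values in the notes only to about one decimal place; the reject decisions are the same either way.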
Selected Critical Values for the t-distribution

                 Level of Significance α in one tail (see diagrams above)
Degrees of
Freedom            .10       .05      .025      .005
   1             3.078     6.314    12.706    63.657
  15             1.341     1.753     2.131     2.947
  19             1.328     1.729     2.093     2.861
  20             1.325     1.725     2.086     2.845
  21             1.323     1.721     2.080     2.831
   ∞             1.282     1.645     1.960     2.576

3. Analysis of Variance (ANOVA)

  - In terms of regressions, ANOVA is used to test hypotheses in many types of statistical analysis.
  - Sum of Squared Total (SST) is defined as:

    $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$

    where y is the dependent variable in the regression and $y_i - \bar{y}$ is the total variation for observation i.
  - Sum of Squared Regression (SSR) is defined as:

    $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$

    This is the variation explained by the regression.
  - Sum of Squared Errors (SSE), which was defined earlier as:

    $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$

    SSE is the amount of variation not explained by the regression equation.
  - Thus, SST = SSR + SSE, which is proved in the chapter.
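The decomposition SST = SSR + SSE can be verified numerically for a one-variable regression. The sketch below makes several assumptions: the data are made up, the function names are hypothetical, and the fit is an ordinary least squares line with an intercept (the decomposition relies on the intercept being present).

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the fitted least-squares line."""
    intercept, slope = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    fitted = [intercept + slope * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    return sst, ssr, sse


# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, ssr, sse = sums_of_squares(x, y)
print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
```

Up to floating-point rounding, the two printed numbers should coincide.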
  - We can use this information to calculate the $R^2$ statistic. Show the relationship:

    $SST = SSR + SSE$

    $\frac{SST}{SST} = \frac{SSR}{SST} + \frac{SSE}{SST}$

    $1 = R^2 + \frac{SSE}{SST}$

    $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

  - Problem: the more parameters added to the regression, the higher the $R^2$.
  - $R^2 = 1$ if n = k, i.e. the number of parameters equals the number of observations.
  - Now we need the degrees of freedom for each measure:
      Sum of Squared Regression (SSR): df = k - 1
      Sum of Squared Errors (SSE): df = n - k
      Sum of Squared Total (SST): df = n - 1
  - We calculate the Mean Square (MS):
      Regression (MS) = SSR / (k - 1)
      Residual (MS) = SSE / (n - k)
      Total (MS): NA

Additional information
  - When you have a variable with a normal distribution, if you add it to or subtract it from other variables with a normal distribution, then the result is still normally distributed.
  - Calculating a mean is a first moment.
  - If you square a standard normal random variable and sum n independent such squares, then you get a chi-square distribution with n degrees of freedom.
  - The squares are variances and are called the second moment.
  - All the Mean Squares are distributed as (scaled) chi-squares.
F-distribution
  - Can test a whole group of hypotheses or test a whole regression model.
  - The F-test can test many other things.
  - The F-test is a ratio of two chi-squares.
  - The F-test is a one-tailed test associated with the right-hand tail. Squaring makes all terms positive.
  - The F-distribution and test are as follows:
      $H_0$: The regression model does not explain the data, i.e. all the parameter estimates are zero.
      $H_A$: The regression model does explain the data, i.e. at least one parameter estimate is not zero.
  - First, we need the critical value: $\alpha = 0.05$, $df_1 = 1$, and $df_2 = 58$.
  - In Excel, =FINV(0.05, 1, 58) gives $F_c = 4.00$.
  - Excel calculates the ANOVA:

ANOVA
              df    SS          MS          F           Significance F
Regression     1    33.06087    33.06087    6.489695    0.01354
Residual      58    295.473     5.094365
Total         59    328.534
  - Calculate the F-value:

    $F = \frac{SSR / df_{SSR}}{SSE / df_{SSE}} = \frac{33.06 / 1}{295.47 / 58} = \frac{33.06}{5.09} = 6.50$

  - The computed F exceeds $F_c$, so reject $H_0$ and conclude at least one parameter is not equal to zero.

Example from homework #3
  - How many observations? 10
  - How many parameters, k? 4
  - Degrees of freedom for error: df = 10 - 4 = 6
  - Degrees of freedom for total: df = 10 - 1 = 9

ANOVA
              df    SS             MS          F           Significance F
Regression     3    5001.859635    1667.287    232.4265    1.35E-06
Residual       6    43.04036468    7.173394
Total          9    5044.9
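Both ANOVA tables can be rebuilt from just the sums of squares, n, and k. The sketch below uses a hypothetical helper, `anova_table`, and applies the degrees-of-freedom rules from the notes (k - 1, n - k, n - 1). The residual sum of squares for the 60-observation regression is read as 295.473, an assumption consistent with its mean square of 5.094365 on 58 degrees of freedom. The critical value 4.00 is the one quoted in the notes.

```python
def anova_table(ssr, sse, n, k):
    """Degrees of freedom, mean squares, F, and R-squared from SSR and SSE."""
    df_regression = k - 1
    df_residual = n - k
    df_total = n - 1
    ms_regression = ssr / df_regression
    ms_residual = sse / df_residual
    sst = ssr + sse
    return {
        "df": (df_regression, df_residual, df_total),
        "SST": sst,
        "MS_regression": ms_regression,
        "MS_residual": ms_residual,
        "F": ms_regression / ms_residual,   # ratio of the two mean squares
        "R2": ssr / sst,                    # share of variation explained
    }


# 60 observations, 2 parameters (residual SS read as 295.473, an assumption)
first = anova_table(ssr=33.06087, sse=295.473, n=60, k=2)

# Homework #3: 10 observations, 4 parameters
third = anova_table(ssr=5001.859635, sse=43.04036468, n=10, k=4)

F_CRITICAL = 4.00  # from the notes: alpha = 0.05, df1 = 1, df2 = 58
print(first["df"], round(first["F"], 4), first["F"] > F_CRITICAL)
print(third["df"], round(third["F"], 4), round(third["R2"], 4))
```

The $R^2$ values are a by-product of the same sums of squares; they are not printed in the tables above, but follow directly from $R^2 = SSR / SST$.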