Cathy Walker
March 5, 2010

Part : Problem Set

1. What is the level of measurement for the following variables?
a) SAT scores
b) Number of tests or quizzes in a statistical course
c) Acres of land devoted to corn
d) Number of break-ins in 2004 by neighborhood
e) Social Security Number
f) Impression of a certain place selected by recipients from a scale of 1 to 5
g) Name of your birthplace
h) Year of birth

2. Consider the following set of numbers that are the result of a random process or phenomenon:
0, 0, 1, 2, 2, 3, 6, 8, 8, 9, 9, 9, 9, 10, 12, 14, 18, 23, 28, 34, 47
a) Calculate and/or identify the mean, median, mode, range, inter-quartile range, maximum, outliers, and minimum of these numbers.
b) Draw a histogram, box-plot, and stem-and-leaf plot for these numbers.

3. The heights of men in the United States are distributed N(5'10", 3"). Your two brothers are both 6'1". What percentage of men are taller than your two brothers?

4. How is the Standard Deviation different from the Mean Absolute Deviation? Define both of these terms and calculate each of them for the set of numbers in problem #2.

5. Define the idea of a sample space in terms of rolling two six-sided dice. Define the elementary outcomes for rolling two six-sided dice. Draw a probability mass function for the random experiment of rolling two dice and summing their face-up numbers. Determine the probability of at least one of the dice showing a 5 on its face in a single roll of two dice.

6. Define and provide examples of the following terms: Population, Sample, Parameter, Random Selection, and Sampling Frame.
7. You work for a recycle collection company collecting curbside recyclable material from residential neighborhoods in Tacoma, WA. Your trucks can carry 7,000 pounds of recyclable material before they must be driven back to the transfer station to be unloaded. You randomly sample 40 of your residential recycle customers and get the following values for the weight of their recyclables.

15 9 30 1 1 19 1 40 3 10 9 14 19 70 63 55 40 1 63 5 47 50 1 15 13 11 7 11 9 1 9 4 6 54 4 6 64 50 1

a) Estimate the mean and the 95% Confidence Interval for the mean amount of recyclables (in lbs) your average customer produces every two weeks. Interpret your results.
b) How many residential recycle customers should you schedule for each truck if you wanted to be 99% confident that any given truck would not exceed its weight limit on its assigned route?
8. Simple Linear Regression (Ordinary Least Squares)
a) Given the data below, plot the points to create a scatterplot, complete the tables with the values below, and calculate the values indicated.

X   Y
1   8
3   2
6   7
2   6
1   3
4   6
3   7
5   9

Fill in the following table, with one row per data point and these columns:
$x - \bar{x}$, $y - \bar{y}$, $(x - \bar{x})(y - \bar{y})$, $(x - \bar{x})^2$, $(y - \bar{y})^2$

Calculate the regression parameters:
Slope b:
Intercept a:

Using your regression parameters, fill in this table and calculate the values for these columns:
$\hat{y}$, $(y - \hat{y})^2$, $(\hat{y} - \bar{y})^2$

SSE:
SSR:
$R^2$:
9. Descriptive Spatial Statistics
Consider the following point locations in the scatterplot below. (Note: The coordinate locations of these points are provided in the table below along with each point's Z value.)

[Scatterplot of the point locations; both axes run from 0 to 12.]

a) What is the mean center of these points?
b) What is the weighted mean center of these points? (Using the Z values provided above as the weight)

X Y Z
3 13 4 10 5 3 1 5 4 6 4 1 1 4 6 4 1 5 19 7 10 1 9 4 15 9 1 5 17 4 3 1 5 9 14 3 4 3 1 15 10 9 4 4 1 7
10. Given the simulated raster image below with only 5 known values, use the Inverse Distance Weighted (IDW) methodology to fill in all the blanks. (This will probably be easiest using Excel.)

6 5 7 3 1

a) How good do you think your IDW estimates are compared to the real unknown values? How do you think the accuracy of your estimates varies as a function of the distance from the known values provided?
Problem Set Answer Key

1. What is the level of measurement for the following variables?
a) SAT scores: INTERVAL
b) Number of tests or quizzes in a statistical course: RATIO
c) Acres of land devoted to corn: RATIO
d) Number of break-ins in 2004 by neighborhood: RATIO (a count with a true zero)
e) Social Security Number: NOMINAL (the digits are labels, not quantities)
f) Impression of a certain place selected by recipients from a scale of 1 to 5: ORDINAL
g) Name of your birthplace: NOMINAL
h) Year of birth: INTERVAL

2. Consider the following set of numbers that are the result of a random process or phenomenon:
0, 0, 1, 2, 2, 3, 6, 8, 8, 9, 9, 9, 9, 10, 12, 14, 18, 23, 28, 34, 47
a) Calculate and/or identify the mean, median, mode, range, inter-quartile range, maximum, outliers, and minimum of these numbers.

Mean:
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{0+0+1+2+2+3+6+8+8+9+9+9+9+10+12+14+18+23+28+34+47}{21} = \frac{252}{21} = 12 \]
Median: 9
Mode: 9
Range: 47 - 0 = 47 (from 0 to 47)
Minimum: 0; Maximum: 47
Inter-quartile Range: Q1 = (3 + 2)/2 = 2.5; Q3 = (18 + 14)/2 = 16; IQR = Q3 - Q1 = 16 - 2.5 = 13.5
Outliers: 47 (it lies above Q3 + 1.5 x IQR = 16 + 20.25 = 36.25)

b) Draw a histogram, box-plot, and stem-and-leaf plot for these numbers.
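The summary statistics for problem 2 can be checked with a short Python sketch. It assumes the data set reads 0, 0, 1, 2, 2, 3, 6, 8, 8, 9, 9, 9, 9, 10, 12, 14, 18, 23, 28, 34, 47, and that quartiles are taken as the medians of the lower and upper halves, which is the convention the Q1 and Q3 values in the answer imply. The helper name `quartiles_by_halves` and the 1.5 x IQR outlier fences are illustrative assumptions, not something the answer key specifies.

```python
from statistics import mean, median, multimode

# Assumed reading of the problem 2 data set (21 values).
data = [0, 0, 1, 2, 2, 3, 6, 8, 8, 9, 9, 9, 9, 10, 12, 14, 18, 23, 28, 34, 47]


def quartiles_by_halves(values):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    When the number of values is odd, the middle value is left out of
    both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


q1, q3 = quartiles_by_halves(data)
iqr = q3 - q1

# Common rule of thumb: flag values beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("mean:", mean(data))
print("median:", median(data))
print("mode(s):", multimode(data))
print("range:", max(data) - min(data))
print("Q1, Q3, IQR:", q1, q3, iqr)
print("outliers:", outliers)
```

Other quartile conventions (for example, interpolated percentiles) give slightly different Q1 and Q3 values, so the fences depend on that choice.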
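3. The heights of men in the United States are distributed N(5'10", 3"). Your two brothers are both 6'1". What percentage of men are taller than your two brothers?

\[ Z = \frac{X - \bar{X}}{s} = \frac{73 - 70}{3} = \frac{3}{3} = 1 \]

Using Appendix Table A in the book, a Z-score of 1 corresponds to a value of 0.3413, the area between the mean and Z. Since we want to know the percentage of men that are above the height of the two brothers and each side of the curve totals 0.5000, we find that 0.5000 - 0.3413 = 0.1587, so approximately 15.87% (about 16%) of men are taller than the two brothers.

4. How is the Standard Deviation different from the Mean Absolute Deviation? Define both of these terms and calculate each of them for the set of numbers in problem #2.

The standard deviation measures the spread of the numbers in a sample about the mean: it is the square root of the sum of squared deviations from the mean divided by n - 1. The mean absolute deviation is the mean of the absolute deviations of a set of data about the data's mean. Because the standard deviation squares each deviation before averaging, it gives more weight to large deviations than the mean absolute deviation does.

0, 0, 1, 2, 2, 3, 6, 8, 8, 9, 9, 9, 9, 10, 12, 14, 18, 23, 28, 34, 47

Standard Deviation (s):
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{(0-12)^2 + (0-12)^2 + (1-12)^2 + \dots + (34-12)^2 + (47-12)^2}{20}} = \sqrt{\frac{2924}{20}} = \sqrt{146.2} \approx 12.09 \]

Mean Absolute Deviation (MD):
\[ MD = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}| = \frac{|0-12| + |0-12| + |1-12| + \dots + |34-12| + |47-12|}{21} = \frac{184}{21} \approx 8.76 \]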
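A minimal Python sketch for checking problems 3 and 4. It assumes heights in inches (mean 70, standard deviation 3, brothers at 73) and the same assumed reading of the problem 2 data set. It uses the exact normal distribution from the standard library rather than a printed Z table, so the tail area is not rounded to four places.

```python
from statistics import NormalDist, mean, stdev

# Problem 3: share of men taller than 73 inches under N(70, 3).
heights = NormalDist(mu=70, sigma=3)
z_score = (73 - heights.mean) / heights.stdev
share_taller = 1 - heights.cdf(73)

# Problem 4: sample standard deviation (n - 1 denominator) versus
# mean absolute deviation (n denominator) for the problem 2 data.
data = [0, 0, 1, 2, 2, 3, 6, 8, 8, 9, 9, 9, 9, 10, 12, 14, 18, 23, 28, 34, 47]
center = mean(data)
sum_squared_dev = sum((x - center) ** 2 for x in data)
sum_abs_dev = sum(abs(x - center) for x in data)
s = stdev(data)
mad = sum_abs_dev / len(data)

print(f"Z = {z_score:.2f}, share taller = {share_taller:.4f}")
print(f"sum of squared deviations = {sum_squared_dev}, s = {s:.4f}")
print(f"sum of absolute deviations = {sum_abs_dev}, MD = {mad:.4f}")
```

The single value 47 contributes 1225 of the squared-deviation total but only 35 of the absolute-deviation total, which is why s comes out larger than MD here.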
5. Define the idea of a sample space in terms of rolling two six-sided dice. Define the elementary outcomes for rolling two six-sided dice. Draw a probability mass function for the random experiment of rolling two dice and summing their face-up numbers. Determine the probability of at least one of the dice showing a 5 on its face in a single roll of two dice.

The sample space is the set or collection of elementary outcomes. For the rolling of two dice the elementary outcomes are the 36 ordered pairs (i, j), where i and j are the face values (1 to 6) of the first and second die: (1,1), (1,2), ..., (6,6).

Probability Mass Function:
\[ f(y) = P(\{e : Y(e) = y\}) \]

y     f(y)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36

Looking at the possible two-dice combinations, 6 outcomes show a 5 on the first die and 6 show a 5 on the second, with (5,5) counted in both groups, so 6 + 6 - 1 = 11 outcomes contain at least one 5. The probability of rolling at least one 5 in a single roll of two dice is 11/36, or approximately 30.6%.
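The dice results can be verified by enumerating the sample space directly. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# Probability mass function of the sum of the two faces.
sum_counts = Counter(a + b for a, b in outcomes)
pmf = {total: Fraction(count, len(outcomes))
       for total, count in sorted(sum_counts.items())}

# Probability that at least one die shows a 5.
with_five = [pair for pair in outcomes if 5 in pair]
p_at_least_one_five = Fraction(len(with_five), len(outcomes))

for total, prob in pmf.items():
    print(total, prob)
print("P(at least one 5) =", p_at_least_one_five, "=", float(p_at_least_one_five))
```

Note that `Fraction` prints values in lowest terms, so 6/36 appears as 1/6.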
6. Define and provide examples of the following terms: Population, Sample, Parameter, Random Selection, and Sampling Frame.

Population - the universe of all individuals from which your sample can be taken. Example: the population of the U.S. or the population of the world.

Sample - a subset or portion of the individuals selected from the population used for detailed analysis. Example: 1,000 randomly selected college-age students (18-25 years old), used to determine the drinking habits of this age group within the U.S. population.

Parameter - a numerical characteristic of the population that the sample is used to estimate. Example: the average number of drinks per week among all college-age students in the U.S.

Random Selection - a procedure for selecting a sample of n individuals in which every individual is equally likely to be chosen. Example: the selection of college students for a survey using randomly selected student ID numbers.

Sampling Frame - the practical or operational structure that contains the entire set of elements from which the sample will actually be drawn. Example: the entire list of DU student ID numbers would be the sampling frame for the above example.
7. You work for a recycle collection company collecting curbside recyclable material from residential neighborhoods in Tacoma, WA. Your trucks can carry 7,000 pounds of recyclable material before they must be driven back to the transfer station to be unloaded. You randomly sample 40 of your residential recycle customers and get the following values for the weight of their recyclables.

15 9 30 1 1 19 1 40 3 10 9 14 19 70 63 55 40 1 63 5 47 50 1 15 13 11 7 11 9 1 9 4 6 54 4 6 64 50 1

a) Estimate the mean and the 95% Confidence Interval for the mean amount of recyclables (in lbs) your average customer produces every two weeks. Interpret your results.

Mean = 33.675
Standard Dev. = 20.6405

\[ \bar{X} \pm Z \frac{s}{\sqrt{N}} \]

95% Confidence:
Upper: 33.675 + 1.96 (20.6405 / sqrt(40)) = 33.675 + 6.3966 = 40.07 lbs
Lower: 33.675 - 1.96 (20.6405 / sqrt(40)) = 33.675 - 6.3966 = 27.28 lbs

We can be 95% confident that the mean amount of recyclables per customer lies between 27.28 and 40.07 lbs.

At 40.07 lbs of recyclables per customer, the recycle truck can service 7,000 / 40.07 = 174.7, or 174 households, on a pickup run without needing to go back to unload. At 27.28 lbs of recyclables per customer, the recycle truck can service 7,000 / 27.28 = 256.6, or 256 households, on a pickup run without needing to go back to unload. Given the average pounds of recyclables, you can say with 95% confidence that the recycle trucks can service anywhere from 174 to 256 customers on a single recyclables pick-up route.

b) How many residential recycle customers should you schedule for each truck if you wanted to be 99% confident that any given truck would not exceed its weight limit on its assigned route?

\[ \bar{X} \pm Z \frac{s}{\sqrt{N}} \]

99% Confidence:
Upper: 33.675 + 2.58 (20.6405 / sqrt(40)) = 33.675 + 8.4200 = 42.0950 lbs
Lower: 33.675 - 2.58 (20.6405 / sqrt(40)) = 33.675 - 8.4200 = 25.2550 lbs

Given the average pounds of recyclables, you can say with 99% confidence that the recycle trucks can service anywhere from 166 to 277 customers on a single recyclables pick-up route. Scheduling about 166 customers per truck, the figure based on the upper bound of 42.0950 lbs per customer, is the conservative choice.
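The interval arithmetic in problem 7 can be reproduced from the summary statistics alone. The sketch below assumes the reported mean (33.675 lbs), standard deviation (20.6405 lbs), sample size (40), and truck capacity (7,000 lbs). It uses exact normal quantiles (about 1.95996 and 2.57583) instead of the table values 1.96 and 2.58, so the margins differ slightly from hand calculations. The function name `mean_confidence_interval` is a hypothetical helper.

```python
from math import floor, sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, s, n, confidence):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin


XBAR, S, N = 33.675, 20.6405, 40   # summary statistics from the answer
TRUCK_CAPACITY_LBS = 7000

low95, high95 = mean_confidence_interval(XBAR, S, N, 0.95)
low99, high99 = mean_confidence_interval(XBAR, S, N, 0.99)

# Households per load if every customer produced the upper-bound mean weight.
households_95 = floor(TRUCK_CAPACITY_LBS / high95)
households_99 = floor(TRUCK_CAPACITY_LBS / high99)

print(f"95% CI: ({low95:.2f}, {high95:.2f}) lbs -> {households_95} households")
print(f"99% CI: ({low99:.2f}, {high99:.2f}) lbs -> {households_99} households")
```

This follows the answer's approach of dividing truck capacity by a bound on the mean. It does not model the variability of a truck's total load, which a stricter treatment of part b) would need.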
8. Simple Linear Regression (Ordinary Least Squares)
a) Given the data below, plot the points to create a scatterplot, complete the tables with the values below, and calculate the values indicated.

\[ \bar{x} = \frac{25}{8} = 3.125 \qquad \bar{y} = \frac{48}{8} = 6 \]

Fill in the following table:

X   Y   x - x̄     y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2   (y - ȳ)^2
1   8   -2.125     2       -4.25            4.515625     4
3   2   -0.125    -4        0.5             0.015625    16
6   7    2.875     1        2.875           8.265625     1
2   6   -1.125     0        0               1.265625     0
1   3   -2.125    -3        6.375           4.515625     9
4   6    0.875     0        0               0.765625     0
3   7   -0.125     1       -0.125           0.015625     1
5   9    1.875     3        5.625           3.515625     9
Sum                         11.0            22.875      40 (SST)

Calculate the regression parameters:

\[ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{11.0}{22.875} = 0.4809 \]
\[ a = \bar{y} - b\,\bar{x} = 6 - (0.4809)(3.125) = 4.4973 \]

Slope b: 0.4809
Intercept a: 4.4973

Using your regression parameters, fill in this table and calculate the values for:

X   Y   ŷ        (y - ŷ)^2   (ŷ - ȳ)^2
1   8   4.9781    9.1316      1.0442
3   2   5.9399   15.5227      0.0036
6   7   7.3825    0.1463      1.9113
2   6   5.4590    0.2927      0.2927
1   3   4.9781    3.9130      1.0442
4   6   6.4208    0.1770      0.1770
3   7   5.9399    1.1238      0.0036
5   9   6.9016    4.4031      0.8130
Sum              34.7104 (SSE)   5.2896 (SSR)

\[ R^2 = \frac{SSR}{SST} = \frac{5.2896}{40} = 0.1322 \]

SSE: 34.7104
SSR: 5.2896
$R^2$: 0.1322
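The least-squares quantities for problem 8 can be recomputed with a few lines of Python. The sketch assumes the eight data pairs are (1, 8), (3, 2), (6, 7), (2, 6), (1, 3), (4, 6), (3, 7), (5, 9). That reading is inferred from the stated means of 3.125 and 6 and from the deviation columns, so treat it as an assumption rather than a confirmed data set.

```python
# Assumed (x, y) pairs for problem 8.
xs = [1, 3, 6, 2, 1, 4, 3, 5]
ys = [8, 2, 7, 6, 3, 6, 7, 9]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of cross-products and squares about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
sst = sum((y - y_bar) ** 2 for y in ys)

# Ordinary least squares slope and intercept.
b = sxy / sxx
a = y_bar - b * x_bar

# Fitted values and the two components of the total sum of squares.
y_hat = [a + b * x for x in xs]
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # residual
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
r_squared = ssr / sst

print(f"b = {b:.4f}, a = {a:.4f}")
print(f"SSE = {sse:.4f}, SSR = {ssr:.4f}, SST = {sst:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

A quick sanity check on any hand-filled table is that SSE + SSR equals SST and that R^2 falls between 0 and 1.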
9. Descriptive Spatial Statistics
Consider the following point locations in the scatterplot below. (Note: The coordinate locations of these points are provided in the table below along with each point's Z value.)

[Scatterplot of the point locations; both axes run from 0 to 12.]

a) What is the mean center of these points?

\[ \bar{X}_c = \frac{\sum X_i}{n} = \frac{97}{20} = 4.85 \qquad \bar{Y}_c = \frac{\sum Y_i}{n} = \frac{105}{20} = 5.25 \]

The mean center of these points is (4.85, 5.25).

b) What is the weighted mean center of these points? (Using the Z values provided above as the weight)

\[ \bar{X}_{wc} = \frac{\sum f_i X_i}{\sum f_i} = \frac{1026}{260} = 3.9462 \qquad \bar{Y}_{wc} = \frac{\sum f_i Y_i}{\sum f_i} = \frac{1029}{260} = 3.9577 \]

where the weight $f_i$ is each point's Z value.

The weighted mean center of these points is (3.9462, 3.9577).

X Y Z
3 13 4 10 5 3 1 5 4 6 4 1 1 4 6 4 1 5 19 7 10 1 9 4 15 9 1 5 17 4 3 1 5 9 14 3 4 3 1 15 10 9 4 4 1 7
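The two centers in problem 9 follow the same pattern: an unweighted average of the coordinates, and an average weighted by Z. Here is a minimal Python sketch. The three sample points are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the problem's 20 points, and the function names are illustrative.

```python
def mean_center(points):
    """Unweighted mean center of (x, y, z) points; z is ignored."""
    n = len(points)
    return (sum(x for x, _, _ in points) / n,
            sum(y for _, y, _ in points) / n)


def weighted_mean_center(points):
    """Mean center of (x, y, z) points using z as the weight."""
    total_weight = sum(z for _, _, z in points)
    return (sum(x * z for x, _, z in points) / total_weight,
            sum(y * z for _, y, z in points) / total_weight)


# Hypothetical points (x, y, z) for illustration only.
sample_points = [(0, 0, 1), (4, 0, 1), (4, 4, 2)]

print("mean center:", mean_center(sample_points))
print("weighted mean center:", weighted_mean_center(sample_points))
```

The weighted center is pulled toward (4, 4) because that point carries twice the weight of the others.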
10. Given the simulated raster image below with only 5 known values, use the Inverse Distance Weighted (IDW) methodology to fill in all the blanks. (This will probably be easiest using Excel.)

Between Points Distance (from the cell being estimated, X, to each known point):
X,1   6.00000
X,2   5.83095
X,3   5.00000
X,4   7.28011
X,5   1.41421
X,6   4.00000

Distance between the cell X and a known point i:
\[ d_i = \sqrt{(x_X - x_i)^2 + (y_X - y_i)^2} \]

IDW estimate from s known points, with exponent k:
\[ \hat{z}_0 = \frac{\sum_{i=1}^{s} z_i / d_i^{\,k}}{\sum_{i=1}^{s} 1 / d_i^{\,k}} \]

7.70 1.63633 k = 1 = 4.79066

Estimated Z values:
3. 3.49 4.05 4.3 4.6 4.9 4.73 3.6 3.53 3.9 4.6 4.5 4.6 6 4.5 3.99 4.15 4.39 4.53 4.66 4.74 4.7 4.53 4.19 4.44 5 4.1 5.0 4.73 4.35 4.03 4.5 4.43 4.63 4.9 7 4.79 4.06 3 4.1 4.31 4.43 4.61 4.7 4.7 3. 3.69 4.13 4.16 4.16 4.07 3.74 3.9 3.43 3.6 4.04 4.01 3.9 3.69 3.06 1.9 3.43 1

OR (Rounded):
3 3 4 4 5 5 5 4 4 4 4 5 5 6 5 4 4 4 5 5 5 5 5 4 4 5 5 5 5 4 4 4 4 5 5 7 5 4 3 4 4 4 5 5 4 4 4 4 4 4 4 4 3 3 4 4 4 4 4 3 1 3 3

a) How good do you think your IDW estimates are compared to the real unknown values? How do you think the accuracy of your estimates varies as a function of the distance from the known values provided?

Looking at the estimates situated around the known values in the table, some of the Z estimates seem to be a little lower than I would expect. This is especially true in the case of point (5, 4). With a Z value of 7, the estimated values around this point seem lower than I would expect given this relatively high Z value.
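The IDW calculation in problem 10 can be sketched in Python as below. The known points are hypothetical, since the raster's actual coordinates are not reproduced here. The exponent k = 1 matches the value shown in the answer. The function name `idw_estimate` and the rule of returning the known value when the target coincides with a known point are assumptions.

```python
from math import hypot


def idw_estimate(known, x, y, k=1.0):
    """Inverse-distance-weighted estimate at (x, y).

    known: list of ((px, py), z) pairs with known values.
    k:     distance exponent; larger k gives nearby points more influence.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (px, py), z in known:
        d = hypot(x - px, y - py)   # straight-line distance to the known point
        if d == 0:
            return z                # target sits on a known point
        w = 1.0 / d ** k
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical known points for illustration only.
known_points = [((1, 1), 2.0), ((5, 1), 6.0), ((3, 4), 7.0)]

print(idw_estimate(known_points, 3, 1))       # distances 2, 2, 3
print(idw_estimate(known_points, 3, 4))       # exactly on a known point
print(idw_estimate(known_points, 3, 1, k=2))  # stronger distance decay
```

With k = 1 every known point contributes noticeably to every cell, so estimates are pulled toward the overall average. That is consistent with the observation that cells next to the high value of 7 come out lower than expected, and raising k makes the surface follow nearby values more closely.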