Stat 342 Homework Fall 2014


Assignment 1 Due 9/5/14

Section 1.1 of the Course Outline

1. Consider a probability model for the random pair $(y, x)$ with joint density

$$f(y, x) = \begin{cases} \dfrac{1}{x} \exp\left(-\dfrac{y}{x}\right) & \text{for } 0 < x < 1 \text{ and } y > 0 \\ 0 & \text{otherwise} \end{cases}$$

a) Find the marginal distribution of $x$ and the conditional density of $y \mid x$.
b) Find both the optimal squared error loss and optimal absolute error loss predictors of $y$ based on $x$.
c) Evaluate the risk of the optimal squared error loss predictor.

2. Suppose that the random pair $(y, x)$ has a joint distribution on $\{0,1\} \times (0, \infty)$ specified as follows: $P[y = 0] = .7$ and $P[y = 1] = .3$, and conditional on the value of $y$, the variable $x$ is Exponential with mean $1/(y+1)$. A "density" for $(y, x)$ (that one adds over $y$ and integrates over $x$) is then

$$f(y, x) = \begin{cases} .7\, I[y = 0] \exp(-x) + .3\, I[y = 1]\, 2 \exp(-2x) & \text{if } (y, x) \in \{0,1\} \times (0, \infty) \\ 0 & \text{otherwise} \end{cases}$$

a) Evaluate the probability of an event comparing $x$ to 1 (transcribed as "P x 1"; the inequality sign was lost in transcription).
b) Find the conditional probability that $y = 1$ given the value of $x$, $P[y = 1 \mid x]$.
c) Find the 0-1 loss optimal predictor of $y$ based on $x$.
d) Evaluate the risk of your predictor in c).
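The mixed discrete/continuous "density" in Problem 2 can be checked numerically. The following is a minimal R sketch, assuming the reconstruction of the density as $.7\exp(-x)$ for $y = 0$ and $.3 \cdot 2\exp(-2x)$ for $y = 1$; the function names `joint` and `post_y1` are illustrative and not part of the assignment. It confirms that the total mass is 1 and evaluates the Bayes-rule ratio $f(1, x) / (f(0, x) + f(1, x))$ at a few values of $x$.

```r
# Joint "density" of Problem 2: sum over y in {0, 1}, integrate over x > 0.
# (Assumed form: .7 * exp(-x) for y = 0 and .3 * 2 * exp(-2x) for y = 1.)
joint <- function(y, x) {
  if (y == 0) 0.7 * exp(-x) else 0.3 * 2 * exp(-2 * x)
}

# Total mass: integrate over x for each y, then add over y.
total_mass <- integrate(function(x) joint(0, x), 0, Inf)$value +
  integrate(function(x) joint(1, x), 0, Inf)$value

# Conditional probability that y = 1 given x, by Bayes' rule.
post_y1 <- function(x) joint(1, x) / (joint(0, x) + joint(1, x))

print(total_mass)
print(post_y1(c(0, 0.5, 1, 2)))
```

Plotting `post_y1` against `x` shows where the conditional probability crosses 1/2, which is the quantity a 0-1 loss predictor depends on.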

3. Below is a table giving a joint pmf for the random pair $(y, x)$.

[Table with columns $x = 1$, $x = 2$, $x = 3$ and one row per value of $y$; the entries were lost in transcription.]

a) Find the SEL optimal predictor of $y$ based on $x$. (You need to evaluate all of $\hat{y}_{opt}(1)$, $\hat{y}_{opt}(2)$, and $\hat{y}_{opt}(3)$.) Evaluate the risk of this predictor.
b) Find the 0-1 loss predictor of $y$ based on $x$. (Again, you need to evaluate all of $\hat{y}_{opt}(1)$, $\hat{y}_{opt}(2)$, and $\hat{y}_{opt}(3)$.) Evaluate the risk of this predictor.

Section 1.2 of the Course Outline

4. Consider the toy statistical model where the entries of $x = (x_1, x_2, \ldots, x_n)$ are iid $N(\theta, 1)$.
a) Write out the likelihood function for inference about $\theta$.
b) Find the maximum likelihood estimator of $\theta$ (the function of $x$ that maximizes the likelihood function, $\hat{\theta}_{MLE}(x)$). (Operationally, you will probably find it easiest to maximize the logarithm of the likelihood function.)
c) Find the SEL risk function $R(\theta)$ for the maximum likelihood estimator of $\theta$.
d) A second possible estimator of $\theta$ is $.5\,\hat{\theta}_{MLE}(x)$. Find the risk function of this second estimator of $\theta$ and compare it to your answer to c). Is one of the two estimators always (for all $\theta$) better than the other? If not, for which values of (the unknown) $\theta$ is this second estimator preferable?

5. Suppose that $x \sim \text{Bin}(5, p)$.
a) Plot on the same set of axes the 6 different possible log-likelihood functions

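$$l_x(p) = x \ln p + (5 - x) \ln(1 - p)$$

(the $\binom{5}{x}$ is inconsequential).
b) In light of the plot in a) for the $x = 0$ and $x = 5$ cases and some calculus, what is the maximum likelihood estimator of $p$, $\hat{p}_{MLE}(x)$? What is the risk function of this estimator?
c) Two more possible estimators of $p$ are $\hat{p}_2(x)$ and $\hat{p}_3(x)$ [their defining formulas were lost in transcription]. Find the SEL risk functions for these estimators and plot them together with the risk function from b) on the same set of axes. Is any one of these uniformly/always smaller than the others?

6. Consider the two binomial pmf's $f(x \mid 0)$ and $f(x \mid 1)$ defined on $\{0, 1, 2, \ldots, 5\}$ [the success probabilities in the two pmf's were lost in transcription] and a decision function $a(x)$ for deciding about $\theta \in \{0, 1\}$, an indicator comparing $x$ to a cutoff transcribed as ".5" [the inequality sign, and possibly a leading digit, were lost]. Find the 0-1 loss risk function for the decision function. (You must find the two values $R(0)$ and $R(1)$.)

Section 1.3 of the Course Outline

7. For $x \sim N(\theta, 1)$ and a $N(0, 10)$ prior distribution for $\theta$ (second parameter as transcribed; an exponent may have been lost), what is the SEL Bayes optimal estimator of $\theta$? Hint: What is the form of the conditional distribution of $\theta$ given $x$?

8. For $x \sim \text{Bin}(5, p)$ and a Uniform$(0, 1)$ prior distribution for $p$, what is the SEL Bayes optimal estimator of $p$? Hint: What is the form of the conditional distribution of $p$ given $x$?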
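The SEL risk functions asked for in Problem 5 are finite sums over the six possible values of $x$, so they can be computed exactly. The following is a minimal R sketch of that computation; the helper `sel_risk` and the second estimator `(x + 1) / 7` are illustrative assumptions (the estimators actually named in part c) did not survive transcription).

```r
# Exact SEL risk of an estimator of p when x ~ Bin(n, p):
# R(p) = sum over x of (estimator(x) - p)^2 * P[X = x].
sel_risk <- function(estimator, p, n = 5) {
  x <- 0:n
  sum((estimator(x) - p)^2 * dbinom(x, size = n, prob = p))
}

mle <- function(x) x / 5           # the sample fraction
shrunk <- function(x) (x + 1) / 7  # hypothetical alternative, for illustration only

p_grid <- seq(0, 1, by = 0.01)
risk_mle <- sapply(p_grid, function(p) sel_risk(mle, p))
risk_shrunk <- sapply(p_grid, function(p) sel_risk(shrunk, p))

# Plot both risk functions on one set of axes.
plot(p_grid, risk_mle, type = "l", xlab = "p", ylab = "SEL risk")
lines(p_grid, risk_shrunk, lty = 2)
legend("top", legend = c("x/5", "(x+1)/7"), lty = c(1, 2))
```

Any other estimator can be compared by passing a different function of `x` to `sel_risk`; crossing curves indicate that neither estimator is uniformly better.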

Assignment 2 Due 9/19/14

Section 2.1 of the Course Outline

9. In class we derived the distribution of $\bar{U} = \frac{1}{2}(U_1 + U_2)$ for $U_1$ and $U_2$ iid with pmf $f(u)$ on $u = 0, 1, 2$ [the table of probabilities was lost in transcription].
a) Find the distribution of $\bar{U} = \frac{1}{4}(U_1 + U_2 + U_3 + U_4)$ for iid random variables $U_1, U_2, U_3, U_4$ with this marginal distribution.
b) Compare the mean and variance for $U_1$ to the means and variances for both the $n = 2$ and $n = 4$ versions of $\bar{U}$.

10. Suppose that $U_1$ and $U_2$ are iid discrete random variables, each uniformly distributed on the set $\{0, 1, 2, 3, 4, 5, 6\}$. Find the pmf of the random variable $V$, a function of $U_1$ and $U_2$ [the operator combining them was lost in transcription]. (Determine the set of possible values and corresponding probabilities and put them into a table specifying this pmf.)

11. Suppose that $(U, V)$ is uniformly distributed on the unit square. That is, suppose that the pair has joint pdf

$$f(u, v) = I[0 < u < 1 \text{ and } 0 < v < 1]$$

Consider the random variable $T = U + V$.
a) Find the cdf of $T$. (You can find different "formulas" for $F(t)$ depending upon how $t$ compares to the values 0, 1, and 2. Once you identify the set of $(u, v)$ with total less than or equal to $t$, simple geometry can be used to find the value of $F(t)$.)
b) Use a) and find the pdf of $T$.

12. Suppose that $U$ and $V$ are iid Exp(1), i.e. have joint pdf

$$f(u, v) = I[u > 0 \text{ and } v > 0] \exp(-(u + v))$$

Find the cdf and pdf for the product of $U$ and $V$. (The first will require computation of a double integral over an appropriate $(u, v)$ region.)

13. Find a function $h$ so that for $U \sim U(0, 1)$ the random variable $h(U)$ has pdf $f(v)$ [the formula, a multiple of a power of $v$ times an indicator, did not survive transcription intact].

14. Consider the discrete distribution of Problem 9. Find a function $h$ so that for $V \sim U(0, 1)$ the random variable $h(V)$ has that simple distribution. (There are many solutions here. $h$ should have only 3 possible values, and it suffices to break its domain into 3 intervals corresponding to those possible values.)

15. Find the moment generating function of the $U(0, \cdot)$ distribution [the upper endpoint was lost in transcription].

16. Argue using moment generating functions that the sum of 5 iid Exp(1) random variables has a gamma distribution.

17. Suppose that the entries of $x = (x_1, x_2, \ldots, x_n)$ are iid $N(\mu_x, \sigma^2)$, the entries of $y = (y_1, y_2, \ldots, y_m)$ are iid $N(\mu_y, \sigma^2)$, and $x$ and $y$ are independent. Let $\bar{x}$ and $s_x^2$ be the sample mean and variance of the $x_i$'s, and let $\bar{y}$ and $s_y^2$ be the sample mean and variance of the $y_i$'s. Define

$$s_{pooled}^2 = \frac{(n - 1) s_x^2 + (m - 1) s_y^2}{(n - 1) + (m - 1)}$$

a) Argue carefully that there is a multiple of $s_{pooled}^2$ that has a $\chi^2$ distribution. (What is the multiplier and what are the degrees of freedom?)

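b) Identify a function of the random variables $\bar{x}$, $\bar{y}$, and $s_{pooled}$ and the difference $\mu_x - \mu_y$ that has a $t$ distribution. What are appropriate degrees of freedom?
c) Suppose that $\bar{x} = 7$, $s_x = 1$, $n = 10$, $\bar{y} = 5$, $s_y = .8$, and $m = 1$ (values as transcribed; some digits are evidently missing, since $m = 1$ would leave $s_y$ undefined). Use your answer to b) and find 90% two-sided confidence limits for $\mu_x - \mu_y$.

Assignment 3 Due 9/26/14

18. In an Example of the typed outline [example number lost in transcription] the claim is made that the sum of two independent Poisson random variables is Poisson. Prove this.

Section 2.2 of the Course Outline

19. Use the fact that for large $\lambda$ a Poisson distribution with mean $\lambda$ is approximately normal and find limits based on a Poisson variable $X$ that (at least for large $\lambda$) function as approximate 90% confidence limits for $\lambda$. What interval is produced if a Poisson observation is $X = 500$?

20. Suppose that $x_1, x_2, x_3, \ldots$ are iid $U(0, \theta)$. Consider the random variable

$$m_n = \max(x_1, x_2, \ldots, x_n)$$

the largest of the first $n$ of these observations. Notice that $m_n \le t$ if and only if all of the first $n$ observations are less than or equal to $t$. (So, for $0 < t < \theta$, $P[m_n \le t] = (P[x_1 \le t])^n$.) Make from $m_n$ the random variable $y_n$ [its defining formula was lost in transcription].
a) Find for $y > 0$ the limit as $n \to \infty$ of a probability involving $y_n$ and $y$ [expression lost in transcription]. What is the approximate distribution of $y_n$?
b) What is an approximate distribution for a rescaled version of $y_n$ [the divisor was lost in transcription]? (What is the limit of the corresponding probability for positive $t$?)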

c) Use your answer to b) and find a large $n$ approximately 95% upper confidence bound for $\theta$ that is a multiple of $m_n$.

21. As in the previous problem, suppose that $x_1, x_2, x_3, \ldots$ are iid $U(0, \theta)$. For $\bar{x}_n$ the sample mean of the first $n$ of these variables, find (using the CLT and a delta method argument) an approximate distribution for $\bar{x}_n^2$ (the exponent is inferred; the transcription shows only the sample mean).

Section 2.3 of the Course Outline

22. Use R simulations to
a) Find approximate answers for all of Problem 11.
b) Find approximate answers for all of Problem 12.
(In both cases use at least 10,000 simulated values of the variables in question.)

Assignment 4 Due 10/17/14

23. Below is some BUGS code that can be used to do the simulations in Problem 22a). You can read about BUGS syntax in the user manual available through the help menu in either WinBUGS or OpenBUGS. Run this code in either WinBUGS or OpenBUGS with at least 100,000 iterations and get estimated densities for $U$, $V$, and $T$ and estimated values for the cdf of $T$ at the values $t = 0, .2, .4, \ldots, 1.8, 2.0$.

    model {
    U~dunif(0,1)
    V~dunif(0,1)
    T<-U+V
    I.2<-step(.2-T)
    I.4<-step(.4-T)
    I.6<-step(.6-T)
    I.8<-step(.8-T)
    I1.0<-step(1.0-T)
    I1.2<-step(1.2-T)
    I1.4<-step(1.4-T)
    I1.6<-step(1.6-T)
    I1.8<-step(1.8-T)
    I2.0<-step(2.0-T)
    }
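The simulation in Problem 22a) mirrors the BUGS model above: draw $U$ and $V$, form $T = U + V$, and average indicators of $T \le t$. The following is a minimal R sketch under that reading; the seed and the variable names (`T_sum`, `cdf_hat`) are arbitrary choices, not part of the assignment.

```r
set.seed(342)        # arbitrary seed so repeat runs agree
n_sim <- 10000       # the problem asks for at least 10,000 simulated values

U <- runif(n_sim)
V <- runif(n_sim)
T_sum <- U + V

# Estimated cdf of T on the grid used in the BUGS code (.2, .4, ..., 2.0):
# the mean of the indicator I[T <= t] plays the role of step(t - T).
t_grid <- seq(0.2, 2.0, by = 0.2)
cdf_hat <- sapply(t_grid, function(t) mean(T_sum <= t))
print(round(data.frame(t = t_grid, cdf_hat = cdf_hat), 3))

# Estimated density of T.
hist(T_sum, breaks = 40, freq = FALSE, main = "Simulated density of T = U + V")
```

Replacing `runif` with `rexp` and the sum with a product gives the corresponding sketch for part b).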

24. Modify the code in Problem 23 and get estimated densities for $U$, $V$, and $T$ for Problem 22b) using either WinBUGS or OpenBUGS with at least 100,000 iterations.

25. Suppose that $x \sim \text{Bin}(5, p)$ and a priori $p \sim U(0, 1)$.
a) What is the posterior distribution of $p \mid x$?
b) Use WinBUGS or OpenBUGS with at least 100,000 iterations to approximate the posterior density and the posterior mean of $p$ for the observation $x = 2$. Also find a corresponding 95% credible interval for $p$. Here is some relevant BUGS code:

    model {
    X~dbin(p,5)
    p~dunif(0,1)
    }
    list(X=2)

Section 2.4 of the Course Outline

26. Below is some R code for implementing a non-parametric bootstrap for studying the distribution of $T_n$ = the sample median of $n$ observations (supposing that one is making iid draws from some distribution). Apply this code to the sample of $n = 20$ numbers

0.18, 0.15, 0.14, 0.44, 2.89, 1.23, 0.54, 0.96, 0.15, 1.39, 0.76, 1.24, 4.42, 1.05, 1.04, 1.88, 0.65, 0.34, 0.59, 2.36

(that actually comprise a rounded sample of size $n = 20$ generated from an Exp(1) distribution). Use that code to estimate $E\,T_{20}$, $\text{Var}\,T_{20}$, and the first and 9th deciles of the distribution of $T_{20}$.

    #Set the seed on the random number generator to get the same results for repeat runs
    set.seed(0)
    #Create some data input "data"
    n<-20
    observed<-c(round(rexp(n),digits=2))
    observed
    #Ready a matrix with B rows and n columns for B bootstrap samples of size n
    #(the value of B was lost in transcription; 10000 is a placeholder)
    B<-10000
    Boot<-matrix(c(rep(0,B*n)),nrow=B,byrow=TRUE)
    #Create the matrix of bootstrapped samples
    for (i in 1:B) {
    Boot[i,]<-sample(observed,replace=TRUE,n)
    }
    #Ready a vector for bootstrapped values of T
    Tstar<-c(rep(0,B))
    #Make a vector of bootstrapped values of T
    for (i in 1:B) {
    Tstar[i]<-median(Boot[i,])
    }
    #Get some summaries of the bootstrap distribution of Tstar
    summary(Tstar)
    hist(Tstar)
    quantile(Tstar,probs=seq(0,1,.05))

27. The mean of the observations listed in Problem 26 is $\bar{x} = 1.12$. If one assumes that the values in Problem 26 are a rounded sample from some exponential distribution, the exponential distribution with mean 1.12 is a plausible choice. (If one acknowledges the rounding, 1.12 is not necessarily quite the MLE of the exponential mean, but we'll ignore this fine point.) Thus a parametric bootstrap replaces sampling with replacement from the observed values with sampling from a rounded version of the exponential distribution with mean 1.12 (and thus "rate" 1/1.12). In the code above, one can simply replace

    sample(observed,replace=TRUE,n)

with

    round(rexp(n,rate=1/1.12),2)

a) Redo Problem 26 using this parametric bootstrap.
b) Replace 1.12 above with 1.00 and thereby use simulation to find the true values of the characteristics of the distribution of the sample median of 20 rounded exponential variables that are being estimated in Problem 26 and above in part a).

Section 3.1 of the Course Outline

28. Suppose that $x_1, x_2, \ldots, x_n$ are iid Beta$(\alpha, \beta)$. Identify a two-dimensional sufficient statistic for the parameter vector $(\alpha, \beta)$.
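The parametric bootstrap described in Problem 27 can also be written without explicit loops. The following is a minimal R sketch, assuming `B = 10000` bootstrap samples (the original value of `B` is not recoverable) and taking the exponential mean as a parameter `mean_hat`, set here to an assumed value of 1.12 for the sample mean quoted in the problem.

```r
set.seed(0)
n <- 20
B <- 10000        # assumed number of bootstrap samples
mean_hat <- 1.12  # assumed value of the sample mean used as the exponential mean

# Each row is one parametric bootstrap sample of n rounded exponential draws.
Boot <- matrix(round(rexp(B * n, rate = 1 / mean_hat), 2), nrow = B)

# Bootstrapped values of the sample median T.
Tstar <- apply(Boot, 1, median)

# Estimates of E T, Var T, and the first and 9th deciles of T.
print(c(mean = mean(Tstar), var = var(Tstar)))
print(quantile(Tstar, probs = c(0.1, 0.9)))
```

Setting `mean_hat <- 1.00` turns the same code into the "truth-finding" simulation of part b).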

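29. Suppose that $x_1, x_2, \ldots, x_n$ are iid $N(\mu, \sigma^2)$.
a) Argue carefully that the statistic $T(x) = \left( \sum_{i=1}^n x_i, \sum_{i=1}^n x_i^2 \right)$ is sufficient for the parameter vector $(\mu, \sigma^2)$.
b) Then show that $S(x) = (\bar{x}, s^2)$ is a 1-1 onto function of $T$ (is equivalent to $T$) and is thus also sufficient.

30. For $x_1, x_2, \ldots, x_n$ iid Poisson$(\lambda)$ argue carefully that $T(x) = \sum_{i=1}^n x_i$ is minimal sufficient for $\lambda$. (Showing sufficiency is easy. Showing minimal sufficiency is somewhat harder. You can argue that if $T(x) \ne T(x')$ then the likelihood ratios $f(x \mid \lambda)/f(x \mid 1)$ and $f(x' \mid \lambda)/f(x' \mid 1)$ differ as functions of $\lambda$. That means that you have to show that for some particular $\lambda$ the values of the likelihood ratios differ.)

31. Consider the statistical model for $x$ with $\Theta = \{1, 2, 3\}$ and pmfs $f(x \mid \theta)$ given in the table below. Identify a minimal sufficient statistic for this model, say $T(x)$. (Give values of $T(x)$ for each possible value of $x$.)

[Table of $f(x \mid \theta)$ values; the entries were lost in transcription.]

Section 3.2 of the Course Outline

32. Find the Fisher information in a single Bernoulli$(p)$ random variable about the parameter $p$ in two different ways. First use the definition of Fisher information, then use the relevant Proposition of the typed outline [its number was lost in transcription].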

33. Find the Fisher information in a single Exponential variable with mean $\theta$ about the parameter $\theta$. First use the definition of Fisher information, then use the relevant Proposition of the typed outline [its number was lost in transcription].

34. Suppose that $x$ and $y$ are independent random variables, $x \sim$ Poisson$(\lambda)$ and $y \sim$ Exponential with mean $\lambda$.
a) What is the Fisher information in the pair $(x, y)$ about $\lambda$?
b) If $T(x, y)$ is unbiased for $\lambda$, what is the minimum possible value for $\text{Var}\,T(x, y)$?
For a fixed number $\alpha \in (0, 1)$ let $\hat{\lambda}_\alpha(x, y) = \alpha x + (1 - \alpha) y$.
c) Show that each $\hat{\lambda}_\alpha$ is unbiased for $\lambda$.
d) Plot the SEL risk functions for two particular choices of $\hat{\lambda}_\alpha$ [the two values of $\alpha$ were garbled in transcription] over the range $0 < \lambda < 4$. On the same set of axes plot the Cramér-Rao lower bound for the variance of an unbiased estimator of $\lambda$ (this is a function of $\lambda$). For which values of $\lambda$ do the two linear combinations of $x$ and $y$ achieve this lower bound?

Assignment 5 (Not to be Collected, but to be Covered on Exam 2)

35. Argue carefully that for $x_1, x_2, \ldots, x_n$ iid Bernoulli$(p)$ variables, $\hat{p} = \bar{x}$ is unbiased for $p$. Then show that the variance of this estimator achieves the Cramér-Rao lower bound for the variance of an unbiased estimator of $p$ for all values of $p$.

36. Argue carefully that for $x_1, x_2, \ldots, x_n$ iid Exponential variables with mean $\theta$, $\bar{x}$ is unbiased for $\theta$. Then show that the variance of this estimator achieves the Cramér-Rao lower bound for the variance of an unbiased estimator of $\theta$ for all values of $\theta$.

Section 3.3 of the Course Outline

37. Consider the context of Problem 35.
a) Argue that $\hat{p} = \bar{x}$ is the maximum likelihood estimator of $p$.
b) Argue two ways that for any $\epsilon > 0$

$$P_{p_0}\left[\, |\hat{p} - p_0| > \epsilon \,\right] \to 0 \text{ as } n \to \infty$$

First use Proposition 27 and then use Theorem 13 from the typed outline.
c) Argue two ways that for large $n$, valid but unusable approximately 95% confidence limits for $p$ are

$$\hat{p} \pm 1.96 \sqrt{\frac{p(1 - p)}{n}}$$

First use Proposition 28 and then use Theorem 14 from the typed outline.
d) Use Corollary 29 and find a valid usable replacement for the limits of part c). (Use the "expected information" fix.)
e) Use Corollary 30 and find a valid usable replacement for the limits of part c). (Use the "observed information" fix.)

38. Suppose that $x_1, x_2, \ldots, x_n$ are iid Poisson$(\lambda)$.
a) Argue that $\hat{\lambda} = \bar{x}$ is the maximum likelihood estimator of $\lambda$.
b) Argue two ways that for any $\epsilon > 0$

$$P_{\lambda_0}\left[\, |\hat{\lambda} - \lambda_0| > \epsilon \,\right] \to 0 \text{ as } n \to \infty$$

First use Proposition 27 and then use Theorem 13 from the typed outline.
c) Argue two ways that for large $n$, valid but unusable approximately 95% confidence limits for $\lambda$ are

$$\hat{\lambda} \pm 1.96 \sqrt{\frac{\lambda}{n}}$$

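First use Proposition 28 and then use Theorem 14 from the typed outline.
d) Use Corollary 29 and find a valid usable replacement for the limits of part c). (Use the "expected information" fix.)
e) Use Corollary 30 and find a valid usable replacement for the limits of part c). (Use the "observed information" fix.)

39. Suppose that $x_1, x_2, \ldots, x_n$ are iid with marginal pmf $f(x \mid p)$ specified below.

[Table of $f(x \mid p)$ for the three possible values of $x$ (0, 2, and 3); the entries, which are expressions in $p$ and $1 - p$, were garbled in transcription.]

Let
$n_0$ = the number of $x_i$ taking the value 0,
$n_2$ = the number of $x_i$ taking the value 2,
$n_3$ = the number of $x_i$ taking the value 3.

a) Give a formula for the log-likelihood function based on the $x_i$, $L(p)$.
b) A sample of size $n = 100$ produces $n_0 = 64$, $n_2 = 29$, and $n_3 = 7$ and a log-likelihood that is plotted below. Further, some numerical analysis can be done to find the maximizing value of $p$ and the values of the first and second derivatives of $L$ there [the numerical values were lost in transcription]. Use Corollary 30 (the "observed information" fix) and give approximate 95% confidence limits for $p$ based on this sample.

[Plot of the log-likelihood $L(p)$; not recovered in transcription.]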

Section 3.4 of the Course Outline

40. Consider a statistical model where a single discrete observation $x$ has probability mass function $f(x \mid \theta)$ indicated in the table below.

[Table of $f(x \mid \theta)$ values; not recovered in transcription.]

a) Initially consider only the possibilities that $\theta = 4$ and $\theta = 1$. Identify a minimum Bayes risk test of $H_0: \theta = 4$ vs $H_a: \theta = 1$ for a prior distribution with $g(4) = .6$ and $g(1) = .4$. What are the size and power for this test?
b) Now consider testing $H_0: \theta = 4$ vs $H_a: \theta \ne 4$. Find values for the generalized log likelihood ratio statistic

$$\Lambda(x) = \ln\left( \frac{\max_\theta f(x \mid \theta)}{f(x \mid 4)} \right)$$

potentially useful in this simple versus composite testing problem. Consider the test that decides in favor of $H_a: \theta \ne 4$ exactly when $\Lambda(x)$ exceeds a cutoff [garbled in transcription; it appears to involve 3/2]. For which $x$ values does this test reject $H_0$? What is the size of this test?

41. For $x_1, x_2, \ldots, x_n$ iid Exponential variables with mean $\theta$, consider testing $H_0: \theta = 1$ vs $H_a: \theta \ne 1$.
a) Find the generalized log likelihood ratio statistic

$$\Lambda(x) = \ln\left( \frac{\max_\theta f(x \mid \theta)}{f(x \mid 1)} \right)$$

and note that it is a function of the sample mean. Argue that as a function of $\bar{x}$ it is decreasing to the left of $\bar{x} = 1$ and increasing to the right of $\bar{x} = 1$.
b) On the basis of part a), for $n = 100$ give two equations in $\bar{x}$ that can be solved numerically to produce $d^L_{100} < 1$ and $d^U_{100} > 1$ so that a test rejecting $H_0$ if $\bar{x} < d^L_{100}$ or $\bar{x} > d^U_{100}$ is a likelihood ratio test of size approximately .05.
c) Show how you could use the Central Limit Theorem to evaluate the (Type II) error probability for this test for one particular alternative value of $\theta$ [the value was lost in transcription]. (Give a formula for this in terms of $d^L$ and $d^U$.)
d) Suppose that $n = 100$ and $\bar{x}$ takes a given value [lost in transcription]. Plot the log-likelihood and show graphically how you can make an approximately 95% confidence interval for $\theta$. What is this interval?
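The graphical interval in Problem 41d) keeps every $\theta$ whose log-likelihood is within half the upper 5% point of the $\chi^2_1$ distribution of the maximum. The following is a minimal R sketch of that idea for the exponential model; the sample mean `xbar <- 1.3` is a hypothetical value (the one in the problem was lost), and `loglik` and `cutoff` are illustrative names.

```r
n <- 100
xbar <- 1.3  # hypothetical sample mean, for illustration only

# Log-likelihood for iid Exponential data with mean theta (depends on data via xbar).
loglik <- function(theta) -n * log(theta) - n * xbar / theta

# The MLE is xbar; the interval keeps theta with loglik(theta) >= cutoff.
cutoff <- loglik(xbar) - 0.5 * qchisq(0.95, df = 1)

# Solve loglik(theta) = cutoff on each side of the MLE.
lower <- uniroot(function(t) loglik(t) - cutoff, c(xbar / 3, xbar), tol = 1e-9)$root
upper <- uniroot(function(t) loglik(t) - cutoff, c(xbar, 3 * xbar), tol = 1e-9)$root
ci <- c(lower, upper)
print(ci)

# Graphical version: plot the log-likelihood with the cutoff line.
theta_grid <- seq(0.9, 1.9, by = 0.005)
plot(theta_grid, loglik(theta_grid), type = "l", xlab = "theta", ylab = "log-likelihood")
abline(h = cutoff, lty = 2)
abline(v = ci, lty = 3)
```

The interval is not symmetric about `xbar`, which is one way it differs from limits of the form estimate plus or minus 1.96 standard errors.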

42. Return to Problem 39 and the plot of the log-likelihood provided for a particular sample of size 100. Read off from that plot an approximate 95% confidence interval for $p$ as the set of all $p$'s with log-likelihood within .5 times the upper 5% point of the $\chi^2_1$ distribution of the maximum log-likelihood.

Assignment 6 Due 11/14/14

Sections 4.1 and 4.2 of the Course Outline

43. Vardeman will send you an R file providing code that will allow you to study simple 2-class classification problems with $x \in (0, 1)^2$. As provided, the code is set up to study a case where $f(x \mid 0) = I[x \in (0,1)^2]$, $g(0) = .5$, $f(x \mid 1)$ is a function of $x_1$ and $x_2$ times $I[x \in (0,1)^2]$ [its formula was garbled in transcription], and $g(1) = .5$. You can make simple modifications to change all of these, but for this problem, begin with this set-up. (You should thoroughly understand the code and be able to change parameters it uses and modify it to handle other problems.)
a) Find the form of the Bayes optimal classifier in this problem and use 2-variable calculus or geometry to find the error rate for this classifier. (Note that NO data-based approximation to the Bayes optimal classifier can have a real error rate lower than this value.)
b) Generate (using the code) a training set of size $N = 100$ for this problem. Make a plot of the regions in $(0,1)^2$ where the 5-nearest neighbor classifier classifies to 0 and to 1. How does this classifier compare to the optimal classifier? What is the $K = 5$ fold cross-validation error rate for this classifier? Now make a similar plot and evaluate the corresponding cross-validation error rate for the 9-nearest neighbor classifier.
c) Redo part b) for a training set of size $N = 400$.
d) For the size $N = 400$ training set, find the $k$ for a nearest neighbor classifier with the best $K = 10$ fold cross-validation error rate.
e) Redo part d) for the bootstrap estimate of error rate.
f) Common practice in predictive analytics is to look for the least complex classifier with predicted error rate "fairly close" to the minimum. What does this heuristic suggest is a good $k$ in the present problem with $N = 400$?
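Choosing $k$ by $K$-fold cross-validation, as in Problem 43d), can be sketched in R as follows. This is not the code Vardeman distributed: it uses `knn()` from the `class` package, and the class-1 density proportional to $x_1 + x_2$ is an assumption made only so the sketch has data to work with (the density in the problem was garbled). The helper names `draw_x` and `cv_error` are hypothetical.

```r
library(class)  # provides knn(); a recommended package bundled with most R installs

set.seed(1)
N <- 400
y <- rbinom(N, 1, 0.5)  # class labels with g(0) = g(1) = .5

# Draw x on the unit square: uniform for class 0; for class 1, rejection
# sampling from an ASSUMED density proportional to x1 + x2.
draw_x <- function(label) {
  repeat {
    x <- runif(2)
    if (label == 0 || runif(1) < sum(x) / 2) return(x)
  }
}
X <- t(sapply(y, draw_x))

# K-fold cross-validation error rate of the k-nearest neighbor classifier.
cv_error <- function(k, K = 10) {
  fold <- sample(rep(1:K, length.out = N))
  wrong <- 0
  for (j in 1:K) {
    test <- fold == j
    pred <- knn(X[!test, , drop = FALSE], X[test, , drop = FALSE],
                cl = factor(y[!test]), k = k)
    wrong <- wrong + sum(as.character(pred) != as.character(y[test]))
  }
  wrong / N
}

ks <- c(1, 5, 9, 15, 25)
errs <- sapply(ks, cv_error)
print(data.frame(k = ks, cv_error = errs))
```

Because the folds are redrawn for each `k`, small differences between error rates are noise; the "least complex classifier close to the minimum" heuristic of part f) favors the largest `k` whose error is near the smallest one.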

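44. Redo Problem 43 using a different $f(x \mid 1)$, again a function of $x_1$ and $x_2$ times $I[x \in (0,1)^2]$ [its formula was garbled in transcription].

Assignment 7 Due 11/21/14

45. Return to the situation of Problem 43 and generate training samples of sizes 400, 4000, and a third, larger size [lost in transcription].
a) For each training set size, use cross-validation to identify a good $k$ for use in a $k$-nearest neighbor classifier. Compare the resulting classifiers to the Bayes/optimal classifier in terms of appearance of a plot of $\hat{y}(x_1, x_2)$ (white for 0 and black for 1) at the points on the grid used in the first set of classification code Vardeman distributed.
b) Create a test set of 100,000 pairs $(y, x)$ (separate from those used to make the classifiers). Compute test error rates based on this set for the 3 classifiers of a). How do these compare to the theoretically optimal error rate you computed in 43a)?

46. (Extra Credit, not required but worth an extra "point" if done completely/well.) In Section 4.1 of the typed outline, the suggestion is made that one might try estimating $f(x \mid 0)$ and $f(x \mid 1)$ and with $\hat{g}(0)$ the fraction of $y = 0$ cases in the training set, consider the classifier

$$\hat{y}(x) = I\left[\, \hat{g}(0) \hat{f}(x \mid 0) < (1 - \hat{g}(0)) \hat{f}(x \mid 1) \,\right] \quad (*)$$

In the context of Problem 43 with bivariate continuous $x = (x_1, x_2)$, a way to estimate densities is, for a "bandwidth parameter" $\lambda$, to use kernel averages: $\hat{f}(x_1, x_2 \mid 0)$ is $1/N_0$ times a sum over $i$ such that $y_i = 0$ of terms $\exp(\cdot)$ that depend on the squared differences $(x_1 - x_{1i})^2$ and $(x_2 - x_{2i})^2$ and on $\lambda$, and $\hat{f}(x_1, x_2 \mid 1)$ is the corresponding $1/N_1$ times a sum over $i$ such that $y_i = 1$ [the exact constants in these formulas were lost in transcription],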

where $N_0$ and $N_1$ are the counts of $y = 0$ and $y = 1$ cases in the training set. For the $N = 400$ training set of Problem 43, treat $\lambda$ as a complexity parameter and compare several cases of $\lambda$ in terms of the training error rates for the classifier (*) and appearance of a plot of $\hat{y}(x_1, x_2)$ (white for 0 and black for 1) at the points on the grid used in the first set of classification code Vardeman distributed. (I'm guessing that a reasonable place to start looking for a good bandwidth parameter is around .05.)

Sections 4.3 and 4.4 of the Course Outline

47. Return to the situation of Problem 43. (Unless specifically indicated to the contrary, use the $N = 400$ training set size below.)
a) Fit classification trees using the tree() function with both default parameter settings and those provided in Vardeman's code. Which "full" tree is simpler? Using the more complex full tree, use cross-validation with the cost-complexity pruning idea to find a good sub-tree of the full tree. What is the cross-validation error rate for the chosen complexity/weight/number-of-final-nodes?
b) Run the randomForest code provided by Vardeman. Does the random forest predictor seem to behave more sensibly if it is built on a much larger training set than 400? (Also run the code for $N = 4000$.) Compare error rates and plots of how the classifiers split up $(0,1)^2$ into classification regions.
c) Fit and compare (based on training error rates and plots of how the classifiers break up $(0,1)^2$ into classification regions) good logistic regression-based classifiers produced using glmnet() with $\alpha = 0, .5, 1$. (Use lambda.1se.)
d) How would you modify the classifiers you have identified in c) if you were given the information that the prevalence of $y = 0$ cases in the universe is not as represented in the training set but is actually much more like $g(0) = .01$? (If the training set had been made up to be representative of the universe, one would have expected only about 4 instances of $y = 0$.) (See the outline discussion of case-control studies.) Show how these classifiers split up $(0,1)^2$ into classification regions as compared to the ones from which they are derived.

e) Choosing between a number of possible cost parameters on the basis of cross-validation, find a good support vector classifier for the problem. What is its training error rate? Make a plot of how it breaks up $(0,1)^2$ into classification regions.
f) Combine the three classifiers you identify in c) in different ways. First, with equal weights average fitted probabilities that $y = 0$ and use a classifier built on the average probability. Then simply use equal weight majority voting between the classifiers to define a new one. (These are the two ideas of the "Ensemble" video.) Compare these two classifiers to each other and to the three classifiers from which they were made in terms of a test set error rate like that made in Problem 45b).

48. Redo Problem 47 using the $f(x \mid 1)$ of Problem 44 [the formula was garbled in transcription]. In a) look for a choice of parameters that gives a large initial tree.

Assignment 8 Not to be Collected but Covered on Exam 3 12/5/14

Section 5.1 of the Course Outline

49. (Simple linear regression in terms of the normal linear model) Below is a small set of fake $(x, y)$ data.

[Table of $x$ and $y$ values; not recovered in transcription.]

Consider the (SLR) normal linear model

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$

where the $\epsilon_i$ are independent Normal$(0, \sigma^2)$ random variables. This is written in matrix notation as $Y = X\beta + \epsilon$ for

the column vector $Y$ of the $y_i$, the matrix $X$ with a column of 1's and a column of the $x_i$, $\beta = (\beta_0, \beta_1)'$, and the column vector $\epsilon$ of the $\epsilon_i$ [the displayed vectors and matrix were garbled in transcription].

Using matrix calculations (either "by hand" or aided by the matrix calculation facility of R) do all of the following.
a) Find $\hat{\beta}_{OLS} = \hat{\beta}_{MLE}$. Based on this fit to the "training data," what is the function $\hat{y}_{OLS}(x)$? What is $\hat{y}_{OLS}(1.5)$?
b) Find SSE and the value of the MLE for $\sigma^2$.
c) Make 95% two-sided confidence limits for $\sigma$ in the normal linear model.
d) Give a 90% two-sided confidence interval for the increase in average $y$ that accompanies a unit increase in $x$. (See again the "non-matrix" form of the model.)
e) Give a 90% two-sided confidence interval for the average value of $y$ when $x = 1.5$.
f) Give a 90% two-sided prediction interval for the next value of $y$ when $x = 1.5$.
g) Give a 90% two-sided prediction interval for the sample mean of the next 5 values of $y$ when $x = 1.5$.
h) Give a 90% two-sided confidence interval for the difference in mean $y$ at $x = .5$ and mean $y$ at $x = 1.5$ (the first value is as transcribed; a leading digit may be missing).
i) Give a 90% two-sided prediction interval for the difference between a new $y$ at $x = .5$ and a different new $y$ at a second value of $x$ [lost in transcription].

50. (The one-way normal model in terms of the general normal linear model) In an ISU engineering research project, so-called "tilt table tests" were done in order to determine the angles at which certain vehicles experience lift-off of the "high side" set of wheels and begin to roll over on their sides. So-called "tilt table ratios" (which are the tangents of angles at which lift-off occurred) were measured for 4 different vans with the following results.

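[Table with columns Van #1, Van #2, Van #3, Van #4. Surviving values, in transcription order: .96, .970, .967, 1.010, 1.04, 1.01, 1.093, 1.090, 1.001, 1.00. Several values and digits are missing, so these cannot be assigned reliably to vans.]

(Notice that Van #3 was tested 5 times while the others were tested 4 times each.) Vans #1 and #2 were minivans and Vans #3 and #4 were full size vans.

We'll consider analysis of these data using a "one-way normal model" for $y_{ij}$ = the $j$th tilt table ratio for van $i$, of the form

$$y_{ij} = \mu_i + \epsilon_{ij}$$

for $\epsilon_{ij}$ iid $N(0, \sigma^2)$. With $Y$ the column vector of the 17 observations $y_{11}, \ldots, y_{44}$, $X$ the corresponding matrix of 0's and 1's indicating which van produced each observation, $\beta = (\mu_1, \mu_2, \mu_3, \mu_4)'$, and $\epsilon$ the column vector of the $\epsilon_{ij}$ [the displayed vectors and matrix were garbled in transcription], this can be written in matrix notation as $Y = X\beta + \epsilon$.

Using matrix calculations (either "by hand" or aided by the matrix calculation facility of R) do all of the following.
a) Find $\hat{\beta}_{OLS} = \hat{\beta}_{MLE}$.
b) Find SSE and the value of the MLE for $\sigma^2$.
c) Make 95% two-sided confidence limits for $\sigma$ in the normal linear model.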

d) Give 95% two-sided confidence limits for $\mu_1$ (the mean tilt table ratio for Van #1).
e) Give 95% two-sided confidence limits for $\mu_3$ (the mean tilt table ratio for Van #3).
f) Give 95% two-sided prediction limits for the next tilt table ratio for Van #3.
g) Give 95% two-sided confidence limits for $\mu_1 - \mu_3$ (the difference in mean tilt table ratios for Van #1 and Van #3).
h) It might be of interest to compare the average of the tilt table ratios for the minivans to that of the full size vans. Accordingly, give a 95% two-sided confidence interval for the quantity $\frac{1}{2}(\mu_1 + \mu_2) - \frac{1}{2}(\mu_3 + \mu_4)$.

Assignment 9 Not to be Collected but Covered on the Final Exam

Section 5 of the Course Outline (Practical SEL Prediction)

51. This question concerns the analysis of a set of home sale price data obtained from the Ames City Assessor's Office. The data are on sales May 2002 through June 2003 of 1½ and 2 story homes built 1945 and before, with (above grade) size of 2500 sq ft or less and lot size 20,000 sq ft or less, located in Low- and Medium-Density Residential zoning areas. (The data are in an Excel spreadsheet on the Stat 342 Web page. These need to be loaded into R for analysis.) 88 different homes fitting this description were sold in Ames during this period (count as transcribed; a digit may be missing). (A few were actually sold twice, but only the second sales prices of these were included in our data set.) For each home, the value of the response variable

Price = recorded sales price of the home

and the values of 14 potential explanatory variables were obtained. These variables are

Size: the floor area of the home above grade in sq ft,
Land: the area of the lot the home occupies in sq ft,
Bedrooms: a count of the number in the home,
Central Air: a dummy variable that is 1 if the home has central air conditioning and is 0 if it does not,
Fireplace: a count of the number in the home,
Full Bath: a count of the number of full bathrooms above grade,
Half Bath: a count of the number of half bathrooms above grade,
Basement: the floor area of the home's basement (including both finished and unfinished parts) in sq ft,
Finished Bsmt: the area of any finished part of the home's basement in sq ft,
Bsmt Bath: a dummy variable that is 1 if there is a bathroom of any sort (full or half) in the home's basement and is 0 otherwise,
Garage: a dummy variable that is 1 if the home has a garage of any sort and is 0 otherwise,
Multiple Car: a dummy variable that is 1 if the home has a garage that holds more than one vehicle and is 0 otherwise,
Style (2 Story): a dummy variable that is 1 if the home is a 2 story (or a 2½ story) home and is 0 otherwise (i.e. for a 1½ story home), and
Zone (Town Center): a dummy variable that is 1 if the home is in an area zoned as "Urban Core Medium Density" and 0 otherwise.

a) In preparation for analysis, standardize all variables that are not dummy variables (those we'll leave in raw form), making a data frame with 15 columns. Say clearly how one goes from a particular new set of home characteristics to a corresponding set of predictors. Then say clearly how one goes from a prediction for the standardized price to a prediction for the dollar price.
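The two conversions asked about in part a) (new home characteristics to standardized predictors, and standardized price prediction back to dollars) can be sketched in base R. The data frame below is a tiny invented stand-in for the Ames data with only two predictors, and OLS is used only as a placeholder predictor; every value and name in it is hypothetical.

```r
# Hypothetical miniature stand-in for the Ames data (all values invented).
homes <- data.frame(
  Price = c(95000, 120000, 134500, 88000, 150000, 110000),
  Size = c(900, 1200, 1500, 850, 1800, 1100),
  CentralAir = c(0, 1, 1, 0, 1, 0)  # dummy variable, left in raw form
)

# Standardize the non-dummy columns, keeping the training means and SDs.
numeric_cols <- c("Price", "Size")
centers <- sapply(homes[numeric_cols], mean)
scales <- sapply(homes[numeric_cols], sd)
std <- homes
std[numeric_cols] <- as.data.frame(
  scale(homes[numeric_cols], center = centers, scale = scales)
)

# Any predictor can be fit on the standardized data; OLS is used here.
fit <- lm(Price ~ Size + CentralAir, data = std)

# New home: standardize its non-dummy characteristics with the TRAINING
# means and SDs, predict standardized price, then convert back to dollars.
new_home <- data.frame(Size = 1300, CentralAir = 1)
new_std <- new_home
new_std$Size <- unname((new_home$Size - centers["Size"]) / scales["Size"])
z_hat <- unname(predict(fit, newdata = new_std))
dollars <- unname(centers["Price"] + scales["Price"] * z_hat)
print(dollars)
```

The key design point is that the means and standard deviations come from the training set and are reused unchanged for every new home and for the conversion back to dollars.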
b) Find predictors for standardized price of all the following forms:

OLS
Lasso (choose $\lambda$ by cross-validation)
Ridge (choose $\lambda$ by cross-validation)
Elastic Net with $\alpha = .5$ (choose $\lambda$ by cross-validation)
Nearest Neighbor (based on the $k = 4$ predictors that have the largest coefficients in the linear predictors you identify)
Single Regression Tree (choose the tree by cost-complexity pruning of full trees)
Random Forest (use default parameters)
N-W Kernel Smoother (based on the $k = 4$ predictors that have the largest coefficients in the linear predictors you identify) (look for a good bandwidth)
Local Linear Regression Smoother (based on the $k = 4$ predictors that have the largest coefficients in the linear predictors you identify) (look for a good bandwidth)
Neural Network (use one hidden layer with 8 nodes and cross-validation in the fitting)

Find the sets of predictions all these methods produce for the training set. Compare these sets of predictions by making scatterplots for all pairs of prediction types and computing all pairs of correlations between predictions. Which two methods give the least similar predictions? (If these both have good cross-validation errors, they would become good candidates for combining into a single "stacked" predictor.)


Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

Mathematical Statistics - MS

Mathematical Statistics - MS Paper Specific Istructios. The examiatio is of hours duratio. There are a total of 60 questios carryig 00 marks. The etire paper is divided ito three sectios, A, B ad C. All sectios are compulsory. Questios

More information

6. Sufficient, Complete, and Ancillary Statistics

6. Sufficient, Complete, and Ancillary Statistics Sufficiet, Complete ad Acillary Statistics http://www.math.uah.edu/stat/poit/sufficiet.xhtml 1 of 7 7/16/2009 6:13 AM Virtual Laboratories > 7. Poit Estimatio > 1 2 3 4 5 6 6. Sufficiet, Complete, ad Acillary

More information

REGRESSION (Physics 1210 Notes, Partial Modified Appendix A)

REGRESSION (Physics 1210 Notes, Partial Modified Appendix A) REGRESSION (Physics 0 Notes, Partial Modified Appedix A) HOW TO PERFORM A LINEAR REGRESSION Cosider the followig data poits ad their graph (Table I ad Figure ): X Y 0 3 5 3 7 4 9 5 Table : Example Data

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Unbiased Estimation. February 7-12, 2008

Unbiased Estimation. February 7-12, 2008 Ubiased Estimatio February 7-2, 2008 We begi with a sample X = (X,..., X ) of radom variables chose accordig to oe of a family of probabilities P θ where θ is elemet from the parameter space Θ. For radom

More information

Stat 139 Homework 7 Solutions, Fall 2015

Stat 139 Homework 7 Solutions, Fall 2015 Stat 139 Homework 7 Solutios, Fall 2015 Problem 1. I class we leared that the classical simple liear regressio model assumes the followig distributio of resposes: Y i = β 0 + β 1 X i + ɛ i, i = 1,...,,

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.

Let us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f. Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Patter Recogitio Classificatio: No-Parametric Modelig Hamid R. Rabiee Jafar Muhammadi Sprig 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Ageda Parametric Modelig No-Parametric Modelig

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +

More information

Stat 319 Theory of Statistics (2) Exercises

Stat 319 Theory of Statistics (2) Exercises Kig Saud Uiversity College of Sciece Statistics ad Operatios Research Departmet Stat 39 Theory of Statistics () Exercises Refereces:. Itroductio to Mathematical Statistics, Sixth Editio, by R. Hogg, J.

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators. IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits

More information

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator

Economics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters

More information

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight) Tests of Hypotheses Based o a Sigle Sample Devore Chapter Eight MATH-252-01: Probability ad Statistics II Sprig 2018 Cotets 1 Hypothesis Tests illustrated with z-tests 1 1.1 Overview of Hypothesis Testig..........

More information

TAMS24: Notations and Formulas

TAMS24: Notations and Formulas TAMS4: Notatios ad Formulas Basic otatios ad defiitios X: radom variable stokastiska variabel Mea Vätevärde: µ = X = by Xiagfeg Yag kpx k, if X is discrete, xf Xxdx, if X is cotiuous Variace Varias: =

More information

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions We have previously leared: KLMED8004 Medical statistics Part I, autum 00 How kow probability distributios (e.g. biomial distributio, ormal distributio) with kow populatio parameters (mea, variace) ca give

More information

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n. Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

MATH/STAT 352: Lecture 15

MATH/STAT 352: Lecture 15 MATH/STAT 352: Lecture 15 Sectios 5.2 ad 5.3. Large sample CI for a proportio ad small sample CI for a mea. 1 5.2: Cofidece Iterval for a Proportio Estimatig proportio of successes i a biomial experimet

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes.

Direction: This test is worth 250 points. You are required to complete this test within 50 minutes. Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely

More information

Lecture 33: Bootstrap

Lecture 33: Bootstrap Lecture 33: ootstrap Motivatio To evaluate ad compare differet estimators, we eed cosistet estimators of variaces or asymptotic variaces of estimators. This is also importat for hypothesis testig ad cofidece

More information

Chapter 8: Estimating with Confidence

Chapter 8: Estimating with Confidence Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA STATISTICAL THEORY AND METHODS PAPER I

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA STATISTICAL THEORY AND METHODS PAPER I THE ROYAL STATISTICAL SOCIETY 5 EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA STATISTICAL THEORY AND METHODS PAPER I The Society provides these solutios to assist cadidates preparig for the examiatios i future

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

MATHEMATICAL SCIENCES PAPER-II

MATHEMATICAL SCIENCES PAPER-II MATHEMATICAL SCIENCES PAPER-II. Let {x } ad {y } be two sequeces of real umbers. Prove or disprove each of the statemets :. If {x y } coverges, ad if {y } is coverget, the {x } is coverget.. {x + y } coverges

More information

There is no straightforward approach for choosing the warmup period l.

There is no straightforward approach for choosing the warmup period l. B. Maddah INDE 504 Discrete-Evet Simulatio Output Aalysis () Statistical Aalysis for Steady-State Parameters I a otermiatig simulatio, the iterest is i estimatig the log ru steady state measures of performace.

More information

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D.

Sample Size Estimation in the Proportional Hazards Model for K-sample or Regression Settings Scott S. Emerson, M.D., Ph.D. ample ie Estimatio i the Proportioal Haards Model for K-sample or Regressio ettigs cott. Emerso, M.D., Ph.D. ample ie Formula for a Normally Distributed tatistic uppose a statistic is kow to be ormally

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

PRACTICE PROBLEMS FOR THE FINAL

PRACTICE PROBLEMS FOR THE FINAL PRACTICE PROBLEMS FOR THE FINAL Math 36Q Fall 25 Professor Hoh Below is a list of practice questios for the Fial Exam. I would suggest also goig over the practice problems ad exams for Exam ad Exam 2 to

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis Sectio 9.2 Tests About a Populatio Proportio P H A N T O M S Parameters Hypothesis Assess Coditios Name the Test Test Statistic (Calculate) Obtai P value Make a decisio State coclusio Sectio 9.2 Tests

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

HOMEWORK I: PREREQUISITES FROM MATH 727

HOMEWORK I: PREREQUISITES FROM MATH 727 HOMEWORK I: PREREQUISITES FROM MATH 727 Questio. Let X, X 2,... be idepedet expoetial radom variables with mea µ. (a) Show that for Z +, we have EX µ!. (b) Show that almost surely, X + + X (c) Fid the

More information

7.1 Convergence of sequences of random variables

7.1 Convergence of sequences of random variables Chapter 7 Limit Theorems Throughout this sectio we will assume a probability space (, F, P), i which is defied a ifiite sequece of radom variables (X ) ad a radom variable X. The fact that for every ifiite

More information

Lecture 6 Simple alternatives and the Neyman-Pearson lemma

Lecture 6 Simple alternatives and the Neyman-Pearson lemma STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull

More information

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n. ST 305: Exam 3 By hadig i this completed exam, I state that I have either give or received assistace from aother perso durig the exam period. I have used o resources other tha the exam itself ad the basic

More information

Bayesian Methods: Introduction to Multi-parameter Models

Bayesian Methods: Introduction to Multi-parameter Models Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested

More information

Output Analysis and Run-Length Control

Output Analysis and Run-Length Control IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%

More information

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics. None at all!

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics. None at all! ENGI 44 Probability ad Statistics Faculty of Egieerig ad Applied Sciece Problem Set Solutios Descriptive Statistics. If, i the set of values {,, 3, 4, 5, 6, 7 } a error causes the value 5 to be replaced

More information

Sample Size Determination (Two or More Samples)

Sample Size Determination (Two or More Samples) Sample Sie Determiatio (Two or More Samples) STATGRAPHICS Rev. 963 Summary... Data Iput... Aalysis Summary... 5 Power Curve... 5 Calculatios... 6 Summary This procedure determies a suitable sample sie

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Math 10A final exam, December 16, 2016

Math 10A final exam, December 16, 2016 Please put away all books, calculators, cell phoes ad other devices. You may cosult a sigle two-sided sheet of otes. Please write carefully ad clearly, USING WORDS (ot just symbols). Remember that the

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to: STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio

More information

Solutions: Homework 3

Solutions: Homework 3 Solutios: Homework 3 Suppose that the radom variables Y,...,Y satisfy Y i = x i + " i : i =,..., IID where x,...,x R are fixed values ad ",...," Normal(0, )with R + kow. Fid ˆ = MLE( ). IND Solutio: Observe

More information

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS MBACATÓLICA Quatitative Methods Miguel Gouveia Mauel Leite Moteiro Faculdade de Ciêcias Ecoómicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS MBACatólica 006/07 Métodos Quatitativos

More information

Basis for simulation techniques

Basis for simulation techniques Basis for simulatio techiques M. Veeraraghava, March 7, 004 Estimatio is based o a collectio of experimetal outcomes, x, x,, x, where each experimetal outcome is a value of a radom variable. x i. Defiitios

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

IE 230 Seat # Name < KEY > Please read these directions. Closed book and notes. 60 minutes.

IE 230 Seat # Name < KEY > Please read these directions. Closed book and notes. 60 minutes. IE 230 Seat # Name < KEY > Please read these directios. Closed book ad otes. 60 miutes. Covers through the ormal distributio, Sectio 4.7 of Motgomery ad Ruger, fourth editio. Cover page ad four pages of

More information

AP Statistics Review Ch. 8

AP Statistics Review Ch. 8 AP Statistics Review Ch. 8 Name 1. Each figure below displays the samplig distributio of a statistic used to estimate a parameter. The true value of the populatio parameter is marked o each samplig distributio.

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Probability and statistics: basic terms

Probability and statistics: basic terms Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values

Asymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values of the secod half Biostatistics 6 - Statistical Iferece Lecture 6 Fial Exam & Practice Problems for the Fial Hyu Mi Kag Apil 3rd, 3 Hyu Mi Kag Biostatistics 6 - Lecture 6 Apil 3rd, 3 / 3 Rao-Blackwell

More information

MidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday

MidtermII Review. Sta Fall Office Hours Wednesday 12:30-2:30pm Watch linear regression videos before lab on Thursday Aoucemets MidtermII Review Sta 101 - Fall 2016 Duke Uiversity, Departmet of Statistical Sciece Office Hours Wedesday 12:30-2:30pm Watch liear regressio videos before lab o Thursday Dr. Abrahamse Slides

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Summary. Recap ... Last Lecture. Summary. Theorem

Summary. Recap ... Last Lecture. Summary. Theorem Last Lecture Biostatistics 602 - Statistical Iferece Lecture 23 Hyu Mi Kag April 11th, 2013 What is p-value? What is the advatage of p-value compared to hypothesis testig procedure with size α? How ca

More information

Lecture 19: Convergence

Lecture 19: Convergence Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

4. Partial Sums and the Central Limit Theorem

4. Partial Sums and the Central Limit Theorem 1 of 10 7/16/2009 6:05 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 4. Partial Sums ad the Cetral Limit Theorem The cetral limit theorem ad the law of large umbers are the two fudametal theorems

More information

5. Likelihood Ratio Tests

5. Likelihood Ratio Tests 1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,

More information

32 estimating the cumulative distribution function

32 estimating the cumulative distribution function 32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Yig Zhag STA6938-Logistic Regressio Model Topic -Simple (Uivariate) Logistic Regressio Model Outlies:. Itroductio. A Example-Does the liear regressio model always work? 3. Maximum Likelihood Curve

More information

Last Lecture. Wald Test

Last Lecture. Wald Test Last Lecture Biostatistics 602 - Statistical Iferece Lecture 22 Hyu Mi Kag April 9th, 2013 Is the exact distributio of LRT statistic typically easy to obtai? How about its asymptotic distributio? For testig

More information

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS

NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS NANYANG TECHNOLOGICAL UNIVERSITY SYLLABUS FOR ENTRANCE EXAMINATION FOR INTERNATIONAL STUDENTS AO-LEVEL MATHEMATICS STRUCTURE OF EXAMINATION PAPER. There will be oe 2-hour paper cosistig of 4 questios.

More information

Statisticians use the word population to refer the total number of (potential) observations under consideration

Statisticians use the word population to refer the total number of (potential) observations under consideration 6 Samplig Distributios Statisticias use the word populatio to refer the total umber of (potetial) observatios uder cosideratio The populatio is just the set of all possible outcomes i our sample space

More information

Lecture 7: Properties of Random Samples

Lecture 7: Properties of Random Samples Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ

More information

Chapter 2 The Monte Carlo Method

Chapter 2 The Monte Carlo Method Chapter 2 The Mote Carlo Method The Mote Carlo Method stads for a broad class of computatioal algorithms that rely o radom sampligs. It is ofte used i physical ad mathematical problems ad is most useful

More information