1 Review of Probability & Statistics
- Jodie Barton
a. In a group of 2000 people, it has been reported that there are:
61 smokers
670 over 25
960 people who imbibe (drink alcohol)
86 smokers who imbibe
90 imbibers over 25
180 smokers over 25
44 over 25 who both smoke and imbibe
50 people under 25 who neither smoke nor imbibe
Determine whether the above report is consistent.

Solution: Let $A$ be the set of smokers, $B$ be the set of people over 25, and $C$ be the set of people who imbibe. So we have $|A| = 61$, $|B| = 670$, $|C| = 960$, $|A \cap C| = 86$, $|B \cap C| = 90$, $|A \cap B| = 180$, $|A \cap B \cap C| = 44$, and $|\overline{A \cup B \cup C}| = 50$.

On one hand, we have
$$|A \cup B \cup C| = 2000 - |\overline{A \cup B \cup C}|.$$
On the other hand, by inclusion-exclusion, we have
$$|A \cup B \cup C| = |A| + |B| + |C| - (|A \cap B| + |A \cap C| + |B \cap C|) + |A \cap B \cap C|.$$
The two values do not agree, so the above report is NOT consistent.

b. A game begins by choosing between dice A and B in some manner such that the probability that A is selected is $p$. The die thus selected is then tossed until a white face appears, at which time the game is concluded. Die A has 4 red and 2 white faces. Die B has 2 red and 4 white faces. After playing this game a great many times, it is observed that the probability that a game is concluded in exactly 3 tosses of the selected die is 7/81. Determine the value of $p$.

Solution: The probability that die B is selected is $1 - p$. If A is selected, the probability that a toss shows a red face is 4/6 (because die A has 4 red faces), and the probability that it shows a white face is 2/6. Similarly, if die B is selected, the probability of a red face is 2/6 and the probability of a white face is 4/6. If a game is concluded in exactly 3 tosses, then the first and second tosses must show a red face and the third a white face. So we have
$$p \left(\frac{4}{6}\right)^2 \frac{2}{6} + (1 - p) \left(\frac{2}{6}\right)^2 \frac{4}{6} = \frac{7}{81},$$
i.e., $\frac{4}{27} p + \frac{2}{27} (1 - p) = \frac{7}{81}$, hence $p = \frac{1}{6}$.
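The linear equation in part b can be checked with exact rational arithmetic. The sketch below assumes die A has 4 red and 2 white faces and die B has 2 red and 4 white faces (consistent with the 4/6 probabilities used in the solution); the helper name `prob_exactly_three` is illustrative only.

```python
from fractions import Fraction

def prob_exactly_three(p_white: Fraction) -> Fraction:
    """Probability that the first white face appears on toss 3 (red, red, white)."""
    p_red = 1 - p_white
    return p_red * p_red * p_white

q_a = prob_exactly_three(Fraction(2, 6))  # die A: assumed 4 red, 2 white
q_b = prob_exactly_three(Fraction(4, 6))  # die B: assumed 2 red, 4 white
target = Fraction(7, 81)                  # observed probability from the problem

# Solve p*q_a + (1 - p)*q_b = target for p.
p = (target - q_b) / (q_a - q_b)
print(p)  # prints 1/6
```

Using `Fraction` avoids any rounding, so the result can be compared directly with the hand derivation.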
c. Companies A, B, C, D and E each send three delegates to a conference. A committee of four delegates, selected by lot, is formed. Determine the probability that:
1. company A is not represented on the committee.
2. company A has exactly one representative on the committee.
3. neither company A nor company E is represented on the committee.

Solution: Without any restrictions, forming the committee just means selecting four people from 15 persons. The total number of these combinations is denoted by $C_{15}^{4}$ (select 4 out of 15).
1. If company A is not represented on the committee, then we have to select 4 people from the 12 delegates of companies B, C, D, and E, with $C_{12}^{4}$ combinations. So the probability in this case is $\frac{C_{12}^{4}}{C_{15}^{4}} = \frac{495}{1365} = \frac{33}{91} \approx 0.363$.
2. If A has exactly one representative on the committee, then we select that representative from the 3 delegates of A and the other 3 people from companies B, C, D, and E. The number of these combinations is $C_{3}^{1} C_{12}^{3}$. So the probability in this case is $\frac{C_{3}^{1} C_{12}^{3}}{C_{15}^{4}} = \frac{44}{91} \approx 0.484$.
3. If neither company A nor company E is represented on the committee, then we have to choose 4 people from the 9 delegates of companies B, C, and D, with $C_{9}^{4}$ combinations. So the probability in this case is $\frac{C_{9}^{4}}{C_{15}^{4}} = \frac{6}{65} \approx 0.092$.

d. Two batches of a certain chemical were delivered to a factory. For each batch ten determinations were made of the percentage of manganese in the chemical. The results were as follows:
Batch 1:
Batch 2:
Is there a significant difference between the two sample means (averages) of Batch 1 and Batch 2?

Solution: Let $\mu_1$ and $\mu_2$ be the true percentages of manganese in batch 1 and batch 2, respectively. Then the null hypothesis is $H_0: \mu_1 = \mu_2$, and the alternative hypothesis is $H_1: \mu_1 \neq \mu_2$. In this case we do a two-tailed test and assume both sets of data are normally distributed. With $n = 10$ determinations per batch, for Batch 1 the mean is $\bar{x}_1 = \frac{1}{n}\sum_{i=1}^{n} x_i = 3.6$, and the sample variance is $s_1^2 = \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$. For Batch 2, the mean is $\bar{x}_2 = \frac{1}{n}\sum_{i=1}^{n} x_i = 3.5$, and the sample variance $s_2^2$ is computed by the same formula. The combined estimate of the variance is given by $s^2 = \frac{s_1^2 + s_2^2}{2}$, and $s$ is its square root. Thus the test statistic is given by $t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{2/n}} = 2.98$.
By checking the t-table, we have $t_{0.025,18} = 2.1$. Since $t_0 > t_{0.025,18}$, it is reasonable to reject $H_0$, i.e., there is a significant difference between the two batches at the 95% confidence level.

e. For question d, check whether the sample variances of the two batches are estimates of the same population variance.

Solution: Suppose the two samples are drawn from normal distributions with variances $\sigma_1^2$ and $\sigma_2^2$. Let $s_1^2$ be the estimate of $\sigma_1^2$, and let $s_2^2$ be the estimate of $\sigma_2^2$. We assume both populations are normally distributed and use the F-test to test the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $H_1: \sigma_1^2 \neq \sigma_2^2$. The test statistic is given by $F_0 = \frac{s_1^2}{s_2^2} = 1.8$. By checking the F-table, we have $F_{0.025,9,9} = 4.03$, which is bigger than $F_0$, so we do not reject $H_0$, i.e., the sample variances of the two batches can be regarded as estimates of the same population variance.

f. A die is tossed 120 times. Find the probability that a four will turn up fewer than 15 times. Show the precise formula and its approximation by a normal distribution.

Solution: For each toss, the probability $p$ that a four turns up is $p = \frac{1}{6}$. Hence the probability that it does not turn up is $1 - p$. If a four turns up fewer than 15 times in $n = 120$ tosses, then it turns up between 0 and 14 times. If it turns up $k$ ($0 \le k \le 14$) times out of 120, then it does not turn up the other $120 - k$ times. These $k$ occurrences can be at any positions in the sequence of 120 tosses, i.e., there are $C_{120}^{k}$ possibilities. So the probability that a four turns up fewer than 15 times is given by:
$$Pr = \sum_{i=0}^{14} C_{120}^{i}\, p^i (1 - p)^{120 - i} \tag{1}$$
For the approximation by a normal distribution, we have mean $\mu = np = 20$ and variance $\sigma^2 = np(1 - p) = \frac{50}{3}$. So the probability is approximated by:
$$Pr \approx \int_{0}^{14} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx \tag{2}$$
Let $v = \frac{x - \mu}{\sigma}$; then equation (2) is equivalent to
$$Pr \approx \int_{a}^{b} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv \tag{3}$$
where $a = \frac{0 - np}{\sigma} = -4.90$ and $b = \frac{14 - np}{\sigma} = -1.47$. So $Pr \approx \Phi(b) - \Phi(a) \approx 0.0708 - 0.0000 = 0.0708$.
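The exact binomial sum and its normal approximation from part f can be compared numerically. This is a minimal sketch, assuming $n = 120$ tosses and $p = 1/6$ (which matches the stated mean of 20 and variance of 50/3); like the solution, it applies no continuity correction. The function names are illustrative.

```python
import math

def binomial_cdf(k_max: int, n: int, p: float) -> float:
    """Exact P(X <= k_max) for X ~ Binomial(n, p), i.e. the sum in equation (1)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k_max + 1))

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 120, 1 / 6                      # assumed number of tosses and P(four)
mu = n * p                             # mean of the binomial count
sigma = math.sqrt(n * p * (1 - p))     # standard deviation of the count

exact = binomial_cdf(14, n, p)         # equation (1)
a = (0 - mu) / sigma                   # lower limit in equation (3)
b = (14 - mu) / sigma                  # upper limit in equation (3)
approx = std_normal_cdf(b) - std_normal_cdf(a)

print(f"exact = {exact:.4f}")
print(f"a = {a:.2f}, b = {b:.2f}, normal approximation = {approx:.4f}")
```

The gap between the two printed values gives a feel for how much a continuity correction (integrating up to 14.5 instead of 14) would matter here.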
g. An electronic component is mass-produced and then tested unit-by-unit on an automatic testing machine which classifies the unit as good or defective. But there is a probability of 0.1 that the machine will misclassify the unit, so each component is in fact tested five times and regarded as good if so classified three or more times. What is now the probability of misclassification?

Solution: Now the component is tested five times, so the overall classification is wrong only when three or more of the five individual tests misclassify it. If it is misclassified $k$ ($3 \le k \le 5$) times, these misclassifications can occur at any positions among the 5 tests, i.e., there are $C_5^k$ possibilities. So the probability of misclassification is now given by:
$$Pr = \sum_{i=3}^{5} C_5^i\, 0.1^i (1 - 0.1)^{5 - i} \tag{4}$$
which is $C_5^3\, 0.1^3 (1 - 0.1)^2 + C_5^4\, 0.1^4 (1 - 0.1)^1 + C_5^5\, 0.1^5 (1 - 0.1)^0 = 0.0081 + 0.00045 + 0.00001 = 0.00856$.

2 Decision Tree Learning

a. Train a C5.0 classifier on the Adult data set. Familiarize yourself with and make maximum use of the features of the C5.0 program. Experiment with the pruning and softening threshold options. Document and comment on your efforts to improve the performance of your classifier by tuning C5.0.

Solution: I ran experiments with different options of the C5.0 program as follows:
1. Decision tree: Train the data set with the basic option: c5.0 -f adult. The size of the resulting decision tree was quite large, 91. This classifier misclassified 393 cases on the training data, leading to an error of 12.0%. The error rate on test data is 13.9%, which is not quite satisfactory (see yogzhe/6505a1/a 1.txt for details).
2. Discrete value subsets: Since the data set has no discrete values, it did not make sense to test the -s option.
3. Rulesets: Train the data set using the -r option. The generated classifier (called a ruleset) consists of 97 rules (see yogzhe/6505a1/a 3.txt for details). A rule can be used to classify a new case if all of its conditions are satisfied. The error rates on the training data set and test data set are almost the same as those in the first experiment.
4. Adaptive boosting: I tried to train the data set using the -b option, which is equivalent to -x. However, it took too long (almost one and a half hours) to train the classifier with the boosting option. The final boost error on training data was as low as 3.0%, but as high as 15.3% on test data (see yogzhe/6505a1/a 4.txt for details).
Table 1: Error rates on training data and test data with different confidence values for decision tree pruning

CF                     25%      50%      75%
decision tree size
training data error    12.0%    .4%      9.7%
test data error        13.9%    14.6%    15.0%

Table 2: Size of decision tree resulting from number of cases at each branch point

m
decision tree size
training data error    12.0%    12.8%    13.3%    13.9%    14.3%
test data error        13.9%    14.0%    14.0%    14.1%    14.4%

5. Softening thresholds: Train the data set using the -p option. The result (see yogzhe/6505a1/a 5.txt for details) showed that most thresholds in this experiment were still quite tight. However, the error rates and decision tree size did not change from those in step 1, i.e., this experiment did not improve the accuracy of the classifier.
6. Confidence for pruning: The default confidence value of decision tree pruning is 25%. As the confidence value (option -c) increases, the program tends to do less pruning, which increases the size of the resulting decision tree. I did tests with 50% and 75% confidence values, and the sizes of the resulting trees are 9 and 1391 (see yogzhe/6505a1/a 6.txt for details), respectively, compared with 91 with the default value. At the same time, less pruning results in overfitting on the training data and worse performance on the test data. This is verified in Table 1.
7. Splitting size: The C5.0 program has the option -m to constrain the number of cases (default 2) at each branch point. As we can see in Table 2 (more details can be found at yogzhe/6505a1/a 7.txt), when m increases, the size of the resulting tree decreases significantly, while the error rates on training data and test data do not increase much.
8. Dataset sampling: C5.0 has an option -S to draw a sample from a large data set. It trains the classifier on this data sample and then tests it on a disjoint set of the remaining cases. The number of cases in the test set depends on the sampling value.
I did tests on sample data of 20% to 80%, and the results ( yogzhe/6505a1/a 8.txt) showed that there was a significant difference between the sizes of the resulting decision trees, but little difference between error rates.
9. Cross-validation: f-fold cross-validation divides the dataset into f blocks of approximately equal size and class distribution. In each turn, the hold-out block is tested with the classifier trained on the remaining data cases. I ran an f-fold cross-validation as usual, and the result ( yogzhe/6505a1/a 9.txt) showed that there was no significant improvement in the performance.

b. What is the error rate on the test data of the best decision tree classifier you can come up with? What is the 95% confidence interval of the estimate?

Solution: In the experiment ( yogzhe/6505a1/a 1.txt), the classifier misclassified 2261 cases out of 16281 cases, leading to an error rate of 13.9%. This is the best classifier I achieved in the various experiments. Let $r = 2261$ and $n = 16281$; we assume $r/n = 0.139$ is a good estimate of the true error rate $p$. We have $\sigma^2 = \frac{p(1 - p)}{n} \approx 7.35 \times 10^{-6}$, so $\sigma \approx 0.0027$. For the 95% confidence interval of a normal distribution we have $\varepsilon = 1.96\sigma$. So the 95% confidence interval is $0.139 \pm \varepsilon \approx 0.139 \pm 0.0053$.

c. This is an unbalanced data set (24% of instances have label > 50K, 76% of instances have label ≤ 50K). Repeat parts a and b after balancing the training set. Balancing is done by repeating each instance in the > 50K class three times. Has balancing helped substantially improve the performance of the classifier? Is the difference between your error rate estimates in b and c statistically significant (with 95% confidence)?

Solution: I wrote a C program ( yogzhe/6505a1/balace.c) to duplicate cases with label > 50K, and I copied these cases into the adult data set so that it is balanced. Then I repeated the nine experiments in a. The results showed that the error rate on training data decreased significantly, mostly below .0%, but the error rate on test data increased considerably, mostly above 18.0%. This can be explained by the classifiers being overfitted on the training data because of the triplication of cases with label > 50K, while the test data are far from balanced. In short, balancing did not help improve the performance of the classifier, unless we balance the test data, too. Our null hypothesis here is that the classifier trained on unbalanced data and the classifier trained on balanced data have the same error rate.
The error rate of the best classifier in c is 17.4% ( yogzhe/6505a1/c 3.txt). Following a similar approach to question 1d, we have $s_1^2 = 7.351 \times 10^{-6}$ and $s_2^2 = 8.828 \times 10^{-6}$. Hence $s^2 = \frac{s_1^2 + s_2^2}{2} = 8.09 \times 10^{-6}$. So the test statistic is $t_0 = \frac{|13.9\% - 17.4\%|}{\sqrt{8.09 \times 10^{-6}}} = \frac{0.035}{\sqrt{8.09 \times 10^{-6}}} \approx 12.3$. We have $t_{0.025,18} = 2.1$. Since $t_0 \gg t_{0.025,18}$, it is reasonable to reject the null hypothesis, i.e., the performance on unbalanced data is much better than that on balanced data.

d. There is a Java implementation of the C4.5 decision tree classifier that accompanies the textbook by Witten and Frank: ml/weka/. C4.5 is the last publicly available version of this family of classifiers, before they went commercial as C5.0. Repeat a, b and c using C4.5. Is there a statistically significant benefit (in terms of improved performance, i.e. reduced error rate) to using C5.0 instead of C4.5 on the adult dataset?
Solution: First I extracted the part of the adult data with native country being United-States as the data set for the C4.5 program. There are 29170 and 14662 cases in the new training data set and new test data set, respectively. The best error rate on unbalanced test data I achieved is 14.0% ( yogzhe/6505a1/d 1.txt), and 18.7% ( yogzhe/6505a1/d.txt) for balanced data. I trained the C4.5 classifier with three options: confidence for pruning, splitting size, and f-fold cross-validation (with details at yogzhe/6505a1/d 6.txt, yogzhe/6505a1/d 7.txt, and yogzhe/6505a1/d 9.txt, respectively).

We have $r/n = \frac{2052}{14662} = 0.14$ as an estimate of the true error $p$. So $\sigma^2 = \frac{p(1 - p)}{n} \approx 8.21 \times 10^{-6}$ and $\sigma \approx 0.0029$. For the 95% confidence interval of a normal distribution we have $\varepsilon = 1.96\sigma$. So the 95% confidence interval is $0.140 \pm \varepsilon \approx 0.140 \pm 0.0056$.

Following a similar approach to question c, we have $s_1^2 = 8.21 \times 10^{-6}$ and $s_2^2 = 1.037 \times 10^{-5}$. Hence $s^2 = \frac{s_1^2 + s_2^2}{2} = 9.291 \times 10^{-6}$. So the test statistic is $t_0 = \frac{|14.0\% - 18.7\%|}{\sqrt{9.291 \times 10^{-6}}} \approx 15.4$. We have $t_{0.025,18} = 2.1$. Since $t_0 \gg t_{0.025,18}$, it is reasonable to say that the performance of the C4.5 classifier on unbalanced data is much better than that on balanced data.

Strictly speaking, it is not suitable to compare the performance of the C5.0 and C4.5 classifiers in this situation because they are trained on different data sets. Anyway, we have $p_1 = 0.139$, $p_2 = 0.140$, $n_1 = 16281$, and $n_2 = 14662$. So $s_1^2 = 7.351 \times 10^{-6}$ and $s_2^2 = 8.21 \times 10^{-6}$. Hence $s^2 = \frac{s_1^2 + s_2^2}{2} = 7.78 \times 10^{-6}$. So the test statistic is $t_0 = \frac{|13.9\% - 14.0\%|}{\sqrt{7.78 \times 10^{-6}}} \approx 0.36$. We have $t_{0.025,18} = 2.1$. Since $t_0 < t_{0.025,18}$, the 0.1-point difference in error rate is not statistically significant at the 95% confidence level, so these results do not show a significant benefit of C5.0 over C4.5 on the adult data set.

e. In class we discussed how to classify an instance with missing values using an already built decision tree. Can you outline how the approach can be adapted for use in training a decision tree when instances of the training set may have missing values?

Solution: Suppose the instance with a missing value on attribute $A$ is of class $C$. The basic idea is then to assign the most probable value $a_i$ of $A$ to the missing value. This can be done in three steps:
1. Suppose attribute $A$ has $k$ observed values $a_1, a_2, \ldots, a_k$. Then count the frequencies $f_1, f_2, \ldots, f_k$ of the various values of $A$, separately.
2. Calculate the probability $p_i$ ($1 \le i \le k$) of each possible value $a_i$ based on $f_i$, i.e., $p_i = f_i / \sum_{j=1}^{k} f_j$. More generally, we can assign a weight $w_i$ to each possible value $a_i$ ($1 \le i \le k$).
3. Finally, we assign to the missing value the value $a_i$ with the maximum probability $\max_{1 \le i \le k} p_i$.
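The three steps for missing values can be sketched for a single attribute column as follows. This is only an illustration under stated assumptions: the marker `"?"` for a missing value, the function names, and the sample column are all hypothetical, and the frequencies are counted over the whole column rather than within one class.

```python
from collections import Counter

MISSING = "?"  # assumed marker for a missing attribute value

def value_probabilities(values):
    """Steps 1 and 2: count observed values and turn the counts into probabilities."""
    counts = Counter(v for v in values if v != MISSING)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def fill_missing(values):
    """Step 3: replace each missing entry with the most probable observed value."""
    probs = value_probabilities(values)
    most_probable = max(probs, key=probs.get)  # ties go to the first value seen
    return [most_probable if v == MISSING else v for v in values]

# Hypothetical attribute column with two missing entries.
column = ["Private", "?", "State-gov", "Private", "?", "Self-emp"]
print(value_probabilities(column))
print(fill_missing(column))
```

To follow the class-conditional reading of the solution, the same functions could be applied separately to the rows of each class; a weighted variant would keep the whole dictionary of probabilities instead of only its maximum.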
3 Evaluation of Hypothesis

a. In a supervised learning problem, the test set $S$ is of size $n$. A hypothesis (a specific decision tree for the problem) $h$ classifies $r$ instances of $S$ incorrectly. The sample error is therefore $r/n$, and serves as an estimate of the true error $p$ of $h$. Plot the size of the 95% confidence interval as a function of $n$ (assume that $r/n$ is approximately $p$, where $p$ is unknown but constant).

Solution: Let $error_S(h)$ be the sample error $r/n$ and $error_D(h)$ be the true error $p$. Our objective here is to determine the size $\varepsilon$ such that $Pr(|error_S(h) - error_D(h)| < \varepsilon) = 0.95$. We assume $r/n \approx p$, where $p$ is unknown but constant. We assume the sample $S$ contains $n$ examples drawn independently of each other and of $h$. Then we define a random variable $t$, where $t = 1$ on failure (a misclassification) and $t = 0$ on success. So $Pr(t = 1) = p$ and $Pr(t = 0) = 1 - p$. The mean of $t$ is $p$ and the variance of $t$ is $p(1 - p)$. Now we define $n$ variables $t_1, t_2, \ldots, t_n$, one for each element of $S$, and let $r = t_1 + t_2 + \cdots + t_n$. So the mean of $r$ is $np$ and the variance of $r$ is $np(1 - p)$. We treat $error_S(h) = r/n$ as a random variable, so it has mean $\mu = p$ and variance $\sigma^2 = \frac{p(1 - p)}{n}$. Approximating its distribution by a normal distribution, the probability density function is given by:
$$pdf(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \tag{5}$$
In particular, we have $\varepsilon = 1.96\sigma$ for the normal distribution, such that $Pr(-1.96\sigma \le x - \mu \le 1.96\sigma) = 0.95$. So based on equation (5), the size of the 95% confidence interval is
$$CI = 2\varepsilon = 3.92 \sqrt{\frac{p(1 - p)}{n}}.$$
If we plot CI as a function of $n$, we have $CI = \left(3.92\sqrt{p(1 - p)}\right) n^{-1/2}$. An approximation of this curve is shown in Figure 1.

b. In a supervised learning problem, we want to compare the goodness of two different hypotheses $h_1$ and $h_2$. Two independent test sets $S_1$ and $S_2$ of sizes $n_1$ and $n_2$ are used for $h_1$ and $h_2$ respectively. $h_1$ and $h_2$ make $r_1$ and $r_2$ errors of classification over $S_1$ and $S_2$ respectively. The true errors for $h_1$ and $h_2$ are $p_1$ and $p_2$ respectively. For $n_1 = 50$, $r_1 = 4$, $n_2 = 30$, $r_2 = 2$:
find an estimate of $p_1 - p_2$,
give the 95% confidence interval for your estimate,
find the probability that $p_1 > p_2$.

Solution: Let $error_D(h_1)$ and $error_D(h_2)$ be the true errors of $h_1$ and $h_2$, respectively.
And let $error_{S_1}(h_1)$ and $error_{S_2}(h_2)$ be the sample errors over $S_1$ and $S_2$, respectively. Also let $d = error_D(h_1) - error_D(h_2)$, i.e., $d = p_1 - p_2$.
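Figure 1: Confidence interval as a function of $n$

We assume $error_{S_1}(h_1)$ and $error_{S_2}(h_2)$ are estimates of $p_1$ and $p_2$, respectively, i.e., $\hat{p}_1 = \frac{r_1}{n_1} = \frac{2}{25}$ and $\hat{p}_2 = \frac{r_2}{n_2} = \frac{1}{15}$. Then $\hat{d} = error_{S_1}(h_1) - error_{S_2}(h_2) = \frac{1}{75}$ is an estimate of $p_1 - p_2$. We assume $\hat{d}$ is normally distributed. Its mean is $\mu = p_1 - p_2$ and its variance is $\sigma^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2} \approx 0.00355$. Hence $\sigma \approx 0.0596$. In particular, we have $\varepsilon = 1.96\sigma$ for the normal distribution, such that $Pr(-1.96\sigma \le x - \mu \le 1.96\sigma) = 0.95$. So based on equation (5), the 95% confidence interval is $\hat{d} \pm \varepsilon = \frac{1}{75} \pm 1.96\sigma \approx 0.0133 \pm 0.1167$.

For the last part, we first have $Pr(p_1 > p_2) = Pr(p_1 - p_2 > 0) = Pr(d > 0)$. So we have:
$$Pr(d > 0) = \int_{0}^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx \tag{6}$$
Let $v = \frac{x - \mu}{\sigma}$; then equation (6) is equivalent to
$$Pr(d > 0) = \int_{a}^{b} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv \tag{7}$$
where $a = \frac{0 - \mu}{\sigma} = -0.2239$ (taking $\mu \approx \hat{d}$) and $b = +\infty$. So $Pr(p_1 > p_2) = 1 - \Phi(a) \approx 1 - 0.4114 = 0.5886$.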
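The calculation in part b can be reproduced in a few lines. This is a sketch under the same normal approximation, assuming $r_1 = 4$, $n_1 = 50$, $r_2 = 2$, $n_2 = 30$ (the values implied by the estimates 2/25 and 1/15); the function names are illustrative and not part of the original solution.

```python
import math

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def compare_errors(r1: int, n1: int, r2: int, n2: int, z: float = 1.96):
    """Estimate d = p1 - p2, its 95% interval, and Pr(p1 > p2) under a normal model."""
    p1, p2 = r1 / n1, r2 / n2
    d_hat = p1 - p2
    sigma = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half_width = z * sigma
    a = (0 - d_hat) / sigma                 # lower limit in equation (7)
    prob_positive = 1 - std_normal_cdf(a)   # Pr(d > 0)
    return d_hat, sigma, (d_hat - half_width, d_hat + half_width), prob_positive

d_hat, sigma, interval, prob = compare_errors(4, 50, 2, 30)
print(f"d_hat = {d_hat:.4f}, sigma = {sigma:.4f}")
print(f"95% interval = ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"Pr(p1 > p2) = {prob:.4f}")
```

Because the interval comfortably contains zero, the data give only weak evidence that $h_1$ has the larger true error, which matches a probability close to one half.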
More informationStatistics 511 Additional Materials
Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability
More informationInstructor: Judith Canner Spring 2010 CONFIDENCE INTERVALS How do we make inferences about the population parameters?
CONFIDENCE INTERVALS How do we make ifereces about the populatio parameters? The samplig distributio allows us to quatify the variability i sample statistics icludig how they differ from the parameter
More informationHomework 5 Solutions
Homework 5 Solutios p329 # 12 No. To estimate the chace you eed the expected value ad stadard error. To do get the expected value you eed the average of the box ad to get the stadard error you eed the
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key
More informationFinal Examination Solutions 17/6/2010
The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:
More informationBecause it tests for differences between multiple pairs of means in one test, it is called an omnibus test.
Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal
More information(7 One- and Two-Sample Estimation Problem )
34 Stat Lecture Notes (7 Oe- ad Two-Sample Estimatio Problem ) ( Book*: Chapter 8,pg65) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye Estimatio 1 ) ( ˆ S P i i Poit estimate:
More informationLecture 6 Simple alternatives and the Neyman-Pearson lemma
STATS 00: Itroductio to Statistical Iferece Autum 06 Lecture 6 Simple alteratives ad the Neyma-Pearso lemma Last lecture, we discussed a umber of ways to costruct test statistics for testig a simple ull
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More information(all terms are scalars).the minimization is clearer in sum notation:
7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1
More informationMOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.
XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advaced
More informationIntroduction to Econometrics (3 rd Updated Edition) Solutions to Odd- Numbered End- of- Chapter Exercises: Chapter 3
Itroductio to Ecoometrics (3 rd Updated Editio) by James H. Stock ad Mark W. Watso Solutios to Odd- Numbered Ed- of- Chapter Exercises: Chapter 3 (This versio August 17, 014) 015 Pearso Educatio, Ic. Stock/Watso
More informationStat 200 -Testing Summary Page 1
Stat 00 -Testig Summary Page 1 Mathematicias are like Frechme; whatever you say to them, they traslate it ito their ow laguage ad forthwith it is somethig etirely differet Goethe 1 Large Sample Cofidece
More informationDirection: This test is worth 150 points. You are required to complete this test within 55 minutes.
Term Test 3 (Part A) November 1, 004 Name Math 6 Studet Number Directio: This test is worth 10 poits. You are required to complete this test withi miutes. I order to receive full credit, aswer each problem
More informationGoodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)
Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................
More informationDiscrete probability distributions
Discrete probability distributios I the chapter o probability we used the classical method to calculate the probability of various values of a radom variable. I some cases, however, we may be able to develop
More informationResampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationMATH/STAT 352: Lecture 15
MATH/STAT 352: Lecture 15 Sectios 5.2 ad 5.3. Large sample CI for a proportio ad small sample CI for a mea. 1 5.2: Cofidece Iterval for a Proportio Estimatig proportio of successes i a biomial experimet
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber
More informationSimulation. Two Rule For Inverting A Distribution Function
Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump
More informationInfinite Sequences and Series
Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet
More informationBig Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.
5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece
More informationSection 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis
Sectio 9.2 Tests About a Populatio Proportio P H A N T O M S Parameters Hypothesis Assess Coditios Name the Test Test Statistic (Calculate) Obtai P value Make a decisio State coclusio Sectio 9.2 Tests
More informationAgreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times
Sigificace level vs. cofidece level Agreemet of CI ad HT Lecture 13 - Tests of Proportios Sta102 / BME102 Coli Rudel October 15, 2014 Cofidece itervals ad hypothesis tests (almost) always agree, as log
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.
ST 305: Exam 3 By hadig i this completed exam, I state that I have either give or received assistace from aother perso durig the exam period. I have used o resources other tha the exam itself ad the basic
More information6.867 Machine learning
6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples
More informationEstimation of a population proportion March 23,
1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes
More information6 Sample Size Calculations
6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig
More informationThis exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.
Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the
More information5. A formulae page and two tables are provided at the end of Part A of the examination PART A
Istructios: 1. You have bee provided with: (a) this questio paper (Part A ad Part B) (b) a multiple choice aswer sheet (for Part A) (c) Log Aswer Sheet(s) (for Part B) (d) a booklet of tables. (a) I PART
More informationChapter 13: Tests of Hypothesis Section 13.1 Introduction
Chapter 13: Tests of Hypothesis Sectio 13.1 Itroductio RECAP: Chapter 1 discussed the Likelihood Ratio Method as a geeral approach to fid good test procedures. Testig for the Normal Mea Example, discussed
More informationc. Explain the basic Newsvendor model. Why is it useful for SC models? e. What additional research do you believe will be helpful in this area?
1. Research Methodology a. What is meat by the supply chai (SC) coordiatio problem ad does it apply to all types of SC s? Does the Bullwhip effect relate to all types of SC s? Also does it relate to SC
More informationChapter 6 Principles of Data Reduction
Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a
More informationFinal Review for MATH 3510
Fial Review for MATH 50 Calculatio 5 Give a fairly simple probability mass fuctio or probability desity fuctio of a radom variable, you should be able to compute the expected value ad variace of the variable
More information- E < p. ˆ p q ˆ E = q ˆ = 1 - p ˆ = sample proportion of x failures in a sample size of n. where. x n sample proportion. population proportion
1 Chapter 7 ad 8 Review for Exam Chapter 7 Estimates ad Sample Sizes 2 Defiitio Cofidece Iterval (or Iterval Estimate) a rage (or a iterval) of values used to estimate the true value of the populatio parameter
More informationStatistical and Mathematical Methods DS-GA 1002 December 8, Sample Final Problems Solutions
Statistical ad Mathematical Methods DS-GA 00 December 8, 05. Short questios Sample Fial Problems Solutios a. Ax b has a solutio if b is i the rage of A. The dimesio of the rage of A is because A has liearly-idepedet
More informationChapter 1 (Definitions)
FINAL EXAM REVIEW Chapter 1 (Defiitios) Qualitative: Nomial: Ordial: Quatitative: Ordial: Iterval: Ratio: Observatioal Study: Desiged Experimet: Samplig: Cluster: Stratified: Systematic: Coveiece: Simple
More informationComparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading
Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual
More informationKinetics of Complex Reactions
Kietics of Complex Reactios by Flick Colema Departmet of Chemistry Wellesley College Wellesley MA 28 wcolema@wellesley.edu Copyright Flick Colema 996. All rights reserved. You are welcome to use this documet
More informationProblem Set 4 Due Oct, 12
EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios
More informationBIOS 4110: Introduction to Biostatistics. Breheny. Lab #9
BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous
More information( ) = p and P( i = b) = q.
MATH 540 Radom Walks Part 1 A radom walk X is special stochastic process that measures the height (or value) of a particle that radomly moves upward or dowward certai fixed amouts o each uit icremet of
More informationTMA4245 Statistics. Corrected 30 May and 4 June Norwegian University of Science and Technology Department of Mathematical Sciences.
Norwegia Uiversity of Sciece ad Techology Departmet of Mathematical Scieces Corrected 3 May ad 4 Jue Solutios TMA445 Statistics Saturday 6 May 9: 3: Problem Sow desity a The probability is.9.5 6x x dx
More information