106 Stat 1434 / 1435 H. Chapter 1: Organizing and Displaying Data

Alaina Curtis
6 years ago

106 Stat

References
- Biostatistics: A Foundation for Analysis in the Health Sciences, by Wayne W. Daniel
- Elementary Biostatistics with Applications from Saudi Arabia, by Nancy Hasabelnaby

1434 / 1435 H

Chapter 1: Organizing and Displaying Data

1.1: Introduction

Here we consider some basic definitions and terminology.

Statistics: the area of study concerned with how to organize and summarize information and answer research questions.

Biostatistics: the branch of statistics concerned with information obtained from the biological and medical sciences.

Population: the largest group of people or things in which we are interested at a particular time and about which we want to make statements or draw conclusions.

Sample: a part of the population on which we collect data. The number of elements in the sample is called the sample size and is denoted by n.

Variable: the characteristic to be measured on the elements of the population or sample.

Types of variables

Qualitative: the values of the variable are words indicating to which category an element of the population belongs.

- Nominal: the values of the variable are names only.
  Examples:
  * Gender: female or male.
  * Eye colour: black, brown, green, etc.
- Ordinal: the values of the variable can be ordered.
  Examples:
  * Educational level: elementary, intermediate, high school.
  * Blood pressure: low, medium, high.

Quantitative: the values of the variable are numbers indicating how much or how many of something.

- Discrete: can take a countable number of values (there are gaps between the values).
  Examples:
  * Number of patients admitted to a hospital in one day (x = 1, 2, ...).
  * Number of painkiller tablets (x = 0.5, 1, 1.5, 2, 2.5, ...).
  Note: discrete variables can take either integer values or decimal values, with gaps between the values.
- Continuous: can take any value within a certain interval of values. It is usually measured on some scale in terms of measurement units such as kilograms, meters, etc.
  Examples:
  * Level of a chemical in drinking water.
  * Height (140 < x < 190).
  * Blood sugar level of a person.

Example 1
Suppose we measure the amount of milk that a child drinks in a day (in ml) for a sample of 5 two-year-old children in Saudi Arabia.
- The population: all two-year-old children in Saudi Arabia.
- The variable: the amount of milk that a child drinks in a day (in ml).
- The variable is quantitative, continuous.
- The sample size is 5.

Example 2
Suppose we measure whether or not a child has hearing loss for a sample of 20 young children with a history of repeated ear infections.
- The population: all young children with a history of repeated ear infections.
- The variable: whether or not a child has hearing loss.
- The variable is qualitative, nominal, since the values are either yes or no.
- The sample size is 20.

Example 3
Suppose we measure the temperature for a sample of 5 animals having a certain disease.
- The population:
- The variable:
- The type of the variable:
- The sample size:

1.2: Organizing the Data

Suppose we collect a sample of size n from a population of interest.
- A first step in organizing the data is to order it from smallest to largest (if it is not nominal).
- A further step is to count how many values are the same (if any).
- The last step is to organize the data into a table called a frequency table (or frequency distribution).

The frequency distribution has two kinds:
1) Simple (ungrouped) frequency distribution, for:
   - qualitative variables;
   - discrete quantitative variables with a small number of different values.
2) Grouped frequency distribution, for:
   - continuous quantitative variables;
   - discrete quantitative variables with a large number of different values.

Example 1.2.1 (simple frequency distribution)

Suppose we are interested in the number of children that a Saudi woman has, and we take a sample of 16 women and obtain the following data on the number of children:

3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1

Q1: What is the variable? The population? The sample size? What are the different values of the variable?
- The different values are: 0, 1, 2, 3, 4, 5.

Q2: Obtain a simple frequency distribution (table).
If we order the data we obtain:

0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5

To obtain a simple frequency distribution (table) we need the following concepts:
- The frequency: obtained by counting how often each value occurs in the data set.
- The sample size (n): the sum of the frequencies.
- Relative frequency = frequency / n
- Percentage frequency = relative frequency × 100 = (frequency / n) × 100

Simple frequency table for the number of children:

Number of children   Frequency of women   Relative frequency   Percentage frequency
(variable)           (frequency)
0                    1                    0.0625               6.25%
1                    2                    0.125                12.5%
2                    4                    0.25                 25%
3                    5                    0.3125               31.25%
4                    2                    0.125                12.5%
5                    2                    0.125                12.5%
Total                n = 16               1                    100%

The graphical representation of a simple frequency distribution is the frequency bar chart.

(Figure: frequency bar chart of the number of children. Horizontal axis: number of children. Vertical axis: frequency of women.)

Exercise: for more exercises and details about graphs: m/chapter/graphig_q ualitative.html
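The counting steps of Example 1.2.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `simple_frequency_table` is hypothetical, and the data list assumes the 16 observations read 3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1.

```python
from collections import Counter

# Assumed data: number of children for the 16 women in Example 1.2.1.
children = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]


def simple_frequency_table(data):
    """Return rows of (value, frequency, relative frequency, percentage)."""
    n = len(data)              # sample size = sum of the frequencies
    counts = Counter(data)     # how often each value occurs
    rows = []
    for value in sorted(counts):
        freq = counts[value]
        rel = freq / n         # relative frequency = frequency / n
        rows.append((value, freq, rel, rel * 100))
    return rows


for value, freq, rel, pct in simple_frequency_table(children):
    print(f"{value:>5} {freq:>9} {rel:>9.4f} {pct:>8.2f}")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check that no observation was lost while counting.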

Example 1.2.2: grouped frequency distribution

The following table gives the hemoglobin level (in g/dl) of a sample of 45 apparently healthy men aged 20-24. Find the grouped frequency distribution for the data.

- What is the variable? The sample size?
- The max = 18.3
- The min = 13.5
- The range = max - min = 18.3 - 13.5 = 4.8

Notes
1. In Example 1.2.2, to group the data we use a set of intervals, called class intervals.
2. The width (w) is the distance from the lower (or upper) limit of one class interval to the same limit of the next class interval.
3. We denote the lower limit and upper limit of a class interval by L and U; that is, the first class is L1-U1, the second class is L2-U2, and so on.
4. To find the class intervals we use the following relationship: $L_2 = L_1 + w$, $L_3 = L_2 + w$, ... and $U_2 = U_1 + w$, $U_3 = U_2 + w$, and so on.
6. Cumulative frequency: the number of values in the class interval or before it, found by successively adding the frequencies.
7. Cumulative relative frequency: the proportion of values in the class interval or before it, found by successively adding the relative frequencies.
8. The grouped frequency distribution for Example 1.2.2 has the following columns:

Class interval   Frequency   Relative frequency   Cumulative frequency   Cumulative relative frequency
Total            n = 45

True classes and displaying grouped frequency distributions

To find the true class intervals we have two ways:
1) Subtract from the lower limit, and add to the upper limit, one half of the smallest unit.
2) Decrease the last decimal place of the lower limit by 1 and put 5 after it; for the upper limit, simply put 5 after the limit.

(Figure: the class limits 13, 13.9, 14, 14.9 on a number line, with the true class limits 12.95, 13.95, 14.95 marked between them.)

To illustrate this, let us find the true classes of Example 1.2.2, where the smallest unit is s.u = 0.1:

Class interval   True class interval   Midpoint   Frequency
13.0-13.9        12.95 - <13.95        13.45
14.0-14.9        13.95 - <14.95        14.45
15.0-15.9        14.95 - <15.95        15.45
16.0-16.9        15.95 - <16.95        16.45
17.0-17.9        16.95 - <17.95        17.45
18.0-18.9        17.95 - <18.95        18.45
Total                                             n = 45

Notes:
- The upper limit of each true class interval is the same as the lower limit of the next true class interval.
- The lower and upper limits of a true class interval must always end in 5, and they must always have one more decimal place than the class limits.
- The midpoint = (upper limit + lower limit) / 2.
- To find the midpoint of the next interval we simply add the width to the previous midpoint.

Displaying grouped frequency distributions

Grouped frequency distributions (of frequencies or of relative frequencies) can be displayed by:
- Histogram
- Polygon
- Curves

(Figure: histogram.)

(Figure: frequency polygon. Horizontal axis: hemoglobin level.)

Exercises: 1.R.1 (a-c-d-e), 1.R.2 (a-c-d-e), 1.R.5 pg 5

Example 1.4: In a study, the blood glucose level (in mg/100 ml) was measured for a sample from all apparently healthy adult males.

a) Identify the variable and the population in the study.
b) From the table, find:

Class interval (glucose level)   Frequency   Relative frequency   Cumulative frequency
Total

1) w =
2) n =
3) The number of healthy males with glucose level ... mg/100 ml
4) The percentage of healthy males with glucose level less than ... mg/100 ml
5) The number of healthy males with glucose level less than 99 mg/100 ml
6) The number of healthy males with glucose level greater than ...

Chapter 2: Basic Summary Statistics

2.1: Introduction

This chapter is mainly concerned with describing the middle of the observations and how spread out they are.

Measures of central tendency: measures which in some sense indicate where the middle or centre of the data is (e.g. mean, median and mode).

Measures of dispersion: measures which indicate how spread out the observations are from each other (e.g. range, variance, standard deviation and coefficient of variation).

Population: the population values of the variable of interest are $X_1, X_2, \dots, X_N$ (usually they are unknown). N = the population size.

Sample: the sample values of the variable are $x_1, x_2, \dots, x_n$. n = the sample size.

Any measure obtained from the population values of the variable of interest is called a parameter.
Any measure obtained from the sample values of the variable of interest is called a statistic.

2.2: Measures of central tendency

We use the term central tendency to refer to the natural fact that the values of a variable often tend to be more concentrated about the centre of the data. We will consider three such measures: the mean, the median and the mode.

Mean (or average)

Population mean: let $X_1, X_2, \dots, X_N$ be the population values of the variable (usually unknown); then the population mean is

$$\mu = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$

The population mean is usually unknown.

Sample mean: let $x_1, x_2, \dots, x_n$ be the sample values of the variable; then the sample mean is

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

The sample mean is known from the sample, and it is an estimator of the population mean.

Question: which one is a parameter and which one is a statistic? (μ is a parameter; $\bar{x}$ is a statistic.)

Example 2.1: Consider a population consisting of the 5 nurses who work in a particular clinic, and suppose we are interested in the age of these nurses in years: $X_1 = 30$, $X_2 = 22$, $X_3 = 35$, $X_4 = 27$, $X_5 = 41$. The mean age of the nurse population is $\mu = (30 + 22 + 35 + 27 + 41)/5 = 31$ years.

Median (or med)

The median is the middle value of the ordered observations. To find the median of a sample of n observations, we first order the data, then:
1) If n is odd, the median is the observation of order (n+1)/2.
2) If n is even, the two middle observations are the (n/2)th and the next observation; the median is their average.

Example 2.2.1: Find the median of the following samples.

a) 29, 30, 32, 31, 28, 29, 30, 42, 40, 40, 40.
First we order the data: 28, 29, 29, 30, 30, 31, 32, 40, 40, 40, 42.
n = 11 is odd, so the order of the median is (n+1)/2 = (11+1)/2 = 6th.
The 6th ordered value is 31, so med = 31 (unit).

b) 1.5, 3.0, 18.5, 24.0, 12.0, 4.5, 6.0, 9.5, 10.5, 15.0, 11.0, 11.5.
n = 12 is even and n/2 = 6, hence we take the average of the 6th and the 7th values.
The ordered sample is: 1.5, 3.0, 4.5, 6.0, 9.5, 10.5, 11.0, 11.5, 12.0, 15.0, 18.5, 24.0.
med = (10.5 + 11.0)/2 = 10.75 (unit).
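The two median rules (n odd, n even) translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and the two samples are assumed to be those of Example 2.2.1 (29, 30, 32, ... and 1.5, 3.0, 18.5, 24.0, ...).

```python
def sample_mean(values):
    """Sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


def sample_median(values):
    """Middle value of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the median is the observation of order (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # n even: average the observations of order n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Assumed data from Example 2.2.1.
sample_a = [29, 30, 32, 31, 28, 29, 30, 42, 40, 40, 40]
sample_b = [1.5, 3.0, 18.5, 24.0, 12.0, 4.5, 6.0, 9.5, 10.5, 15.0, 11.0, 11.5]

print(sample_median(sample_a), sample_median(sample_b))
```

The `- 1` in the indexing is only there because Python lists start at position 0, whereas the "order" of an observation in the text starts at 1.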

Mode (or modal)

The mode of a set of values is the value which occurs with the highest frequency. Any data set must fall into one of three cases:

- No mode. Examples:
  Data (1): 1, 15,,19, 14, 18
  Data (2): 3, 3, 5, 5, 4, 4, 6, 6
- One mode. Examples:
  Data (1): 3, 15, 3, 17,, 3, 19, 0,,. The mode = (unit)
  Data (2): 13.5, 1, 13.5, 15, 15, 14.6, 17, 1, 15. The mode = 15 (unit)
- More than one mode. Example:
  18, 0, 19, 19, 1, 17, 0. Modes: 19, 0 (unit)

Notes:
- The mean and median can only be found for quantitative variables; the mode can be found for both quantitative and qualitative variables.
- There is only one mean and one median for any data set.
- The mean can be distorted considerably by extreme values. Measures that are not affected so much by extreme values are the median and the mode.

Example 2.2.2
The following table shows the computer results for the country of manufacture of 50 air-conditioner devices.

                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   American
        European
        Japanese
        Total
Total

From this table:
A) What is the variable? What is the type of the variable?
B) The number of Japanese-made devices is
   (a) (b) 16 (c) 6 (d) 50 (e) 11
C) The percentage of American-made devices is
   (a) % (b) 31% (c) 6% (d) 50% (e) 100%
D) The mode of country-made devices is
   (a) Japanese (b) European (c) American (d) American and Japanese (e) no mode
E) If we want to represent this type of data, we will use
   (a) Histogram (b) Line chart (c) Pie chart (d) Bar chart (e) we can't

Example 2.2.3
A sample of 80 families was asked about the number of times they travelled abroad. The SPSS computer results are given below.

          Frequency   Percent   Valid Percent   Cumulative Percent
Valid
Total     80

From the above table:
A) The variable is
   (a) Number of families (b) Number of times to travel abroad (c) None of these
B) The type of the variable is
   (a) Quantitative discrete (b) Qualitative (c) Quantitative continuous (d) Normal (e) Binomial (f) None of these
C) The number of families that travelled abroad 7 times is
   (a) 7.5 (b) 0 (c) 6 (d) 1 (e) 53.8 (f) 10
D) The percentage of families that travelled abroad less than or equal to 10 times is
   (a) 73.8% (b) 0 (c) 100% (d) 88.8% (e) 5% (f) 95%
E) The mode of travel times is
   (a) 13.8 (b) 0 (c) 6 (d) 11 (e) 80 (f) 13

Example 2.2.4
Using the SPSS computer results, the following plot of the number of courses in English that a student takes in a year is obtained:

(Figure: plot of the number of courses in English. Vertical axis: frequency.)

1) The type of the graph is:
   (a) Bar chart (b) polygon (c) histogram (d) line (e) curve
2) The variable is:
   (a) Number of students (b) number of courses (c) English (d) Arabic
3) The total number of students who study in English is:
   (a) 0 (b) 5 (c) 1 (d) 6 (e) 3
4) The number of students who study two courses in English is:
   (a) 0 (b) (c) 7 (d) 8 (e) 5
5) The number of students who study at least two courses in English is:
   (a) 7 (b) 8 (c) 15 (d) 18 (e) 5
6) The percent of students who study at most one course in English is:
   (a) 7% (b) 18% (c) 8% (d) 60% (e) 7%
7) The sample mean is
   (a) .55 (b) 55 (c) 3 (d) (e) 40
8) The sample mode is
   (a) 0 (b) 3 (c) 10 (d) (e) no mode

2.3: Measures of dispersion

The variation or dispersion in a set of observations refers to how spread out the observations are from each other.
- When the variation is small, the observations are close to each other (but not the same).
- Can you mention a case where there is no variation?

(Figure: pairs of distributions with the same mean, one with smaller variation and one with larger variation.)

We will consider four measures of dispersion: the range, the variance, the standard deviation and the coefficient of variation.

Range (R): the difference between the largest and smallest values in the set of values.

Example 2.3 (q.6, pg 35): Below are the birth weights (in kg) for a sample of babies born in Saudi Arabia:
1.69, 1.79, 3.3, 3.6,.71,.4,.59, 1.05, 3.19, 3.40, 3.3, 3.37, 3.6,
- Find the mean, mode and median.
- R = =.58.

Note: The range is easy to calculate, but it is not very useful as a measure of variation since it only takes into account two of the values.

Variance: a measure which uses the mean as a point of reference.

Population variance: let $X_1, X_2, \dots, X_N$ be the population values of the variable (usually unknown); then the population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where μ is the population mean.

Sample variance: let $x_1, x_2, \dots, x_n$ be the sample values of the variable; then the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean.

Notes:
- The variance is small when all the values are close to the mean, and large when the values are spread out from the mean.

(Figure: the deviations $(x_i - \bar{x})$ of the observations from the sample mean.)

- The variance is always a nonnegative value ($\sigma^2 \ge 0$, $s^2 \ge 0$).
- The population variance $\sigma^2$ is usually unknown (parameter), hence it is estimated by the sample variance $s^2$ (statistic).
- A simpler formula to use for calculating the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$

- The variance is expressed in squared units.

Standard deviation (std. dev.): the standard deviation is defined to be the square root of the variance.

Population standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample standard deviation:

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
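As a quick check that the definition of the sample variance and the simpler computing formula agree, here is a small Python sketch. The data set is hypothetical (it does not come from the course), and the function names are chosen only for illustration.

```python
import math


def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Simpler formula: (sum of x_i^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)


def sample_std_dev(values):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 5, 7]  # hypothetical sample, for illustration only

print(sample_variance(data), sample_variance_shortcut(data), sample_std_dev(data))
```

Both functions give the same value up to rounding error; the shortcut is convenient by hand because it only needs the sum of the values and the sum of their squares.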

Coefficient of variation (CV):
- The variance and standard deviation are useful as measures of variation of the values of a single variable for a single population.
- If we want to compare the variation in two data sets, the variances and standard deviations may give misleading results because:
  - The two variables may have different units, such as kilograms and centimeters, which cannot be compared.
  - Even when the same units are used, the means of the two data sets may be quite different in size.
- The coefficient of variation (CV) is used to compare the relative variation in two data sets; it does not depend on either the units or how large the values are. The formula for the CV is

$$CV = \frac{s}{\bar{x}} \times 100\%$$

- Suppose we have two data sets as follows and we want to compare their variation:

          Mean          Std. dev.   CV
Set 1     $\bar{x}_1$   $s_1$       $CV_1 = \frac{s_1}{\bar{x}_1} \times 100\%$
Set 2     $\bar{x}_2$   $s_2$       $CV_2 = \frac{s_2}{\bar{x}_2} \times 100\%$

Then we say that the variability in the first data set is larger than the variability in the second data set if $CV_1 > CV_2$ (and vice versa).

Example 2.5
Suppose two samples of human males of different ages give the following results for weight:
- Set 1 (males aged 9): $\bar{x}_1 = 66$ kg, $s_1 = 4.5$ kg, $CV_1 = (4.5/66) \times 100\% = 6.8\%$
- Set 2 (males aged 10): $\bar{x}_2 = 36$ kg, $s_2 = 4.5$ kg, $CV_2 = (4.5/36) \times 100\% = 12.5\%$

Since $CV_2 > CV_1$, the variability in the weights of the 2nd data set (10-year-olds) is greater than the variability in the 1st data set (9-year-olds), even though the two standard deviations are equal.

Examples: pg

A site that explains how to use SPSS for descriptive statistics: /spss/descript1.htm
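The comparison in Example 2.5 is a one-line calculation per data set. The sketch below is illustrative; `coefficient_of_variation` is a hypothetical helper name, and the inputs are the means and standard deviations quoted in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / x-bar) * 100, expressed as a percentage."""
    return std_dev / mean * 100


# Values quoted in Example 2.5 (weights in kg).
cv_1 = coefficient_of_variation(4.5, 66)
cv_2 = coefficient_of_variation(4.5, 36)

# The set with the larger CV has the larger relative variability.
more_variable = "set 2" if cv_2 > cv_1 else "set 1"
print(round(cv_1, 1), round(cv_2, 1), more_variable)
```

Because the CV divides by the mean, it is unit-free, which is what makes the two sets comparable.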

Example 2.4.1
For a sample of patients, we obtain the following graph of the approximate number of hours spent without pain after a certain surgery.

(Figure: graph of hours spent without pain. Horizontal axis: hours. Vertical axis: frequency.)

1) The type of the graph is:
   (a) Bar chart (b) polygon (c) histogram (d) line (e) curve
2) The number of patients who stayed the longest time without pain is:
   (a) 10 (b) 15 (c) 6 (d) 5 (e) 80
3) The percent of patients who spent 3.5 hours or more without pain is:
   (a) 37.5% (b) 68.75% (c) 18.75% (d) 50% (e) 5%
4) The lowest number of hours spent without pain is:
   (a) 10 (b) 1 (c) 0.5 (d) 5 (e) 5 (f) 6.5
5) What is the approximate value of the sample mean?
   (a) .55 (b) 55 (c) 3 (d) (e) 40 (f) we can't find it
6) The sample mode equals
   (a) 80 (b) 3 (c) 15 (d) ,4 (e) 6 (f) we can't find it

The SPSS computer results for the age of patients in one of the Riyadh hospitals are given below. Find:
a) The variable name
b) The type of the variable
c) The mode
d) The mean age of the patients
e) The median age of the patients
f) The variance
g) The sample size
h) The coefficient of variation

2.4: Calculating measures from an ungrouped frequency table

Suppose we have the following frequency table, where $m_i$ is the i-th value in the ungrouped frequency table (or the i-th midpoint in the grouped frequency table), and $f_i$ is the i-th frequency.

Value (or midpoint)   $m_1$   $m_2$   ...   $m_k$
Frequency             $f_1$   $f_2$   ...   $f_k$   $\sum f_i = n$

The formulas for the sample mean and variance are modified as follows, where $n = \sum f_i$ (the sample size = the sum of the frequencies) and k = the number of distinct values (or the number of intervals):

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad\Longrightarrow\quad \bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{n}$$

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} \quad\Longrightarrow\quad s^2 = \frac{\sum_{i=1}^{k} m_i^2 f_i - n\bar{x}^2}{n - 1}$$

For using a calculator to find the mean, variance and standard deviation, you can visit the site
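The frequency-table formulas can be checked with a short Python sketch. The table used here is hypothetical (values 1 to 4 with made-up frequencies), and the function names are illustrative assumptions rather than anything prescribed by the course.

```python
def mean_from_table(values, freqs):
    """x-bar = sum(m_i * f_i) / n, where n = sum(f_i)."""
    n = sum(freqs)
    return sum(m * f for m, f in zip(values, freqs)) / n


def variance_from_table(values, freqs):
    """s^2 = (sum(m_i^2 * f_i) - n * x-bar^2) / (n - 1)."""
    n = sum(freqs)
    mean = mean_from_table(values, freqs)
    sum_m2f = sum(m * m * f for m, f in zip(values, freqs))
    return (sum_m2f - n * mean ** 2) / (n - 1)


# Hypothetical frequency table: value m_i and its frequency f_i.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]

print(mean_from_table(values, freqs), variance_from_table(values, freqs))
```

For a grouped table the same functions apply unchanged if `values` holds the class midpoints; the results are then approximations, because every observation in a class is treated as if it sat at the midpoint.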

Notes:
- When data are grouped, we cannot determine from the frequency distribution what the actual data values are, but only how many of them are in each class interval.
- We can't find the actual values of the sample mean and sample variance, but we can find approximations of them.
- For grouped data we assume that all values in a particular class interval are located at the midpoint of the interval ($m_i$), because the midpoint is the best representative of the whole interval.

Example 2.6: Suppose that in a study on drug consumption by pregnant Saudi women, the number of different drugs taken during pregnancy was determined for a sample of Saudi women who took at least one medication, obtaining a table with the following columns:

Value $m_i$   Frequency $f_i$   Cumulative frequency   $m_i f_i$   $m_i^2 f_i$
Total         n = 30                                   83          295

Find the measures of central tendency and dispersion.

Solution:
- n = 30
- $\bar{x} = 83/30 = 2.7667$ drugs
- To find the median: since n = 30 is even, the orders of the two middle values are n/2 = 15th and 16th. From the cumulative frequency, the 15th and 16th ordered observations are both 2, and hence
- Med = (2 + 2)/2 = 2 drugs

- The mode is 2, since it has the highest frequency.
- The range: R = 7 - 1 = 6
- The variance:

$$s^2 = \frac{\sum m_i^2 f_i - n\bar{x}^2}{n - 1} = \frac{295 - (30)(2.7666)^2}{29} = 2.25$$

- The standard deviation: $s = \sqrt{2.25} = 1.5$
- The coefficient of variation = (1.5/2.8) × 100 = 53.6%

Note: we didn't put any unit here since the variable is discrete; the word (drug) is just an indicator of what we are counting.

=============================================================

Example 2.7: The following are the ages of a sample of 100 women having children who were admitted to a particular hospital in Madinah in a particular month.

Class interval   Midpoint   Frequency
Total                       n = 100

Find the mean, the variance, and the coefficient of variation.

Chapter 3: Some Basic Probability Concepts

3.1: General view of probability

Probability: the probability of some event is the likelihood (chance) that this event will occur.

An experiment: a description of some procedure that we do.

The universal set (Ω): the set of all possible outcomes.

An event: a set of outcomes in Ω which all have some specified characteristic.

Notes:
1. Ω (the universal set) is called the sure event.
2. ∅ (the empty set) is called the impossible event.

Example (3.1)
Consider a set of 6 balls numbered 1, 2, 3, 4, 5 and 6. If we put the six balls into a bag and, without looking at the balls, choose one ball from the bag, then this is an experiment which has 6 outcomes.

Ω = {1, 2, 3, 4, 5, 6}

Consider the following events:
- $E_1$ = the event that an even number occurs = {2, 4, 6}.
- $E_2$ = the event of getting a number greater than 2 = {3, 4, 5, 6}.
- $E_3$ = the event that an odd number occurs = {1, 3, 5}.
- $E_4$ = the event that a negative number occurs = {} = ∅.

Equally likely outcomes: the outcomes of an experiment are equally likely if they have the same chance of occurrence.

Probability of equally likely events: consider an experiment which has N equally likely outcomes, and let the number of outcomes in an event E be given by n(E); then the probability of E is given by

$$P(E) = \frac{n(E)}{n(\Omega)} = \frac{n(E)}{N}$$

Notes
1. For any event A, 0 ≤ P(A) ≤ 1 (why?). That is, a probability is always between 0 and 1.
2. P(Ω) = 1 and P(∅) = 0 (why?). 1 means the event is a certainty; 0 means the event is impossible.

Example (3.2)
In the ball experiment we have n(Ω) = 6, n($E_1$) = 3, n($E_2$) = 4, n($E_3$) = 3.

P($E_1$) = 3/6 = 0.5
P($E_2$) = 4/6 = 0.667
P($E_3$) = 3/6 = 0.5
P($E_4$) = 0

Recall that:
- $E_1$ = the event that an even number occurs = {2, 4, 6}.
- $E_2$ = the event of getting a number greater than 2 = {3, 4, 5, 6}.
- $E_3$ = the event that an odd number occurs = {1, 3, 5}.
- $E_4$ = the event that a negative number occurs = {} = ∅.

Relationships between events

Union: A ∪ B consists of all those outcomes in A or in B or in both A and B.

Intersection: A ∩ B consists of all those outcomes in both A and B.

Complement: $A^c$ (or A`) consists of all outcomes that are in Ω but not in A.

(Figures: Venn diagrams of A ∪ B, A ∩ B and $A^c$.)

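Notes:
1. n(A ∪ B) = n(A) + n(B) - n(A ∩ B), and hence P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
2. n($A^c$) = n(Ω) - n(A), so that P($A^c$) = 1 - P(A).

Sets (events) can be represented by a Venn diagram.

(Figure: Venn diagram of Ω with events A and B, showing the four regions A ∩ B, A ∩ $B^c$, $A^c$ ∩ B and $A^c$ ∩ $B^c$.)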
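Python's built-in `set` type mirrors the union, intersection and complement operations, so the ball experiment can be used to check the two counting rules above. This is an illustrative sketch; the helper `prob` is a hypothetical name, and it assumes equally likely outcomes as in the ball example.

```python
from fractions import Fraction

# Universal set of the ball experiment: balls numbered 1 to 6.
omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """P(E) = n(E) / n(Omega) for equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


A = {2, 4, 6}       # an even number occurs
B = {3, 4, 5, 6}    # a number greater than 2 occurs

union = A | B             # outcomes in A or in B or in both
intersection = A & B      # outcomes in both A and B
complement_A = omega - A  # outcomes in Omega but not in A

print(prob(A), prob(B), prob(union), prob(intersection), prob(complement_A))
```

Using `Fraction` keeps the probabilities exact, so the addition rule P(A ∪ B) = P(A) + P(B) - P(A ∩ B) can be compared with `==` rather than with a rounding tolerance.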

Disjoint events

Two events A and B are said to be disjoint (mutually exclusive) if A ∩ B = ∅.
- In the case of disjoint events, P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).

(Figure: Venn diagram of two non-overlapping events A and B in Ω.)

Example 3.3
From a population of 80 babies in a certain hospital in the last month, let the event B = "is a boy" and O = "is overweight". We have the following incomplete Venn diagram.

(Figure: Venn diagram of B and O, with 39 babies in B only, 3 in both B and O, and 7 in O only.)

- It is a boy: P(B) = (3 + 39)/80 = 0.525
- It is a boy and overweight: P(B ∩ O) = 3/80 = 0.0375
- It is a boy or it is overweight: P(B ∪ O) = (39 + 3 + 7)/80 = 0.6125

Conditional probability: the conditional probability of A given B is equal to the probability of A ∩ B divided by the probability of B, provided the probability of B is not zero. That is,

P(A | B) = P(A ∩ B) / P(B), P(B) ≠ 0

Notes:
1. P(A | B) is the probability of the event A if we know that the event B has occurred.
2. P(B | A) = P(A ∩ B) / P(A), P(A) ≠ 0

Example
Referring to Example 3.3, what is the probability that:
- He is a boy, knowing that he is overweight?
  P(B | O) = P(B ∩ O) / P(O) = (3/80) / (10/80) = 3/10 = 0.3
- If we know that she is a girl, what is the probability that she is not overweight?
  P($O^c$ | $B^c$) = P($B^c$ ∩ $O^c$) / P($B^c$) = (31/80) / [(7 + 31)/80] = 31/38 = 0.8158

Independent events
- Two events A and B are said to be independent if the occurrence of one of them has no effect on the occurrence of the other.

Multiplication rule for independent events
- If A and B are independent, then:
  1- P(A ∩ B) = P(A) P(B)
  2- P(A | B) = P(A) (why?)
  3- P(B | A) = P(B) (why?)
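The conditional probabilities of the baby example, and a check of the multiplication rule, can be sketched as follows. The four cell counts (3, 39, 7, 31) are taken from Example 3.3 and its follow-up; the dictionary layout and the helper names are assumptions made only for this illustration.

```python
from fractions import Fraction

# Counts for the 80 babies, keyed by (sex, weight status).
counts = {
    ("boy", "overweight"): 3,
    ("boy", "not overweight"): 39,
    ("girl", "overweight"): 7,
    ("girl", "not overweight"): 31,
}
total = sum(counts.values())


def prob(predicate):
    """Probability of the event made of all cells satisfying predicate."""
    return Fraction(sum(c for key, c in counts.items() if predicate(key)), total)


def conditional(event_a, event_b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) != 0."""
    p_b = prob(event_b)
    if p_b == 0:
        raise ValueError("P(B) must not be zero")
    return prob(lambda key: event_a(key) and event_b(key)) / p_b


def is_boy(key):
    return key[0] == "boy"


def is_girl(key):
    return key[0] == "girl"


def is_overweight(key):
    return key[1] == "overweight"


def is_not_overweight(key):
    return key[1] == "not overweight"


p_boy_given_over = conditional(is_boy, is_overweight)
p_not_over_given_girl = conditional(is_not_overweight, is_girl)

# Independence would require P(B and O) == P(B) * P(O).
p_both = prob(lambda key: is_boy(key) and is_overweight(key))
independent = p_both == prob(is_boy) * prob(is_overweight)
print(p_boy_given_over, p_not_over_given_girl, independent)
```

With these counts the multiplication rule fails, so under the stated counts "being a boy" and "being overweight" are not independent in this population.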

Example 3.4
In a population of people with a certain disease, let M = "men" and S = "suffer from swollen legs". We have the following incomplete Venn diagram.

(Figure: incomplete Venn diagram of M and S.)

If we randomly choose one person:
- Complete the Venn diagram.
- Find the probability that this person:
  1- Is a man and suffers from swollen legs? P(M ∩ S) =
  2- Is a woman? P($M^c$) =
  3- Is a woman who does not suffer from swollen legs? P($M^c$ ∩ $S^c$) =
  4- Does not suffer from swollen legs? P($S^c$) = 0.41 (or P($M^c$) = 1 - P(M) = 1 - ( ) = 0.41)

Marginal probability

Definition: Given some variable that can be broken down into m categories designated by $A_1, A_2, \dots, A_m$, and another jointly occurring variable that is broken down into n categories designated by $B_1, B_2, \dots, B_n$, the marginal probability of $A_i$, written P($A_i$), is equal to the sum of the joint probabilities of $A_i$ with all the categories of B. That is,

$$P(A_i) = \sum_{j} P(A_i \cap B_j), \quad \text{for all values of } j$$

This will be made clear in the following example.

Example 3.5: The following table shows 1000 nursing school applicants classified according to scores made on a college entrance examination and the quality of the high school from which they graduated, as rated by a group of educators.

                      Quality of high school
Score        Poor (P)   Average (A)   Superior (S)   Total
Low (L)                               55             220
Medium (M)   70                                      390
High (H)                              300            390
Total        200        300           500            1000

Q1: How many marginal probabilities can be calculated from these data? State each in probability notation and do the calculations.
- 6 marginal probabilities: P(L), P(M), P(H), P(P), P(A), P(S).

Q2: Calculate the probability that an applicant picked at random from this group:
1- Made a low score on the examination.
   P(L) = 220/1000 = 0.22
2- Graduated from a superior high school.
   P(S) = 500/1000 = 0.5
3- Made a low score on the examination, given that he or she graduated from a superior high school.
   P(L | S) = P(L ∩ S) / P(S) = (55/1000) / (500/1000) = 55/500 = 0.11
4- Made a high score or graduated from a superior high school.
   P(H ∪ S) = P(H) + P(S) - P(H ∩ S) = (390 + 500 - 300)/1000 = 0.59

Calculate the following probabilities:
1. P(A) = 300/1000 = 0.3
2. P(S) = 500/1000 = 0.5
3. P(M) = 390/1000 = 0.39
4. P(M ∩ P) = 70/1000 = 0.07
5. P(A ∪ L) = P(A) + P(L) - P(A ∩ L)
6. P(P ∩ S) = 0
7. P(L ∪ H) = (220 + 390)/1000 = 0.61
8. P(H | S) = 300/500 = 0.6

Chapter 4: Probability Distribution

4.1: Probability Distribution of Discrete Random Variables

- Random variable: a variable that is measured on a population where each element must have an equal chance of being selected.
- Let X be a discrete random variable, and suppose we are able to count the number of elements of the population where X = x; then the values of x together with the probabilities P(X = x) are called the probability distribution of the discrete random variable X.

Example 4.1
Suppose we measure the number of complete days that a patient spends in the hospital after a particular type of operation in Dammam hospital in one year, obtaining the following results.

Number of days, x   Frequency
1                   5
2                   22
3                   15
4                   8
Total               N = 50

The probability of the event {X = x} is the relative frequency

$$P(X = x) = \frac{n(X = x)}{n(S)} = \frac{n(X = x)}{N}$$

That is:
P(X = 1) = 5/50 = 0.1
P(X = 2) = 22/50 = 0.44
P(X = 3) = 15/50 = 0.3
P(X = 4) = 8/50 = 0.16

What is the value of $\sum P(X = x)$?

The probability distribution must satisfy the conditions:
1- $0 \le P(X = x) \le 1$
2- $\sum_x P(X = x) = 1$

Number of days, x   P(X = x)
1                   0.1
2                   0.44
3                   0.3
4                   0.16
Sum                 1

The first condition must be satisfied since P(X = x) is a probability, and the second condition must be satisfied since the events {X = x} are mutually exclusive and their union is the sample space.

- Population mean for a discrete random variable: if we know the distribution function P(X = x) for each possible value x of a discrete random variable, then the population mean (or the expected value of the random variable X) is

$$\mu = \sum_x x \, P(X = x)$$

Example: The expected number of complete days that a patient spends in the hospital after a particular type of operation in Dammam hospital in one year (Example 4.1) is

$$\mu = \sum_x x \, P(X = x) = 1(0.1) + 2(0.44) + 3(0.3) + 4(0.16) = 2.52$$

- Cumulative distributions: the cumulative distribution, or cumulative probability distribution, of a random variable is P(X ≤ x). It is obtained in a way similar to finding the cumulative relative frequency distribution for samples.

Referring to Example 4.1:
P(X ≤ 1) = 0.1
P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.1 + 0.44 = 0.54

P(X ≤ 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.1 + 0.44 + 0.3 = 0.84
P(X ≤ 4) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.1 + 0.44 + 0.3 + 0.16 = 1

The cumulative probability distribution can be displayed in the following table:

Number of days, x   P(X = x)   P(X ≤ x)
1                   0.1        0.1
2                   0.44       0.54
3                   0.3        0.84
4                   0.16       1
Sum                 1

From the table find:
1- P(X < 3) = P(X ≤ 2) = 0.54
2- P(2 ≤ X ≤ 4) = P(X = 4) + P(X = 3) + P(X = 2) = 0.9
   Or P(2 ≤ X ≤ 4) = P(X ≤ 4) - P(X < 2) = 1 - 0.1 = 0.9
3- P(X > 2) = P(X = 3) + P(X = 4) = 0.46
   Or P(X > 2) = 1 - P(X ≤ 2) = 1 - 0.54 = 0.46

In general we can use the following rules for integer numbers a and b:
1- P(X ≤ a) is a cumulative distribution probability
2- P(X < a) = P(X ≤ a-1)
3- P(X ≥ b) = 1 - P(X < b) = 1 - P(X ≤ b-1)
4- P(X > b) = 1 - P(X ≤ b)
5- P(a ≤ X ≤ b) = P(X ≤ b) - P(X < a) = P(X ≤ b) - P(X ≤ a-1)
6- P(a < X ≤ b) = P(X ≤ b) - P(X ≤ a)
7- P(a ≤ X < b) = P(X ≤ b-1) - P(X ≤ a-1)
8- P(a < X < b) = P(X ≤ b-1) - P(X ≤ a)
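The expected value, the cumulative distribution and the interval rules above can all be computed from the probability table. The following sketch assumes the distribution of Example 4.1 (probabilities 0.1, 0.44, 0.3, 0.16 for x = 1, 2, 3, 4); the function names are hypothetical.

```python
# Assumed probability distribution of Example 4.1: P(X = x) for each x.
pmf = {1: 0.10, 2: 0.44, 3: 0.30, 4: 0.16}


def expected_value(dist):
    """mu = sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in dist.items())


def cdf(dist, a):
    """Cumulative probability P(X <= a)."""
    return sum(p for x, p in dist.items() if x <= a)


def prob_between(dist, a, b):
    """Rule 5 for integers a and b: P(a <= X <= b) = P(X <= b) - P(X <= a - 1)."""
    return cdf(dist, b) - cdf(dist, a - 1)


mu = expected_value(pmf)
p_less_than_3 = cdf(pmf, 2)              # rule 2: P(X < 3) = P(X <= 2)
p_from_2_to_4 = prob_between(pmf, 2, 4)  # rule 5
p_more_than_2 = 1 - cdf(pmf, 2)          # rule 4: P(X > 2) = 1 - P(X <= 2)

print(mu, p_less_than_3, p_from_2_to_4, p_more_than_2)
```

Since the probabilities are stored as floating-point numbers, results should be compared with a small tolerance rather than with exact equality.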

4.2: Binomial Distribution

The binomial distribution is a discrete distribution that is used to model the following experiment:
1- The experiment has a finite number of trials n.
2- Each single trial has only two possible (mutually exclusive) outcomes of interest, such as recovers or doesn't recover; lives or dies; needs an operation or doesn't need an operation. We will call having a certain characteristic a success and not having this characteristic a failure.
3- The probability of a success is a constant π for each trial. The probability of a failure is 1 - π.
4- The trials are independent; that is, the outcome of one trial has no effect on the outcome of any other trial.

Then the discrete random variable X = the number of successes in n trials has a Binomial(n, π) distribution, for which the probability distribution function is given by

$$P(X = x) = \begin{cases} \binom{n}{x} \pi^x (1 - \pi)^{n - x}, & x = 0, 1, 2, \dots, n \\ 0, & \text{otherwise} \end{cases}$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

Note: If the discrete random variable X has a binomial distribution, we write X ~ Bin(n, π).

The mean and variance for the binomial distribution:
- The mean for a Binomial(n, π) random variable is $\mu = \sum x \, P(X = x) = n\pi$
- The variance is $\sigma^2 = n\pi(1 - \pi)$

Example 4.2
Suppose that the probability that a Saudi man has high blood pressure is 0.15. Suppose we randomly select 6 Saudi men.
a- Find the probability distribution function for the number of men out of 6 with high blood pressure.
b- Find the probability that there are 4 men with high blood pressure.
c- Find the probability that all 6 men have high blood pressure.
d- Find the probability that none of the 6 men have high blood pressure.
e- What is the probability that more than two men will have high blood pressure?
f- Find the expected number of men with high blood pressure.

Solution:
Let X = the number of men out of 6 with high blood pressure. Then X has a binomial distribution (why?).
- Success = the man has high blood pressure
- Failure = the man doesn't have high blood pressure
- Probability of success = π = 0.15, and hence probability of failure = 1 - π = 0.85
- Number of trials = n = 6

So n = 6, π = 0.15, 1 - π = 0.85, and X has a binomial distribution, X ~ Bin(6, 0.15).

a- The probability distribution function is

$$P(X = x) = \binom{6}{x} (0.15)^x (0.85)^{6 - x}, \quad x = 0, 1, \dots, 6$$

b- The probability that 4 men will have high blood pressure:

$$P(X = 4) = \binom{6}{4} (0.15)^4 (0.85)^2 = (15)(0.15)^4 (0.85)^2 = 0.0055$$

c- The probability that all 6 men have high blood pressure:

$$P(X = 6) = \binom{6}{6} (0.15)^6 (0.85)^0 = (0.15)^6 = 0.00001$$

d- The probability that none of the 6 men have high blood pressure:

$$P(X = 0) = \binom{6}{0} (0.15)^0 (0.85)^6 = (0.85)^6 = 0.3771$$

e- The probability that more than two men will have high blood pressure:

$$P(X > 2) = 1 - P(X \le 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$$
$$= 1 - \left[ (0.85)^6 + \binom{6}{1}(0.15)(0.85)^5 + \binom{6}{2}(0.15)^2(0.85)^4 \right] = 1 - [0.3771 + 0.3993 + 0.1762] = 0.0473$$

f- The expected number of men with high blood pressure is $\mu = n\pi = 6(0.15) = 0.9$, and the variance is $\sigma^2 = n\pi(1 - \pi) = 6(0.15)(0.85) = 0.765$.
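The binomial probabilities in this solution can be reproduced with the standard library's `math.comb`. The sketch below is illustrative; `binomial_pmf` is a hypothetical helper name, and n = 6, π = 0.15 are the values of Example 4.2.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """P(X = x) = C(n, x) * pi^x * (1 - pi)^(n - x) for x = 0, ..., n; else 0."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 6, 0.15  # values from Example 4.2

p_four = binomial_pmf(4, n, pi)   # exactly 4 men
p_all = binomial_pmf(6, n, pi)    # all 6 men
p_none = binomial_pmf(0, n, pi)   # none of the 6 men
# More than two: complement of P(X = 0) + P(X = 1) + P(X = 2).
p_more_than_two = 1 - sum(binomial_pmf(x, n, pi) for x in range(3))

mean = n * pi                 # n * pi
variance = n * pi * (1 - pi)  # n * pi * (1 - pi)

print(p_four, p_all, p_none, p_more_than_two, mean, variance)
```

Summing `binomial_pmf(x, n, pi)` over x = 0, ..., 6 should give 1, which is a useful sanity check on the formula.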

4.3: The Poisson Distribution

The Poisson distribution is a discrete distribution that is used to model a random variable X that represents the number of occurrences of some random event in an interval of time or space.

The probability that X takes the value x (the probability distribution function) is given by:

$$P(X = x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!}, & x = 0, 1, 2, \dots \\ 0, & \text{otherwise} \end{cases}$$

λ is the average number of occurrences of the random event in the interval.
- The mean: μ = λ
- The variance: σ² = λ

If X has a Poisson distribution we write X ~ Poisson(λ).

Examples of the Poisson distribution:
- The number of patients in a waiting room in an hour.
- The number of serious injuries in a particular factory in a year.
- The number of times a three-year-old child has an ear infection in a year.

Example 4.3: Suppose we are interested in the number of snake bite cases seen in a particular Riyadh hospital in a year. Assume that the average number of snake bite cases at the hospital in a year is 6.
1- What is the probability that in a randomly chosen year, the number of snake bite cases will be 7?
2- What is the probability that the number of cases will be less than 2 in 6 months?
3- What is the probability that the number of cases will be 13 in 2 years?
4- What is the expected number of snake bites in a year? What is the variance of snake bites in a year?

Solution:
X = the number of snake bite cases seen at this hospital in a year, and the mean is 6; then X ~ Poisson(6).

First note the following:
- The average number of snake bite cases at the hospital in a year = λ = 6, so X ~ Poisson(6).
- The average number of snake bite cases at the hospital in 6 months = the average number of snake bite cases at the hospital in (1/2) year = (1/2)λ = 3, so Y ~ Poisson(3).
- The average number of snake bite cases at the hospital in 2 years = 2λ = 12, so V ~ Poisson(12).
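The Poisson probability function, together with the rescaling of λ to a shorter or longer interval, can be sketched in Python as follows. The helper name `poisson_pmf` is hypothetical, and the yearly rate of 6 is the one assumed in Example 4.3.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ...; else 0."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam ** x / factorial(x)


yearly_rate = 6  # average number of cases per year (Example 4.3)

# 1- Exactly 7 cases in one year: X ~ Poisson(6).
p_seven_in_year = poisson_pmf(7, yearly_rate)

# 2- Fewer than 2 cases in 6 months: Y ~ Poisson(3), P(Y = 0) + P(Y = 1).
half_year_rate = yearly_rate / 2
p_less_than_two = poisson_pmf(0, half_year_rate) + poisson_pmf(1, half_year_rate)

# 3- Exactly 13 cases in 2 years: V ~ Poisson(12).
two_year_rate = 2 * yearly_rate
p_thirteen_in_two_years = poisson_pmf(13, two_year_rate)

print(p_seven_in_year, p_less_than_two, p_thirteen_in_two_years)
```

The key design point is that only the rate changes with the length of the interval; the same probability function is reused with λ = 6, 3 and 12.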

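1- The probability that the number of snake bites will be 7 in a year (λ = 6):
$P(X = x) = \dfrac{e^{-6}\, 6^x}{x!}, \quad x = 0, 1, 2, \ldots$
$P(X = 7) = \dfrac{e^{-6}\, 6^7}{7!} \approx 0.1377$

2- The probability that the number of cases will be less than 2 in 6 months (λ* = 3):
$P(Y = y) = \dfrac{e^{-3}\, 3^y}{y!}, \quad y = 0, 1, 2, \ldots$
$P(Y < 2) = P(Y = 0) + P(Y = 1) = \dfrac{e^{-3}\, 3^0}{0!} + \dfrac{e^{-3}\, 3^1}{1!} \approx 0.0498 + 0.1494 = 0.1991$

3- The probability that the number of cases will be 13 in 2 years (λ** = 12):
$P(V = v) = \dfrac{e^{-12}\, 12^v}{v!}, \quad v = 0, 1, 2, \ldots$
$P(V = 13) = \dfrac{e^{-12}\, 12^{13}}{13!} \approx 0.1056$

Remember: if X ~ Poisson(λ), then $P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}$, $x = 0, 1, 2, \ldots$

4- The expected number of snake bites in a year: μ = λ = 6.
The variance of snake bites in a year: σ² = λ = 6.

Probability Distribution of Continuous Random Variable

If X is a continuous random variable, then there exists a function f(x), called the probability density function, that has the following properties:
1- The area under the probability curve f(x) equals 1:
$\text{area} = \int_{-\infty}^{\infty} f(x)\, dx = 1$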

2- Probabilities of interval events are given by areas under the probability curve f(x):
$P(a \le X \le b) = \int_a^b f(x)\, dx$
$P(X \le a) = \int_{-\infty}^{a} f(x)\, dx$
$P(X \ge a) = \int_a^{\infty} f(x)\, dx$
3- P(X = a) = 0 (why?)
4- P(X ≥ a) = P(X > a) and P(X ≤ a) = P(X < a)
5- P(X ≥ a) = 1 - P(X ≤ a)
6- P(a < X < b) = P(X < b) - P(X < a)
7- P(X ≤ a) = P(X < a) is the cumulative probability.

The Normal Distribution:
The normal distribution is one of the most important continuous distributions in statistics. It has the following characteristics:
1- X takes values from -∞ to ∞.
2- The population mean is μ and the population variance is σ², and we write X ~ N(μ, σ²).
3- The graph of the density of a normal distribution is a bell-shaped curve that is symmetric about μ.

4- μ = mean = mode = median of the normal distribution.
5- The location of the distribution depends on μ (location parameter). The shape of the distribution depends on σ (shape parameter).
[Figure: two normal curves with the same σ and means μ1 < μ2; two normal curves with the same μ and σ1 > σ2]

Standard normal distribution:
The standard normal distribution is a normal distribution with mean μ = 0 and variance σ² = 1.

Result:
If X ~ N(μ, σ²), then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.

Notes:
- The probability A = P(Z ≤ z) is the area to the left of z under the standard normal curve.
- There is a table that gives the values of P(Z ≤ z) for different values of z.

Calculating probabilities from Normal(0, 1)
- $P(Z \le z)$: read from the table (the area under the curve to the left of z).
- $P(Z \ge z) = 1 - P(Z \le z)$: the area under the curve to the right of z.
- $P(z_1 \le Z \le z_2) = P(Z \le z_2) - P(Z \le z_1)$: the area under the curve between z1 and z2.

Notes:
- P(Z ≤ 0) = P(Z ≥ 0) = 0.5 (why?)
- P(Z = z) = 0 for any z.
- P(Z ≤ z) = P(Z < z) and P(Z ≥ z) = P(Z > z).
- If z ≤ -3.49 then P(Z ≤ z) is taken as 0, and if z ≥ 3.49 then P(Z ≤ z) is taken as 1.

Example 4.1:
- P(Z ≤ 1.5) is read directly from the table.
- P(1.33 < Z ≤ 2.4) = P(Z ≤ 2.4) - P(Z < 1.33).
- P(Z ≥ 0.98) = 1 - P(Z ≤ 0.98).
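The three table rules above can be reproduced numerically. This is a minimal sketch in Python that evaluates the standard normal cumulative probability through the error function, using the identity Φ(z) = ½(1 + erf(z/√2)); the function names are illustrative, and the printed table remains the reference for the course.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_left(z: float) -> float:
    """Area to the left of z."""
    return phi(z)


def prob_right(z: float) -> float:
    """Area to the right of z, the complement of the left area."""
    return 1.0 - phi(z)


def prob_between(z1: float, z2: float) -> float:
    """Area between z1 and z2 (z1 <= z2)."""
    return phi(z2) - phi(z1)


print(f"P(Z <= 0)            = {prob_left(0.0):.4f}")
print(f"P(Z >= 1.96)         = {prob_right(1.96):.4f}")
print(f"P(-1.96 <= Z <= 1.96) = {prob_between(-1.96, 1.96):.4f}")
```

Because Z is continuous, the same functions serve for strict and non-strict inequalities.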

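Example 4.2:
Suppose that the hemoglobin levels for healthy adult males are approximately normally distributed with mean 16 and variance 0.81. Find the probability that a randomly chosen healthy adult male has a hemoglobin level
a) less than 14.
b) greater than 15.
c) between 13 and 15.

Solution:
Let X = the hemoglobin level for a healthy adult male; then X ~ N(μ = 16, σ² = 0.81).
a) Since μ = 16 and σ² = 0.81, we have σ = 0.9.
$P(X < 14) = P\left(Z < \dfrac{14 - \mu}{\sigma}\right) = P\left(Z < \dfrac{14 - 16}{0.9}\right) = P(Z < -2.22) = 0.0132$
b) $P(X > 15) = P\left(Z > \dfrac{15 - \mu}{\sigma}\right) = P\left(Z > \dfrac{15 - 16}{0.9}\right) = P(Z > -1.11) = 1 - P(Z \le -1.11) = 1 - 0.1335 = 0.8665$
c) $P(13 < X < 15) = P\left(\dfrac{13 - \mu}{\sigma} < Z < \dfrac{15 - \mu}{\sigma}\right) = P\left(Z < \dfrac{15 - 16}{0.9}\right) - P\left(Z < \dfrac{13 - 16}{0.9}\right)$
$= P(Z \le -1.11) - P(Z \le -3.33) = 0.1335 - 0.0004 = 0.1331$
d) P(X = 13) = 0.

Result (1)
Let X1, X2, ..., Xn be a random sample of size n from N(μ, σ²). Then
1) $\bar{X} = \dfrac{\sum_{i=1}^{n} x_i}{n} \sim N(\mu, \sigma^2 / n)$
2) $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$

Central Limit Theorem
Let X1, X2, ..., Xn be a random sample of size n from any distribution with mean μ and variance σ². If n is large (n ≥ 30), then
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0, 1)$
(that is, Z has approximately a standard normal distribution).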
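The hemoglobin example above can also be checked without the table. This is a minimal sketch in Python, assuming X ~ N(16, 0.81) as in the solution and using `statistics.NormalDist` from the standard library (Python 3.8 or later); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Hemoglobin level of healthy adult males: mean 16, variance 0.81.
hemoglobin = NormalDist(mu=16, sigma=sqrt(0.81))

p_less_14 = hemoglobin.cdf(14)                       # a) P(X < 14)
p_greater_15 = 1 - hemoglobin.cdf(15)                # b) P(X > 15)
p_13_to_15 = hemoglobin.cdf(15) - hemoglobin.cdf(13)  # c) P(13 < X < 15)

print(f"P(X < 14)      = {p_less_14:.4f}")
print(f"P(X > 15)      = {p_greater_15:.4f}")
print(f"P(13 < X < 15) = {p_13_to_15:.4f}")
```

The results can differ from the table-based answers in the third or fourth decimal place, because the table rounds z to two decimals (for example -2.22 instead of -2.2222).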

Result (2)
If σ is unknown in the central limit theorem, then s (the sample standard deviation) can be used instead of σ, that is
$Z = \dfrac{\bar{X} - \mu}{s / \sqrt{n}} \approx N(0, 1)$
where
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Chapter 5: Statistical Inference

5.1 Introduction:
There are two main purposes in statistics:
- Organizing and summarizing data (descriptive statistics).
- Answering research questions about population parameters (statistical inference).

There are two general areas of statistical inference:
- Hypothesis testing: answering questions about population parameters.
- Estimation: approximating the actual values of population parameters.

There are two kinds of estimation:
- Point estimation.
- Interval estimation (confidence interval).

Here we will consider two types of population parameters.

Population mean: μ (for a quantitative variable)
μ = the average (mean) value of some quantitative variable.
Examples:
- The mean life span of some bacteria.
- The mean income of government employees in Saudi Arabia.

Population proportion: π (for a qualitative variable)
$\pi = \dfrac{\text{no. of elements in the population with some characteristic}}{\text{total no. of elements in the population}}$
Examples:
- The proportion of Saudi people who have some disease.
- The proportion of smokers in Riyadh.
- The proportion of children in Saudi Arabia

Estimation of Population Mean: μ

1) Point Estimation:
A point estimate is a single number used to estimate the corresponding population parameter.
$\bar{x}$ is a point estimate of μ. That is, the sample mean is a point estimate of the population mean.

2) Interval Estimation (Confidence Interval: C.I.) of μ
Definition: (1-α)100% Confidence Interval:
A (1-α)100% confidence interval is an interval of numbers (L, U), defined by a lower limit L and an upper limit U, that contains the population parameter with probability (1-α).

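1-α: the confidence coefficient.
L: lower limit of the confidence interval.
U: upper limit of the confidence interval.

A (1-α)100% CI for μ is:
- If the distribution is normal:
  - If σ is known: $\bar{x} \pm z_{1-\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
  - If σ is unknown and n is large (n > 30): $\bar{x} \pm z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}$
- If the distribution is not normal and n is large (n > 30):
  - If σ is known: $\bar{x} \pm z_{1-\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
  - If σ is unknown: $\bar{x} \pm z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}$

Note:
- The C.I. $\bar{x} \pm z_{1-\alpha/2} \dfrac{\sigma}{\sqrt{n}}$ means $(L, U) = \left(\bar{x} - z_{1-\alpha/2} \dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2} \dfrac{\sigma}{\sqrt{n}}\right)$.
- Similarly, for $\bar{x} \pm z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}$, $(L, U) = \left(\bar{x} - z_{1-\alpha/2} \dfrac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}\right)$.

Interpretation of the CI:
We are (1-α)100% confident that the (mean) of (variable) for the (population) is between L and U.

Example 5.1:
Let Z ~ N(0, 1). Find $z_{1-\alpha/2}$.
Here we have the probability (the area) and we want to find the value of z, so we use the standard normal table in the opposite direction.
a) α = 0.05, α/2 = 0.025, 1 - α/2 = 0.975.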

From the standard normal table, $z_{0.975} = 1.96$.
b) α = 0.1, α/2 = 0.05, 1 - α/2 = 0.95. From the standard normal table, $z_{0.95} = 1.645$.

Example 5.2:
For 123 patients with diabetic ketoacidosis in Saudi Arabia, the mean blood glucose level was 26.2 mmol/l with a standard deviation of 3.3 mmol/l. Find the 90% confidence interval for the mean blood glucose level of such diabetic ketoacidosis patients.

Solution:
Variable: blood glucose level (in mmol/l).
Population: diabetic ketoacidosis patients in Saudi Arabia.
Parameter: μ (the average blood glucose level).
n = 123, $\bar{x}$ = 26.2, s = 3.3.
σ is unknown and n = 123 > 30 (large), so the 90% CI for μ is given by $\bar{x} \pm z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}$.
90% = (1-α)100%, so 1 - α = 0.9, α = 0.1, α/2 = 0.05, 1 - α/2 = 0.95, and $z_{0.95} = 1.645$.

The 90% CI for μ is $\bar{x} \pm z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}$, which can be written as
$\left(\bar{x} - z_{1-\alpha/2} \dfrac{s}{\sqrt{n}},\ \bar{x} + z_{1-\alpha/2} \dfrac{s}{\sqrt{n}}\right) = \left(26.2 - (1.645)\dfrac{3.3}{\sqrt{123}},\ 26.2 + (1.645)\dfrac{3.3}{\sqrt{123}}\right) = (25.71,\ 26.69)$

Interpretation: We are 90% confident that the mean blood glucose level of diabetic ketoacidosis patients in Saudi Arabia is between 25.71 and 26.69 mmol/l.
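The interval in this example can be reproduced with a few lines of code. This is a minimal sketch in Python, assuming n = 123, a sample mean of 26.2 and s = 3.3 mmol/l; the function name `mean_confidence_interval` is illustrative, and the z value is taken from `statistics.NormalDist().inv_cdf` rather than from the printed table.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar: float, s: float, n: int, confidence: float):
    """Large-sample CI for the mean: x_bar +/- z_{1-alpha/2} * s / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.645 for 90%
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = mean_confidence_interval(x_bar=26.2, s=3.3, n=123, confidence=0.90)
print(f"90% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The same function covers the case where σ is known: pass σ in place of s.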

Exercises

Q1: Suppose that we are interested in making some statistical inferences about the mean μ of a normal population with standard deviation 0.2. Suppose that a random sample of size n = 49 from this population gave the sample mean 4.5.
(1) The distribution of $\bar{X}$ is
(a) N(0, 1) (b) t(48) (c) N(μ, (0.02857)²) (d) N(μ, 2.0)
(2) A good point estimate for μ is
(a) 4.5 (b) (c) 2.5 (d) 7 (e) 1.15
(3) The assumptions are
(a) Normal, σ known (b) Normal, σ unknown (c) Not normal, σ known (d) Not normal, σ unknown
(4) A 95% confidence interval for μ is
(a) (3.44, 5.56) (b) (3.34, 5.66) (c) (4.444, 4.556) (d) (3.94, 5.05) (e) (3.04, 5.96)

Q2: An electronics company wanted to estimate its monthly operating expenses in riyals (μ). Assume that the population variance equals ... Suppose that a random sample of size 49 is taken and the sample mean is found to equal ... Find:
- A point estimate for μ.
- The distribution of the sample mean.
- The assumptions.
- A 90% confidence interval for μ.

Q3: The random variable X, representing the lifespan of a certain light bulb, is normally distributed with a mean of 400 hours and a standard deviation of 10 hours.
- What is the probability that a particular light bulb will last for more than 380 hours?
- What is the probability that a particular light bulb will last for exactly 399 hours?
- What is the probability that a particular light bulb will last for between 380 and 420 hours?
- The mean is ... The variance is ... The standard deviation is ...

Q4: The tensile strength of a certain type of thread is approximately normally distributed with a standard deviation of 6.8 kg. A sample of 20 pieces of the thread has an average strength of 72.8 kg. Then:
(1) A point estimate of the population mean tensile strength μ is
(a) 72.8 (b) 20 (c) 6.8 (d) 46.24 (e) none of these
(2) For a 98% confidence interval for the mean tensile strength μ, the lower bound is equal to:
(a) 68.45 (b) 69.26 (c) (d) (e) none of these
(3) For a 98% confidence interval for the mean tensile strength μ, the upper bound is equal to:
(a) 74.16 (b) (c) 75.9 (d) (e) none of these

Estimation of Population Proportion π

Recall that the population proportion is
$\pi = \dfrac{\text{no. of elements in the population with some characteristic}}{\text{total no. of elements in the population } (N)}$

To estimate the population proportion we take a sample of size n from the population and find the sample proportion p:
$p = \dfrac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample } (n)}$

Result: when both nπ > 5 and n(1-π) > 5, then
$p \approx N\left(\pi,\ \dfrac{\pi(1-\pi)}{n}\right)$, and hence $Z = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \approx N(0, 1)$.

Estimation for π

1) Point Estimation:
A point estimator of π (the population proportion) is p (the sample proportion).

2) Interval Estimation:
If np > 5 and n(1-p) > 5, the (1-α)100% confidence interval for π is given by
$p \pm z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}$

Note:
1) $p \pm z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}$ can be written as $\left(p - z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}},\ p + z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}\right)$.
2) np = the number in the sample with the characteristic, and n(1-p) = the number in the sample that did not have the characteristic.

Example 5.2:
In a study on the fear of dental care in Riyadh, 22% of 347 adults said they would hesitate to make a dental appointment due to fear. Find the point estimate and the 95% confidence interval for the proportion of adults in Riyadh who hesitate to make dental appointments.

Solution:
Variable: whether or not the person would hesitate to make a dental appointment out of fear.
Population: adults in Riyadh.
Parameter: π, the proportion who would hesitate to make an appointment.
n = 347, p = 22% = 0.22.
np = (347)(0.22) = 76.34 > 5 and n(1-p) = (347)(0.78) = 270.66 > 5.
1- The point estimate of π is p = 0.22.
2- The 95% CI for π is $p \pm z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}$.
1 - α = 0.95, α = 0.05, α/2 = 0.025, 1 - α/2 = 0.975, and $z_{1-\alpha/2} = z_{0.975} = 1.96$.

The 95% CI for π is
$\left(p - z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}},\ p + z_{1-\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}\right)$
$= \left(0.22 - (1.96)\sqrt{\dfrac{0.22(0.78)}{347}},\ 0.22 + (1.96)\sqrt{\dfrac{0.22(0.78)}{347}}\right)$
$= \left(0.22 - (1.96)(0.0222),\ 0.22 + (1.96)(0.0222)\right)$
$= (0.176,\ 0.264)$

Interpretation: We are 95% confident that the true proportion of adults in Riyadh who hesitate to make a dental appointment is between 0.176 and 0.264.

Exercises

Q1: A random sample of 200 students from a certain school showed that 15 students smoke. Let π be the proportion of smokers in the school.
- Find a point estimate for π.
- Find a 95% confidence interval for π.

Q2: A researcher was interested in making some statistical inferences about the proportion of females (π) among the students of a certain university. A random sample of 500 students showed that 150 students are female.
1. A good point estimate for π is
(A) 0.31 (B) 0.30 (C) 0.9 (D) 0.5 (E)
2. The lower limit of a 90% confidence interval for π is
(A) (B) (C) (D) (E)
3. The upper limit of a 90% confidence interval for π is
(A) (B) (C) (D) (E)

Q3: In a random sample of 500 homes in a certain city, it is found that 114 are heated by oil. Let π be the proportion of homes in this city that are heated by oil.
1. Find a point estimate for π.
2. Construct a 98% confidence interval for π.

Q4: In a study involving 1200 car drivers, it was found that 50 car drivers do not use a seat belt.
1. A point estimate for the proportion of car drivers who do not use a seat belt is:
(A) 50 (B) (C) (D) 1150 (E) None of these
2. The lower limit of a 95% confidence interval for the proportion of car drivers not using a seat belt is
(A) 0.03 (B) (C) (D) (E) None of these
3. The upper limit of a 95% confidence interval for the proportion of car drivers not using a seat belt is
(A) (B) (C) (D) (E) None of these

Q5: A study was conducted to make some inferences about the proportion of female employees (π) in a certain hospital. A random sample gave the following data:
Sample size: 50
Number of females: ...
Calculate a point estimate (p) for the proportion of female employees (π). Construct a 90% confidence interval for π.
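The confidence interval for a proportion used in this section can be sketched in code as follows. This is a minimal sketch in Python, illustrated with the dental-care example (p = 0.22, n = 347, 95% confidence); the function name `proportion_confidence_interval` is illustrative, and the check on np and n(1-p) mirrors the condition stated for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p: float, n: int, confidence: float):
    """CI for a proportion: p +/- z_{1-alpha/2} * sqrt(p(1-p)/n)."""
    # The normal approximation is only used when both counts exceed 5.
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("normal approximation requires np > 5 and n(1-p) > 5")
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


lower, upper = proportion_confidence_interval(p=0.22, n=347, confidence=0.95)
print(f"95% CI for the proportion: ({lower:.3f}, {upper:.3f})")
```

Raising an error when the condition fails keeps the function from returning an interval that the approximation does not support.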


Continuous data can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised.

Question 1. (Topics 1-3)
A population consists of all the members of a group about which you want to draw a conclusion (Greek letters (μ, σ, N) are used). A sample is the portion of the population selected for