College of Science Department of Statistics & OR


Biostatistics - STAT 45, Department of Statistics, Summer Semester 43/43
King Saud University, College of Science, Department of Statistics & OR
STAT 45 BIOSTATISTICS
Lecture Notes
Prof. Abdullah Al-Shiha

Department of Statistics and Operations Research, College of Science, King Saud University
Second Semester 44 44/
4- Stat: Biostatistics (Stat 4: Biostatistics)
Office Room No. B   Tel. Off.: .-   Mob.: 05.-   Web:   Office hours:

Week (date): Title (textbook sections)
W1 ( 5 /03/445): Introduction to biostatistics (.-.4)
W2 ( 0/04/435): Types of data and graphical representation (.-.4)
W3 ( 09/4/43 ): Descriptive statistics: measures of central tendency (mean, median, mode) (.-.6, excluding stem plot, percentiles)
W4 ( 6/4/43 ): Measures of dispersion: range, standard deviation, coefficient of variation (.-.6, excluding stem plot, percentiles)
W5 ( 3/4/ 43 ): Calculating measures from an ungrouped frequency table; approximating measures from grouped data (.-.6, excluding stem plot, percentiles)
W6 ( 0/4/43 ): Basic probability: conditional probability, concept of independence, sensitivity, specificity (3.-3.6)
W7 ( 40/4/43 ): Bayes' theorem for predictive probabilities (3.-3.6)
W8 ( 4/4/43): Some discrete probability distributions: cumulative probability (4.-4.4)
W9 (/06/435): Vacation
W10 (9/0/435): Binomial and Poisson distributions, their mean and variance (4.-4.4, excluding the use of binomial and Poisson tables)
W11 (06/06/435): Continuous probability distributions: normal distribution, z-table ( )
W12 (3/06/435): Sampling with and without replacement; sampling distribution of one and two sample means and one and two proportions (excluding sampling without replacement)
W13 (0/06/435): Sampling with and without replacement; sampling distribution of one and two sample means and one and two proportions (excluding sampling without replacement)
W14 (7/06/435): Statistical inference: point and interval estimation, types of errors, concept of P-value (excluding "variances not equal", page 8-8)
W15 (05/07/435): Testing hypotheses about one and two sample means and proportions, including paired data, different cases under normality (excluding "variances not equal", page 8-8)
W16 (/07/435): Testing hypotheses about one and two sample means and proportions, including paired data, different cases under normality (excluding "variances not equal", page 8-8)

Textbook: Biostatistics: Basic Concepts and Methodology for the Health Sciences, by Wayne W. Daniel [9th ed.]. Books are available from the university book store below SAMBA bank. The book costs 70 Riyals for students.

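Contacting the teaching staff. Office number: (B05)
Columns: Name, Email, Position
Names as listed: Sanaa Abdullah Abu Nasrah, Samah Al-Ghamdi, Reem Dhafer Al-Mubti, Amal Abdullah Al-Muhaisen
Positions as listed: Lecturer, Lecturer, Teaching Assistant, Lecturer
Name as listed: Dr. Saba Alwan (KSU.EDU.SA); positions as listed: Assistant Professor, Teaching Assistant (KSU.EDU.SA)
Names as listed: Ruba Al-Yafi, Taghreed Al-Malki, Al-Anoud Al-Zughaibi
Positions as listed: Teaching Assistant, Lecturer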

CHAPTER 1: Getting Acquainted with Biostatistics

1.1 Introduction:
The course "Biostatistics" (STAT-45) is about information: how it is obtained, how it is analyzed, and how it is interpreted. The objective of the course is to learn:
(1) How to organize, summarize, and describe data. (Descriptive Statistics)
(2) How to reach decisions about a large body of data by examining only a small part of the data. (Inferential Statistics)

Some Basic Concepts:

Data: Data is the raw material of statistics. There are two types of data:
(1) Quantitative data (numbers: weights, ages, ...).
(2) Qualitative data (words or attributes: nationalities, occupations, ...).

Statistics: Statistics is the field of study concerned with:
(1) The collection, organization, summarization, and analysis of data. (Descriptive Statistics)
(2) The drawing of inferences and conclusions about a body of data (population) when only a part of the data (sample) is observed. (Inferential Statistics)

Biostatistics: When the data are obtained from the biological sciences and medicine, we use the term "biostatistics".

Sources of Data:
1. Routinely kept records.
2. Surveys.
3. Experiments.
4. External sources (published reports, data banks, ...).

Population:
- A population is the largest collection of entities (elements or individuals) in which we are interested at a particular time and about which we want to draw some conclusions.
- When we take a measurement of some variable on each of the entities in a population, we generate a population of values of that variable.
- Example: If we are interested in the weights of students enrolled in the college of engineering at KSU, then our population consists of the weights of all of these students, and our variable of interest is the weight.

Population Size (N): The number of elements in the population is called the population size and is denoted by N.

Sample:
- A sample is a part of a population.
- From the population, we select various elements on which we collect our data. This part of the population on which we collect data is called the sample.
- Example: Suppose that we are interested in studying the characteristics of the weights of the students enrolled in the college of engineering at KSU. If we randomly select 50 students among the students of the college of engineering at KSU and measure their weights, then the weights of these 50 students form our sample.

Sample Size (n): The number of elements in the sample is called the sample size and is denoted by n.

Variables: The characteristic to be measured on the elements is called a variable. The value of the variable varies from element to element.
Examples of Variables: (1) No. of patients (2) Height (3) Sex (4) Educational level

Types of Variables:
(1) Quantitative Variables: A quantitative variable is a characteristic that can be measured. The values of a quantitative variable are numbers indicating how much or how many of something.
Examples: (i) Family size (ii) No. of patients (iii) Weight (iv) Height

Types of Quantitative Variables:
(a) Discrete Variables: There are jumps or gaps between the values.
Examples:
- Family size (x = 1, 2, 3, ...)
- Number of patients (x = 0, 1, 2, 3, ...)
(b) Continuous Variables: There are no gaps between the values. A continuous variable can take any value within a certain interval of values.
Examples:
- Height (40 < x < 90)
- Blood sugar level (0 < x < 5)

(2) Qualitative Variables: The values of a qualitative variable are words or attributes indicating to which category an element belongs.
Examples:
- Blood type
- Nationality
- Students' grades
- Educational level

Types of Qualitative Variables:
(a) Nominal Qualitative Variables: A nominal variable classifies the observations into various mutually exclusive and collectively non-ranked categories. The values of a nominal variable are names or attributes that cannot be ordered, sorted, or ranked.
Examples:
- Blood type (O, AB, A, B)
- Nationality (Saudi, Egyptian, British, ...)
- Sex (male, female)
(b) Ordinal Qualitative Variables: An ordinal variable classifies the observations into various mutually exclusive and collectively ranked categories. The values of an ordinal variable are categories that can be ordered, sorted, or ranked by some criterion.
Examples:
- Educational level (elementary, intermediate, ...)
- Students' grades (A, B, C, D, F)
- Military rank

1.4 Sampling and Statistical Inference:
There are several types of sampling techniques, some of which are:
(1) Simple Random Sampling: If a sample of size (n) is selected from a population of size (N) in such a way that each element in the population has the same chance of being selected, the sample is called a simple random sample.
(2) Stratified Random Sampling: In this type of sampling, the elements of the population are classified into several homogeneous groups (strata). From each group, an independent simple random sample is drawn. The sample resulting from combining these samples is called a stratified random sample.
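The two sampling schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 ID numbers, the two strata, the stratum sample sizes, and the function names are all hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_random_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample from each stratum and combine them."""
    rng = random.Random(seed)
    combined = []
    for name, members in strata.items():
        combined.extend(rng.sample(members, sizes[name]))
    return combined

# Hypothetical population of N = 100 student ID numbers.
population = list(range(1, 101))
srs = simple_random_sample(population, n=10, seed=1)

# Hypothetical strata: two groups of 50 IDs each, with 3 IDs drawn from each group.
strata = {"group_1": population[:50], "group_2": population[50:]}
stratified = stratified_random_sample(strata, {"group_1": 3, "group_2": 3}, seed=1)
```

Drawing without replacement (as `random.sample` does) matches the idea of selecting n distinct elements out of N.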

CHAPTER 2: Strategies for Understanding the Meaning of Data

2.1 Introduction:
In this chapter, we learn several techniques for organizing and summarizing data so that we may more easily determine what information they contain. Summarization techniques involve:
- frequency distributions
- descriptive measures

The Ordered Array:
A first step in organizing data is the preparation of an ordered array. An ordered array is a listing of the values in order of magnitude from the smallest to the largest value.
Example: The following values represent a list of ages of subjects who participate in a study on smoking cessation:
The ordered array is:

Grouped Data: The Frequency Distribution:
To group a set of observations, we select a suitable set of contiguous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals. These intervals are called "class intervals".

Example: The following table gives the hemoglobin level (g/dl) of a sample of 50 men.
We wish to summarize these data using the following class intervals:
13.0-13.9, 14.0-14.9, 15.0-15.9, 16.0-16.9, 17.0-17.9, 18.0-18.9

Solution:
Variable = X = hemoglobin level (continuous, quantitative)
Sample size = n = 50
Max = 18.3
Min = 13.5

The grouped frequency distribution for the hemoglobin level of the 50 men is:

Class Interval        Frequency
(Hemoglobin level)    (no. of men)
13.0-13.9              3
14.0-14.9              5
15.0-15.9             15
16.0-16.9             16
17.0-17.9             10
18.0-18.9              1
Total                 n = 50

Notes:
1. The minimum value belongs to the first interval.
2. The maximum value belongs to the last interval.
3. The intervals do not overlap.
4. Each value belongs to one, and only one, interval.
5. Total of the frequencies = the sample size = n.
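The tallying step can be sketched as follows. The ten hemoglobin values below are hypothetical (the raw data of the 50 men are not reproduced here), and the function name is an assumption made for illustration; the class intervals are taken to be 13.0-13.9 through 18.0-18.9.

```python
from collections import Counter

def frequency_distribution(values, intervals):
    """Count how many values fall in each (lower, upper) class interval."""
    counts = Counter()
    for v in values:
        matches = [iv for iv in intervals if iv[0] <= v <= iv[1]]
        # Each value must belong to one, and only one, interval.
        if len(matches) != 1:
            raise ValueError(f"value {v} does not fall in exactly one interval")
        counts[matches[0]] += 1
    return [(iv, counts[iv]) for iv in intervals]

intervals = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
             (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]

# Hypothetical hemoglobin levels (g/dl), recorded to one decimal place.
levels = [13.5, 14.2, 14.8, 15.1, 15.5, 15.9, 16.0, 16.4, 17.3, 18.3]

table = frequency_distribution(levels, intervals)
frequencies = [count for _, count in table]
```

The check inside the loop enforces notes 3 and 4 above: intervals that overlap, or that leave a value uncovered, raise an error instead of silently miscounting.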

Mid-Points of Class Intervals:
$\text{Mid-point} = \dfrac{\text{upper limit} + \text{lower limit}}{2}$

True Class Intervals:
d = gap between class intervals
d = lower limit $-$ upper limit of the preceding class interval
true upper limit = upper limit $+\, d/2$
true lower limit = lower limit $-\, d/2$

For the hemoglobin example, d = 14.0 $-$ 13.9 = 0.1, so:

Class Interval    True Class Interval    Mid-point    Frequency
13.0-13.9         12.95-13.95            13.45         3
14.0-14.9         13.95-14.95            14.45         5
15.0-15.9         14.95-15.95            15.45        15
16.0-16.9         15.95-16.95            16.45        16
17.0-17.9         16.95-17.95            17.45        10
18.0-18.9         17.95-18.95            18.45         1

For example:
Mid-point of the 1st interval = (13.0 + 13.9)/2 = 13.45
...
Mid-point of the last interval = (18.0 + 18.9)/2 = 18.45

Note:
(1) The mid-point of a class interval is considered a typical (approximate) value for all values in that class interval. For example, approximately we may say that:
there are 3 observations with the value of 13.45
there are 5 observations with the value of 14.45
...
there is 1 observation with the value of 18.45
(2) There are no gaps between true class intervals. The end-point (true upper limit) of each true class interval equals the start-point (true lower limit) of the following true class interval.
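As a rough check of these formulas, the sketch below computes true limits and mid-points, assuming the class intervals are 13.0-13.9 through 18.0-18.9. The function name is hypothetical, and results are rounded to two decimals to suppress floating-point noise.

```python
def true_limits_and_midpoints(intervals):
    """Return (true lower limit, true upper limit, mid-point) for each class interval."""
    # d = lower limit of an interval minus upper limit of the preceding interval.
    d = intervals[1][0] - intervals[0][1]
    rows = []
    for lower, upper in intervals:
        rows.append((round(lower - d / 2, 2),
                     round(upper + d / 2, 2),
                     round((lower + upper) / 2, 2)))
    return rows

intervals = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
             (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]

rows = true_limits_and_midpoints(intervals)
first_row = rows[0]   # true limits and mid-point of the first interval
last_row = rows[-1]   # true limits and mid-point of the last interval
```

Because each true upper limit coincides with the next true lower limit, the rows form a gap-free partition of the measurement scale.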

Cumulative frequency:
Cumulative frequency of the 1st class interval = frequency.
Cumulative frequency of a class interval = frequency + cumulative frequency of the preceding class interval.

Relative frequency and Percentage frequency:
Relative frequency = frequency / n
Percentage frequency = Relative frequency $\times$ 100%

Class       Frequency  Cumulative  Relative   Cumulative  Percentage  Cumulative
Interval               Frequency   Frequency  Relative    Frequency   Percentage
                                              Frequency               Frequency
13.0-13.9    3          3          0.06       0.06         6%           6%
14.0-14.9    5          8          0.10       0.16        10%          16%
15.0-15.9   15         23          0.30       0.46        30%          46%
16.0-16.9   16         39          0.32       0.78        32%          78%
17.0-17.9   10         49          0.20       0.98        20%          98%
18.0-18.9    1         50          0.02       1.00         2%         100%

From frequencies:
The number of people whose hemoglobin levels are between 17.0 and 17.9 = 10

From cumulative frequencies:
The number of people whose hemoglobin levels are less than or equal to 15.9 = 23
The number of people whose hemoglobin levels are less than or equal to 17.9 = 49

From percentage frequencies:
The percentage of people whose hemoglobin levels are between 17.0 and 17.9 = 20%

From cumulative percentage frequencies:
The percentage of people whose hemoglobin levels are less than or equal to 14.9 = 16%
The percentage of people whose hemoglobin levels are less than or equal to 16.9 = 78%
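The columns of such a table all follow from the frequencies alone. The sketch below assumes class frequencies of 3, 5, 15, 16, 10, 1 for the 50 men, a reading that is consistent with the cumulative percentages 46%, 78%, and 98% quoted in the example; the variable names are illustrative.

```python
from itertools import accumulate

# Assumed class frequencies for the six hemoglobin class intervals (n = 50).
freqs = [3, 5, 15, 16, 10, 1]
n = sum(freqs)

cum_freqs = list(accumulate(freqs))          # running totals of the frequencies
rel_freqs = [f / n for f in freqs]           # relative frequency = frequency / n
pct_freqs = [100 * f / n for f in freqs]     # percentage frequency
cum_pct = [100 * c / n for c in cum_freqs]   # cumulative percentage frequency

for f, c, r, p, cp in zip(freqs, cum_freqs, rel_freqs, pct_freqs, cum_pct):
    print(f"{f:>3} {c:>3} {r:>5.2f} {p:>5.0f}% {cp:>5.0f}%")
```

The last cumulative frequency must equal n and the last cumulative percentage must equal 100%, which is a quick sanity check on any hand-built table.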

Displaying Grouped Frequency Distributions:
For representing frequency (or relative frequency or percentage frequency) distributions, we may use one of the following graphs:
- The Histogram
- The Frequency Polygon

Example: Consider the following frequency distribution of the ages of 00 women.

True Class Interval (age) | Frequency (No. of women) | Cumulative Frequency | Mid-points
Total | n = 00

Width of the interval:
W = true upper limit $-$ true lower limit = 5

(1) Histogram: Organizing and displaying data using a histogram.
[Figure: histogram]

(2) The Frequency Polygon: Organizing and displaying data using a polygon.
[Figures: polygon (open); polygon (closed)]

2.4 Descriptive Statistics: Measures of Central Tendency (Measures of Location):
In the last section we summarized the data using frequency distributions (tables and figures). In this section, we introduce the concept of summarizing the data by means of a single number called "a descriptive measure".
A descriptive measure computed from the values of a sample is called a "statistic".
A descriptive measure computed from the values of a population is called a "parameter".

For the variable of interest there are:
(1) "N" population values.
(2) "n" sample values.

Let $X_1, X_2, \ldots, X_N$ be the population values (in general, they are unknown) of the variable of interest. The population size = N.
Let $x_1, x_2, \ldots, x_n$ be the sample values (these values are known). The sample size = n.

(i) A parameter is a measure (or number) obtained from the population values $X_1, X_2, \ldots, X_N$.
- Values of the parameters are unknown in general.
- We are interested in knowing the true values of the parameters.
(ii) A statistic is a measure (or number) obtained from the sample values $x_1, x_2, \ldots, x_n$.
- Values of statistics are known in general.
- Since parameters are unknown, statistics are used to approximate (estimate) parameters.

Measures of Central Tendency (or measures of location):
The most commonly used measures of central tendency are: the mean, the median, and the mode.
The values of a variable often tend to be concentrated around the center of the data. The center of the data can be determined by the measures of central tendency. A measure of central tendency is considered to be a typical (or a representative) value of the set of data as a whole.

Mean:
(1) The Population mean ($\mu$): If $X_1, X_2, \ldots, X_N$ are the population values, then the population mean is:
$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$ (unit)
The population mean $\mu$ is a parameter (it is usually unknown, and we are interested in knowing its value).
(2) The Sample mean ($\bar{x}$): If $x_1, x_2, \ldots, x_n$ are the sample values, then the sample mean is:
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$ (unit)
The sample mean $\bar{x}$ is a statistic (it is known; we can calculate it from the sample). The sample mean $\bar{x}$ is used to approximate (estimate) the population mean $\mu$.

Example: Suppose that we have a population of 5 population values:
X_1 = 4, X_2 = 30, X_3 = 35, X_4 = , X_5 = 7. (N = 5)
Suppose that we randomly select a sample of size 3, and the sample values we obtained are:
x_1 = 30, x_2 = 35, x_3 = 7. (n = 3)
Then:
The population mean is: $\mu$ = = = 3 (unit)
The sample mean is: $\bar{x}$ = = = 9 = (unit)
Notice that $\bar{x}$ is approximately equal to $\mu$.
Note: The unit of the mean is the same as the unit of the data.

Advantages and disadvantages of the mean:
Advantages:
- Simplicity: The mean is easily understood and easy to compute.
- Uniqueness: There is one and only one mean for a given set of data.
- The mean takes into account all values of the data.
Disadvantages:
- Extreme values have an influence on the mean. Therefore, the mean may be distorted by extreme values.
  For example: Sample | Data | Mean (samples A and B)
- The mean can only be found for quantitative variables.

Median:
The median of a finite set of numbers is that value which divides the ordered array into two equal parts. The numbers in the first part are less than or equal to the median and the numbers in the second part are greater than or equal to the median.
Notice that:
50% (or less) of the data is $\le$ Median
50% (or less) of the data is $\ge$ Median

Calculating the Median:
Let $x_1, x_2, \ldots, x_n$ be the sample values. The sample size (n) can be odd or even.
First we order the sample to obtain the ordered array. Suppose that the ordered array is: $y_1, y_2, \ldots, y_n$.
We compute the rank of the middle value(s):
$\text{rank} = \dfrac{n+1}{2}$
If the sample size (n) is an odd number, there is only one value in the middle, and the rank will be an integer:
$\text{rank} = \dfrac{n+1}{2} = m$ (m is an integer)
The median is the middle value of the ordered observations, which is:
Median = $y_m$.
If the sample size (n) is an even number, there are two values in the middle, and the rank will be an integer plus 0.5:
$\text{rank} = \dfrac{n+1}{2} = m + 0.5$
Therefore, the ranks of the middle values are (m) and (m+1). The median is the mean (average) of the two middle values of the ordered observations:
Median = $\dfrac{y_m + y_{m+1}}{2}$.

Example (odd number): Find the median for the sample values: 0, 54, , 38, 53.
Solution:
n = 5 (odd number). There is only one value in the middle. The rank of the middle value is:
$\text{rank} = \dfrac{n+1}{2} = \dfrac{5+1}{2} = 3$ (m = 3)
In the ordered set, the value with rank 3 (m) is the middle value.
The median = 38 (unit)

Example (even number): Find the median for the sample values: 0, 35, 4, 6, 0, 3
Solution:
n = 6 (even number). There are two values in the middle. The rank is:
$\text{rank} = \dfrac{n+1}{2} = \dfrac{6+1}{2} = 3.5 = m + 0.5$ (m = 3)
Therefore, the ranks of the middle values are: m = 3 and m+1 = 4.
The middle values are 0 and
The median = = = = 6 (unit)
Note: The unit of the median is the same as the unit of the data.

Advantages and disadvantages of the median:
Advantages:
- Simplicity: The median is easily understood and easy to compute.
- Uniqueness: There is only one median for a given set of data.
- The median is not as drastically affected by extreme values as is the mean.
  For example: Sample | Data | Median (samples A and B)
Disadvantages:
- The median does not take into account all values of the sample.
- In general, the median can only be found for quantitative variables. However, in some cases, the median can be found for ordinal qualitative variables.

Mode:
The mode of a set of values is that value which occurs most frequently (i.e., with the highest frequency).
If all values are different or have the same frequencies, there will be no mode.
A set of data may have more than one mode.

Example:
Data set                          Type           Mode(s)
6, 5, 5, 34                       Quantitative   5
3, 7, , 6, 9                      Quantitative   No mode
3, 3, 7, 7, , , 6, 6, 9, 9        Quantitative   No mode
3, 3, , 6, 8, 8                   Quantitative   3 and 8
B C A B B B C B B                 Qualitative    B
B C A B A B C A C                 Qualitative    No mode
B C A B B C B C C                 Qualitative    B and C

Note: The unit of the mode is the same as the unit of the data.

Advantages and disadvantages of the mode:
Advantages:
- Simplicity: The mode is easily understood and easy to compute.
- The mode is not as drastically affected by extreme values as is the mean.
  For example: Sample | Data | Mode (samples A and B)
- The mode may be found for both quantitative and qualitative variables.
Disadvantages:
- The mode is not a good measure of location, because it depends on a few values of the data.
- The mode does not take into account all values of the sample.
- There might be no mode for a data set.
- There might be more than one mode for a data set.
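The three measures of central tendency can be sketched in Python as follows. The function names and the numeric test data are hypothetical; the three qualitative data sets are the ones listed in the mode table above. The median follows the rank rule (n+1)/2, and `modes` returns an empty list whenever all values share the same frequency, which is the "no mode" convention used in these notes.

```python
from collections import Counter

def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Median via the rank (n + 1) / 2 of the ordered array."""
    y = sorted(values)
    n = len(y)
    m = int((n + 1) / 2)           # integer part of the rank
    if n % 2 == 1:
        return y[m - 1]            # y_m (Python lists are 0-indexed)
    return (y[m - 1] + y[m]) / 2   # average of y_m and y_(m+1)

def modes(values):
    """Most frequent value(s); empty list if all frequencies are equal."""
    counts = Counter(values)
    if len(set(counts.values())) == 1:
        return []                  # all different, or all equally frequent
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical numeric samples.
odd_sample = [5, 1, 3]
even_sample = [4, 1, 3, 2]

# Qualitative data sets from the mode table.
one_mode = modes("B C A B B B C B B".split())
no_mode = modes("B C A B A B C A C".split())
two_modes = modes("B C A B B C B C C".split())
```

Note that `modes` works for qualitative data while `mean` does not, mirroring the advantages and disadvantages listed above.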

2.6 Descriptive Statistics: Measures of Dispersion (Measures of Variation):
The dispersion (variation) of a set of observations refers to the variety that they exhibit. A measure of dispersion conveys information regarding the amount of variability present in a set of data. There are several measures of dispersion, some of which are: Range, Variance, Standard Deviation, and Coefficient of Variation.
The variation or dispersion in a set of values refers to how spread out the values are from each other.
- The dispersion (variation) is small when the values are close together.
- There is no dispersion (no variation) if the values are the same.

The Range:
The range is the difference between the largest value (Max) and the smallest value (Min).
Range (R) = Max $-$ Min
Example: Find the range for the sample values: 6, 5, 35, 7, 9, 9.
Solution:
max = 35
min = 5
Range (R) = 35 $-$ 5 = 0 (unit)
Notes:
1. The unit of the range is the same as the unit of the data.
2. The usefulness of the range is limited. The range is a poor measure of dispersion because it only takes into account two of the values; however, it plays a significant role in many applications.

The Variance:
The variance is one of the most important measures of dispersion. The variance is a measure that uses the mean as a point of reference.
- The variance of the data is small when the observations are close to the mean.
- The variance of the data is large when the observations are spread out from the mean.
- The variance of the data is zero (no variation) when all observations have the same value (concentrated at the mean).

Deviations of sample values from the sample mean:
Let $x_1, x_2, \ldots, x_n$ be the sample values, and $\bar{x}$ be the sample mean.
The deviation of the value $x_i$ from the sample mean $\bar{x}$ is: $x_i - \bar{x}$
The squared deviation is: $(x_i - \bar{x})^2$
The sum of squared deviations is: $\sum_{i=1}^{n} (x_i - \bar{x})^2$
[Figure: squared deviations of the values from their mean]

(1) The Population Variance $\sigma^2$ (variance computed from the population):
Let $X_1, X_2, \ldots, X_N$ be the population values. The population variance ($\sigma^2$) is defined by:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \dfrac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \cdots + (X_N - \mu)^2}{N}$ (unit)$^2$
where $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$ is the population mean, and (N) is the population size.
Notes:
- $\sigma^2$ is a parameter because it is obtained from the population values (it is unknown in general).
- $\sigma^2 \ge 0$

(2) The Sample Variance $S^2$ (variance computed from the sample):
Let $x_1, x_2, \ldots, x_n$ be the sample values. The sample variance ($S^2$) is defined by:
$S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n-1}$ (unit)$^2$
where $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$ is the sample mean, and (n) is the sample size.
Notes:
- $S^2$ is a statistic because it is obtained from the sample values (it is known).
- $S^2$ is used to approximate (estimate) $\sigma^2$.
- $S^2 \ge 0$
- $S^2 = 0$ if and only if all observations have the same value, that is, there is no dispersion (no variation).

Example: We want to compute the sample variance of the following sample values: 10, 21, 33, 53, 54.
Solution: n = 5
$\bar{x} = \dfrac{\sum_{i=1}^{5} x_i}{5} = \dfrac{171}{5} = 34.2$
$S^2 = \dfrac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5-1} = \dfrac{(10-34.2)^2 + (21-34.2)^2 + (33-34.2)^2 + (53-34.2)^2 + (54-34.2)^2}{4} = \dfrac{1506.8}{4} = 376.7$ (unit)$^2$

Another method for calculating the sample variance:

$x_i$     $x_i - \bar{x} = x_i - 34.2$     $(x_i - \bar{x})^2$
10        -24.2                            585.64
21        -13.2                            174.24
33         -1.2                              1.44
53         18.8                            353.44
54         19.8                            392.04
Sum: 171     0                            1506.8

$\bar{x} = \dfrac{171}{5} = 34.2$ and $S^2 = \dfrac{1506.8}{4} = 376.7$

Standard Deviation:
The variance is expressed in squared units; therefore, it is not an appropriate measure of dispersion when we wish to express dispersion in terms of the original unit. The standard deviation is another measure of dispersion. The standard deviation is the square root of the variance, and it is expressed in the original unit of the data.
(1) Population standard deviation: $\sigma = \sqrt{\sigma^2}$ (unit)
(2) Sample standard deviation: $S = \sqrt{S^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$ (unit)
Example: For the previous example, the sample standard deviation is
$S = \sqrt{S^2} = \sqrt{376.7} = 19.41$ (unit)

Coefficient of Variation (C.V.):
The variance and the standard deviation are useful as measures of variation of the values of a single variable for a single population. If we want to compare the variation of two variables we cannot use the variance or the standard deviation because:
1. The variables might have different units.
2. The variables might have different means.
We need a measure of the relative variation that does not depend on either the units or on how large the values are. This measure is the coefficient of variation (C.V.), defined by:
$C.V. = \dfrac{S}{\bar{x}} \times 100\%$
The C.V. is free of unit (unit-less).
To compare the variability of two sets of data (i.e., to determine which set is more variable), we need to calculate the following quantities:

               Mean          Standard deviation   C.V.
1st data set   $\bar{x}_1$   $S_1$                $C.V_1 = \dfrac{S_1}{\bar{x}_1} \times 100\%$
2nd data set   $\bar{x}_2$   $S_2$                $C.V_2 = \dfrac{S_2}{\bar{x}_2} \times 100\%$

The data set with the larger value of C.V. has the larger variation. The relative variability of the 1st data set is larger than the relative variability of the 2nd data set if $C.V_1 > C.V_2$ (and vice versa).

Example: Suppose we have two data sets:
1st data set: $\bar{x}_1 = 66$ kg, $S_1 = 4.5$ kg
2nd data set: $\bar{x}_2 = 36$ kg, $S_2 = 4.5$ kg
$C.V_1 = \dfrac{4.5}{66} \times 100\% = 6.8\%$
$C.V_2 = \dfrac{4.5}{36} \times 100\% = 12.5\%$
Since $C.V_2 > C.V_1$, the relative variability of the 2nd data set is larger than the relative variability of the 1st data set.
If we used the standard deviation to compare the variability of the two data sets, we would wrongly conclude that the two data sets have the same variability, because the standard deviation of both sets is 4.5 kg.
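The dispersion measures of this section can be checked numerically with a short sketch. It takes the variance example's sample to be 10, 21, 33, 53, 54 (a reading consistent with the stated mean of 34.2) and uses the two data sets of the C.V. example (means 66 kg and 36 kg, both with S = 4.5 kg). The function names are illustrative, not part of the notes.

```python
import math

def value_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Sample standard deviation = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def coefficient_of_variation(sd, mean):
    """C.V. = (S / x_bar) * 100, expressed as a percentage."""
    return sd / mean * 100

sample = [10, 21, 33, 53, 54]
s2 = sample_variance(sample)   # approximately 376.7
s = sample_sd(sample)          # approximately 19.41

cv_1 = coefficient_of_variation(4.5, 66)   # approximately 6.8 (%)
cv_2 = coefficient_of_variation(4.5, 36)   # 12.5 (%)
```

Both data sets have the same standard deviation, yet `cv_2` is almost twice `cv_1`, which is exactly why the C.V. is the right tool for comparing relative variability.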

Chapter 3: Probability: The Basis of Statistical Inference

3.1 Introduction
3.2 Probability
3.3 Elementary Properties of Probability
3.4 Calculating the Probability of an Event

General Definitions and Concepts:

Probability: Probability is a measure (or number) used to measure the chance of the occurrence of some event. This number is between 0 and 1.

An Experiment: An experiment is some procedure (or process) that we do.

Sample Space: The sample space of an experiment is the set of all possible outcomes of the experiment. It is also called the universal set, and is denoted by $\Omega$.

An Event: Any subset of the sample space $\Omega$ is called an event.
- $\phi \subseteq \Omega$ is an event (impossible event)
- $\Omega \subseteq \Omega$ is an event (sure event)

Example:
Experiment: Selecting a ball from a box containing 6 balls numbered from 1 to 6 and observing the number on the selected ball.
This experiment has 6 possible outcomes. The sample space is: $\Omega = \{1, 2, 3, 4, 5, 6\}$.
Consider the following events:
$E_1$ = getting an even number = $\{2, 4, 6\} \subseteq \Omega$
$E_2$ = getting a number less than 4 = $\{1, 2, 3\} \subseteq \Omega$
$E_3$ = getting 1 or 3 = $\{1, 3\} \subseteq \Omega$
$E_4$ = getting an odd number = $\{1, 3, 5\} \subseteq \Omega$
$E_5$ = getting a negative number = $\{\,\} = \phi \subseteq \Omega$
$E_6$ = getting a number less than 10 = $\{1, 2, 3, 4, 5, 6\} = \Omega \subseteq \Omega$

Notation:
$n(\Omega)$ = no. of outcomes (elements) in $\Omega$
$n(E)$ = no. of outcomes (elements) in the event E

Equally Likely Outcomes: The outcomes of an experiment are equally likely if the outcomes have the same chance of occurrence.

Probability of an Event: If the experiment has $n(\Omega)$ equally likely outcomes, then the probability of the event E is denoted by P(E) and is defined by:
$P(E) = \dfrac{n(E)}{n(\Omega)} = \dfrac{\text{no. of outcomes in } E}{\text{no. of outcomes in } \Omega}$

Example: In the ball experiment in the previous example, suppose the ball is selected at random. Determine the probabilities of the following events:
$E_1$ = getting an even number
$E_2$ = getting a number less than 4
$E_3$ = getting 1 or 3
Solution:
$\Omega = \{1, 2, 3, 4, 5, 6\}$ ; $n(\Omega) = 6$
$E_1 = \{2, 4, 6\}$ ; $n(E_1) = 3$
$E_2 = \{1, 2, 3\}$ ; $n(E_2) = 3$
$E_3 = \{1, 3\}$ ; $n(E_3) = 2$
The outcomes are equally likely, so:
$P(E_1) = \dfrac{3}{6}$, $P(E_2) = \dfrac{3}{6}$, $P(E_3) = \dfrac{2}{6}$
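The counting definition P(E) = n(E)/n(Ω) translates directly into code. The sketch below assumes the ball experiment with outcomes 1 to 6 and takes the third event to be {1, 3}; the `probability` helper is a hypothetical name, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(E) = n(E) / n(Omega) for equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

omega = {1, 2, 3, 4, 5, 6}   # sample space of the ball experiment
E1 = {2, 4, 6}               # getting an even number
E2 = {1, 2, 3}               # getting a number less than 4
E3 = {1, 3}                  # assumed reading of the third event

p1 = probability(E1, omega)
p2 = probability(E2, omega)
p3 = probability(E3, omega)
p_impossible = probability(set(), omega)   # the empty event
p_sure = probability(omega, omega)         # the whole sample space
```

`Fraction` reduces automatically, so 3/6 is stored as 1/2 and 2/6 as 1/3.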

Some Operations on Events:
Let A and B be two events defined on the sample space $\Omega$.

Union of Two Events: $(A \cup B)$ or $(A + B)$
The event $A \cup B$ consists of all outcomes in A or in B or in both A and B. The event $A \cup B$ occurs if A occurs, or B occurs, or both A and B occur.

Intersection of Two Events: $(A \cap B)$
The event $A \cap B$ consists of all outcomes in both A and B. The event $A \cap B$ occurs if both A and B occur.

Complement of an Event: $(\bar{A})$ or $(A^C)$ or $(A')$
The complement of the event A is denoted by $\bar{A}$. The event $\bar{A}$ consists of all outcomes of $\Omega$ that are not in A. The event $\bar{A}$ occurs if A does not.

Example:
Experiment: Selecting a ball at random from a box containing 6 balls numbered 1, 2, 3, 4, 5, and 6.
Define the following events:
$E_1 = \{2, 4, 6\}$ = getting an even number.
$E_2 = \{1, 2, 3\}$ = getting a number < 4.
$E_4 = \{1, 3, 5\}$ = getting an odd number.

(1) $E_1 \cup E_2 = \{1, 2, 3, 4, 6\}$ = getting an even number or a number less than 4.
$P(E_1 \cup E_2) = \dfrac{n(E_1 \cup E_2)}{n(\Omega)} = \dfrac{5}{6}$

(2) $E_1 \cup E_4 = \{1, 2, 3, 4, 5, 6\} = \Omega$ = getting an even number or an odd number.
$P(E_1 \cup E_4) = \dfrac{n(E_1 \cup E_4)}{n(\Omega)} = \dfrac{6}{6} = 1$
Note: $E_1 \cup E_4 = \Omega$. $E_1$ and $E_4$ are called exhaustive events. The union of these events gives the whole sample space.

(3) $E_1 \cap E_2 = \{2\}$ = getting an even number and a number less than 4.
$P(E_1 \cap E_2) = \dfrac{n(E_1 \cap E_2)}{n(\Omega)} = \dfrac{1}{6}$

(4) $E_1 \cap E_4 = \phi$ = getting an even number and an odd number.
$P(E_1 \cap E_4) = \dfrac{n(E_1 \cap E_4)}{n(\Omega)} = \dfrac{n(\phi)}{6} = \dfrac{0}{6} = 0$
Note: $E_1 \cap E_4 = \phi$. $E_1$ and $E_4$ are called disjoint (or mutually exclusive) events. These kinds of events cannot occur simultaneously (together at the same time).

(5) The complement of $E_1$:
$\bar{E}_1$ = not getting an even number = $\overline{\{2, 4, 6\}}$ = $\{1, 3, 5\}$ = getting an odd number = $E_4$

Mutually Exclusive (Disjoint) Events:
The events A and B are disjoint (or mutually exclusive) if:
$A \cap B = \phi$.
In this case, it is impossible for both events to occur simultaneously (i.e., together at the same time), and:
(i) $P(A \cap B) = 0$
(ii) $P(A \cup B) = P(A) + P(B)$
If $A \cap B \ne \phi$, then A and B are not mutually exclusive (not disjoint).
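Python's built-in sets mirror these operations closely: `|` is union, `&` is intersection, and the complement is the difference from the sample space. The sketch below reuses the ball experiment, with the events written as {2, 4, 6}, {1, 2, 3}, and {1, 3, 5}; the helper `P` is a hypothetical name.

```python
from fractions import Fraction

omega = set(range(1, 7))   # sample space {1, 2, 3, 4, 5, 6}
E1 = {2, 4, 6}             # even number
E2 = {1, 2, 3}             # number less than 4
E4 = {1, 3, 5}             # odd number

def P(event):
    """P(E) = n(E) / n(Omega) for equally likely outcomes."""
    return Fraction(len(event), len(omega))

union_12 = E1 | E2          # outcomes in E1 or E2 or both
union_14 = E1 | E4          # equals omega, so E1 and E4 are exhaustive
inter_12 = E1 & E2          # outcomes in both E1 and E2
inter_14 = E1 & E4          # empty, so E1 and E4 are mutually exclusive
complement_1 = omega - E1   # outcomes of omega that are not in E1

# For disjoint events the probability of the union is the sum of probabilities.
disjoint_sum_holds = P(union_14) == P(E1) + P(E4)
```

For `E1` and `E2`, which do overlap, the simple sum P(E1) + P(E2) = 1 would overstate P(E1 ∪ E2) = 5/6, because the shared outcome 2 is counted twice.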

[Figure: two Venn diagrams]
$A \cap B \ne \phi$: A and B are not mutually exclusive (it is possible that both events occur at the same time).
$A \cap B = \phi$: A and B are mutually exclusive (disjoint) (it is impossible that both events occur at the same time).

Exhaustive Events:
The events $A_1, A_2, \ldots, A_n$ are exhaustive events if:
$A_1 \cup A_2 \cup \ldots \cup A_n = \Omega$.
In this case, $P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(\Omega) = 1$.

Note:
1. $A \cup \bar{A} = \Omega$ (A and $\bar{A}$ are exhaustive events)
2. $A \cap \bar{A} = \phi$ (A and $\bar{A}$ are mutually exclusive (disjoint) events)
3. $n(\bar{A}) = n(\Omega) - n(A)$
4. $P(\bar{A}) = 1 - P(A)$

General Probability Rules:
1. $0 \le P(A) \le 1$
2. $P(\Omega) = 1$
3. $P(\phi) = 0$
4. $P(\bar{A}) = 1 - P(A)$

36 Biostatistics - STAT 45 Departmet of Statistics Summer Semester 43/43 The Additio Rule: For ay two evets A ad B: P ( A B) = P( A) + P( B) P( A B) Special Cases:. For mutually exclusive (disjoit) evets A ad B ( A B) = P( A) P( B) P +. For mutually exclusive (disjoit) evets E, E, K, E : P( E E K ) = P( E ) + P( E ) + L + P( E E ) Note: If the evets A, A, K, A are exhaustive ad mutually exclusive (disjoit) evets, the: P( A A K ) = P( A ) + P( A ) + L+ P( A ) = P ( = A Ω) Margial Probability: Give some variable that ca be broke dow ito (m) categories desigated by A, A, L, Am ad aother joitly occurrig variable that is broke dow ito () categories desigated by B B,,., L B B B B Total A A B ) A B ) A B ) A ) ( ( ( ( A A B ) A B ) A B ) (A ) ( ( ( A m ( A m B m ( A m B ) ( Am ) Total B ) B ) B ) (Ω) ( ( (T his t able cotais the umber of elemets i each evet) ( Kig Saud Uiversity 36

37 Biostatistics - STAT 45 Departmet of Statistics Summer Semester 43/43 B B B Margial Probability A P(A B ) P( A B ) P( A B ) P( A ) A P( A B ) P( A B ) P( A B ) P( A ) A P A m B ) P A m B ) P A m B ) P( ) m Margial Probability ( ( ( P( ) P( B ) P( B ).00 B (This table cotais the probability of each evet) The margial probability of A i, P( A i ), is equal to the sum of the joit pro babilities of A i with all categories of B. That is: P( A i For example, ) = ( Ai B ) + P( Ai B ) + K+ P( Ai B P ) = P(A B ) j= j= i j ( A ) = P( A B ) + P( A B ) + K+ P( A P = P( A B ) j B ) We defie the margial probability of B j, P( B j ), i a similar way. Example: Table of umber of elemets i each evet: B B B 3 Total A A A Total Table of probabilities of each evet: B B B Margial 3 Probability A A A Margial Probability A m Kig Saud Uiversity 37

For example:
$P(A_2) = P(A_2 \cap B_1) + P(A_2 \cap B_2) + P(A_2 \cap B_3)$

Applications:
Example:
630 patients are classified as follows:

| Blood Type | O ($E_1$) | A ($E_2$) | B ($E_3$) | AB ($E_4$) | Total |
|---|---|---|---|---|---|
| No. of patients | 284 | 258 | 63 | 25 | 630 |

Experiment: Selecting a patient at random and observing his/her blood type.
This experiment has 630 equally likely outcomes: $n(\Omega) = 630$.

Define the events:
$E_1$ = The blood type of the selected patient is "O"
$E_2$ = The blood type of the selected patient is "A"
$E_3$ = The blood type of the selected patient is "B"
$E_4$ = The blood type of the selected patient is "AB"

Number of elements in each event:
$n(E_1) = 284$, $n(E_2) = 258$, $n(E_3) = 63$, $n(E_4) = 25$.

Probabilities of the events:
$P(E_1) = 284/630 = 0.4508$, $P(E_2) = 258/630 = 0.4095$, $P(E_3) = 63/630 = 0.1$, $P(E_4) = 25/630 = 0.0397$.

Some operations on the events:
1. $E_2 \cap E_4$ = the blood type of the selected patient is "A" and "AB".
$E_2 \cap E_4 = \phi$ (disjoint events / mutually exclusive events)
$P(E_2 \cap E_4) = P(\phi) = 0$
2. $E_2 \cup E_4$ = the blood type of the selected patient is "A" or "AB"

$P(E_2 \cup E_4) = \dfrac{n(E_2 \cup E_4)}{n(\Omega)} = \dfrac{258 + 25}{630} = \dfrac{283}{630} = 0.4492$
or
$P(E_2 \cup E_4) = P(E_2) + P(E_4) = \dfrac{258}{630} + \dfrac{25}{630} = \dfrac{283}{630} = 0.4492$ (since $E_2 \cap E_4 = \phi$)
3. $\bar{E}_1$ = the blood type of the selected patient is not "O".
$n(\bar{E}_1) = n(\Omega) - n(E_1) = 630 - 284 = 346$
$P(\bar{E}_1) = \dfrac{n(\bar{E}_1)}{n(\Omega)} = \dfrac{346}{630} = 0.5492$
Another solution:
$P(\bar{E}_1) = 1 - P(E_1) = 1 - 0.4508 = 0.5492$

Notes:
1. $E_1, E_2, E_3, E_4$ are mutually disjoint, $E_i \cap E_j = \phi$ ($i \neq j$).
2. $E_1, E_2, E_3, E_4$ are exhaustive events, $E_1 \cup E_2 \cup E_3 \cup E_4 = \Omega$.

Example:
339 physicians are classified based on their ages and smoking habits as follows.

| Age | Daily ($B_1$) | Occasionally ($B_2$) | Not at all ($B_3$) | Total |
|---|---|---|---|---|
| 20-29 ($A_1$) | 31 | 9 | 7 | 47 |
| 30-39 ($A_2$) | 110 | 30 | 49 | 189 |
| 40-49 ($A_3$) | 29 | 21 | 29 | 79 |
| 50+ ($A_4$) | 6 | 0 | 18 | 24 |
| Total | 176 | 60 | 103 | 339 |

Experiment: Selecting a physician at random.
The number of elements of the sample space is $n(\Omega) = 339$.
The outcomes of the experiment are equally likely.

Some events:

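$A_3$ = the selected physician is aged 40-49
$P(A_3) = \dfrac{n(A_3)}{n(\Omega)} = \dfrac{79}{339} = 0.2330$
$B_2$ = the selected physician smokes occasionally
$P(B_2) = \dfrac{n(B_2)}{n(\Omega)} = \dfrac{60}{339} = 0.1770$
$A_3 \cap B_2$ = the selected physician is aged 40-49 and smokes occasionally.
$P(A_3 \cap B_2) = \dfrac{n(A_3 \cap B_2)}{n(\Omega)} = \dfrac{21}{339} = 0.0619$
$A_3 \cup B_2$ = the selected physician is aged 40-49 or smokes occasionally (or both)
$P(A_3 \cup B_2) = P(A_3) + P(B_2) - P(A_3 \cap B_2) = \dfrac{79}{339} + \dfrac{60}{339} - \dfrac{21}{339} = \dfrac{118}{339} = 0.3481$
$\bar{A}_4$ = the selected physician is not 50 years or older $= A_1 \cup A_2 \cup A_3$
$P(\bar{A}_4) = 1 - P(A_4) = 1 - \dfrac{n(A_4)}{n(\Omega)} = 1 - \dfrac{24}{339} = 0.9292$
$A_2 \cup A_3$ = the selected physician is aged 30-39 or is aged 40-49 = the selected physician is aged 30-49
$P(A_2 \cup A_3) = \dfrac{n(A_2 \cup A_3)}{n(\Omega)} = \dfrac{189 + 79}{339} = \dfrac{268}{339} = 0.7906$
or
$P(A_2 \cup A_3) = P(A_2) + P(A_3) = \dfrac{189}{339} + \dfrac{79}{339} = 0.7906$ (since $A_2 \cap A_3 = \phi$)

Example:
Suppose that there is a population of pregnant women with:
10% of the pregnant women delivered prematurely.
25% of the pregnant women used some sort of medication.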

5% of the pregnant women delivered prematurely and used some sort of medication.

Experiment: Selecting a woman randomly from this population.

Define the events:
D = The selected woman delivered prematurely.
M = The selected woman used medication.
$D \cap M$ = The selected woman delivered prematurely and used some sort of medication.

Percentages:
%(D) = 10%, %(M) = 25%, %($D \cap M$) = 5%

The complement events:
$\bar{D}$ = The selected woman did not deliver prematurely.
$\bar{M}$ = The selected woman did not use medication.

A two-way table (percentages given by a two-way table):

| | M | $\bar{M}$ | Total |
|---|---|---|---|
| D | 5 | ? | 10 |
| $\bar{D}$ | ? | ? | ? |
| Total | 25 | ? | 100 |

The completed table:

| | M | $\bar{M}$ | Total |
|---|---|---|---|
| D | 5 | 5 | 10 |
| $\bar{D}$ | 20 | 70 | 90 |
| Total | 25 | 75 | 100 |

The probabilities of the given events are:
$P(D) = \dfrac{\%(D)}{100\%} = \dfrac{10\%}{100\%} = 0.1$
$P(M) = \dfrac{\%(M)}{100\%} = \dfrac{25\%}{100\%} = 0.25$
$P(D \cap M) = \dfrac{\%(D \cap M)}{100\%} = \dfrac{5\%}{100\%} = 0.05$

Calculating probabilities of some events:
$D \cup M$ = the selected woman delivered prematurely or used medication.
$P(D \cup M) = P(D) + P(M) - P(D \cap M) = 0.1 + 0.25 - 0.05 = 0.3$ (by the rule)

$\bar{M}$ = The selected woman did not use medication.
$P(\bar{M}) = 1 - P(M) = 1 - 0.25 = 0.75$ (by the rule)
$P(\bar{M}) = \dfrac{75}{100} = 0.75$ (from the table)
$\bar{D}$ = The selected woman did not deliver prematurely.
$P(\bar{D}) = 1 - P(D) = 1 - 0.10 = 0.90$ (by the rule)
$P(\bar{D}) = \dfrac{90}{100} = 0.90$ (from the table)
$\bar{D} \cap \bar{M}$ = the selected woman did not deliver prematurely and did not use medication.
$P(\bar{D} \cap \bar{M}) = \dfrac{70}{100} = 0.70$ (from the table)
$\bar{D} \cap M$ = the selected woman did not deliver prematurely and used medication.
$P(\bar{D} \cap M) = \dfrac{20}{100} = 0.20$ (from the table)
$D \cap \bar{M}$ = the selected woman delivered prematurely and did not use medication.
$P(D \cap \bar{M}) = \dfrac{5}{100} = 0.05$ (from the table)
$D \cup \bar{M}$ = the selected woman delivered prematurely or did not use medication.
$P(D \cup \bar{M}) = P(D) + P(\bar{M}) - P(D \cap \bar{M}) = 0.10 + 0.75 - 0.05 = 0.8$ (by the rule)
$\bar{D} \cup M$ = the selected woman did not deliver prematurely or used medication.
$P(\bar{D} \cup M) = P(\bar{D}) + P(M) - P(\bar{D} \cap M) = 0.90 + 0.25 - 0.20 = 0.95$ (by the rule)
$\bar{D} \cup \bar{M}$ = the selected woman did not deliver prematurely or did not use medication.
$P(\bar{D} \cup \bar{M}) = P(\bar{D}) + P(\bar{M}) - P(\bar{D} \cap \bar{M}) = 0.90 + 0.75 - 0.70 = 0.95$ (by the rule)

Conditional Probability:
The conditional probability of the event A when we know that the event B has already occurred is defined by:
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}; \quad P(B) \neq 0$
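The two-way table of the premature-delivery example and the definition of conditional probability can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the names `table`, `prob` and `conditional` are hypothetical helpers introduced only for illustration, and the percentages are those of the completed table (D: 5, 5; not D: 20, 70).

```python
from fractions import Fraction

# Two-way table of percentages from the premature-delivery example
# (rows: D / not D, columns: M / not M); the four cells add up to 100.
table = {
    ("D", "M"): 5,
    ("D", "not M"): 5,
    ("not D", "M"): 20,
    ("not D", "not M"): 70,
}
total = sum(table.values())


def prob(row=None, col=None):
    """Probability of the event given by a row label, a column label, or both."""
    count = sum(
        n for (r, c), n in table.items()
        if (row is None or r == row) and (col is None or c == col)
    )
    return Fraction(count, total)


def conditional(row, col_given):
    """P(row | col_given) = P(row and col_given) / P(col_given)."""
    p_given = prob(col=col_given)
    if p_given == 0:
        raise ValueError("conditioning event has probability 0")
    return prob(row=row, col=col_given) / p_given


# P(D | M) = 0.05 / 0.25
p_d_given_m = conditional("D", "M")
# Addition rule: P(D or M) = P(D) + P(M) - P(D and M)
p_d_or_m = prob(row="D") + prob(col="M") - prob(row="D", col="M")
print(float(p_d_given_m), float(p_d_or_m))
```

Exact fractions are used so that the results can be compared with the hand calculation without rounding noise.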

$P(A \mid B)$ = The conditional probability of A given B.

Notes:
(1) $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{n(A \cap B)/n(\Omega)}{n(B)/n(\Omega)} = \dfrac{n(A \cap B)}{n(B)}$
(2) $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
(3) For calculating $P(A \mid B)$, we may use any one of the following:
(i) $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
(ii) $P(A \mid B) = \dfrac{n(A \cap B)}{n(B)}$
(iii) Using the restricted table directly.

Multiplication Rules of Probability:
For any two events A and B, we have:
$P(A \cap B) = P(B)\, P(A \mid B)$
$P(A \cap B) = P(A)\, P(B \mid A)$

Example:

| Age | Daily ($B_1$) | Occasionally ($B_2$) | Not at all ($B_3$) | Total |
|---|---|---|---|---|
| 20-29 ($A_1$) | 31 | 9 | 7 | 47 |
| 30-39 ($A_2$) | 110 | 30 | 49 | 189 |
| 40-49 ($A_3$) | 29 | 21 | 29 | 79 |
| 50+ ($A_4$) | 6 | 0 | 18 | 24 |
| Total | 176 | 60 | 103 | 339 |

Consider the following event:
$(B_1 \mid A_2)$ = the selected physician smokes daily given that his

age is between 30 and 39

$P(B_1) = \dfrac{n(B_1)}{n(\Omega)} = \dfrac{176}{339} = 0.5192$

$P(B_1 \mid A_2) = \dfrac{P(B_1 \cap A_2)}{P(A_2)} = \dfrac{110/339}{189/339} = 0.5820$
where
$P(B_1 \cap A_2) = \dfrac{n(B_1 \cap A_2)}{n(\Omega)} = \dfrac{110}{339} = 0.3245$
$P(A_2) = \dfrac{n(A_2)}{n(\Omega)} = \dfrac{189}{339} = 0.5575$

Another solution:
$P(B_1 \mid A_2) = \dfrac{n(B_1 \cap A_2)}{n(A_2)} = \dfrac{110}{189} = 0.5820$

Notice that:
$P(B_1) = 0.5192$
$P(B_1 \mid A_2) = 0.5820$
$P(B_1 \mid A_2) > P(B_1)$ !! $\;P(B_1) \neq P(B_1 \mid A_2)$
What does this mean?
We will answer this question after talking about the concept of independent events.

Example: (Multiplication Rule of Probability)
A training health program consists of two consecutive parts. To pass this program, the trainee must pass both parts of the program. From past experience, it is known that 90% of the trainees pass the first part, and 80% of those who pass the first part pass the second part. If you are admitted to this program, what is the probability that you will pass the program? What is the percentage of trainees who pass the program?

Solution:
Define the following events:
A = the event of passing the first part

B = the event of passing the second part
$A \cap B$ = the event of passing the first part and the second part
= the event of passing both parts
= the event of passing the program

Therefore, the probability of passing the program is $P(A \cap B)$.

From the given information:
The probability of passing the first part is:
$P(A) = 0.9$ ($\frac{90\%}{100\%} = 0.9$)
The probability of passing the second part given that the trainee has already passed the first part is:
$P(B \mid A) = 0.8$ ($\frac{80\%}{100\%} = 0.8$)

Now, we use the multiplication rule to find $P(A \cap B)$ as follows:
$P(A \cap B) = P(A)\, P(B \mid A) = (0.9)(0.8) = 0.72$
We can conclude that 72% of the trainees pass the program.

Independent Events
There are 3 cases:
$P(A \mid B) > P(A)$ (knowing B increases the probability of occurrence of A)
$P(A \mid B) < P(A)$ (knowing B decreases the probability of occurrence of A)
$P(A \mid B) = P(A)$ (knowing B has no effect on the probability of occurrence of A). In this case A is independent of B.

Independent Events:
Two events A and B are independent if one of the following conditions is satisfied:
(i) $P(A \mid B) = P(A)$
(ii) $P(B \mid A) = P(B)$
(iii) $P(B \cap A) = P(A)\, P(B)$

Note: The third condition is the multiplication rule of independent events.
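The multiplication rule and condition (iii) for independence translate directly into a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the training-program figures (0.9 and 0.8) come from the example above, and the numbers used for the independence check are made-up values chosen for the demonstration.

```python
import math


def joint_from_conditional(p_a, p_b_given_a):
    """Multiplication rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a


def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Condition (iii): A and B are independent when P(A and B) = P(A) P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Training-program example: 90% pass part 1, 80% of those pass part 2.
p_pass_program = joint_from_conditional(0.9, 0.8)

# Hypothetical probabilities for the independence check (not from the notes).
independent_case = are_independent(0.5, 0.4, 0.20)  # 0.5 * 0.4 equals 0.20
dependent_case = are_independent(0.5, 0.4, 0.30)    # 0.30 differs from 0.20
print(round(p_pass_program, 2), independent_case, dependent_case)
```

A tolerance is used in the comparison because probabilities stored as floating-point numbers rarely match exactly after multiplication.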

Example:
Suppose that A and B are two events such that:
$P(A) = 0.5$, $P(B) = 0.6$, $P(A \cap B) = 0.2$.
These two events are not independent (they are dependent) because:
$P(A)\, P(B) = 0.5 \times 0.6 = 0.3$
$P(A \cap B) = 0.2$
$P(A \cap B) \neq P(A)\, P(B)$
Also, $P(A) = 0.5 \neq P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0.2}{0.6} = 0.3333$.
Also, $P(B) = 0.6 \neq P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0.2}{0.5} = 0.4$.

For this example, we may calculate probabilities of all events. We can use a two-way table of the probabilities as follows:

| | B | $\bar{B}$ | Total |
|---|---|---|---|
| A | 0.2 | ? | 0.5 |
| $\bar{A}$ | ? | ? | ? |
| Total | 0.6 | ? | 1.00 |

We complete the table:

| | B | $\bar{B}$ | Total |
|---|---|---|---|
| A | 0.2 | 0.3 | 0.5 |
| $\bar{A}$ | 0.4 | 0.1 | 0.5 |
| Total | 0.6 | 0.4 | 1.00 |

$P(\bar{A}) = 0.5$
$P(\bar{B}) = 0.4$
$P(A \cap \bar{B}) = 0.3$
$P(\bar{A} \cap B) = 0.4$
$P(\bar{A} \cap \bar{B}) = 0.1$
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.6 - 0.2 = 0.9$
$P(A \cup \bar{B}) = P(A) + P(\bar{B}) - P(A \cap \bar{B}) = 0.5 + 0.4 - 0.3 = 0.6$
$P(\bar{A} \cup B)$ = exercise
$P(\bar{A} \cup \bar{B})$ = exercise

Note: The Addition Rule for Independent Events:
If the events A and B are independent, then

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (Addition rule)
$= P(A) + P(B) - P(A)\, P(B)$

Example: (Reading Assignment)
Suppose that a dental clinic has 12 nurses classified as follows:

| Nurse | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Has children | Yes | No | No | No | No | Yes | No | No | Yes | No | No | No |
| Works at night | No | No | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes |

The experiment is to randomly choose one of these nurses. Consider the following events:
C = the chosen nurse has children
N = the chosen nurse works night shift

a) Find the probabilities of the following events:
1. the chosen nurse has children.
2. the chosen nurse works night shift.
3. the chosen nurse has children and works night shift.
4. the chosen nurse has children and does not work night shift.
b) Find the probability of choosing a nurse who works at night given that she has children.
c) Are the events C and N independent? Why?
d) Are the events C and N disjoint? Why?
e) Sketch the events C and N with their probabilities using a Venn diagram.

Solution:
We can classify the nurses as follows:

| | N (Night shift) | $\bar{N}$ (No night shift) | Total |
|---|---|---|---|
| C (Has children) | 2 | 1 | 3 |
| $\bar{C}$ (No children) | 6 | 3 | 9 |
| Total | 8 | 4 | 12 |

a) The experiment has $n(\Omega) = 12$ equally likely outcomes.
P(The chosen nurse has children) $= P(C) = \dfrac{n(C)}{n(\Omega)} = \dfrac{3}{12} = 0.25$
P(The chosen nurse works night shift) $= P(N) = \dfrac{n(N)}{n(\Omega)} = \dfrac{8}{12} = 0.6667$
P(The chosen nurse has children and works night shift) $= P(C \cap N) = \dfrac{n(C \cap N)}{n(\Omega)} = \dfrac{2}{12} = 0.1667$
P(The chosen nurse has children and does not work night shift)

$= P(C \cap \bar{N}) = \dfrac{n(C \cap \bar{N})}{n(\Omega)} = \dfrac{1}{12} = 0.0833$
b) The probability of choosing a nurse who works at night given that she has children:
$P(N \mid C) = \dfrac{P(C \cap N)}{P(C)} = \dfrac{2/12}{0.25} = 0.6667$
c) The events C and N are independent because $P(N \mid C) = P(N)$.
d) The events C and N are not disjoint because $C \cap N \neq \phi$. (Note: $n(C \cap N) = 2$)
e) Venn diagram
[Figure: Venn diagram of the events C and N with their probabilities]
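The answers to parts (a) to (d) of the nurse example can be reproduced by counting directly over the twelve records. This is a minimal Python sketch, not part of the original notes; the list `nurses` encodes the table from the example as (has children, works at night) pairs, and all variable names are chosen for illustration.

```python
from fractions import Fraction

# Nurse records in the order listed in the example:
# (has children, works at night) for nurses 1 to 12.
nurses = [
    (True, False), (False, False), (False, True), (False, True),
    (False, True), (True, True), (False, False), (False, False),
    (True, True), (False, True), (False, True), (False, True),
]
n_total = len(nurses)

# a) probabilities obtained by counting equally likely outcomes
p_c = Fraction(sum(c for c, _ in nurses), n_total)
p_n = Fraction(sum(n for _, n in nurses), n_total)
p_c_and_n = Fraction(sum(c and n for c, n in nurses), n_total)
p_c_and_not_n = Fraction(sum(c and not n for c, n in nurses), n_total)

# b) conditional probability P(N | C) = P(C and N) / P(C)
p_n_given_c = p_c_and_n / p_c

# c) independence: P(C and N) equals P(C) P(N)
independent = p_c_and_n == p_c * p_n

# d) disjoint events would have an empty intersection
disjoint = p_c_and_n == 0

print(p_c, p_n, p_c_and_n, p_c_and_not_n, p_n_given_c, independent, disjoint)
```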

Bayes' Theorem, Screening Tests, Sensitivity, Specificity, and Predictive Value Positive and Negative: (pp. 79-83)

There are two states regarding the disease and two states regarding the result of the screening test.
We define the following events of interest:
D : the individual has the disease (presence of the disease)
$\bar{D}$ : the individual does not have the disease (absence of the disease)
T : the individual has a positive screening test result
$\bar{T}$ : the individual has a negative screening test result

There are 4 possible situations:

| Result of the test | True status +ve (D: Present) | True status -ve ($\bar{D}$: Absent) |
|---|---|---|
| +ve (T) | Correct diagnosing | False positive result |
| -ve ($\bar{T}$) | False negative result | Correct diagnosing |

Definitions of False Results:
There are two false results:
1. A false positive result: This result happens when a test indicates a positive status when the true status is negative. Its probability is:
$P(T \mid \bar{D})$ = P(positive result | absence of the disease)
2. A false negative result: This result happens when a test indicates a negative status when the true status is positive. Its probability is:
$P(\bar{T} \mid D)$ = P(negative result | presence of the disease)

Definitions of the Sensitivity and Specificity of the test:
1. The Sensitivity:

The sensitivity of a test is the probability of a positive test result given the presence of the disease.
$P(T \mid D)$ = P(positive result of the test | presence of the disease)
2. The specificity:
The specificity of a test is the probability of a negative test result given the absence of the disease.
$P(\bar{T} \mid \bar{D})$ = P(negative result of the test | absence of the disease)

To clarify these concepts, suppose we have a sample of (n) subjects who are cross-classified according to Disease Status and Screening Test Result as follows:

| Test Result | Disease Present (D) | Disease Absent ($\bar{D}$) | Total |
|---|---|---|---|
| Positive (T) | a | b | $a + b = n(T)$ |
| Negative ($\bar{T}$) | c | d | $c + d = n(\bar{T})$ |
| Total | $a + c = n(D)$ | $b + d = n(\bar{D})$ | n |

For example, there are (a) subjects who have the disease and whose screening test result was positive.

From this table we may compute the following conditional probabilities:
1. The probability of a false positive result:
$P(T \mid \bar{D}) = \dfrac{n(T \cap \bar{D})}{n(\bar{D})} = \dfrac{b}{b + d}$
2. The probability of a false negative result:
$P(\bar{T} \mid D) = \dfrac{n(\bar{T} \cap D)}{n(D)} = \dfrac{c}{a + c}$
3. The sensitivity of the screening test:
$P(T \mid D) = \dfrac{n(T \cap D)}{n(D)} = \dfrac{a}{a + c}$
4. The specificity of the screening test:
$P(\bar{T} \mid \bar{D}) = \dfrac{n(\bar{T} \cap \bar{D})}{n(\bar{D})} = \dfrac{d}{b + d}$
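The four conditional probabilities above depend only on the cell counts a, b, c and d, so they can be wrapped in a small helper. The following Python sketch is illustrative; the class name `ScreeningTable` and the counts used in the usage lines are hypothetical and do not come from the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Cell counts of the 2x2 disease-by-test table."""
    a: int  # disease present, test positive
    b: int  # disease absent, test positive
    c: int  # disease present, test negative
    d: int  # disease absent, test negative

    @property
    def sensitivity(self):
        """P(T | D) = a / (a + c)."""
        return self.a / (self.a + self.c)

    @property
    def specificity(self):
        """P(not T | not D) = d / (b + d)."""
        return self.d / (self.b + self.d)

    @property
    def false_positive(self):
        """P(T | not D) = b / (b + d)."""
        return self.b / (self.b + self.d)

    @property
    def false_negative(self):
        """P(not T | D) = c / (a + c)."""
        return self.c / (self.a + self.c)


# Hypothetical counts, chosen only to exercise the formulas.
example = ScreeningTable(a=90, b=30, c=10, d=870)
print(example.sensitivity, example.specificity)
```

Within each disease column the two probabilities are complementary: sensitivity plus the false negative probability equals 1, and specificity plus the false positive probability equals 1.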

Definitions of the Predictive Value Positive and Predictive Value Negative of a Screening Test:
1. The predictive value positive of a screening test:
The predictive value positive is the probability that a subject has the disease, given that the subject has a positive screening test result:
$P(D \mid T)$ = P(the subject has the disease | positive result)
= P(presence of the disease | positive result)
2. The predictive value negative of a screening test:
The predictive value negative is the probability that a subject does not have the disease, given that the subject has a negative screening test result:
$P(\bar{D} \mid \bar{T})$ = P(the subject does not have the disease | negative result)
= P(absence of the disease | negative result)

Calculating the Predictive Value Positive and Predictive Value Negative:
(How to calculate $P(D \mid T)$ and $P(\bar{D} \mid \bar{T})$):
We calculate these conditional probabilities using the knowledge of:
1. The sensitivity of the test = $P(T \mid D)$
2. The specificity of the test = $P(\bar{T} \mid \bar{D})$
3. The probability of the relevant disease in the general population, P(D). (It is usually obtained from another independent study.)

Calculating the Predictive Value Positive, $P(D \mid T)$:
$P(D \mid T) = \dfrac{P(T \cap D)}{P(T)}$

But we know that:
$P(T) = P(T \cap D) + P(T \cap \bar{D})$
$P(T \cap D) = P(T \mid D)\, P(D)$ (multiplication rule)
$P(T \cap \bar{D}) = P(T \mid \bar{D})\, P(\bar{D})$ (multiplication rule)
$P(T) = P(T \mid D)\, P(D) + P(T \mid \bar{D})\, P(\bar{D})$

Therefore, we reach the following version of Bayes' Theorem:
$P(D \mid T) = \dfrac{P(T \mid D)\, P(D)}{P(T \mid D)\, P(D) + P(T \mid \bar{D})\, P(\bar{D})}$ (1)

Note:
$P(T \mid D)$ = sensitivity.
$P(T \mid \bar{D}) = 1 - P(\bar{T} \mid \bar{D}) = 1 -$ specificity.
$P(D)$ = The probability of the relevant disease in the general population.
$P(\bar{D}) = 1 - P(D)$.

Calculating the Predictive Value Negative, $P(\bar{D} \mid \bar{T})$:
To obtain the predictive value negative of a screening test, we use the following statement of Bayes' theorem:
$P(\bar{D} \mid \bar{T}) = \dfrac{P(\bar{T} \mid \bar{D})\, P(\bar{D})}{P(\bar{T} \mid \bar{D})\, P(\bar{D}) + P(\bar{T} \mid D)\, P(D)}$ (2)

Note:
$P(\bar{T} \mid \bar{D})$ = specificity.
$P(\bar{T} \mid D) = 1 - P(T \mid D) = 1 -$ sensitivity.

Example:
A medical research team wished to evaluate a proposed screening test for Alzheimer's disease. The test was given to a random sample of 450 patients with Alzheimer's disease and an independent random sample of 500 patients without symptoms of the disease. The two samples were drawn from populations of subjects who were 65 years of age or older. The results are as follows:

| Test Result | Alzheimer Disease Present (D) | Alzheimer Disease Absent ($\bar{D}$) | Total |
|---|---|---|---|
| Positive (T) | 436 | 5 | 441 |
| Negative ($\bar{T}$) | 14 | 495 | 509 |
| Total | 450 | 500 | 950 |

Based on another independent study, it is known that the percentage of patients with Alzheimer's disease (the rate of prevalence of the disease) is 11.3% out of all subjects who were 65 years of age or older.

Solution:
Using these data we estimate the following quantities:
1. The sensitivity of the test:
$P(T \mid D) = \dfrac{n(T \cap D)}{n(D)} = \dfrac{436}{450} = 0.9689$
2. The specificity of the test:
$P(\bar{T} \mid \bar{D}) = \dfrac{n(\bar{T} \cap \bar{D})}{n(\bar{D})} = \dfrac{495}{500} = 0.99$
3. The probability of the disease in the general population, P(D):
The rate of disease in the relevant general population, P(D), cannot be computed from the sample data given in the table. However, it is given that the percentage of patients with Alzheimer's disease is 11.3% out of all subjects who were 65 years of age or older. Therefore P(D) can be computed to be:
$P(D) = \dfrac{11.3\%}{100\%} = 0.113$
4. The predictive value positive of the test:
We wish to estimate the probability that a subject who is positive on the test has Alzheimer disease. We use the Bayes' formula of Equation (1):
$P(D \mid T) = \dfrac{P(T \mid D)\, P(D)}{P(T \mid D)\, P(D) + P(T \mid \bar{D})\, P(\bar{D})}$
From the tabulated data we compute:

$P(T \mid D) = \dfrac{436}{450} = 0.9689$ (From part no. 1)
$P(T \mid \bar{D}) = \dfrac{n(T \cap \bar{D})}{n(\bar{D})} = \dfrac{5}{500} = 0.01$
Substituting these results into Equation (1), we get:
$P(D \mid T) = \dfrac{(0.9689)\, P(D)}{(0.9689)\, P(D) + (0.01)\, P(\bar{D})} = \dfrac{(0.9689)(0.113)}{(0.9689)(0.113) + (0.01)(1 - 0.113)} = 0.93$
As we see, in this case, the predictive value positive of the test is very high.
5. The predictive value negative of the test:
We wish to estimate the probability that a subject who is negative on the test does not have Alzheimer disease. We use the Bayes' formula of Equation (2):
$P(\bar{D} \mid \bar{T}) = \dfrac{P(\bar{T} \mid \bar{D})\, P(\bar{D})}{P(\bar{T} \mid \bar{D})\, P(\bar{D}) + P(\bar{T} \mid D)\, P(D)}$
To compute $P(\bar{D} \mid \bar{T})$, we first compute the following probabilities:
$P(\bar{T} \mid \bar{D}) = \dfrac{495}{500} = 0.99$ (From part no. 2)
$P(\bar{D}) = 1 - P(D) = 1 - 0.113 = 0.887$
$P(\bar{T} \mid D) = \dfrac{n(\bar{T} \cap D)}{n(D)} = \dfrac{14}{450} = 0.0311$
Substitution in Equation (2) gives:
$P(\bar{D} \mid \bar{T}) = \dfrac{(0.99)(0.887)}{(0.99)(0.887) + (0.0311)(0.113)} = 0.996$
As we see, the predictive value negative is also very high.
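Equations (1) and (2) can be evaluated for any sensitivity, specificity and prevalence. The sketch below is a minimal Python illustration, not part of the original notes; the function name `predictive_values` is hypothetical, and the inputs are the Alzheimer example figures (436/450, 495/500 and a prevalence of 0.113).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem, Equations (1) and (2)."""
    p_not_d = 1 - prevalence

    # Equation (1): P(D | T)
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * p_not_d
    ppv = true_pos / (true_pos + false_pos)

    # Equation (2): P(not D | not T)
    true_neg = specificity * p_not_d
    false_neg = (1 - sensitivity) * prevalence
    npv = true_neg / (true_neg + false_neg)

    return ppv, npv


# Alzheimer screening example: sensitivity 436/450, specificity 495/500,
# prevalence 11.3% taken from the independent study.
ppv, npv = predictive_values(436 / 450, 495 / 500, 0.113)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With unrounded inputs the predictive value positive comes out near 0.925, which agrees with the worked answer of 0.93 after rounding to two decimals.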

CHAPTER 4: Probabilistic Features of Certain Data Distribution (Probability Distributions)

4.1 Introduction:
The concept of random variables is very important in Statistics. Some events can be defined using random variables.
There are two types of random variables:
Discrete Random Variables
Continuous Random Variables

4.2 Probability Distributions of Discrete Random Variables:
Definition:
The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of the random variable along with their respective probabilities.

Examples of discrete r.v.'s:
The no. of patients visiting KKUH in a week.
The no. of times a person had a cold in the last year.

Example:
Consider the following discrete random variable.
X = The number of times a Saudi person had a cold in January 2010.
Suppose we are able to count the no. of Saudis for which X = x:

| x (no. of colds a Saudi person had in January 2010) | Frequency of x (no. of Saudi people who had a cold x times in January 2010) |
|---|---|
| 0 | 10,000,000 |
| 1 | 3,000,000 |
| 2 | 2,000,000 |
| 3 | 1,000,000 |
| Total | N = 16,000,000 |

Note that the possible values of the random variable X are: x = 0, 1, 2, 3

Experiment: Selecting a person at random
Define the event:
(X = 0) = The event that the selected person had no cold.
(X = 1) = The event that the selected person had 1 cold.
(X = 2) = The event that the selected person had 2 colds.
(X = 3) = The event that the selected person had 3 colds.
In general:
(X = x) = The event that the selected person had x colds.

For this experiment, there are $n(\Omega) = 16{,}000{,}000$ equally likely outcomes.
The number of elements of the event (X = x) is:
n(X = x) = no. of Saudi people who had a cold x times in January 2010 = frequency of x.
The probability of the event (X = x) is:
$P(X = x) = \dfrac{n(X = x)}{n(\Omega)} = \dfrac{n(X = x)}{16{,}000{,}000}$, for x = 0, 1, 2, 3

| x | freq. of x = n(X = x) | $P(X = x) = \dfrac{n(X = x)}{16{,}000{,}000}$ (Relative frequency) |
|---|---|---|
| 0 | 10,000,000 | 0.6250 |
| 1 | 3,000,000 | 0.1875 |
| 2 | 2,000,000 | 0.1250 |
| 3 | 1,000,000 | 0.0625 |
| Total | 16,000,000 | 1.0000 |

Note:
$P(X = x) = \dfrac{n(X = x)}{16{,}000{,}000}$ = Relative Frequency $= \dfrac{\text{frequency}}{16{,}000{,}000}$

The probability distribution of the discrete random variable X is given by the following table:

| x | P(X = x) = f(x) |
|---|---|
| 0 | 0.6250 |
| 1 | 0.1875 |
| 2 | 0.1250 |
| 3 | 0.0625 |
| Total | 1.0000 |

Notes:
The probability distribution of any discrete random variable X must satisfy the following two properties:
(1) $0 \le P(X = x) \le 1$
(2) $\sum_x P(X = x) = 1$
Using the probability distribution of a discrete r.v. we can find the probability of any event expressed in terms of the r.v. X.

Example:
Consider the discrete r.v. X in the previous example.

| x | P(X = x) |
|---|---|
| 0 | 0.6250 |
| 1 | 0.1875 |
| 2 | 0.1250 |
| 3 | 0.0625 |
| Total | 1.0000 |

(1) $P(X \ge 2) = P(X = 2) + P(X = 3) = 0.1250 + 0.0625 = 0.1875$
(2) $P(X > 2) = P(X = 3) = 0.0625$ [note: $P(X > 2) \neq P(X \ge 2)$]
(3) $P(1 \le X < 3) = P(X = 1) + P(X = 2) = 0.1875 + 0.1250 = 0.3125$
(4) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.6250 + 0.1875 + 0.1250 = 0.9375$
another solution:
$P(X \le 2) = 1 - P(X > 2) = 1 - P(X = 3) = 1 - 0.0625 = 0.9375$
(5) $P(-1 \le X < 2) = P(X = 0) + P(X = 1) = 0.6250 + 0.1875 = 0.8125$

(6) $P(-1.5 \le X < 1.3) = P(X = 0) + P(X = 1) = 0.6250 + 0.1875 = 0.8125$
(7) $P(X = 3.5) = P(\phi) = 0$
(8) $P(X \le 10) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = P(\Omega) = 1$
(9) The probability that the selected person had at least 2 colds:
$P(X \ge 2) = P(X = 2) + P(X = 3) = 0.1875$
(10) The probability that the selected person had at most 2 colds:
$P(X \le 2) = 0.9375$
(11) The probability that the selected person had more than 2 colds:
$P(X > 2) = P(X = 3) = 0.0625$
(12) The probability that the selected person had less than 2 colds:
$P(X < 2) = P(X = 0) + P(X = 1) = 0.8125$

Graphical Presentation:
The probability distribution of a discrete r.v. X can be graphically represented.

Example:
The probability distribution of the random variable in the previous example is:

| x | P(X = x) |
|---|---|
| 0 | 0.6250 |
| 1 | 0.1875 |
| 2 | 0.1250 |
| 3 | 0.0625 |

The graphical presentation of this probability distribution is given by the following figure:
[Figure: graph of P(X = x) against x]

Mean and Variance of a Discrete Random Variable

Mean: The mean (or expected value) of a discrete random variable X is denoted by $\mu$ or $\mu_X$. It is defined by:
$\mu = \sum_x x\, P(X = x)$

Variance: The variance of a discrete random variable X is denoted by $\sigma^2$ or $\sigma_X^2$. It is defined by:
$\sigma^2 = \sum_x (x - \mu)^2\, P(X = x)$

Example:
We wish to calculate the mean $\mu$ and the variance $\sigma^2$ of the discrete r.v. X whose probability distribution is given by the following table:

| x | P(X = x) |
|---|---|
| 0 | 0.05 |
| 1 | 0.25 |
| 2 | 0.45 |
| 3 | 0.25 |

Solution:

| x | P(X = x) | x P(X = x) | $(x - \mu)$ | $(x - \mu)^2$ | $(x - \mu)^2 P(X = x)$ |
|---|---|---|---|---|---|
| 0 | 0.05 | 0 | -1.9 | 3.61 | 0.1805 |
| 1 | 0.25 | 0.25 | -0.9 | 0.81 | 0.2025 |
| 2 | 0.45 | 0.90 | 0.1 | 0.01 | 0.0045 |
| 3 | 0.25 | 0.75 | 1.1 | 1.21 | 0.3025 |
| Total | 1.00 | $\mu = \sum x\, P(X = x) = 1.9$ | | | $\sigma^2 = \sum (x - \mu)^2 P(X = x) = 0.69$ |

$\mu = \sum_x x\, P(X = x) = (0)(0.05) + (1)(0.25) + (2)(0.45) + (3)(0.25) = 1.9$
$\sigma^2 = \sum_x (x - 1.9)^2\, P(X = x) = (0 - 1.9)^2(0.05) + (1 - 1.9)^2(0.25) + (2 - 1.9)^2(0.45) + (3 - 1.9)^2(0.25) = 0.69$
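The two defining sums can be evaluated mechanically for any probability table. The following Python sketch is illustrative only; the function name `mean_and_variance` is hypothetical, and the probability table is the one from the example above.

```python
def mean_and_variance(pmf):
    """Mean and variance of a discrete r.v. given as {value: probability}."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var


# Probability distribution from the example above.
pmf = {0: 0.05, 1: 0.25, 2: 0.45, 3: 0.25}
mu, var = mean_and_variance(pmf)
print(round(mu, 2), round(var, 2))
```

The check on the total guards against a table that is not a valid probability distribution, which is property (2) of any discrete distribution.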

Cumulative Distributions:
The cumulative distribution function of a discrete r.v. X is defined by:
$P(X \le x) = \sum_{a \le x} P(X = a)$ (Sum over all values $\le x$)

Example:
Calculate the cumulative distribution of the discrete r.v. X whose probability distribution is given by the following table:

| x | P(X = x) |
|---|---|
| 0 | 0.05 |
| 1 | 0.25 |
| 2 | 0.45 |
| 3 | 0.25 |

Use the cumulative distribution to find:
$P(X \le 2)$, $P(X < 2)$, $P(X \le 1.5)$, $P(X < 1.5)$, $P(X > 1)$, $P(X \ge 1)$

Solution:
The cumulative distribution of X is:

| x | $P(X \le x)$ | |
|---|---|---|
| 0 | 0.05 | $P(X \le 0) = P(X = 0)$ |
| 1 | 0.30 | $P(X \le 1) = P(X = 0) + P(X = 1)$ |
| 2 | 0.75 | $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$ |
| 3 | 1.00 | $P(X \le 3) = P(X = 0) + \cdots + P(X = 3)$ |

Using the cumulative distribution,
$P(X \le 2) = 0.75$
$P(X < 2) = P(X \le 1) = 0.30$
$P(X \le 1.5) = P(X \le 1) = 0.30$
$P(X < 1.5) = P(X \le 1) = 0.30$
$P(X > 1) = 1 - P(X \le 1) = 1 - 0.30 = 0.70$
$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X \le 0) = 1 - 0.05 = 0.95$

Example: (Reading Assignment)
Given the following probability distribution of a discrete random variable X representing the number of defective teeth of the patient visiting a

certain dental clinic:

| x | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(X = x) | 0.25 | 0.35 | 0.20 | 0.15 | K |

a) Find the value of K.
b) Find the following probabilities:
1. P(X < 3)
2. P(X ≤ 3)
3. P(X < 6)
4. P(X < 1)
5. P(X = 3.5)
c) Find the probability that the patient has at least 4 defective teeth.
d) Find the probability that the patient has at most 2 defective teeth.
e) Find the expected number of defective teeth (mean of X).
f) Find the variance of X.

Solution:
a) $1 = \sum P(X = x) = 0.25 + 0.35 + 0.20 + 0.15 + K$
$1 = 0.95 + K$
$K = 0.05$
The probability distribution of X is:

| x | P(X = x) |
|---|---|
| 1 | 0.25 |
| 2 | 0.35 |
| 3 | 0.20 |
| 4 | 0.15 |
| 5 | 0.05 |
| Total | 1.00 |

b) Finding the probabilities:
1. P(X < 3) = P(X=1) + P(X=2) = 0.25 + 0.35 = 0.60
2. P(X ≤ 3) = P(X=1) + P(X=2) + P(X=3) = 0.25 + 0.35 + 0.20 = 0.8
3. P(X < 6) = P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) = P(Ω) = 1
4. P(X < 1) = P(φ) = 0
5. P(X = 3.5) = P(φ) = 0
c) The probability that the patient has at least 4 defective teeth:
P(X ≥ 4) = P(X=4) + P(X=5) = 0.15 + 0.05 = 0.2
d) The probability that the patient has at most 2 defective teeth:
P(X ≤ 2) = P(X=1) + P(X=2) = 0.25 + 0.35 = 0.6

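e) The expected number of defective teeth (mean of X):

| x | P(X = x) | x P(X = x) |
|---|---|---|
| 1 | 0.25 | 0.25 |
| 2 | 0.35 | 0.70 |
| 3 | 0.20 | 0.60 |
| 4 | 0.15 | 0.60 |
| 5 | 0.05 | 0.25 |
| Total | $\sum P(X = x) = 1$ | $\mu = \sum x\, P(X = x) = 2.4$ |

The expected number of defective teeth (mean of X) is
$\mu = \sum x\, P(X = x) = (1)(0.25) + (2)(0.35) + (3)(0.20) + (4)(0.15) + (5)(0.05) = 2.4$

f) The variance of X:

| x | P(X = x) | $(x - \mu)$ | $(x - \mu)^2$ | $(x - \mu)^2 P(X = x)$ |
|---|---|---|---|---|
| 1 | 0.25 | -1.4 | 1.96 | 0.490 |
| 2 | 0.35 | -0.4 | 0.16 | 0.056 |
| 3 | 0.20 | 0.6 | 0.36 | 0.072 |
| 4 | 0.15 | 1.6 | 2.56 | 0.384 |
| 5 | 0.05 | 2.6 | 6.76 | 0.338 |
| Total | | | | $\sigma^2 = \sum (x - \mu)^2 P(X = x) = 1.34$ |

The variance is $\sigma^2 = \sum (x - \mu)^2\, P(X = x) = 1.34$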
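The whole defective-teeth exercise, from solving for K to the variance, follows from the requirement that the probabilities sum to 1. The sketch below is a minimal Python illustration using exact fractions; the helper `event_prob` and the variable names are hypothetical and not part of the original notes.

```python
from fractions import Fraction

# Known probabilities for x = 1..4; P(X = 5) = K is unknown.
known = {
    1: Fraction(25, 100),
    2: Fraction(35, 100),
    3: Fraction(20, 100),
    4: Fraction(15, 100),
}

# a) The probabilities must sum to 1, so K is whatever is left over.
k = 1 - sum(known.values())
pmf = {**known, 5: k}


def event_prob(pmf, condition):
    """Sum P(X = x) over the values x that satisfy the condition."""
    return sum(p for x, p in pmf.items() if condition(x))


# c) at least 4 defective teeth, d) at most 2 defective teeth
p_at_least_4 = event_prob(pmf, lambda x: x >= 4)
p_at_most_2 = event_prob(pmf, lambda x: x <= 2)

# e) mean and f) variance from the defining sums
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(float(k), float(p_at_least_4), float(p_at_most_2), float(mu), float(var))
```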

Combinations:
Notation (n!):
n! is read "n factorial". It is defined by:
$n! = n(n - 1)(n - 2) \cdots (2)(1)$ for $n \ge 1$
$0! = 1$
Example: $5! = (5)(4)(3)(2)(1) = 120$

Combinations:
The number of different ways for selecting r objects from n distinct objects is denoted by ${}_nC_r$ or $\binom{n}{r}$ and is given by:
${}_nC_r = \dfrac{n!}{r!\,(n - r)!}$; for $r = 0, 1, 2, \ldots, n$

Notes:
1. ${}_nC_r$ is read as "n choose r".
2. ${}_nC_n = 1$, ${}_nC_0 = 1$
3. ${}_nC_r = {}_nC_{n-r}$ (for example: ${}_{10}C_3 = {}_{10}C_7$)
4. ${}_nC_r$ = number of unordered subsets of a set of (n) objects such that each subset contains (r) objects.

Example: For n = 4 and r = 2:
${}_4C_2 = \dfrac{4!}{2!\,(4 - 2)!} = \dfrac{4!}{2!\,2!} = \dfrac{4 \times 3 \times 2 \times 1}{(2 \times 1)(2 \times 1)} = 6$
${}_4C_2 = 6$ = The number of different ways for selecting 2 objects from 4 distinct objects.

Example: Suppose that we have the set {a, b, c, d} of (n = 4) objects.
We wish to choose a subset of two objects. The possible subsets of this set with 2 elements in each subset are:
{a, b}, {a, c}, {a, d}, {b, d}, {b, c}, {c, d}
The number of these subsets is ${}_4C_2 = 6$.
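The factorial formula and the subset-counting interpretation can be compared side by side. This is a minimal Python sketch, not part of the original notes; `n_choose_r` is a hypothetical helper that follows the formula above, while `math.comb`, `math.factorial` and `itertools.combinations` are standard-library functions (`math.comb` requires Python 3.8 or later).

```python
import math
from itertools import combinations


def n_choose_r(n, r):
    """nCr = n! / (r! (n - r)!), computed from the factorial definition."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Enumerate the unordered 2-element subsets of {a, b, c, d}.
subsets = [set(pair) for pair in combinations("abcd", 2)]

print(math.factorial(5))             # 5! from the example
print(n_choose_r(4, 2), len(subsets))
print(n_choose_r(10, 3) == n_choose_r(10, 7))
```

Listing the subsets explicitly is practical only for small n; for larger values the formula (or `math.comb`) gives the count without enumeration.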

Binomial Distribution:
Bernoulli Trial: is an experiment with only two possible outcomes: S = success and F = failure (boy or girl, Saudi or non-Saudi, sick or well, dead or alive).
Binomial distribution is a discrete distribution.
Binomial distribution is used to model an experiment for which:
1. The experiment has a sequence of n Bernoulli trials.
2. The probability of success is $P(S) = p$, and the probability of failure is $P(F) = 1 - p = q$.
3. The probability of success $P(S) = p$ is constant for each trial.
4. The trials are independent; that is, the outcome of one trial has no effect on the outcome of any other trial.

In this type of experiment, we are interested in the discrete r.v. representing the number of successes in the n trials.
X = The number of successes in the n trials
The possible values of X (number of successes in n trials) are:
x = 0, 1, 2, 3, …, n
The r.v. X has a binomial distribution with parameters n and p, and we write:
X ~ Binomial(n, p)
The probability distribution of X is given by:
$P(X = x) = \begin{cases} {}_nC_x\, p^x q^{n - x} & \text{for } x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$
Where: ${}_nC_x = \dfrac{n!}{x!\,(n - x)!}$

We can write the probability distribution of X as a table as follows.

| x | P(X = x) |
|---|---|
| 0 | ${}_nC_0\, p^0 q^{n-0} = q^n$ |
| 1 | ${}_nC_1\, p^1 q^{n-1}$ |

| x | P(X = x) |
|---|---|
| 2 | ${}_nC_2\, p^2 q^{n-2}$ |
| ⋮ | ⋮ |
| n - 1 | ${}_nC_{n-1}\, p^{n-1} q^{1}$ |
| n | ${}_nC_n\, p^n q^0 = p^n$ |
| Total | 1.00 |

Result: (Mean and Variance for Binomial distribution)
If X ~ Binomial(n, p), then
The mean: $\mu = np$ (expected value)
The variance: $\sigma^2 = npq$

Example:
Suppose that the probability that a Saudi man has high blood pressure is 0.15. Suppose that we randomly select a sample of 6 Saudi men.
(1) Find the probability distribution of the random variable (X) representing the number of men with high blood pressure in the sample.
(2) Find the expected number of men with high blood pressure in the sample (mean of X).
(3) Find the variance of X.
(4) What is the probability that there will be exactly 2 men with high blood pressure?
(5) What is the probability that there will be at most 2 men with high blood pressure?
(6) What is the probability that there will be at least 4 men with high blood pressure?

Solution:
We are interested in the following random variable:
X = The number of men with high blood pressure in the sample of 6 men.
Notes:
- Bernoulli trial: diagnosing whether a man has high blood pressure or not. There are two outcomes for each trial:

S = Success: The man has high blood pressure
F = Failure: The man does not have high blood pressure.
- Number of trials = 6 (we need to check 6 men)
- Probability of success: $P(S) = p = 0.15$
- Probability of failure: $P(F) = q = 1 - p = 0.85$
- Number of trials: n = 6
- The trials are independent because the result of each man does not affect the result of any other man, since the selection was made at random.

The random variable X has a binomial distribution with parameters n = 6 and p = 0.15, that is:
X ~ Binomial(n, p)
X ~ Binomial(6, 0.15)
The possible values of X are: x = 0, 1, 2, 3, 4, 5, 6

(1) The probability distribution of X is:
$P(X = x) = \begin{cases} {}_6C_x\, (0.15)^x (0.85)^{6 - x} & x = 0, 1, 2, 3, 4, 5, 6 \\ 0 & \text{otherwise} \end{cases}$

The probabilities of all values of X are:
$P(X = 0) = {}_6C_0\, (0.15)^0 (0.85)^6 = (1)(0.15)^0 (0.85)^6 = 0.37715$
$P(X = 1) = {}_6C_1\, (0.15)^1 (0.85)^5 = (6)(0.15)(0.85)^5 = 0.39933$
$P(X = 2) = {}_6C_2\, (0.15)^2 (0.85)^4 = (15)(0.15)^2 (0.85)^4 = 0.17618$
$P(X = 3) = {}_6C_3\, (0.15)^3 (0.85)^3 = (20)(0.15)^3 (0.85)^3 = 0.04145$
$P(X = 4) = {}_6C_4\, (0.15)^4 (0.85)^2 = (15)(0.15)^4 (0.85)^2 = 0.00549$
$P(X = 5) = {}_6C_5\, (0.15)^5 (0.85)^1 = (6)(0.15)^5 (0.85)^1 = 0.00039$
$P(X = 6) = {}_6C_6\, (0.15)^6 (0.85)^0 = (1)(0.15)^6 (1) = 0.00001$

The probability distribution of X can be presented by the following table:

| x | P(X = x) |
|---|---|
| 0 | 0.37715 |
| 1 | 0.39933 |
| 2 | 0.17618 |
| 3 | 0.04145 |
| 4 | 0.00549 |
| 5 | 0.00039 |
| 6 | 0.00001 |

The probability distribution of X can be presented by the following graph:
[Figure: graph of P(X = x) against x for X ~ Binomial(6, 0.15)]

(2) The mean of the distribution (the expected number of men out of 6 with high blood pressure) is:
$\mu = np = (6)(0.15) = 0.9$
(3) The variance is:
$\sigma^2 = npq = (6)(0.15)(0.85) = 0.765$
(4) The probability that there will be exactly 2 men with high blood pressure is:
P(X = 2) = 0.17618
(5) The probability that there will be at most 2 men with high blood pressure is:
P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2) = 0.37715 + 0.39933 + 0.17618 = 0.95266
(6) The probability that there will be at least 4 men with high blood pressure is:

P(X ≥ 4) = P(X=4) + P(X=5) + P(X=6) = 0.00549 + 0.00039 + 0.00001 = 0.00589

Example: (Reading Assignment)
Suppose that 25% of the people in a certain population have low hemoglobin levels. The experiment is to choose 5 people at random from this population. Let the discrete random variable X be the number of people out of 5 with low hemoglobin levels.
a) Find the probability distribution of X.
b) Find the probability that at least 2 people have low hemoglobin levels.
c) Find the probability that at most 3 people have low hemoglobin levels.
d) Find the expected number of people with low hemoglobin levels out of the 5 people.
e) Find the variance of the number of people with low hemoglobin levels out of the 5 people.

Solution:
X = the number of people out of 5 with low hemoglobin levels
The Bernoulli trial is the process of diagnosing the person
Success = the person has low hemoglobin
Failure = the person does not have low hemoglobin
n = 5 (no. of trials)
p = 0.25 (probability of success)
q = 1 - p = 0.75 (probability of failure)

a) X has a binomial distribution with parameters n = 5 and p = 0.25
X ~ Binomial(n, p)
X ~ Binomial(5, 0.25)
The possible values of X are: x = 0, 1, 2, 3, 4, 5
The probability distribution is:
$P(X = x) = \begin{cases} {}_nC_x\, p^x q^{n - x} & \text{for } x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$
$P(X = x) = \begin{cases} {}_5C_x\, (0.25)^x (0.75)^{5 - x} & \text{for } x = 0, 1, 2, 3, 4, 5 \\ 0 & \text{otherwise} \end{cases}$

| x | P(X = x) |
|---|---|
| 0 | ${}_5C_0\, (0.25)^0 (0.75)^5 = 0.23730$ |

| x | P(X = x) |
|---|---|
| 1 | ${}_5C_1\, (0.25)^1 (0.75)^4 = 0.39551$ |
| 2 | ${}_5C_2\, (0.25)^2 (0.75)^3 = 0.26367$ |
| 3 | ${}_5C_3\, (0.25)^3 (0.75)^2 = 0.08789$ |
| 4 | ${}_5C_4\, (0.25)^4 (0.75)^1 = 0.01465$ |
| 5 | ${}_5C_5\, (0.25)^5 (0.75)^0 = 0.00098$ |
| Total | $\sum P(X = x) = 1$ |

b) The probability that at least 2 people have low hemoglobin levels:
P(X ≥ 2) = P(X=2) + P(X=3) + P(X=4) + P(X=5) = 0.26367 + 0.08789 + 0.01465 + 0.00098 = 0.36719
c) The probability that at most 3 people have low hemoglobin levels:
P(X ≤ 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = 0.23730 + 0.39551 + 0.26367 + 0.08789 = 0.98437
d) The expected number of people with low hemoglobin levels out of the 5 people (the mean of X):
$\mu = np = 5 \times 0.25 = 1.25$
e) The variance of the number of people with low hemoglobin levels out of the 5 people (the variance of X) is:
$\sigma^2 = npq = 5 \times 0.25 \times 0.75 = 0.9375$

The Poisson Distribution:
It is a discrete distribution.
The Poisson distribution is used to model a discrete r.v. representing the number of occurrences of some random event in an interval of time or space (or some volume of matter).
The possible values of X are: x = 0, 1, 2, 3, …
The discrete r.v. X is said to have a Poisson distribution with parameter (average or mean) λ if the probability distribution of X is given by

$$P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!} & \text{for } x = 0, 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$
where e = 2.71828 (the natural number).
We write: X ~ Poisson(λ)

Result: (Mean and Variance of the Poisson distribution)
If X ~ Poisson(λ), then:
The mean (average) of X is: µ = λ (expected value)
The variance of X is: σ² = λ

Example:
Some random quantities that can be modeled by the Poisson distribution:
No. of patients in a waiting room in an hour.
No. of surgeries performed in a month.
No. of rats in each house in a particular city.

Note:
λ is the average (mean) of the distribution.
If X = the number of patients seen in the emergency unit in a day, and if X ~ Poisson(λ), then:
1. The average (mean) of patients seen every day in the emergency unit = λ.
2. The average (mean) of patients seen every month in the emergency unit = 30λ.
3. The average (mean) of patients seen every year in the emergency unit = 365λ.
4. The average (mean) of patients seen every hour in the emergency unit = λ/24.
Also, notice that:
(i) If Y = the number of patients seen every month, then:

Y ~ Poisson(λ*), where λ* = 30λ
(ii) If W = the number of patients seen every year, then:
W ~ Poisson(λ*), where λ* = 365λ
(iii) If V = the number of patients seen every hour, then:
V ~ Poisson(λ*), where λ* = λ/24

Example:
Suppose that the number of snake bite cases seen at KKUH in a year has a Poisson distribution with average 6 bite cases.
(1) What is the probability that in a year:
(i) The no. of snake bite cases will be 7?
(ii) The no. of snake bite cases will be less than 2?
(2) What is the probability that there will be 10 snake bite cases in 2 years?
(3) What is the probability that there will be no snake bite cases in a month?

Solution:
(1) X = no. of snake bite cases in a year.
X ~ Poisson(6) (λ = 6)
$$P(X = x) = \frac{e^{-6}\, 6^{x}}{x!}; \quad x = 0, 1, 2, \ldots$$
(i) $$P(X = 7) = \frac{e^{-6}\, 6^{7}}{7!} = 0.13768$$
(ii) $$P(X < 2) = P(X = 0) + P(X = 1) = \frac{e^{-6}\, 6^{0}}{0!} + \frac{e^{-6}\, 6^{1}}{1!} = 0.00248 + 0.01487 = 0.01735$$
(2) Y = no. of snake bite cases in 2 years.
Y ~ Poisson(12) (λ* = 2λ = (2)(6) = 12)
$$P(Y = y) = \frac{e^{-12}\, 12^{y}}{y!}; \quad y = 0, 1, 2, \ldots$$
$$P(Y = 10) = \frac{e^{-12}\, 12^{10}}{10!} = 0.1048$$
(3) W = no. of snake bite cases in a month.
W ~ Poisson(0.5) (λ** = λ/12 = 6/12 = 0.5)

$$P(W = w) = \frac{e^{-0.5}\,(0.5)^{w}}{w!}; \quad w = 0, 1, 2, \ldots$$
$$P(W = 0) = \frac{e^{-0.5}\,(0.5)^{0}}{0!} = 0.6065$$

Continuous Probability Distributions:
For any continuous r. v. X, there exists a function f(x), called the probability density function (pdf) of X, for which:
(1) The total area under the curve of f(x) equals 1:
$$\text{Total area} = \int_{-\infty}^{\infty} f(x)\,dx = 1$$
(2) The probability that X is between the points a and b equals the area under the curve of f(x) that is bounded by the points a and b:
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = \text{area}$$
(3) In general, the probability of an interval event is given by the area under the curve of f(x) and above that interval:
$$P(X \le a) = \int_{-\infty}^{a} f(x)\,dx = \text{area}$$
$$P(X \ge b) = \int_{b}^{\infty} f(x)\,dx = \text{area}$$
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = \text{area}$$

Note: If X is a continuous r. v., then:
1. P(X = a) = 0 for any a.
2. P(X ≤ a) = P(X < a)

3. P(X ≥ b) = P(X > b)
4. P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b)
5. P(X ≤ x) = cumulative probability
6. P(X ≥ a) = 1 − P(X < a) = 1 − P(X ≤ a)
7. P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a)
In terms of areas:
$$\int_{a}^{b} f(x)\,dx = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx$$

4.6 The Normal Distribution:
One of the most important continuous distributions.
Many measurable characteristics are normally or approximately normally distributed. (Examples: height, weight, ...)
The probability density function of the normal distribution is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}; \quad -\infty < x < \infty$$
where e = 2.71828 and π = 3.14159.
The parameters of the distribution are the mean (µ) and the standard deviation (σ).
The continuous r. v. X which has a normal distribution has several important characteristics:
1. −∞ < X < ∞
2. The density function of X, f(x), has a bell-shaped curve.

[Figure: bell-shaped normal curve with mean = µ, standard deviation = σ, variance = σ²]
3. The highest point of the curve of f(x) is at the mean µ. (Mode = µ)
4. The curve of f(x) is symmetric about the mean µ.
µ = mean = mode = median
5. The normal distribution depends on two parameters:
mean = µ (determines the location)
standard deviation = σ (determines the shape)
6. If the r. v. X is normally distributed with mean µ and standard deviation σ (variance σ²), we write:
X ~ Normal(µ, σ²) or X ~ N(µ, σ²)
7. The location of the normal distribution depends on µ. The shape of the normal distribution depends on σ.

To see how the location depends on µ and the shape depends on σ, suppose we have two normal distributions, N(µ1, σ1²) and N(µ2, σ2²):
[Figure: two normal curves with µ1 < µ2, σ1 = σ2]

[Figure: two normal curves with µ1 = µ2, σ1 < σ2]
[Figure: two normal curves with µ1 < µ2, σ1 < σ2]

The Standard Normal Distribution:
The normal distribution with mean µ = 0 and variance σ² = 1 is called the standard normal distribution and is denoted by Normal(0,1) or N(0,1). The standard normal random variable is denoted by Z, and we write:
Z ~ N(0,1)
The probability density function (pdf) of Z ~ N(0,1) is given by:
$$f(z) = n(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2} z^{2}}$$
The standard normal distribution, Normal(0,1), is very important because probabilities of any normal distribution can be calculated from the probabilities of the standard normal distribution.
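The statement above can be illustrated with a short Python sketch. It is only an illustration and assumes nothing beyond the standard `math` module; the function names `std_normal_pdf`, `std_normal_cdf` and `normal_cdf` are hypothetical helpers, not part of the notes. The cumulative probability is written with the error function, which is one common way to evaluate it without a table.

```python
import math


def std_normal_pdf(z):
    """Density of Z ~ N(0, 1): f(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), obtained by standardizing x."""
    return std_normal_cdf((x - mu) / sigma)


peak = std_normal_pdf(0.0)   # highest point of the curve, at the mean
half = std_normal_cdf(0.0)   # by symmetry, half of the area lies below 0
print(round(peak, 4), half, round(normal_cdf(5.0, 5.0, 2.0), 4))
```

Any normal probability is reduced here to a standard normal probability, which is the same idea the Z-table relies on.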

Result:
If X ~ Normal(µ, σ²), then
$$Z = \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1)$$

Calculating Probabilities of Normal(0,1):
Suppose Z ~ Normal(0,1). For the standard normal distribution Z ~ N(0,1), there is a special table used to calculate probabilities of the form P(Z ≤ a).
(i) P(Z ≤ a) = from the table
(ii) P(Z ≥ b) = 1 − P(Z ≤ b)
where P(Z ≤ b) = from the table
(iii) P(a ≤ Z ≤ b) = P(Z ≤ b) − P(Z ≤ a)
where P(Z ≤ b) = from the table and P(Z ≤ a) = from the table

(iv) P(Z = a) = 0 for every a.

Example:
Suppose that Z ~ N(0,1).
(1) P(Z ≤ 1.50) = 0.9332
(2) P(Z ≥ 0.98) = 1 − P(Z ≤ 0.98) = 1 − 0.8365 = 0.1635
(3) P(−1.33 ≤ Z ≤ 2.42) = P(Z ≤ 2.42) − P(Z ≤ −1.33) = 0.9922 − 0.0918 = 0.9004
(4) P(Z ≤ 0) = P(Z ≥ 0) = 0.5

Notation:
P(Z ≤ Z_A) = A

Result:
Since the pdf of Z ~ N(0,1) is symmetric about 0, we have:
Z_A = −Z_{1−A}
For example:
Z_0.35 = −Z_{1−0.35} = −Z_0.65
Z_0.86 = −Z_{1−0.86} = −Z_0.14

Example:
Suppose that Z ~ N(0,1). If P(Z ≤ a) = 0.9505, then a = 1.65 (row 1.6 and column 0.05 of the table).
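The notation Z_A and the symmetry rule can be checked numerically. The sketch below is an illustration only; it assumes Python's standard `statistics.NormalDist` class, and the helper name `z_quantile` is hypothetical. Exact quantiles may differ in the third decimal from values read off a four-decimal table.

```python
from statistics import NormalDist

# Standard normal distribution Z ~ N(0, 1)
Z = NormalDist(mu=0.0, sigma=1.0)


def z_quantile(A):
    """Return Z_A, the value with P(Z <= Z_A) = A."""
    return Z.inv_cdf(A)


# Symmetry about 0: Z_A = -Z_{1-A}
lhs = z_quantile(0.35)
rhs = -z_quantile(1 - 0.35)

# Reverse table lookup: the value a with P(Z <= a) = 0.9505
a = z_quantile(0.9505)
print(round(lhs, 4), round(rhs, 4), round(a, 2))
```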

Example:
Suppose that Z ~ N(0,1). Find the value of k such that P(Z ≤ k) = 0.0207.
Solution:
k = −2.04
Notice that k = Z_0.0207 = −2.04

Example:
If Z ~ N(0,1), then:
Z_0.90 = 1.285
Z_0.95 = 1.645
Z_0.975 = 1.96
Z_0.99 = 2.325
Using the result Z_A = −Z_{1−A}:
Z_0.10 = −Z_0.90 = −1.285
Z_0.05 = −Z_0.95 = −1.645
Z_0.025 = −Z_0.975 = −1.96
Z_0.01 = −Z_0.99 = −2.325

Calculating Probabilities of Normal(µ, σ²):
Recall the result:
$$X \sim \text{Normal}(\mu, \sigma^{2}) \iff Z = \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1)$$

$$X \le a \iff \frac{X - \mu}{\sigma} \le \frac{a - \mu}{\sigma} \iff Z \le \frac{a - \mu}{\sigma}$$
1. $$P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right) = \text{from the table}$$
2. $$P(X \ge a) = 1 - P(X \le a) = 1 - P\left(Z \le \frac{a - \mu}{\sigma}\right)$$
3. $$P(a \le X \le b) = P(X \le b) - P(X \le a) = P\left(Z \le \frac{b - \mu}{\sigma}\right) - P\left(Z \le \frac{a - \mu}{\sigma}\right)$$
4. P(X = a) = 0, for every a.

4.7 Normal Distribution Application:
Example:
Suppose that the hemoglobin levels of healthy adult males are approximately normally distributed with a mean of 16 and a variance of 0.81.
(a) Find the probability that a randomly chosen healthy adult male has a hemoglobin level less than 14.
(b) What is the percentage of healthy adult males who have a hemoglobin level less than 14?
(c) In a population of 10,000 healthy adult males, how many would you expect to have a hemoglobin level less than 14?
Solution:
X = hemoglobin level for healthy adult males
Mean: µ = 16
Variance: σ² = 0.81
Standard deviation: σ = 0.9
X ~ Normal(16, 0.81)
(a) The probability that a randomly chosen healthy adult male has a hemoglobin level less than 14 is P(X ≤ 14).

$$P(X \le 14) = P\left(Z \le \frac{14 - \mu}{\sigma}\right) = P\left(Z \le \frac{14 - 16}{0.9}\right) = P(Z \le -2.22) = 0.0132$$
(b) The percentage of healthy adult males who have a hemoglobin level less than 14 is:
P(X ≤ 14) × 100% = 0.0132 × 100% = 1.32%
(c) In a population of 10,000 healthy adult males, we would expect the number of males with a hemoglobin level less than 14 to be:
P(X ≤ 14) × 10000 = 0.0132 × 10000 = 132 males

Example:
Suppose that the birth weight of Saudi babies has a normal distribution with mean µ = 3.4 and standard deviation σ = 0.35.
(a) Find the probability that a randomly chosen Saudi baby has a birth weight between 3.0 and 4.0 kg.
(b) What is the percentage of Saudi babies who have a birth weight between 3.0 and 4.0 kg?
(c) In a population of 100,000 Saudi babies, how many would you expect to have a birth weight between 3.0 and 4.0 kg?
Solution:
X = birth weight of Saudi babies
Mean: µ = 3.4
Standard deviation: σ = 0.35
Variance: σ² = (0.35)² = 0.1225
X ~ Normal(3.4, 0.1225)
(a) The probability that a randomly chosen Saudi baby has a birth weight between 3.0 and 4.0 kg is P(3.0 < X < 4.0).

$$P(3.0 < X < 4.0) = P(X \le 4.0) - P(X \le 3.0)$$
$$= P\left(Z \le \frac{4.0 - \mu}{\sigma}\right) - P\left(Z \le \frac{3.0 - \mu}{\sigma}\right)$$
$$= P\left(Z \le \frac{4.0 - 3.4}{0.35}\right) - P\left(Z \le \frac{3.0 - 3.4}{0.35}\right)$$
$$= P(Z \le 1.71) - P(Z \le -1.14) = 0.9564 - 0.1271 = 0.8293$$
(b) The percentage of Saudi babies who have a birth weight between 3.0 and 4.0 kg is:
P(3.0 < X < 4.0) × 100% = 0.8293 × 100% = 82.93%
(c) In a population of 100,000 Saudi babies, we would expect the number of babies with a birth weight between 3.0 and 4.0 kg to be:
P(3.0 < X < 4.0) × 100000 = 0.8293 × 100000 = 82930 babies
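The two applications above can be reproduced without a table. The following Python sketch is an illustration, assuming the standard `statistics.NormalDist` class; the variable names are hypothetical. Because the table method rounds z to two decimals first, the software values can differ from the table answers in the third decimal place.

```python
from statistics import NormalDist

# Hemoglobin example: X ~ Normal(16, 0.81), so sigma = 0.9
hemoglobin = NormalDist(mu=16, sigma=0.9)
p_low = hemoglobin.cdf(14)                  # P(X <= 14)
expected_low = 10_000 * p_low               # expected count among 10,000 males

# Birth weight example: X ~ Normal(3.4, 0.35^2)
birth_weight = NormalDist(mu=3.4, sigma=0.35)
p_between = birth_weight.cdf(4.0) - birth_weight.cdf(3.0)   # P(3.0 < X < 4.0)
expected_between = 100_000 * p_between      # expected count among 100,000 babies

print(round(p_low, 4), round(expected_low))
print(round(p_between, 4), round(expected_between))
```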

[Table: Standard Normal Table. Areas Under the Standard Normal Curve, P(Z ≤ z)]

[Table: Standard Normal Table (continued). Areas Under the Standard Normal Curve, P(Z ≤ z)]
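Entries of a standard normal table can be regenerated when a printed table is not at hand. The sketch below is one possible way to do it, assuming Python's standard `statistics.NormalDist` class; the helper `z_table_row` and its row-plus-column layout for non-negative z are assumptions made for illustration and may not match the layout of the printed table.

```python
from statistics import NormalDist


def z_table_row(z_row):
    """Areas P(Z <= z_row + 0.00), ..., P(Z <= z_row + 0.09), to 4 decimals."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [round(std_normal.cdf(z_row + j / 100), 4) for j in range(10)]


# Row z = 1.6: the entry in column 0.05 is P(Z <= 1.65)
row = z_table_row(1.6)
print(row)
```

Reading `row[5]` corresponds to looking up row 1.6 and column 0.05 in a printed table.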

CHAPTER 5: Probabilistic Features of the Distributions of Certain Sample Statistics

5.1 Introduction:
In this chapter we will discuss the probability distributions of some statistics.
As we mentioned earlier, a statistic is a measure computed from the random sample. As the sample values vary from sample to sample, the value of the statistic varies accordingly.
A statistic is a random variable; it has a probability distribution, a mean and a variance.

5.2 Sampling Distribution:
The probability distribution of a statistic is called the sampling distribution of that statistic.
The sampling distribution of the statistic is used to make statistical inference about the unknown parameter.

5.3 Distribution of the Sample Mean:
(Sampling Distribution of the Sample Mean X̄):
Suppose that we have a population with mean µ and variance σ². Suppose that X1, X2, ..., Xn is a random sample of size n selected randomly from this population. We know that the sample mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Suppose that we select several random samples of size n = 5.
[Table: sample values and sample mean X̄ for the 1st sample, 2nd sample, 3rd sample, ..., last sample]

- The value of the sample mean X̄ varies from one random sample to another.
- The value of X̄ is random and it depends on the random sample.
- The sample mean X̄ is a random variable.
- The probability distribution of X̄ is called the sampling distribution of the sample mean X̄.
- Questions:
o What is the sampling distribution of the sample mean X̄?
o What is the mean of the sample mean X̄?
o What is the variance of the sample mean X̄?

Some Results about the Sampling Distribution of X̄:
Result (1): (mean & variance of X̄)
If X1, X2, ..., Xn is a random sample of size n from any distribution with mean µ and variance σ², then:
1. The mean of X̄ is: µ_X̄ = µ.
2. The variance of X̄ is: σ²_X̄ = σ²/n.
3. The standard deviation of X̄ is called the standard error and is defined by:
$$\sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^{2}} = \frac{\sigma}{\sqrt{n}}$$

Result (2): (Sampling from a normal population)
If X1, X2, ..., Xn is a random sample of size n from a normal population with mean µ and variance σ², that is Normal(µ, σ²), then the sample mean has a normal distribution with mean µ and variance σ²/n, that is:
1. $$\bar{X} \sim \text{Normal}\left(\mu, \frac{\sigma^{2}}{n}\right)$$
2. $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0, 1)$$

We use this result when sampling from a normal distribution with known variance σ².

Result (3): (Central Limit Theorem: Sampling from a non-normal population)
Suppose that X1, X2, ..., Xn is a random sample of size n from a non-normal population with mean µ and variance σ². If the sample size n is large (n ≥ 30), then the sample mean has approximately a normal distribution with mean µ and variance σ²/n, that is:
1. $$\bar{X} \approx \text{Normal}\left(\mu, \frac{\sigma^{2}}{n}\right) \quad \text{(approximately)}$$
2. $$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx \text{Normal}(0, 1) \quad \text{(approximately)}$$
Note: "≈" means "approximately distributed".
We use this result when sampling from a non-normal distribution with known variance σ² and with a large sample size.

Result (4): (used when σ² is unknown + normal distribution)
If X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean µ and unknown variance σ², that is Normal(µ, σ²), then the statistic:
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with (n − 1) degrees of freedom, where S is the sample standard deviation given by:
$$S = \sqrt{S^{2}} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{n - 1}}$$
We write:
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$$
Notation: degrees of freedom = df = ν
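Result (1) can be verified exactly on a tiny case by listing every possible sample. The Python sketch below is an illustration under stated assumptions: the population `[2, 4, 6, 8]` is hypothetical, samples of size n = 2 are drawn with replacement, and only the standard library is used.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical population (not from the notes), sampled with replacement
population = [2, 4, 6, 8]
n = 2

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divides by N)

# Every possible ordered sample of size n and its sample mean
sample_means = [mean(sample) for sample in product(population, repeat=n)]

mean_of_xbar = mean(sample_means)       # should equal mu
var_of_xbar = pvariance(sample_means)   # should equal sigma2 / n

print(mu, sigma2, mean_of_xbar, var_of_xbar)
```

All 16 ordered samples are equally likely, so the mean and variance of the listed sample means are exactly the mean and variance of X̄.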

The t-distribution: (Section 6.3. pp 7-74)
Student's t distribution.
The t-distribution is a distribution of a continuous random variable.
Recall that, if X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean µ and variance σ², i.e. N(µ, σ²), then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
We can apply this result only when σ² is known!
If σ² is unknown, we replace the population variance σ² with the sample variance
$$S^{2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^{2}}{n - 1}$$
to have the following statistic:
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
Recall:
If X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean µ and variance σ², i.e. N(µ, σ²), then the statistic
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with (n − 1) degrees of freedom (df = ν = n − 1), and we write T ~ t(ν) or T ~ t(n − 1).
Note:
The t-distribution is a continuous distribution.
The values of the t random variable range from −∞ to +∞ (that is, −∞ < t < ∞).
The mean of the t distribution is 0.
It is symmetric about the mean 0.
The shape of the t-distribution is similar to the shape of the standard normal distribution.
t-distribution → standard normal distribution as n → ∞.

Notation: (t_α)
t_α = the t-value under which we find an area equal to α
= the t-value that leaves an area of α to the left.
The value t_α satisfies: P(T < t_α) = α.
Since the curve of the pdf of T ~ t(ν) is symmetric about 0, we have
t_{1−α} = −t_α
For example:
t_0.35 = −t_{1−0.35} = −t_0.65
t_0.86 = −t_{1−0.86} = −t_0.14
Values of t_α are tabulated in a special table for several values of α and several values of degrees of freedom. (Table E, appendix p. A-40 in the textbook).

Example:
Find the t-value with ν = 14 (df) that leaves an area of:
(a) 0.95 to the left.
(b) 0.95 to the right.
Solution:
ν = 14 (df); T ~ t(14)
(a) The t-value that leaves an area of 0.95 to the left is t_0.95 = 1.761.

(b) The t-value that leaves an area of 0.95 to the right is
t_0.05 = −t_{1−0.05} = −t_0.95 = −1.761
Note: Some t-tables contain only values of α that are greater than or equal to 0.90. When we search for small values of α in these tables, we may use the fact that:
t_{1−α} = −t_α

Example:
For ν = 10 degrees of freedom (df), find t_0.93 and t_0.07.
Solution:
t_0.93 = (1.372 + 1.812)/2 = 1.592 (from the table, averaging the tabulated values t_0.90 and t_0.95)
t_0.07 = −t_{1−0.07} = −t_0.93 = −1.592 (using the rule: t_{1−α} = −t_α)
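The table lookup for t_0.93 can be sketched in Python. This is an illustration only: the two tabulated values for ν = 10 are taken from the example above, the helper names are hypothetical, and the linear interpolation shown second is an alternative that the notes do not use. It weights the two table entries by how close 0.93 is to each column, so it gives a somewhat different approximation than the simple average.

```python
# Tabulated t-values for nu = 10 degrees of freedom (from the example above)
T_090 = 1.372   # t_0.90
T_095 = 1.812   # t_0.95


def midpoint(t_lo, t_hi):
    """Average of the two neighbouring table entries (the method in the notes)."""
    return (t_lo + t_hi) / 2


def linear_interpolation(alpha, alpha_lo, t_lo, alpha_hi, t_hi):
    """Alternative: weight the two entries by the position of alpha between them."""
    weight = (alpha - alpha_lo) / (alpha_hi - alpha_lo)
    return t_lo + weight * (t_hi - t_lo)


t_093_mid = midpoint(T_090, T_095)
t_093_lin = linear_interpolation(0.93, 0.90, T_090, 0.95, T_095)
t_007_mid = -t_093_mid   # symmetry rule for the lower tail

print(round(t_093_mid, 3), round(t_093_lin, 3), round(t_007_mid, 3))
```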

[Table: Critical Values of the t-distribution (t_α). Columns: ν = df, t_0.90, t_0.95, t_0.975, t_0.99, t_0.995]

Application:
Example: (Sampling distribution of the sample mean)
Suppose that the time duration of a minor surgery is approximately normally distributed with mean equal to 800 seconds and a standard deviation of 40 seconds. Find the probability that a random sample of 16 surgeries will have an average time duration of less than 775 seconds.
Solution:
X = the duration of the surgery
µ = 800, σ = 40, σ² = 1600
X ~ N(800, 1600)
Sample size: n = 16
Calculating the mean, variance, and standard error (standard deviation) of the sample mean X̄:
Mean of X̄: µ_X̄ = µ = 800
Variance of X̄: σ²_X̄ = σ²/n = 1600/16 = 100
Standard error (standard deviation) of X̄: σ_X̄ = σ/√n = 40/√16 = 10
Since the population is normal, X̄ has a normal distribution with mean µ_X̄ = 800 and variance σ²_X̄ = 100, that is:
X̄ ~ N(µ, σ²/n) = N(800, 100)
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 800}{10} \sim N(0, 1)$$
The probability that a random sample of 16 surgeries will have an average time duration that is less than 775 seconds equals:
$$P(\bar{X} < 775) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{775 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{775 - 800}{10}\right) = P(Z < -2.50) = 0.0062$$
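The surgery example can be checked with a few lines of Python. This is a sketch under the assumptions of the example (normal population, known σ), using the standard `statistics.NormalDist` class; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16          # population mean, population sd, sample size

standard_error = sigma / sqrt(n)    # standard deviation of the sample mean
xbar_dist = NormalDist(mu=mu, sigma=standard_error)   # X-bar ~ N(800, 100)

p_less_775 = xbar_dist.cdf(775)     # P(X-bar < 775)
z = (775 - mu) / standard_error     # the standardized value used in the table

print(standard_error, z, round(p_less_775, 4))
```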

Example:
If the mean and standard deviation of serum iron values for healthy men are 120 and 15 microgram/100ml, respectively, what is the probability that a random sample of size 50 normal men will yield a mean between 115 and 125 microgram/100ml?
Solution:
X = the serum iron value
µ = 120, σ = 15, σ² = 225
Sample size: n = 50
Calculating the mean, variance, and standard error (standard deviation) of the sample mean X̄:
Mean of X̄: µ_X̄ = µ = 120
Variance of X̄: σ²_X̄ = σ²/n = 225/50 = 4.5
Standard error (standard deviation) of X̄: σ_X̄ = σ/√n = 15/√50 = 2.12
Using the central limit theorem (n = 50 is large), X̄ has approximately a normal distribution with mean µ_X̄ = 120 and variance σ²_X̄ = 4.5, that is:
X̄ ~ N(µ, σ²/n) = N(120, 4.5)
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 120}{2.12} \sim N(0, 1)$$
The probability that a random sample of 50 men will yield a mean between 115 and 125 microgram/100ml equals:
$$P(115 < \bar{X} < 125) = P\left(\frac{115 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{125 - \mu}{\sigma/\sqrt{n}}\right)$$
$$= P\left(\frac{115 - 120}{2.12} < Z < \frac{125 - 120}{2.12}\right) = P(-2.36 < Z < 2.36)$$
$$= P(Z < 2.36) - P(Z < -2.36) = 0.9909 - 0.0091 = 0.9818$$

5.4 Distribution of the Difference Between Two Sample Means (X̄1 − X̄2):
Suppose that we have two populations:
1-st population with mean µ1 and variance σ1²
2-nd population with mean µ2 and variance σ2²
We are interested in comparing µ1 and µ2, or equivalently, making inferences about the difference between the means (µ1 − µ2).
We independently select a random sample of size n1 from the 1-st population and another random sample of size n2 from the 2-nd population:
Let X̄1 and S1² be the sample mean and the sample variance of the 1-st sample.
Let X̄2 and S2² be the sample mean and the sample variance of the 2-nd sample.
The sampling distribution of X̄1 − X̄2 is used to make inferences about µ1 − µ2.

The sampling distribution of X̄1 − X̄2:
Result:
The mean, the variance and the standard deviation of X̄1 − X̄2 are:
Mean of X̄1 − X̄2 is:
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$
Variance of X̄1 − X̄2 is:
$$\sigma^{2}_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}$$
Standard error (standard deviation) of X̄1 − X̄2 is:
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma^{2}_{\bar{X}_1 - \bar{X}_2}} = \sqrt{\frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}}$$
Result:
If the two random samples were selected from normal distributions (or non-normal distributions with large sample sizes) with known variances σ1² and σ2², then the difference between the sample means (X̄1 − X̄2) has a normal distribution with mean (µ1 − µ2) and variance ((σ1²/n1) + (σ2²/n2)), that is:
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}\right)$$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}} \sim N(0, 1)$$

Application:
Example:
Suppose it has been established that for a certain type of client (type A) the average length of a home visit by a public health nurse is 45 minutes with a standard deviation of 15 minutes, and that for a second type of client (type B) the average home visit is 30 minutes long with a standard deviation of 20 minutes. If a nurse randomly visits 35 clients from the first type and 40 clients from the second type, what is the probability that the average length of home visit of the first type will be greater than the average length of home visit of the second type by 20 or more minutes?
Solution:
For the first type: µ1 = 45, σ1 = 15, σ1² = 225, n1 = 35
For the second type: µ2 = 30, σ2 = 20, σ2² = 400, n2 = 40
The mean, the variance and the standard deviation of X̄1 − X̄2 are:
Mean of X̄1 − X̄2 is:
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 45 - 30 = 15$$
Variance of X̄1 − X̄2 is:
$$\sigma^{2}_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2} = \frac{225}{35} + \frac{400}{40} = 16.4286$$
Standard error (standard deviation) of X̄1 − X̄2 is:
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{16.4286} = 4.0532$$
The sampling distribution of X̄1 − X̄2 is:
$$\bar{X}_1 - \bar{X}_2 \sim N(15,\; 16.4286)$$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 15}{4.0532} \sim N(0, 1)$$
The probability that the average length of home visit of the first type will be greater than the average length of home visit of the second type by 20 or more minutes is:

$$P(\bar{X}_1 - \bar{X}_2 > 20) = P\left(\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}} > \frac{20 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}}\right) = P\left(Z > \frac{20 - 15}{4.0532}\right)$$
$$= P(Z > 1.23) = 1 - P(Z < 1.23) = 1 - 0.8907 = 0.1093$$

5.5 Distribution of the Sample Proportion (p̂):
For the population:
N(A) = number of elements in the population with a specified characteristic A
N = total number of elements in the population (population size)
The population proportion is
p = N(A)/N (p is a parameter)
For the sample:
n(A) = number of elements in the sample with the same characteristic A
n = sample size
The sample proportion is
p̂ = n(A)/n (p̂ is a statistic)
The sampling distribution of p̂ is used to make inferences about p.

Result:
The mean of the sample proportion (p̂) is the population proportion (p), that is:
$$\mu_{\hat{p}} = p$$
The variance of the sample proportion (p̂) is:
$$\sigma^{2}_{\hat{p}} = \frac{p(1 - p)}{n} = \frac{pq}{n} \quad (\text{where } q = 1 - p)$$
The standard error (standard deviation) of the sample proportion (p̂) is:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{pq}{n}}$$
Result:
For a large sample size (n ≥ 30, np > 5, nq > 5), the sample proportion (p̂) has approximately a normal distribution with mean µ_p̂ = p and variance σ²_p̂ = pq/n, that is:
$$\hat{p} \sim N\left(p, \frac{pq}{n}\right) \quad \text{(approximately)}$$
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \sim N(0, 1) \quad \text{(approximately)}$$

Example:
Suppose that 45% of the patients visiting a certain clinic are females. If a sample of 35 patients was selected at random, find the probability that:
1. the proportion of females in the sample will be greater than 0.4.
2. the proportion of females in the sample will be between 0.4 and 0.5.
Solution:
n = 35 (large)
p = the population proportion of females = 45/100 = 0.45

p̂ = the sample proportion (proportion of females in the sample)
The mean of the sample proportion (p̂) is p = 0.45.
The variance of the sample proportion (p̂) is:
$$\frac{p(1 - p)}{n} = \frac{pq}{n} = \frac{0.45(1 - 0.45)}{35} = 0.0071$$
The standard error (standard deviation) of the sample proportion (p̂) is:
$$\sqrt{\frac{p(1 - p)}{n}} = \sqrt{0.0071} = 0.084$$
n ≥ 30, np = 35 × 0.45 = 15.75 > 5, nq = 35 × 0.55 = 19.25 > 5
1. The probability that the sample proportion of females (p̂) will be greater than 0.4 is:
$$P(\hat{p} > 0.4) = 1 - P(\hat{p} < 0.4) = 1 - P\left(\frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} < \frac{0.4 - p}{\sqrt{\dfrac{p(1 - p)}{n}}}\right)$$
$$= 1 - P\left(Z < \frac{0.4 - 0.45}{\sqrt{\dfrac{0.45(1 - 0.45)}{35}}}\right) = 1 - P(Z < -0.59) = 1 - 0.2776 = 0.7224$$
2. The probability that the sample proportion of females (p̂) will be between 0.4 and 0.5 is:
$$P(0.4 < \hat{p} < 0.5) = P(\hat{p} < 0.5) - P(\hat{p} < 0.4)$$
$$= P\left(Z < \frac{0.5 - 0.45}{\sqrt{\dfrac{0.45(1 - 0.45)}{35}}}\right) - P\left(Z < \frac{0.4 - 0.45}{\sqrt{\dfrac{0.45(1 - 0.45)}{35}}}\right)$$
$$= P(Z < 0.59) - P(Z < -0.59) = 0.7224 - 0.2776 = 0.4448$$

5.6 Distribution of the Difference Between Two Sample Proportions (p̂1 − p̂2):
Suppose that we have two populations:
p1 = proportion of elements of type (A) in the 1-st population.
p2 = proportion of elements of type (A) in the 2-nd population.
We are interested in comparing p1 and p2, or equivalently, making inferences about p1 − p2.
We independently select a random sample of size n1 from the 1-st population and another random sample of size n2 from the 2-nd population:
Let X1 = no. of elements of type (A) in the 1-st sample.
Let X2 = no. of elements of type (A) in the 2-nd sample.
p̂1 = X1/n1 = sample proportion of the 1-st sample

p̂2 = X2/n2 = sample proportion of the 2-nd sample
The sampling distribution of p̂1 − p̂2 is used to make inferences about p1 − p2.

The sampling distribution of p̂1 − p̂2:
Result:
The mean, the variance and the standard error (standard deviation) of p̂1 − p̂2 are:
Mean of p̂1 − p̂2 is:
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$
Variance of p̂1 − p̂2 is:
$$\sigma^{2}_{\hat{p}_1 - \hat{p}_2} = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$$
Standard error (standard deviation) of p̂1 − p̂2 is:
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
where q1 = 1 − p1 and q2 = 1 − p2.
Result:
For large sample sizes (n1 ≥ 30, n2 ≥ 30, n1p1 > 5, n1q1 > 5, n2p2 > 5, n2q2 > 5), we have that p̂1 − p̂2 has approximately a normal distribution with mean µ = p1 − p2 and variance σ² = p1q1/n1 + p2q2/n2, that is:
$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\; \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right) \quad \text{(approximately)}$$
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \sim N(0, 1) \quad \text{(approximately)}$$

Example:
Suppose that 40% of Non-Saudi residents have medical insurance and 30% of Saudi residents have medical insurance in a certain city. We have randomly and independently selected a sample of 130 Non-Saudi residents and another sample of 120 Saudi residents. What is the probability that the difference between the sample proportions, p̂1 − p̂2, will be between 0.05 and 0.2?
Solution:
p1 = population proportion of Non-Saudis with medical insurance.
p2 = population proportion of Saudis with medical insurance.
p̂1 = sample proportion of Non-Saudis with medical insurance.
p̂2 = sample proportion of Saudis with medical insurance.
p1 = 0.4, n1 = 130
p2 = 0.3, n2 = 120
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 = 0.4 - 0.3 = 0.1$$
$$\sigma^{2}_{\hat{p}_1 - \hat{p}_2} = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} = \frac{(0.4)(0.6)}{130} + \frac{(0.3)(0.7)}{120} = 0.0036$$
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{0.0036} = 0.06$$
The probability that the difference between the sample proportions, p̂1 − p̂2, will be between 0.05 and 0.2 is:
$$P(0.05 < \hat{p}_1 - \hat{p}_2 < 0.2) = P(\hat{p}_1 - \hat{p}_2 < 0.2) - P(\hat{p}_1 - \hat{p}_2 < 0.05)$$
$$= P\left(Z < \frac{0.2 - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}\right) - P\left(Z < \frac{0.05 - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}\right)$$
$$= P\left(Z < \frac{0.2 - 0.1}{0.06}\right) - P\left(Z < \frac{0.05 - 0.1}{0.06}\right)$$
$$= P(Z < 1.67) - P(Z < -0.83) = 0.9525 - 0.2033 = 0.7492$$
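The medical insurance example can be reproduced numerically. The Python sketch below is an illustration under the large-sample normal approximation stated in the result for p̂1 − p̂2, using the standard `statistics.NormalDist` class; the variable names are hypothetical. Since the table method rounds the standard error and the z-values, the software answer may differ slightly in the third decimal.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.40, 130   # Non-Saudi residents: population proportion, sample size
p2, n2 = 0.30, 120   # Saudi residents: population proportion, sample size

mean_diff = p1 - p2                                       # mean of p1_hat - p2_hat
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # standard error

# Approximate sampling distribution of the difference between sample proportions
diff_dist = NormalDist(mu=mean_diff, sigma=se_diff)
prob = diff_dist.cdf(0.20) - diff_dist.cdf(0.05)          # P(0.05 < diff < 0.2)

print(round(se_diff, 4), round(prob, 4))
```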

CHAPTER 6: Using Sample Data to Make Estimations About Population Parameters

6.1 Introduction:
Statistical Inference: (Estimation and Hypothesis Testing)
It is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population.
There are two main purposes of statistics:
Descriptive Statistics (Chapters 1 & 2): Organization & summarization of the data.
Statistical Inference (Chapters 6 and 7): Answering research questions about some unknown population parameters.
(1) Estimation: (Chapter 6)
Approximating (or estimating) the actual values of the unknown parameters:
- Point Estimate: A point estimate is a single value used to estimate the corresponding population parameter.
- Interval Estimate (or Confidence Interval): An interval estimate consists of two numerical values defining a range of values that most likely includes the parameter being estimated with a specified degree of confidence.
(2) Hypothesis Testing: (Chapter 7)
Answering research questions about the unknown parameters of the population (confirming or denying some conjectures or statements about the unknown parameters).

6.2 Confidence Interval for a Population Mean (µ):
In this section we are interested in estimating the mean of a certain population (µ).

Population:
Population size = N
Population values: X1, X2, ..., XN
Population mean: $$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
Population variance: $$\sigma^{2} = \frac{\sum_{i=1}^{N} (X_i - \mu)^{2}}{N}$$

Sample:
Sample size = n
Sample values: x1, x2, ..., xn
Sample mean: $$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Sample variance: $$S^{2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n - 1}$$

(i) Point Estimation of µ:
A point estimate of the mean is a single number used to estimate (or approximate) the true value of µ.
- Draw a random sample of size n from the population: x1, x2, ..., xn
- Compute the sample mean: $$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Result:
The sample mean X̄ is a "good" point estimator of the population mean (µ).

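(ii) Confidence Interval (Interval Estimate) of µ:
An interval estimate of µ is an interval (L, U) containing the true value of µ "with a probability of 1 − α".
* 1 − α is called the confidence coefficient (level)
* L = lower limit of the confidence interval
* U = upper limit of the confidence interval

Result: (For the case when σ is known)
(a) If X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean µ and known variance σ², then a (1 − α)100% confidence interval for µ is:
$$\bar{X} \pm Z_{1-\frac{\alpha}{2}}\, \sigma_{\bar{X}}, \quad \text{that is,} \quad \bar{X} \pm Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
$$\left(\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\;\; \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)$$
$$\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
(b) If X1, X2, ..., Xn is a random sample of size n from a non-normal distribution with mean µ and known variance σ², and if the sample size n is large (n ≥ 30), then an approximate (1 − α)100% confidence interval for µ is given by the same expressions:
$$\bar{X} \pm Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \quad \text{that is,} \quad \bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$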

Note that:
1. We are (1 − α)100% confident that the true value of µ belongs to the interval
$$\left(\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\;\; \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)$$
2. Upper limit of the confidence interval = X̄ + Z_{1−α/2} σ/√n
3. Lower limit of the confidence interval = X̄ − Z_{1−α/2} σ/√n
4. Z_{1−α/2} = reliability coefficient
5. Z_{1−α/2} × σ/√n = margin of error = precision of the estimate
6. In general the interval estimate (confidence interval) may be expressed as follows:
X̄ ± Z_{1−α/2} σ_X̄
estimator ± (reliability coefficient) × (standard error)
estimator ± margin of error

6.3 The t Distribution: (Confidence Interval Using t)
We have already introduced and discussed the t distribution.
Result: (For the case when σ is unknown + normal population)
If X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean µ and unknown variance σ², then a (1 − α)100% confidence interval for µ is:
$$\bar{X} \pm t_{1-\frac{\alpha}{2}}\, \hat{\sigma}_{\bar{X}}, \quad \text{that is,} \quad \bar{X} \pm t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}$$
$$\left(\bar{X} - t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\;\; \bar{X} + t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right)$$

where the degrees of freedom is: df = ν = n − 1.
Note that:
1. We are (1 − α)100% confident that the true value of µ belongs to the interval
$$\left(\bar{X} - t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\;\; \bar{X} + t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right)$$
2. σ̂_X̄ = S/√n (estimate of the standard error of X̄)
3. t_{1−α/2} = reliability coefficient
4. In this case, we replace σ by S and Z by t.
5. In general the interval estimate (confidence interval) may be expressed as follows:
Estimator ± (Reliability Coefficient) × (Estimate of the Standard Error)
X̄ ± t_{1−α/2} σ̂_X̄

Notes: (Finding the Reliability Coefficient)
(1) We find the reliability coefficient Z_{1−α/2} from the Z-table as the value that leaves an area of 1 − α/2 to its left (an area of α/2 in each tail).
(2) We find the reliability coefficient t_{1−α/2} from the t-table in the same way, using df = ν = n − 1.

Example:
Suppose that Z ~ N(0,1). Find Z_{1−α/2} for the following cases:
(1) α = 0.1 (2) α = 0.05 (3) α = 0.01
Solution:
(1) For α = 0.1: 1 − α/2 = 1 − 0.1/2 = 0.95, so Z_{1−α/2} = Z_0.95 = 1.645.
(2) For α = 0.05: 1 − α/2 = 1 − 0.05/2 = 0.975, so Z_{1−α/2} = Z_0.975 = 1.96.
(3) For α = 0.01: 1 − α/2 = 1 − 0.01/2 = 0.995, so Z_{1−α/2} = Z_0.995 = 2.575.

Example:
Suppose that t ~ t(30). Find t_{1−α/2} for α = 0.05.
Solution:
df = ν = 30
1 − α/2 = 1 − 0.05/2 = 0.975
t_{1−α/2} = t_0.975 = 2.0423
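The Z reliability coefficients in the first example can also be obtained in software. The sketch below is an illustration, assuming Python's standard `statistics.NormalDist` class; the function name `z_reliability` is hypothetical. Software returns about 2.576 for α = 0.01, while the value 2.575 is the usual reading from a four-decimal table. The t coefficient is not computed here because the standard library has no t-distribution.

```python
from statistics import NormalDist


def z_reliability(alpha):
    """Reliability coefficient Z_{1 - alpha/2} for a (1 - alpha)100% interval."""
    return NormalDist().inv_cdf(1 - alpha / 2)


# Confidence levels 90%, 95% and 99%
coefficients = {alpha: round(z_reliability(alpha), 3) for alpha in (0.10, 0.05, 0.01)}
print(coefficients)
```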

Example: (The case where σ² is known)
Diabetic ketoacidosis is a potentially fatal complication of diabetes mellitus throughout the world and is characterized in part by very high blood glucose levels. In a study on 123 patients living in Saudi Arabia of age 15 or more who were admitted for diabetic ketoacidosis, the mean blood glucose level was 26.2 mmol/l. Suppose that the blood glucose levels for such patients have a normal distribution with a standard deviation of 3.3 mmol/l.
(1) Find a point estimate for the mean blood glucose level of such diabetic ketoacidosis patients.
(2) Find a 90% confidence interval for the mean blood glucose level of such diabetic ketoacidosis patients.
Solution:
Variable = X = blood glucose level (quantitative variable).
Population = diabetic ketoacidosis patients in Saudi Arabia of age 15 or more.
Parameter of interest is: µ = the mean blood glucose level.
Distribution is normal with standard deviation σ = 3.3.
σ² is known (σ² = 10.89)
X ~ Normal(µ, 10.89)
µ = ?? (unknown; we need to estimate µ)
Sample size: n = 123 (large)
Sample mean: X̄ = 26.2
(1) Point Estimation:
We need to find a point estimate for µ.
X̄ = 26.2 is a point estimate for µ.
µ ≈ 26.2
(2) Interval Estimation (Confidence Interval = C. I.):
We need to find a 90% C. I. for µ.
90% = (1 − α)100%
1 − α = 0.9, so α = 0.1, α/2 = 0.05, and 1 − α/2 = 0.95
The reliability coefficient is: Z_{1−α/2} = Z_0.95 = 1.645
The 90% confidence interval for µ is:

$$\left(\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\;\; \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)$$
$$\left(26.2 - (1.645)\frac{3.3}{\sqrt{123}},\;\; 26.2 + (1.645)\frac{3.3}{\sqrt{123}}\right)$$
(26.2 − 0.4895, 26.2 + 0.4895)
(25.7105, 26.6895)
We are 90% confident that the true value of the mean µ lies in the interval (25.71, 26.69), that is:
25.71 < µ < 26.69
Note: for this example, even if the distribution is not normal, we may use the same solution because the sample size n = 123 is large.

Example: (The case where σ² is unknown)
A study was conducted to study the age characteristics of Saudi women having breast lumps. A sample of 121 Saudi women gave a mean of 37 years with a standard deviation of 10 years. Assume that the ages of Saudi women having breast lumps are normally distributed.
(a) Find a point estimate for the mean age of Saudi women having breast lumps.
(b) Construct a 99% confidence interval for the mean age of Saudi women having breast lumps.
Solution:
X = Variable = age of Saudi women having breast lumps (quantitative variable).
Population = all Saudi women having breast lumps.
Parameter of interest is: µ = the mean age of Saudi women having breast lumps.
X ~ Normal(µ, σ²)
µ = ?? (unknown; we need to estimate µ)
σ² = ?? (unknown)
Sample size: n = 121
Sample mean: X̄ = 37

112 Biostatistics - STAT 45 Departmet of Statistics Summer Semester 43/43 Sample stadard deviatio: S = 0 Degrees of freedom: df =ν = = 0 (a) Poit Estimatio: We eed to fid a poit estimate for µ. X = 37 is a "good" poit estimate for µ. µ 37 years (b) Iterval Estimatio (Cofidece Iterval = C. I.): We eed to fid 99% C. I. for µ. 99% = ( α ) 00% α α = 0.99 α = = 0. ν = df = 0 The reliability coefficiet is: t α = t0.995 =. 67 α = % cofidece iterval for µ is: Aother Way: 37 X ± t α 37 ± (.67) S 37 ±.38 0 ( 37.38, ) ( 34.6, 39.38) X t α S 0 + t, X α S 0 (.67), 37 + (.67) 37.38, ( ) Kig Saud Uiversity
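Both worked examples follow the same pattern: sample mean plus or minus a reliability coefficient times the standard error. The short Python sketch below reproduces the two intervals as a check on the arithmetic. It is only an illustration: the function name `mean_confidence_interval` is hypothetical and not part of the notes, the Z value is computed with the standard library's `NormalDist`, and the t value 2.617 is entered by hand from the t-table because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(xbar, sd, n, coeff):
    """Return (lower, upper) for xbar +/- coeff * sd / sqrt(n).

    coeff is the reliability coefficient (a Z or t quantile).
    """
    margin = coeff * sd / sqrt(n)
    return xbar - margin, xbar + margin


# Example 1 (sigma known): Z_{1 - alpha/2} from the standard normal distribution.
alpha = 0.10
z = NormalDist().inv_cdf(1 - alpha / 2)  # approximately 1.645
z_interval = mean_confidence_interval(26.2, 3.3, 123, z)

# Example 2 (sigma unknown): t_{0.995} with 120 degrees of freedom,
# taken from the t-table (assumed value, not computed here).
t = 2.617
t_interval = mean_confidence_interval(37, 10, 121, t)

print([round(v, 2) for v in z_interval])
print([round(v, 2) for v in t_interval])
```

The only difference between the two cases is where the reliability coefficient comes from: the standard normal table when $\sigma$ is known, and the t-table with $n - 1$ degrees of freedom when $\sigma$ is estimated by $S$.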

Chapter 2 Descriptive Statistics

Chapter 2 Descriptive Statistics Chapter 2 Descriptive Statistics Statistics Most commoly, statistics refers to umerical data. Statistics may also refer to the process of collectig, orgaizig, presetig, aalyzig ad iterpretig umerical data

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

(6) Fundamental Sampling Distribution and Data Discription

(6) Fundamental Sampling Distribution and Data Discription 34 Stat Lecture Notes (6) Fudametal Samplig Distributio ad Data Discriptio ( Book*: Chapter 8,pg5) Probability& Statistics for Egieers & Scietists By Walpole, Myers, Myers, Ye 8.1 Radom Samplig: Populatio:

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter & Teachig Material.

More information

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4 MATH 30: Probability ad Statistics 9. Estimatio ad Testig of Parameters Estimatio ad Testig of Parameters We have bee dealig situatios i which we have full kowledge of the distributio of a radom variable.

More information

Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying.

Lecture 1. Statistics: A science of information. Population: The population is the collection of all subjects we re interested in studying. Lecture Mai Topics: Defiitios: Statistics, Populatio, Sample, Radom Sample, Statistical Iferece Type of Data Scales of Measuremet Describig Data with Numbers Describig Data Graphically. Defiitios. Example

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter 2 & Teachig

More information

What is Probability?

What is Probability? Quatificatio of ucertaity. What is Probability? Mathematical model for thigs that occur radomly. Radom ot haphazard, do t kow what will happe o ay oe experimet, but has a log ru order. The cocept of probability

More information

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2

Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2 Aa Jaicka Mathematical Statistics 18/19 Lecture 1, Parts 1 & 1. Descriptive Statistics By the term descriptive statistics we will mea the tools used for quatitative descriptio of the properties of a sample

More information

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics

CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS. 8.1 Random Sampling. 8.2 Some Important Statistics CHAPTER 8 FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS 8.1 Radom Samplig The basic idea of the statistical iferece is that we are allowed to draw ifereces or coclusios about a populatio based

More information

1 of 7 7/16/2009 6:06 AM Virtual Laboratories > 6. Radom Samples > 1 2 3 4 5 6 7 6. Order Statistics Defiitios Suppose agai that we have a basic radom experimet, ad that X is a real-valued radom variable

More information

Economics 250 Assignment 1 Suggested Answers. 1. We have the following data set on the lengths (in minutes) of a sample of long-distance phone calls

Economics 250 Assignment 1 Suggested Answers. 1. We have the following data set on the lengths (in minutes) of a sample of long-distance phone calls Ecoomics 250 Assigmet 1 Suggested Aswers 1. We have the followig data set o the legths (i miutes) of a sample of log-distace phoe calls 1 20 10 20 13 23 3 7 18 7 4 5 15 7 29 10 18 10 10 23 4 12 8 6 (1)

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND.

MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. XI-1 (1074) MOST PEOPLE WOULD RATHER LIVE WITH A PROBLEM THEY CAN'T SOLVE, THAN ACCEPT A SOLUTION THEY CAN'T UNDERSTAND. R. E. D. WOOLSEY AND H. S. SWANSON XI-2 (1075) STATISTICAL DECISION MAKING Advaced

More information

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9

BIOS 4110: Introduction to Biostatistics. Breheny. Lab #9 BIOS 4110: Itroductio to Biostatistics Brehey Lab #9 The Cetral Limit Theorem is very importat i the realm of statistics, ad today's lab will explore the applicatio of it i both categorical ad cotiuous

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 27. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 7 Daiel B. Rowe, Ph.D. Departmet of Mathematics, Statistics, ad Computer Sciece Copyright 013 by D.B. Rowe 1 Ageda: Skip Recap Chapter 10.5 ad 10.6 Lecture Chapter 11.1-11. Review Chapters 9 ad 10

More information

UNIT 2 DIFFERENT APPROACHES TO PROBABILITY THEORY

UNIT 2 DIFFERENT APPROACHES TO PROBABILITY THEORY UNIT 2 DIFFERENT APPROACHES TO PROBABILITY THEORY Structure 2.1 Itroductio Objectives 2.2 Relative Frequecy Approach ad Statistical Probability 2. Problems Based o Relative Frequecy 2.4 Subjective Approach

More information

Chapter 1 (Definitions)

Chapter 1 (Definitions) FINAL EXAM REVIEW Chapter 1 (Defiitios) Qualitative: Nomial: Ordial: Quatitative: Ordial: Iterval: Ratio: Observatioal Study: Desiged Experimet: Samplig: Cluster: Stratified: Systematic: Coveiece: Simple

More information

Data Description. Measure of Central Tendency. Data Description. Chapter x i

Data Description. Measure of Central Tendency. Data Description. Chapter x i Data Descriptio Describe Distributio with Numbers Example: Birth weights (i lb) of 5 babies bor from two groups of wome uder differet care programs. Group : 7, 6, 8, 7, 7 Group : 3, 4, 8, 9, Chapter 3

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis

Section 9.2. Tests About a Population Proportion 12/17/2014. Carrying Out a Significance Test H A N T. Parameters & Hypothesis Sectio 9.2 Tests About a Populatio Proportio P H A N T O M S Parameters Hypothesis Assess Coditios Name the Test Test Statistic (Calculate) Obtai P value Make a decisio State coclusio Sectio 9.2 Tests

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

Module 1 Fundamentals in statistics

Module 1 Fundamentals in statistics Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Topic 10: Introduction to Estimation

Topic 10: Introduction to Estimation Topic 0: Itroductio to Estimatio Jue, 0 Itroductio I the simplest possible terms, the goal of estimatio theory is to aswer the questio: What is that umber? What is the legth, the reactio rate, the fractio

More information

CHAPTER SUMMARIES MAT102 Dr J Lubowsky Page 1 of 13 Chapter 1: Introduction to Statistics

CHAPTER SUMMARIES MAT102 Dr J Lubowsky Page 1 of 13 Chapter 1: Introduction to Statistics CHAPTER SUMMARIES MAT102 Dr J Lubowsky Page 1 of 13 Chapter 1: Itroductio to Statistics Misleadig Iformatio: Surveys ad advertisig claims ca be biased by urepresetative samples, biased questios, iappropriate

More information

Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers

Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers Chapter 4 4-1 orth Seattle Commuity College BUS10 Busiess Statistics Chapter 4 Descriptive Statistics Summary Defiitios Cetral tedecy: The extet to which the data values group aroud a cetral value. Variatio:

More information

Econ 371 Exam #1. Multiple Choice (5 points each): For each of the following, select the single most appropriate option to complete the statement.

Econ 371 Exam #1. Multiple Choice (5 points each): For each of the following, select the single most appropriate option to complete the statement. Eco 371 Exam #1 Multiple Choice (5 poits each): For each of the followig, select the sigle most appropriate optio to complete the statemet 1) The probability of a outcome a) is the umber of times that

More information

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions

KLMED8004 Medical statistics. Part I, autumn Estimation. We have previously learned: Population and sample. New questions We have previously leared: KLMED8004 Medical statistics Part I, autum 00 How kow probability distributios (e.g. biomial distributio, ormal distributio) with kow populatio parameters (mea, variace) ca give

More information

Discrete probability distributions

Discrete probability distributions Discrete probability distributios I the chapter o probability we used the classical method to calculate the probability of various values of a radom variable. I some cases, however, we may be able to develop

More information

As stated by Laplace, Probability is common sense reduced to calculation.

As stated by Laplace, Probability is common sense reduced to calculation. Note: Hadouts DO NOT replace the book. I most cases, they oly provide a guidelie o topics ad a ituitive feel. The math details will be covered i class, so it is importat to atted class ad also you MUST

More information

Computing Confidence Intervals for Sample Data

Computing Confidence Intervals for Sample Data Computig Cofidece Itervals for Sample Data Topics Use of Statistics Sources of errors Accuracy, precisio, resolutio A mathematical model of errors Cofidece itervals For meas For variaces For proportios

More information

Introduction to Probability and Statistics Twelfth Edition

Introduction to Probability and Statistics Twelfth Edition Itroductio to Probability ad Statistics Twelfth Editio Robert J. Beaver Barbara M. Beaver William Medehall Presetatio desiged ad writte by: Barbara M. Beaver Itroductio to Probability ad Statistics Twelfth

More information

Lecture 7: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS

Lecture 7: Non-parametric Comparison of Location. GENOME 560, Spring 2016 Doug Fowler, GS Lecture 7: No-parametric Compariso of Locatio GENOME 560, Sprig 2016 Doug Fowler, GS (dfowler@uw.edu) 1 Review How ca we set a cofidece iterval o a proportio? 2 Review How ca we set a cofidece iterval

More information

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE TERRY SOO Abstract These otes are adapted from whe I taught Math 526 ad meat to give a quick itroductio to cofidece

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10 DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set

More information

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p

Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE. Part 3: Summary of CI for µ Confidence Interval for a Population Proportion p Chapter 8: STATISTICAL INTERVALS FOR A SINGLE SAMPLE Part 3: Summary of CI for µ Cofidece Iterval for a Populatio Proportio p Sectio 8-4 Summary for creatig a 100(1-α)% CI for µ: Whe σ 2 is kow ad paret

More information

106 Stat 1434 / 1435 H. Chapter 1: Organizing and Displaying Data

106 Stat 1434 / 1435 H. Chapter 1: Organizing and Displaying Data 106 Stat Refereces -Biostatistics : A foudatio i Aalysis i the Health Sciece -By : Waye W. Daiel -Elemetary Biostatistics with Applicatios from Saudi Arabia By : Nacy Hasabelaby 1434 / 1435 H Chapter 1:

More information

Formulas and Tables for Gerstman

Formulas and Tables for Gerstman Formulas ad Tables for Gerstma Measuremet ad Study Desig Biostatistics is more tha a compilatio of computatioal techiques! Measuremet scales: quatitative, ordial, categorical Iformatio quality is primary

More information

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function.

It is always the case that unions, intersections, complements, and set differences are preserved by the inverse image of a function. MATH 532 Measurable Fuctios Dr. Neal, WKU Throughout, let ( X, F, µ) be a measure space ad let (!, F, P ) deote the special case of a probability space. We shall ow begi to study real-valued fuctios defied

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times

Agreement of CI and HT. Lecture 13 - Tests of Proportions. Example - Waiting Times Sigificace level vs. cofidece level Agreemet of CI ad HT Lecture 13 - Tests of Proportios Sta102 / BME102 Coli Rudel October 15, 2014 Cofidece itervals ad hypothesis tests (almost) always agree, as log

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Probability and statistics: basic terms

Probability and statistics: basic terms Probability ad statistics: basic terms M. Veeraraghava August 203 A radom variable is a rule that assigs a umerical value to each possible outcome of a experimet. Outcomes of a experimet form the sample

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics 8.2 Testig a Proportio Math 1 Itroductory Statistics Professor B. Abrego Lecture 15 Sectios 8.2 People ofte make decisios with data by comparig the results from a sample to some predetermied stadard. These

More information

MEASURES OF DISPERSION (VARIABILITY)

MEASURES OF DISPERSION (VARIABILITY) POLI 300 Hadout #7 N. R. Miller MEASURES OF DISPERSION (VARIABILITY) While measures of cetral tedecy idicate what value of a variable is (i oe sese or other, e.g., mode, media, mea), average or cetral

More information

Chapter 8: Estimating with Confidence

Chapter 8: Estimating with Confidence Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

Lecture 7: Non-parametric Comparison of Location. GENOME 560 Doug Fowler, GS

Lecture 7: Non-parametric Comparison of Location. GENOME 560 Doug Fowler, GS Lecture 7: No-parametric Compariso of Locatio GENOME 560 Doug Fowler, GS (dfowler@uw.edu) 1 Review How ca we set a cofidece iterval o a proportio? 2 What do we mea by oparametric? 3 Types of Data A Review

More information

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700

Class 23. Daniel B. Rowe, Ph.D. Department of Mathematics, Statistics, and Computer Science. Marquette University MATH 1700 Class 23 Daiel B. Rowe, Ph.D. Departmet of Mathematics, Statistics, ad Computer Sciece Copyright 2017 by D.B. Rowe 1 Ageda: Recap Chapter 9.1 Lecture Chapter 9.2 Review Exam 6 Problem Solvig Sessio. 2

More information

On an Application of Bayesian Estimation

On an Application of Bayesian Estimation O a Applicatio of ayesia Estimatio KIYOHARU TANAKA School of Sciece ad Egieerig, Kiki Uiversity, Kowakae, Higashi-Osaka, JAPAN Email: ktaaka@ifokidaiacjp EVGENIY GRECHNIKOV Departmet of Mathematics, auma

More information

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals Chapter 6 Studet Lecture Notes 6-1 Busiess Statistics: A Decisio-Makig Approach 6 th Editio Chapter 6 Itroductio to Samplig Distributios Chap 6-1 Chapter Goals After completig this chapter, you should

More information

An Introduction to Randomized Algorithms

An Introduction to Randomized Algorithms A Itroductio to Radomized Algorithms The focus of this lecture is to study a radomized algorithm for quick sort, aalyze it usig probabilistic recurrece relatios, ad also provide more geeral tools for aalysis

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Biostatistics for Med Students. Lecture 2

Biostatistics for Med Students. Lecture 2 Biostatistics for Med Studets Lecture 2 Joh J. Che, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 22, 2017 Lecture Objectives To uderstad basic research desig priciples

More information

Elementary Statistics

Elementary Statistics Elemetary Statistics M. Ghamsary, Ph.D. Sprig 004 Chap 0 Descriptive Statistics Raw Data: Whe data are collected i origial form, they are called raw data. The followig are the scores o the first test of

More information

Statistics Independent (X) you can choose and manipulate. Usually on x-axis

Statistics Independent (X) you can choose and manipulate. Usually on x-axis Statistics-6000 Variable: are characteristic that ca take o differet values with respect to persos, time, ad place ad types of variables are as follow: Idepedet (X) you ca choose ad maipulate. Usually

More information

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates.

Big Picture. 5. Data, Estimates, and Models: quantifying the accuracy of estimates. 5. Data, Estimates, ad Models: quatifyig the accuracy of estimates. 5. Estimatig a Normal Mea 5.2 The Distributio of the Normal Sample Mea 5.3 Normal data, cofidece iterval for, kow 5.4 Normal data, cofidece

More information

GG313 GEOLOGICAL DATA ANALYSIS

GG313 GEOLOGICAL DATA ANALYSIS GG313 GEOLOGICAL DATA ANALYSIS 1 Testig Hypothesis GG313 GEOLOGICAL DATA ANALYSIS LECTURE NOTES PAUL WESSEL SECTION TESTING OF HYPOTHESES Much of statistics is cocered with testig hypothesis agaist data

More information

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6) STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated

More information

4.1 Sigma Notation and Riemann Sums

4.1 Sigma Notation and Riemann Sums 0 the itegral. Sigma Notatio ad Riema Sums Oe strategy for calculatig the area of a regio is to cut the regio ito simple shapes, calculate the area of each simple shape, ad the add these smaller areas

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated

More information

Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day

Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day LECTURE # 8 Mea Deviatio, Stadard Deviatio ad Variace & Coefficiet of variatio Mea Deviatio Stadard Deviatio ad Variace Coefficiet of variatio First, we will discuss it for the case of raw data, ad the

More information

Estimation of a population proportion March 23,

Estimation of a population proportion March 23, 1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes

More information

Chapter two: Hypothesis testing

Chapter two: Hypothesis testing : Hypothesis testig - Some basic cocepts: - Data: The raw material of statistics is data. For our purposes we may defie data as umbers. The two kids of umbers that we use i statistics are umbers that result

More information

Topic 9: Sampling Distributions of Estimators

Topic 9: Sampling Distributions of Estimators Topic 9: Samplig Distributios of Estimators Course 003, 2018 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be

More information

Sets and Probabilistic Models

Sets and Probabilistic Models ets ad Probabilistic Models Berli Che Departmet of Computer ciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Referece: - D. P. Bertsekas, J. N. Tsitsiklis, Itroductio to Probability, ectios 1.1-1.2

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

Simulation. Two Rule For Inverting A Distribution Function

Simulation. Two Rule For Inverting A Distribution Function Simulatio Two Rule For Ivertig A Distributio Fuctio Rule 1. If F(x) = u is costat o a iterval [x 1, x 2 ), the the uiform value u is mapped oto x 2 through the iversio process. Rule 2. If there is a jump

More information

Mathacle. PSet Stats, Concepts In Statistics Level Number Name: Date: Confidence Interval Guesswork with Confidence

Mathacle. PSet Stats, Concepts In Statistics Level Number Name: Date: Confidence Interval Guesswork with Confidence PSet ----- Stats, Cocepts I Statistics Cofidece Iterval Guesswork with Cofidece VII. CONFIDENCE INTERVAL 7.1. Sigificace Level ad Cofidece Iterval (CI) The Sigificace Level The sigificace level, ofte deoted

More information

Chapter 6 Principles of Data Reduction

Chapter 6 Principles of Data Reduction Chapter 6 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 0 Chapter 6 Priciples of Data Reductio Sectio 6. Itroductio Goal: To summarize or reduce the data X, X,, X to get iformatio about a

More information

Lecture 1 Probability and Statistics

Lecture 1 Probability and Statistics Wikipedia: Lecture 1 Probability ad Statistics Bejami Disraeli, British statesma ad literary figure (1804 1881): There are three kids of lies: lies, damed lies, ad statistics. popularized i US by Mark

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Empirically evaluating the accuracy of hypotheses: an important activity in ML. Three questions: Given observed accuracy over a sample set, how well does this estimate apply over additional samples?

MBACATÓLICA. Quantitative Methods. Faculdade de Ciências Económicas e Empresariais UNIVERSIDADE CATÓLICA PORTUGUESA 9. SAMPLING DISTRIBUTIONS

Miguel Gouveia, Manuel Leite Monteiro. MBACatólica 006/07, Métodos Quantitativos

Median and IQR The median is the value which divides the ordered data values in half.

STA 666 Fall 2007 Web-based Course Notes 4: Describing Distributions Numerically. Numerical summaries for quantitative variables: median and interquartile range (IQR), 5-number summary, mean and standard deviation. Median

Massachusetts Institute of Technology

Solutions to Quiz : Spring 006. Problem : Each of the following statements is either True or False. There will be no partial credit given for the True/False questions, thus any explanations will not be graded. Please

Chapter 23: Inferences About Means

Enough Proportions! We've spent the last two units working with proportions (or qualitative variables, at least); now it's time to turn our attentions to quantitative variables. For

Element sampling: Part 2

Chapter 4. 4.1 Introduction. We now consider unequal probability sampling designs, which are very popular in practice. In unequal probability sampling, we can improve the efficiency of the resulting

A quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population

Statistics 10, Colin Rundel. Flip a coin 30 times (this is going to get loud!). Record the number of heads you obtained and calculate

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

Population parameters and Sample Statistics. Inferences

MA238 Assignment 4 Solutions (part a)

(i) Single sample tests. Question. (a) $H_0: \mu = 50$ sq. ft, $H_A: \mu < 50$ sq. ft. (b) $H_0: \mu = 3$ mpg, $H_A: \mu > 3$ mpg. (c) $H_0: \mu = 5$ mm, $H_A: \mu \neq 5$ mm. Question. (i) What are the null and alternative

Summarizing Data. Major Properties of Numerical Data

Daniel A. Menascé, Ph.D., Dept. of Computer Science, George Mason University. Major Properties of Numerical Data. Central Tendency: arithmetic mean, geometric mean, median, mode. Variability: range, interquartile

Resampling Methods. $X_{(1/2)}$, i.e., $\Pr(X_i \le m) = 1/2$. We order the data: $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. Define the sample median: ( n.

January 1, 2019. Motivation: We have so many estimators with the property $\sqrt{n}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \sigma^2)$. We can also write $\hat{\theta} \overset{a}{\sim} N(\theta, \sigma^2/n)$, where $\overset{a}{\sim}$ means "approximately distributed as". Once we have a consistent estimator

Unit 6 Estimation Week #10 - Practice Problems SOLUTIONS

PubHlth 540 Introductory Biostatistics. An entomologist samples a field for egg masses of a harmful insect by placing a yard-square frame at random

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

Hypothesis testing. Statistical inference is that branch of Statistics in which one typically makes a statement about a population based upon the results of a sample. In