College of Science Department of Statistics & OR


Biostatistics - STAT 45
Department of Statistics & OR, College of Science, King Saud University
Summer Semester 43/43
Lecture Notes
Prof. Abdullah Al-Shiha

Department of Statistics and Operations Research
College of Science, King Saud University
Second Semester 44 44/
Stat 4 : Biostatistics

Office: Room No. B05
Email:
Tel. Off.: ..-  Mob.: 05.-
Web: http://faculty.ksu.edu.sa/...
Office hours:

Week and Title:
W1 ( 5 /03/445): Introduction to Biostatistics (.-.4)
W2 ( 0/04/435): Types of data and graphical representation (.-.4)
W3 ( 09/4/43 ): Descriptive statistics: measures of central tendency - mean, median, mode (.-.6, excluding stem plot and percentiles)
W4 ( 6/4/43 ): Measures of dispersion - range, standard deviation, coefficient of variation (.-.6, excluding stem plot and percentiles)
W5 ( 3/4/ 43 ): Calculating measures from an ungrouped frequency table; approximating measures from grouped data (.-.6, excluding stem plot and percentiles)
W6 ( 0/4/43 ): Basic probability: conditional probability, concept of independence, sensitivity, specificity (3.-3.6)
W7 ( 40/4/43 ): Bayes' Theorem for predictive probabilities (3.-3.6)
W8 ( 4/4/43): Some discrete probability distributions: cumulative probability (4.-4.4)
W9 (/06/435): Vacation
W10 (9/0/435): Binomial and Poisson distributions, their mean and variance (4.-4.4, excluding the use of binomial and Poisson tables)
W11 (06/06/435): Continuous probability distributions: normal distribution, z-table (4.5-4.8)
W12 (3/06/435): Sampling with and without replacement; sampling distribution of one and two sample means and one and two proportions (5.-5.7, excluding sampling without replacement)
W13 (0/06/435): Sampling with and without replacement; sampling distribution of one and two sample means and one and two proportions (5.-5.7, excluding sampling without replacement)
W14 (7/06/435): Statistical inference: point and interval estimation, types of errors, concept of P-value (6.-6.6, 7.-7.6, excluding variances not equal, page 8-8)
W15 (05/07/435): Testing hypotheses about one and two sample means and proportions, including paired data, different cases under normality (6.-6.6, 7.-7.6, excluding variances not equal, page 8-8)
W16 (/07/435): Testing hypotheses about one and two sample means and proportions, including paired data, different cases under normality (6.-6.6, 7.-7.6, excluding variances not equal, page 8-8)

Text Book:
Biostatistics: Basic Concepts and Methodology for the Health Sciences, by Wayne W. Daniel [9th ed.].
Books are available from the university book store below SAMBA bank. The book costs 70 Riyals for students.

Contacting the teaching staff (Office No. B05):

Name                           Email                     Position
Ms. Sanaa Abdullah Abunasrah   sabuasrah@ksu.edu.sa      Lecturer
Ms. Samah Al-Ghamdi            samalghamdi@ksu.edu.sa    Lecturer
Ms. Reem Dhafer Al-Mubty       ralmubty@ksu.edu.sa       Teaching Assistant
Ms. Amal Abdullah Al-Muhaisen  amalmoh@ksu.edu.sa        Lecturer
Dr. Saba Alwan                 salwa@KSU.EDU.SA          Assistant Professor
Ms. Ruba Al-Yafi               ralyafi@KSU.EDU.SA        Teaching Assistant
Ms. Taghreed Al-Malki          tmalki@ksu.edu.sa         Teaching Assistant
Ms. Al-Anoud Al-Zughaibi       aalzughibi@ksu.edu.sa     Lecturer

CHAPTER 1: Getting Acquainted with Biostatistics

1.1 Introduction:
The course "Biostatistics" (STAT-45) is about information: how it is obtained, how it is analyzed, and how it is interpreted.
The objective of the course is to learn:
(1) How to organize, summarize, and describe data. (Descriptive Statistics)
(2) How to reach decisions about a large body of data by examining only a small part of the data. (Inferential Statistics)

1.2 Some Basic Concepts:
Data: Data is the raw material of statistics. There are two types of data:
(1) Quantitative data (numbers: weights, ages, ...).
(2) Qualitative data (words or attributes: nationalities, occupations, ...).

Statistics: Statistics is the field of study concerned with:
(1) The collection, organization, summarization, and analysis of data. (Descriptive Statistics)
(2) The drawing of inferences and conclusions about a body of data (population) when only a part of the data (sample) is observed. (Inferential Statistics)

Biostatistics: When the data is obtained from the biological sciences and medicine, we use the term "biostatistics".

Sources of Data:
1. Routinely kept records.
2. Surveys.
3. Experiments.
4. External sources (published reports, data banks, ...).

Population:
- A population is the largest collection of entities (elements or individuals) in which we are interested at a particular time and about which we want to draw some conclusions.
- When we take a measurement of some variable on each of the entities in a population, we generate a population of values of that variable.
- Example: If we are interested in the weights of students enrolled in the college of engineering at KSU, then our population consists of the weights of all of these students, and our variable of interest is the weight.

Population Size (N):
The number of elements in the population is called the population size and is denoted by N.

Sample:
- A sample is a part of a population.
- From the population, we select various elements on which we collect our data. This part of the population on which we collect data is called the sample.
- Example: Suppose that we are interested in studying the characteristics of the weights of the students enrolled in the college of engineering at KSU. If we randomly select 50 students among the students of the college of engineering at KSU and measure their weights, then the weights of these 50 students form our sample.

Sample Size (n):
The number of elements in the sample is called the sample size and is denoted by n.

Variables:
The characteristic to be measured on the elements is called a variable. The value of the variable varies from element to element.
Examples of Variables:
(1) No. of patients
(2) Height
(3) Sex
(4) Educational level

Types of Variables:
(1) Quantitative Variables:
A quantitative variable is a characteristic that can be measured. The values of a quantitative variable are numbers indicating how much or how many of something.
Examples:
(i) Family size
(ii) No. of patients
(iii) Weight
(iv) Height

Types of Quantitative Variables:
(a) Discrete Variables: There are jumps or gaps between the values.
Examples:
- Family size (x = 1, 2, 3, ...)
- Number of patients (x = 0, 1, 2, 3, ...)
(b) Continuous Variables: There are no gaps between the values. A continuous variable can have any value within a certain interval of values.
Examples:
- Height (40 < x < 90)
- Blood sugar level (0 < x < 5)

(2) Qualitative Variables:
The values of a qualitative variable are words or attributes indicating to which category an element belongs.
Examples:
- Blood type
- Nationality
- Students' grades
- Educational level

Types of Qualitative Variables:
(a) Nominal Qualitative Variables: A nominal variable classifies the observations into various mutually exclusive, unranked categories. The values of a nominal variable are names or attributes that cannot be ordered, sorted, or ranked.
Examples:
- Blood type (O, AB, A, B)
- Nationality (Saudi, Egyptian, British, ...)
- Sex (male, female)
(b) Ordinal Qualitative Variables: An ordinal variable classifies the observations into various mutually exclusive, ranked categories. The values of an ordinal variable are categories that can be ordered, sorted, or ranked by some criterion.
Examples:
- Educational level (elementary, intermediate, ...)
- Students' grade (A, B, C, D, F)
- Military rank

1.4 Sampling and Statistical Inference:
There are several types of sampling techniques, some of which are:
(1) Simple Random Sampling: If a sample of size (n) is selected from a population of size (N) in such a way that each element in the population has the same chance of being selected, the sample is called a simple random sample.
(2) Stratified Random Sampling: In this type of sampling, the elements of the population are classified into several homogeneous groups (strata). From each group, an independent simple random sample is drawn. The sample resulting from combining these samples is called a stratified random sample.
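The two sampling techniques can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the population of 500 student ID numbers, the two strata, and the per-stratum sample sizes are hypothetical and are not taken from the notes. The function names are also made up for this example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def stratified_random_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample from each stratum and combine them.

    strata: dict mapping a stratum name to the list of its elements
    sizes:  dict mapping a stratum name to the sample size for that stratum
    """
    rng = random.Random(seed)
    combined = []
    for name, elements in strata.items():
        combined.extend(rng.sample(list(elements), sizes[name]))
    return combined


# Hypothetical population: student ID numbers 1..500 (N = 500).
population = list(range(1, 501))

# Hypothetical strata: two homogeneous groups of students.
strata = {"group_1": population[:300], "group_2": population[300:]}

srs = simple_random_sample(population, 50, seed=1)
strat = stratified_random_sample(strata, {"group_1": 30, "group_2": 20}, seed=1)

print("simple random sample size:", len(srs))
print("stratified sample size:", len(strat))
```

Fixing the seed only makes the sketch reproducible; in practice the draw would be left random.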

CHAPTER 2: Strategies for Understanding the Meaning of Data

2.1 Introduction:
In this chapter, we learn several techniques for organizing and summarizing data so that we may more easily determine what information they contain. Summarization techniques involve:
- frequency distributions
- descriptive measures

2.2 The Ordered Array:
A first step in organizing data is the preparation of an ordered array. An ordered array is a listing of the values in order of magnitude from the smallest to the largest value.
Example:
The following values represent a list of ages of subjects who participated in a study on smoking cessation:
55 46 58 54 52 69 40 65 53 58
The ordered array is:
40 46 52 53 54 55 58 58 65 69

2.3 Grouped Data: The Frequency Distribution:
To group a set of observations, we select a suitable set of contiguous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals. These intervals are called "class intervals".
Example:
The following table gives the hemoglobin level (g/dl) of a sample of 50 men.

17.0  17.7  15.9  15.   16.   17.   15.7  17.3  13.5  16.3
14.6  15.8  15.3  16.4  13.7  16.   16.4  16.   17.0  15.9
14.0  16.   16.4  14.9  17.8  16.   15.5  18.3  15.8  16.7
15.9  15.3  13.9  16.8  15.9  16.3  17.4  15.0  17.5  16.
14.   16.   15.7  15.   17.4  16.5  14.4  16.3  17.3  15.8

We wish to summarize these data using the following class intervals:
13.0 - 13.9, 14.0 - 14.9, 15.0 - 15.9, 16.0 - 16.9, 17.0 - 17.9, 18.0 - 18.9
Solution:
Variable = X = hemoglobin level (continuous, quantitative)
Sample size = n = 50
Max = 18.3
Min = 13.5

Class Interval   Tally                      Frequency
13.0 - 13.9      |||                        3
14.0 - 14.9      |||||                      5
15.0 - 15.9      ||||| ||||| |||||          15
16.0 - 16.9      ||||| ||||| ||||| |        16
17.0 - 17.9      ||||| |||||                10
18.0 - 18.9      |                          1

The grouped frequency distribution for the hemoglobin level of the 50 men is:

Class Interval        Frequency
(Hemoglobin level)    (no. of men)
13.0 - 13.9           3
14.0 - 14.9           5
15.0 - 15.9           15
16.0 - 16.9           16
17.0 - 17.9           10
18.0 - 18.9           1
Total                 n = 50

Notes:
1. The minimum value belongs to the first interval.
2. The maximum value belongs to the last interval.
3. The intervals do not overlap.
4. Each value belongs to one, and only one, interval.
5. Total of the frequencies = the sample size = n.
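The grouping step can be sketched in Python as follows. This is a minimal illustration, not the notes' own procedure: the ten values are illustrative hemoglobin-like readings chosen for the sketch, and the class intervals are assumed to be 13.0-13.9 through 18.0-18.9. The function name `grouped_frequencies` is hypothetical.

```python
# Class intervals as (lower limit, upper limit) in g/dl; assumed for this sketch.
class_intervals = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
                   (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]


def grouped_frequencies(values, intervals):
    """Count how many values fall in each class interval (limits inclusive).

    Raises ValueError if a value does not belong to exactly one interval,
    which enforces the "one, and only one" requirement.
    """
    counts = [0] * len(intervals)
    for v in values:
        matches = [k for k, (low, high) in enumerate(intervals) if low <= v <= high]
        if len(matches) != 1:
            raise ValueError(f"{v} does not belong to exactly one interval")
        counts[matches[0]] += 1
    return counts


# Illustrative values only (one decimal place, like the hemoglobin readings).
values = [17.0, 17.7, 15.9, 15.7, 17.3, 13.5, 16.3, 14.6, 15.8, 15.3]

ordered = sorted(values)  # the ordered array
freq = grouped_frequencies(values, class_intervals)

print("ordered array:", ordered)
for (low, high), f in zip(class_intervals, freq):
    print(f"{low:.1f} - {high:.1f}: {f}")
print("total:", sum(freq))
```

The final total should always equal the number of values, which is the check stated in note 5.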

Mid-Points of Class Intervals:
\[ \text{Mid-point} = \frac{\text{upper limit} + \text{lower limit}}{2} \]

True Class Intervals:
d = gap between class intervals
d = lower limit - upper limit of the preceding class interval
true upper limit = upper limit + d/2
true lower limit = lower limit - d/2

Class Interval   True Class Interval   Mid-point   Frequency
13.0 - 13.9      12.95 - 13.95         13.45       3
14.0 - 14.9      13.95 - 14.95         14.45       5
15.0 - 15.9      14.95 - 15.95         15.45       15
16.0 - 16.9      15.95 - 16.95         16.45       16
17.0 - 17.9      16.95 - 17.95         17.45       10
18.0 - 18.9      17.95 - 18.95         18.45       1

For example:
Mid-point of the 1st interval = (13.0 + 13.9)/2 = 13.45
...
Mid-point of the last interval = (18.0 + 18.9)/2 = 18.45

Note:
(1) The mid-point of a class interval is considered a typical (approximate) value for all values in that class interval. For example, approximately we may say that:
- there are 3 observations with the value of 13.45
- there are 5 observations with the value of 14.45
- ...
- there is 1 observation with the value of 18.45
(2) There are no gaps between true class intervals. The end-point (true upper limit) of each true class interval equals the start-point (true lower limit) of the following true class interval.
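As a hedged sketch of the formulas above, the following Python lines compute d, the true class limits, and the mid-points for class intervals 13.0-13.9 through 18.0-18.9. Rounding to two decimals is an assumption added here only to hide floating-point noise.

```python
# Class intervals as (lower limit, upper limit).
class_intervals = [(13.0, 13.9), (14.0, 14.9), (15.0, 15.9),
                   (16.0, 16.9), (17.0, 17.9), (18.0, 18.9)]

# d = lower limit of a class - upper limit of the preceding class.
d = round(class_intervals[1][0] - class_intervals[0][1], 2)

rows = []
for lower, upper in class_intervals:
    midpoint = round((upper + lower) / 2, 2)
    true_lower = round(lower - d / 2, 2)
    true_upper = round(upper + d / 2, 2)
    rows.append((true_lower, true_upper, midpoint))

print("d =", d)
for (lower, upper), (t_low, t_up, mid) in zip(class_intervals, rows):
    print(f"{lower:.1f} - {upper:.1f}   {t_low:.2f} - {t_up:.2f}   {mid:.2f}")
```

Each true upper limit printed by the loop coincides with the next true lower limit, which is the "no gaps" property.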

Cumulative frequency:
Cumulative frequency of the 1st class interval = frequency.
Cumulative frequency of a class interval = frequency + cumulative frequency of the preceding class interval.

Relative frequency and Percentage frequency:
Relative frequency = frequency / n
Percentage frequency = Relative frequency × 100%

Class Interval   Frequency   Cumulative   Relative    Cumulative Relative   Percentage   Cumulative Percentage
                             Frequency    Frequency   Frequency             Frequency    Frequency
13.0 - 13.9      3           3            0.06        0.06                  6%           6%
14.0 - 14.9      5           8            0.10        0.16                  10%          16%
15.0 - 15.9      15          23           0.30        0.46                  30%          46%
16.0 - 16.9      16          39           0.32        0.78                  32%          78%
17.0 - 17.9      10          49           0.20        0.98                  20%          98%
18.0 - 18.9      1           50           0.02        1.00                  2%           100%

From frequencies:
The number of people whose hemoglobin levels are between 17.0 and 17.9 = 10
From cumulative frequencies:
The number of people whose hemoglobin levels are less than or equal to 15.9 = 23
The number of people whose hemoglobin levels are less than or equal to 17.9 = 49
From percentage frequencies:
The percentage of people whose hemoglobin levels are between 17.0 and 17.9 = 20%
From cumulative percentage frequencies:
The percentage of people whose hemoglobin levels are less than or equal to 14.9 = 16%
The percentage of people whose hemoglobin levels are less than or equal to 16.9 = 78%
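The cumulative, relative, and percentage frequencies can be reproduced with a few lines of Python. The sketch assumes the six class frequencies of the hemoglobin example are 3, 5, 15, 16, 10, and 1 (which sum to n = 50); the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies of the hemoglobin example (assumed: they sum to n = 50).
frequencies = [3, 5, 15, 16, 10, 1]
n = sum(frequencies)

# Each cumulative frequency = frequency + cumulative frequency of the preceding class.
cumulative = list(accumulate(frequencies))

relative = [f / n for f in frequencies]
cumulative_relative = [c / n for c in cumulative]
percentage = [100 * f / n for f in frequencies]
cumulative_percentage = [100 * c / n for c in cumulative]

for row in zip(frequencies, cumulative, relative, cumulative_relative,
               percentage, cumulative_percentage):
    f, c, r, cr, p, cp = row
    print(f"{f:3d} {c:3d} {r:5.2f} {cr:5.2f} {p:4.0f}% {cp:4.0f}%")
```

The last cumulative frequency must equal n, and the last cumulative percentage must equal 100%.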

Displaying Grouped Frequency Distributions:
For representing frequency (or relative frequency or percentage frequency) distributions, we may use one of the following graphs:
- The Histogram
- The Frequency Polygon

Example: Consider the following frequency distribution of the ages of 100 women.

True Class Interval   Frequency        Cumulative   Mid-points
(age)                 (No. of women)   Frequency
14.5 - 19.5           8                8            17
19.5 - 24.5           16               24           22
24.5 - 29.5           32               56           27
29.5 - 34.5           28               84           32
34.5 - 39.5           12               96           37
39.5 - 44.5           4                100          42
Total                 n = 100

Width of the interval:
W = true upper limit - true lower limit = 19.5 - 14.5 = 5

(1) Histogram:
Organizing and Displaying Data using Histogram:
[Figure: histogram]

(2) The Frequency Polygon:
Organizing and Displaying Data using Polygon:
[Figures: Polygon (Open); Polygon (Closed)]


2.4 Descriptive Statistics: Measures of Central Tendency (Measures of Location):
In the last section we summarized the data using frequency distributions (tables and figures). In this section, we will introduce the concept of summarization of the data by means of a single number called "a descriptive measure".
A descriptive measure computed from the values of a sample is called a "statistic".
A descriptive measure computed from the values of a population is called a "parameter".

For the variable of interest there are:
(1) "N" population values.
(2) "n" sample values.

Let $X_1, X_2, \ldots, X_N$ be the population values (in general, they are unknown) of the variable of interest. The population size = N.
Let $x_1, x_2, \ldots, x_n$ be the sample values (these values are known). The sample size = n.

(i) A parameter is a measure (or number) obtained from the population values $X_1, X_2, \ldots, X_N$.
- Values of the parameters are unknown in general.
- We are interested in knowing the true values of the parameters.
(ii) A statistic is a measure (or number) obtained from the sample values $x_1, x_2, \ldots, x_n$.
- Values of statistics are known in general.
- Since parameters are unknown, statistics are used to approximate (estimate) parameters.

Measures of Central Tendency (or measures of location):
The most commonly used measures of central tendency are: the mean, the median, and the mode.
The values of a variable often tend to be concentrated around the center of the data. The center of the data can be determined by the measures of central tendency. A measure of central tendency is considered to be a typical (or a representative) value of the set of data as a whole.

Mean:
(1) The Population mean ($\mu$):
If $X_1, X_2, \ldots, X_N$ are the population values, then the population mean is:
\[ \mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} \quad \text{(unit)} \]
The population mean $\mu$ is a parameter (it is usually unknown, and we are interested in knowing its value).

(2) The Sample mean ($\bar{x}$):
If $x_1, x_2, \ldots, x_n$ are the sample values, then the sample mean is:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{(unit)} \]
The sample mean $\bar{x}$ is a statistic (it is known; we can calculate it from the sample). The sample mean $\bar{x}$ is used to approximate (estimate) the population mean $\mu$.

Example:
Suppose that we have a population of 5 values:
$X_1 = 41, X_2 = 30, X_3 = 35, X_4 = 22, X_5 = 27$ (N = 5)
Suppose that we randomly select a sample of size 3, and the sample values we obtained are:
$x_1 = 30, x_2 = 35, x_3 = 27$ (n = 3)
Then:
The population mean is:
\[ \mu = \frac{41 + 30 + 35 + 22 + 27}{5} = \frac{155}{5} = 31 \quad \text{(unit)} \]
The sample mean is:
\[ \bar{x} = \frac{30 + 35 + 27}{3} = \frac{92}{3} = 30.67 \quad \text{(unit)} \]
Notice that $\bar{x} = 30.67$ is approximately equal to $\mu = 31$.
Note: The unit of the mean is the same as the unit of the data.

Advantages and disadvantages of the mean:
Advantages:
- Simplicity: The mean is easily understood and easy to compute.
- Uniqueness: There is one and only one mean for a given set of data.
- The mean takes into account all values of the data.
Disadvantages:
- Extreme values have an influence on the mean. Therefore, the mean may be distorted by extreme values. For example:

  Sample   Data              Mean
  A        2 4 5 7 7 10      5.83
  B        2 4 5 7 7 100     20.83

- The mean can only be found for quantitative variables.

Median:
The median of a finite set of numbers is that value which divides the ordered array into two equal parts. The numbers in the first part are less than or equal to the median and the numbers in the second part are greater than or equal to the median.
Notice that at most 50% of the data is less than the median, and at most 50% of the data is greater than the median.

Calculating the Median:
Let $x_1, x_2, \ldots, x_n$ be the sample values. The sample size (n) can be odd or even.
First we order the sample to obtain the ordered array. Suppose that the ordered array is:
$y_1, y_2, \ldots, y_n$
We compute the rank of the middle value(s):
\[ \text{rank} = \frac{n + 1}{2} \]
If the sample size (n) is an odd number, there is only one value in the middle, and the rank will be an integer:
\[ \text{rank} = \frac{n + 1}{2} = m \quad (m \text{ is an integer}) \]
The median is the middle value of the ordered observations, which is:
Median = $y_m$.

If the sample size (n) is an even number, there are two values in the middle, and the rank will be an integer plus 0.5:
\[ \text{rank} = \frac{n + 1}{2} = m + 0.5 \]
Therefore, the ranks of the middle values are (m) and (m+1). The median is the mean (average) of the two middle values of the ordered observations:
\[ \text{Median} = \frac{y_m + y_{m+1}}{2} \]

Example (odd number):
Find the median for the sample values: 10, 54, 21, 38, 53.
Solution:
n = 5 (odd number)
There is only one value in the middle. The rank of the middle value is:
\[ \text{rank} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3 \quad (m = 3) \]

Ordered set       10   21   38   53   54
Rank (or order)   1    2    3    4    5
                            (m)

The middle value is 38.
The median = 38 (unit)

Example (even number):
Find the median for the sample values: 10, 35, 41, 16, 20, 32.
Solution:
n = 6 (even number)
There are two values in the middle. The rank is:
\[ \text{rank} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5 = 3 + 0.5 = m + 0.5 \quad (m = 3) \]
Therefore, the ranks of the middle values are m = 3 and m + 1 = 4.

Ordered set       10   16   20   32    35   41
Rank (or order)   1    2    3    4     5    6
                            (m)  (m+1)

The middle values are 20 and 32.
\[ \text{The median} = \frac{20 + 32}{2} = \frac{52}{2} = 26 \quad \text{(unit)} \]
Note: The unit of the median is the same as the unit of the data.

Advantages and disadvantages of the median:
Advantages:
- Simplicity: The median is easily understood and easy to compute.
- Uniqueness: There is only one median for a given set of data.
- The median is not as drastically affected by extreme values as is the mean. For example:

  Sample   Data              Median
  A        9 4 5 9 2 10      7
  B        9 4 5 9 2 100     7

Disadvantages:
- The median does not take into account all values of the sample.
- In general, the median can only be found for quantitative variables. However, in some cases, the median can be found for ordinal qualitative variables.

Mode:
The mode of a set of values is that value which occurs most frequently (i.e., with the highest frequency).

If all values are different or have the same frequencies, there will be no mode.
A set of data may have more than one mode.

Example:

Data set                              Type           Mode(s)
26, 25, 25, 34                        Quantitative   25
3, 7, 12, 6, 19                       Quantitative   No mode
3, 3, 7, 7, 12, 12, 6, 6, 19, 19      Quantitative   No mode
3, 3, 12, 6, 8, 8                     Quantitative   3 and 8
B C A B B B C B B                     Qualitative    B
B C A B A B C A C                     Qualitative    No mode
B C A B B C B C C                     Qualitative    B and C

Note: The unit of the mode is the same as the unit of the data.

Advantages and disadvantages of the mode:
Advantages:
- Simplicity: The mode is easily understood and easy to compute.
- The mode is not as drastically affected by extreme values as is the mean. For example:

  Sample   Data              Mode
  A        7 4 5 7 2 10      7
  B        7 4 5 7 2 100     7

- The mode may be found for both quantitative and qualitative variables.
Disadvantages:
- The mode is not a good measure of location, because it depends on a few values of the data.
- The mode does not take into account all values of the sample.
- There might be no mode for a data set.
- There might be more than one mode for a data set.
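The three measures of central tendency can be compared with a short Python sketch. It uses the standard `statistics` module for the mean and median. The `modes` helper is a hypothetical function written for this illustration; it encodes the convention used in these notes that a data set whose distinct values all occur equally often has no mode. The two samples are assumed to be 2, 4, 5, 7, 7, 10 and the same values with 10 replaced by the extreme value 100.

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return the list of modes, or an empty list when there is no mode.

    Convention assumed here: if there are several distinct values and
    they all have the same frequency, the data set has no mode.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


sample_a = [2, 4, 5, 7, 7, 10]
sample_b = [2, 4, 5, 7, 7, 100]  # same data with one extreme value

for name, data in (("A", sample_a), ("B", sample_b)):
    print(name, "mean =", round(mean(data), 2),
          "median =", median(data), "mode(s) =", modes(data))

# Qualitative data: the mode is still defined, the mean is not.
print(modes(list("BCABBBCBB")))
print(modes(list("BCABABCAC")))
print(modes(list("BCABBCBCC")))
```

Only the mean changes between the two samples, which illustrates why the median and mode are described as less sensitive to extreme values.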

2.6 Descriptive Statistics: Measures of Dispersion (Measures of Variation):
The dispersion (variation) of a set of observations refers to the variety that they exhibit. A measure of dispersion conveys information regarding the amount of variability present in a set of data. There are several measures of dispersion, some of which are: Range, Variance, Standard Deviation, and Coefficient of Variation.
The variation or dispersion in a set of values refers to how spread out the values are from each other.
- The dispersion (variation) is small when the values are close together.
- There is no dispersion (no variation) if the values are the same.

The Range:
The range is the difference between the largest value (Max) and the smallest value (Min).
Range (R) = Max - Min
Example:
Find the range for the sample values: 26, 25, 35, 27, 29, 29.
Solution:
max = 35
min = 25
Range (R) = 35 - 25 = 10 (unit)
Notes:
1. The unit of the range is the same as the unit of the data.
2. The usefulness of the range is limited. The range is a poor measure of dispersion because it only takes into account two of the values; however, it plays a significant role in many applications.

The Variance:
The variance is one of the most important measures of dispersion. The variance is a measure that uses the mean as a point of reference.
- The variance of the data is small when the observations are close to the mean.
- The variance of the data is large when the observations are spread out from the mean.
- The variance of the data is zero (no variation) when all observations have the same value (concentrated at the mean).

Deviations of sample values from the sample mean:
Let $x_1, x_2, \ldots, x_n$ be the sample values, and $\bar{x}$ be the sample mean.
The deviation of the value $x_i$ from the sample mean $\bar{x}$ is:
\[ x_i - \bar{x} \]
The squared deviation is:
\[ (x_i - \bar{x})^2 \]
The sum of squared deviations is:
\[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
[Figure: squared deviations of the values from their mean]

(1) The Population Variance $\sigma^2$: (Variance computed from the population)
Let $X_1, X_2, \ldots, X_N$ be the population values. The population variance ($\sigma^2$) is defined by:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \cdots + (X_N - \mu)^2}{N} \quad \text{(unit)}^2 \]
where $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ is the population mean, and (N) is the population size.
Notes:
- $\sigma^2$ is a parameter because it is obtained from the population values (it is unknown in general).
- $\sigma^2 \geq 0$

(2) The Sample Variance $S^2$: (Variance computed from the sample)
Let $x_1, x_2, \ldots, x_n$ be the sample values. The sample variance ($S^2$) is defined by:
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} \quad \text{(unit)}^2 \]
where $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ is the sample mean, and (n) is the sample size.
Notes:
- $S^2$ is a statistic because it is obtained from the sample values (it is known).
- $S^2$ is used to approximate (estimate) $\sigma^2$.
- $S^2 \geq 0$
- $S^2 = 0$ if and only if all observations have the same value, that is, there is no dispersion (no variation).

Example:
We want to compute the sample variance of the following sample values: 10, 21, 33, 53, 54.
Solution:
n = 5
\[ \bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2 \]
\[ S^2 = \frac{\sum_{i=1}^{5} (x_i - 34.2)^2}{5 - 1} = \frac{(10 - 34.2)^2 + (21 - 34.2)^2 + (33 - 34.2)^2 + (53 - 34.2)^2 + (54 - 34.2)^2}{4} = \frac{1506.8}{4} = 376.7 \quad \text{(unit)}^2 \]

Another method for calculating the sample variance:

$x_i$    $x_i - \bar{x} = x_i - 34.2$    $(x_i - \bar{x})^2 = (x_i - 34.2)^2$
10       -24.2                           585.64
21       -13.2                           174.24
33       -1.2                            1.44
53       18.8                            353.44
54       19.8                            392.04
Sum: $\sum x_i = 171$    $\sum (x_i - \bar{x}) = 0$    $\sum (x_i - \bar{x})^2 = 1506.8$

\[ \bar{x} = \frac{171}{5} = 34.2 \quad \text{and} \quad S^2 = \frac{1506.8}{4} = 376.7 \]

Standard Deviation:
The variance represents squared units; therefore, it is not an appropriate measure of dispersion when we wish to express the concept of dispersion in terms of the original unit.
The standard deviation is another measure of dispersion. The standard deviation is the square root of the variance. The standard deviation is expressed in the original unit of the data.
(1) Population standard deviation is: $\sigma = \sqrt{\sigma^2}$ (unit)
(2) Sample standard deviation is:
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{(unit)} \]
Example:
For the previous example, the sample standard deviation is
\[ S = \sqrt{S^2} = \sqrt{376.7} = 19.41 \quad \text{(unit)} \]

Coefficient of Variation (C.V.):
The variance and the standard deviation are useful as measures of variation of the values of a single variable for a single population. If we want to compare the variation of two variables we cannot use the variance or the standard deviation because:

1. The variables might have different units.
2. The variables might have different means.

We need a measure of the relative variation that does not depend on either the units or on how large the values are. This measure is the coefficient of variation (C.V.), defined by:

$C.V. = \frac{S}{\bar{x}} \times 100\%$

The C.V. is free of unit (unit-less).

To compare the variability of two sets of data (i.e., to determine which set is more variable), we need to calculate the following quantities:

                  Mean           Standard deviation     C.V.
  1st data set    $\bar{x}_1$    $S_1$                  $C.V_1 = \frac{S_1}{\bar{x}_1} \times 100\%$
  2nd data set    $\bar{x}_2$    $S_2$                  $C.V_2 = \frac{S_2}{\bar{x}_2} \times 100\%$

The data set with the larger value of C.V. has the larger variation. The relative variability of the 1st data set is larger than the relative variability of the 2nd data set if $C.V_1 > C.V_2$ (and vice versa).

Example:
Suppose we have two data sets:
1st data set: $\bar{x}_1 = 66$ kg, $S_1 = 4.5$ kg
2nd data set: $\bar{x}_2 = 36$ kg, $S_2 = 4.5$ kg

$C.V_1 = \frac{4.5}{66} \times 100\% = 6.8\%$
$C.V_2 = \frac{4.5}{36} \times 100\% = 12.5\%$

Since $C.V_2 > C.V_1$, the relative variability of the 2nd data set is larger than the relative variability of the 1st data set.

If we used the standard deviation to compare the variability of the two data sets, we would wrongly conclude that they have the same variability, because the standard deviation of both sets is 4.5 kg.
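The variance, standard deviation, and C.V. calculations above can be checked with a short script. This is a minimal sketch in Python; the function names `sample_variance` and `coefficient_of_variation` are illustrative choices, not part of the course material. It uses the sample values 10, 21, 33, 53, 54 and the two data sets (means 66 kg and 36 kg, both with standard deviation 4.5 kg) from the examples.

```python
import math


def sample_variance(values):
    """Sample variance S^2, using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def coefficient_of_variation(std_dev, mean):
    """C.V. as a percentage: (S / x-bar) * 100."""
    return std_dev / mean * 100


sample = [10, 21, 33, 53, 54]
s2 = sample_variance(sample)   # about 376.7 (unit)^2
s = math.sqrt(s2)              # about 19.41 (unit)

cv1 = coefficient_of_variation(4.5, 66)  # about 6.8 %
cv2 = coefficient_of_variation(4.5, 36)  # 12.5 %

print(round(s2, 1), round(s, 2))
print(round(cv1, 1), round(cv2, 1))
```

Both data sets have the same standard deviation, yet the second C.V. is almost twice the first, which is the point of the comparison.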

Chapter 3: Probability
The Basis of Statistical Inference

3.1 Introduction
3.2 Probability
3.3 Elementary Properties of Probability
3.4 Calculating the Probability of an Event

General Definitions and Concepts:

Probability:
Probability is a measure (or number) used to measure the chance of the occurrence of some event. This number is between 0 and 1.

An Experiment:
An experiment is some procedure (or process) that we do.

Sample Space:
The sample space of an experiment is the set of all possible outcomes of the experiment. It is also called the universal set, and is denoted by $\Omega$.

An Event:
Any subset of the sample space $\Omega$ is called an event.
$\phi \subseteq \Omega$ is an event (impossible event)
$\Omega \subseteq \Omega$ is an event (sure event)

Example:
Experiment: Selecting a ball from a box containing 6 balls numbered from 1 to 6 and observing the number on the selected ball.
This experiment has 6 possible outcomes. The sample space is: $\Omega = \{1, 2, 3, 4, 5, 6\}$.
Consider the following events:
$E_1$ = getting an even number = $\{2, 4, 6\} \subseteq \Omega$

$E_2$ = getting a number less than 4 = $\{1, 2, 3\} \subseteq \Omega$
$E_3$ = getting 1 or 3 = $\{1, 3\} \subseteq \Omega$
$E_4$ = getting an odd number = $\{1, 3, 5\} \subseteq \Omega$
$E_5$ = getting a negative number = $\{\,\} = \phi \subseteq \Omega$
$E_6$ = getting a number less than 10 = $\{1, 2, 3, 4, 5, 6\} = \Omega \subseteq \Omega$

Notation:
$n(\Omega)$ = no. of outcomes (elements) in $\Omega$
$n(E)$ = no. of outcomes (elements) in the event $E$

Equally Likely Outcomes:
The outcomes of an experiment are equally likely if the outcomes have the same chance of occurrence.

Probability of an Event:
If the experiment has $n(\Omega)$ equally likely outcomes, then the probability of the event $E$ is denoted by $P(E)$ and is defined by:

$P(E) = \frac{n(E)}{n(\Omega)} = \frac{\text{no. of outcomes in } E}{\text{no. of outcomes in } \Omega}$

Example:
In the ball experiment in the previous example, suppose the ball is selected at random. Determine the probabilities of the following events:
$E_1$ = getting an even number
$E_2$ = getting a number less than 4
$E_3$ = getting 1 or 3

Solution:
$\Omega = \{1, 2, 3, 4, 5, 6\}$ ; $n(\Omega) = 6$
$E_1 = \{2, 4, 6\}$ ; $n(E_1) = 3$
$E_2 = \{1, 2, 3\}$ ; $n(E_2) = 3$
$E_3 = \{1, 3\}$ ; $n(E_3) = 2$

The outcomes are equally likely, so:

$P(E_1) = \frac{3}{6}$, $P(E_2) = \frac{3}{6}$, $P(E_3) = \frac{2}{6}$
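The counting definition $P(E) = n(E)/n(\Omega)$ maps directly onto set sizes. The sketch below reproduces the ball example in Python; the helper name `prob` is an illustrative choice, and the formula is valid only under the equally-likely assumption stated above.

```python
from fractions import Fraction

# Sample space of the ball experiment: balls numbered 1 to 6.
omega = {1, 2, 3, 4, 5, 6}


def prob(event, sample_space=omega):
    """P(E) = n(E) / n(Omega); valid only for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


e1 = {x for x in omega if x % 2 == 0}  # getting an even number
e2 = {x for x in omega if x < 4}       # getting a number less than 4
e3 = {1, 3}                            # getting 1 or 3

# Fraction reduces automatically, so 3/6 prints as 1/2 and 2/6 as 1/3.
print(prob(e1), prob(e2), prob(e3))
```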

Some Operations on Events:
Let A and B be two events defined on the sample space $\Omega$.

Union of Two Events: ($A \cup B$) or ($A + B$)
The event $A \cup B$ consists of all outcomes in A or in B or in both A and B. The event $A \cup B$ occurs if A occurs, or B occurs, or both A and B occur.

Intersection of Two Events: ($A \cap B$)
The event $A \cap B$ consists of all outcomes in both A and B. The event $A \cap B$ occurs if both A and B occur.

Complement of an Event: ($\bar{A}$) or ($A^C$) or ($A'$)
The complement of the event A is denoted by $\bar{A}$. The event $\bar{A}$ consists of all outcomes of $\Omega$ that are not in A. The event $\bar{A}$ occurs if A does not.

Example:
Experiment: Selecting a ball at random from a box containing 6 balls numbered 1, 2, 3, 4, 5, and 6.
Define the following events:
$E_1 = \{2, 4, 6\}$ = getting an even number.

$E_2 = \{1, 2, 3\}$ = getting a number < 4.
$E_4 = \{1, 3, 5\}$ = getting an odd number.

(1) $E_1 \cup E_2 = \{1, 2, 3, 4, 6\}$ = getting an even number or a number less than 4.

$P(E_1 \cup E_2) = \frac{n(E_1 \cup E_2)}{n(\Omega)} = \frac{5}{6}$

(2) $E_1 \cup E_4 = \{1, 2, 3, 4, 5, 6\} = \Omega$ = getting an even number or an odd number.

$P(E_1 \cup E_4) = \frac{n(E_1 \cup E_4)}{n(\Omega)} = \frac{6}{6} = 1$

Note: $E_1 \cup E_4 = \Omega$. $E_1$ and $E_4$ are called exhaustive events. The union of these events gives the whole sample space.

(3) $E_1 \cap E_2 = \{2\}$ = getting an even number and a number less than 4.

$P(E_1 \cap E_2) = \frac{n(E_1 \cap E_2)}{n(\Omega)} = \frac{1}{6}$

(4) $E_1 \cap E_4 = \phi$ = getting an even number and an odd number.

$P(E_1 \cap E_4) = \frac{n(E_1 \cap E_4)}{n(\Omega)} = \frac{n(\phi)}{6} = \frac{0}{6} = 0$

Note: $E_1 \cap E_4 = \phi$. $E_1$ and $E_4$ are called disjoint (or mutually exclusive) events. These kinds of events cannot occur simultaneously (together at the same time).

(5) The complement of $E_1$:
$\bar{E}_1$ = not getting an even number = $\overline{\{2, 4, 6\}} = \{1, 3, 5\}$ = getting an odd number = $E_4$

Mutually exclusive (disjoint) Events:
The events A and B are disjoint (or mutually exclusive) if:
$A \cap B = \phi$.
For this case, it is impossible that both events occur simultaneously (i.e., together at the same time). In this case:
(i) $P(A \cap B) = 0$
(ii) $P(A \cup B) = P(A) + P(B)$
If $A \cap B \neq \phi$, then A and B are not mutually exclusive (not disjoint).

$A \cap B \neq \phi$: A and B are not mutually exclusive (it is possible that both events occur at the same time).
$A \cap B = \phi$: A and B are mutually exclusive (disjoint) (it is impossible that both events occur at the same time).

Exhaustive Events:
The events $A_1, A_2, \ldots, A_n$ are exhaustive events if:
$A_1 \cup A_2 \cup \ldots \cup A_n = \Omega$.
For this case, $P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(\Omega) = 1$

Note:
1. $A \cup \bar{A} = \Omega$ (A and $\bar{A}$ are exhaustive events)
2. $A \cap \bar{A} = \phi$ (A and $\bar{A}$ are mutually exclusive (disjoint) events)
3. $n(\bar{A}) = n(\Omega) - n(A)$
4. $P(\bar{A}) = 1 - P(A)$

General Probability Rules:
1. $0 \le P(A) \le 1$
2. $P(\Omega) = 1$
3. $P(\phi) = 0$
4. $P(\bar{A}) = 1 - P(A)$
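Union, intersection, and complement correspond to the set operators `|`, `&`, and `-` in Python. The following sketch re-checks the ball-example results and the complement rule; the helper `prob` is an illustrative name and assumes equally likely outcomes.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))   # sample space {1, ..., 6}


def prob(event):
    """P(E) = n(E) / n(Omega) for equally likely outcomes."""
    return Fraction(len(event), len(omega))


e1 = frozenset({2, 4, 6})   # even number
e2 = frozenset({1, 2, 3})   # number less than 4
e4 = frozenset({1, 3, 5})   # odd number

union_12 = e1 | e2          # E1 union E2 = {1, 2, 3, 4, 6}
inter_12 = e1 & e2          # E1 intersect E2 = {2}
complement_e1 = omega - e1  # complement of E1 = {1, 3, 5} = E4

are_disjoint = len(e1 & e4) == 0     # E1 and E4 share no outcomes
are_exhaustive = (e1 | e4) == omega  # E1 and E4 cover the sample space

print(prob(union_12), prob(inter_12), prob(complement_e1))
print(are_disjoint, are_exhaustive)
```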

The Addition Rule:
For any two events A and B:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Special Cases:
1. For mutually exclusive (disjoint) events A and B:
$P(A \cup B) = P(A) + P(B)$
2. For mutually exclusive (disjoint) events $E_1, E_2, \ldots, E_n$:
$P(E_1 \cup E_2 \cup \ldots \cup E_n) = P(E_1) + P(E_2) + \cdots + P(E_n)$

Note: If the events $A_1, A_2, \ldots, A_n$ are exhaustive and mutually exclusive (disjoint) events, then:
$P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n) = P(\Omega) = 1$

Marginal Probability:
Consider a variable that can be broken down into (m) categories designated by $A_1, A_2, \ldots, A_m$ and another jointly occurring variable that is broken down into (n) categories designated by $B_1, B_2, \ldots, B_n$.

           B_1               B_2               ...    B_n               Total
  A_1      n(A_1 ∩ B_1)      n(A_1 ∩ B_2)      ...    n(A_1 ∩ B_n)      n(A_1)
  A_2      n(A_2 ∩ B_1)      n(A_2 ∩ B_2)      ...    n(A_2 ∩ B_n)      n(A_2)
  ...      ...               ...               ...    ...               ...
  A_m      n(A_m ∩ B_1)      n(A_m ∩ B_2)      ...    n(A_m ∩ B_n)      n(A_m)
  Total    n(B_1)            n(B_2)            ...    n(B_n)            n(Ω)

(This table contains the number of elements in each event)

                          B_1               B_2               ...    B_n               Marginal Probability
  A_1                     P(A_1 ∩ B_1)      P(A_1 ∩ B_2)      ...    P(A_1 ∩ B_n)      P(A_1)
  A_2                     P(A_2 ∩ B_1)      P(A_2 ∩ B_2)      ...    P(A_2 ∩ B_n)      P(A_2)
  ...                     ...               ...               ...    ...               ...
  A_m                     P(A_m ∩ B_1)      P(A_m ∩ B_2)      ...    P(A_m ∩ B_n)      P(A_m)
  Marginal Probability    P(B_1)            P(B_2)            ...    P(B_n)            1.00

(This table contains the probability of each event)

The marginal probability of $A_i$, $P(A_i)$, is equal to the sum of the joint probabilities of $A_i$ with all categories of B. That is:

$P(A_i) = P(A_i \cap B_1) + P(A_i \cap B_2) + \cdots + P(A_i \cap B_n) = \sum_{j=1}^{n} P(A_i \cap B_j)$

For example,

$P(A_2) = P(A_2 \cap B_1) + P(A_2 \cap B_2) + \cdots + P(A_2 \cap B_n) = \sum_{j=1}^{n} P(A_2 \cap B_j)$

We define the marginal probability of $B_j$, $P(B_j)$, in a similar way.

Example:
Table of the number of elements in each event:

           B_1     B_2     B_3     Total
  A_1       50      30      70      150
  A_2       20      70      10      100
  A_3       30     100     120      250
  Total    100     200     200      500

Table of the probabilities of each event:

                          B_1     B_2     B_3     Marginal Probability
  A_1                     0.10    0.06    0.14    0.3
  A_2                     0.04    0.14    0.02    0.2
  A_3                     0.06    0.20    0.24    0.5
  Marginal Probability    0.2     0.4     0.4     1

For example:
$P(A_2) = P(A_2 \cap B_1) + P(A_2 \cap B_2) + P(A_2 \cap B_3) = 0.04 + 0.14 + 0.02 = 0.2$

Applications:

Example:
630 patients are classified as follows:

  Blood Type         O (E_1)    A (E_2)    B (E_3)    AB (E_4)    Total
  No. of patients    284        258        63         25          630

Experiment: Selecting a patient at random and observing his/her blood type.
This experiment has 630 equally likely outcomes: $n(\Omega) = 630$

Define the events:
$E_1$ = The blood type of the selected patient is "O"
$E_2$ = The blood type of the selected patient is "A"
$E_3$ = The blood type of the selected patient is "B"
$E_4$ = The blood type of the selected patient is "AB"

Number of elements in each event:
$n(E_1) = 284$, $n(E_2) = 258$, $n(E_3) = 63$, $n(E_4) = 25$.

Probabilities of the events:
$P(E_1) = \frac{284}{630} = 0.4508$, $P(E_2) = \frac{258}{630} = 0.4095$,
$P(E_3) = \frac{63}{630} = 0.1$, $P(E_4) = \frac{25}{630} = 0.0397$

Some operations on the events:
1. $E_2 \cap E_4$ = the blood type of the selected patient is "A" and "AB".
$E_2 \cap E_4 = \phi$ (disjoint events / mutually exclusive events)
$P(E_2 \cap E_4) = P(\phi) = 0$
2. $E_2 \cup E_4$ = the blood type of the selected patient is "A" or "AB"

$P(E_2 \cup E_4) = \frac{n(E_2 \cup E_4)}{n(\Omega)} = \frac{258 + 25}{630} = \frac{283}{630} = 0.4492$

or

$P(E_2 \cup E_4) = P(E_2) + P(E_4) = \frac{258}{630} + \frac{25}{630} = \frac{283}{630} = 0.4492$ (since $E_2 \cap E_4 = \phi$)

3. $\bar{E}_1$ = the blood type of the selected patient is not "O".
$n(\bar{E}_1) = n(\Omega) - n(E_1) = 630 - 284 = 346$

$P(\bar{E}_1) = \frac{n(\bar{E}_1)}{n(\Omega)} = \frac{346}{630} = 0.5492$

another solution: $P(\bar{E}_1) = 1 - P(E_1) = 1 - 0.4508 = 0.5492$

Notes:
1. $E_1, E_2, E_3, E_4$ are mutually disjoint: $E_i \cap E_j = \phi$ ($i \neq j$).
2. $E_1, E_2, E_3, E_4$ are exhaustive events: $E_1 \cup E_2 \cup E_3 \cup E_4 = \Omega$.

Example:
339 physicians are classified based on their ages and smoking habits as follows.

                      Smoking Habit
  Age                 Daily (B_1)    Occasionally (B_2)    Not at all (B_3)    Total
  20 - 29 (A_1)        31              9                     7                   47
  30 - 39 (A_2)       110             30                    49                  189
  40 - 49 (A_3)        29             21                    29                   79
  50+     (A_4)         6              0                    18                   24
  Total               176             60                   103                  339

Experiment: Selecting a physician at random.
The number of elements of the sample space is $n(\Omega) = 339$.
The outcomes of the experiment are equally likely.

Some events:
$A_3$ = the selected physician is aged 40 - 49

$P(A_3) = \frac{n(A_3)}{n(\Omega)} = \frac{79}{339} = 0.2330$

$B_2$ = the selected physician smokes occasionally

$P(B_2) = \frac{n(B_2)}{n(\Omega)} = \frac{60}{339} = 0.1770$

$A_3 \cap B_2$ = the selected physician is aged 40 - 49 and smokes occasionally.

$P(A_3 \cap B_2) = \frac{n(A_3 \cap B_2)}{n(\Omega)} = \frac{21}{339} = 0.06195$

$A_3 \cup B_2$ = the selected physician is aged 40 - 49 or smokes occasionally (or both)

$P(A_3 \cup B_2) = P(A_3) + P(B_2) - P(A_3 \cap B_2) = \frac{79}{339} + \frac{60}{339} - \frac{21}{339} = 0.233 + 0.177 - 0.06195 = 0.3481$

$\bar{A}_4$ = the selected physician is not 50 years or older $= A_1 \cup A_2 \cup A_3$

$P(\bar{A}_4) = 1 - P(A_4) = 1 - \frac{n(A_4)}{n(\Omega)} = 1 - \frac{24}{339} = 0.9292$

$A_2 \cup A_3$ = the selected physician is aged 30 - 39 or is aged 40 - 49 = the selected physician is aged 30 - 49

$P(A_2 \cup A_3) = \frac{n(A_2 \cup A_3)}{n(\Omega)} = \frac{189 + 79}{339} = \frac{268}{339} = 0.7906$

or

$P(A_2 \cup A_3) = P(A_2) + P(A_3) = \frac{189}{339} + \frac{79}{339} = 0.7906$ (since $A_2 \cap A_3 = \phi$)

Example:
Suppose that there is a population of pregnant women with:
10% of the pregnant women delivered prematurely.
25% of the pregnant women used some sort of medication.

5% of the pregnant women delivered prematurely and used some sort of medication.

Experiment: Selecting a woman randomly from this population.

Define the events:
D = The selected woman delivered prematurely.
M = The selected woman used medication.
$D \cap M$ = The selected woman delivered prematurely and used some sort of medication.

Percentages:
%(D) = 10%, %(M) = 25%, %($D \cap M$) = 5%

The complement events:
$\bar{D}$ = The selected woman did not deliver prematurely.
$\bar{M}$ = The selected woman did not use medication.

A two-way table (percentages given by a two-way table):

             M      M̄      Total
  D          5      ?      10
  D̄          ?      ?      ?
  Total      25     ?      100

The completed table:

             M      M̄      Total
  D          5      5      10
  D̄          20     70     90
  Total      25     75     100

The probabilities of the given events are:

$P(D) = \frac{\%(D)}{100\%} = \frac{10\%}{100\%} = 0.1$

$P(M) = \frac{\%(M)}{100\%} = \frac{25\%}{100\%} = 0.25$

$P(D \cap M) = \frac{\%(D \cap M)}{100\%} = \frac{5\%}{100\%} = 0.05$

Calculating probabilities of some events:

$D \cup M$ = the selected woman delivered prematurely or used medication.
$P(D \cup M) = P(D) + P(M) - P(D \cap M) = 0.1 + 0.25 - 0.05 = 0.3$ (by the rule)

$\bar{M}$ = the selected woman did not use medication.
$P(\bar{M}) = 1 - P(M) = 1 - 0.25 = 0.75$ (by the rule)
$P(\bar{M}) = \frac{75}{100} = 0.75$ (from the table)

$\bar{D}$ = the selected woman did not deliver prematurely.
$P(\bar{D}) = 1 - P(D) = 1 - 0.10 = 0.90$ (by the rule)
$P(\bar{D}) = \frac{90}{100} = 0.90$ (from the table)

$\bar{D} \cap \bar{M}$ = the selected woman did not deliver prematurely and did not use medication.
$P(\bar{D} \cap \bar{M}) = \frac{70}{100} = 0.70$ (from the table)

$\bar{D} \cap M$ = the selected woman did not deliver prematurely and used medication.
$P(\bar{D} \cap M) = \frac{20}{100} = 0.20$ (from the table)

$D \cap \bar{M}$ = the selected woman delivered prematurely and did not use medication.
$P(D \cap \bar{M}) = \frac{5}{100} = 0.05$ (from the table)

$D \cup \bar{M}$ = the selected woman delivered prematurely or did not use medication.
$P(D \cup \bar{M}) = P(D) + P(\bar{M}) - P(D \cap \bar{M}) = 0.1 + 0.75 - 0.05 = 0.8$ (by the rule)

$\bar{D} \cup M$ = the selected woman did not deliver prematurely or used medication.
$P(\bar{D} \cup M) = P(\bar{D}) + P(M) - P(\bar{D} \cap M) = 0.9 + 0.25 - 0.20 = 0.95$ (by the rule)

$\bar{D} \cup \bar{M}$ = the selected woman did not deliver prematurely or did not use medication.
$P(\bar{D} \cup \bar{M}) = P(\bar{D}) + P(\bar{M}) - P(\bar{D} \cap \bar{M}) = 0.9 + 0.75 - 0.70 = 0.95$ (by the rule)

Conditional Probability:
The conditional probability of the event A when we know that the event B has already occurred is defined by:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ ; $P(B) \neq 0$

$P(A \mid B)$ = the conditional probability of A given B.

Notes:
(1) $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B) / n(\Omega)}{n(B) / n(\Omega)} = \frac{n(A \cap B)}{n(B)}$
(2) $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
(3) For calculating $P(A \mid B)$, we may use any one of the following:
(i) $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
(ii) $P(A \mid B) = \frac{n(A \cap B)}{n(B)}$
(iii) Using the restricted table directly.

Multiplication Rules of Probability:
For any two events A and B, we have:
$P(A \cap B) = P(B) \, P(A \mid B)$
$P(A \cap B) = P(A) \, P(B \mid A)$

Example:

                      Smoking Habit
  Age                 Daily (B_1)    Occasionally (B_2)    Not at all (B_3)    Total
  20 - 29 (A_1)        31              9                     7                   47
  30 - 39 (A_2)       110             30                    49                  189
  40 - 49 (A_3)        29             21                    29                   79
  50+     (A_4)         6              0                    18                   24
  Total               176             60                   103                  339

Consider the following event:
$(B_1 \mid A_2)$ = the selected physician smokes daily given that his

age is between 30 and 39

$P(B_1) = \frac{n(B_1)}{n(\Omega)} = \frac{176}{339} = 0.5192$

$P(B_1 \mid A_2) = \frac{P(B_1 \cap A_2)}{P(A_2)} = \frac{0.324484}{0.557522} = 0.5820$

where

$P(B_1 \cap A_2) = \frac{n(B_1 \cap A_2)}{n(\Omega)} = \frac{110}{339} = 0.324484$

$P(A_2) = \frac{n(A_2)}{n(\Omega)} = \frac{189}{339} = 0.557522$

another solution:

$P(B_1 \mid A_2) = \frac{n(B_1 \cap A_2)}{n(A_2)} = \frac{110}{189} = 0.5820$

Notice that:
$P(B_1) = 0.5192$
$P(B_1 \mid A_2) = 0.5820$
$P(B_1 \mid A_2) > P(B_1)$, so $P(B_1) \neq P(B_1 \mid A_2)$.
What does this mean? We will answer this question after talking about the concept of independent events.

Example: (Multiplication Rule of Probability)
A training health program consists of two consecutive parts. To pass this program, the trainee must pass both parts of the program. From past experience, it is known that 90% of the trainees pass the first part, and 80% of those who pass the first part pass the second part. If you are admitted to this program, what is the probability that you will pass the program? What is the percentage of trainees who pass the program?

Solution:
Define the following events:
A = the event of passing the first part

B = the event of passing the second part
$A \cap B$ = the event of passing the first part and the second part = the event of passing both parts = the event of passing the program

Therefore, the probability of passing the program is $P(A \cap B)$.

From the given information:
The probability of passing the first part is:
$P(A) = 0.9$ ($= \frac{90\%}{100\%} = 0.9$)
The probability of passing the second part given that the trainee has already passed the first part is:
$P(B \mid A) = 0.8$ ($= \frac{80\%}{100\%} = 0.8$)

Now, we use the multiplication rule to find $P(A \cap B)$ as follows:
$P(A \cap B) = P(A) \, P(B \mid A) = (0.9)(0.8) = 0.72$

We can conclude that 72% of the trainees pass the program.

Independent Events
There are 3 cases:
$P(A \mid B) > P(A)$ (knowing B increases the probability of occurrence of A)
$P(A \mid B) < P(A)$ (knowing B decreases the probability of occurrence of A)
$P(A \mid B) = P(A)$ (knowing B has no effect on the probability of occurrence of A). In this case A is independent of B.

Independent Events:
Two events A and B are independent if one of the following conditions is satisfied:
(i) $P(A \mid B) = P(A)$
(ii) $P(B \mid A) = P(B)$
(iii) $P(B \cap A) = P(A) \, P(B)$

Note: The third condition is the multiplication rule of independent events.
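The conditional probability, the multiplication rule, and the independence condition (iii) can all be checked from a table of counts. The sketch below uses the physicians table (age group by smoking habit) from the earlier example; the dictionary layout and variable names are illustrative assumptions, not part of the course notes.

```python
from fractions import Fraction

# counts[age group][smoking habit], from the 339-physician table
counts = {
    "20-29": {"daily": 31, "occasionally": 9, "not at all": 7},
    "30-39": {"daily": 110, "occasionally": 30, "not at all": 49},
    "40-49": {"daily": 29, "occasionally": 21, "not at all": 29},
    "50+": {"daily": 6, "occasionally": 0, "not at all": 18},
}

total = sum(sum(row.values()) for row in counts.values())   # n(Omega)

# Marginal and joint probabilities for A2 (aged 30-39) and B1 (smokes daily)
p_a2 = Fraction(sum(counts["30-39"].values()), total)
p_b1 = Fraction(sum(row["daily"] for row in counts.values()), total)
p_b1_and_a2 = Fraction(counts["30-39"]["daily"], total)

# Conditional probability P(B1 | A2) = P(B1 and A2) / P(A2)
p_b1_given_a2 = p_b1_and_a2 / p_a2

# Independence condition (iii): P(B1 and A2) == P(A2) * P(B1)
independent = p_b1_and_a2 == p_a2 * p_b1

# Multiplication rule for the training-program example: P(A) * P(B | A)
p_pass = Fraction(9, 10) * Fraction(8, 10)

print(float(p_b1), float(p_b1_given_a2), independent, float(p_pass))
```

Here $P(B_1 \mid A_2)$ differs from $P(B_1)$, so smoking daily and being aged 30 - 39 are not independent in this table.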

Example:
Suppose that A and B are two events such that:
$P(A) = 0.5$, $P(B) = 0.6$, $P(A \cap B) = 0.2$.
These two events are not independent (they are dependent) because:
$P(A) \, P(B) = 0.5 \times 0.6 = 0.3$
$P(A \cap B) = 0.2$
$P(A \cap B) \neq P(A) \, P(B)$

Also, $P(A) = 0.5 \neq P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.6} = 0.3333$.
Also, $P(B) = 0.6 \neq P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.2}{0.5} = 0.4$.

For this example, we may calculate probabilities of all events. We can use a two-way table of the probabilities as follows:

             B       B̄       Total
  A          0.2     ?       0.5
  Ā          ?       ?       ?
  Total      0.6     ?       1.00

We complete the table:

             B       B̄       Total
  A          0.2     0.3     0.5
  Ā          0.4     0.1     0.5
  Total      0.6     0.4     1.00

$P(\bar{A}) = 0.5$
$P(\bar{B}) = 0.4$
$P(A \cap \bar{B}) = 0.3$
$P(\bar{A} \cap B) = 0.4$
$P(\bar{A} \cap \bar{B}) = 0.1$
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.5 + 0.6 - 0.2 = 0.9$
$P(A \cup \bar{B}) = P(A) + P(\bar{B}) - P(A \cap \bar{B}) = 0.5 + 0.4 - 0.3 = 0.6$
$P(\bar{A} \cup B)$ = exercise
$P(\bar{A} \cup \bar{B})$ = exercise

Note: The Addition Rule for Independent Events:
If the events A and B are independent, then

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - P(A) \, P(B)$ (Addition rule)

Example: (Reading Assignment)
Suppose that a dental clinic has 12 nurses classified as follows:

  Nurse             1     2     3     4     5     6     7     8     9     10    11    12
  Has children      Yes   No    No    No    No    Yes   No    No    Yes   No    No    No
  Works at night    No    No    Yes   Yes   Yes   Yes   No    No    Yes   Yes   Yes   Yes

The experiment is to randomly choose one of these nurses. Consider the following events:
C = the chosen nurse has children
N = the chosen nurse works night shift

a) Find the probabilities of the following events:
1. the chosen nurse has children.
2. the chosen nurse works night shift.
3. the chosen nurse has children and works night shift.
4. the chosen nurse has children and does not work night shift.
b) Find the probability of choosing a nurse who works at night given that she has children.
c) Are the events C and N independent? Why?
d) Are the events C and N disjoint? Why?
e) Sketch the events C and N with their probabilities using a Venn diagram.

Solution:
We can classify the nurses as follows:

                          N (Night shift)    N̄ (No night shift)    Total
  C (Has children)        2                  1                     3
  C̄ (No children)         6                  3                     9
  Total                   8                  4                     12

a) The experiment has $n(\Omega) = 12$ equally likely outcomes.

P(The chosen nurse has children) $= P(C) = \frac{n(C)}{n(\Omega)} = \frac{3}{12} = 0.25$

P(The chosen nurse works night shift) $= P(N) = \frac{n(N)}{n(\Omega)} = \frac{8}{12} = 0.6667$

P(The chosen nurse has children and works night shift) $= P(C \cap N) = \frac{n(C \cap N)}{n(\Omega)} = \frac{2}{12} = 0.1667$

P(The chosen nurse has children and does not work night shift)

$= P(C \cap \bar{N}) = \frac{n(C \cap \bar{N})}{n(\Omega)} = \frac{1}{12} = 0.0833$

b) The probability of choosing a nurse who works at night given that she has children:

$P(N \mid C) = \frac{P(C \cap N)}{P(C)} = \frac{2/12}{0.25} = 0.6667$

c) The events C and N are independent because $P(N \mid C) = P(N)$.

d) The events C and N are not disjoint because $C \cap N \neq \phi$. (Note: $n(C \cap N) = 2$)

e) Venn diagram

[Figure: Venn diagram of the events C and N with their probabilities]
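The nurse example can be reproduced directly from the raw classification. This sketch encodes the two Yes/No rows as Boolean lists (in nurse order 1 to 12) and checks parts a) to d); the variable names are illustrative.

```python
from fractions import Fraction

# One entry per nurse, in order 1..12 (True = Yes, False = No)
has_children = [True, False, False, False, False, True,
                False, False, True, False, False, False]
works_night = [False, False, True, True, True, True,
               False, False, True, True, True, True]

n = len(has_children)                       # n(Omega)
n_c = sum(has_children)                     # n(C)
n_n = sum(works_night)                      # n(N)
n_c_and_n = sum(c and w for c, w in zip(has_children, works_night))
n_c_not_n = sum(c and not w for c, w in zip(has_children, works_night))

p_c = Fraction(n_c, n)
p_n = Fraction(n_n, n)
p_c_and_n = Fraction(n_c_and_n, n)
p_c_not_n = Fraction(n_c_not_n, n)

p_n_given_c = p_c_and_n / p_c               # P(N | C)
independent = p_n_given_c == p_n            # condition P(N | C) = P(N)
disjoint = n_c_and_n == 0                   # C and N share no outcomes?

print(p_c, p_n, p_c_and_n, p_c_not_n)
print(p_n_given_c, independent, disjoint)
```

Using exact fractions avoids a false "not independent" verdict caused by comparing rounded decimals such as 0.6667.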

3.5 Bayes' Theorem, Screening Tests, Sensitivity, Specificity, and Predictive Value Positive and Negative: (pp. 79-83)

There are two states regarding the disease and two states regarding the result of the screening test. We define the following events of interest:
$D$ : the individual has the disease (presence of the disease)
$\bar{D}$ : the individual does not have the disease (absence of the disease)
$T$ : the individual has a positive screening test result
$\bar{T}$ : the individual has a negative screening test result

There are 4 possible situations:

                          True status of the disease
  Result of the test      +ve (D: Present)          -ve (D̄: Absent)
  +ve (T)                 Correct diagnosis         False positive result
  -ve (T̄)                 False negative result     Correct diagnosis

Definitions of False Results:
There are two false results:
1. A false positive result:
This result happens when a test indicates a positive status when the true status is negative. Its probability is:
$P(T \mid \bar{D})$ = P(positive result | absence of the disease)
2. A false negative result:
This result happens when a test indicates a negative status when the true status is positive. Its probability is:
$P(\bar{T} \mid D)$ = P(negative result | presence of the disease)

Definitions of the Sensitivity and Specificity of the test:
1. The Sensitivity:

The sensitivity of a test is the probability of a positive test result given the presence of the disease.
$P(T \mid D)$ = P(positive result of the test | presence of the disease)
2. The specificity:
The specificity of a test is the probability of a negative test result given the absence of the disease.
$P(\bar{T} \mid \bar{D})$ = P(negative result of the test | absence of the disease)

To clarify these concepts, suppose we have a sample of (n) subjects who are cross-classified according to Disease Status and Screening Test Result as follows:

                    Disease
  Test Result       Present (D)      Absent (D̄)       Total
  Positive (T)      a                b                a + b = n(T)
  Negative (T̄)      c                d                c + d = n(T̄)
  Total             a + c = n(D)     b + d = n(D̄)     n

For example, there are (a) subjects who have the disease and whose screening test result was positive.

From this table we may compute the following conditional probabilities:
1. The probability of a false positive result:
$P(T \mid \bar{D}) = \frac{n(T \cap \bar{D})}{n(\bar{D})} = \frac{b}{b + d}$
2. The probability of a false negative result:
$P(\bar{T} \mid D) = \frac{n(\bar{T} \cap D)}{n(D)} = \frac{c}{a + c}$
3. The sensitivity of the screening test:
$P(T \mid D) = \frac{n(T \cap D)}{n(D)} = \frac{a}{a + c}$
4. The specificity of the screening test:
$P(\bar{T} \mid \bar{D}) = \frac{n(\bar{T} \cap \bar{D})}{n(\bar{D})} = \frac{d}{b + d}$
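The four conditional probabilities above are simple ratios of the cell counts a, b, c, d. The sketch below wraps them in one function; the function name `screening_rates` and the counts 90, 30, 10, 170 are hypothetical values chosen only to illustrate the formulas, not data from the course.

```python
from fractions import Fraction


def screening_rates(a, b, c, d):
    """Rates from a 2x2 table.

    a: diseased and test positive      b: not diseased and test positive
    c: diseased and test negative      d: not diseased and test negative
    """
    return {
        "false_positive": Fraction(b, b + d),  # P(T | not D)
        "false_negative": Fraction(c, a + c),  # P(not T | D)
        "sensitivity": Fraction(a, a + c),     # P(T | D)
        "specificity": Fraction(d, b + d),     # P(not T | not D)
    }


# Hypothetical counts, for illustration only
rates = screening_rates(a=90, b=30, c=10, d=170)

for name, value in rates.items():
    print(name, float(value))
```

Sensitivity and the false negative probability are both conditioned on D, so they sum to 1; the same holds for specificity and the false positive probability, which are conditioned on the absence of the disease.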

Definitions of the Predictive Value Positive and Predictive Value Negative of a Screening Test:

1. The predictive value positive of a screening test:
The predictive value positive is the probability that a subject has the disease, given that the subject has a positive screening test result:
$P(D \mid T)$ = P(the subject has the disease | positive result) = P(presence of the disease | positive result)

2. The predictive value negative of a screening test:
The predictive value negative is the probability that a subject does not have the disease, given that the subject has a negative screening test result:
$P(\bar{D} \mid \bar{T})$ = P(the subject does not have the disease | negative result) = P(absence of the disease | negative result)

Calculating the Predictive Value Positive and Predictive Value Negative:
(How to calculate $P(D \mid T)$ and $P(\bar{D} \mid \bar{T})$):
We calculate these conditional probabilities using the knowledge of:
1. The sensitivity of the test = $P(T \mid D)$
2. The specificity of the test = $P(\bar{T} \mid \bar{D})$
3. The probability of the relevant disease in the general population, $P(D)$. (It is usually obtained from another independent study.)

Calculating the Predictive Value Positive, $P(D \mid T)$:

$P(D \mid T) = \frac{P(T \cap D)}{P(T)}$

But we know that:
$P(T) = P(T \cap D) + P(T \cap \bar{D})$
$P(T \cap D) = P(T \mid D) \, P(D)$ (multiplication rule)
$P(T \cap \bar{D}) = P(T \mid \bar{D}) \, P(\bar{D})$ (multiplication rule)
$P(T) = P(T \mid D) \, P(D) + P(T \mid \bar{D}) \, P(\bar{D})$

Therefore, we reach the following version of Bayes' Theorem:

$P(D \mid T) = \frac{P(T \mid D) \, P(D)}{P(T \mid D) \, P(D) + P(T \mid \bar{D}) \, P(\bar{D})}$     (1)

Note:
$P(T \mid D)$ = sensitivity.
$P(T \mid \bar{D}) = 1 - P(\bar{T} \mid \bar{D}) = 1 -$ specificity.
$P(D)$ = the probability of the relevant disease in the general population.
$P(\bar{D}) = 1 - P(D)$.

Calculating the Predictive Value Negative, $P(\bar{D} \mid \bar{T})$:
To obtain the predictive value negative of a screening test, we use the following statement of Bayes' theorem:

$P(\bar{D} \mid \bar{T}) = \frac{P(\bar{T} \mid \bar{D}) \, P(\bar{D})}{P(\bar{T} \mid \bar{D}) \, P(\bar{D}) + P(\bar{T} \mid D) \, P(D)}$     (2)

Note:
$P(\bar{T} \mid \bar{D})$ = specificity.
$P(\bar{T} \mid D) = 1 - P(T \mid D) = 1 -$ sensitivity.

Example:
A medical research team wished to evaluate a proposed screening test for Alzheimer's disease. The test was given to a random sample of 450 patients with Alzheimer's disease and an independent random sample of 500 patients without symptoms of the disease. The two samples were drawn from populations of subjects who were 65 years of age or older. The results are as follows:

                    Alzheimer Disease
  Test Result       Present (D)    Absent (D̄)    Total
  Positive (T)      436            5             441
  Negative (T̄)      14             495           509
  Total             450            500           950

Based on another independent study, it is known that the percentage of patients with Alzheimer's disease (the rate of prevalence of the disease) is 11.3% out of all subjects who were 65 years of age or older.

Solution:
Using these data we estimate the following quantities:

1. The sensitivity of the test:
$P(T \mid D) = \frac{n(T \cap D)}{n(D)} = \frac{436}{450} = 0.9689$

2. The specificity of the test:
$P(\bar{T} \mid \bar{D}) = \frac{n(\bar{T} \cap \bar{D})}{n(\bar{D})} = \frac{495}{500} = 0.99$

3. The probability of the disease in the general population, $P(D)$:
The rate of disease in the relevant general population, $P(D)$, cannot be computed from the sample data given in the table. However, it is given that the percentage of patients with Alzheimer's disease is 11.3% out of all subjects who were 65 years of age or older. Therefore $P(D)$ can be computed to be:
$P(D) = \frac{11.3\%}{100\%} = 0.113$

4. The predictive value positive of the test:
We wish to estimate the probability that a subject who is positive on the test has Alzheimer disease. We use the Bayes' formula of Equation (1):

$P(D \mid T) = \frac{P(T \mid D) \, P(D)}{P(T \mid D) \, P(D) + P(T \mid \bar{D}) \, P(\bar{D})}$

From the tabulated data we compute:

$P(T \mid D) = \frac{436}{450} = 0.9689$ (from part no. 1)

$P(T \mid \bar{D}) = \frac{n(T \cap \bar{D})}{n(\bar{D})} = \frac{5}{500} = 0.01$

Substituting these results into Equation (1), we get:

$P(D \mid T) = \frac{(0.9689) \, P(D)}{(0.9689) \, P(D) + (0.01) \, P(\bar{D})} = \frac{(0.9689)(0.113)}{(0.9689)(0.113) + (0.01)(1 - 0.113)} = 0.93$

As we see, in this case, the predictive value positive of the test is very high.

5. The predictive value negative of the test:
We wish to estimate the probability that a subject who is negative on the test does not have Alzheimer disease. We use the Bayes' formula of Equation (2):

$P(\bar{D} \mid \bar{T}) = \frac{P(\bar{T} \mid \bar{D}) \, P(\bar{D})}{P(\bar{T} \mid \bar{D}) \, P(\bar{D}) + P(\bar{T} \mid D) \, P(D)}$

To compute $P(\bar{D} \mid \bar{T})$, we first compute the following probabilities:

$P(\bar{T} \mid \bar{D}) = \frac{495}{500} = 0.99$ (from part no. 2)

$P(\bar{D}) = 1 - P(D) = 1 - 0.113 = 0.887$

$P(\bar{T} \mid D) = \frac{n(\bar{T} \cap D)}{n(D)} = \frac{14}{450} = 0.0311$

Substitution in Equation (2) gives:

$P(\bar{D} \mid \bar{T}) = \frac{(0.99)(0.887)}{(0.99)(0.887) + (0.0311)(0.113)} = 0.996$

As we see, the predictive value negative is also very high.
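Equations (1) and (2) need only three inputs: sensitivity, specificity, and prevalence. The sketch below applies them to the Alzheimer's example (436/450, 495/500, and a prevalence of 11.3%); the function name `predictive_values` is an illustrative choice.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem, Equations (1) and (2)."""
    p_d = prevalence
    p_not_d = 1 - prevalence
    false_positive = 1 - specificity   # P(T | not D)
    false_negative = 1 - sensitivity   # P(not T | D)

    # Equation (1): P(D | T)
    ppv = (sensitivity * p_d) / (sensitivity * p_d + false_positive * p_not_d)
    # Equation (2): P(not D | not T)
    npv = (specificity * p_not_d) / (specificity * p_not_d + false_negative * p_d)
    return ppv, npv


sensitivity = 436 / 450
specificity = 495 / 500
ppv, npv = predictive_values(sensitivity, specificity, prevalence=0.113)

print(round(ppv, 2), round(npv, 3))
```

Because the prevalence enters both formulas, the same test applied to a population with a much lower prevalence would give a noticeably lower predictive value positive.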

CHAPTER 4: Probabilistic Features of Certain Data Distributions (Probability Distributions)

4.1 Introduction:
The concept of random variables is very important in Statistics. Some events can be defined using random variables.
There are two types of random variables:
1. Discrete Random Variables
2. Continuous Random Variables

4.2 Probability Distributions of Discrete Random Variables:

Definition:
The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of the random variable along with their respective probabilities.

Examples of discrete r.v.'s:
The no. of patients visiting KKUH in a week.
The no. of times a person had a cold in the last year.

Example:
Consider the following discrete random variable.
X = The number of times a Saudi person had a cold in January 00.
Suppose we are able to count the no. of Saudis for which X = x:

  x                                      Frequency of x
  (no. of colds a Saudi person had       (no. of Saudi people who had a cold
  in January 00)                         x times in January 00)
  0                                      10,000,000
  1                                       3,000,000
  2                                       2,000,000
  3                                       1,000,000
  Total                                  N = 16,000,000

Note that the possible values of the random variable X are: x = 0, 1, 2, 3

Experiment: Selecting a person at random.

Define the events:
(X = 0) = The event that the selected person had no cold.
(X = 1) = The event that the selected person had 1 cold.
(X = 2) = The event that the selected person had 2 colds.
(X = 3) = The event that the selected person had 3 colds.
In general:
(X = x) = The event that the selected person had x colds.

For this experiment, there are $n(\Omega) = 16{,}000{,}000$ equally likely outcomes.

The number of elements of the event (X = x) is:
n(X = x) = no. of Saudi people who had a cold x times in January 00 = frequency of x.

The probability of the event (X = x) is:

$P(X = x) = \frac{n(X = x)}{n(\Omega)} = \frac{n(X = x)}{16000000}$, for x = 0, 1, 2, 3

  x        freq. of x n(X = x)     P(X = x) = n(X = x) / 16000000 (Relative frequency)
  0        10000000                0.6250
  1         3000000                0.1875
  2         2000000                0.1250
  3         1000000                0.0625
  Total    16000000                1.0000

Note:
$P(X = x) = \frac{n(X = x)}{16000000} = \frac{\text{frequency}}{16000000}$ = Relative Frequency

The probability distribution of the discrete random variable X is given by the following table:

  x        P(X = x) = f(x)
  0        0.6250
  1        0.1875
  2        0.1250
  3        0.0625
  Total    1.0000

Notes:
The probability distribution of any discrete random variable X must satisfy the following two properties:
(1) $0 \le P(X = x) \le 1$
(2) $\sum_{x} P(X = x) = 1$

Using the probability distribution of a discrete r.v. we can find the probability of any event expressed in terms of the r.v. X.

Example:
Consider the discrete r.v. X in the previous example.

  x        P(X = x)
  0        0.6250
  1        0.1875
  2        0.1250
  3        0.0625
  Total    1.0000

(1) $P(X \ge 2) = P(X = 2) + P(X = 3) = 0.1250 + 0.0625 = 0.1875$
(2) $P(X > 2) = P(X = 3) = 0.0625$ [note: $P(X > 2) \neq P(X \ge 2)$]
(3) $P(1 \le X < 3) = P(X = 1) + P(X = 2) = 0.1875 + 0.1250 = 0.3125$
(4) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.6250 + 0.1875 + 0.1250 = 0.9375$
another solution: $P(X \le 2) = 1 - P(\overline{(X \le 2)}) = 1 - P(X > 2) = 1 - P(X = 3) = 1 - 0.0625 = 0.9375$
(5) $P(X < 2) = P(X = 0) + P(X = 1) = 0.6250 + 0.1875 = 0.8125$

(6) $P(-1.5 \le X < 1.3) = P(X = 0) + P(X = 1) = 0.6250 + 0.1875 = 0.8125$
(7) $P(X = 3.5) = P(\phi) = 0$
(8) $P(X \le 10) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = P(\Omega) = 1$
(9) The probability that the selected person had at least 2 colds: $P(X \ge 2) = P(X = 2) + P(X = 3) = 0.1875$
(10) The probability that the selected person had at most 2 colds: $P(X \le 2) = 0.9375$
(11) The probability that the selected person had more than 2 colds: $P(X > 2) = P(X = 3) = 0.0625$
(12) The probability that the selected person had less than 2 colds: $P(X < 2) = P(X = 0) + P(X = 1) = 0.8125$

Graphical Presentation:
The probability distribution of a discrete r.v. X can be graphically represented.

Example:
The probability distribution of the random variable in the previous example is:

  x           0         1         2         3
  P(X = x)    0.6250    0.1875    0.1250    0.0625

The graphical presentation of this probability distribution is given by the following figure:

[Figure: graph of the probability distribution of X]

Mean and Variance of a Discrete Random Variable

Mean: The mean (or expected value) of a discrete random variable X is denoted by $\mu$ or $\mu_X$. It is defined by:

$\mu = \sum_{x} x \, P(X = x)$

Variance: The variance of a discrete random variable X is denoted by $\sigma^2$ or $\sigma_X^2$. It is defined by:

$\sigma^2 = \sum_{x} (x - \mu)^2 \, P(X = x)$

Example:
We wish to calculate the mean $\mu$ and the variance $\sigma^2$ of the discrete r.v. X whose probability distribution is given by the following table:

  x    P(X = x)
  0    0.05
  1    0.25
  2    0.45
  3    0.25

Solution:

  x        P(X = x)    x P(X = x)    (x - µ)    (x - µ)^2    (x - µ)^2 P(X = x)
  0        0.05        0             -1.9       3.61         0.1805
  1        0.25        0.25          -0.9       0.81         0.2025
  2        0.45        0.9            0.1       0.01         0.0045
  3        0.25        0.75           1.1       1.21         0.3025
  Total                µ = 1.9                               σ^2 = 0.69

$\mu = \sum_{x} x \, P(X = x) = (0)(0.05) + (1)(0.25) + (2)(0.45) + (3)(0.25) = 1.9$

$\sigma^2 = \sum_{x} (x - 1.9)^2 \, P(X = x) = (0 - 1.9)^2 (0.05) + (1 - 1.9)^2 (0.25) + (2 - 1.9)^2 (0.45) + (3 - 1.9)^2 (0.25) = 0.69$
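The two defining sums translate into one-line computations once the distribution is stored as a mapping from values to probabilities. The sketch below uses the table from the example (values 0 to 3 with probabilities 0.05, 0.25, 0.45, 0.25); the names `pmf`, `mean`, and `variance` are illustrative.

```python
# Probability distribution from the example: value -> P(X = value)
pmf = {0: 0.05, 1: 0.25, 2: 0.45, 3: 0.25}

# The probabilities must sum to 1 (allowing for floating-point rounding).
total_probability = sum(pmf.values())

# Mean: sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mean)^2 * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(round(total_probability, 10), round(mean, 4), round(variance, 4))
```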

Cumulative Distributions:
The cumulative distribution function of a discrete r.v. X is defined by:

$P(X \le x) = \sum_{a \le x} P(X = a)$ (sum over all values $\le x$)

Example:
Calculate the cumulative distribution of the discrete r.v. X whose probability distribution is given by the following table:

  x    P(X = x)
  0    0.05
  1    0.25
  2    0.45
  3    0.25

Use the cumulative distribution to find:
$P(X \le 2)$, $P(X < 2)$, $P(X \le 1.5)$, $P(X < 1.5)$, $P(X > 1)$, $P(X \ge 1)$

Solution:
The cumulative distribution of X is:

  x    P(X ≤ x)
  0    0.05        P(X ≤ 0) = P(X = 0)
  1    0.30        P(X ≤ 1) = P(X = 0) + P(X = 1)
  2    0.75        P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
  3    1.0000      P(X ≤ 3) = P(X = 0) + ... + P(X = 3)

Using the cumulative distribution,
$P(X \le 2) = 0.75$
$P(X < 2) = P(X \le 1) = 0.30$
$P(X \le 1.5) = P(X \le 1) = 0.30$
$P(X < 1.5) = P(X \le 1) = 0.30$
$P(X > 1) = 1 - P(\overline{(X > 1)}) = 1 - P(X \le 1) = 1 - 0.30 = 0.70$
$P(X \ge 1) = 1 - P(\overline{(X \ge 1)}) = 1 - P(X < 1) = 1 - P(X \le 0) = 1 - 0.05 = 0.95$

Example: (Reading Assignment)
Given the following probability distribution of a discrete random variable X representing the number of defective teeth of the patient visiting a

certain dental clinic:

x    P(X = x)
1    0.25
2    0.35
3    0.20
4    0.15
5    K

a) Find the value of K.
b) Find the following probabilities: 1. P(X < 3)  2. P(X ≤ 3)  3. P(X < 6)  4. P(X < 1)  5. P(X = 3.5)
c) Find the probability that the patient has at least 4 defective teeth.
d) Find the probability that the patient has at most 2 defective teeth.
e) Find the expected number of defective teeth (mean of X).
f) Find the variance of X.

Solution:
a) 1 = Σ P(X = x) = 0.25 + 0.35 + 0.20 + 0.15 + K = 0.95 + K, so K = 1 - 0.95 = 0.05.

The probability distribution of X is:

x      P(X = x)
1      0.25
2      0.35
3      0.20
4      0.15
5      0.05
Total  1.00

b) Finding the probabilities:
1. P(X < 3) = P(X=1) + P(X=2) = 0.25 + 0.35 = 0.60
2. P(X ≤ 3) = P(X=1) + P(X=2) + P(X=3) = 0.80
3. P(X < 6) = P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) = P(Ω) = 1
4. P(X < 1) = P(∅) = 0
5. P(X = 3.5) = P(∅) = 0

c) The probability that the patient has at least 4 defective teeth:
P(X ≥ 4) = P(X=4) + P(X=5) = 0.15 + 0.05 = 0.20

d) The probability that the patient has at most 2 defective teeth:
P(X ≤ 2) = P(X=1) + P(X=2) = 0.25 + 0.35 = 0.60

e) The expected number of defective teeth (mean of X):

x      P(X = x)   x P(X = x)
1      0.25       0.25
2      0.35       0.70
3      0.20       0.60
4      0.15       0.60
5      0.05       0.25
Total  1.00       μ = 2.4

The expected number of defective teeth (mean of X) is
μ = Σ x P(X = x) = (1)(0.25) + (2)(0.35) + (3)(0.20) + (4)(0.15) + (5)(0.05) = 2.4

f) The variance of X:

x      P(X = x)   (x - μ)   (x - μ)²   (x - μ)² P(X = x)
1      0.25       -1.4      1.96       0.490
2      0.35       -0.4      0.16       0.056
3      0.20        0.6      0.36       0.072
4      0.15        1.6      2.56       0.384
5      0.05        2.6      6.76       0.338
Total                                  σ² = 1.34

The variance is σ² = Σ (x - μ)² P(X = x) = 1.34
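The whole dental-clinic exercise can be reproduced programmatically. The sketch below assumes the table is stored in a dictionary; the helper `cdf` and all variable names are hypothetical and chosen only for illustration.

```python
# Known probabilities for x = 1..4; K = P(X = 5) is the unknown.
known = {1: 0.25, 2: 0.35, 3: 0.20, 4: 0.15}

# a) The probabilities must sum to 1, which determines K.
K = 1.0 - sum(known.values())
pmf = {**known, 5: K}

def cdf(t):
    """P(X <= t): add P(X = a) over all possible values a <= t."""
    return sum(p for a, p in pmf.items() if a <= t)

# b)-d) X takes integer values only, so P(X < 3) = P(X <= 2), and so on.
p_less_than_3 = cdf(2)
p_at_least_4 = 1.0 - cdf(3)
p_at_most_2 = cdf(2)

# e)-f) Mean and variance from their defining sums.
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(round(K, 2), round(p_less_than_3, 2), round(p_at_least_4, 2))
print(round(mean, 2), round(variance, 2))
```

Writing "at least" and "at most" events through the cumulative distribution mirrors the hand calculation and avoids listing the individual terms.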

Combinations:

Notation (n!): n! is read "n factorial". It is defined by:

$$n! = n(n-1)(n-2)\cdots(2)(1) \quad \text{for } n \ge 1, \qquad 0! = 1$$

Example: 5! = (5)(4)(3)(2)(1) = 120

Combinations: The number of different ways of selecting r objects from n distinct objects is denoted by nCr or $\binom{n}{r}$ and is given by:

$${}_nC_r = \frac{n!}{r!\,(n-r)!}, \quad r = 0, 1, 2, \ldots, n$$

Notes:
1. nCr is read as "n choose r".
2. nCn = 1 and nC0 = 1.
3. nCr = nC(n-r) (for example: 10C3 = 10C7).
4. nCr = the number of unordered subsets of a set of n objects such that each subset contains r objects.

Example: For n = 4 and r = 2:

$${}_4C_2 = \frac{4!}{2!\,(4-2)!} = \frac{4!}{2!\,2!} = \frac{4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(2 \cdot 1)} = 6$$

4C2 = 6 = the number of different ways of selecting 2 objects from 4 distinct objects.

Example: Suppose that we have the set {a, b, c, d} of n = 4 objects. We wish to choose a subset of two objects. The possible subsets of this set with 2 elements in each subset are:
{a, b}, {a, c}, {a, d}, {b, d}, {b, c}, {c, d}
The number of these subsets is 4C2 = 6.
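Both the counting formula and the explicit list of subsets are available in Python's standard library. The sketch below uses `math.comb` and `itertools.combinations`; the variable names are illustrative.

```python
import math
from itertools import combinations

n, r = 4, 2

# Number of unordered selections: n! / (r! (n - r)!)
count = math.comb(n, r)

# The subsets themselves, generated from the four objects a, b, c, d.
subsets = list(combinations("abcd", r))

print(count)    # 6
print(subsets)

# Note 3 above: choosing r objects is the same as leaving out n - r objects.
assert math.comb(10, 3) == math.comb(10, 7)
```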

4.3 Binomial Distribution:

Bernoulli Trial: an experiment with only two possible outcomes: S = success and F = failure (boy or girl, Saudi or non-Saudi, sick or well, dead or alive).

The binomial distribution is a discrete distribution. It is used to model an experiment for which:
1. The experiment has a sequence of n Bernoulli trials.
2. The probability of success is P(S) = p, and the probability of failure is P(F) = 1 - p = q.
3. The probability of success P(S) = p is constant for each trial.
4. The trials are independent; that is, the outcome of one trial has no effect on the outcome of any other trial.

In this type of experiment, we are interested in the discrete r.v. representing the number of successes in the n trials:
X = the number of successes in the n trials.
The possible values of X (number of successes in n trials) are: x = 0, 1, 2, 3, ..., n.

The r.v. X has a binomial distribution with parameters n and p, and we write: X ~ Binomial(n, p).

The probability distribution of X is given by:

$$P(X = x) = \begin{cases} {}_nC_x \, p^x q^{\,n-x} & \text{for } x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } {}_nC_x = \frac{n!}{x!\,(n-x)!}$$

We can write the probability distribution of X as a table as follows:

x      P(X = x)
0      ${}_nC_0 \, p^0 q^{\,n-0} = q^n$
1      ${}_nC_1 \, p^1 q^{\,n-1}$

x      P(X = x)
2      ${}_nC_2 \, p^2 q^{\,n-2}$
⋮      ⋮
n-1    ${}_nC_{n-1} \, p^{\,n-1} q^1$
n      ${}_nC_n \, p^n q^0 = p^n$
Total  1.00

Result: (Mean and Variance of the binomial distribution)
If X ~ Binomial(n, p), then:
The mean: μ = np (expected value)
The variance: σ² = npq

Example: Suppose that the probability that a Saudi man has high blood pressure is 0.15. Suppose that we randomly select a sample of 6 Saudi men.
(1) Find the probability distribution of the random variable X representing the number of men with high blood pressure in the sample.
(2) Find the expected number of men with high blood pressure in the sample (mean of X).
(3) Find the variance of X.
(4) What is the probability that there will be exactly 2 men with high blood pressure?
(5) What is the probability that there will be at most 2 men with high blood pressure?
(6) What is the probability that there will be at least 4 men with high blood pressure?

Solution: We are interested in the following random variable:
X = the number of men with high blood pressure in the sample of 6 men.

Notes:
- Bernoulli trial: diagnosing whether a man has high blood pressure or not. There are two outcomes for each trial:

S = Success: the man has high blood pressure.
F = Failure: the man does not have high blood pressure.
- Number of trials: n = 6 (we need to check 6 men).
- Probability of success: P(S) = p = 0.15
- Probability of failure: P(F) = q = 1 - p = 0.85
- The trials are independent: the result for each man does not affect the result for any other man, since the selection was made at random.

The random variable X has a binomial distribution with parameters n = 6 and p = 0.15, that is:
X ~ Binomial(n, p), X ~ Binomial(6, 0.15)
The possible values of X are: x = 0, 1, 2, 3, 4, 5, 6.

(1) The probability distribution of X is:

$$P(X = x) = \begin{cases} {}_6C_x \, (0.15)^x (0.85)^{6-x} & x = 0, 1, 2, 3, 4, 5, 6 \\ 0 & \text{otherwise} \end{cases}$$

The probabilities of all values of X are:
P(X = 0) = 6C0 (0.15)^0 (0.85)^6 = (1)(0.15)^0 (0.85)^6 = 0.37715
P(X = 1) = 6C1 (0.15)^1 (0.85)^5 = (6)(0.15)(0.85)^5 = 0.39933
P(X = 2) = 6C2 (0.15)^2 (0.85)^4 = (15)(0.15)^2 (0.85)^4 = 0.17618
P(X = 3) = 6C3 (0.15)^3 (0.85)^3 = (20)(0.15)^3 (0.85)^3 = 0.04145
P(X = 4) = 6C4 (0.15)^4 (0.85)^2 = (15)(0.15)^4 (0.85)^2 = 0.00549
P(X = 5) = 6C5 (0.15)^5 (0.85)^1 = (6)(0.15)^5 (0.85)^1 = 0.00039
P(X = 6) = 6C6 (0.15)^6 (0.85)^0 = (1)(0.15)^6 (1) = 0.00001

The probability distribution of X can be presented by the following table:
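The seven probabilities above can be reproduced directly from the binomial formula. The function name `binomial_pmf` is a hypothetical helper written for this sketch; it relies only on `math.comb` from the standard library.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 6, 0.15
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(probs):
    print(x, f"{prob:.5f}")

# Sanity checks: the probabilities sum to 1, and the mean computed from
# the definition agrees with the result mu = n * p.
total = sum(probs)
mean_from_definition = sum(x * prob for x, prob in enumerate(probs))
```

Summing selected entries of `probs` gives cumulative probabilities such as "at most 2" or "at least 4".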

x    P(X = x)
0    0.37715
1    0.39933
2    0.17618
3    0.04145
4    0.00549
5    0.00039
6    0.00001

The probability distribution of X can also be presented by a graph:
[Figure: graph of the probability distribution of X]

(2) The mean of the distribution (the expected number of men out of 6 with high blood pressure) is:
μ = np = (6)(0.15) = 0.9

(3) The variance is:
σ² = npq = (6)(0.15)(0.85) = 0.765

(4) The probability that there will be exactly 2 men with high blood pressure is:
P(X = 2) = 0.17618

(5) The probability that there will be at most 2 men with high blood pressure is:
P(X ≤ 2) = P(X=0) + P(X=1) + P(X=2) = 0.37715 + 0.39933 + 0.17618 = 0.95266

(6) The probability that there will be at least 4 men with high blood pressure is:

P(X ≥ 4) = P(X=4) + P(X=5) + P(X=6) = 0.00549 + 0.00039 + 0.00001 = 0.00589

Example: (Reading Assignment)
Suppose that 25% of the people in a certain population have low hemoglobin levels. The experiment is to choose 5 people at random from this population. Let the discrete random variable X be the number of people out of 5 with low hemoglobin levels.
a) Find the probability distribution of X.
b) Find the probability that at least 2 people have low hemoglobin levels.
c) Find the probability that at most 3 people have low hemoglobin levels.
d) Find the expected number of people with low hemoglobin levels out of the 5 people.
e) Find the variance of the number of people with low hemoglobin levels out of the 5 people.

Solution:
X = the number of people out of 5 with low hemoglobin levels.
The Bernoulli trial is the process of diagnosing the person:
Success = the person has low hemoglobin
Failure = the person does not have low hemoglobin
n = 5 (number of trials)
p = 0.25 (probability of success)
q = 1 - p = 0.75 (probability of failure)

a) X has a binomial distribution with parameters n = 5 and p = 0.25:
X ~ Binomial(n, p), X ~ Binomial(5, 0.25)
The possible values of X are: x = 0, 1, 2, 3, 4, 5.
The probability distribution is:

$$P(X = x) = \begin{cases} {}_nC_x \, p^x q^{\,n-x} & \text{for } x = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases} \;=\; \begin{cases} {}_5C_x \, (0.25)^x (0.75)^{5-x} & \text{for } x = 0, 1, 2, 3, 4, 5 \\ 0 & \text{otherwise} \end{cases}$$

x    P(X = x)
0    5C0 (0.25)^0 (0.75)^(5-0) = 0.23730

x      P(X = x)
1      5C1 (0.25)^1 (0.75)^(5-1) = 0.39551
2      5C2 (0.25)^2 (0.75)^(5-2) = 0.26367
3      5C3 (0.25)^3 (0.75)^(5-3) = 0.08789
4      5C4 (0.25)^4 (0.75)^(5-4) = 0.01465
5      5C5 (0.25)^5 (0.75)^(5-5) = 0.00098
Total  Σ P(X = x) = 1

b) The probability that at least 2 people have low hemoglobin levels:
P(X ≥ 2) = P(X=2) + P(X=3) + P(X=4) + P(X=5) = 0.26367 + 0.08789 + 0.01465 + 0.00098 = 0.36719

c) The probability that at most 3 people have low hemoglobin levels:
P(X ≤ 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = 0.23730 + 0.39551 + 0.26367 + 0.08789 = 0.98437

d) The expected number of people with low hemoglobin levels out of the 5 people (the mean of X):
μ = np = 5 × 0.25 = 1.25

e) The variance of the number of people with low hemoglobin levels out of the 5 people (the variance of X):
σ² = npq = 5 × 0.25 × 0.75 = 0.9375

4.4 The Poisson Distribution:

It is a discrete distribution. The Poisson distribution is used to model a discrete r.v. representing the number of occurrences of some random event in an interval of time or space (or some volume of matter).
The possible values of X are: x = 0, 1, 2, 3, ...
The discrete r.v. X is said to have a Poisson distribution with parameter (average or mean) λ if the probability distribution of X is given by

$$P(X = x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!} & \text{for } x = 0, 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$

where e = 2.71828 (the natural number). We write: X ~ Poisson(λ).

Result: (Mean and Variance of the Poisson distribution)
If X ~ Poisson(λ), then:
The mean (average) of X is: μ = λ (expected value)
The variance of X is: σ² = λ

Example: Some random quantities that can be modeled by the Poisson distribution:
- The number of patients in a waiting room in an hour.
- The number of surgeries performed in a month.
- The number of rats in each house in a particular city.

Note: λ is the average (mean) of the distribution. If X = the number of patients seen in the emergency unit in a day, and if X ~ Poisson(λ), then:
1. The average (mean) number of patients seen every day in the emergency unit = λ.
2. The average (mean) number of patients seen every month in the emergency unit = 30λ.
3. The average (mean) number of patients seen every year in the emergency unit = 365λ.
4. The average (mean) number of patients seen every hour in the emergency unit = λ/24.

Also, notice that:
(i) If Y = the number of patients seen every month, then:

Y ~ Poisson(λ*), where λ* = 30λ
(ii) If W = the number of patients seen every year, then:
W ~ Poisson(λ*), where λ* = 365λ
(iii) If V = the number of patients seen every hour, then:
V ~ Poisson(λ*), where λ* = λ/24

Example: Suppose that the number of snake bite cases seen at KKUH in a year has a Poisson distribution with an average of 6 bite cases.
(1) What is the probability that in a year:
(i) the number of snake bite cases will be 7?
(ii) the number of snake bite cases will be less than 2?
(2) What is the probability that there will be 10 snake bite cases in 2 years?
(3) What is the probability that there will be no snake bite cases in a month?

Solution:
(1) X = number of snake bite cases in a year. X ~ Poisson(6) (λ = 6)

$$P(X = x) = \frac{e^{-6}\, 6^x}{x!}, \quad x = 0, 1, 2, \ldots$$

(i) P(X = 7) = e^(-6) 6^7 / 7! = 0.13768
(ii) P(X < 2) = P(X = 0) + P(X = 1) = e^(-6) 6^0 / 0! + e^(-6) 6^1 / 1! = 0.00248 + 0.01487 = 0.01735

(2) Y = number of snake bite cases in 2 years. Y ~ Poisson(12) (λ* = 2λ = (2)(6) = 12)

$$P(Y = y) = \frac{e^{-12}\, 12^y}{y!}, \quad y = 0, 1, 2, \ldots$$

P(Y = 10) = e^(-12) 12^10 / 10! = 0.1048

(3) W = number of snake bite cases in a month. W ~ Poisson(0.5) (λ* = λ/12 = 6/12 = 0.5)
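The Poisson probabilities in this example can be checked numerically, including the rescaling of λ to a two-year and a one-month interval. The helper `poisson_pmf` and the variable names are hypothetical and written only for this sketch.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam); x must be a non-negative integer."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam**x / math.factorial(x)

lam_year = 6.0  # average number of cases per year

# (1) Probabilities for a one-year interval.
p_seven = poisson_pmf(7, lam_year)
p_less_than_two = poisson_pmf(0, lam_year) + poisson_pmf(1, lam_year)

# (2) Two years: the mean scales with the length of the interval.
p_ten_in_two_years = poisson_pmf(10, 2 * lam_year)

# (3) One month: one twelfth of the yearly mean.
p_none_in_month = poisson_pmf(0, lam_year / 12)

print(f"{p_seven:.5f} {p_less_than_two:.5f}")
print(f"{p_ten_in_two_years:.4f} {p_none_in_month:.4f}")
```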

$$P(W = w) = \frac{e^{-0.5}\, (0.5)^w}{w!}, \quad w = 0, 1, 2, \ldots$$

P(W = 0) = e^(-0.5) (0.5)^0 / 0! = 0.6065

4.5 Continuous Probability Distributions:

For any continuous r.v. X, there exists a function f(x), called the probability density function (pdf) of X, for which:

(1) The total area under the curve of f(x) equals 1:

$$\text{Total area} = \int_{-\infty}^{\infty} f(x)\,dx = 1$$

(2) The probability that X is between the points a and b equals the area under the curve of f(x) bounded by the points a and b:

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = \text{area}$$

(3) In general, the probability of an interval event is given by the area under the curve of f(x) and above that interval:

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx, \qquad P(X \le a) = \int_{-\infty}^{a} f(x)\,dx, \qquad P(X \ge b) = \int_{b}^{\infty} f(x)\,dx$$

Note: If X is a continuous r.v., then:
1. P(X = a) = 0 for any a.
2. P(X ≤ a) = P(X < a)

3. P(X ≥ b) = P(X > b)
4. P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b)
5. P(X ≤ x) = cumulative probability
6. P(X ≥ a) = 1 - P(X < a) = 1 - P(X ≤ a)
7. P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a)

[Figures: areas under f(x). P(X ≥ a) = 1 - P(X ≤ a), that is A = 1 - B with total area = 1; and P(a ≤ X ≤ b) = P(X ≤ b) - P(X ≤ a), that is]

$$\int_{a}^{b} f(x)\,dx = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx$$

4.6 The Normal Distribution:

One of the most important continuous distributions. Many measurable characteristics are normally or approximately normally distributed (examples: height, weight, ...).

The probability density function of the normal distribution is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$$

where e = 2.71828 and π = 3.14159. The parameters of the distribution are the mean (μ) and the standard deviation (σ).

The continuous r.v. X which has a normal distribution has several important characteristics:
1. -∞ < X < ∞
2. The density function of X, f(x), has a bell-shaped curve:

[Figure: bell-shaped curve with mean = μ, standard deviation = σ, variance = σ²]

3. The highest point of the curve of f(x) is at the mean μ (mode = μ).
4. The curve of f(x) is symmetric about the mean μ: μ = mean = mode = median.
5. The normal distribution depends on two parameters:
   mean = μ (determines the location)
   standard deviation = σ (determines the shape)
6. If the r.v. X is normally distributed with mean μ and standard deviation σ (variance σ²), we write:
   X ~ Normal(μ, σ²) or X ~ N(μ, σ²)
7. The location of the normal distribution depends on μ. The shape of the normal distribution depends on σ.

Suppose we have two normal distributions: N(μ1, σ1), drawn as a solid curve, and N(μ2, σ2), drawn as a dashed curve.

[Figure: μ1 < μ2, σ1 = σ2]

[Figure: μ1 = μ2, σ1 < σ2]

[Figure: μ1 < μ2, σ1 < σ2]

The Standard Normal Distribution:

The normal distribution with mean μ = 0 and variance σ² = 1 is called the standard normal distribution and is denoted by Normal(0,1) or N(0,1). The standard normal random variable is denoted by Z, and we write:
Z ~ N(0, 1)

The probability density function (pdf) of Z ~ N(0,1) is given by:

$$f(z) = n(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2} z^2}$$

The standard normal distribution, Normal(0,1), is very important because probabilities of any normal distribution can be calculated from the probabilities of the standard normal distribution.

Result: If X ~ Normal(μ, σ²), then

$$Z = \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1)$$

Calculating Probabilities of Normal(0,1):

Suppose Z ~ Normal(0,1). For the standard normal distribution Z ~ N(0,1), there is a special table used to calculate probabilities of the form P(Z ≤ a).

(i) P(Z ≤ a) = from the table.

(ii) P(Z ≥ b) = 1 - P(Z ≤ b), where P(Z ≤ b) = from the table.

(iii) P(a ≤ Z ≤ b) = P(Z ≤ b) - P(Z ≤ a), where P(Z ≤ b) = from the table and P(Z ≤ a) = from the table.

(iv) P(Z = a) = 0 for every a.

Example: Suppose that Z ~ N(0,1).

(1) P(Z ≤ 1.50) = 0.9332
    (table: row z = 1.50, column 0.00 gives 0.9332)

(2) P(Z ≥ 0.98) = 1 - P(Z ≤ 0.98) = 1 - 0.8365 = 0.1635
    (table: row z = 0.90, column 0.08 gives 0.8365)

(3) P(-1.33 ≤ Z ≤ 2.42) = P(Z ≤ 2.42) - P(Z ≤ -1.33) = 0.9922 - 0.0918 = 0.9004
    (table: row z = -1.30, column -0.03 gives 0.0918)

(4) P(Z ≤ 0) = P(Z ≥ 0) = 0.5

Notation: P(Z ≤ Z_A) = A
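The table look-ups in this example can be cross-checked with `statistics.NormalDist` from Python's standard library, whose `cdf` method returns P(Z ≤ z). This is only a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

# (1) Left-tail area.
p1 = Z.cdf(1.50)

# (2) Right-tail area as a complement.
p2 = 1 - Z.cdf(0.98)

# (3) Area between two points as a difference of left-tail areas.
p3 = Z.cdf(2.42) - Z.cdf(-1.33)

# (4) Half of the area lies on each side of 0.
p4 = Z.cdf(0.0)

print(f"{p1:.4f} {p2:.4f} {p3:.4f} {p4:.4f}")
```

Small differences in the last digit relative to the hand calculation can occur because the table values are rounded before they are subtracted.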

For example:
[Figure illustrating Z_A]

Result: Since the pdf of Z ~ N(0,1) is symmetric about 0, we have:
Z_A = -Z_(1-A)
For example:
Z_0.35 = -Z_(1-0.35) = -Z_0.65
Z_0.86 = -Z_(1-0.86) = -Z_0.14

Example: Suppose that Z ~ N(0,1). If P(Z ≤ a) = 0.9505, then a = 1.65.
(table: row z = 1.60, column 0.05 gives 0.9505)
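Finding Z_A is the inverse of a table look-up. As a sketch, `NormalDist.inv_cdf` from the standard library returns the point with a given area to its left; the variable names below are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Value a with P(Z <= a) = 0.9505; the table gives a = 1.65.
a = Z.inv_cdf(0.9505)
print(round(a, 2))

# Symmetry about 0: Z_A = -Z_(1 - A).
for A in (0.35, 0.86):
    assert abs(Z.inv_cdf(A) + Z.inv_cdf(1 - A)) < 1e-9
```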

Example: Suppose that Z ~ N(0,1). Find the value of k such that P(Z ≤ k) = 0.0207.
Solution: k = -2.04
Notice that k = Z_0.0207 = -2.04.
(table: row z = -2.00, column -0.04 gives 0.0207)

Example: If Z ~ N(0,1), then:
Z_0.90 = 1.285
Z_0.95 = 1.645
Z_0.975 = 1.96
Z_0.99 = 2.325
Using the result Z_A = -Z_(1-A):
Z_0.10 = -Z_0.90 = -1.285
Z_0.05 = -Z_0.95 = -1.645
Z_0.025 = -Z_0.975 = -1.96
Z_0.01 = -Z_0.99 = -2.325

Calculating Probabilities of Normal(μ, σ²):

Recall the result:

$$X \sim \text{Normal}(\mu, \sigma^2) \iff Z = \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1)$$

$$X \le a \iff \frac{X - \mu}{\sigma} \le \frac{a - \mu}{\sigma} \iff Z \le \frac{a - \mu}{\sigma}$$

1. $P(X \le a) = P\left(Z \le \frac{a - \mu}{\sigma}\right)$ = from the table.
2. $P(X \ge a) = 1 - P(X \le a) = 1 - P\left(Z \le \frac{a - \mu}{\sigma}\right)$
3. $P(a \le X \le b) = P(X \le b) - P(X \le a) = P\left(Z \le \frac{b - \mu}{\sigma}\right) - P\left(Z \le \frac{a - \mu}{\sigma}\right)$
4. P(X = a) = 0, for every a.

4.7 Normal Distribution Application:

Example: Suppose that the hemoglobin levels of healthy adult males are approximately normally distributed with a mean of 16 and a variance of 0.81.
(a) Find the probability that a randomly chosen healthy adult male has a hemoglobin level less than 14.
(b) What is the percentage of healthy adult males who have a hemoglobin level less than 14?
(c) In a population of 10,000 healthy adult males, how many would you expect to have a hemoglobin level less than 14?

Solution:
X = hemoglobin level for healthy adult males
Mean: μ = 16
Variance: σ² = 0.81
Standard deviation: σ = 0.9
X ~ Normal(16, 0.81)

(a) The probability that a randomly chosen healthy adult male has a hemoglobin level less than 14 is P(X ≤ 14).

$$P(X \le 14) = P\left(Z \le \frac{14 - \mu}{\sigma}\right) = P\left(Z \le \frac{14 - 16}{0.9}\right) = P(Z \le -2.22) = 0.0132$$

(b) The percentage of healthy adult males who have a hemoglobin level less than 14 is:
P(X ≤ 14) × 100% = 0.0132 × 100% = 1.32%

(c) In a population of 10,000 healthy adult males, we would expect the number of males with a hemoglobin level less than 14 to be:
P(X ≤ 14) × 10000 = 0.0132 × 10000 = 132 males

Example: Suppose that the birth weight of Saudi babies has a normal distribution with mean μ = 3.4 and standard deviation σ = 0.35.
(a) Find the probability that a randomly chosen Saudi baby has a birth weight between 3.0 and 4.0 kg.
(b) What is the percentage of Saudi babies who have a birth weight between 3.0 and 4.0 kg?
(c) In a population of 100,000 Saudi babies, how many would you expect to have a birth weight between 3.0 and 4.0 kg?

Solution:
X = birth weight of Saudi babies
Mean: μ = 3.4
Standard deviation: σ = 0.35
Variance: σ² = (0.35)² = 0.1225
X ~ Normal(3.4, 0.1225)

(a) The probability that a randomly chosen Saudi baby has a birth weight between 3.0 and 4.0 kg is P(3.0 < X < 4.0):

$$P(3.0 < X < 4.0) = P(X \le 4.0) - P(X \le 3.0) = P\left(Z \le \frac{4.0 - \mu}{\sigma}\right) - P\left(Z \le \frac{3.0 - \mu}{\sigma}\right)$$

$$= P\left(Z \le \frac{4.0 - 3.4}{0.35}\right) - P\left(Z \le \frac{3.0 - 3.4}{0.35}\right) = P(Z \le 1.71) - P(Z \le -1.14) = 0.9564 - 0.1271 = 0.8293$$

(b) The percentage of Saudi babies who have a birth weight between 3.0 and 4.0 kg is:
P(3.0 < X < 4.0) × 100% = 0.8293 × 100% = 82.93%

(c) In a population of 100,000 Saudi babies, we would expect the number of babies with a birth weight between 3.0 and 4.0 kg to be:
P(3.0 < X < 4.0) × 100000 = 0.8293 × 100000 = 82930 babies
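Both application examples can be reproduced without a table by letting the software standardize internally. The sketch below assumes `statistics.NormalDist` from the Python standard library, which is parameterized by the standard deviation (not the variance); the variable names are illustrative.

```python
from statistics import NormalDist

# Hemoglobin example: mean 16, variance 0.81, so sigma = 0.9.
hemoglobin = NormalDist(mu=16, sigma=0.9)
p_low = hemoglobin.cdf(14)           # P(X <= 14)
expected_low = p_low * 10_000        # expected count among 10,000 males

# Birth-weight example: mean 3.4 kg, sigma 0.35 kg.
birth_weight = NormalDist(mu=3.4, sigma=0.35)
p_between = birth_weight.cdf(4.0) - birth_weight.cdf(3.0)
expected_between = p_between * 100_000

print(f"{p_low:.4f} {expected_low:.0f}")
print(f"{p_between:.4f} {expected_between:.0f}")
```

The results differ slightly from the hand calculation because the table method first rounds z to two decimal places (for example -2.22 instead of -2.2222...).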

Biostatistics - STAT 45 Departmet of Statistics Summer Semester 43/43 Stadard Normal Table Areas Uder the Stadard Normal Curve z -0.09-0.08-0.07-0.06-0.05-0.04-0.03-0.0-0.0-0.00 z -3.50 0.0007 0.0007 0.0008 0.0009 0.0009 0.0000 0.000 0.000 0.000 0.0003-3.50-3.40 0.0004 0.0005 0.0006 0.0007 0.0008 0.0009 0.00030 0.0003 0.0003 0.00034-3.40-3.30 0.00035 0.00036 0.00038 0.00039 0.00040 0.0004 0.00043 0.00045 0.00047 0.00048-3.30-3.0 0.00050 0.0005 0.00054 0.00056 0.00058 0.00060 0.0006 0.00064 0.00066 0.00069-3.0-3.0 0.0007 0.00074 0.00076 0.00079 0.0008 0.00084 0.00087 0.00090 0.00094 0.00097-3.0-3.00 0.0000 0.0004 0.0007 0.00 0.004 0.008 0.00 0.006 0.003 0.0035-3.00 -.90 0.0039 0.0044 0.0049 0.0054 0.0059 0.0064 0.0069 0.0075 0.008 0.0087 -.90 -.80 0.0093 0.0099 0.0005 0.00 0.009 0.006 0.0033 0.0040 0.0048 0.0056 -.80 -.70 0.0064 0.007 0.0080 0.0089 0.0098 0.00307 0.0037 0.0036 0.00336 0.00347 -.70 -.60 0.00357 0.00368 0.00379 0.0039 0.0040 0.0045 0.0047 0.00440 0.00453 0.00466 -.60 -.50 0.00480 0.00494 0.00508 0.0053 0.00539 0.00554 0.00570 0.00587 0.00604 0.006 -.50 -.40 0.00639 0.00657 0.00676 0.00695 0.0074 0.00734 0.00755 0.00776 0.00798 0.0080 -.40 -.30 0.0084 0.00866 0.00889 0.0094 0.00939 0.00964 0.00990 0.007 0.0044 0.007 -.30 -.0 0.00 0.030 0.060 0.09 0.0 0.055 0.087 0.03 0.0355 0.0390 -.0 -.0 0.046 0.0463 0.0500 0.0539 0.0578 0.068 0.0659 0.0700 0.0743 0.0786 -.0 -.00 0.083 0.0876 0.093 0.0970 0.008 0.0068 0.08 0.069 0.0 0.075 -.00 -.90 0.0330 0.0385 0.044 0.0500 0.0559 0.069 0.0680 0.0743 0.0807 0.087 -.90 -.80 0.0938 0.03005 0.03074 0.0344 0.036 0.0388 0.0336 0.03438 0.0355 0.03593 -.80 -.70 0.03673 0.03754 0.03836 0.0390 0.04006 0.04093 0.048 0.047 0.04363 0.04457 -.70 -.60 0.0455 0.04648 0.04746 0.04846 0.04947 0.05050 0.0555 0.056 0.05370 0.05480 -.60 -.50 0.0559 0.05705 0.058 0.05938 0.06057 0.0678 0.0630 0.0646 0.0655 0.0668 -.50 -.40 0.068 0.06944 0.07078 0.075 0.07353 0.07493 0.07636 0.07780 0.0797 0.08076 -.40 -.30 0.086 0.08379 0.08534 0.0869 
0.0885 0.090 0.0976 0.0934 0.0950 0.09680 -.30 -.0 0.09853 0.007 0.004 0.0383 0.0565 0.0749 0.0935 0.3 0.34 0.507 -.0 -.0 0.70 0.900 0.00 0.30 0.507 0.74 0.94 0.336 0.3350 0.3567 -.0 -.00 0.3786 0.4007 0.43 0.4457 0.4686 0.497 0.55 0.5386 0.565 0.5866 -.00-0.90 0.609 0.6354 0.660 0.6853 0.706 0.736 0.769 0.7879 0.84 0.8406-0.90-0.80 0.8673 0.8943 0.95 0.9489 0.9766 0.0045 0.037 0.06 0.0897 0.86-0.80-0.70 0.476 0.770 0.065 0.363 0.663 0.965 0.370 0.3576 0.3885 0.496-0.70-0.60 0.450 0.485 0.543 0.5463 0.5785 0.609 0.6435 0.6763 0.7093 0.745-0.60-0.50 0.7760 0.8096 0.8434 0.8774 0.96 0.9460 0.9806 0.3053 0.30503 0.30854-0.50-0.40 0.307 0.356 0.398 0.376 0.3636 0.3997 0.33360 0.3374 0.3409 0.34458-0.40-0.30 0.3487 0.3597 0.35569 0.3594 0.3637 0.36693 0.37070 0.37448 0.3788 0.3809-0.30-0.0 0.3859 0.38974 0.39358 0.39743 0.409 0.4057 0.40905 0.494 0.4683 0.4074-0.0-0.0 0.4465 0.4858 0.435 0.43644 0.44038 0.44433 0.4488 0.454 0.4560 0.4607-0.0-0.00 0.4644 0.468 0.470 0.47608 0.48006 0.48405 0.48803 0.490 0.4960 0.50000-0.00 Kig Saud Uiversity 83

Biostatistics - STAT 45 Departmet of Statistics Summer Semester 43/43 Stadard Normal Table (cotiued) Areas Uder the Stadard Normal Curve z 0.00 0.0 0.0 0.03 0.04 0.05 0.06 0.07 0.08 0.09 z 0.00 0.50000 0.50399 0.50798 0.597 0.5595 0.5994 0.539 0.5790 0.5388 0.53586 0.00 0.0 0.53983 0.54380 0.54776 0.557 0.55567 0.5596 0.56356 0.56749 0.574 0.57535 0.0 0.0 0.5796 0.5837 0.58706 0.59095 0.59483 0.5987 0.6057 0.6064 0.606 0.6409 0.0 0.30 0.679 0.67 0.655 0.6930 0.63307 0.63683 0.64058 0.6443 0.64803 0.6573 0.30 0.40 0.6554 0.6590 0.6676 0.66640 0.67003 0.67364 0.6774 0.6808 0.68439 0.68793 0.40 0.50 0.6946 0.69497 0.69847 0.7094 0.70540 0.70884 0.76 0.7566 0.7904 0.740 0.50 0.60 0.7575 0.7907 0.7337 0.73565 0.7389 0.745 0.74537 0.74857 0.7575 0.75490 0.60 0.70 0.75804 0.765 0.7644 0.76730 0.77035 0.77337 0.77637 0.77935 0.7830 0.7854 0.70 0.80 0.7884 0.7903 0.79389 0.79673 0.79955 0.8034 0.805 0.80785 0.8057 0.837 0.80 0.90 0.8594 0.8859 0.8 0.838 0.8639 0.8894 0.8347 0.83398 0.83646 0.8389 0.90.00 0.8434 0.84375 0.8464 0.84849 0.85083 0.8534 0.85543 0.85769 0.85993 0.864.00.0 0.86433 0.86650 0.86864 0.87076 0.8786 0.87493 0.87698 0.87900 0.8800 0.8898.0.0 0.88493 0.88686 0.88877 0.89065 0.895 0.89435 0.8967 0.89796 0.89973 0.9047.0.30 0.9030 0.90490 0.90658 0.9084 0.90988 0.949 0.9309 0.9466 0.96 0.9774.30.40 0.994 0.9073 0.90 0.9364 0.9507 0.9647 0.9785 0.99 0.93056 0.9389.40.50 0.9339 0.93448 0.93574 0.93699 0.938 0.93943 0.9406 0.9479 0.9495 0.94408.50.60 0.9450 0.94630 0.94738 0.94845 0.94950 0.95053 0.9554 0.9554 0.9535 0.95449.60.70 0.95543 0.95637 0.9578 0.9588 0.95907 0.95994 0.96080 0.9664 0.9646 0.9637.70.80 0.96407 0.96485 0.9656 0.96638 0.967 0.96784 0.96856 0.9696 0.96995 0.9706.80.90 0.978 0.9793 0.9757 0.9730 0.9738 0.9744 0.97500 0.97558 0.9765 0.97670.90.00 0.9775 0.97778 0.9783 0.9788 0.9793 0.9798 0.98030 0.98077 0.984 0.9869.00.0 0.984 0.9857 0.98300 0.9834 0.9838 0.984 0.9846 0.98500 0.98537 0.98574.0.0 0.9860 0.98645 0.98679 0.9873 0.98745 
0.98778 0.98809 0.98840 0.98870 0.98899.0.30 0.9898 0.98956 0.98983 0.9900 0.99036 0.9906 0.99086 0.99 0.9934 0.9958.30.40 0.9980 0.990 0.994 0.9945 0.9966 0.9986 0.99305 0.9934 0.99343 0.9936.40.50 0.99379 0.99396 0.9943 0.99430 0.99446 0.9946 0.99477 0.9949 0.99506 0.9950.50.60 0.99534 0.99547 0.99560 0.99573 0.99585 0.99598 0.99609 0.996 0.9963 0.99643.60.70 0.99653 0.99664 0.99674 0.99683 0.99693 0.9970 0.997 0.9970 0.9978 0.99736.70.80 0.99744 0.9975 0.99760 0.99767 0.99774 0.9978 0.99788 0.99795 0.9980 0.99807.80.90 0.9983 0.9989 0.9985 0.9983 0.99836 0.9984 0.99846 0.9985 0.99856 0.9986.90 3.00 0.99865 0.99869 0.99874 0.99878 0.9988 0.99886 0.99889 0.99893 0.99896 0.9990 3.00 3.0 0.99903 0.99906 0.9990 0.9993 0.9996 0.9998 0.999 0.9994 0.9996 0.9999 3.0 3.0 0.9993 0.99934 0.99936 0.99938 0.99940 0.9994 0.99944 0.99946 0.99948 0.99950 3.0 3.30 0.9995 0.99953 0.99955 0.99957 0.99958 0.99960 0.9996 0.9996 0.99964 0.99965 3.30 3.40 0.99966 0.99968 0.99969 0.99970 0.9997 0.9997 0.99973 0.99974 0.99975 0.99976 3.40 3.50 0.99977 0.99978 0.99978 0.99979 0.99980 0.9998 0.9998 0.9998 0.99983 0.99983 3.50 Kig Saud Uiversity 84
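The tabulated areas can be regenerated from the error function, since P(Z ≤ z) = ½[1 + erf(z/√2)]. The sketch below uses only `math.erf`; the helper names `phi` and `table_row` are hypothetical, and the row layout (columns 0.00 to 0.09) is assumed to follow the positive half of the table.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_row(z_row):
    """Areas for z_row + 0.00, 0.01, ..., 0.09, rounded to 5 decimals."""
    return [round(phi(z_row + k / 100), 5) for k in range(10)]

# Regenerate the row for z = 1.50.
print(table_row(1.5))
```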

CHAPTER 5: Probabilistic Features of the Distributions of Certain Sample Statistics

5.1 Introduction:
In this chapter we will discuss the probability distributions of some statistics. As we mentioned earlier, a statistic is a measure computed from the random sample. As the sample values vary from sample to sample, the value of the statistic varies accordingly. A statistic is a random variable; it has a probability distribution, a mean and a variance.

5.2 Sampling Distribution:
The probability distribution of a statistic is called the sampling distribution of that statistic. The sampling distribution of the statistic is used to make statistical inference about the unknown parameter.

5.3 Distribution of the Sample Mean: (Sampling Distribution of the Sample Mean X̄)
Suppose that we have a population with mean μ and variance σ². Suppose that X1, X2, ..., Xn is a random sample of size n selected randomly from this population. We know that the sample mean is:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Suppose that we select several random samples of size n = 5.

                 1st sample   2nd sample   3rd sample   ...   Last sample
Sample values    8 30 34 34 7 3 0 3 40 8 4 3 5 7 3 .... 7 3 9 3 30
Sample mean X̄    8.4          9.9          5.8          ...   7.8

- The value of the sample mean X̄ varies from one random sample to another.
- The value of X̄ is random and it depends on the random sample.
- The sample mean X̄ is a random variable.
- The probability distribution of X̄ is called the sampling distribution of the sample mean X̄.
- Questions:
  o What is the sampling distribution of the sample mean X̄?
  o What is the mean of the sample mean X̄?
  o What is the variance of the sample mean X̄?

Some Results about the Sampling Distribution of X̄:

Result (1): (mean and variance of X̄)
If X1, X2, ..., Xn is a random sample of size n from any distribution with mean μ and variance σ², then:
1. The mean of X̄ is: $\mu_{\bar{X}} = \mu$
2. The variance of X̄ is: $\sigma^2_{\bar{X}} = \dfrac{\sigma^2}{n}$
3. The standard deviation of X̄ is called the standard error and is defined by: $\sigma_{\bar{X}} = \sqrt{\sigma^2_{\bar{X}}} = \dfrac{\sigma}{\sqrt{n}}$

Result (2): (Sampling from a normal population)
If X1, X2, ..., Xn is a random sample of size n from a normal population with mean μ and variance σ², that is Normal(μ, σ²), then the sample mean has a normal distribution with mean μ and variance σ²/n, that is:
1. $\bar{X} \sim \text{Normal}\left(\mu, \dfrac{\sigma^2}{n}\right)$
2. $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \text{Normal}(0, 1)$

We use this result when sampling from a normal distribution with known variance σ².

Result (3): (Central Limit Theorem: Sampling from a Non-normal population)
Suppose that X1, X2, ..., Xn is a random sample of size n from a non-normal population with mean μ and variance σ². If the sample size n is large (n ≥ 30), then the sample mean has approximately a normal distribution with mean μ and variance σ²/n, that is:
1. $\bar{X} \approx \text{Normal}\left(\mu, \dfrac{\sigma^2}{n}\right)$ (approximately)
2. $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx \text{Normal}(0, 1)$ (approximately)
Note: "≈" means "approximately distributed".
We use this result when sampling from a non-normal distribution with known variance σ² and with a large sample size.

Result (4): (used when σ² is unknown + normal distribution)
If X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean μ and unknown variance σ², that is Normal(μ, σ²), then the statistic:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with (n - 1) degrees of freedom, where S is the sample standard deviation given by:

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

We write:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$$

Notation: degrees of freedom = df = ν
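Results (1) and (3) can be illustrated by simulation. The sketch below is an illustration only, not part of the course notes: it assumes an exponential population with mean 1 and variance 1 (clearly non-normal), a sample size of n = 30, and an arbitrary random seed so that the run is repeatable.

```python
import random
import statistics

random.seed(145)   # arbitrary seed, chosen only for repeatability

n = 30             # size of each random sample
reps = 20_000      # number of repeated samples

# Draw many samples from an exponential population (mean 1, variance 1)
# and record the sample mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# Result (1): the mean of the sample means should be close to mu = 1,
# and their variance close to sigma^2 / n = 1/30.
mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(round(mean_of_means, 3), round(var_of_means, 4), round(1 / n, 4))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population is skewed, which is what the Central Limit Theorem describes.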

The t-distribution: (Section 6.3. pp 7-74)

Student's t distribution. The t-distribution is a distribution of a continuous random variable.

Recall that, if X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean μ and variance σ², i.e. N(μ, σ²), then

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

We can apply this result only when σ² is known! If σ² is unknown, we replace the population variance σ² with the sample variance

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

to obtain the following statistic:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

Recall: If X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean μ and variance σ², i.e. N(μ, σ²), then the statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with (n - 1) degrees of freedom (df = ν = n - 1), and we write T ~ t(ν) or T ~ t(n - 1).

Note:
- The t-distribution is a continuous distribution.
- The values of the t random variable range from -∞ to +∞ (that is, -∞ < t < ∞).
- The mean of the t-distribution is 0.
- It is symmetric about the mean 0.
- The shape of the t-distribution is similar to the shape of the standard normal distribution.
- t-distribution → standard normal distribution as n → ∞.

Notation: (t_α)
t_α = the t-value under which we find an area equal to α = the t-value that leaves an area of α to the left.
The value t_α satisfies: P(T < t_α) = α.
Since the curve of the pdf of T ~ t(ν) is symmetric about 0, we have
t_α = -t_(1-α)
For example:
t_0.35 = -t_(1-0.35) = -t_0.65
t_0.86 = -t_(1-0.86) = -t_0.14

Values of t_α are tabulated in a special table for several values of α and several values of degrees of freedom. (Table E, appendix p. A-40 in the textbook).

Example: Find the t-value with ν = 14 (df) that leaves an area of:
(a) 0.95 to the left.
(b) 0.95 to the right.

Solution: ν = 14 (df); T ~ t(14)
(a) The t-value that leaves an area of 0.95 to the left is t_0.95 = 1.761.

(b) The t-value that leaves an area of 0.95 to the right is
t_0.05 = -t_(1-0.05) = -t_0.95 = -1.761

Note: Some t-tables contain only values of α that are greater than or equal to 0.90. When we search for small values of α in these tables, we may use the fact that:
t_α = -t_(1-α)

Example: For ν = 10 degrees of freedom (df), find t_0.93 and t_0.07.
Solution:
t_0.93 = (1.372 + 1.812)/2 = 1.592 (from the table, averaging the tabulated values t_0.90 = 1.372 and t_0.95 = 1.812)
t_0.07 = -t_(1-0.07) = -t_0.93 = -1.592 (using the rule: t_α = -t_(1-α))

Critical Values of the t-distribution (t_α)

ν = df   t_0.90   t_0.95   t_0.975   t_0.99   t_0.995
1        3.078    6.314    12.706    31.821   63.657
2        1.886    2.920    4.303     6.965    9.925
3        1.638    2.353    3.182     4.541    5.841
4        1.533    2.132    2.776     3.747    4.604
5        1.476    2.015    2.571     3.365    4.032
6        1.440    1.943    2.447     3.143    3.707
7        1.415    1.895    2.365     2.998    3.499
8        1.397    1.860    2.306     2.896    3.355
9        1.383    1.833    2.262     2.821    3.250
10       1.372    1.812    2.228     2.764    3.169
11       1.363    1.796    2.201     2.718    3.106
12       1.356    1.782    2.179     2.681    3.055
13       1.350    1.771    2.160     2.650    3.012
14       1.345    1.761    2.145     2.624    2.977
15       1.341    1.753    2.131     2.602    2.947
16       1.337    1.746    2.120     2.583    2.921
17       1.333    1.740    2.110     2.567    2.898
18       1.330    1.734    2.101     2.552    2.878
19       1.328    1.729    2.093     2.539    2.861
20       1.325    1.725    2.086     2.528    2.845
21       1.323    1.721    2.080     2.518    2.831
22       1.321    1.717    2.074     2.508    2.819
23       1.319    1.714    2.069     2.500    2.807
24       1.318    1.711    2.064     2.492    2.797
25       1.316    1.708    2.060     2.485    2.787
26       1.315    1.706    2.056     2.479    2.779
27       1.314    1.703    2.052     2.473    2.771
28       1.313    1.701    2.048     2.467    2.763
29       1.311    1.699    2.045     2.462    2.756
30       1.310    1.697    2.042     2.457    2.750
35       1.3062   1.6896   2.0301    2.4377   2.7238
40       1.3030   1.6840   2.0210    2.4230   2.7040
45       1.3006   1.6794   2.0141    2.4121   2.6896
50       1.2987   1.6759   2.0086    2.4033   2.6778
60       1.2958   1.6706   2.0003    2.3901   2.6603
70       1.2938   1.6669   1.9944    2.3808   2.6479
80       1.2922   1.6641   1.9901    2.3739   2.6387
90       1.2910   1.6620   1.9867    2.3685   2.6316
100      1.2901   1.6602   1.9840    2.3642   2.6259
120      1.2886   1.6577   1.9799    2.3578   2.6174
140      1.2876   1.6558   1.9771    2.3533   2.6114
160      1.2869   1.6544   1.9749    2.3499   2.6069
180      1.2863   1.6534   1.9732    2.3472   2.6034
200      1.2858   1.6525   1.9719    2.3451   2.6006
∞        1.282    1.645    1.960     2.326    2.576
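Critical values such as those tabulated above can also be approximated numerically. The sketch below is one possible approach using only the standard library, not the method used to build the table: it integrates the Student t density with Simpson's rule and inverts the cumulative probability by bisection. The function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): symmetry about 0 plus Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(alpha, df):
    """t_alpha: the value with area alpha to its left, found by bisection."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_quantile(0.95, 14), 3))   # compare with the table entry for df = 14
print(round(t_quantile(0.93, 10), 3))   # compare with the averaged value in the example
```

A function of this kind gives t_α for any α directly, so averaging neighbouring table entries is only needed when working by hand.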

Application:
Example: (Sampling distribution of the sample mean)
Suppose that the time duration of a minor surgery is approximately normally distributed with mean equal to 800 seconds and a standard deviation of 40 seconds. Find the probability that a random sample of 16 surgeries will have an average time duration of less than 775 seconds.
Solution:
$X$ = the duration of the surgery
$\mu = 800$, $\sigma = 40$, $\sigma^2 = 1600$
$X \sim N(800, 1600)$
Sample size: $n = 16$
Calculating the mean, variance, and standard error (standard deviation) of the sample mean $\bar{X}$:
Mean of $\bar{X}$: $\mu_{\bar{X}} = \mu = 800$
Variance of $\bar{X}$: $\sigma^2_{\bar{X}} = \sigma^2/n = 1600/16 = 100$
Standard error (standard deviation) of $\bar{X}$: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 40/\sqrt{16} = 10$
Using the central limit theorem, $\bar{X}$ has a normal distribution with mean $\mu_{\bar{X}} = 800$ and variance $\sigma^2_{\bar{X}} = 100$, that is:
$$\bar{X} \sim N(\mu, \sigma^2/n) = N(800, 100)$$
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 800}{10} \sim N(0,1)$$
The probability that a random sample of 16 surgeries will have an average time duration of less than 775 seconds equals:
$$P(\bar{X} < 775) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{775 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{775 - 800}{10}\right) = P(Z < -2.50) = 0.0062$$
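The surgery example can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the Z-table; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16      # population mean, population sd, sample size

se = sigma / sqrt(n)            # standard error of the sample mean
z = (775 - mu) / se             # standardized value of 775 seconds
p = NormalDist().cdf(z)         # P(Z < z) under the standard normal

print(se, z, round(p, 4))
```

The standard error is 10 and the standardized value is -2.50, matching the hand calculation; the probability rounds to 0.0062.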

Example:
If the mean and standard deviation of serum iron values for healthy men are 120 and 15 microgram/100ml, respectively, what is the probability that a random sample of 50 normal men will yield a mean between 115 and 125 microgram/100ml?
Solution:
$X$ = the serum iron value
$\mu = 120$, $\sigma = 15$, $\sigma^2 = 225$
$X \sim N(120, 225)$
Sample size: $n = 50$
Calculating the mean, variance, and standard error (standard deviation) of the sample mean $\bar{X}$:
Mean of $\bar{X}$: $\mu_{\bar{X}} = \mu = 120$
Variance of $\bar{X}$: $\sigma^2_{\bar{X}} = \sigma^2/n = 225/50 = 4.5$
Standard error (standard deviation) of $\bar{X}$: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 15/\sqrt{50} = 2.12$
Using the central limit theorem, $\bar{X}$ has a normal distribution with mean $\mu_{\bar{X}} = 120$ and variance $\sigma^2_{\bar{X}} = 4.5$, that is:
$$\bar{X} \sim N(\mu, \sigma^2/n) = N(120, 4.5)$$
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - 120}{2.12} \sim N(0,1)$$
The probability that a random sample of 50 men will yield a mean between 115 and 125 microgram/100ml equals:
$$P(115 < \bar{X} < 125) = P\left(\frac{115 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{125 - \mu}{\sigma/\sqrt{n}}\right)$$

Substituting $\mu = 120$ and $\sigma/\sqrt{n} = 2.12$:
$$P(115 < \bar{X} < 125) = P\left(\frac{115 - 120}{2.12} < Z < \frac{125 - 120}{2.12}\right) = P(-2.36 < Z < 2.36)$$
$$= P(Z < 2.36) - P(Z < -2.36) = 0.9909 - 0.0091 = 0.9818$$

5.4 Distribution of the Difference Between Two Sample Means ($\bar{X}_1 - \bar{X}_2$):
Suppose that we have two populations:
1-st population with mean $\mu_1$ and variance $\sigma_1^2$
2-nd population with mean $\mu_2$ and variance $\sigma_2^2$
We are interested in comparing $\mu_1$ and $\mu_2$, or equivalently, making inferences about the difference between the means ($\mu_1 - \mu_2$).
We independently select a random sample of size $n_1$ from the 1-st population and another random sample of size $n_2$ from the 2-nd population:
Let $\bar{X}_1$ and $S_1^2$ be the sample mean and the sample variance of the 1-st sample.
Let $\bar{X}_2$ and $S_2^2$ be the sample mean and the sample variance of the 2-nd sample.
The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is used to make inferences about $\mu_1 - \mu_2$.

The sampling distribution of $\bar{X}_1 - \bar{X}_2$:
Result:
The mean, the variance and the standard deviation of $\bar{X}_1 - \bar{X}_2$ are:
Mean of $\bar{X}_1 - \bar{X}_2$ is: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$
Variance of $\bar{X}_1 - \bar{X}_2$ is: $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
Standard error (standard deviation) of $\bar{X}_1 - \bar{X}_2$ is: $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma^2_{\bar{X}_1 - \bar{X}_2}} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Result:
If the two random samples were selected from normal distributions (or non-normal distributions with large sample sizes) with known variances $\sigma_1^2$ and $\sigma_2^2$, then the difference between the sample means ($\bar{X}_1 - \bar{X}_2$) has a normal distribution with mean ($\mu_1 - \mu_2$) and variance ($\sigma_1^2/n_1 + \sigma_2^2/n_2$), that is:
$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1)$$

Application:
Example:
Suppose it has been established that for a certain type of client (type A) the average length of a home visit by a public health nurse is 45 minutes with a standard deviation of 15 minutes, and that for a second type of client (type B) the average home visit is 30 minutes long with a standard deviation of 20 minutes. If a nurse randomly visits 35 clients of the first type and 40 clients of the second type, what is the probability that the average length of home visit for the first type will be greater than the average length of home visit for the second type by 10 or more minutes?
Solution:
For the first type: $\mu_1 = 45$, $\sigma_1 = 15$, $\sigma_1^2 = 225$, $n_1 = 35$
For the second type: $\mu_2 = 30$, $\sigma_2 = 20$, $\sigma_2^2 = 400$, $n_2 = 40$
The mean, the variance and the standard deviation of $\bar{X}_1 - \bar{X}_2$ are:
Mean of $\bar{X}_1 - \bar{X}_2$ is: $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 = 45 - 30 = 15$
Variance of $\bar{X}_1 - \bar{X}_2$ is: $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} = \dfrac{225}{35} + \dfrac{400}{40} = 16.4286$
Standard error (standard deviation) of $\bar{X}_1 - \bar{X}_2$ is: $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{16.4286} = 4.0532$
The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is:
$$\bar{X}_1 - \bar{X}_2 \sim N(15,\ 16.4286)$$
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 15}{\sqrt{16.4286}} \sim N(0,1)$$
The probability that the average length of home visit for the first type will be greater than the average length of home visit for the second type by 10 or more minutes is:

$$P(\bar{X}_1 - \bar{X}_2 > 10) = P\left(\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} > \frac{10 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}\right) = P\left(Z > \frac{10 - 15}{4.0532}\right)$$
$$= P(Z > -1.23) = 1 - P(Z < -1.23) = 1 - 0.1093 = 0.8907$$

5.5 Distribution of the Sample Proportion ($\hat{p}$):
For the population:
$N(A)$ = number of elements in the population with a specified characteristic "A"
$N$ = total number of elements in the population (population size)
The population proportion is
$$p = \frac{N(A)}{N}$$ ($p$ is a parameter)
For the sample:
$n(A)$ = number of elements in the sample with the same characteristic "A"
$n$ = sample size
The sample proportion is
$$\hat{p} = \frac{n(A)}{n}$$ ($\hat{p}$ is a statistic)
The sampling distribution of $\hat{p}$ is used to make inferences about $p$.
Result:
The mean of the sample proportion ($\hat{p}$) is the population proportion ($p$); that is:
$$\mu_{\hat{p}} = p$$
The variance of the sample proportion ($\hat{p}$) is:
$$\sigma^2_{\hat{p}} = \frac{p(1-p)}{n} = \frac{pq}{n}$$ (where $q = 1 - p$)
The standard error (standard deviation) of the sample proportion ($\hat{p}$) is:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}$$
Result:
For large sample size ($n \ge 30$, $np > 5$, $nq > 5$), the sample proportion ($\hat{p}$) has approximately a normal distribution with mean $\mu_{\hat{p}} = p$ and variance $\sigma^2_{\hat{p}} = pq/n$, that is:
$$\hat{p} \sim N\left(p,\ \frac{pq}{n}\right)$$ (approximately)
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}} \sim N(0,1)$$ (approximately)

Example:
Suppose that 45% of the patients visiting a certain clinic are females. If a sample of 35 patients was selected at random, find the probability that:
1. the proportion of females in the sample will be greater than 0.4.
2. the proportion of females in the sample will be between 0.4 and 0.5.
Solution:
$n = 35$ (large)
$p$ = the population proportion of females = $45/100 = 0.45$

$\hat{p}$ = the sample proportion (proportion of females in the sample)
The mean of the sample proportion ($\hat{p}$) is $p = 0.45$
The variance of the sample proportion ($\hat{p}$) is:
$$\frac{p(1-p)}{n} = \frac{pq}{n} = \frac{0.45(1 - 0.45)}{35} = 0.0071$$
The standard error (standard deviation) of the sample proportion ($\hat{p}$) is:
$$\sqrt{\frac{p(1-p)}{n}} = \sqrt{0.0071} = 0.084$$
The conditions for the normal approximation hold: $n \ge 30$, $np = 35 \times 0.45 = 15.75 > 5$, $nq = 35 \times 0.55 = 19.25 > 5$.
1. The probability that the sample proportion of females ($\hat{p}$) will be greater than 0.4 is:
$$P(\hat{p} > 0.4) = 1 - P(\hat{p} < 0.4) = 1 - P\left(\frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} < \frac{0.4 - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right)$$
$$= 1 - P\left(Z < \frac{0.4 - 0.45}{\sqrt{\dfrac{0.45(1 - 0.45)}{35}}}\right) = 1 - P(Z < -0.59) = 1 - 0.2776 = 0.7224$$
2. The probability that the sample proportion of females ($\hat{p}$) will be between 0.4 and 0.5 is:
$$P(0.4 < \hat{p} < 0.5) = P(\hat{p} < 0.5) - P(\hat{p} < 0.4) = P\left(\frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} < \frac{0.5 - p}{\sqrt{\dfrac{p(1-p)}{n}}}\right) - 0.2776$$
$$= P\left(Z < \frac{0.5 - 0.45}{\sqrt{\dfrac{0.45(1 - 0.45)}{35}}}\right) - 0.2776$$

$$= P(Z < 0.59) - 0.2776 = 0.7224 - 0.2776 = 0.4448$$

5.6 Distribution of the Difference Between Two Sample Proportions ($\hat{p}_1 - \hat{p}_2$):
Suppose that we have two populations:
$p_1$ = proportion of elements of type (A) in the 1-st population.
$p_2$ = proportion of elements of type (A) in the 2-nd population.
We are interested in comparing $p_1$ and $p_2$, or equivalently, making inferences about $p_1 - p_2$.
We independently select a random sample of size $n_1$ from the 1-st population and another random sample of size $n_2$ from the 2-nd population:
Let $X_1$ = no. of elements of type (A) in the 1-st sample.
Let $X_2$ = no. of elements of type (A) in the 2-nd sample.
$\hat{p}_1 = \dfrac{X_1}{n_1}$ = sample proportion of the 1-st sample

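$\hat{p}_2 = \dfrac{X_2}{n_2}$ = sample proportion of the 2-nd sample
The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is used to make inferences about $p_1 - p_2$.

The sampling distribution of $\hat{p}_1 - \hat{p}_2$:
Result:
The mean, the variance and the standard error (standard deviation) of $\hat{p}_1 - \hat{p}_2$ are:
Mean of $\hat{p}_1 - \hat{p}_2$ is: $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
Variance of $\hat{p}_1 - \hat{p}_2$ is: $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$
Standard error (standard deviation) of $\hat{p}_1 - \hat{p}_2$ is: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$
where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.
Result:
For large sample sizes ($n_1 \ge 30$, $n_2 \ge 30$, $n_1 p_1 > 5$, $n_1 q_1 > 5$, $n_2 p_2 > 5$, $n_2 q_2 > 5$), $\hat{p}_1 - \hat{p}_2$ has approximately a normal distribution with mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and variance $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}$, that is:
$$\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2,\ \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)$$ (approximately)
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}} \sim N(0,1)$$ (approximately)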

Example:
Suppose that 40% of Non-Saudi residents have medical insurance and 30% of Saudi residents have medical insurance in a certain city. We have randomly and independently selected a sample of 130 Non-Saudi residents and another sample of 120 Saudi residents. What is the probability that the difference between the sample proportions, $\hat{p}_1 - \hat{p}_2$, will be between 0.05 and 0.2?
Solution:
$p_1$ = population proportion of Non-Saudis with medical insurance.
$p_2$ = population proportion of Saudis with medical insurance.
$\hat{p}_1$ = sample proportion of Non-Saudis with medical insurance.
$\hat{p}_2$ = sample proportion of Saudis with medical insurance.
$p_1 = 0.4$, $n_1 = 130$
$p_2 = 0.3$, $n_2 = 120$
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2 = 0.4 - 0.3 = 0.1$$
$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} = \frac{(0.4)(0.6)}{130} + \frac{(0.3)(0.7)}{120} = 0.0036$$
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{0.0036} = 0.06$$
The probability that the difference between the sample proportions, $\hat{p}_1 - \hat{p}_2$, will be between 0.05 and 0.2 is:
$$P(0.05 < \hat{p}_1 - \hat{p}_2 < 0.2) = P(\hat{p}_1 - \hat{p}_2 < 0.2) - P(\hat{p}_1 - \hat{p}_2 < 0.05)$$
$$= P\left(Z < \frac{0.2 - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}\right) - P\left(Z < \frac{0.05 - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}\right)$$
$$= P\left(Z < \frac{0.2 - 0.1}{0.06}\right) - P\left(Z < \frac{0.05 - 0.1}{0.06}\right) = P(Z < 1.67) - P(Z < -0.83)$$
$$= 0.9525 - 0.2033 = 0.7492$$
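The worked probabilities in Sections 5.4 to 5.6 can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. It does not round the z-values to two decimals, so the results differ slightly (in the third decimal place) from the table-based answers above.

```python
from math import sqrt
from statistics import NormalDist

# 5.4: home visits, difference between two sample means.
se_means = sqrt(15**2 / 35 + 20**2 / 40)
p_visits = 1 - NormalDist(45 - 30, se_means).cdf(10)    # P(X1bar - X2bar > 10)

# 5.5: clinic patients, a single sample proportion.
p, n = 0.45, 35
phat = NormalDist(p, sqrt(p * (1 - p) / n))
p_gt_04 = 1 - phat.cdf(0.4)                             # P(phat > 0.4)
p_04_05 = phat.cdf(0.5) - phat.cdf(0.4)                 # P(0.4 < phat < 0.5)

# 5.6: medical insurance, difference between two sample proportions.
se_props = sqrt(0.4 * 0.6 / 130 + 0.3 * 0.7 / 120)
diff = NormalDist(0.4 - 0.3, se_props)
p_insured = diff.cdf(0.2) - diff.cdf(0.05)              # P(0.05 < diff < 0.2)

for name, value in [("visits", p_visits), ("phat > 0.4", p_gt_04),
                    ("0.4 < phat < 0.5", p_04_05), ("insurance", p_insured)]:
    print(f"{name}: {value:.4f}")
```

Each printed value should agree with the corresponding hand calculation (0.8907, 0.7224, 0.4448, 0.7492) to within rounding of the z-values.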

CHAPTER 6: Using Sample Data to Make Estimations About Population Parameters

6.1 Introduction:
Statistical Inference: (Estimation and Hypothesis Testing)
It is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population.
There are two main purposes of statistics:
Descriptive Statistics: (Chapters 1 & 2): Organization and summarization of the data.
Statistical Inference: (Chapters 6 and 7): Answering research questions about some unknown population parameters.
(1) Estimation: (Chapter 6)
Approximating (or estimating) the actual values of the unknown parameters:
- Point Estimate: A point estimate is a single value used to estimate the corresponding population parameter.
- Interval Estimate (or Confidence Interval): An interval estimate consists of two numerical values defining a range of values that most likely includes the parameter being estimated, with a specified degree of confidence.
(2) Hypothesis Testing: (Chapter 7)
Answering research questions about the unknown parameters of the population (confirming or denying some conjectures or statements about the unknown parameters).

6.2 Confidence Interval for a Population Mean ($\mu$):
In this section we are interested in estimating the mean of a certain population ($\mu$).
Population:
Population size = $N$
Population values: $X_1, X_2, \ldots, X_N$
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$
Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample:
Sample size = $n$
Sample values: $x_1, x_2, \ldots, x_n$
Sample mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

(i) Point Estimation of $\mu$:
A point estimate of the mean is a single number used to estimate (or approximate) the true value of $\mu$.
- Draw a random sample of size $n$ from the population: $x_1, x_2, \ldots, x_n$
- Compute the sample mean: $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Result:
The sample mean $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ is a "good" point estimator of the population mean ($\mu$).
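To make the sample formulas concrete, the sketch below computes the sample mean and the sample variance directly from their definitions and compares them with the Python standard library. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pvariance, variance

sample = [4.0, 7.0, 5.0, 8.0, 6.0]   # hypothetical sample values x_1..x_n
n = len(sample)

x_bar = sum(sample) / n                               # sample mean
ss = sum((x - x_bar) ** 2 for x in sample)            # sum of squared deviations
s2 = ss / (n - 1)                                     # sample variance (divide by n - 1)
sigma2_if_population = ss / n                         # population formula (divide by N)

# The library functions follow the same two definitions.
assert x_bar == mean(sample)
assert s2 == variance(sample)
assert sigma2_if_population == pvariance(sample)

print(x_bar, s2, sigma2_if_population)
```

For these five values the point estimate of the mean is 6.0, and the two variance formulas give 2.5 and 2.0; the only difference between them is the divisor.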

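(ii) Confidence Interval (Interval Estimate) of $\mu$:
An interval estimate of $\mu$ is an interval $(L, U)$ containing the true value of $\mu$ "with a probability of $1 - \alpha$".
* $1 - \alpha$ is called the confidence coefficient (level)
* $L$ = lower limit of the confidence interval
* $U$ = upper limit of the confidence interval
Result: (For the case when $\sigma$ is known)
(a) If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a normal distribution with mean $\mu$ and known variance $\sigma^2$, then:
A $(1-\alpha)100\%$ confidence interval for $\mu$ is:
$$\bar{X} \pm Z_{1-\frac{\alpha}{2}}\, \sigma_{\bar{X}} \quad\text{that is}\quad \bar{X} \pm Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
$$\left(\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)$$
$$\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
(b) If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a non-normal distribution with mean $\mu$ and known variance $\sigma^2$, and if the sample size $n$ is large ($n \ge 30$), then:
An approximate $(1-\alpha)100\%$ confidence interval for $\mu$ is given by the same expressions:
$$\bar{X} \pm Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}, \qquad \bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$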

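Note that:
1. We are $(1-\alpha)100\%$ confident that the true value of $\mu$ belongs to the interval $\left(\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)$.
2. Upper limit of the confidence interval = $\bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$
3. Lower limit of the confidence interval = $\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$
4. $Z_{1-\frac{\alpha}{2}}$ = Reliability Coefficient
5. $Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$ = margin of error = precision of the estimate
6. In general the interval estimate (confidence interval) may be expressed as follows:
$\bar{X} \pm Z_{1-\frac{\alpha}{2}}\, \sigma_{\bar{X}}$
estimator ± (reliability coefficient) × (standard error)
estimator ± margin of error

6.3 The t Distribution: (Confidence Interval Using t)
We have already introduced and discussed the t distribution.
Result: (For the case when $\sigma$ is unknown + normal population)
If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a normal distribution with mean $\mu$ and unknown variance $\sigma^2$, then:
A $(1-\alpha)100\%$ confidence interval for $\mu$ is:
$$\bar{X} \pm t_{1-\frac{\alpha}{2}}\, \hat{\sigma}_{\bar{X}} \quad\text{that is}\quad \bar{X} \pm t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}$$
$$\left(\bar{X} - t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\ \bar{X} + t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right)$$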

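where the degrees of freedom are: df = $\nu = n - 1$.
Note that:
1. We are $(1-\alpha)100\%$ confident that the true value of $\mu$ belongs to the interval $\left(\bar{X} - t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\ \bar{X} + t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right)$.
2. $\hat{\sigma}_{\bar{X}} = \dfrac{S}{\sqrt{n}}$ (estimate of the standard error of $\bar{X}$)
3. $t_{1-\frac{\alpha}{2}}$ = Reliability Coefficient
4. In this case, we replace $\sigma$ by $S$ and $Z$ by $t$.
5. In general the interval estimate (confidence interval) may be expressed as follows:
Estimator ± (Reliability Coefficient) × (Estimate of the Standard Error)
$\bar{X} \pm t_{1-\frac{\alpha}{2}}\, \hat{\sigma}_{\bar{X}}$
Notes: (Finding the Reliability Coefficient)
(1) We find the reliability coefficient $Z_{1-\frac{\alpha}{2}}$ from the Z-table as the value that leaves an area of $1 - \frac{\alpha}{2}$ to the left.
(2) We find the reliability coefficient $t_{1-\frac{\alpha}{2}}$ from the t-table (with df = $\nu = n - 1$) as the value that leaves an area of $1 - \frac{\alpha}{2}$ to the left.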

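Example:
Suppose that $Z \sim N(0,1)$. Find $Z_{1-\frac{\alpha}{2}}$ for the following cases:
(1) $\alpha = 0.1$  (2) $\alpha = 0.05$  (3) $\alpha = 0.01$
Solution:
(1) For $\alpha = 0.1$: $1 - \frac{\alpha}{2} = 1 - \frac{0.1}{2} = 0.95$, so $Z_{1-\frac{\alpha}{2}} = Z_{0.95} = 1.645$.
(2) For $\alpha = 0.05$: $1 - \frac{\alpha}{2} = 1 - \frac{0.05}{2} = 0.975$, so $Z_{1-\frac{\alpha}{2}} = Z_{0.975} = 1.96$.
(3) For $\alpha = 0.01$: $1 - \frac{\alpha}{2} = 1 - \frac{0.01}{2} = 0.995$, so $Z_{1-\frac{\alpha}{2}} = Z_{0.995} = 2.575$.

Example:
Suppose that $t \sim t(30)$. Find $t_{1-\frac{\alpha}{2}}$ for $\alpha = 0.05$.
Solution:
df = $\nu = 30$
$1 - \frac{\alpha}{2} = 1 - \frac{0.05}{2} = 0.975$
$t_{1-\frac{\alpha}{2}} = t_{0.975} = 2.0423$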
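The Z reliability coefficients above can also be obtained without a table. The sketch below uses `NormalDist().inv_cdf` from the Python standard library; the helper name `z_reliability` is hypothetical. The standard library has no t-distribution quantile function, so the t-value still has to come from the table (or from a statistics package).

```python
from statistics import NormalDist


def z_reliability(alpha):
    """Return Z_(1 - alpha/2), the value leaving area 1 - alpha/2 to its left."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z_90 = z_reliability(0.10)   # reliability coefficient for a 90% interval
z_95 = z_reliability(0.05)   # reliability coefficient for a 95% interval
z_99 = z_reliability(0.01)   # reliability coefficient for a 99% interval

print(round(z_90, 3), round(z_95, 3), round(z_99, 3))
```

The first two values round to 1.645 and 1.96. The third is about 2.576; the notes use 2.575, the midpoint of the two nearest Z-table entries.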

Example: (The case where $\sigma$ is known)
Diabetic ketoacidosis is a potentially fatal complication of diabetes mellitus throughout the world and is characterized in part by very high blood glucose levels. In a study on 123 patients living in Saudi Arabia of age 15 or more who were admitted for diabetic ketoacidosis, the mean blood glucose level was 26.2 mmol/l. Suppose that the blood glucose levels for such patients have a normal distribution with a standard deviation of 3.3 mmol/l.
(1) Find a point estimate for the mean blood glucose level of such diabetic ketoacidosis patients.
(2) Find a 90% confidence interval for the mean blood glucose level of such diabetic ketoacidosis patients.
Solution:
Variable = $X$ = blood glucose level (quantitative variable).
Population = diabetic ketoacidosis patients in Saudi Arabia of age 15 or more.
Parameter of interest is: $\mu$ = the mean blood glucose level.
Distribution is normal with standard deviation $\sigma = 3.3$.
$\sigma^2$ is known ($\sigma^2 = 10.89$)
$X \sim \text{Normal}(\mu, 10.89)$
$\mu$ = ?? (unknown; we need to estimate $\mu$)
Sample size: $n = 123$ (large)
Sample mean: $\bar{X} = 26.2$
(1) Point Estimation:
We need to find a point estimate for $\mu$.
$\bar{X} = 26.2$ is a point estimate for $\mu$.
$\mu \approx 26.2$
(2) Interval Estimation (Confidence Interval = C.I.):
We need to find a 90% C.I. for $\mu$.
$90\% = (1-\alpha)100\%$
$1 - \alpha = 0.9 \Rightarrow \alpha = 0.1 \Rightarrow \frac{\alpha}{2} = 0.05 \Rightarrow 1 - \frac{\alpha}{2} = 0.95$
The reliability coefficient is: $Z_{1-\frac{\alpha}{2}} = Z_{0.95} = 1.645$
The 90% confidence interval for $\mu$ is:

$$\left(\bar{X} - Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \bar{X} + Z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\right)$$
$$\left(26.2 - (1.645)\frac{3.3}{\sqrt{123}},\ 26.2 + (1.645)\frac{3.3}{\sqrt{123}}\right)$$
$$(26.2 - 0.489474,\ 26.2 + 0.489474)$$
$$(25.71053,\ 26.68947)$$
We are 90% confident that the true value of the mean $\mu$ lies in the interval $(25.71, 26.69)$, that is:
$$25.71 < \mu < 26.69$$
Note: for this example, even if the distribution is not normal, we may use the same solution because the sample size $n = 123$ is large.

Example: (The case where $\sigma$ is unknown)
A study was conducted to study the age characteristics of Saudi women having breast lumps. A sample of 121 Saudi women gave a mean of 37 years with a standard deviation of 10 years. Assume that the ages of Saudi women having breast lumps are normally distributed.
(a) Find a point estimate for the mean age of Saudi women having breast lumps.
(b) Construct a 99% confidence interval for the mean age of Saudi women having breast lumps.
Solution:
$X$ = Variable = age of Saudi women having breast lumps (quantitative variable).
Population = All Saudi women having breast lumps.
Parameter of interest is: $\mu$ = the mean age of Saudi women having breast lumps.
$X \sim \text{Normal}(\mu, \sigma^2)$
$\mu$ = ?? (unknown; we need to estimate $\mu$)
$\sigma^2$ = ?? (unknown)
Sample size: $n = 121$
Sample mean: $\bar{X} = 37$

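Sample standard deviation: $S = 10$
Degrees of freedom: df = $\nu = n - 1 = 121 - 1 = 120$
(a) Point Estimation:
We need to find a point estimate for $\mu$.
$\bar{X} = 37$ is a "good" point estimate for $\mu$.
$\mu \approx 37$ years
(b) Interval Estimation (Confidence Interval = C.I.):
We need to find a 99% C.I. for $\mu$.
$99\% = (1-\alpha)100\%$
$1 - \alpha = 0.99 \Rightarrow \alpha = 0.01 \Rightarrow \frac{\alpha}{2} = 0.005 \Rightarrow 1 - \frac{\alpha}{2} = 0.995$
$\nu$ = df = 120
The reliability coefficient is: $t_{1-\frac{\alpha}{2}} = t_{0.995} = 2.617$
The 99% confidence interval for $\mu$ is:
$$\bar{X} \pm t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}$$
$$37 \pm (2.617)\frac{10}{\sqrt{121}}$$
$$37 \pm 2.38$$
$$(37 - 2.38,\ 37 + 2.38)$$
$$(34.62,\ 39.38)$$
Another Way:
$$\left(\bar{X} - t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}},\ \bar{X} + t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\right)$$
$$\left(37 - (2.617)\frac{10}{\sqrt{121}},\ 37 + (2.617)\frac{10}{\sqrt{121}}\right)$$
$$(37 - 2.38,\ 37 + 2.38)$$
$$(34.62,\ 39.38)$$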
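Both worked intervals in this section follow the pattern estimator ± (reliability coefficient) × (standard error), so they can be checked with one small function. This is a sketch only: the function name `confidence_interval` is hypothetical, and the reliability coefficients are taken from the tables as in the solutions rather than computed.

```python
from math import sqrt


def confidence_interval(estimate, coefficient, sd, n):
    """Return (lower, upper) = estimate -/+ coefficient * sd / sqrt(n)."""
    margin = coefficient * sd / sqrt(n)   # margin of error
    return estimate - margin, estimate + margin


# Sigma known: blood glucose example, 90% interval with Z_0.95 = 1.645.
glucose = confidence_interval(26.2, 1.645, 3.3, 123)

# Sigma unknown: breast lump example, 99% interval with t_0.995 = 2.617 (df = 120).
age = confidence_interval(37, 2.617, 10, 121)

print([round(v, 2) for v in glucose])
print([round(v, 2) for v in age])
```

Rounded to two decimals, the limits are (25.71, 26.69) for the blood glucose example and (34.62, 39.38) for the age example, in agreement with the hand calculations.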