VTU Edusat Programme 16


Subject: Engineering Mathematics
Sub Code: 10MAT41
UNIT 8: Sampling Theory
Dr. K.S. Basavarajappa, Professor & Head, Department of Mathematics, Bapuji Institute of Engineering and Technology, Davangere-577004
Email: ksbraju@hotmail.com

Statistical Inference: We often need to draw valid and reasonable conclusions about a large mass of individuals or things. The entire group of individuals is known as the population, and a small part of the population is known as a sample. The process of drawing valid and reasonable conclusions about the entire population from a sample is called statistical inference.

Random sampling: A large collection of individuals, attributes or numerical data is called a population or universe. A finite subset of the universe is called a sample. The number of individuals in a sample is called the sample size (n).

Sampling distribution: For each sample of size n we can compute quantities such as the mean, median and standard deviation; these will generally differ from sample to sample. If we group the values of such a statistic according to their frequencies, the resulting frequency distribution is called a sampling distribution. The sampling distribution of a statistic for large samples is assumed to be normal. The standard deviation of a sampling distribution is called the standard error (SE).

Testing of Hypothesis

To arrive at a decision about a population on the basis of a sample, we make certain assumptions about the population; such an assumption is called a hypothesis. A hypothesis formulated for the purpose of possible rejection, under the assumption that it is true, is called the null hypothesis, denoted by $H_0$.

Errors: In a test there are four possible situations, two of which lead to errors, as tabulated below:

                       Accepting the hypothesis          Rejecting the hypothesis
Hypothesis is true     Correct decision                  Wrong decision (Type I error)
Hypothesis is false    Wrong decision (Type II error)    Correct decision

In order to reduce both types of error we need to increase the sample size.

Significance level: The probability level below which we reject the hypothesis is known as the significance level. This probability is conventionally fixed at 0.05 or 0.01, i.e. 5% or 1%. Rejecting a hypothesis at the 5% level of significance therefore means that the probability of wrongly rejecting a true hypothesis (Type I error) is 0.05; at the 1% level it is 0.01.

TESTS OF SIGNIFICANCE AND CONFIDENCE INTERVALS

The process that helps us decide whether to accept or reject the hypothesis is called a test of significance. Suppose that we have a normal population with mean $\mu$ and S.D. $\sigma$. If $\bar{x}$ is the mean of a random sample of size n, the quantity t defined by

$t = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

is called the standard normal variate (SNV), which has mean 0 and S.D. 1. From the table of normal areas, 95% of the area lies between t = -1.96 and t = 1.96. The 5% level of significance is denoted by $t_{0.05}$; therefore

$-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96$   (1)

$-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}}$   (2)

$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$   (3)

Similarly, from the table of normal areas, 99% of the area lies between -2.58 and 2.58. This is equivalent to the form

$\bar{x} - 2.58\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 2.58\,\frac{\sigma}{\sqrt{n}}$   (4)

Therefore representation (3) is the 95% confidence interval and representation (4) is the 99% confidence interval for $\mu$.

Tests of significance for large samples: Consider N large samples, each having n members. Let p and q denote the probabilities of success and failure respectively, so that p + q = 1. By the binomial distribution, the terms of $N(p + q)^n$ give the frequencies of samples with 0, 1, ..., n successes. Therefore $N(p + q)^n$ represents the sampling distribution of the number of successes in a sample. We know that for the binomial distribution the mean is $np$ and the S.D. is $\sqrt{npq}$; then,

Mean proportion of successes $= \frac{np}{n} = p$

S.D. (or S.E.) of the proportion of successes $= \frac{\sqrt{npq}}{n} = \sqrt{\frac{pq}{n}}$

Let x be the observed number of successes in a sample of size n and let $\mu = np$ be the expected number. The standard normal variate Z is defined as

$Z = \frac{x - \mu}{\sigma} = \frac{x - np}{\sqrt{npq}}$

If $|Z| > 2.58$, we conclude that the difference is highly significant and reject the hypothesis. The probable limits of the proportion of successes are $p \pm 2.58\sqrt{pq/n}$, that is, from $p - 2.58\sqrt{pq/n}$ to $p + 2.58\sqrt{pq/n}$.

For a normal distribution, only 5% of the members lie outside $\mu \pm 1.96\sigma$, while only 1% of the members lie outside $\mu \pm 2.58\sigma$. We therefore have the following test of significance:

If $|Z| < 1.96$, the difference between the observed and expected number of successes is not significant.
If $|Z| > 1.96$, the difference is significant at the 5% level of significance.
If $|Z| > 2.58$, the difference is significant at the 1% level of significance.
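The test above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `z_statistic`, `significance` and `probable_limits` are hypothetical and not part of the original notes, and the figures used are those of the coin and survey examples that follow.

```python
import math

def z_statistic(x, n, p):
    """Standard normal variate Z = (x - np) / sqrt(npq) for x successes in n trials."""
    q = 1.0 - p
    return (x - n * p) / math.sqrt(n * p * q)

def significance(z):
    """Classify |Z| using the 1.96 and 2.58 cut-offs quoted in the text."""
    z = abs(z)
    if z > 2.58:
        return "significant at 1% level"
    if z > 1.96:
        return "significant at 5% level"
    return "not significant"

def probable_limits(p, n, z=2.58):
    """Limits p +/- z * sqrt(pq/n) for a proportion of successes."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width

# Coin example: 540 heads in 1000 tosses of a supposedly unbiased coin.
z = z_statistic(540, 1000, 0.5)
print(round(z, 2), significance(z))

# Survey example: 180 illiterate families in a sample of 800.
low, high = probable_limits(180 / 800, 800)
print(round(low, 3), round(high, 3))
```

Passing `z=1.96` to `probable_limits` gives the 95% limits instead of the 99% limits.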

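Example: A coin is tossed 1000 times and it turns up heads 540 times. Test the hypothesis that the coin is unbiased.

Solution: Let us suppose that the coin is unbiased. Then
p = probability of getting a head in one toss = 1/2, and since p + q = 1, q = 1/2.
Expected number of heads in 1000 tosses $= np = 1000 \times \frac{1}{2} = 500$
Observed number of heads x = 540
Excess of observed over expected $= x - np = 540 - 500 = 40$
S.D. $= \sqrt{npq} = \sqrt{1000 \times \frac{1}{2} \times \frac{1}{2}} = 15.81$
$Z = \frac{x - np}{\sqrt{npq}} = \frac{40}{15.81} = 2.53 < 2.58$
The difference is not significant at the 1% level, so the hypothesis that the coin is unbiased is accepted at the 1% level of significance (99% confidence). Note that 2.53 > 1.96, so the difference is significant at the 5% level.

Example: A survey was conducted in one locality of 2000 families by selecting a sample of size 800. It was revealed that 180 families were illiterate. Find the probable limits of the number of illiterate families in the population of 2000.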

Solution: Proportion of illiterate families $p = \frac{180}{800} = 0.225$. Also $q = 1 - p = 1 - 0.225 = 0.775$.
Probable limits of the proportion of illiterate families $= p \pm 2.58\sqrt{\frac{pq}{n}} = 0.225 \pm 2.58\sqrt{\frac{0.225 \times 0.775}{800}} = 0.225 \pm 0.038$, i.e. 0.187 and 0.263.
Therefore the probable limits of the number of illiterate families in the population of 2000 are $0.187 \times 2000 = 374$ and $0.263 \times 2000 = 526$.

Example: A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the assumption of random throwing, do the data indicate an unbiased die?

Solution: Suppose the die is unbiased. Then the probability of throwing 5 or 6 with one die is
$p = P(5) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$, and $q = 1 - p = 1 - \frac{1}{3} = \frac{2}{3}$.

Then the expected number of successes is $np = \frac{1}{3} \times 9000 = 3000 = \mu$ (say), but the observed number of successes is 3240.
Excess of observed number of successes $= x - np = 3240 - 3000 = 240$
Here n = 9000, p = 1/3, q = 2/3, np = 3000.
S.D. $= \sqrt{npq} = \sqrt{9000 \times \frac{1}{3} \times \frac{2}{3}} = \sqrt{2000} = 44.72$
$Z = \frac{x - np}{\sqrt{npq}} = \frac{240}{44.72} = 5.37 > 2.58$
The difference is highly significant. The hypothesis is to be rejected at the 1% level of significance: the die is biased.

Example: A biased die is tossed 500 times and a particular number appears 120 times. Find the 95% confidence limits for the number of times this number appears. Also find the standard error of the proportion of successes (use the binomial distribution).

Solution: Let $p = \frac{120}{500} = 0.24$; then q = 0.76 and n = 500.
Standard error of the number of successes $= \sqrt{npq} = \sqrt{500 \times 0.24 \times 0.76} = 9.55$
Then the mean proportion of successes $= np/n = p = 0.24$, and

the S.E. of the proportion of successes $= \frac{\sqrt{npq}}{n} = \frac{9.55}{500} = 0.019$. The 95% confidence interval for the proportion of successes is then $0.24 \pm 1.96 \times 0.019$, i.e.

$0.203 \le p \le 0.277$
$500 \times 0.203 \le np \le 500 \times 0.277$
$101 \le np \le 138$ (approximately)

The interval is [101, 138]. We can say with 95% confidence that in 500 tosses the particular number appears between 101 and 138 times.

Degrees of freedom (d.f.): It is the number of values in a set which may be assigned arbitrarily.
d.f. = n - 1 for n observations
d.f. = n - 2 for n - 1 observations
d.f. = n - 3 for n - 2 observations, etc.
Ex: for 25 observations we have 24 d.f.

Student's t distribution: It is used to test the significance of a sample mean for a normal population where the population S.D. is not known. It is given by

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, where $\bar{x} = \frac{1}{n}\sum x_i$ and $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$.

We need to test the hypothesis of whether the sample mean $\bar{x}$ differs significantly from the population mean $\mu$. If the calculated value of $|t|$ is greater than the table value $t_{0.05}$, we say that the difference between $\bar{x}$ and $\mu$ is significant at the 5% level. If $|t| > t_{0.01}$, the difference is significant at the 1% level.

Note: The 95% confidence limits for the population mean are $\bar{x} \pm \frac{s}{\sqrt{n}}\,t_{0.05}$.

Example: A machine is expected to produce nails of length 3 inches. A random sample of 25 nails gave an average length of 3.1 inches with standard deviation 0.3. Can it be said that the machine is producing nails as per the specification? (The value of Student's $t_{0.05}$ for 24 d.f. is 2.064.)

Solution: Given $\mu = 3$, $\bar{x} = 3.1$, n = 25, s = 0.3,

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{3.1 - 3}{0.3/\sqrt{25}} = 1.67 < 2.064$

The hypothesis that the machine is producing nails as per specification is accepted at the 5% level of significance.
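As a hedged illustration of the t-test from summary statistics, the nails example can be reproduced with the short Python sketch below. The helper names `t_statistic` and `confidence_limits` are hypothetical, and the table value 2.064 is simply the one quoted in the example.

```python
import math

def t_statistic(sample_mean, mu, s, n):
    """Student's t = (sample mean - mu) / (s / sqrt(n))."""
    return (sample_mean - mu) / (s / math.sqrt(n))

def confidence_limits(sample_mean, s, n, t_table):
    """Limits sample_mean +/- t_table * s / sqrt(n) for the population mean."""
    half_width = t_table * s / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Nails example: n = 25, sample mean 3.1, s = 0.3, specified mean 3.
t_table = 2.064  # t_0.05 for 24 d.f., as quoted in the example
t = t_statistic(3.1, 3.0, 0.3, 25)
accepted = abs(t) < t_table  # True means the difference is not significant at 5%
low, high = confidence_limits(3.1, 0.3, 25, t_table)

print(round(t, 2), accepted)
print(round(low, 3), round(high, 3))
```

The 95% limits computed this way contain the specified length of 3 inches, which agrees with accepting the hypothesis.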

Example: Ten individuals are chosen at random from a population and their heights in inches are found to be 63, 63, 66, 67, 68, 69, 70, 70, 71, 71. Test the hypothesis that the mean height of the universe is 66 inches (the value of $t_{0.05}$ is 2.262 for 9 d.f.).

Solution: We have $\mu = 66$, n = 10, d.f. = 9.
$\bar{x} = \frac{\sum x}{n} = \frac{678}{10} = 67.8$
$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{(63 - 67.8)^2 + \dots + (71 - 67.8)^2}{9} = 9.067$, so $s = 3.011$
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{67.8 - 66}{3.011/\sqrt{10}} = 1.89 < 2.262$ (table value given in the problem)
The hypothesis is accepted at the 5% level of significance.

Example: Eleven school boys were given a test in drawing. They were then given a month's further tuition, and a second test of equal difficulty was held at the end of it. Do the marks give evidence that the students have benefitted from the extra coaching? ($t_{0.05}$ for d.f. = 10 is 2.228.)
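When the raw observations are available, the same statistic can be computed directly from the data. The sketch below is an illustration under stated assumptions: `t_from_sample` is a hypothetical helper, the tenth height (70) is inferred from the stated mean of 67.8, and treating the school boys data as paired differences (test 2 minus test 1, tested against a mean of 0) is an assumed approach, since the notes give no worked solution for that example. The marks are those tabulated just below.

```python
import math

def t_from_sample(values, mu0):
    """Return (mean, s^2, t) with s^2 using the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    t = (mean - mu0) / math.sqrt(s2 / n)
    return mean, s2, t

# Heights example (tenth value 70 is an assumption consistent with the mean 67.8).
heights = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
h_mean, h_s2, h_t = t_from_sample(heights, 66)
print(h_mean, round(h_s2, 3), round(h_t, 2))

# School boys example: marks in test 1 and test 2 for the same eleven boys.
test1 = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
test2 = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
diffs = [b - a for a, b in zip(test1, test2)]
d_mean, d_s2, d_t = t_from_sample(diffs, 0)
print(d_mean, d_s2, round(d_t, 2))
```

Under these assumptions the paired statistic is about 1.48, below the quoted table value 2.228, so on this reading the marks would not give significant evidence of benefit at the 5% level.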

Boys             1   2   3   4   5   6   7   8   9   10  11
Marks (test 1)   23  20  19  21  18  20  18  17  23  16  19
Marks (test 2)   24  19  22  18  20  22  20  20  23  20  17

Chi-Square distribution ($\chi^2$): It provides a measure of correspondence between the theoretical frequencies and the observed frequencies. Let
$O_i$ (i = 1, 2, ..., n) be the observed frequencies and
$E_i$ (i = 1, 2, ..., n) be the expected (theoretical) frequencies.
The quantity $\chi^2$ (chi-square) is defined as

$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$; degrees of freedom = n - 1

Chi-square test as a test of goodness of fit: The $\chi^2$ test helps us to test the goodness of fit of distributions such as the Binomial, Poisson and Normal distributions. If the calculated value of $\chi^2$ is less than the table value of $\chi^2$ at a specified level of significance, the hypothesis is accepted. Otherwise the hypothesis is rejected.

Example: A die is thrown 264 times and the number appearing on the face (x) follows the following frequency distribution:

x   1   2   3   4   5   6
f   40  32  28  58  54  60

Calculate the value of $\chi^2$.

Solution: The frequencies in the given table are the observed frequencies. Assuming that the die is unbiased, the expected frequency for each of the numbers 1, 2, 3, 4, 5, 6 to appear on the face is 264/6 = 44. Then the data is as follows:

No. on the die                 1   2   3   4   5   6
Observed frequency ($O_i$)     40  32  28  58  54  60
Expected frequency ($E_i$)     44  44  44  44  44  44

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{16 + 144 + 256 + 196 + 100 + 256}{44} = \frac{968}{44} = 22$

Example: Five dice were thrown 96 times and the number of dice showing 1 or 2 or 3 in each throw follows the following frequency distribution:

No. of dice showing 1 or 2 or 3   5   4   3   2   1   0
Frequency                         7   19  35  24  8   3

Test the hypothesis that the data follow a binomial distribution.

Solution: The probability of a single die showing 1 or 2 or 3 is $p = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$, so $q = \frac{1}{2}$.
The binomial distribution to fit the data is $N(q + p)^n = 96\left(\frac{1}{2} + \frac{1}{2}\right)^5$, whose terms are $96\binom{5}{r}\left(\frac{1}{2}\right)^5$ for r = 0, 1, ..., 5. The new table of values is

No. of dice showing 1 or 2 or 3   5   4   3   2   1   0
Observed frequency ($O_i$)        7   19  35  24  8   3
Expected frequency ($E_i$)        3   15  30  30  15  3

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 11.7 > \chi^2_{0.05} = 11.07$ (table value)

Hence the hypothesis that the data follow a binomial distribution is rejected.

Example: Fit a Poisson distribution to the following data and test the goodness of fit, given that $\chi^2_{0.05} = 7.815$ for 3 degrees of freedom.

x   0    1   2   3   4
f   122  60  15  2   1

Solution: The Poisson distribution to fit the data is $N\,p(x) = N e^{-m} m^x / x!$, where m is the mean of the data: $m = \frac{\sum f x}{\sum f} = \frac{100}{200} = 0.5$

$N e^{-m} m^x / x! = 200\,e^{-0.5}(0.5)^x / x!$, where x = 0, 1, 2, 3, 4, which gives 121, 61, 15, 3, 0. Therefore the new table is

x        0    1   2   3   4
$O_i$    122  60  15  2   1
$E_i$    121  61  15  3   0

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 0.025 < \chi^2_{0.05} = 7.815$ (with the last two classes pooled, since their expected frequencies are small)

Therefore the fit is considered good. The hypothesis that the fit is good can be accepted.

Example: The number of accidents per day (x) as recorded in a textile industry over a period of 400 days is given below. Test the goodness of fit of a Poisson distribution to the given data.

x   0    1    2   3   4  5
f   173  168  37  18  3  1

Solution: The Poisson distribution to fit the data is $N\,p(x) = N e^{-m} m^x / x!$, where $m = \frac{\sum f x}{\sum f} = \frac{313}{400} = 0.7825$. Therefore the new table is

x        0    1    2   3   4  5
$O_i$    173  168  37  18  3  1
$E_i$    183  143  56  15  3  0

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 12.297 \approx 12.3 > \chi^2_{0.05} = 9.49$ (with the last two classes pooled)

Therefore the fit is not good. The hypothesis that the fit is good is rejected.

Example: In experiments on pea breeding, the following frequencies of seeds were obtained:

Round & yellow   Wrinkled & yellow   Round & green   Wrinkled & green   Total
315              101                 108             32                 556

Theory predicts that the frequencies should be in the proportion 9:3:3:1. Examine the correspondence between theory and experiment.

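Solution: The corresponding expected frequencies, obtained by dividing 556 in the ratio 9:3:3:1, are 313, 104, 104, 35.

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} = 0.51 < \chi^2_{0.05} = 7.815$

The calculated value of $\chi^2$ is much less than $\chi^2_{0.05}$, so there is agreement between theory and experiment.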
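The expected frequencies used in the last examples can be generated rather than computed by hand. The following Python sketch is illustrative only: the helper names are hypothetical, expected frequencies are rounded to whole numbers as in the tables above, and pooling the last two Poisson classes is an assumption that reproduces the value 12.297 quoted for the accidents example.

```python
import math

def chi_square(observed, expected):
    """Chi-square statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_from_ratio(total, ratio):
    """Split a total in a given ratio, rounded to whole frequencies."""
    return [round(total * r / sum(ratio)) for r in ratio]

def poisson_expected(freq):
    """Fit a Poisson distribution to freq[x] (x = 0, 1, 2, ...).

    Returns the mean m and the rounded expected frequencies N e^-m m^x / x!.
    """
    n = sum(freq)
    m = sum(x * f for x, f in enumerate(freq)) / n
    expected = [round(n * math.exp(-m) * m ** x / math.factorial(x))
                for x in range(len(freq))]
    return m, expected

def pool_tail(values, k):
    """Merge the last k classes into one (assumed handling of small classes)."""
    return values[:-k] + [sum(values[-k:])]

# Pea breeding example: theory predicts the ratio 9:3:3:1.
peas = [315, 101, 108, 32]
e_peas = expected_from_ratio(sum(peas), [9, 3, 3, 1])
chi_peas = chi_square(peas, e_peas)
print(e_peas, round(chi_peas, 2))

# Accidents example: Poisson fit with the last two classes pooled.
accidents = [173, 168, 37, 18, 3, 1]
m, e_acc = poisson_expected(accidents)
chi_acc = chi_square(pool_tail(accidents, 2), pool_tail(e_acc, 2))
print(m, e_acc, round(chi_acc, 3))
```

Pooling matters here because the last expected frequency rounds to 0, which would otherwise make the final term of the sum undefined.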