Probability

We will now begin to explore issues of uncertainty and randomness and how they affect our view of nature. In lab we will explore the difference between accuracy and precision, the role of sample size in precision, and how the law of large numbers links the two. We need to formalize a bit of probability theory before we proceed toward estimation of population parameters.
Sampling from populations and estimation of population parameters will comprise our efforts for the remainder of the course. It is essential that we develop a basic understanding of ideas of sampling, probability and estimation of parameters.
It is essential that you appreciate the fact that when we study nature we get a glimpse of one (or sometimes a few) snapshots of nature. We do not get to see reality; we have to make inferences and management decisions based on an imperfect view of nature. You need to understand how our sampling influences the quality of our view, and consequently our inferences about nature. This understanding is important both for improving our sampling and for judging how strong the support is for a particular management action.
We have the notion that traits (e.g., fecundity or survival probability) of individuals are distributed in a way that can be described by a probability distribution. When we estimate a parameter value for a population (in the statistical sense) we must view this process as drawing a sample from the overall population and producing our estimate for this sample. Our sample can be viewed as having been drawn from the probability distribution that characterizes the entire population.
When we draw a sample we can present the data as a frequency distribution. We get the following results for a sample of 50.

[Figure: Frequency of values drawn from a normal distribution (µ = 100, sd = 20). x-axis: value; y-axis: frequency.]
We characterize a distribution using a number of parameters, including the mean and variance (and possibly others). When studying nature we can only estimate these parameters because we cannot know what the true values are. For example, on the preceding slide, we know that the sample was drawn from a normal distribution (µ = 100, sd = 20). Our estimates of these parameters from the data are: x̄ = 100.9, sd = 18.1. Notice that our estimates of the mean and standard deviation are relatively close to what we know the true values to be.
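The estimation step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `draw_sample` and the seed are hypothetical choices, not part of the course material, and a different seed will give different estimates from the 100.9 and 18.1 reported above.

```python
import random
import statistics

def draw_sample(n, mu=100.0, sd=20.0, seed=1):
    """Draw n values from a normal distribution with mean mu and standard deviation sd."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so the run is repeatable
    return [rng.gauss(mu, sd) for _ in range(n)]

sample = draw_sample(50)
mean_hat = statistics.mean(sample)  # estimate of the true mean (100)
sd_hat = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
print(f"estimated mean = {mean_hat:.1f}, estimated sd = {sd_hat:.1f}")
```

Rerunning with other seeds shows how much the estimates move from one sample of 50 to the next.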
A key theorem from probability, the Central Limit Theorem, tells us that:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

As $n \to \infty$, z approaches the standard normal distribution. This distribution has µ = 0 and sd = 1. This tells us that for large samples the sample mean approaches the true mean and the standard deviation of the mean (the standard error) approaches the standard deviation of the underlying distribution divided by the square root of the sample size. We are thus justified in approximating the distribution of the means as normal no matter what the distribution of the underlying data is, so long as we have an adequate sample.
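A small simulation can make the theorem concrete. The sketch below assumes a skewed underlying distribution (exponential with rate 1, so µ = 1 and σ = 1), which is my choice for illustration and not from the slides; `simulate_z` is a hypothetical helper name. If the theorem applies, the z values computed from repeated samples should have a mean near 0 and a standard deviation near 1.

```python
import math
import random
import statistics

def simulate_z(n, reps, seed=2):
    """Return z = (xbar - mu) / (sigma / sqrt(n)) for reps samples of size n."""
    rng = random.Random(seed)  # arbitrary fixed seed for repeatability
    mu = 1.0      # true mean of an exponential distribution with rate 1
    sigma = 1.0   # true standard deviation of that same distribution
    zs = []
    for _ in range(reps):
        xbar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

zs = simulate_z(n=50, reps=2000)
z_mean = statistics.fmean(zs)
z_sd = statistics.stdev(zs)
print(f"mean of z = {z_mean:.3f}, sd of z = {z_sd:.3f}")
```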
Note that the standard error of the mean ($\sigma / \sqrt{n}$) is an estimate of the standard deviation of the distribution of the means if we were to draw numerous samples of size n and estimate a mean for each one. Consequently the standard error is a measure of the precision of our estimate of the mean. More generally, the standard error (the standard deviation of the distribution of estimates) measures the precision of any estimate. Note that this precision increases as n (sample size) increases.
[Figure, two panels: Distribution of means from samples (n = 10) and from samples (n = 50) drawn from a normal distribution with µ = 100, sd = 20. x-axis: estimated mean; y-axis: number of experiments.]
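The comparison between samples of size 10 and size 50 can be reproduced approximately by simulation. In this sketch the number of repeated experiments (2000), the seed, and the helper name `sd_of_means` are assumptions made for illustration. The observed standard deviation of the simulated means is compared with the value σ/√n.

```python
import math
import random
import statistics

def sd_of_means(n, reps=2000, mu=100.0, sd=20.0, seed=3):
    """Standard deviation of the means of reps samples, each of size n."""
    rng = random.Random(seed)  # arbitrary fixed seed for repeatability
    means = [
        statistics.fmean([rng.gauss(mu, sd) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

for n in (10, 50):
    observed = sd_of_means(n)
    expected = 20.0 / math.sqrt(n)  # sigma / sqrt(n)
    print(f"n = {n}: observed sd of means = {observed:.2f}, sigma/sqrt(n) = {expected:.2f}")
```

The means from the larger samples should cluster more tightly around 100, which is what a smaller standard error implies.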
Sample size also plays a role through the law of large numbers, as we will see in lab. For the binomial distribution, the proportion of successes in n trials comes arbitrarily close to the probability of a success as n → ∞. Thus, precision increases as sample size increases.
[Figure, two panels: Frequency of experiments (10 flips) and of experiments (100 flips) producing different proportions of heads, P = 0.8. x-axis: proportion successes; y-axis: number of experiments.]
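A rough simulation of the coin-flip experiments, with P = 0.8 as in the figure, is sketched below. The number of experiments (1000), the seed, and the helper name `proportions_heads` are assumptions for illustration only.

```python
import random
import statistics

def proportions_heads(flips, experiments=1000, p=0.8, seed=4):
    """Proportion of heads in each of several experiments of `flips` flips."""
    rng = random.Random(seed)  # arbitrary fixed seed for repeatability
    return [
        sum(rng.random() < p for _ in range(flips)) / flips
        for _ in range(experiments)
    ]

props10 = proportions_heads(10)
props100 = proportions_heads(100)
for label, props in (("10 flips", props10), ("100 flips", props100)):
    print(f"{label}: mean proportion = {statistics.fmean(props):.3f}, "
          f"sd of proportions = {statistics.stdev(props):.3f}")
```

Both sets of proportions should center near 0.8, but those from 100 flips should be much less spread out.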
Statistical inference is the process of arriving at conclusions or decisions concerning the parameters of populations on the basis of information contained in samples. (Freund 1962)

Four key aspects of our approach are:
1. sampling
2. parameter estimation
3. inference about parameters
4. model selection

The latter two include hypothesis testing or some other form of inference. We'll talk more about estimation beginning next week.
The business of hypothesis testing is currently under intense discussion, but it is still important for you to have a brief exposure to the basics of the traditional approach. Remember that we can approximate the distribution of the sample mean using the normal distribution. If we are interested in whether the mean of a sample differs from a particular number µ, we can use:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Because z has a standard normal distribution, a large value of z (> 1.96) tells us that there is a low probability of observing a sample mean x̄ this far from µ if the sample came from a population with mean µ.
[Figure: standard normal distribution of z, with 0 and 1.96 marked on the axis.]

If z is this large or larger, it tells us that our sample mean x̄ is very different from some hypothesized mean µ. The probability of getting a sample mean this extreme if the true mean is µ is very small.
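The calculation can be sketched as follows, assuming σ is known and a two-tailed test is wanted. The function name `z_test` is hypothetical, and the hypothesized mean of 100 is chosen here only because it matches the earlier simulated example (x̄ = 100.9, σ = 20, n = 50).

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n):
    """Return (z, two-tailed p-value) for a sample mean against a hypothesized mean mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Probability of a z at least this far from 0 in either direction,
    # if the true mean really is mu0.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test(xbar=100.9, mu0=100.0, sigma=20.0, n=50)
print(f"z = {z:.2f}, two-tailed p = {p:.2f}")
```

Here z is well below 1.96, so a sample mean of 100.9 gives no reason to doubt a true mean of 100.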
We haven't discussed sampling yet. Because sample size influences the precision of our parameter estimates (and, as we'll see later, our ability to distinguish among hypotheses), it is essential that we correctly identify sampling units. This is an area where there is still substantial confusion among practicing professionals. The key concept is that sampling units are independent of each other. That is, information about one unit provides no information about other units. Let's examine some examples of sampling units.
HOME RANGES OF SPECIES X n = 3