Probability. We will now begin to explore issues of uncertainty and randomness and how they affect our view of nature.


We will explore in lab the differences between accuracy and precision, the role of sample size in precision, and how the law of large numbers links sample size and precision. We need to formalize a bit of probability theory before we proceed toward estimation of population parameters.

Sampling from populations and estimation of population parameters will comprise our efforts for the remainder of the course. It is essential that we develop a basic understanding of ideas of sampling, probability and estimation of parameters.

It is essential that you appreciate that when we study nature we get a glimpse of one (or sometimes a few) snapshots of nature. We do not get to see reality; we have to make inferences and management decisions based on an imperfect view of nature. You need to understand how our sampling influences the quality of our view, and consequently our inferences about nature. This understanding matters both for improving our sampling and for judging how strong the support is for a particular management action.

We have the notion that traits (e.g., fecundity or survival probability) of individuals are distributed in a way that can be described by a probability distribution. When we estimate a parameter value for a population (in the statistical sense) we must view this process as drawing a sample from the overall population and producing our estimate for this sample. Our sample can be viewed as having been drawn from the probability distribution that characterizes the entire population.

When we draw a sample we can present the data as a frequency distribution. For a sample of 50 we get the following results. [Figure: frequency of values drawn from a normal distribution with µ = 100, sd = 20; x-axis: value (20 to 180), y-axis: frequency (0 to 18).]
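The sampling step above can be sketched in a few lines of Python. This is a hypothetical illustration (the seed, bin width, and counts are my own choices, not the slide's data): draw 50 values from a normal distribution with µ = 100, sd = 20 and tally them into a frequency distribution.

```python
import random
from collections import Counter

# Hypothetical sketch: draw a sample of 50 from a normal distribution
# (mu = 100, sd = 20) and bin the values into a frequency distribution.
random.seed(1)
sample = [random.gauss(100, 20) for _ in range(50)]

# Bin values into intervals of width 20 (20-40, 40-60, ...).
bins = Counter(20 * int(x // 20) for x in sample)
for lo in sorted(bins):
    print(f"{lo:3d}-{lo + 20:3d}: {bins[lo]}")
```

Each run of this experiment produces a slightly different frequency distribution, which is exactly the sampling variability the lecture is about.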

We characterize a distribution using a number of statistics, including the mean and variance (and possibly other parameters). When studying nature we can only estimate these parameters because we cannot know their true values. For example, on the preceding slide, we know that the sample was drawn from a normal distribution (µ = 100, sd = 20). Our estimates of these parameters from the data are x̄ = 100.9 and sd = 18.1. Notice that our estimates of the mean and standard deviation are relatively close to what we know the true values to be.
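Estimating these parameters from a sample can be sketched as follows. This is a minimal illustration with a made-up seed, not the slide's actual sample, so the estimates differ from the x̄ = 100.9 and sd = 18.1 quoted above, but they land similarly close to the true values.

```python
import random
import statistics

# Sketch: estimate the mean and standard deviation from a sample of 50
# drawn from a known normal distribution (mu = 100, sd = 20).
random.seed(42)
sample = [random.gauss(100, 20) for _ in range(50)]
xbar = statistics.mean(sample)
sd = statistics.stdev(sample)
print(f"estimated mean = {xbar:.1f}, estimated sd = {sd:.1f}")
```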

A key theorem from probability, the Central Limit Theorem, tells us that if

z = (x̄ − µ) / (σ/√n)

then as n → ∞, z approaches the standard normal distribution, which has µ = 0 and sd = 1. This tells us that for large samples the sample mean approaches the true mean, and the standard deviation of the mean (the standard error) approaches the standard deviation of the underlying distribution divided by the square root of the sample size. We are thus justified in approximating the distribution of the means as normal no matter what the distribution of the underlying data is, so long as we have an adequate sample.
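The theorem can be checked by simulation. A minimal sketch (sample size, number of replicates, and the choice of a uniform parent distribution are my own assumptions): draw many samples from a decidedly non-normal distribution, standardize each sample mean with z = (x̄ − µ)/(σ/√n), and confirm that the z values behave like a standard normal, with mean near 0 and sd near 1.

```python
import random
import statistics

# Sketch of the Central Limit Theorem: the parent distribution here is
# uniform(0, 1), which has mu = 1/2 and sigma = sqrt(1/12).
random.seed(0)
mu, sigma, n = 0.5, (1 / 12) ** 0.5, 50
zs = []
for _ in range(2000):
    xbar = statistics.mean(random.random() for _ in range(n))
    zs.append((xbar - mu) / (sigma / n ** 0.5))  # z = (xbar - mu)/(sigma/sqrt(n))
print(round(statistics.mean(zs), 2), round(statistics.stdev(zs), 2))
```

Even though the underlying data are uniform, the standardized means come out approximately standard normal, which is the point of the theorem.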

Note that the standard error of the mean (σ/√n) is an estimate of the standard deviation of the distribution of means we would obtain if we were to draw numerous samples of size n and estimate a mean for each one. Consequently the standard error is a measure of the precision of our estimate of the mean. Generally, the standard error (the standard deviation of the distribution of estimates) provides an estimate of the precision of our estimate. Note that this precision increases as n (the sample size) increases.

[Figure: two histograms of estimated means from repeated samples drawn from a normal distribution with µ = 100, sd = 20. Left panel: samples of n = 10, with means spread from about 80 to 120. Right panel: samples of n = 50, with means spread from about 85 to 115; the distribution of means is noticeably narrower.]
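The two panels above can be reproduced numerically. In this sketch (seed and replicate count are my own assumptions) we draw many samples of size 10 and of size 50, and compare the observed standard deviation of the sample means against the predicted standard error σ/√n.

```python
import random
import statistics

# Sketch: the standard error sigma/sqrt(n) predicts the spread of sample
# means; compare it with the observed sd of 1000 simulated means.
random.seed(7)
mu, sigma = 100, 20
for n in (10, 50):
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(1000)]
    print(n, round(statistics.stdev(means), 2), round(sigma / n ** 0.5, 2))
```

The observed spread of the means tracks the predicted standard error, and both shrink as n grows from 10 to 50.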

Sample size also plays a role through the law of large numbers, as we will see in lab. For the binomial distribution, the proportion of successes in n trials approaches the probability of a success to within an arbitrarily small difference as n → ∞. Thus, precision increases as sample size increases.

[Figure: two histograms of the proportion of heads in repeated coin-flip experiments with P = 0.8. Left panel: experiments of 10 flips each; right panel: experiments of 100 flips each. The proportions cluster much more tightly around 0.8 with 100 flips.]
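The law of large numbers behind these panels can be sketched directly. This is a hypothetical illustration (seed and flip counts are my own choices): flip a biased coin with success probability p = 0.8 and watch the observed proportion of heads converge toward p as the number of flips grows.

```python
import random

# Sketch of the law of large numbers for a binomial setting: the
# observed proportion of successes approaches p as n increases.
random.seed(3)
p = 0.8
for n in (10, 100, 10000):
    heads = sum(random.random() < p for _ in range(n))
    print(n, heads / n)
```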

Statistical inference is the process of arriving at conclusions or decisions concerning the parameters of populations on the basis of information contained in samples (Freund 1962). Four key aspects of our approach are: 1. sampling, 2. parameter estimation, 3. inference about parameters, and 4. model selection. The latter two include hypothesis testing or some other form of inference. We'll talk more about estimation beginning next week.

The business of hypothesis testing is currently under intense discussion, but it is still important for you to have a brief exposure to the basics of the traditional approach. Remember that we can approximate the distribution of the sample mean using the normal distribution. If we are interested in whether the mean of a sample differs from a particular number we can use:

z = (x̄ − µ) / (σ/√n)

Because z has a standard normal distribution, a large (> 1.96) value of z tells us that there is a low probability that the sample that produced x̄ came from a population with mean µ.

[Figure: standard normal curve with the critical value z = 1.96 marked to the right of 0.] If z is this large or larger, our sample mean x̄ is very different from the hypothesized mean µ: the probability of getting such a sample mean if the true mean were µ is very small.
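This z test can be sketched in code. The numbers here are made up for illustration: we hypothesize µ = 100 and assume a known population sd of 20, but the sample is secretly drawn from a population whose true mean is 115, so the test should detect the difference.

```python
import random
import statistics

# Sketch of the z test above: is the sample consistent with a
# hypothesized mean mu0 = 100, given known population sd sigma = 20?
random.seed(11)
mu0, sigma, n = 100, 20, 50
sample = [random.gauss(115, 20) for _ in range(n)]  # true mean is actually 115
xbar = statistics.mean(sample)
z = (xbar - mu0) / (sigma / n ** 0.5)
print(round(z, 2), "reject" if abs(z) > 1.96 else "fail to reject")
```

With the true mean 15 units away from µ0 and a standard error near 2.8, z lands far beyond 1.96 and we reject the hypothesized mean.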

We haven't discussed sampling yet. Because sample size influences the precision of our parameter estimates (and, as we'll see later, our ability to distinguish among hypotheses), it is essential that we correctly identify sampling units. This is an area where there is still substantial confusion among practicing professionals. The key concept is that sampling units are independent of each other. That is, information about one unit provides no information about other units. Let's examine some examples of sampling units.

[Figure: home ranges of species X, n = 3.]