Introduction to Statistical Data Analysis Lecture 4: Sampling


James V. Lambers, Department of Mathematics, The University of Southern Mississippi

Introduction
In order to complete the transition from descriptive statistics to inferential statistics, we need to know how to work with a sample of a population, since in many cases gathering descriptive statistics from the entire population is impractical. Therefore, in this lecture, we discuss sampling techniques.

Once the determination is made that only a sample of a population of interest can be studied, how to obtain that sample is far from a trivial matter. It is essential that the sample not be biased; that is, the sample must be representative of the entire population, or any inferences made from the sample will not be reliable. To reduce the chance of bias, it is best to use random sampling, which means that every member of the population has a chance of being selected. We now discuss various approaches to random sampling.

Simple Sampling
In simple sampling, each member of the population has an equal chance of selection. Typically, tables of random numbers are used to assist in such a selection process. For example, suppose all members of the population can be numbered. Then, the table of random numbers can be used to determine the numbers of members of the population who are to be included in the sample.
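As a rough illustration of simple sampling, the following R sketch uses the built-in function sample in place of a printed table of random numbers. The population size N = 500 and sample size n = 25 are made-up values chosen only for the example.

```r
# Minimal sketch of simple sampling; N and n are illustrative values.
set.seed(1)               # fixed seed only so the sketch is reproducible
N <- 500                  # hypothetical number of population members
n <- 25                   # desired sample size

# sample() draws n distinct member numbers, each equally likely.
chosen <- sample(1:N, n)
print(sort(chosen))
```

Because sample draws without replacement by default, no member can be selected twice.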

Systematic Sampling
Simple sampling is susceptible to bias if some aid, such as a table of random numbers, cannot be used. To avoid this bias, one can use systematic sampling, which consists of selecting every kth member of the population. If the population has N members and a sample of size n is desired, then one should choose $k \approx N/n$.
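Systematic sampling can be sketched in R with seq. The values of N and n below are illustrative, and the random starting point is an assumption added for the example; the slide itself only specifies taking every kth member.

```r
# Minimal sketch of systematic sampling; N and n are illustrative values.
set.seed(2)
N <- 500                       # hypothetical population size
n <- 25                        # desired sample size
k <- floor(N / n)              # sampling interval, approximately N/n

# Assumption: begin at a random position within the first k members.
start <- sample(1:k, 1)
chosen <- seq(start, N, by = k)
print(chosen)
```

When N is an exact multiple of n, as it is here, this always yields exactly n members.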

Cluster Sampling
In cluster sampling, the population is divided into groups, called clusters, and then random sampling is applied to the clusters. That is, entire clusters are chosen to obtain the sample. This is effective if each cluster is representative of the entire population.

Stratified Sampling
In stratified sampling, the population is divided into mutually exclusive groups, called strata, and then random sampling is performed within each stratum. This approach can be used to ensure that each stratum is treated equally within the sample. For example, suppose that for a national poll, it was desired to have a sample in which each state was represented equally. Then, the strata would be the states, and a sample of the same size could be drawn from the population of each state.
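A minimal R sketch of stratified sampling follows. The data frame population, its three strata A, B and C, and the per-stratum sample size are all hypothetical and serve only to show the mechanics of sampling within each stratum.

```r
# Minimal sketch of stratified sampling on made-up data.
set.seed(3)
population <- data.frame(
  id      = 1:600,
  stratum = rep(c("A", "B", "C"), times = c(300, 200, 100))
)
per_stratum <- 10              # same sample size in every stratum

# Split the member ids by stratum, then draw a simple random
# sample of size per_stratum within each stratum.
by_stratum <- split(population$id, population$stratum)
chosen <- lapply(by_stratum, function(ids) sample(ids, per_stratum))
print(chosen)
```

Each stratum contributes the same number of members, even though the strata differ in size.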

Sampling must be performed with care, so that any inferences made about the population from the sample have at least some validity.

Sampling Errors
A descriptive statistic computed from a sample is only an estimate of the corresponding statistic for the population, which, in most cases, cannot be obtained. However, it is possible to estimate the error in the sample statistic, called the sampling error; we will learn how to do so later, using confidence intervals. As we will see then, choosing a larger sample reduces the sampling error. It can be made arbitrarily small by choosing a sample close to the size of the entire population, but usually this is not practical.

Poor Sampling Technique
Even if a very large sample is chosen, conclusions made about the sample do not apply to the population if the sample is biased. On the other hand, if a sample is truly representative of the population, then it does not need to be large to be reliable. It is also important to avoid making unrealistic assumptions about the sample.

1948 Presidential Election
In a poll conducted during the 1948 presidential election, voters in the sample were classified as supporting Harry Truman, supporting Thomas Dewey, or undecided. The polling organization assumed that undecided voters would be distributed between the two candidates in the same way that the decided voters were, which led to the conclusion that Dewey would win. However, the undecided voters were actually more in favor of Truman, thus leading to his victory.

Suppose that it is desired to measure some quantifiable characteristic of a population, such as average height, or the percentage of the population that votes Republican. A sample of the population can be taken, and then the characteristic of the sample, whatever it is, can be computed from information obtained from each member of the sample. Now, suppose that many samples are taken, with each sample being the same size. Then, the values that are computed from these samples form a set of outcomes, where the experiment in question is the computation of the desired characteristic of the sample. This set of outcomes obtained from samples is called a sampling distribution.

Sampling Distribution of the Mean
Sampling distributions apply to a number of different statistics, but the most commonly used is the mean. The sampling distribution of the mean is the pattern of means that is obtained from computing the sample means from all possible samples of the population.

Example
We will illustrate the sampling distribution of the mean for an example of rolling a six-sided die. Each of the six numbers has an equal likelihood of appearing face up, so these values follow a discrete uniform probability distribution, which is a distribution that assigns the same probability to each discrete event.

Example, cont'd
The mean of such a distribution is
$$\mu = \frac{a + b}{2},$$
where a and b are the minimum and maximum values, respectively, of the distribution. The variance is given by
$$\sigma^2 = \frac{1}{12}\left[(b - a + 1)^2 - 1\right].$$
Therefore, for the case of a six-sided die, for which a = 1 and b = 6, we have $\mu = 3.5$ and $\sigma^2 = 35/12$.
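The two formulas can be checked numerically for the die. The sketch below compares them against the mean and population variance computed directly from the six faces; the variable names are chosen only for this illustration.

```r
# Check the discrete uniform formulas against a direct computation.
faces <- 1:6
a <- min(faces)
b <- max(faces)

mu_formula  <- (a + b) / 2
var_formula <- ((b - a + 1)^2 - 1) / 12

mu_direct  <- mean(faces)
# Population variance: divide by 6, not by 5 as var() would.
var_direct <- mean((faces - mu_direct)^2)

print(c(mu_formula, mu_direct))
print(c(var_formula, var_direct))
```

Note that R's var function computes the sample variance (divisor n - 1), so it is not used for the population variance here.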

Example, cont'd
Now, suppose we roll the die n times, where n is the size of our sample, and compute the sample mean $\bar{x}$. Then, we repeat this process m times, gathering m samples, each of size n. The m sample means form a sampling distribution of the mean, which we can then display in a histogram.

Displaying the Sample Means
This is accomplished in R using the following statements (assuming the values of n, the sample size, and m, the number of samples, are already defined):
> means=c()
> for (i in 1:m) means[i]=mean(round(runif(n,0.5,6.5)))
> hist(means,seq(1,6,0.5))

Code Dissection
The first statement, means=c(), creates an empty vector called means, which will hold the sample means. The second statement, for (i in 1:m) means[i]=mean(round(runif(n,0.5,6.5))), executes a loop m times, in which the ith element of the means vector is set to the mean of a vector of n numbers. These numbers are generated by runif from the continuous uniform probability distribution with a = 0.5 and b = 6.5, and then rounded to the nearest integer by round, which yields a sample of integers between 1 and 6.

Code Dissection, cont'd
The third statement, hist(means,seq(1,6,0.5)), generates a histogram of the frequency distribution of the sample means, with classes chosen to have width 0.5. Recall that the expression seq(a,b,h) generates a sequence of numbers starting at a and ending at b, with spacing h. If the terms of the sequence have a spacing of 1, then the shorthand a:b can be used instead; note that this is used in the for statement.
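For readers who want a version that runs as-is, here is a self-contained sketch of the same experiment. It defines n and m explicitly (the values are illustrative) and, as a swapped-in alternative to rounding runif output, draws the rolls directly with sample(1:6, n, replace = TRUE); this is not the statement used in the lecture, only an equivalent way to simulate a fair die.

```r
# Self-contained sketch of the die-rolling experiment.
set.seed(4)                # fixed seed only for reproducibility
n <- 20                    # sample size (illustrative value)
m <- 50                    # number of samples (illustrative value)

means <- numeric(m)        # preallocate the vector of sample means
for (i in 1:m) {
  # Alternative to round(runif(n, 0.5, 6.5)): draw die faces directly.
  rolls <- sample(1:6, n, replace = TRUE)
  means[i] <- mean(rolls)
}

# Histogram of the sample means with classes of width 0.5.
hist(means, breaks = seq(1, 6, 0.5))
```

Changing n and rerunning shows how the shape of the histogram depends on the sample size.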

Sampling Distribution of the Mean, n = 2
Suppose we use a small sample of size n = 2, and compute m = 50 samples. The means are well-distributed across the interval from 1 to 6.

Increasing the Sample Size
Now, suppose that we increase n (keeping m fixed) and see what happens to the distribution. We see that the distribution of the sample means comes to resemble a normal distribution, with its mean roughly that of the original uniform distribution.

The behavior in the preceding example is no coincidence; it is actually an illustration of what is known as the Central Limit Theorem. This theorem states that as the sample size n increases, the distribution of the sample means tends to a normal distribution centered at the true population mean, regardless of the distribution of the population from which the sample is taken.

Standard Error of the Mean
The Central Limit Theorem also states that as the sample size n increases, the standard deviation of the sample means, denoted by $\sigma_{\bar{x}}$, is given by
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}},$$
where $\sigma$ is the standard deviation of the population. This standard deviation of the sample means is called the standard error of the mean. Using the standard error $\sigma_{\bar{x}}$ and the population mean $\mu$, one can use the fact that the sample mean is approximately normally distributed for sufficiently large n to compute the probability that the sample mean will fall within a certain interval, as has been shown previously for a general normal distribution.

Example
In the case of the roll of a six-sided die, with a sample size of n = 20, the standard error is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{35/12}}{\sqrt{20}} \approx 0.3819.$$
Therefore, to obtain the probability that the sample mean will be greater than 4, we compute the z-score for 4:
$$\frac{4 - \mu}{\sigma_{\bar{x}}} = \frac{4 - 3.5}{0.3819} \approx 1.309.$$
We conclude that
$$P(\bar{X} > 4) = 1 - P(\bar{X} \le 4) = 1 - P(Z \le 1.309) \approx 1 - 0.9047 = 0.0953.$$
That is, there is a less than 10% chance that the sample mean will be greater than 4.
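The calculation in this example can be reproduced in R with pnorm, the standard normal cumulative distribution function. The variable names below are chosen only for this sketch.

```r
# Probability that the mean of n = 20 die rolls exceeds 4,
# using the normal approximation.
mu    <- 3.5
sigma <- sqrt(35 / 12)
n     <- 20

se <- sigma / sqrt(n)          # standard error of the mean
z  <- (4 - mu) / se            # z-score for a sample mean of 4
p_greater <- 1 - pnorm(z)      # upper-tail probability

print(c(se = se, z = z, p = p_greater))
```

The result should agree with a value read from a standard normal table, up to the rounding of the z-score.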

Sampling Distribution of the Sum
Suppose that instead of taking the mean of the observations in each sample, we instead take the sum. If the population mean and standard deviation are $\mu$ and $\sigma$, respectively, then as n increases, the sampling distribution of the sum converges to $N(n\mu, \sigma\sqrt{n})$. That is, the mean and standard deviation of the sampling distribution of the mean are simply multiplied by n.
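As a worked illustration, consider the die example with n = 20 (the same sample size as in the standard error example; the symbols $\mu_S$ and $\sigma_S$ for the mean and standard deviation of the sum are notation introduced here). Multiplying the mean and standard error of the sample mean by n gives:

```latex
\begin{align*}
\mu_S    &= n\mu = 20 \times 3.5 = 70, \\
\sigma_S &= n \cdot \frac{\sigma}{\sqrt{n}} = \sigma\sqrt{n}
          = \sqrt{\tfrac{35}{12}}\,\sqrt{20}
          = \sqrt{\tfrac{175}{3}} \approx 7.64.
\end{align*}
```

So the sum of 20 rolls is approximately normally distributed with mean 70 and standard deviation about 7.64.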

In addition to the mean, we can measure the proportion of the population that possesses a characteristic that is binary in nature, such as whether a person agrees with a particular statement. Because of the binary nature of the characteristic, the experiment of determining its value for members of the population follows a binomial distribution. That is, the act of inquiring of each member of the population is a Bernoulli trial, in which success and failure correspond to yes or no responses. However, as noted previously, if the number of trials n is sufficiently large that $np \ge 5$ and $n(1 - p) \ge 5$, where p is the probability of success, then this binomial distribution can be approximated by a normal distribution.

Standard Error of the Proportion
We therefore need the mean and standard deviation of this normal distribution. Because the population proportion p is unknown, we must instead use the sample proportion $p_s$, which is defined to be the number of successes in the sample, divided by the sample size n. Several samples can be taken, and then their sample proportions can be averaged to obtain an approximate value for p. The standard deviation of this distribution, called the standard error of the proportion, is given by
$$\sigma_p = \sqrt{\frac{p(1 - p)}{n}}.$$

Relation to the Binomial Distribution
It is worth noting that $\sigma_p$ is equal to the standard deviation of the binomial distribution, $\sqrt{np(1 - p)}$, divided by n. This makes sense because in the sampling distribution of the proportion, we are not measuring the number of successes, as we are in the binomial distribution. Rather, we are measuring the proportion of successes, thus requiring the division of both the binomial distribution's mean and standard deviation by n.

Example
Suppose that through sampling, with samples of size n = 100, it is determined that 60% of voters in California support a particular ballot initiative (that is, p = 0.6). Because $np = 100(0.6) = 60$ and $n(1 - p) = 100(0.4) = 40$ are large enough, we may use a normal distribution to model the sampling distribution of the proportion.

Example, cont'd
The standard error of the proportion is
$$\sigma_p = \sqrt{\frac{0.6(1 - 0.6)}{100}} \approx 0.0490.$$
Therefore, the probability that more than 65% of the next sample will support the initiative is
$$P(p_s > 0.65) = 1 - P(p_s \le 0.65) = 1 - P(Z \le 1.02) \approx 1 - 0.8461 = 0.1539,$$
where the z-score for 0.65 is
$$\frac{0.65 - p}{\sigma_p} = \frac{0.65 - 0.6}{0.0490} \approx 1.02.$$
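The same computation can be sketched in R. The variable names are chosen for this illustration, and because pnorm works with the unrounded z-score, the probability it returns may differ slightly in the last digits from a value read off a table at z = 1.02.

```r
# Probability that more than 65% of a sample of 100 voters
# support the initiative, using the normal approximation.
p <- 0.6
n <- 100

se <- sqrt(p * (1 - p) / n)    # standard error of the proportion
z  <- (0.65 - p) / se          # z-score for a sample proportion of 0.65
p_greater <- 1 - pnorm(z)      # upper-tail probability

print(c(se = se, z = z, p = p_greater))
```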


More information

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14

Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 CS 70 Discrete Mathematics and Probability Theory Spring 2016 Rao and Walrand Note 14 Introduction One of the key properties of coin flips is independence: if you flip a fair coin ten times and get ten

More information

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics

Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics Wooldridge, Introductory Econometrics, 4th ed. Appendix C: Fundamentals of mathematical statistics A short review of the principles of mathematical statistics (or, what you should have learned in EC 151).

More information

Overview. Confidence Intervals Sampling and Opinion Polls Error Correcting Codes Number of Pet Unicorns in Ireland

Overview. Confidence Intervals Sampling and Opinion Polls Error Correcting Codes Number of Pet Unicorns in Ireland Overview Confidence Intervals Sampling and Opinion Polls Error Correcting Codes Number of Pet Unicorns in Ireland Confidence Intervals When a random variable lies in an interval a X b with a specified

More information

Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University

Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University richamar@gvsu.edu Overview of Lesson In this activity students explore the properties of the distribution

More information

MIT : Quantitative Reasoning and Statistical Methods for Planning I

MIT : Quantitative Reasoning and Statistical Methods for Planning I MIT 11.220 Spring 06 March 2, 2006 MIT - 11.220: Quantitative Reasoning and Statistical Methods for Planning I I. Probability Recitation #2: Spring 2006 Probability, Normal Distribution, and Binomial Distribution

More information

4/19/2009. Probability Distributions. Inference. Example 1. Example 2. Parameter versus statistic. Normal Probability Distribution N

4/19/2009. Probability Distributions. Inference. Example 1. Example 2. Parameter versus statistic. Normal Probability Distribution N Probability Distributions Normal Probability Distribution N Chapter 6 Inference It was reported that the 2008 Super Bowl was watched by 97.5 million people. But how does anyone know that? They certainly

More information

STA 291 Lecture 16. Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately) normal

STA 291 Lecture 16. Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately) normal STA 291 Lecture 16 Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately) normal X STA 291 - Lecture 16 1 Sampling Distributions Sampling

More information

Econ 325: Introduction to Empirical Economics

Econ 325: Introduction to Empirical Economics Econ 325: Introduction to Empirical Economics Chapter 9 Hypothesis Testing: Single Population Ch. 9-1 9.1 What is a Hypothesis? A hypothesis is a claim (assumption) about a population parameter: population

More information

MULTINOMIAL PROBABILITY DISTRIBUTION

MULTINOMIAL PROBABILITY DISTRIBUTION MTH/STA 56 MULTINOMIAL PROBABILITY DISTRIBUTION The multinomial probability distribution is an extension of the binomial probability distribution when the identical trial in the experiment has more than

More information

Chapter. Objectives. Sampling Distributions

Chapter. Objectives. Sampling Distributions Chapter Sampling Distributions 8 Section 8.1 Distribution of the Sample Mean Objectives 1. Describe the distribution of the sample mean: samples from normal populations 2. Describe the distribution of

More information

2.3 Estimating PDFs and PDF Parameters

2.3 Estimating PDFs and PDF Parameters .3 Estimating PDFs and PDF Parameters estimating means - discrete and continuous estimating variance using a known mean estimating variance with an estimated mean estimating a discrete pdf estimating a

More information

Section 7.5 Conditional Probability and Independent Events

Section 7.5 Conditional Probability and Independent Events Section 75 Conditional Probability and Independent Events Conditional Probability of an Event If A and B are events in an experiment and P (A) 6= 0,thentheconditionalprobabilitythattheevent B will occur

More information

Lesson B1 - Probability Distributions.notebook

Lesson B1 - Probability Distributions.notebook Learning Goals: * Define a discrete random variable * Applying a probability distribution of a discrete random variable. * Use tables, graphs, and expressions to represent the distributions. Should you

More information

Confidence Intervals for the Mean of Non-normal Data Class 23, Jeremy Orloff and Jonathan Bloom

Confidence Intervals for the Mean of Non-normal Data Class 23, Jeremy Orloff and Jonathan Bloom Confidence Intervals for the Mean of Non-normal Data Class 23, 8.05 Jeremy Orloff and Jonathan Bloom Learning Goals. Be able to derive the formula for conservative normal confidence intervals for the proportion

More information

Chapter 8: Confidence Intervals

Chapter 8: Confidence Intervals Chapter 8: Confidence Intervals Introduction Suppose you are trying to determine the mean rent of a two-bedroom apartment in your town. You might look in the classified section of the newspaper, write

More information

ACM 116: Lecture 2. Agenda. Independence. Bayes rule. Discrete random variables Bernoulli distribution Binomial distribution

ACM 116: Lecture 2. Agenda. Independence. Bayes rule. Discrete random variables Bernoulli distribution Binomial distribution 1 ACM 116: Lecture 2 Agenda Independence Bayes rule Discrete random variables Bernoulli distribution Binomial distribution Continuous Random variables The Normal distribution Expected value of a random

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

Theoretical Foundations

Theoretical Foundations Theoretical Foundations Sampling Distribution and Central Limit Theorem Monia Ranalli monia.ranalli@uniroma3.it Ranalli M. Theoretical Foundations - Sampling Distribution and Central Limit Theorem Lesson

More information

Random processes. Lecture 17: Probability, Part 1. Probability. Law of large numbers

Random processes. Lecture 17: Probability, Part 1. Probability. Law of large numbers Random processes Lecture 17: Probability, Part 1 Statistics 10 Colin Rundel March 26, 2012 A random process is a situation in which we know what outcomes could happen, but we don t know which particular

More information

AP Statistics Review Ch. 7

AP Statistics Review Ch. 7 AP Statistics Review Ch. 7 Name 1. Which of the following best describes what is meant by the term sampling variability? A. There are many different methods for selecting a sample. B. Two different samples

More information

1 of 6 7/16/2009 6:31 AM Virtual Laboratories > 11. Bernoulli Trials > 1 2 3 4 5 6 1. Introduction Basic Theory The Bernoulli trials process, named after James Bernoulli, is one of the simplest yet most

More information

UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS

UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS 1. (5 points) You are a pollster for the 2016 presidential elections. You ask 0 random people whether they would vote for

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Probability Distributions

Probability Distributions EXAMPLE: Consider rolling a fair die twice. Probability Distributions Random Variables S = {(i, j : i, j {,...,6}} Suppose we are interested in computing the sum, i.e. we have placed a bet at a craps table.

More information

Cogs 14B: Introduction to Statistical Analysis

Cogs 14B: Introduction to Statistical Analysis Cogs 14B: Introduction to Statistical Analysis Statistical Tools: Description vs. Prediction/Inference Description Averages Variability Correlation Prediction (Inference) Regression Confidence intervals/

More information

Announcements. Lecture 5: Probability. Dangling threads from last week: Mean vs. median. Dangling threads from last week: Sampling bias

Announcements. Lecture 5: Probability. Dangling threads from last week: Mean vs. median. Dangling threads from last week: Sampling bias Recap Announcements Lecture 5: Statistics 101 Mine Çetinkaya-Rundel September 13, 2011 HW1 due TA hours Thursday - Sunday 4pm - 9pm at Old Chem 211A If you added the class last week please make sure to

More information

Applications in Differentiation Page 3

Applications in Differentiation Page 3 Applications in Differentiation Page 3 Continuity and Differentiability Page 3 Gradients at Specific Points Page 5 Derivatives of Hybrid Functions Page 7 Derivatives of Composite Functions Page 8 Joining

More information

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting

Estimating the accuracy of a hypothesis Setting. Assume a binary classification setting Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier

More information

Section 7.1 How Likely are the Possible Values of a Statistic? The Sampling Distribution of the Proportion

Section 7.1 How Likely are the Possible Values of a Statistic? The Sampling Distribution of the Proportion Section 7.1 How Likely are the Possible Values of a Statistic? The Sampling Distribution of the Proportion CNN / USA Today / Gallup Poll September 22-24, 2008 www.poll.gallup.com 12% of Americans describe

More information

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20

Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 CS 70 Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 Today we shall discuss a measure of how close a random variable tends to be to its expectation. But first we need to see how to compute

More information

MATH : FINAL EXAM INFO/LOGISTICS/ADVICE

MATH : FINAL EXAM INFO/LOGISTICS/ADVICE INFO: MATH 1300-01: FINAL EXAM INFO/LOGISTICS/ADVICE WHEN: Thursday (08/06) at 11:00am DURATION: 150 mins PROBLEM COUNT: Eleven BONUS COUNT: Two There will be three Ch13 problems, three Ch14 problems,

More information

CS1512 Foundations of Computing Science 2. Lecture 4

CS1512 Foundations of Computing Science 2. Lecture 4 CS1512 Foundations of Computing Science 2 Lecture 4 Bayes Law; Gaussian Distributions 1 J R W Hunter, 2006; C J van Deemter 2007 (Revd. Thomas) Bayes Theorem P( E 1 and E 2 ) = P( E 1 )* P( E 2 E 1 ) Order

More information

X = X X n, + X 2

X = X X n, + X 2 CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 22 Variance Question: At each time step, I flip a fair coin. If it comes up Heads, I walk one step to the right; if it comes up Tails, I walk

More information

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E

Salt Lake Community College MATH 1040 Final Exam Fall Semester 2011 Form E Salt Lake Community College MATH 1040 Final Exam Fall Semester 011 Form E Name Instructor Time Limit: 10 minutes Any hand-held calculator may be used. Computers, cell phones, or other communication devices

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

1. Rolling a six sided die and observing the number on the uppermost face is an experiment with six possible outcomes; 1, 2, 3, 4, 5 and 6.

1. Rolling a six sided die and observing the number on the uppermost face is an experiment with six possible outcomes; 1, 2, 3, 4, 5 and 6. Section 7.1: Introduction to Probability Almost everybody has used some conscious or subconscious estimate of the likelihood of an event happening at some point in their life. Such estimates are often

More information

Debugging Intuition. How to calculate the probability of at least k successes in n trials?

Debugging Intuition. How to calculate the probability of at least k successes in n trials? How to calculate the probability of at least k successes in n trials? X is number of successes in n trials each with probability p # ways to choose slots for success Correct: Debugging Intuition P (X k)

More information

Review. A Bernoulli Trial is a very simple experiment:

Review. A Bernoulli Trial is a very simple experiment: Review A Bernoulli Trial is a very simple experiment: Review A Bernoulli Trial is a very simple experiment: two possible outcomes (success or failure) probability of success is always the same (p) the

More information

MTH135/STA104: Probability

MTH135/STA104: Probability MTH35/STA04: Probability Homework # 3 Due: Tuesday, Sep 0, 005 Prof. Robert Wolpert. from prob 7 p. 9 You roll a fair, six-sided die and I roll a die. You win if the number showing on your die is strictly

More information

Hypothesis tests

Hypothesis tests 6.1 6.4 Hypothesis tests Prof. Tesler Math 186 February 26, 2014 Prof. Tesler 6.1 6.4 Hypothesis tests Math 186 / February 26, 2014 1 / 41 6.1 6.2 Intro to hypothesis tests and decision rules Hypothesis

More information

Combinations. April 12, 2006

Combinations. April 12, 2006 Combinations April 12, 2006 Combinations, April 12, 2006 Binomial Coecients Denition. The number of distinct subsets with j elements that can be chosen from a set with n elements is denoted by ( n j).

More information

1 INFO Sep 05

1 INFO Sep 05 Events A 1,...A n are said to be mutually independent if for all subsets S {1,..., n}, p( i S A i ) = p(a i ). (For example, flip a coin N times, then the events {A i = i th flip is heads} are mutually

More information

Lecture 10. Variance and standard deviation

Lecture 10. Variance and standard deviation 18.440: Lecture 10 Variance and standard deviation Scott Sheffield MIT 1 Outline Defining variance Examples Properties Decomposition trick 2 Outline Defining variance Examples Properties Decomposition

More information