
Topic 1: Sampling and Sampling Distributions
Chapter 7: Sections 7.1-7.6, 7.8, and 7.9

Objective: To draw inferences about population parameters on the basis of sample information.

In Economics 245 (Descriptive Statistics) we saw that statistics are largely applied to describing population parameters and to estimating or testing hypotheses about population characteristics. Examples: measures of central tendency, such as the population mean, median, and mode; measures of dispersion, such as the population variance, standard deviation, and coefficient of variation. If all values of a population are known, we can determine these parameters. But for many populations, cost and time constraints prevent us from gathering data on the entire population to determine the parameter of interest. Hence, samples are taken to estimate the parameters.

In Topic 1 we will explore how a sample is taken and how the information generated from the sample can be applied. To deal with the issues of proper sampling, we must ask the following questions:
1) What are the means of collecting samples that generate the best representation of the population?
2) How do we clearly summarize sample information in the most useful manner?
3) How do we generate inferences about the population from sample summaries?
4) How reliable are the inferences generated from sample information?

Sample Design (7.2)
To answer question #1, we must describe how sample data are collected (i.e., describe the procedure or plan of action).
Definition: A sample design is a predetermined plan that describes how to collect a sample from a given population. That is, before any data collection takes place, the researcher plans how he or she will collect the data.
The primary objective of the sample design is to collect a sample that represents the population one is trying to replicate. Ideally, the sample contains the population's characteristics in the same proportions or concentrations.
Ex. Suppose a population has a 6.5% unemployment rate (obtained from the Census). The sample should also contain 6.5% unemployed people.
Ex. 50% of the population is over 60 years old. We want a sample in which 50% of the people surveyed are over 60 years old.

Ex. Income structure: 10% of the population is in a 50% income-tax bracket; 55% is in a 40% income-tax bracket; 35% is in a bracket below 40%. We want a sample that contains the same proportions of the income distribution. Such a sample could be used to determine an election platform: more jobs versus tax cuts.
There are many ways of collecting a sample. Let us look at some of the errors that can result from bad sampling designs:
(I) Non-sampling errors: these include all errors other than sampling error, such as mistakes in collecting, analysing and reporting data. Examples:
i) The wrong population is sampled: An error of misrepresentation can arise when the wrong population is sampled by mistake; i.e., a survey question, when asked of a particular sample, misrepresents the true response because the sample does not accurately replicate the population. Example: "Should hours of operation for bars downtown be extended?" asked of the average person in a downtown bar at 2 a.m. The share of positive or negative responses may be higher or lower than the Canadian average.

ii) Bias: Another type of error, known as bias, affects survey results. It is caused by poor wording on questionnaires, or by interview techniques that lead the interviewee to give an answer that does not reflect his or her opinion. Bias distorts the truth. Example: Air quality control survey: "Do you drive an automobile that is environmentally friendly? Answer no if you do not have a car." You may perceive your car to be highly environmentally friendly when in reality it would fail every test available, so your response misrepresents the population. Your answer may be subjective if the survey does not specify some criterion.
(II) Sampling errors: The differences that may exist between a sample statistic and the population parameter being estimated are called sampling errors. They occur in all data collection except a census, where every item is surveyed. They arise because the sample represents only a part of the population and so may not reflect the true population being examined.
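To make sampling error concrete, here is a minimal Python sketch (not part of the original notes). It builds a hypothetical population of 1,000 incomes, draws several simple random samples of size 50, and reports how far each sample mean falls from the population mean. The income range, the seeds and the function name `sampling_errors` are illustrative assumptions.

```python
import random
import statistics

def sampling_errors(population, n, repetitions, seed=0):
    """Return (sample mean - population mean) for several simple random samples."""
    rng = random.Random(seed)          # seeded so the run is reproducible
    mu = statistics.mean(population)   # the population parameter
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)   # simple random sample, without replacement
        errors.append(statistics.mean(sample) - mu)
    return errors

# Hypothetical population: 1,000 incomes between $20,000 and $120,000
pop_rng = random.Random(42)
population = [pop_rng.randint(20_000, 120_000) for _ in range(1_000)]

for err in sampling_errors(population, n=50, repetitions=5):
    print(f"sampling error: {err:,.0f}")
```

Setting n equal to the population size (a census) makes every error exactly zero, which matches the point that sampling error disappears only when all items are surveyed.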

An inferential error may be made because the data collected from the sample do not truly reflect the characteristic of the population being explored. Example: The unemployment rate calculated from the Labour Force Survey understates the actual unemployment rate because it excludes certain groups within the population, such as people living on native reserves and people living in the NWT, Yukon and Nunavut.

Example: The actual amount people earn may be higher than reported because the random sample happened to select a portion of the population with a higher-than-average number of low-income members relative to the total economy.
There is usually a trade-off between the quality of the sample design and the cost of making an error. To minimize the cost of erroneous decisions based on inaccurate information, the researcher must improve the sample design. But such improvements are costly, so the researcher must balance the cost of making an error against the cost of sampling.

Next: Determining Which Sample Designs Most Effectively Minimize Sampling Errors
I) Probability Sampling: based on a random selection process. One criterion for a good sample is that every item in the population being examined has an equal and independent chance of being chosen as part of the sample.
A) Simple Random Samples: Every possible sample of size n (from a population of size N) is equally likely to be chosen. Requirement: access to all items in the population. This can be difficult if the population is large and its elements are hard to identify.
B) Systematic Sampling: A random starting point in the population is selected, and every kth element thereafter is included in the sample. This is not equivalent to simple random sampling, because not every set of n items has an equal probability of being selected. Bias will result if there is a cyclical pattern to the items in the population.
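The two selection rules above can be sketched in Python as follows. This is an illustration only: the population of 100 labelled units is hypothetical, and the interval is assumed to be k = N // n with a random start among the first k items.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely (sampling without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random start within the first k items, then every k-th item (k = N // n)."""
    k = len(population) // n           # sampling interval
    start = rng.randrange(k)           # random starting point, 0 .. k-1
    return population[start::k][:n]

rng = random.Random(1)
units = list(range(1, 101))            # hypothetical population of N = 100 labelled units
print(simple_random_sample(units, 10, rng))
print(systematic_sample(units, 10, rng))
```

With N = 100 and n = 10, the systematic design can produce only 10 distinct samples (one per starting point), whereas simple random sampling can produce any 10-item subset. This is why the two designs are not equivalent.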

Example: We want to determine the average wait time in movie line-ups, but we sample wait times only on cheap-night Tuesdays. Sampling on a single day of the week, rather than across the whole weekly pattern, will bias the amount recorded. Example: Sleep: What is the average amount of sleep an individual gets each night? If we sample only Friday nights, the same problem biases the amount recorded.
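The movie line-up example can be simulated to show how a cyclical pattern biases a systematic sample. This is a sketch under invented assumptions: a year of daily wait times in which Tuesdays average 25 minutes and other days average 8, and a sampling interval of k = 7 whose starting point happens to fall on a Tuesday.

```python
import random
import statistics

# Hypothetical wait times (minutes) for 52 weeks; Tuesday (cheap night) is much busier.
rng = random.Random(7)
days = []
for week in range(52):
    for weekday in range(7):                 # 0 = Monday, 1 = Tuesday, ...
        base = 25 if weekday == 1 else 8     # assumed Tuesday spike
        days.append(base + rng.uniform(-3, 3))

true_mean = statistics.mean(days)

# Systematic sample with k = 7 whose random start happens to land on a Tuesday:
k, start = 7, 1
tuesday_only = days[start::k]

# Simple random sample of the same size, for comparison:
srs = rng.sample(days, len(tuesday_only))

print(f"population mean wait : {true_mean:.1f}")
print(f"systematic (k = 7)   : {statistics.mean(tuesday_only):.1f}")
print(f"simple random sample : {statistics.mean(srs):.1f}")
```

Because the interval matches the weekly cycle, every systematic sample point falls on the same weekday. The simple random sample does not have this problem.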

The systematic way to draw a simple random sample is to use a random number table or generator. In a random number table, a sequence of integers between 0 and 9 is generated at random, and each digit has the same probability of occurrence.
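A minimal Python sketch of the random-number-table method follows. It assumes the population units are labelled 00, 01, ..., up to at most 99. The table is read row by row in two-digit groups, and labels that are out of range or already chosen are skipped. The function names are hypothetical.

```python
import random

def random_digit_table(rows, cols, seed=None):
    """Rows of random digits 0-9, each equally likely (a simulated random number table)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(cols)] for _ in range(rows)]

def select_from_table(table, population_size, n):
    """Read 2-digit groups row by row; keep labels < population_size and skip repeats."""
    digits = [d for row in table for d in row]
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        label = digits[i] * 10 + digits[i + 1]     # e.g. digits 4 then 7 -> unit 47
        if label < population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

table = random_digit_table(rows=10, cols=20, seed=3)
print(select_from_table(table, population_size=80, n=5))   # 5 unit labels from 00-79
```

If the table runs out before n labels are found, the function returns fewer labels. In practice you would then continue reading from a fresh part of the table.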

Example Using EViews: Random Number Generators in EViews
(i) Uniform random integer generator: The rndint command creates (pseudo) random integers drawn uniformly from zero to a specified maximum. The command ignores the current sample and fills the entire object with random integers.
Syntax: rndint(object_name, n)
Type the name of the series object to fill, followed by an integer value n giving the maximum value of the random integers. n should be a positive integer.
Example: Suppose we want to create a series of 10 random integers with values from zero to 100. Type the following code in the command window of a new workfile (with a range of 10), pressing return after each line:
    series x
    rndint(x,100)
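For readers without EViews, here is a rough Python analogue of `rndint(x,100)` on a 10-observation workfile. It is only a sketch of the same idea, not EViews' implementation. The notes do not say whether EViews includes the maximum itself, and in this sketch `random.randint` includes both 0 and 100. The helper name `fill_random_integers` is hypothetical.

```python
import random

def fill_random_integers(length, maximum, rng=random):
    """Fill a 'series' of the given length with uniform random integers 0..maximum (inclusive)."""
    if maximum <= 0:
        raise ValueError("maximum should be a positive integer")
    return [rng.randint(0, maximum) for _ in range(length)]

x = fill_random_integers(10, 100)
print(x)   # a different set of numbers on every run, since the generator is unseeded
```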

A different set of random numbers is generated every time you use the command.

(ii) nrnd: Normal random number generator. When used in a series expression, nrnd generates (pseudo) random draws from a normal distribution with zero mean and unit variance.
Syntax:
    series z = nrnd
This generates a series z whose values are normally distributed with a mean of zero and a standard deviation of 1. A different set of random numbers is generated every time you use the command.
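An analogous Python sketch of `series z = nrnd` uses the standard library's `random.gauss` with mean 0 and standard deviation 1. The helper name is hypothetical. With many draws, the sample mean and standard deviation should come out close to 0 and 1.

```python
import random
import statistics

def normal_series(length, rng=random):
    """Draws from a normal distribution with mean 0 and standard deviation 1."""
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

z = normal_series(10_000, random.Random(0))
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))   # close to 0 and 1
```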

Sampling With Prior Knowledge
Problem: Simple random sampling may be difficult when the population is large and certain elements of the population are hard to identify. Two random sampling techniques use prior knowledge about the population and hence reduce the costs of simple random sampling:
1) Stratified sampling
2) Cluster sampling
(I) Stratified Sampling
This technique divides the population into a number of distinct subgroups whose members are similar to one another, and then selects a proportionate number of items from each subgroup. Stratified sampling requires that the population be divided into homogeneous groups called strata. Each stratum is then sampled according to certain specified criteria. If we can identify certain characteristics in the population and separate them into subgroups, we need fewer sample points to determine the level or concentration of the characteristic under study.

The optimal way to select strata is to find groups with large variability between strata but only small variability within each stratum; i.e., groups should have large between-strata variation but little within-strata variation. Each subgroup contains persons who share common traits, and each subgroup is distinctly different from the others.
Example: A politician hires a company to determine which political platform he or she should stress: job creation or lower taxes. The research team divides the population into income classes (upper, middle and lower income groups) and then samples a proportionate amount from each stratum. In this case, we know the population is composed of 15% upper income, 55% middle income and 30% lower income. We then take a sample that contains 15% from the upper income stratum, 55% from the middle income stratum, and 30% from the lower income stratum.
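The proportional allocation in the politician example can be sketched in Python. This sketch makes three assumptions: the sampling frame of 1,000 voters is hypothetical, the total sample size is 200, and "largest-remainder" rounding is used so the stratum sizes add up exactly to n (the notes do not specify a rounding rule).

```python
import random

def proportional_allocation(shares, n):
    """Split a total sample size n across strata in proportion to their population shares.
    Largest-remainder rounding makes the stratum sizes add up exactly to n."""
    if abs(sum(shares.values()) - 1) > 1e-9:
        raise ValueError("population shares must sum to 1")
    raw = {name: share * n for name, share in shares.items()}
    sizes = {name: int(value) for name, value in raw.items()}      # round down first
    leftover = n - sum(sizes.values())
    # give the remaining units to the strata with the largest fractional parts
    order = sorted(raw, key=lambda name: raw[name] - sizes[name], reverse=True)
    for name in order[:leftover]:
        sizes[name] += 1
    return sizes

def stratified_sample(strata, n, rng):
    """Draw a simple random sample inside each stratum, sized by proportional allocation."""
    total = sum(len(members) for members in strata.values())
    shares = {name: len(members) / total for name, members in strata.items()}
    sizes = proportional_allocation(shares, n)
    return {name: rng.sample(strata[name], sizes[name]) for name in strata}

# Hypothetical sampling frame of 1,000 voters with the 15% / 55% / 30% income split
strata = {
    "upper": [f"U{i}" for i in range(150)],
    "middle": [f"M{i}" for i in range(550)],
    "lower": [f"L{i}" for i in range(300)],
}
sample = stratified_sample(strata, n=200, rng=random.Random(4))
print({name: len(members) for name, members in sample.items()})
```

With n = 200 the allocation is 30, 110 and 60 respondents, matching the 15%, 55% and 30% shares.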

Example: Studying the average wage rate in Canada: It is believed that people in large cities earn higher wages than people in smaller towns. We could divide the population of Canada into urban and rural strata, then study the wage rate of each group to get an average wage rate for Canada.
Example: We want to determine whether there is a difference between the average amount of healthcare consumed by the rich and the poor. The researcher would divide people into strata according to their gross incomes, then sample each stratum to determine whether there is a statistically significant difference in healthcare usage between the two groups.
Advantages:
(1) If homogeneous subsets of a population can be identified, only a relatively small number of sample observations is needed to determine the characteristics of each subset. Thus, stratified sampling is usually less expensive than simple random sampling, because we need only a small number of sample points to get an accurate measure of the characteristic under study.
(2) Using prior knowledge about the population may improve the precision of statistical inference based on stratified sampling compared with simple random sampling, i.e., it improves the efficiency of the estimate.

Example: The cost of post-secondary education is higher for people who do not live in a city with a university. We want to measure the average size of a student loan. 65% of university students are residents of the city, and 35% are from other parts of the province. If we divide students into two strata (residents and non-residents) and then sample in the same 65% / 35% proportion, we may get a better estimate of the average student loan. (If the two strata were sampled in proportions different from their population shares, this would be disproportionate stratified sampling.)
(3) By-product: Dividing the population into strata produces a valuable side effect: we can draw inferences about each group without further sampling.
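One common way to combine stratum results is to weight each stratum's sample mean by its population share (here 0.65 and 0.35). The notes do not state this estimator, but it is standard. The loan amounts below are hypothetical. Both strata have five observations, so the sample is deliberately disproportionate. The weighting corrects for that, whereas the unweighted pooled mean does not. The per-stratum mean is the "by-product" described in point (3).

```python
import statistics

def stratified_mean(samples, weights):
    """Combine stratum sample means using population weights (which must sum to 1)."""
    if abs(sum(weights.values()) - 1) > 1e-9:
        raise ValueError("population weights must sum to 1")
    return sum(weights[h] * statistics.mean(samples[h]) for h in weights)

# Hypothetical loan sizes ($); each stratum is sampled with 5 students (disproportionate)
loans = {
    "resident":     [4_000, 5_000, 6_000, 5_500, 4_500],     # mean 5,000
    "non_resident": [9_000, 11_000, 10_000, 10_500, 9_500],  # mean 10,000
}
weights = {"resident": 0.65, "non_resident": 0.35}           # population shares

print("weighted estimate     :", round(stratified_mean(loans, weights)))
print("unweighted pooled mean:", statistics.mean(loans["resident"] + loans["non_resident"]))
print("non-resident stratum  :", statistics.mean(loans["non_resident"]))
```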

Example (continued from above): If you wanted to know the average loan size of students who are not residents of the university city, you would use the sample already drawn from that stratum.
(II) Cluster Sampling
The population is subdivided into groups called clusters, where each cluster has the same characteristics as the population. Each cluster is assumed to be representative of the population. Hence, the researcher needs to pick only a few clusters to constitute the sample in order to estimate the characteristics of the population. All the elements in each selected cluster are taken to constitute the sample.
Procedure:
1) Divide the population into clusters.
2) Randomly select a sample of clusters.
3) Select all the elements in each selected cluster.
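The three-step cluster procedure can be sketched in Python as follows. The campus clusters and student ages are hypothetical. Step 1 (forming the clusters) is represented by the dictionary itself.

```python
import random
import statistics

def cluster_sample(clusters, num_clusters, rng):
    """Step 2: pick clusters at random. Step 3: keep every element of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Step 1 (hypothetical): student ages grouped into five campus clusters
clusters = {
    "campus_A": [19, 20, 22, 25],
    "campus_B": [18, 19, 21],
    "campus_C": [20, 23, 24, 30, 19],
    "campus_D": [21, 22],
    "campus_E": [18, 20, 20, 26],
}
picked = cluster_sample(clusters, num_clusters=2, rng=random.Random(5))
all_elements = [age for ages in picked.values() for age in ages]
print(sorted(picked), statistics.mean(all_elements))
```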

Example: We want to know the average income level in large Canadian cities. Gathering information from all large cities in Canada would take a lot of time. Instead, we could gather data from a few large cities (e.g., Cluster #1: Toronto; Cluster #2: Vancouver) to represent the average income in large urban centres.
Example: The researcher is interested in some student characteristic, and all the students in Canada are the population. Form clusters by grouping together the students at each university: UVic, UBC, SFU, York, Guelph, Queen's, McGill, etc. Sampling every university would be very costly. Instead, we could pick a few universities and look at all of the students at those universities to estimate the population characteristic.
Example: CPI data collection: The CPI gathers information about inflation using a basket of goods and services that is supposed to represent the typical household's consumption choices.

Optimal way of selecting clusters: form groups with a lot of within-cluster variation but little between-cluster variation. In other words, there should be little variation between groups, but high variability within each group, so that each group represents the population.
Advantages:
1) Lowers the cost of sampling when the population is large.
2) Gives better results when we do not have a complete listing of, or full access to, all the information about the population in question.
Disadvantage:
1) Clusters may not be truly representative of the population, which introduces bias. The student cluster at the University of Northern B.C. may differ from the typical urban campus cluster. Because of the university's location, there may be large differences in class size, loan size, family background, average age, political beliefs, etc. Or there may not be.

Other Sampling Procedures
(1) Systematic Sampling: To select n items from a population of size N, take every kth item. Start at a randomly chosen point in the population and include every kth element as a sample point. Example: for a population of 500 items (N = 500) and a sample size of 50, pick every 10th item. This is cost efficient, but the sample could be biased if the data have a cyclical pattern.
(2) Two-Stage Sampling: These are samples within samples, i.e., elements are drawn in two different stages. Example: To explore some student characteristic, cluster all universities and colleges and pick a few representative clusters in stage 1; then, within these clusters, randomly select a sample of students in stage 2.
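The two-stage student example can be sketched in Python. In this sketch, the schools and student ID lists are hypothetical, stage 1 is a simple random sample of schools, and stage 2 is a simple random sample of students within each chosen school.

```python
import random

def two_stage_sample(clusters, num_clusters, per_cluster, rng):
    """Stage 1: random sample of clusters. Stage 2: random sample of elements within each."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {
        name: rng.sample(clusters[name], min(per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical student ID lists for four institutions
clusters = {f"school_{c}": [f"{c}{i:03d}" for i in range(200)] for c in "ABCD"}
print(two_stage_sample(clusters, num_clusters=2, per_cluster=5, rng=random.Random(11)))
```

Unlike the cluster sampling sketch, which keeps every element of a chosen cluster, stage 2 here samples within each cluster.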

(3) Multistage Sampling: Like two-stage sampling, but sampling is done in two or more stages. After the initial sampling, the results are judged: if they are perfect or OK, no more sampling is done; if they are not OK, more samples are required. The process continues until we have a satisfactory sample.
Stage 1: Within your cluster of students at a university, you group together all non-resident students and take a sample to determine the average student loan size. Your results are very low compared with other university clusters.
Stage 2: Take another sample from your original cluster. This time the average is 50% higher than the first.
Stage 3: Take another sample from the cluster. Your results are somewhere between the results from stages 1 and 2. Stop sampling.

The motivation for these three other sampling techniques is cost savings. This is typically because:
- sampling is costly;
- sampling is time-consuming;
- travel by the research team is slow and expensive.
Non-Probabilistic Sampling
1) Judgment Sampling: These procedures involve nonrandom samples, i.e., the selection of elements in a sample is determined by the judgment, opinion or belief of one or more individuals. This approach is usually employed when a random sample cannot be taken or when taking one is not practical. Example: The local government is concerned about a proposed change in speed limits in town, and seventy-five drivers are asked their opinion regarding speed limits. Out of the 75 interviews, the researcher disregards any response from a driver with a history of speeding tickets. Example: The instructor picks 10 students who come to class regularly to give an honest assessment of the class's understanding of the course material.

2) Quota Sampling: The number of sample observations to be gathered is dictated to the researcher before the survey begins, but it is left to the researcher to decide whom to survey. If the guidelines are clear and the researcher is experienced and honest, this type of sampling is very cost efficient. Problem: researcher bias may enter the survey process. Usually used for market research and political preference polls.
3) Convenience Sampling: This procedure selects observations on the basis of their convenience to the researcher. It is the least representative type of sampling. Example: Elian Gonzalez from Cuba: a TV interviewer questions the average person on the street to collect public opinion. Example: An interviewer asks whoever she meets on the street whether laws should be tougher to deter serious crime, whether gun control should be tighter to prevent serious accidents, or whether capital punishment should be reinstated after a serious crime has been committed by a 25-year-old, etc. The results may be unrepresentative. Used primarily for preliminary studies.

Summary (chart): Statistical sampling procedures are either probabilistic or non-probabilistic.

Section 7.3 Sample Statistics
The primary objective of sampling from a population is to infer something about that population. The sample design determines the reliability of the information gathered from the sample. Next we will combine sampling with the notion of probability from Economics 245 to analyze sample results.
Notation: Suppose we plan to take a sample of n observations in order to determine a characteristic of some random variable X.
i) Let n = the number of observations we plan to take in the sample, i.e., n = sample size (N = population size). The population is of size N; the sample is of size n. Each observation in the sample is represented by $X_i$: $X_1, X_2, X_3, X_4, \dots, X_n$. There are n values of $X_i$ in the sample.

ii) Assume the sample is random. Then each sample observation is a random variable. $X_1, \dots, X_n$ are random variables because the sample space of each $X_i$ is the entire population of X values. In simple random sampling, every element in the population has an equal chance of being the observation that occurs first.
iii) Each random variable ($X_1, X_2, X_3, \dots, X_n$) has a theoretical probability distribution (i.e., the range of possible values the variable can take and the frequency of each possible value).
iv) Under simple random sampling, each $X_i$ has the same probability distribution as the population random variable X, since the sample space for each one is the entire population of X values.
Example: The population is the ages ($X_i$ = age) of students giving blood on Friday morning:
18 18 19 37 31 20 21 18 22 18 19 25 25 24 18 18 19 21 29 19
The population is of size 25 (N = 25).

Let n = 4. Our first random sample is $S_1 = \{X_1 = 18, X_2 = 18, X_3 = 27, X_4 = 19\}$, and our second is $S_2 = \{X_1 = 31, X_2 = 20, X_3 = 20, X_4 = 21\}$. By repeated sampling, we derive the sampling distribution. Note: the probability of any $X_i$ occurring is the same.
Example: N = 4 and n = 2, with population values 18, 21, 25, 19. All the possible samples of size n = 2 (with replacement), with the average of each sample, are:
{18, 21}: 19.5   {18, 25}: 21.5   {18, 19}: 18.5   {25, 18}: 21.5
{25, 19}: 22     {19, 21}: 20     {18, 18}: 18     {21, 21}: 21
{21, 18}: 19.5   {21, 25}: 23     {21, 19}: 20     {25, 21}: 23
{19, 18}: 18.5   {19, 25}: 22     {25, 25}: 25     {19, 19}: 19
The population mean = (18 + 21 + 25 + 19)/4 = 20.75. The mean of all the sample means taken together is also 332/16 = 20.75. (More on this later.)
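The enumeration above can be checked mechanically. This short Python sketch lists all 16 ordered samples of size 2 drawn with replacement from {18, 21, 25, 19}. It uses exact fractions to confirm that the mean of the sample means equals the population mean.

```python
from fractions import Fraction
from itertools import product

population = [18, 21, 25, 19]                       # N = 4 ages from the example
samples = list(product(population, repeat=2))       # all 16 ordered samples, with replacement
sample_means = [Fraction(a + b, 2) for a, b in samples]

mu = Fraction(sum(population), len(population))     # population mean
mean_of_means = sum(sample_means) / len(sample_means)

print(len(samples), float(mu), float(mean_of_means))   # 16 20.75 20.75
```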

The population parameters of interest are usually measures of central location and dispersion.
Definitions:
(I) A parameter is some characteristic of the population. For example: μ = population mean; σ² = population variance; and the population proportion.
(II) A statistic is any function of the random sample values. Example from above (population 18, 21, 25, 19): the sample mean is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
For a sample of size 2, $\bar{X} = \frac{X_1 + X_2}{2}$. One possible random sample, {18, 25}, has a sample mean of (18 + 25)/2 = 21.5.

(III) Any function of a random variable is also a random variable. Hence, a statistic is random and has its own probability distribution. The sample mean, sample variance and sample standard deviation provide the most widely used information about the population mean, population variance and population standard deviation.
Sample Mean:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n} \qquad (7.1)$$
This is similar to the population mean calculation.
Sample Variance: take the sum of the squared deviations about $\bar{X}$ and divide by n - 1:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right] \qquad (7.2)$$
Recall that the population variance is:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$$
We divide by (n - 1) because this makes $S^2$ an unbiased estimator of the unknown population variance. (More on this later.)

Sample Standard Deviation:
$$S = \sqrt{S^2} \qquad (7.3)$$
This is the square root of the sample variance, and it is in the same units as the mean.
Example: Price of 2 litres of ice cream. We take a sample of 5 brands of ice cream to estimate the average cost of a 2-litre carton. Find the sample mean, variance and standard deviation:

Ice cream brand | i | Price X_i ($) | X_i - X̄ | (X_i - X̄)^2
Safeway | 1 | 4.69 | 4.69 - 5.59 = -0.90 | 0.81
Island Farms | 2 | 3.99 | 3.99 - 5.59 = -1.60 | 2.56
Penny Lite | 3 | 4.29 | 4.29 - 5.59 = -1.30 | 1.69
Breyers | 4 | 5.99 | 5.99 - 5.59 = 0.40 | 0.16
Dairy Queen | 5 | 8.99 | 8.99 - 5.59 = 3.40 | 11.56
n = 5 | | ΣX_i = 27.95 | Σ(X_i - X̄) = 0 | Σ(X_i - X̄)^2 = 16.78

Sample mean = 27.95/5 = $5.59; sample variance = 16.78/(5 - 1) = 4.195 ($²); sample standard deviation = √4.195 ≈ $2.05.
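The same calculations can be reproduced with Python's standard `statistics` module, where `variance` and `stdev` use the n - 1 divisor of Equation 7.2. This is a cross-check sketch using the five prices from the example. It should give a mean of 5.59, a variance of about 4.195 and a standard deviation of about 2.05.

```python
import statistics

prices = {            # price of a 2-litre carton ($), from the example
    "Safeway": 4.69,
    "Island Farms": 3.99,
    "Penny Lite": 4.29,
    "Breyers": 5.99,
    "Dairy Queen": 8.99,
}
x = list(prices.values())

x_bar = statistics.mean(x)        # Equation 7.1
s2 = statistics.variance(x)       # Equation 7.2: divides by n - 1
s = statistics.stdev(x)           # Equation 7.3: square root of s2

print(f"mean = {x_bar:.2f}, variance = {s2:.3f}, std dev = {s:.2f}")
```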

Using EViews: Notice that EViews assumes we are using a sample, not a population, in its calculations.

Sample Statistics Using Frequencies
Data are often presented in the form of a frequency distribution; that is, some values of $X_i$ occur more than once. The number of times $X_i$ occurs (its frequency) is denoted $f_i$. If data are in the form of a frequency distribution, we make the following adjustments to our formulas:
The sample mean:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{k} X_i f_i = \frac{X_1 f_1 + X_2 f_2 + \cdots + X_k f_k}{n} \qquad (7.4)$$
where k = the number of different values of $X_i$, $f_i$ = the number of times the value $X_i$ occurs, and $n = \sum_{i=1}^{k} f_i$.
The sample variance:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{k} (X_i - \bar{X})^2 f_i = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 f_1 + (X_2 - \bar{X})^2 f_2 + \cdots + (X_k - \bar{X})^2 f_k\right]$$

Example: Examine the number of monthly visits to a drive-through outlet. As each vehicle goes through the drive-through on the last day of the month, the customer is asked how often they made a purchase there that month:

Visits (V) | Freq (F) | V*F | V - Xbar | (V - Xbar)^2 | (V - Xbar)^2*F
1 | 5 | 5 | -3.618 | 13.091 | 65.456
2 | 8 | 16 | -2.618 | 6.8549 | 54.839
3 | 10 | 30 | -1.618 | 2.6185 | 26.185
4 | 7 | 28 | -0.618 | 0.3821 | 2.675
5 | 6 | 30 | 0.382 | 0.1458 | 0.875
6 | 5 | 30 | 1.382 | 1.909 | 9.547
7 | 4 | 28 | 2.382 | 5.673 | 22.692
8 | 5 | 40 | 3.382 | 11.437 | 57.183
9 | 3 | 27 | 4.382 | 19.200 | 57.601
10 | 2 | 20 | 5.382 | 28.964 | 57.928
Total | n = 55 | 254 | | | 354.98

Mean = 254/55 = 4.618; Variance = 354.98/54 = 6.574; Standard Deviation = √6.574 ≈ 2.564.
Concluding note: Recall that the objective in calculating statistics (mean, variance, standard deviation) is to estimate population parameters. When we take samples from a population, we may get different results each time we take a different sample.
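Equation 7.4 and the grouped variance formula can be checked on the drive-through data with a short Python sketch. The frequencies are those implied by the V*F column (F = V*F / V). The function name is hypothetical.

```python
import math

def grouped_mean_variance(values, freqs):
    """Sample mean (Eq. 7.4) and variance from a frequency table; n is the sum of the frequencies."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    variance = sum((v - mean) ** 2 * f for v, f in zip(values, freqs)) / (n - 1)
    return n, mean, variance

visits = list(range(1, 11))
freq = [5, 8, 10, 7, 6, 5, 4, 5, 3, 2]        # frequencies implied by the V*F column
n, mean, var = grouped_mean_variance(visits, freq)
print(n, round(mean, 3), round(var, 3), round(math.sqrt(var), 3))   # 55 4.618 6.574 2.564
```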

The only way to determine the true population parameters is to account for every element in the population (a census). Problem: a census is too costly. Thus, we must accept that sample statistics will be used to estimate population parameters, and we must make statements about how reliable or accurate those sample statistics are in describing the population parameters.

Probability Distribution of a Statistic
Section 7.4: Sampling Distribution of the Sample Mean
Each sample statistic is a random variable. Recall from the notes:
i) Let n = the number of observations we plan to take in the sample, i.e., n = sample size (N = population size). Each observation in the sample is represented by $X_i$: $X_1, X_2, \dots, X_n$. There are n values of $X_i$ in the sample.
ii) Assume the sample is random. Then each sample observation is a random variable. $X_1, \dots, X_n$ are random variables because the sample space of each $X_i$ is the entire population of X values. In simple random sampling, every element in the population has an equal chance of being the observation that occurs first.
iii) Each random variable ($X_1, X_2, X_3, \dots, X_n$) has a theoretical probability distribution (i.e., the range of possible values the variable can take and the frequency of each possible value).

iv) Under simple random sampling, each $X_i$ has the same probability distribution as the population random variable X, since the sample space for each one is the entire population of X values.
Therefore, each statistic has its own probability distribution. The probability distribution of a sample statistic is called its sampling distribution. So it is legitimate to ask: What is the expected value of $\bar{X}$, $E(\bar{X})$? Or what is the variance of $\bar{X}$, $V(\bar{X})$? The answers help determine how reliable or accurate our sample statistics are as measures of the corresponding population parameters, i.e., how good our estimates of the population parameters are.
Recall: (i) Each sample we draw from a particular population has its own statistics; i.e., a particular sample has a particular sample mean, sample variance, etc.

(ii) Usually each sample produces a different statistical value; i.e., each sample will have a different sample mean, sample variance, etc. Example: Population = [ ... ]; N = 5. Let n = 2. Sample 1 = {3, 4}: $\bar{X}_1 = 7/2 = 3.5$. Sample 2 = {7, 10}: $\bar{X}_2 = 17/2 = 8.5$. Etc.
(iii) If we derive all possible samples of a fixed size (n), calculate each sample's statistic, and build up the associated probability distribution, we obtain what is called the sampling distribution.

Illustration: Sampling Distribution of $\bar{X}$ (the Sample Mean)
The sampling distribution of $\bar{X}$ is the probability distribution of all possible values of $\bar{X}$ that could occur when a sample of size n is taken from some specified population.
Note: A sampling distribution is a population distribution because it contains all the possible values of some sample statistic.
Example: Consider a parent population with only 4 values, {4, 6, 8, 10} (the number of vacations taken in the past 5 years by four people), which occur with equal probability.
[Figure: P(x) = 0.25 at each of x = 4, 6, 8, 10.]
If the population is X = {4, 6, 8, 10}, then μ = 28/4 = 7. The average number of vacations is 7. The variance is:

X_i | X_i - μ | (X_i - μ)^2
4 | -3 | 9
6 | -1 | 1
8 | 1 | 1
10 | 3 | 9
Var(X) = Σ(X_i - μ)^2 / N = 20/4 = 5

Now assume we do not know much about the population and decide to take a random sample of size n = 2 (with replacement). Your sample will be one of the following ordered pairs (X_1, X_2), grouped by their sample mean X̄:
X̄ = 4: (4, 4)
X̄ = 5: (4, 6), (6, 4)
X̄ = 6: (4, 8), (6, 6), (8, 4)
X̄ = 7: (4, 10), (6, 8), (8, 6), (10, 4)
X̄ = 8: (6, 10), (8, 8), (10, 6)
X̄ = 9: (8, 10), (10, 8)
X̄ = 10: (10, 10)

Sampling distribution of X̄ for n = 2 when X = {4, 6, 8, 10}:
X̄ | 4 | 5 | 6 | 7 | 8 | 9 | 10
P(X̄) | 1/16 | 2/16 = 1/8 | 3/16 | 4/16 = 1/4 | 3/16 | 2/16 = 1/8 | 1/16
[Figure: bar chart of P(X̄) against X̄ = 4, 5, ..., 10.]

Mean of the Sample Mean $\bar{X}$
Notation: Let $\mu_{\bar{X}}$ denote the mean of the sampling distribution of $\bar{X}$, i.e., the mean of all possible sample means (the population of $\bar{X}$ values). So:
$$E(\bar{X}) = \mu_{\bar{X}} = \sum_{i=1}^{k} \bar{X}_i\, P(\bar{X}_i)$$
where i = 1, 2, ..., k, and k is the number of distinct possible values of $\bar{X}$. In our last example k = 7:
$$\mu_{\bar{X}} = E(\bar{X}) = 4\left(\tfrac{1}{16}\right) + 5\left(\tfrac{2}{16}\right) + 6\left(\tfrac{3}{16}\right) + 7\left(\tfrac{4}{16}\right) + 8\left(\tfrac{3}{16}\right) + 9\left(\tfrac{2}{16}\right) + 10\left(\tfrac{1}{16}\right) = \tfrac{112}{16} = 7$$
So for this population: $\mu_{\bar{X}} = \mu = 7$.
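The sampling distribution and its mean can be built directly by computer. This Python sketch enumerates every ordered sample of size 2 (with replacement) from {4, 6, 8, 10}, tabulates P(X̄) with exact fractions, and confirms that E(X̄) equals the population mean μ = 7.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 6, 8, 10]                        # vacations taken by four people
n = 2
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]   # 16 sample means

counts = Counter(means)
dist = {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}   # P(X-bar)

expected = sum(m * p for m, p in dist.items())                         # E(X-bar)
mu = Fraction(sum(population), len(population))

for m, p in dist.items():
    print(f"X-bar = {m}: P = {p}")
print("E(X-bar) =", expected, " mu =", mu)
```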

It is not a coincidence that these two means are equal: $\mu_{\bar{X}} = \mu$ for any parent population and any given sample size. Hence, if μ is the mean of the population of X values and $\mu_{\bar{X}}$ is the mean of the population of $\bar{X}$ values, then the expected value of $\bar{X}$ is:
$$E(\bar{X}) = \mu_{\bar{X}} = \mu \qquad (7.5)$$
Proof (footnote, page 254): Let $X_i \sim (\mu, \sigma^2)$ for all i, i.e., each $X_i$ is an independently distributed random variable with mean μ and variance σ². Since
(i) $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$, and
(ii) $E(X_i) = \mu$,
we can apply the rules of expectation:

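$$
\begin{aligned}
E(\bar{X}) &= E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\,E\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\,E(X_1 + X_2 + \cdots + X_n) \\
&= \frac{1}{n}\left[E(X_1) + E(X_2) + E(X_3) + \cdots + E(X_n)\right] \\
&= \frac{1}{n}(\mu + \mu + \mu + \cdots + \mu) = \frac{1}{n}(n\mu) = \mu. \qquad \blacksquare
\end{aligned}
$$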